VW's weight vector has 2^b float (4-byte) weights, where b is specified by the -b option, and each example's features are hashed to an index in [0, 2^b - 1]. The weight vector is also used to store other vectors needed by more sophisticated learning algorithms, such as the conjugate gradient method (--conjugate_gradient) or adaptive gradient descent (--normalized). In these cases, the size of the weight vector is multiplied by a small integer so there is enough room to store all these auxiliary weights side by side, in the same 'hash-bucket'.
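
As a rough illustration of reducing a hash to an index in [0, 2^b - 1], the sketch below masks a 32-bit hash down to its low b bits. The function name feature_index is hypothetical, and zlib.crc32 is only a stand-in hash chosen because it ships with Python; nothing here is claimed to match VW's own hashing code.

```python
import zlib


def feature_index(name: str, b: int = 18) -> int:
    """Map a feature name to a slot in [0, 2**b - 1].

    Assumption: zlib.crc32 stands in for the real hash function.
    """
    mask = (1 << b) - 1                    # low b bits set, e.g. 0x3FFFF for b = 18
    h = zlib.crc32(name.encode("utf-8"))   # unsigned 32-bit integer in Python 3
    return h & mask                        # keep only the low b bits


for name in ("height", "weight", "age"):
    idx = feature_index(name)
    assert 0 <= idx < 2 ** 18
    print(name, idx)
```

Masking with 2^b - 1 is equivalent to taking the hash modulo 2^b, which is why the number of slots is always a power of two.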
In other words, when more than one vector is stored in the same global 2^b space, every hash-value slot stores multiple "weights". The number of floats in each hash-bucket is called the stride.
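
One way to picture the stride is as a flat array in which each hash bucket owns a run of consecutive floats. The sketch below assumes a simple contiguous layout (bucket * stride + k); the helper name slot_offset and the values b = 4, stride = 3 are illustrative choices, not taken from VW's source.

```python
def slot_offset(bucket: int, stride: int, k: int = 0) -> int:
    """Offset, in floats, of the k-th value stored in a hash bucket."""
    if not 0 <= k < stride:
        raise ValueError("k must be in [0, stride)")
    return bucket * stride + k


b, stride = 4, 3                          # small illustrative values
weights = [0.0] * ((1 << b) * stride)     # one flat array, stride floats per bucket

bucket = 5
weights[slot_offset(bucket, stride, 0)] = 0.25   # e.g. the model weight
weights[slot_offset(bucket, stride, 1)] = 1.5    # e.g. an auxiliary value

# All values belonging to bucket 5 sit next to each other.
print(weights[bucket * stride:(bucket + 1) * stride])   # [0.25, 1.5, 0.0]
```

Keeping a bucket's values adjacent means one hash lookup reaches the weight and its auxiliary values together.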
VW uses -b 18 by default. 2^18 is 262144, so if your training set has far fewer than 262144 distinct features, you should be relatively safe from hash collisions. If you auto-generate many new features on the fly, as when you use --cubic or --nn, you may want to request a larger -b value to avoid hash collisions.
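
To get a feel for how collisions grow with the number of features, the sketch below estimates the expected number of features that land in an already occupied slot. It assumes an idealized hash that spreads features uniformly and independently over the 2^b slots; the function name expected_collisions and the feature counts are hypothetical.

```python
import math


def expected_collisions(n_features: int, b: int = 18) -> float:
    """Expected number of features that share a slot with an earlier one.

    Assumption: each feature hashes uniformly at random into 2**b slots.
    """
    m = 2 ** b
    # Expected occupied slots: m * (1 - (1 - 1/m)**n), written with
    # log1p/expm1 to stay accurate when 1/m is tiny.
    occupied = -m * math.expm1(n_features * math.log1p(-1.0 / m))
    return n_features - occupied


for b in (18, 20, 24):
    print(b, round(expected_collisions(100_000, b), 1))
```

Under this model, each extra bit roughly halves the expected number of collisions for a fixed feature count.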
For example, take -b 18 with normalized/adaptive/invariant SGD, for which the calculation here uses 3 weights per stride. The overall size allocated for the weight vector is 2^18 * weights_per_stride * sizeof(float) = 262144 * 3 * 4 = 3,145,728 bytes, which is about 3.1 MB (exactly 3 MiB).
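
The same arithmetic can be written as a small helper to compare different -b values. This is only a sketch: the function name weight_vector_bytes is hypothetical, and the stride of 3 is taken from the calculation above rather than queried from VW.

```python
def weight_vector_bytes(b: int, weights_per_stride: int, float_bytes: int = 4) -> int:
    """Bytes needed for 2**b buckets of weights_per_stride floats each."""
    return (1 << b) * weights_per_stride * float_bytes


size = weight_vector_bytes(18, 3)
print(size)             # 3145728
print(size / 2 ** 20)   # 3.0 (MiB)

# Each extra bit doubles the allocation.
for b in (18, 20, 22, 24):
    print(b, weight_vector_bytes(b, 3) / 2 ** 20, "MiB")
```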