Jack Gerrits edited this page Jun 26, 2020 · 9 revisions

VW's weight vector has 2^b float (4-byte) weights, where b is set with the -b option, and each example's features are hashed to an index in [0, 2^b - 1]. The weight vector also stores the other vectors needed by more sophisticated learning algorithms, such as the conjugate gradient method (--conjugate_gradient) or adaptive gradient descent (--adaptive, --invariant, and --normalized). In these cases the size of the weight vector is multiplied by a small integer so there is enough room to store all these auxiliary weights side by side in the same 'hash-bucket'.

In other words: when more than one vector is stored in the same global 2^b space, every hash-value slot stores multiple "weights". The size of a hash-bucket (its number of floats) is called the stride in the vw source.
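The bucket layout described above can be pictured as one flat array of 2^b * stride floats. The following is a minimal sketch of that idea, not VW's actual indexing code: the function name slot_range, the use of a bit mask to reduce the hash, and the example hash value are all assumptions made for illustration.

```python
def slot_range(feature_hash: int, bits: int, stride: int) -> range:
    """Return the flat-array positions of the bucket for one hashed feature.

    Hypothetical helper: assumes the bucket index is the low `bits` bits
    of the hash and that each bucket's floats are stored contiguously.
    """
    bucket = feature_hash & ((1 << bits) - 1)  # index in [0, 2^b - 1]
    start = bucket * stride                    # first float of this bucket
    return range(start, start + stride)


# Example: with -b 18 and a stride of 3, one feature owns 3 adjacent floats.
positions = list(slot_range(feature_hash=0x12345678, bits=18, stride=3))
print(positions)  # [66408, 66409, 66410]
```

Under this assumed layout, the first position would hold the weight itself and the remaining positions the auxiliary values kept by the learning algorithm.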

VW uses -b 18 by default. 2^18 is 262144, so if your training set has far fewer than 262144 distinct features you should be relatively safe from hash collisions. If you auto-generate many new features on the fly, as happens with -q (quadratic), --cubic, or --nn, you may want to request a bigger -b value to avoid hash collisions.
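To get a rough feel for when a larger -b is worthwhile, one can estimate how many features are expected to share a slot. The sketch below assumes the hash spreads features uniformly and independently over the 2^b slots, which is an idealization rather than a property stated on this page; the function name expected_collisions is hypothetical.

```python
import math


def expected_collisions(num_features: int, bits: int) -> float:
    """Expected number of features that land in an already-occupied slot.

    Assumes an ideal uniform hash over 2^bits slots (an assumption).
    """
    slots = 2 ** bits
    # Expected occupied slots: slots * (1 - (1 - 1/slots)^num_features),
    # computed with log1p/expm1 to stay accurate when 1/slots is tiny.
    occupied = -slots * math.expm1(num_features * math.log1p(-1.0 / slots))
    return num_features - occupied


# Compare the default table size with larger ones for 100,000 features.
for bits in (18, 22, 24):
    print(bits, round(expected_collisions(100_000, bits)))
```

The estimate shrinks quickly as the number of bits grows, which is why raising -b is the usual remedy when many interaction features are generated.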

By default, vw uses -b 18 and normalized/adaptive/invariant SGD, which stores 3 floats per hash-bucket. So the overall size allocated for the weight vector is:

weight vector size
= 2^18 * weights_per_stride * (sizeof float) bytes
= 262144 * 3 * 4 bytes
= 3,145,728 bytes
= exactly 3 MiB (about 3.1 MB)
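The same arithmetic can be checked for other settings with a few lines of Python. This is a minimal sketch; the function name weight_vector_bytes and its parameter names are made up for illustration, and the stride of 3 is the value used in the calculation above.

```python
def weight_vector_bytes(bits: int = 18, stride: int = 3, float_size: int = 4) -> int:
    """Bytes needed for 2^bits hash-buckets of `stride` floats each."""
    return (2 ** bits) * stride * float_size


default_size = weight_vector_bytes()
print(default_size)                  # 3145728
print(default_size / (1024 * 1024))  # 3.0 (MiB)

# Each extra bit doubles the allocation, e.g. -b 24 with the same stride:
print(weight_vector_bytes(bits=24))  # 201326592 bytes, i.e. 192 MiB
```

Because the size doubles with every extra bit, a modest increase in -b is cheap, while very large values can dominate memory use.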