Weight vector

Ariel Faigon edited this page Jul 22, 2014 · 5 revisions

VW's weight vector holds 2^b 4-byte float weights, where b is set by the -b option, and each example's features are hashed to an index in [0, 2^b - 1]. The weight vector also stores other vectors needed by more sophisticated learning algorithms, such as the conjugate gradient method (--conjugate_gradient) or adaptive gradient descent (--adaptive, --invariant, and --normalized). In these cases, the size of the weight vector is multiplied by a small integer so there is enough room to store all the auxiliary weights side by side, in the same 'hash-bucket'.
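As a rough illustration of hashing a feature into [0, 2^b - 1], here is a minimal Python sketch. It assumes `zlib.crc32` as a stand-in hash function (it is not the hash vw itself uses), and `feature_index` is a hypothetical helper name, not part of vw.

```python
import zlib


def feature_index(name: str, bits: int = 18) -> int:
    """Map a feature name to a slot in [0, 2**bits - 1]."""
    mask = (1 << bits) - 1                    # 2^b - 1, all low b bits set
    h = zlib.crc32(name.encode("utf-8"))      # stand-in hash, NOT vw's own
    return h & mask                           # keep only the low b bits


if __name__ == "__main__":
    for feature in ("price", "color=red", "user^item"):
        print(feature, feature_index(feature))
```

Masking with 2^b - 1 is equivalent to taking the hash modulo 2^b, which is why the table size is always a power of two.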

In other words: when more than one vector is stored in the same global 2^b space, every hash-value slot stores multiple "weights". The number of floats in each hash-bucket is called the stride in the vw source.
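The side-by-side layout can be sketched as a single flat array in which bucket `i` occupies positions `i * stride` through `i * stride + stride - 1`. This is only an illustration of the idea: the offset names (`WEIGHT`, `ADAPTIVE`, `NORMALIZED`) and the helpers `make_weights` and `slot` are assumptions made for this example, not identifiers taken from the vw source.

```python
from array import array

# Hypothetical offsets inside one hash-bucket (illustrative names only).
WEIGHT, ADAPTIVE, NORMALIZED = 0, 1, 2


def make_weights(bits: int, stride: int) -> array:
    """Allocate 2**bits buckets of `stride` single-precision floats each."""
    return array("f", [0.0]) * ((1 << bits) * stride)


def slot(index: int, offset: int, stride: int) -> int:
    """Position in the flat array of float `offset` within bucket `index`."""
    return index * stride + offset


if __name__ == "__main__":
    bits, stride = 4, 3                      # tiny table for demonstration
    weights = make_weights(bits, stride)
    weights[slot(5, WEIGHT, stride)] = 0.25  # the learned weight
    weights[slot(5, ADAPTIVE, stride)] = 1.5 # auxiliary value, same bucket
    print(len(weights), list(weights[15:18]))
```

Keeping a feature's auxiliary values next to its weight means one hash lookup reaches everything the update rule needs for that feature.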

VW uses -b 18 by default. 2^18 is 262144, so if your training set has far fewer than 262144 distinct features, you should be relatively safe from hash collisions. If you generate many new features on the fly, for example with -q (quadratic), --cubic (cubic), or --nn, you may want to request a bigger -b value to avoid hash collisions.
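To get a feel for how much a larger -b helps, the sketch below estimates the expected number of colliding features. It assumes an idealized hash that spreads features uniformly and independently over the 2^b slots, which a real hash only approximates. `expected_collisions` is a hypothetical helper and the feature count is an arbitrary example value.

```python
def expected_collisions(n_features: int, bits: int) -> float:
    """Expected number of features that land in an already-used slot,
    assuming a uniform, independent hash over 2**bits slots."""
    m = 2 ** bits
    # Expected number of distinct slots hit by n_features uniform draws.
    occupied = m * (1.0 - (1.0 - 1.0 / m) ** n_features)
    return n_features - occupied


if __name__ == "__main__":
    n = 100_000  # arbitrary example feature count
    for b in (18, 22, 24):
        print(f"-b {b}: ~{expected_collisions(n, b):.0f} colliding features")
```

Each extra bit doubles the number of slots, so the estimate drops quickly as -b grows, at the cost of a proportionally larger weight vector.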

By default, vw uses -b 18 and normalized/adaptive/invariant SGD, so the overall size allocated for the weight vector is 2^18 * weights_per_stride * sizeof(float) = 262144 * 3 * 4 = 3,145,728 bytes, which is exactly 3 MiB (about 3.1 MB).
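The same arithmetic as a small Python sketch, handy for checking what a different -b would cost in memory. `weight_vector_bytes` is a hypothetical helper, and the defaults simply restate the numbers above (3 weights per stride, 4-byte floats).

```python
def weight_vector_bytes(bits: int = 18,
                        weights_per_stride: int = 3,
                        float_size: int = 4) -> int:
    """Bytes allocated for 2**bits buckets of weights_per_stride floats."""
    return (1 << bits) * weights_per_stride * float_size


if __name__ == "__main__":
    print(weight_vector_bytes())          # 3145728 (the default case above)
    print(weight_vector_bytes(bits=24))   # 201326592 (192 MiB)
```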