Loss functions

MLWave edited this page Jan 23, 2017 · 37 revisions

Given a prediction (p) and a label (y), a loss function measures the discrepancy between the algorithm's prediction and the desired output. VW currently supports the following loss functions, with squared loss as the default:

| Loss | Function | Minimizer | Example usage |
|------|----------|-----------|---------------|
| Squared | \ell(p,y)=(y-p)^2 | Expectation (mean) | Regression: expected return on stock |
| Quantile | \ell(p,y)=\tau(y-p)\mathbb{I}(y \ge p)+(1-\tau)(p-y)\mathbb{I}(y \leq p) | Quantile (median for \tau=0.5) | Regression: what is a typical price for a house? |
| Logistic | \ell(p,y)=\log(1+e^{-yp}) | Probability | Classification: probability of click on ad |
| Hinge | \ell(p,y)=\max(0,\,1-yp) | 0-1 approximation | Classification: is the digit a 7? |
| Classic | Squared loss without importance-weight-aware updates | Expectation (mean) | Regression; squared loss often performs better than classic |
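The definitions in the table can be sketched in a few lines of Python. This is only the textbook form of each loss, not VW's actual implementation (VW additionally applies importance-weight-aware updates):

```python
import math

def squared(p, y):
    """Squared loss: (y - p)^2."""
    return (y - p) ** 2

def quantile(p, y, tau=0.5):
    """Quantile (pinball) loss; tau = 0.5 targets the median."""
    return tau * (y - p) if y >= p else (1 - tau) * (p - y)

def logistic(p, y):
    """Logistic loss; y must be -1 or +1."""
    return math.log(1 + math.exp(-y * p))

def hinge(p, y):
    """Hinge loss; y must be -1 or +1."""
    return max(0.0, 1 - y * p)
```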

To select a loss function in VW, see the Command line arguments guide. The Logistic and Hinge losses are for binary classification only, so every example must be labeled "-1" or "1". More information on loss function semantics is in these slides (pdf) from an online learning course.
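For example, selecting a loss on the command line looks like the following (a sketch; `train.txt` is a placeholder file in VW input format):

```shell
# Quantile regression targeting the 25th percentile
vw --loss_function quantile --quantile_tau 0.25 train.txt

# Binary classification, reporting 0-1 loss instead of hinge loss
vw --loss_function hinge --binary train.txt
```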

The Python wrapper overrides the default squared loss with logistic loss when using VWClassifier.

Which loss function should I use?

  • If the problem is binary classification (i.e. labels are -1 and +1), your choices should be Logistic or Hinge loss (although Squared loss may work as well). If you want VW to report the 0-1 loss instead of the logistic/hinge loss, add `--binary`. Examples: spam vs. non-spam, click vs. no-click.
  • For binary classification where you need to know the posterior probabilities, use `--loss_function logistic --link logistic`.
  • If the problem is a regression problem, i.e. the target label you're trying to predict is a real value, you should use Squared or Quantile loss. Examples: revenue, height, weight. If you're trying to minimize the mean error, use squared loss (see http://en.wikipedia.org/wiki/Least_squares). If, on the other hand, you're trying to predict rank/order and don't mind the mean error increasing as long as the relative order is correct, you want to minimize the error versus the median (or another quantile); in that case, use quantile loss (see http://en.wikipedia.org/wiki/Quantile_regression).
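The mean-vs.-median distinction above can be checked numerically. A small grid search over constant predictions (a self-contained sketch with made-up data) shows that the best constant under squared loss is the sample mean, while under quantile loss with \tau = 0.5 it is the median, which is why the latter is robust to outliers:

```python
ys = [1.0, 2.0, 2.0, 3.0, 100.0]  # toy labels; note the outlier

def total_squared(p):
    # Sum of squared loss over the sample for a constant prediction p
    return sum((y - p) ** 2 for y in ys)

def total_quantile(p, tau=0.5):
    # Sum of quantile (pinball) loss; tau = 0.5 targets the median
    return sum(tau * (y - p) if y >= p else (1 - tau) * (p - y) for y in ys)

candidates = [i / 10 for i in range(0, 1101)]  # grid of constant predictions
best_sq = min(candidates, key=total_squared)   # lands on the mean, 21.6
best_q = min(candidates, key=total_quantile)   # lands on the median, 2.0
print(best_sq, best_q)
```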