

John edited this page Jul 8, 2014 · 2 revisions


Start vw in active learning mode, listening on a port, together with any other options you need:

vw --active --port 6075

The interactive client talks to vw in plain text over a TCP socket. It needs to know the host and port where vw is listening, and it needs a set of unlabeled examples. These should be in VW text format, except that the label is omitted. Suppose vw is running on the same machine, invoked as above, and the unlabeled data is in the file unlabeled.dat. Then

python localhost 6075 unlabeled.dat

will connect to vw and start sending the unlabeled examples in order. For each example that vw wants labeled, the user can enter a label (0 or 1) or 'skip', in which case no label is provided and it is (almost) as if the unlabeled dataset did not contain the example.
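The exchange described above can be sketched in a few lines of Python. This is a minimal illustration, not the client's actual source: the function names `ask_vw` and `run` are hypothetical, and the sketch assumes vw answers each example with a single line of text, which it returns unparsed.

```python
import socket


def ask_vw(stream, unlabeled_example):
    """Send one unlabeled example and return the reply line, stripped.

    Assumption: vw answers every example with exactly one text line.
    """
    stream.write(unlabeled_example.rstrip("\n") + "\n")
    stream.flush()
    return stream.readline().strip()


def run(host, port, path):
    """Stream the examples in `path` to vw, in order, printing each reply."""
    with socket.create_connection((host, port)) as sock, \
            sock.makefile("rw", encoding="utf-8", newline="\n") as stream, \
            open(path, encoding="utf-8") as examples:
        for number, line in enumerate(examples, start=1):
            if line.strip():  # ignore blank lines in the data file
                print(number, ask_vw(stream, line))
```

With vw started as shown earlier, `run("localhost", 6075, "unlabeled.dat")` would send the file one example at a time. Deciding from the reply whether vw wants a label is left out here, because the reply format is not described on this page.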


The following options are currently supported by the client:

  • -s labeled_set. Uses the labeled examples in labeled_set to seed vw's model prior to sending any unlabeled examples.
  • -v. Verbose mode. Prints the unlabeled example before asking for the label (otherwise only a line number and a tag are used). This makes sense for text classification with data in close to raw form.
  • -m. Interprets a label of 0 as -1.
  • -o output. Writes the user provided labels (and corresponding examples) to a (line buffered) file. The absence of full buffering allows one to hit Ctrl-C anytime and still have all their labels saved in the output file.
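The label handling implied by the options above (the 0 to -1 mapping, 'skip', and the line-buffered output file) could look roughly like the following. Both helper names are hypothetical, and the sketch assumes that a labeled example is formed by placing the label, followed by a space, in front of the unlabeled line.

```python
def to_labeled_line(unlabeled_example, answer, zero_as_minus_one=False):
    """Turn the user's answer into a labeled VW line, or None for 'skip'."""
    answer = answer.strip().lower()
    if answer == "skip":
        return None  # no label is provided for this example
    if answer not in ("0", "1"):
        raise ValueError(f"expected 0, 1 or skip, got {answer!r}")
    # Mirrors the behaviour described for -m: a 0 is interpreted as -1.
    label = "-1" if zero_as_minus_one and answer == "0" else answer
    return f"{label} {unlabeled_example.strip()}"


def open_label_log(path):
    """Open a line-buffered text file, as described for -o.

    buffering=1 flushes after every newline, so labels written so far
    survive an interruption such as Ctrl-C.
    """
    return open(path, "w", encoding="utf-8", buffering=1)
```

For instance, `to_labeled_line("| good movie", "0", zero_as_minus_one=True)` yields `-1 | good movie`, which can then be written to the log followed by a newline.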


Choice of hyperparameters is tricky. If you have a labeled subsample, use it together with --active --simulation to discover a good value for --mellowness and the learning rate.
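One way to organize such a search is to generate one simulation command per combination of settings. The sketch below only builds the argument lists; the helper name and the candidate values are arbitrary placeholders, and it assumes the labeled subsample is passed with `-d` and the learning rate with `-l`.

```python
from itertools import product


def simulation_commands(labeled_path, mellowness_values, learning_rates):
    """Build one vw command line per (mellowness, learning rate) pair."""
    commands = []
    for mellowness, rate in product(mellowness_values, learning_rates):
        commands.append([
            "vw", "--active", "--simulation",
            "--mellowness", str(mellowness),
            "-l", str(rate),
            "-d", labeled_path,
        ])
    return commands


# Placeholder grid; suitable ranges depend on the data.
grid = simulation_commands("labeled.dat", [0.01, 0.1, 1.0], [0.1, 0.5])
```

Each entry of `grid` can be passed to `subprocess.run`, and the runs compared on the loss vw reports and on how many labels each setting requested.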
