normalized online learning demo

These demos show the ability of the normalized learning rule to adapt to varying feature scales, relative to an unnormalized learning rule. For more details on the normalized learning rule see the paper.

Instructions

make all.results: eventually produces a nice table outlining performance of normalized adaptive gradient (NAG) vs. unnormalized adaptive gradient (AG) for a variety of data sets.
- WARNING: Please be aware that these demos can be network I/O, disk space, and/or CPU intensive.
- The complete set of demos can take hours to compute the first time, although the results are cached for subsequent reproduction.
- You will see lower progressive loss (regret) for NAG than AG.
- You will also see that the optimal learning rate eta^* varies less across datasets for NAG than AG.

Details

This is organized into individual demos which process a single dataset. The different individual demos are associated with unique make targets.

An individual demo will download a data set and learn a predictor under two conditions: using the normalized learning rule aka NAG (this the vw default), and using adaptive gradient without normalization aka AG (invoked via vw arguments --adaptive --invariant). For both conditions it will do a hyper-parameter sweep to find the optimal in-hindsight learning rate eta^*. Note the context here is online learning, so there is no train/test split; rather what is optimized is progressive loss over the input data.

About the datasets

Note when data is pre-normalized or otherwise does not exhibit varying scales, the normalized learning rule has essentially no effect. Therefore the data sets used in this demo have been selected because they exhibit varying scales. The data set covertype exemplifies how this arises in practice, as it consists of multiple physical measurements with different units.

Just doing one dataset

There are individual makefile targets that will just download and compare one dataset. You can compute a subset using a combination of make targets, e.g., make only.{covertype,shuttle}.

Invocation	Dataset	Time	Disk	Network
`make only.bank`	bank	2 minutes	1Mb	1Mb
`make only.census`	census	10 minutes	13Mb	7Mb
`make only.covertype`	covertype	12 minutes	24Mb	11Mb
`make only.CTslice`	CT Slice	30 minutes	40Mb	18Mb
`make only.MSD`	MSD	30 minutes	500Mb	256Mb
`make only.shuttle`	Shuttle	1 minute	600Kb	300Kb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

normalized online learning demo

Instructions

Details

About the datasets

Just doing one dataset

Files

README.md

Latest commit

History

README.md

File metadata and controls

normalized online learning demo

Instructions

Details

About the datasets

Just doing one dataset