Deriving importance factors on a per-week basis #16

mourner · 2016-08-22T10:54:31Z

Daily weight changes are considered a very unreliable metric, prone to unpredictable fluctuations. Additionally, some factors may not show up within 24h and instead have a longer-term effect.

It would be great to take the same data and aggregate both factors and weight delta on a per-week basis, and then see whether the resulting factors are different from the existing daily results. Using the data you already have, it may give some new insights.

cc @arielf

arielf · 2016-08-23T04:51:55Z

Thanks, yes this is a good idea. It should increase accuracy.

Pretty easy to implement: in the pre-processing stage we can generate all deltas for up to N days and include all the input features spanning those N days.

However, I think some decay should be applied to the weight of older features, because the further back you go in time, once an effect has peaked, the less relevant the inputs should become (just a hunch).
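
A minimal sketch of the decay idea, in Python rather than the project's own tooling. It assumes plain exponential decay by age, counted back from the most recent day, and does not model the "once the effect peaks" part. The function name `decayed_features` and the `half_life` parameter are hypothetical, made up for illustration.

```python
from collections import defaultdict


def decayed_features(days, half_life=2.0):
    """Combine per-day feature dicts (oldest first) into a single dict,
    down-weighting older days exponentially.

    half_life is a hypothetical tuning knob: a feature that is
    half_life days old counts half as much as one from the latest day.
    """
    combined = defaultdict(float)
    newest = len(days) - 1
    for i, feats in enumerate(days):
        age = newest - i                    # 0 for the most recent day
        weight = 0.5 ** (age / half_life)   # exponential decay by age
        for name, value in feats.items():
            combined[name] += weight * value
    return dict(combined)
```

With `half_life=1.0`, a feature logged yesterday contributes half of what the same feature logged today does.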

Also: I think accuracy could benefit from randomly shuffling the order of the samples and stacking several shuffled copies. In online learning, early examples have an advantage because the learning rate decays over time. Currently I sort by abs(delta), which makes the big-delta examples more important.
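
Roughly what shuffling and stacking could look like, as a sketch only. The name `shuffled_stack` and the `passes` and `seed` parameters are assumptions for illustration, not something that exists in the project.

```python
import random


def shuffled_stack(examples, passes=3, seed=0):
    """Return `passes` independently shuffled copies of `examples`,
    concatenated into one training stream, so that no example is
    always seen early (when the learning rate is still high)."""
    rng = random.Random(seed)   # fixed seed keeps runs reproducible
    stacked = []
    for _ in range(passes):
        order = list(examples)  # copy, so the input is left untouched
        rng.shuffle(order)
        stacked.extend(order)
    return stacked
```

Each example appears exactly `passes` times in the output, in a different position each pass.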

I'll try to tackle this when I get some more free time.

oskarizu · 2016-09-07T18:39:00Z

; porque solo propio pollino ooo looo oooooooiio oi ooooooiio lo ido;

arielf · 2016-10-10T05:44:28Z

Added support for any number of days of history. This increases the number of data points to train on, and hopefully reduces variance and random daily noise.

Currently the default history is set to 3 days (the NDAYS variable in the Makefile), meaning that for every day of user-entered data, 3 data points are created for training: last day, last two days, and last three days. Any "last N days" period includes the net weight change over those N days, as well as the factors for all N days combined (day1, day2, ..., dayN). I felt that more than 3 days would be excessive, since we also run over all overlaps using an N-day sliding window.
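
To illustrate how the data points multiply, here is a hedged Python sketch of the windowing described above. It is not the project's actual pre-processing code: the function name `make_training_points`, the `(weight_delta, factors)` tuple layout, and summing factor values across days are all assumptions.

```python
def make_training_points(daily, ndays=3):
    """daily: chronological list of (weight_delta, {factor: value}).

    For every day, emit one data point per window length 1..ndays that
    ends on that day: the net weight change over the window and the
    factors of all days in the window combined (summed here, which is
    an assumption).
    """
    points = []
    for end in range(len(daily)):
        for span in range(1, ndays + 1):
            start = end - span + 1
            if start < 0:        # not enough history yet for this span
                break
            window = daily[start:end + 1]
            net_delta = sum(delta for delta, _ in window)
            factors = {}
            for _, day_factors in window:
                for name, value in day_factors.items():
                    factors[name] = factors.get(name, 0) + value
            points.append((net_delta, factors))
    return points
```

With `ndays=3`, every day that has at least two earlier days of history yields three points, so the sliding window already covers all overlapping periods.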

I also restored the --bootstrap N parameter to vowpal-wabbit, since I found it helpful for decreasing variance. If your vw doesn't support --bootstrap N, you can either:

Upgrade vw to a more recent version, OR
Remove --bootstrap N from VW_ARGS in the Makefile

arielf closed this as completed Oct 10, 2016

arielf added the enhancement label Oct 14, 2017
