Deriving importance factors on a per-week basis #16

Closed
mourner opened this issue Aug 22, 2016 · 3 comments

mourner commented Aug 22, 2016

Daily weight changes are considered a very unreliable metric, prone to unpredictable fluctuations. Additionally, some factors may not be reflected immediately within 24h, but have a more long-term effect.

It would be great to take the same data and aggregate both factors and weight delta on a per-week basis, and then see whether the resulting factors are different from the existing daily results. Using the data you already have, it may give some new insights.
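A minimal sketch of what such a per-week aggregation could look like, assuming each daily record is a `(date, weight_delta, factors)` tuple and that factors combine by simple summation. The function name `aggregate_weekly` and the record layout are hypothetical, not taken from the project.

```python
from collections import Counter, defaultdict
from datetime import date


def aggregate_weekly(daily):
    """Group daily records by ISO week.

    daily: iterable of (date, weight_delta, {factor: amount}) tuples.
    Returns {(iso_year, iso_week): (total_delta, combined_factors)}.
    """
    weeks = defaultdict(lambda: [0.0, Counter()])
    for day, delta, factors in daily:
        iso = day.isocalendar()
        key = (iso[0], iso[1])          # (ISO year, ISO week number)
        weeks[key][0] += delta          # net weight change over the week
        weeks[key][1].update(factors)   # assumption: factors add up
    return {k: (round(v[0], 6), dict(v[1])) for k, v in sorted(weeks.items())}
```

The weekly records could then be fed through the same training pipeline as the daily ones, so the two sets of resulting factors can be compared side by side.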

cc @arielf

arielf (Owner) commented Aug 23, 2016

Thanks, yes this is a good idea. It should increase accuracy.

Pretty easy to implement: in the pre-processing stage we can generate all deltas up to N days and include all the input features spanning those N days.

However, I think there should be some decay applied to the weight of older features, because once the effect peaks, the further back in time you go, the less relevant the inputs should become (just a hunch).
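One way such a decay could be sketched is an exponential half-life on the age of each day's factors. This is only an illustration: the exponential form, the `half_life` parameter, and the name `decayed_features` are assumptions, and the sketch treats the effect as peaking on the most recent day, which is a simplification.

```python
def decayed_features(days, half_life=2.0):
    """Combine per-day factor dicts, down-weighting older days.

    days: list of {factor: amount} dicts, oldest first.
    half_life: age in days at which a factor counts half as much
               (hypothetical tuning knob).
    """
    combined = {}
    newest = len(days) - 1
    for i, factors in enumerate(days):
        age = newest - i                  # 0 for the most recent day
        weight = 0.5 ** (age / half_life)
        for name, amount in factors.items():
            combined[name] = combined.get(name, 0.0) + weight * amount
    return combined
```

For example, with `half_life=1.0` a factor logged yesterday contributes half as much as the same factor logged today.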

Also: I think the accuracy can benefit from randomly shuffling the sample order and stacking the shuffled sets. In online learning, early examples have an advantage because the learning rate decays with time. Currently I sort by abs(delta), which makes the big-delta examples more important.
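A rough sketch of the shuffle-and-stack idea, assuming "stacking" means concatenating several independently shuffled copies of the training set so that no example is always seen first. The helper name `shuffled_passes` and its parameters are hypothetical.

```python
import random


def shuffled_passes(examples, passes=3, seed=0):
    """Return `passes` shuffled copies of `examples`, concatenated."""
    rng = random.Random(seed)   # fixed seed keeps runs reproducible
    stacked = []
    for _ in range(passes):
        batch = list(examples)  # copy so the input is left untouched
        rng.shuffle(batch)
        stacked.extend(batch)
    return stacked
```

Each example then appears once per pass at a different position, which spreads the early-example advantage of a decaying learning rate more evenly.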

I'll try to tackle this when I get some more free time.

oskarizu commented Sep 7, 2016

; porque solo propio pollino ooo looo oooooooiio oi ooooooiio lo ido;

arielf (Owner) commented Oct 10, 2016

Added support for any number of days of history. This increases the number of data-points to train on, and hopefully reduces variance and random daily noise.

Currently the default history is set to 3 days (NDAYS variable in Makefile), meaning that for every day of user-entered data, 3 data-points will be created for training: last-day, last-two-days, and last-3-days. Any "last N days" period includes the net weight change over the last N days overall, as well as the factors for all N days combined (day1, day2, ..., dayN). I felt more than 3 days is excessive since we also run over all overlaps using an N-day sliding window.
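The windowing described above can be illustrated roughly as follows. This is a sketch under stated assumptions, not the project's actual pre-processing code: the function name `window_examples`, the `(delta, factors)` record layout, and combining factors by summation are all hypothetical.

```python
from collections import Counter


def window_examples(daily, ndays=3):
    """Build one training example per (day, window length) pair.

    daily: list of (weight_delta, {factor: amount}) tuples, oldest first.
    Returns (n, net_delta, combined_factors) tuples for n = 1..ndays,
    skipping windows that would reach back before the first day.
    """
    examples = []
    for end in range(len(daily)):
        for n in range(1, ndays + 1):
            start = end - n + 1
            if start < 0:
                break                   # not enough history yet
            span = daily[start:end + 1]
            delta = sum(d for d, _ in span)
            factors = Counter()
            for _, day_factors in span:
                factors.update(day_factors)
            examples.append((n, delta, dict(factors)))
    return examples
```

With 4 days of data and `ndays=3`, this yields 1 + 2 + 3 + 3 = 9 examples, since the first two days lack enough history for the longer windows.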

I also restored the --bootstrap N parameter to vowpal-wabbit since I found it helpful for decreasing variance. If your vw doesn't support --bootstrap N you may either:

  • Upgrade vw to a more recent version OR
  • Remove the --bootstrap N from VW_ARGS in the Makefile
