Test model on all labels from the future #378
Comments
This is supported if you use the correct temporal config, isn't it? (but it is a pain)
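For context, a hedged sketch of the workaround being alluded to, assuming the usual Timechop temporal config keys (the values here are illustrative, not from this issue): stretching `test_durations` makes the as-of dates for a single model reach years past its train end, at the cost of folding them all into one test matrix per split.

```python
# Illustrative only: a temporal config (written as a Python dict) where a
# model trained on data ending 2009-12-31 is scored on as-of dates from
# several later years. Key names follow the usual triage/Timechop
# temporal_config; the values are made up for this example.
temporal_config = {
    'feature_start_time': '2000-01-01',
    'feature_end_time': '2013-01-01',
    'label_start_time': '2005-01-01',
    'label_end_time': '2013-01-01',
    'model_update_frequency': '1y',
    'max_training_histories': ['5y'],
    'training_as_of_date_frequencies': ['1y'],
    'training_label_timespans': ['1y'],
    'test_as_of_date_frequencies': ['1y'],
    'test_label_timespans': ['1y'],
    # The "pain": pushing test_durations out to 3y folds three future years
    # of as-of dates into a single test matrix per split, rather than
    # producing one evaluation per future year.
    'test_durations': ['3y'],
}
```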
I propose an alternative solution:
I think the experiment implementation of #1 is fairly trivial. It should just be a timechop change, since timechop is already written to pass lists of test matrices out to the other components (it just only puts one item in the list). If the other components anticipate lists with loops rather than just grabbing the first element, the tests would likely be the hardest part.
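A rough sketch of the downstream change being described here; `chopped_split`, `evaluate_matrix`, and the dict layout are hypothetical names, not triage's actual internals:

```python
# Hypothetical sketch: timechop already emits a list of test matrix
# definitions per split, so downstream components only need to loop over
# that list instead of taking its first element.
def evaluate_split(chopped_split, model, evaluate_matrix):
    # Before: results = [evaluate_matrix(model, chopped_split['test_matrices'][0])]
    results = []
    for test_matrix_def in chopped_split['test_matrices']:
        # Each definition describes one test period; the same trained
        # model is evaluated on every one of them.
        results.append(evaluate_matrix(model, test_matrix_def))
    return results
```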
This commit addresses #663, #378, #223 by allowing a model to be evaluated multiple times, and thereby allowing users to see whether the performance of a single trained model degrades over the time following training. Users must now set a timechop parameter, `test_evaluation_frequency`, that adds multiple test matrices to a time split. A model will be tested once on each matrix in its list. Matrices are added until they reach the label time limit, so all models are tested on the final test period (assuming that model_update_frequency is evenly divisible by test_evaluation_frequency). This initial commit only makes changes to timechop proper. Remaining work includes:
- Write tests for the new behavior
- Make timechop plotting work with the new behavior

New issues that I do not plan to address in the forthcoming PR:
- Incorporate multiple evaluation times into audition and/or postmodeling
- Maybe users should be able to set a maximum evaluation horizon so that early models are not tested for, say, 100 time periods
- Evaluation time-splitting could (or should) eventually be done not with pre-made matrices but on the fly at evaluation time
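A hedged sketch of how the new parameter might sit alongside the existing temporal config; `test_evaluation_frequency` is the name given in the commit above, while the surrounding keys and values are illustrative assumptions rather than the commit's own tests:

```python
# Illustrative only: model_update_frequency stays evenly divisible by
# test_evaluation_frequency so every model, old or new, is also evaluated
# on the final test period and late comparisons line up.
temporal_config = {
    'label_start_time': '2005-01-01',
    'label_end_time': '2013-01-01',
    'model_update_frequency': '2y',       # retrain every two years...
    'test_evaluation_frequency': '1y',    # ...but score each model every year
    'max_training_histories': ['5y'],
    'training_as_of_date_frequencies': ['1y'],
    'training_label_timespans': ['1y'],
    'test_as_of_date_frequencies': ['1y'],
    'test_label_timespans': ['1y'],
    'test_durations': ['0d'],
}
# With these settings, a model trained for the split ending in 2008 would
# get test matrices for 2009, 2010, 2011, and 2012, stopping at
# label_end_time.
```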
Triage should have an option to assess the performance of a model on any future test set, i.e. any test set that begins after the train set ends. For an annual-prediction example, a model trained on data ending December 31, 2009, should be tested on all test sets that begin January 1, 2010; January 1, 2011; January 1, 2012; and so on.
This would help us understand how often the model should be retrained and what the partner loses as models get older. It might also help identify problems or interesting patterns, where model performance degrades after a while but increases again.
The user should be able to set a parameter (perhaps called "max time difference"?) that would only test the model on labels within x time of the end of the train data. In the above example, the user might limit testing of a model trained on 2009 data to test sets from 2010, 2011, and 2012, and not 2013 onward.
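A minimal sketch of that cutoff, assuming a hypothetical `max_time_difference` parameter and plain datetime arithmetic; nothing here reflects an existing triage API:

```python
from datetime import datetime
from dateutil.relativedelta import relativedelta

def test_starts_within_horizon(train_end, candidate_test_starts, max_time_difference):
    """Keep only the test-set start dates that fall within the allowed
    horizon after the end of the training data.

    max_time_difference is a relativedelta, e.g. relativedelta(years=3).
    """
    cutoff = train_end + max_time_difference
    return [start for start in candidate_test_starts if train_end < start <= cutoff]

# For the example in the issue: a model trained on data ending 2009-12-31,
# limited to a three-year horizon, keeps the 2010, 2011, and 2012 test
# sets and drops 2013 onward.
starts = [datetime(year, 1, 1) for year in range(2010, 2015)]
kept = test_starts_within_horizon(
    datetime(2009, 12, 31), starts, relativedelta(years=3)
)
# kept == [datetime(2010, 1, 1), datetime(2011, 1, 1), datetime(2012, 1, 1)]
```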