New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] Delay trimming in ForecastingGridSearchCV until after transforming #3132
Conversation
As a note -tests are currently mostly passing, (minus weird CI/CD bug error) but I only because I have commented out one of the tests in evaluate though. I did this because not trimming X was causing some issues in ARIMA. Personally I think it makes more sense to test evaluate on a grid search object than ARIMA, (as my understanding is that is what evaluate was designed to test) so my plan is to restructure that test with something other than ARIMA. (Also need to update the docs to make sure that its clear we are no longer trimming the same way as before) |
@miraep8, disagreed, There is nothing specific about grid search in the context of I think your understanding of what It does evaluation of an estimator, not tuning. Tuning and evaluation are two different things, although they are frequently confused. |
Yep, I do think I thought it was a tuning specific thing, thanks for catching that! :) I will put forward another solution that doesn't break |
checks seem to pass - is this ready? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, this might solve the problem, but it introduces other problems in its stead:
- the grid search gets a problematic
trim_X
argument, which changes nothing about the "formal" behaviour of the algorithm evaluate
gets the same argument, it does not seem to add to the specification semantics
Why I think this is a problem: think of the user journey. There does not seem to be a clear use of that parameter, as in expressing what the user wants. Instead, it is a fiddle/hack/magic argument, that the user needs to "know" how to set.
Also, it seems to introduce behavioural coupling between the grid search estimators and the evaluation function, not a good idea as it dilutes separated interfaces.
I would suggest instead: think along the lines of "all estimators do XYZ", don't introduce parameters that break the strategy pattern, or that are not directly useful as a clear element of semantic specification. |
I agree @fkiraly ! Thanks for looking this over + feedback |
Hm, looks simple enough, and localized. Could we add a test that fails currently, but would be made to succeed after this change, i.e., a representative for the issue fixed? |
Hmm, the test that you introduced does not break on |
... does it break now? |
Had been trying some experimental work to delay trimming for both X and y, but now planning to switch into separate PRs. |
Let us know when you feel this is ready for review. |
I would appreciate a look at it now! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
In the end, not too much of a change, but of course the difficulty lies in the "where".
@anthonygiorgio97 noticed + helpfully pointed out a strange bug that came up when using Differencer within TransformedTargetForecaster in ForecastingGridSearchCV. (see #2807 and #2880)
From what I could tell this bug was caused by the fact that Differencer reduces the size of X. Usually this isn't a problem, but because ForecastingGridSearchCV was pre-trimming all X input before passing it into the underlying forecaster, X ended up being too small by the time it got to the final forecaster and was causing problems.
My proposed solution is thus simply - avoid pre-trimming X in the _split function and instead trim it to the right size right before the actual forecasting step. (I do this specifically in _predict_last_window, but there is a decent chance it might need to be changed elsewhere as well).