New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add TimeSeriesRegression problem type #1386
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1386 +/- ##
=========================================
+ Coverage 100.0% 100.0% +0.1%
=========================================
Files 213 213
Lines 13940 13946 +6
=========================================
+ Hits 13933 13939 +6
Misses 7 7
Continue to review full report at Codecov.
|
06caaa9
to
5b586b0
Compare
@@ -16,7 +16,7 @@ def correct_problem_types(): | |||
|
|||
|
|||
def test_handle_string(correct_problem_types): | |||
problem_types = ['regression', ProblemTypes.MULTICLASS, 'binary'] | |||
problem_types = ['regression', ProblemTypes.MULTICLASS, 'binary', ProblemTypes.TIME_SERIES_REGRESSION] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how did you add ProblemTypes.TIME_SERIES_REGRESSION
here without adding anything to correct_problem_types
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I think we actually need to add to correct_problem_types
for this test to be fully updated. Reason why it passes right now is because zip(problem_types, correct_problem_types)
will only iterate when both lists still have items, so it only goes through correct_problem_types
(3 elements)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch guys! I'll update this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Just a few small comments about updating the test completely and the docstrings for each of the objective classes, but otherwise looks good
* Added a problem type for time series regression :pr:`1386` | ||
* Added a ``is_defined_for_problem_type`` method to ``ObjectiveBase`` :pr:`1386` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise ValueError("Additional objective {} is not compatible with a {} problem.".format(obj.name, self.problem_type.value)) | ||
|
||
for pipeline in self.allowed_pipelines or []: | ||
if not pipeline.problem_type == self.problem_type: | ||
if pipeline.problem_type != self.problem_type: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -16,7 +16,7 @@ def correct_problem_types(): | |||
|
|||
|
|||
def test_handle_string(correct_problem_types): | |||
problem_types = ['regression', ProblemTypes.MULTICLASS, 'binary'] | |||
problem_types = ['regression', ProblemTypes.MULTICLASS, 'binary', ProblemTypes.TIME_SERIES_REGRESSION] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I think we actually need to add to correct_problem_types
for this test to be fully updated. Reason why it passes right now is because zip(problem_types, correct_problem_types)
will only iterate when both lists still have items, so it only goes through correct_problem_types
(3 elements)
@@ -9,5 +9,5 @@ class RegressionObjective(ObjectiveBase): | |||
problem_type (ProblemTypes): Type of problem this objective is. Set to ProblemTypes.REGRESSION. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should update the docstring for each of the objective classes!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great point!
@@ -9,5 +9,5 @@ class RegressionObjective(ObjectiveBase): | |||
problem_type (ProblemTypes): Type of problem this objective is. Set to ProblemTypes.REGRESSION. | |||
""" | |||
|
|||
problem_type = ProblemTypes.REGRESSION | |||
problem_types = [ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my own curiosity: Are all regression objectives time-series applicable? Is there ever the case where an objective should be ProblemTypes.TIME_SERIES_REGRESSION
and not ProblemTypes.REGRESSION
? :o
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say all of our current regression objectives would also work for a time series problem. But in the future, we will add objectives that are time series specific, e.g:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the right call.
All our current regression objectives are valid for timeseries regression. So allowing REGRESSION
and TIME_SERIES_REGRESSION
is great.
When we add the first time-series-only objective, we can override problem_types
in each of those impls to only allow TIME_SERIES_REGRESSION
. We could also then choose to define TimeseriesRegressionObjective
to facilitate this, but probably not necessary.
We can follow the same pattern for binary/multiclass.
2a32c8c
to
7921feb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can_optimize_threshold (bool): Determines if threshold used by objective can be optimized or not. | ||
""" | ||
|
||
problem_type = ProblemTypes.BINARY | ||
problem_types = [ProblemTypes.BINARY] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -9,5 +9,5 @@ class RegressionObjective(ObjectiveBase): | |||
problem_type (ProblemTypes): Type of problem this objective is. Set to ProblemTypes.REGRESSION. | |||
""" | |||
|
|||
problem_type = ProblemTypes.REGRESSION | |||
problem_types = [ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the right call.
All our current regression objectives are valid for timeseries regression. So allowing REGRESSION
and TIME_SERIES_REGRESSION
is great.
When we add the first time-series-only objective, we can override problem_types
in each of those impls to only allow TIME_SERIES_REGRESSION
. We could also then choose to define TimeseriesRegressionObjective
to facilitate this, but probably not necessary.
We can follow the same pattern for binary/multiclass.
@@ -11,11 +11,14 @@ class ProblemTypes(Enum): | |||
"""Multiclass classification problem.""" | |||
REGRESSION = 'regression' | |||
"""Regression problem.""" | |||
TIME_SERIES_REGRESSION = 'time_series_regression' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nit-pick, but: in our docs examples we typically type out problem_type='multiclass'
etc. Is it easier for people to type/remember problem_type='time series regression'
or problem_type='time_series_regression'
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! Avoiding the underscores seems easier to type so I changed it to that hehe.
if isinstance(objective, RegressionObjective): | ||
objective_type = ProblemTypes.REGRESSION | ||
elif isinstance(objective, MulticlassClassificationObjective): | ||
objective_type = ProblemTypes.MULTICLASS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine for now. But once we add objectives which aren't specific to one problem type (like timeseries), we'll have to remember to change this. Is there anything we can do to future-proof this test?
One option: add problem_type
to the parametrize
. Then add
if not objective.is_defined_for_problem_type(problem_type):
pytest.skip()
(syntax prob wrong)
No prob if you don't get to this. I hope that in the wild, we'll always know the problem_type
when we're working with objectives, so this doesn't worry me outside of our tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made this change! Good point - it does make the test simpler! And when we add time series to automl, we just need to update the list of values that's parametrized so that's great.
@@ -16,7 +16,7 @@ def correct_problem_types(): | |||
|
|||
|
|||
def test_handle_string(correct_problem_types): | |||
problem_types = ['regression', ProblemTypes.MULTICLASS, 'binary'] | |||
problem_types = ['regression', ProblemTypes.MULTICLASS, 'binary', ProblemTypes.TIME_SERIES_REGRESSION] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…m types to support multiple problem types.
…d of problem_type. Updating problem types pytest fixture.
7921feb
to
f4ba7d8
Compare
Pull Request Description
Fix #1378 .
After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of
docs/source/release_notes.rst
to include this pull request by adding :pr:123
.