Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add TimeSeriesRegression problem type #1386

Merged
merged 7 commits into from
Nov 6, 2020

Conversation

freddyaboulton
Copy link
Contributor

Pull Request Description

Fix #1378 .


After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of docs/source/release_notes.rst to include this pull request by adding :pr:123.

@codecov
Copy link

codecov bot commented Oct 30, 2020

Codecov Report

Merging #1386 (f4ba7d8) into main (6a80b40) will increase coverage by 0.1%.
The diff coverage is 100.0%.

Impacted file tree graph

@@            Coverage Diff            @@
##             main    #1386     +/-   ##
=========================================
+ Coverage   100.0%   100.0%   +0.1%     
=========================================
  Files         213      213             
  Lines       13940    13946      +6     
=========================================
+ Hits        13933    13939      +6     
  Misses          7        7             
Impacted Files Coverage Δ
evalml/automl/automl_search.py 99.7% <100.0%> (ø)
evalml/model_understanding/graphs.py 100.0% <100.0%> (ø)
...alml/objectives/binary_classification_objective.py 100.0% <100.0%> (ø)
.../objectives/multiclass_classification_objective.py 100.0% <100.0%> (ø)
evalml/objectives/objective_base.py 100.0% <100.0%> (ø)
evalml/objectives/regression_objective.py 100.0% <100.0%> (ø)
evalml/objectives/utils.py 100.0% <100.0%> (ø)
evalml/pipelines/binary_classification_pipeline.py 100.0% <100.0%> (ø)
evalml/pipelines/pipeline_base.py 100.0% <100.0%> (ø)
evalml/problem_types/problem_types.py 100.0% <100.0%> (ø)
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6a80b40...f4ba7d8. Read the comment docs.

@@ -16,7 +16,7 @@ def correct_problem_types():


def test_handle_string(correct_problem_types):
problem_types = ['regression', ProblemTypes.MULTICLASS, 'binary']
problem_types = ['regression', ProblemTypes.MULTICLASS, 'binary', ProblemTypes.TIME_SERIES_REGRESSION]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how did you add ProblemTypes.TIME_SERIES_REGRESSION here without adding anything to correct_problem_types?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I think we actually need to add to correct_problem_types for this test to be fully updated. Reason why it passes right now is because zip(problem_types, correct_problem_types) will only iterate when both lists still have items, so it only goes through correct_problem_types (3 elements)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch guys! I'll update this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏 great reviewing @bchen1116 @angela97lin !

Copy link
Contributor

@angela97lin angela97lin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Just a few small comments about updating the test completely and the docstrings for each of the objective classes, but otherwise looks good 😁

Comment on lines +7 to +10
* Added a problem type for time series regression :pr:`1386`
* Added a ``is_defined_for_problem_type`` method to ``ObjectiveBase`` :pr:`1386`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

raise ValueError("Additional objective {} is not compatible with a {} problem.".format(obj.name, self.problem_type.value))

for pipeline in self.allowed_pipelines or []:
if not pipeline.problem_type == self.problem_type:
if pipeline.problem_type != self.problem_type:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -16,7 +16,7 @@ def correct_problem_types():


def test_handle_string(correct_problem_types):
problem_types = ['regression', ProblemTypes.MULTICLASS, 'binary']
problem_types = ['regression', ProblemTypes.MULTICLASS, 'binary', ProblemTypes.TIME_SERIES_REGRESSION]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I think we actually need to add to correct_problem_types for this test to be fully updated. Reason why it passes right now is because zip(problem_types, correct_problem_types) will only iterate when both lists still have items, so it only goes through correct_problem_types (3 elements)

@@ -9,5 +9,5 @@ class RegressionObjective(ObjectiveBase):
problem_type (ProblemTypes): Type of problem this objective is. Set to ProblemTypes.REGRESSION.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should update the docstring for each of the objective classes!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point!

@@ -9,5 +9,5 @@ class RegressionObjective(ObjectiveBase):
problem_type (ProblemTypes): Type of problem this objective is. Set to ProblemTypes.REGRESSION.
"""

problem_type = ProblemTypes.REGRESSION
problem_types = [ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my own curiosity: Are all regression objectives time-series applicable? Is there ever the case where an objective should be ProblemTypes.TIME_SERIES_REGRESSION and not ProblemTypes.REGRESSION? :o

Copy link
Contributor Author

@freddyaboulton freddyaboulton Nov 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say all of our current regression objectives would also work for a time series problem. But in the future, we will add objectives that are time series specific, e.g:

https://en.wikipedia.org/wiki/Mean_absolute_scaled_error

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the right call.

All our current regression objectives are valid for timeseries regression. So allowing REGRESSION and TIME_SERIES_REGRESSION is great.

When we add the first time-series-only objective, we can override problem_types in each of those impls to only allow TIME_SERIES_REGRESSION. We could also then choose to define TimeseriesRegressionObjective to facilitate this, but probably not necessary.

We can follow the same pattern for binary/multiclass.

@CLAassistant
Copy link

CLAassistant commented Nov 4, 2020

CLA assistant check
All committers have signed the CLA.

@freddyaboulton freddyaboulton force-pushed the 1378-time-series-regression-problem-type branch 2 times, most recently from 2a32c8c to 7921feb Compare November 5, 2020 16:35
Copy link
Contributor

@dsherry dsherry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚢 !

can_optimize_threshold (bool): Determines if threshold used by objective can be optimized or not.
"""

problem_type = ProblemTypes.BINARY
problem_types = [ProblemTypes.BINARY]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -9,5 +9,5 @@ class RegressionObjective(ObjectiveBase):
problem_type (ProblemTypes): Type of problem this objective is. Set to ProblemTypes.REGRESSION.
"""

problem_type = ProblemTypes.REGRESSION
problem_types = [ProblemTypes.REGRESSION, ProblemTypes.TIME_SERIES_REGRESSION]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the right call.

All our current regression objectives are valid for timeseries regression. So allowing REGRESSION and TIME_SERIES_REGRESSION is great.

When we add the first time-series-only objective, we can override problem_types in each of those impls to only allow TIME_SERIES_REGRESSION. We could also then choose to define TimeseriesRegressionObjective to facilitate this, but probably not necessary.

We can follow the same pattern for binary/multiclass.

@@ -11,11 +11,14 @@ class ProblemTypes(Enum):
"""Multiclass classification problem."""
REGRESSION = 'regression'
"""Regression problem."""
TIME_SERIES_REGRESSION = 'time_series_regression'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nit-pick, but: in our docs examples we typically type out problem_type='multiclass' etc. Is it easier for people to type/remember problem_type='time series regression' or problem_type='time_series_regression'?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! Avoiding the underscores seems easier to type so I changed it to that hehe.

if isinstance(objective, RegressionObjective):
objective_type = ProblemTypes.REGRESSION
elif isinstance(objective, MulticlassClassificationObjective):
objective_type = ProblemTypes.MULTICLASS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine for now. But once we add objectives which aren't specific to one problem type (like timeseries), we'll have to remember to change this. Is there anything we can do to future-proof this test?

One option: add problem_type to the parametrize. Then add

if not objective.is_defined_for_problem_type(problem_type):
    pytest.skip()

(syntax prob wrong)

No prob if you don't get to this. I hope that in the wild, we'll always know the problem_type when we're working with objectives, so this doesn't worry me outside of our tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made this change! Good point - it does make the test simpler! And when we add time series to automl, we just need to update the list of values that's parametrized so that's great.

@@ -16,7 +16,7 @@ def correct_problem_types():


def test_handle_string(correct_problem_types):
problem_types = ['regression', ProblemTypes.MULTICLASS, 'binary']
problem_types = ['regression', ProblemTypes.MULTICLASS, 'binary', ProblemTypes.TIME_SERIES_REGRESSION]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏 great reviewing @bchen1116 @angela97lin !

@freddyaboulton freddyaboulton force-pushed the 1378-time-series-regression-problem-type branch from 7921feb to f4ba7d8 Compare November 6, 2020 20:32
@freddyaboulton freddyaboulton merged commit 216c8a1 into main Nov 6, 2020
@freddyaboulton freddyaboulton deleted the 1378-time-series-regression-problem-type branch November 6, 2020 20:48
@dsherry dsherry mentioned this pull request Nov 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement problem type for time series regression
5 participants