Enable Thresholding for Time Series Binary #3140

Merged
ParthivNaresh merged 22 commits into main from EnableThresholdingForBinaryTS on Dec 15, 2021

Conversation

ParthivNaresh

Fixes #3095

codecov bot commented Dec 9, 2021

Codecov Report

Merging #3140 (44dcab5) into main (80aa901) will decrease coverage by 0.1%.
The diff coverage is 98.2%.

@@           Coverage Diff           @@
##            main   #3140     +/-   ##
=======================================
- Coverage   99.7%   99.7%   -0.0%     
=======================================
  Files        318     318             
  Lines      30908   30948     +40     
=======================================
+ Hits       30804   30843     +39     
- Misses       104     105      +1     
| Impacted Files | Coverage Δ |
|---|---|
| evalml/pipelines/pipeline_base.py | 98.5% <ø> (ø) |
| evalml/pipelines/time_series_pipeline_base.py | 100.0% <ø> (ø) |
| .../tests/pipeline_tests/test_time_series_pipeline.py | 99.8% <ø> (ø) |
| evalml/tests/conftest.py | 96.2% <97.4%> (+0.1%) ⬆️ |
| evalml/automl/engine/engine_base.py | 100.0% <100.0%> (ø) |
| evalml/automl/utils.py | 100.0% <100.0%> (ø) |
| evalml/tests/automl_tests/test_automl_utils.py | 100.0% <100.0%> (ø) |
| evalml/tests/automl_tests/test_engine_base.py | 100.0% <100.0%> (ø) |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

# Conflicts:
#	evalml/pipelines/time_series_pipeline_base.py
#	evalml/tests/automl_tests/test_automl_utils.py
#	evalml/tests/pipeline_tests/test_time_series_pipeline.py

@freddyaboulton freddyaboulton left a comment

@ParthivNaresh Looks good to me! Thank you for making this change. The one thing I want to square away before merging is whether we should tune the threshold on only forecast_horizon observations or use predict_proba_in_sample instead.

@@ -129,8 +134,17 @@ def train_pipeline(pipeline, X, y, automl_config, schema=True):
         automl_config.optimize_thresholds
         and pipeline.can_tune_threshold_with_objective(threshold_tuning_objective)
     ):
+        test_size_ = (

I'm wondering if we should use predict_proba_in_sample rather than predict_proba.

My thoughts are:

  • The target is known during search, so we don't have to worry about the forecast horizon.
  • The forecast horizon is probably less than 20% of the data and will usually be small, I think. I wonder whether that's enough data to find a good threshold.
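
To make the data-size concern concrete, here is a minimal sketch in plain Python. It is not evalml's implementation: `tuning_holdout_fraction`, `f1_at_threshold`, and `best_threshold` are hypothetical helpers, and using F1 as the tuning objective is an assumption made for illustration only.

```python
def tuning_holdout_fraction(n_rows, forecast_horizon=None, default=0.2):
    """Fraction of rows held out for threshold tuning (hypothetical helper).

    Assumption: time series problems hold out forecast_horizon rows,
    while other problems hold out a fixed default fraction.
    """
    if forecast_horizon is None:
        return default
    return forecast_horizon / n_rows


def f1_at_threshold(y_true, y_proba, threshold):
    """F1 score when probabilities >= threshold are labeled positive."""
    preds = [p >= threshold for p in y_proba]
    tp = sum(1 for t, p in zip(y_true, preds) if t and p)
    fp = sum(1 for t, p in zip(y_true, preds) if not t and p)
    fn = sum(1 for t, p in zip(y_true, preds) if t and not p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(y_true, y_proba, candidates=None):
    """Pick the candidate threshold with the highest F1 on the holdout."""
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]
    return max(candidates, key=lambda t: f1_at_threshold(y_true, y_proba, t))


# With 1,000 rows and a forecast horizon of 10, only 1% of the data
# (10 rows) would be available for tuning, versus 200 rows at 20%.
time_series_fraction = tuning_holdout_fraction(1000, forecast_horizon=10)
default_fraction = tuning_holdout_fraction(1000)
```

With so few holdout rows, the chosen threshold can swing on a single observation, which is the motivation for considering in-sample probabilities here.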

@@ -849,7 +849,9 @@ class MockBinaryClassificationPipeline(TimeSeriesBinaryClassificationPipeline):
     estimator = MockEstimator
     component_graph = [MockEstimator]
 
-    def __init__(self, parameters, random_seed=0):
+    def __init__(
+        self, parameters, custom_name=None, component_graph=None, random_seed=0

Just a cosmetic change, right?

@chukarsten chukarsten left a comment

LGTM!

@ParthivNaresh ParthivNaresh merged commit 93444ec into main Dec 15, 2021
@angela97lin angela97lin mentioned this pull request Dec 22, 2021
@freddyaboulton freddyaboulton deleted the EnableThresholdingForBinaryTS branch May 13, 2022 15:34
Successfully merging this pull request may close these issues.

Enable Threshold Optimization for Binary Time Series Pipelines
3 participants