Skip to content

Test that all pipelines get the same data splits#2513

Merged
freddyaboulton merged 3 commits intomainfrom
1982-test-all-pipelines-get-the-same-splits
Jul 19, 2021
Merged

Test that all pipelines get the same data splits#2513
freddyaboulton merged 3 commits intomainfrom
1982-test-all-pipelines-get-the-same-splits

Conversation

@freddyaboulton
Copy link
Contributor

Pull Request Description

Fixes #1982


After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of docs/source/release_notes.rst to include this pull request by adding :pr:123.

@freddyaboulton freddyaboulton self-assigned this Jul 14, 2021
@codecov
Copy link

codecov bot commented Jul 14, 2021

Codecov Report

Merging #2513 (6d62b02) into main (67bc92b) will increase coverage by 0.1%.
The diff coverage is 100.0%.

Impacted file tree graph

@@           Coverage Diff           @@
##            main   #2513     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        283     283             
  Lines      25674   25709     +35     
=======================================
+ Hits       25573   25608     +35     
  Misses       101     101             
Impacted Files Coverage Δ
evalml/tests/automl_tests/test_automl.py 99.8% <100.0%> (+0.1%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 67bc92b...6d62b02. Read the comment docs.

ProblemTypes.TIME_SERIES_MULTICLASS,
],
)
def test_data_splitter_gives_pipelines_same_data(
Copy link
Contributor Author

@freddyaboulton freddyaboulton Jul 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inspired from the unit tests @dsherry added in #2210

@freddyaboulton freddyaboulton marked this pull request as ready for review July 14, 2021 19:25
@freddyaboulton freddyaboulton force-pushed the 1982-test-all-pipelines-get-the-same-splits branch from 65b4711 to b6bdf5d Compare July 14, 2021 20:47
Copy link
Contributor

@bchen1116 bchen1116 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Nice test coverage!


# current automl algo trains each pipeline using 3-fold CV for "small" datasets (i.e. test data above)
# therefore, each pipeline should recieve an identical set of three training-validation splits
X_fit_hashes, y_fit_hashes = defaultdict(set), defaultdict(set)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is super cool!

Copy link
Contributor

@chukarsten chukarsten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was a super big brained move with the hashes and the assertion. It took me longer than I care to admit to understand how this worked, lol.

)
n_splits = automl.data_splitter.n_splits
env = AutoMLTestEnv(automl_type)
with env.test_context(score_return_value={automl.objective.name: 1.0}):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drinking your own koolaid. respect

Copy link
Contributor

@angela97lin angela97lin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love this addition, LGTM 👍

@freddyaboulton freddyaboulton force-pushed the 1982-test-all-pipelines-get-the-same-splits branch from b6bdf5d to 6d62b02 Compare July 19, 2021 13:30
@freddyaboulton freddyaboulton merged commit 019de6e into main Jul 19, 2021
@freddyaboulton freddyaboulton deleted the 1982-test-all-pipelines-get-the-same-splits branch July 19, 2021 13:53
@chukarsten chukarsten mentioned this pull request Jul 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Ensure all pipelines in AutoMLSearch receive the same data splits

4 participants