
Pass X_train, y_train in Engine.submit_scoring_job for time series #2786

Merged
freddyaboulton merged 4 commits into main from 2785-fix-score-pipelines-for-automl-search on Sep 16, 2021

Conversation

freddyaboulton (Contributor)

Pull Request Description

Fixes #2785


After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of docs/source/release_notes.rst to include this pull request by adding :pr:`123`.
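
For context, here is a rough sketch of the workflow this fix enables: scoring time series pipelines found by AutoMLSearch against a holdout set. The dataset, column names, and problem_configuration values below are illustrative assumptions, not code from this PR.

```python
import pandas as pd
from evalml.automl import AutoMLSearch

# Toy time series data; the actual contents don't matter for the sketch.
X = pd.DataFrame({
    "date": pd.date_range("2021-01-01", periods=100),
    "feature": range(100),
})
y = pd.Series(range(100), name="target")
X_train, y_train = X.iloc[:80], y.iloc[:80]
X_holdout, y_holdout = X.iloc[80:], y.iloc[80:]

automl = AutoMLSearch(
    X_train=X_train,
    y_train=y_train,
    problem_type="time series regression",
    # Illustrative configuration; key names assumed from the evalml API of this era.
    problem_configuration={"date_index": "date", "gap": 0, "max_delay": 2},
    max_batches=1,
)
automl.search()

# Time series pipelines need the training data to build delayed features for the
# holdout rows, so the engine has to ship X_train/y_train along with X/y.
scores = automl.score_pipelines(
    [automl.best_pipeline], X_holdout, y_holdout, objectives=["MedianAE"]
)
```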


codecov bot commented Sep 15, 2021

Codecov Report

Merging #2786 (23aeeb5) into main (d54173a) will increase coverage by 0.8%.
The diff coverage is 100.0%.


@@           Coverage Diff           @@
##            main   #2786     +/-   ##
=======================================
+ Coverage   99.0%   99.8%   +0.8%     
=======================================
  Files        298     298             
  Lines      27646   27681     +35     
=======================================
+ Hits       27364   27613    +249     
+ Misses       282      68    -214     
| Impacted Files | Coverage Δ |
|---|---|
| evalml/automl/automl_search.py | 99.9% <100.0%> (+0.2%) ⬆️ |
| evalml/automl/engine/cf_engine.py | 100.0% <100.0%> (ø) |
| evalml/automl/engine/dask_engine.py | 100.0% <100.0%> (ø) |
| evalml/automl/engine/engine_base.py | 100.0% <100.0%> (ø) |
| evalml/automl/engine/sequential_engine.py | 100.0% <100.0%> (ø) |
| evalml/tests/automl_tests/dask_test_utils.py | 100.0% <100.0%> (ø) |
| ...ts/automl_tests/parallel_tests/test_automl_dask.py | 100.0% <100.0%> (ø) |
| evalml/tests/automl_tests/test_automl.py | 99.7% <0.0%> (+0.1%) ⬆️ |
| evalml/automl/utils.py | 100.0% <0.0%> (+1.7%) ⬆️ |
| ... and 6 more | |


Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d54173a...23aeeb5.


@pytest.mark.parametrize(
    "engine_str",
    engine_strs + ["sequential"],
)
freddyaboulton (Contributor, Author) commented:

@chukarsten Check it out - threaded engines respect mocks
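
Illustrative sketch (not the PR's test code) of why thread-based engines see unittest.mock patches: the worker threads run in the same, already-patched process, whereas process-based workers re-import the module unpatched.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from unittest import mock

def compute():
    # Calls the (possibly patched) module-level function.
    return math.sqrt(4)

with mock.patch("math.sqrt", return_value=-1):
    with ThreadPoolExecutor() as executor:
        # The thread worker shares the patched process, so the mock is respected.
        assert executor.submit(compute).result() == -1
```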

ParthivNaresh (Contributor) left a comment:

Solid catch, looks good!

angela97lin (Contributor) left a comment:

Looks good! 😁

@@ -159,6 +163,7 @@ def submit_scoring_job(self, automl_config, pipeline, X, y, objectives):
        X_schema = X.ww.schema
        y_schema = y.ww.schema
        X, y = self.send_data_to_cluster(X, y)
        X_train, y_train = self.send_data_to_cluster(X_train, y_train)
A reviewer (Contributor) commented:

Just for my own curiosity: theoretically, if send_data_to_cluster supported more arguments, this could have been combined with the line above, right? 🤔
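
A hypothetical sketch of that idea, not the existing engine API: if send_data_to_cluster accepted a variable number of datasets, the two calls above could collapse into one. The *data signature and the use of Client.scatter are assumptions for illustration.

```python
def send_data_to_cluster(self, *data):
    """Hypothetical variadic version: scatter any number of datasets to the cluster."""
    return tuple(self.client.scatter(d, broadcast=True) for d in data)

# The two calls above would then become:
# X, y, X_train, y_train = self.send_data_to_cluster(X, y, X_train, y_train)
```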

Comment on lines 293 to 296
    X_train, y_train = X[:50], y[:50]
    X_test, y_test = X[50:], y[50:]
    X_train, y_train = pd.DataFrame(X_train), pd.Series(y_train)
    X_test, y_test = pd.DataFrame(X_test), pd.Series(y_test)
angela97lin (Contributor) commented:

Omega nitpick: could probably combine these lines to:

X_train, y_train = pd.DataFrame(X[:50]), pd.Series(y[:50])
X_test, y_test = pd.DataFrame(X[50:]), pd.Series(y[50:])

But might just be personal preference 😅

Side note: This is probably a task for the larger test refactoring/cleanup PR, but I wonder if it's worth making our fixtures dataframes, since we've slowly been moving away from explicitly supporting numpy arrays lol
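
A hedged sketch of that side note: a pytest fixture that hands tests a DataFrame/Series pair directly instead of numpy arrays. The fixture name and data are illustrative, not the repo's actual fixtures.

```python
import numpy as np
import pandas as pd
import pytest

@pytest.fixture
def X_y_binary_df():
    # pandas inputs, closer to what users actually pass in.
    rng = np.random.default_rng(0)
    X = pd.DataFrame(
        rng.standard_normal((100, 5)), columns=[f"col_{i}" for i in range(5)]
    )
    y = pd.Series(rng.integers(0, 2, size=100), name="target")
    return X, y
```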

freddyaboulton (Contributor, Author) replied:

@angela97lin Agreed that having our most common fixtures be numpy arrays is not ideal - it means we may be testing our code with inputs that aren't representative of what a user would actually pass in!

@freddyaboulton force-pushed the 2785-fix-score-pipelines-for-automl-search branch from 2ecef68 to a654370 on September 16, 2021 14:35
@freddyaboulton force-pushed the 2785-fix-score-pipelines-for-automl-search branch from a654370 to 23aeeb5 on September 16, 2021 16:25
@freddyaboulton merged commit d1e6afb into main on Sep 16, 2021
@freddyaboulton deleted the 2785-fix-score-pipelines-for-automl-search branch on September 16, 2021 17:08
@chukarsten mentioned this pull request on Oct 1, 2021

Successfully merging this pull request may close these issues.

AutoMLSearch.score_pipelines does not work for time series pipelines