
Add validation_score as 1st CV fold score into rankings #1221

Merged: 11 commits into main from js_1115_validation on Sep 29, 2020

Conversation

jeremyliweishih (Collaborator) commented:

Fixes #1115.

@jeremyliweishih jeremyliweishih changed the title add validation_score as 1st CV fold score into rankings Add validation_score as 1st CV fold score into rankings Sep 24, 2020
codecov bot commented Sep 24, 2020

Codecov Report

Merging #1221 into main will increase coverage by 0.00%.
The diff coverage is 100.00%.


@@           Coverage Diff           @@
##             main    #1221   +/-   ##
=======================================
  Coverage   99.92%   99.92%           
=======================================
  Files         200      200           
  Lines       12365    12369    +4     
=======================================
+ Hits        12356    12360    +4     
  Misses          9        9           
Impacted Files Coverage Δ
evalml/automl/automl_search.py 99.58% <100.00%> (ø)
evalml/tests/automl_tests/test_automl.py 100.00% <100.00%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4387fe0...07e8e1d.

@jeremyliweishih jeremyliweishih marked this pull request as ready for review September 24, 2020 15:21
@jeremyliweishih jeremyliweishih self-assigned this Sep 24, 2020
bchen1116 (Contributor) commented:

If we're using the first CV fold's score as the validation score, shouldn't the reported score then be the average of the remaining folds, excluding this first fold's score?
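The two definitions being compared can be sketched with made-up fold scores (the numbers below are illustrative only, not data from this PR):

```python
import statistics

# Hypothetical per-fold cross-validation scores for one pipeline.
cv_scores = [0.91, 0.88, 0.90]

# What the PR implements: "score" averages every fold,
# while "validation_score" is simply the first fold's score.
score_all_folds = statistics.mean(cv_scores)
validation_score = cv_scores[0]

# The alternative raised here: average only the remaining folds,
# so the first (validation) fold is excluded from the reported score.
score_remaining_folds = statistics.mean(cv_scores[1:])

print(validation_score, score_all_folds, score_remaining_folds)
```

The PR keeps `score` as the mean over all folds and adds `validation_score` alongside it, rather than excluding the first fold from the average.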

freddyaboulton (Contributor) left a comment:

@jeremyliweishih Looks good to me! Are you planning on updating the docs/user guide? It might be worth explaining the difference between score and validation_score.

jeremyliweishih (Collaborator, Author) commented:

@freddyaboulton good idea, I'll take a look.

dsherry (Contributor) left a comment:

🚢

@@ -697,7 +697,8 @@ def _add_result(self, trained_pipeline, parameters, training_time, cv_data, cv_s
     "high_variance_cv": high_variance_cv,
     "training_time": training_time,
     "cv_data": cv_data,
-    "percent_better_than_baseline": percent_better
+    "percent_better_than_baseline": percent_better,
+    "validation_score": cv_scores[0]
Contributor commented:

Awesome, I didn't realize how simple this code change would be! 👍
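The diff above can be sketched in isolation. The function name and reduced signature below are hypothetical stand-ins for evalml's `_add_result`, which takes more parameters; only the dictionary keys mirror the diff:

```python
# Minimal sketch of the change: the per-pipeline results entry gains a
# "validation_score" key taken from the first CV fold's score.
def add_result_entry(cv_scores, cv_data, training_time, percent_better,
                     high_variance_cv):
    return {
        "high_variance_cv": high_variance_cv,
        "training_time": training_time,
        "cv_data": cv_data,
        "percent_better_than_baseline": percent_better,
        "validation_score": cv_scores[0],  # score of the 1st CV fold
    }

entry = add_result_entry(
    cv_scores=[0.91, 0.88, 0.90],
    cv_data=[{"score": 0.91}, {"score": 0.88}, {"score": 0.90}],
    training_time=1.2,
    percent_better=5.0,
    high_variance_cv=False,
)
```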

@@ -236,7 +236,7 @@
     "name": "python",
     "nbconvert_exporter": "python",
     "pygments_lexer": "ipython3",
-    "version": "3.8.2"
+    "version": "3.7.4"
Contributor commented:

Can delete

@@ -134,7 +134,7 @@
     "metadata": {},
     "source": [
     "## View Rankings\n",
-    "A summary of all the pipelines built can be returned as a pandas DataFrame which is sorted by score."
+    "A summary of all the pipelines built can be returned as a pandas DataFrame which is sorted by score. The score column contains the average score across all cross-validation folds while the validation_score column is computed from the first cross-validation fold."
Contributor commented:

👍
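The documented relationship between the two columns can be sketched with a toy rankings frame. The column names match the docs change; the data, construction, and the higher-is-better sort direction are illustrative assumptions, not evalml's actual internals:

```python
import pandas as pd

# Made-up per-pipeline fold scores, for illustration only.
fold_scores = {
    "Pipeline A": [0.91, 0.88, 0.90],
    "Pipeline B": [0.85, 0.93, 0.89],
}

rankings = pd.DataFrame(
    [
        {
            "pipeline_name": name,
            # score: the average across all cross-validation folds
            "score": sum(scores) / len(scores),
            # validation_score: the first cross-validation fold only
            "validation_score": scores[0],
        }
        for name, scores in fold_scores.items()
    ]
).sort_values("score", ascending=False).reset_index(drop=True)

print(rankings)
```

Note that a pipeline's `validation_score` can differ noticeably from its `score` when fold scores vary, which is exactly why exposing both columns is useful.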

@@ -75,14 +75,15 @@ def test_search_results(X_y_regression, X_y_binary, X_y_multi, automl_type):
     for score in all_objective_scores.values():
         assert score is not None
     assert automl.get_pipeline(pipeline_id).parameters == results['parameters']
+    assert results['validation_score'] == pd.Series([fold['score'] for fold in results['cv_data']])[0]
Contributor commented:

👍

bchen1116 (Contributor) left a comment:

LGTM

docs/source/release_notes.rst (comment resolved)
@jeremyliweishih jeremyliweishih merged commit e9cba15 into main Sep 29, 2020
@angela97lin angela97lin mentioned this pull request Sep 29, 2020
@freddyaboulton freddyaboulton deleted the js_1115_validation branch May 13, 2022 15:16
Successfully merging this pull request may close these issues:
Add validation column to rankings and full_rankings computed from 1st CV fold
4 participants