
Make utils to handle the logic for threshold tuning objective and resplitting data #3888

Merged: 8 commits into main from make-resplit-utils, Dec 15, 2022

Conversation

tamargrey (Contributor):

closes #3885
closes #3863

Allows us to determine, for pipelines produced by automl search, whether the full training data was used to train them.
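
As a rough sketch of what the new util boils down to (the attribute names, problem-type check, and return shape below are assumptions for illustration, not evalml's actual implementation): tune the threshold only for binary pipelines when threshold optimization is enabled and the objective supports it, and resplit the training data exactly when a threshold will be tuned.

```python
# Illustrative sketch only: the real helper lives in evalml/automl/utils.py, and its
# exact signature, checks, and return values may differ from what is shown here.


def get_threshold_tuning_info(automl_config, pipeline):
    """Decide whether this pipeline's decision threshold should be tuned, and
    therefore whether the training data needs to be resplit into a training set
    plus a threshold-tuning holdout."""
    objective = automl_config.objective
    can_tune_threshold = (
        getattr(automl_config, "optimize_thresholds", False)
        and str(pipeline.problem_type) == "binary"  # simplified problem-type check
        and getattr(objective, "can_optimize_threshold", False)
    )
    threshold_tuning_objective = objective if can_tune_threshold else None
    # If the threshold gets tuned, part of the data is held back for tuning, so the
    # pipeline is *not* fitted on the full training data.
    data_needs_resplitting = can_tune_threshold
    return threshold_tuning_objective, data_needs_resplitting
```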

@@ -261,3 +262,62 @@ def get_pipelines_from_component_graphs(
),
)
return created_pipelines


def get_threshold_tuning_info(automl_config, pipeline):
tamargrey (Contributor, Author):

I stuck these in automl utils instead of putting them in the engine_base file/making a new engine utils file, because part of their use is in determining after automl search whether pipelines were trained on the full data. So since they can be used in the wider context of automl, I thought it made sense to put them there. If anyone has any strong feelings to the contrary, definitely let me know!
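
As a hedged usage sketch of that wider context (reusing the illustrative helper above; `automl_config` and `best_pipeline` are hypothetical names standing in for whatever the caller has on hand after search):

```python
# Hypothetical post-search check, built on the illustrative helper sketched above.
# If no threshold-tuning objective applied, no holdout was carved off, so the
# pipeline was fitted on the full training data handed to the engine.
tuning_objective, needed_resplit = get_threshold_tuning_info(automl_config, best_pipeline)
trained_on_full_data = not needed_resplit
if not trained_on_full_data:
    print("Pipeline was trained on a reduced split; consider refitting on all of X/y.")
```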

@@ -103,7 +103,7 @@ def test_train_and_score_pipelines_error(
assert "yeet" in caplog.text


@patch("evalml.automl.engine.engine_base.split_data")
@patch("evalml.automl.utils.split_data")
tamargrey (Contributor, Author):

The logic isn't changing, so I didn't add any tests that specifically test these utils on their own, but I'd be open to adding some if that's a necessary part of pulling this logic out into public utils.
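
One side effect of the move, visible in the patched target above, is the usual `unittest.mock` rule of patching a name where it is looked up: with `split_data` now exposed from `evalml.automl.utils`, tests patch it there instead of at `evalml.automl.engine.engine_base`. A minimal sketch (the test name and assertion flow are assumptions):

```python
from unittest.mock import patch


@patch("evalml.automl.utils.split_data")  # new lookup location after the refactor
def test_threshold_tuning_uses_shared_split(mock_split_data):
    # Hypothetical flow: exercise the engine's training path here, then confirm
    # that the shared util (not a private engine_base copy) performed the split.
    ...
    # mock_split_data.assert_called_once()
```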

codecov bot commented Dec 13, 2022

Codecov Report

Merging #3888 (9f7d393) into main (1ab688d) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #3888     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        346     346             
  Lines      36352   36358      +6     
=======================================
+ Hits       36221   36227      +6     
  Misses       131     131             
Impacted Files                                          Coverage Δ
evalml/automl/__init__.py                               100.0% <ø> (ø)
evalml/automl/engine/engine_base.py                     100.0% <100.0%> (ø)
evalml/automl/utils.py                                   97.3% <100.0%> (+0.5%) ⬆️
evalml/tests/automl_tests/test_automl.py                 99.5% <100.0%> (ø)
.../automl_tests/test_automl_search_classification.py    96.4% <100.0%> (ø)
evalml/tests/automl_tests/test_engine_base.py           100.0% <100.0%> (ø)


@tamargrey tamargrey marked this pull request as ready for review December 14, 2022 14:03
@christopherbunn christopherbunn (Contributor) left a comment:


LGTM! RE: additional tests for the new functions, I also agree that it doesn't seem necessary given that the logic was covered before.

@eccabay eccabay (Contributor) left a comment:


Looks great, thanks for taking care of this!

@@ -2,6 +2,7 @@ Release Notes
-------------
**Future Releases**
* Enhancements
* Make utils to handle the logic for threshold tuning objective and resplitting data :pr:`3888`
Contributor:

I have decided past tense release notes are the hill I'm going to die on. Can we do make --> made? Or even really "added" instead.

tamargrey (Contributor, Author):

ah damn! I had noticed that and had been doing past tense to fit in. I'll make the change!

For what it's worth, my commit message (and release note, since I mostly treat them the same) writing style comes from this article that I was ~~forced~~ asked to read at my first internship. The thing that made the biggest impression on me was this section:
[image: screenshot of the referenced section of the article]

tamargrey (Contributor, Author):

Okay, I see the internet is kind of split on what tense release notes should be in, but more sources say past tense than present tense.

Contributor:

Yeah, my personal ethos has always been past tense for release notes and present tense for commits, and that seems to be what we generally follow. Super interesting article though, lots to think about there!

@jeremyliweishih jeremyliweishih (Collaborator) left a comment:


This LGTM as well - thanks for handling this!

@tamargrey tamargrey merged commit a37d089 into main Dec 15, 2022
@tamargrey tamargrey deleted the make-resplit-utils branch December 15, 2022 17:11
@christopherbunn christopherbunn mentioned this pull request Jan 3, 2023