Fix data splitting errors if target is float for classification problems #2050
Conversation
Codecov Report

@@            Coverage Diff             @@
##              main    #2050    +/-   ##
=========================================
+ Coverage    100.0%   100.0%    +0.1%
=========================================
  Files          280      280
  Lines        22907    22934      +27
=========================================
+ Hits         22898    22925      +27
  Misses           9        9

Continue to review the full report at Codecov.
@patch('evalml.pipelines.ClassificationPipeline.score', return_value={"Log Loss Multiclass": 0.3})
@patch('evalml.pipelines.BinaryClassificationPipeline.score', return_value={"Log Loss Binary": 0.3})
@patch('evalml.automl.engine.EngineBase.train_pipeline')
def test_automl_supports_float_targets_for_classification(mock_train, mock_binary_score, mock_multi_score, mock_regression_score,
Testing for regression too, just to make sure we're not breaking anything with regard to passing the right target values c:
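For context, a minimal sketch of how a float-valued classification target might be set up for such a test; the fixture shape, data, and problem type here are assumptions for illustration, not taken from the PR:

import numpy as np
import pandas as pd
from evalml.automl import AutoMLSearch

# Hypothetical data: a binary classification target stored as floats (0.0 / 1.0),
# which previously could trip up the data splitter.
X = pd.DataFrame({"feature": range(100)})
y = pd.Series(np.tile([0.0, 1.0], 50))

automl = AutoMLSearch(X_train=X, y_train=y, problem_type="binary", random_seed=0, n_jobs=1)
automl.search()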
Nice, I really like this test. Very clear. 👍
I think this is good to go, and it's very nice and concise. Love the tests. My only question, which may be made obsolete by the upcoming engine redesign, is where the test belongs: with the engine tests or the automl tests. I don't consider it a blocking concern, though, so whichever way you decide to go is good.
automl = AutoMLSearch(X_train=X, y_train=y, problem_type=automl_type, random_seed=0, n_jobs=1)
automl.search()

# Assert that we train pipeline on the original target, not the encoded one used in EngineBase for data splitting
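A rough sketch of how that assertion might look, assuming train_pipeline receives (pipeline, X, y, ...) positionally; the exact call signature is not shown in the excerpt:

# Sketch only: assumes train_pipeline is called as train_pipeline(pipeline, X, y, ...).
for call in mock_train.call_args_list:
    y_passed = call[0][2]
    # The original float labels should be passed through, not the encoded copy.
    pd.testing.assert_series_equal(y_passed, y, check_names=False)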
Should this be an EngineBase test? @freddyaboulton
I hear ya! I'd like to keep this as an AutoML test, though, because this is behavior that should work for a user interfacing with AutoMLSearch. Right now we mock and check EngineBase because that's where the data is passed, but that doesn't have to stay the case; if it changes, we'd just update what we're mocking.
LGTM!
Looks excellent to me!
y_pd_encoded = y_pd
# Encode target for classification problems so that we can support float targets.
# This is okay because we only use split to get the indices to split on.
if is_classification(automl.problem_type):
    y_mapping = {original_target: encoded_target for (encoded_target, original_target) in enumerate(y_pd.value_counts().index)}
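The excerpt cuts off here; presumably the next line applies the mapping so the splitter sees integer labels while the original float target is preserved for training. A hedged reading of that continuation:

    # Assumed continuation (not shown in the excerpt): map float labels to integer
    # codes, used only to compute split indices.
    y_pd_encoded = y_pd.map(y_mapping)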
Nice and clean!
Closes #1954