Fix data splitting errors if target is float for classification problems#2050
Fix data splitting errors if target is float for classification problems#2050angela97lin merged 3 commits intomainfrom
Conversation
Codecov Report
@@ Coverage Diff @@
## main #2050 +/- ##
=========================================
+ Coverage 100.0% 100.0% +0.1%
=========================================
Files 280 280
Lines 22907 22934 +27
=========================================
+ Hits 22898 22925 +27
Misses 9 9
Continue to review full report at Codecov.
|
| @patch('evalml.pipelines.ClassificationPipeline.score', return_value={"Log Loss Multiclass": 0.3}) | ||
| @patch('evalml.pipelines.BinaryClassificationPipeline.score', return_value={"Log Loss Binary": 0.3}) | ||
| @patch('evalml.automl.engine.EngineBase.train_pipeline') | ||
| def test_automl_supports_float_targets_for_classification(mock_train, mock_binary_score, mock_multi_score, mock_regression_score, |
There was a problem hiding this comment.
Testing for regression too just to make sure we're not messing things up re passing the right target values c:
There was a problem hiding this comment.
Nice, I really like this test. Very clear. 👍
chukarsten
left a comment
There was a problem hiding this comment.
I think this is good to go and very nice and concise. Love the tests. My only question, which I think is possibly made obsolete by the upcoming engine redesign will be where does the test belong, either an engine test or an automl test. I don't consider it a blocking concern, though, so whichever way you decide to go is good.
| @patch('evalml.pipelines.ClassificationPipeline.score', return_value={"Log Loss Multiclass": 0.3}) | ||
| @patch('evalml.pipelines.BinaryClassificationPipeline.score', return_value={"Log Loss Binary": 0.3}) | ||
| @patch('evalml.automl.engine.EngineBase.train_pipeline') | ||
| def test_automl_supports_float_targets_for_classification(mock_train, mock_binary_score, mock_multi_score, mock_regression_score, |
There was a problem hiding this comment.
Nice, I really like this test. Very clear. 👍
| automl = AutoMLSearch(X_train=X, y_train=y, problem_type=automl_type, random_seed=0, n_jobs=1) | ||
| automl.search() | ||
|
|
||
| # Assert that we train pipeline on the original target, not the encoded one used in EngineBase for data splitting |
There was a problem hiding this comment.
Should this be an EngineBase test? @freddyaboulton
There was a problem hiding this comment.
I hear ya! I think I want to keep this as an AutoML test though, because this is something that should work for a user interfacing with AutoMLSearch. I think right now, because EngineBase is where the data is being passed to, we're mocking and checking but that doesn't have to be the case, in which case we would just update what we're mocking.
ParthivNaresh
left a comment
There was a problem hiding this comment.
Looks excellent to me!
| y_pd_encoded = y_pd | ||
| # Encode target for classification problems so that we can support float targets. This is okay because we only use split to get the indices to split on | ||
| if is_classification(automl.problem_type): | ||
| y_mapping = {original_target: encoded_target for (encoded_target, original_target) in enumerate(y_pd.value_counts().index)} |
Closes #1954