
Standardize component predict/predict_proba in/out to pandas series/dataframe #853

Merged: 12 commits merged into master from ds_236_standardize_component_output_type on Jun 19, 2020

Conversation


@dsherry dsherry commented Jun 15, 2020

This PR adds test coverage which checks that, for any component listed in all_components(), predict outputs a pd.Series and predict_proba outputs a pd.DataFrame. It also fixes specific components' implementations to conform to this expectation.

It would be great if we could find a way in the code to avoid having to convert to/from pd datastructures all over the place. We could define a component metaclass to help with this, to wrap subclasses' predict and predict_proba methods in standardization code. But for now, this PR was easier and adds value.

We should add similar coverage for pipelines, but I'd like us to do that separately. It's unclear how necessary that is if we have good component-level coverage.
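
Roughly, the kind of component-level check being added looks like the sketch below. This is illustrative only: the import paths, the X_y fixture, and the assumption that every component constructs with default parameters are mine, not the exact test in this PR.

import pandas as pd
import pytest

from evalml.pipelines.components import Estimator  # import paths assumed, may differ
from evalml.pipelines.components.utils import all_components  # import path assumed


@pytest.mark.parametrize("component_class", list(all_components().values()))
def test_component_prediction_types(component_class, X_y):
    X, y = X_y
    if not issubclass(component_class, Estimator):
        pytest.skip("only estimators expose predict/predict_proba")
    component = component_class()  # assumes defaults work for every component
    component.fit(X, y)
    assert isinstance(component.predict(X), pd.Series)
    try:
        proba = component.predict_proba(X)
    except (AttributeError, NotImplementedError):
        return  # regressors don't provide predict_proba
    assert isinstance(proba, pd.DataFrame)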

codecov bot commented Jun 15, 2020

Codecov Report

Merging #853 into master will increase coverage by 0.04%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #853      +/-   ##
==========================================
+ Coverage   99.70%   99.74%   +0.04%     
==========================================
  Files         195      195              
  Lines        8181     8365     +184     
==========================================
+ Hits         8157     8344     +187     
+ Misses         24       21       -3     
Impacted Files Coverage Δ
evalml/objectives/objective_base.py 100.00% <100.00%> (ø)
evalml/pipelines/binary_classification_pipeline.py 100.00% <100.00%> (ø)
...ents/estimators/classifiers/catboost_classifier.py 100.00% <100.00%> (ø)
...valml/pipelines/components/estimators/estimator.py 100.00% <100.00%> (ø)
...onents/estimators/regressors/catboost_regressor.py 100.00% <100.00%> (ø)
.../pipelines/components/transformers/drop_columns.py 100.00% <100.00%> (ø)
...transformers/feature_selection/feature_selector.py 83.78% <100.00%> (+5.40%) ⬆️
...onents/transformers/imputers/per_column_imputer.py 100.00% <100.00%> (ø)
...components/transformers/imputers/simple_imputer.py 100.00% <100.00%> (ø)
...l/pipelines/components/transformers/transformer.py 100.00% <100.00%> (ø)
... and 15 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2a56753...3006d3d.

@dsherry dsherry force-pushed the ds_236_standardize_component_output_type branch from 2b1e2ee to 31c7796 on June 17, 2020 04:06
@@ -145,7 +145,7 @@
"outputs": [],
"source": [
"# get the predicted probabilities associated with the \"true\" label\n",
"y_pred_proba = pipeline.predict_proba(X)[:, 1]\n",
"y_pred_proba = pipeline.predict_proba(X)[1]\n",
Contributor:

can we make this code pipeline.predict_proba(X)["true"] now?

Contributor Author (dsherry):

Yeah, that's up next, but not in this PR--that's tracked by #645 which will get unblocked by this work.

@dsherry dsherry marked this pull request as ready for review June 17, 2020 20:54
dsherry commented Jun 17, 2020

Still battling codecov but this is ready for review

if not isinstance(X, pd.DataFrame):
X = pd.DataFrame(X)
if not isinstance(y, pd.Series):
y = pd.Series(y)
Contributor Author (dsherry):

The solution in this PR is not elegant: it adds this sort of code to a bunch of components, rather than putting a general pattern in place.

I still feel good about this PR, because the test coverage iterates over anything in all_components, meaning new components will have consistent coverage.

I will file something to track finding a permanent solution for how to avoid scattering this code around our codebase. One idea is to use a metaclass on components, which wraps fit/predict/etc. and ensures the inputs and return type are consistent. We could use the same solution for #851.
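
For reference, a minimal sketch of what that metaclass idea could look like. This is hypothetical and not part of this PR; a real version would also need to handle inherited methods and pass through extra arguments like objectives.

import pandas as pd


class _StandardizedOutputMeta(type):
    """Hypothetical sketch: wrap predict/predict_proba so subclasses always return pandas types."""

    def __new__(mcs, name, bases, namespace):
        predict = namespace.get("predict")
        if predict is not None:
            # coerce whatever the subclass returns into a pd.Series
            namespace["predict"] = lambda self, X: pd.Series(predict(self, X))
        predict_proba = namespace.get("predict_proba")
        if predict_proba is not None:
            # coerce 2d probability output into a pd.DataFrame
            namespace["predict_proba"] = lambda self, X: pd.DataFrame(predict_proba(self, X))
        return super().__new__(mcs, name, bases, namespace)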

@@ -40,7 +40,7 @@ def transform(self, X, y=None):
Returns:
pd.DataFrame: Transformed X
"""
cols = self.parameters.get("columns", [])
cols = self.parameters.get("columns") or []
Contributor Author (dsherry):

Hm, I don't remember if this change was necessary; I'll try backing it out.
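
For context, the two spellings only differ when the key is present but explicitly set to None:

params = {"columns": None}

params.get("columns", [])    # None -- the default only applies when the key is missing
params.get("columns") or []  # []   -- also guards against a stored None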

@@ -36,18 +36,19 @@ def transform(self, X, y=None):
if isinstance(X, pd.DataFrame):
self.input_feature_names = list(X.columns.values)
else:
self.input_feature_names = range(len(X.shape[1]))
Contributor Author (dsherry):

This was a bug!
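
(For anyone skimming: X.shape[1] is already an int, so calling len() on it raises a TypeError; my reading is that the intended expression was range(X.shape[1]).)

import numpy as np

X = np.zeros((10, 3))
X.shape[1]          # 3, an int
# len(X.shape[1])   # TypeError: object of type 'int' has no len()
range(X.shape[1])   # range(0, 3), the intended column indices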

X_t = pd.DataFrame(X_t, columns=X_null_dropped.columns).astype(X_null_dropped.dtypes.to_dict())
return X_t
return pd.DataFrame(X_t, columns=X_null_dropped.columns).astype(X_null_dropped.dtypes.to_dict())
return pd.DataFrame(X_t)
Contributor Author (dsherry):

This isn't great. Suggestions appreciated

@angela97lin (Contributor) left a comment:

Glad that you're doing this, even if it's just a short-term solution! 👍

@@ -186,11 +186,15 @@ def test_normalize_confusion_matrix_error():
normalize_confusion_matrix(conf_mat, 'all')


def test_graph_roc_curve_binary(X_y):
@pytest.mark.parametrize("data_type", ['np', 'pd'])
Contributor:

Oo cool, didn't know this was a thing!

Contributor Author (dsherry):

Yeah! Pretty fun :) and you can run individual tests from the cmdline with pytest evalml/tests/pipeline_tests/test_graph_utils.py::test_graph_roc_curve_binary\[np\] (backslash escaping needed for unix shell)
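
For anyone new to the pattern, the parametrized flag is typically consumed inside the test body roughly like this (a sketch, not the exact test code from this PR):

import pandas as pd
import pytest


@pytest.mark.parametrize("data_type", ['np', 'pd'])
def test_graph_roc_curve_binary(data_type, X_y):
    X, y = X_y
    if data_type == 'pd':
        # run the same assertions against pandas inputs as well as numpy
        X = pd.DataFrame(X)
        y = pd.Series(y)
    ...  # rest of the test is unchanged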

@@ -64,9 +68,12 @@ def fit(self, X, y=None):

def predict(self, X):
predictions = self._component_obj.predict(X)
if predictions.ndim == 2 and predictions.shape[1] == 1:
Contributor:

Is this related to the standardization or an unrelated change? :o

Contributor Author (dsherry):

Related. If predictions is a 2d input with one column, inverse_transform will fail.
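
In other words, the guard handles wrapped estimators that hand back a (n_samples, 1) array; a toy illustration of the squeeze (not the exact diff):

import numpy as np
import pandas as pd

raw = np.array([[0], [1], [1]])         # some wrapped estimators return shape (n, 1)
if raw.ndim == 2 and raw.shape[1] == 1:
    raw = raw.flatten()                 # squeeze to (n,) so downstream 1d code works
predictions = pd.Series(raw)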

y_true = [2, 0, 2, 2, 0, 1]
y_predicted = [0, 0, 2, 2, 0, 2]
expected_return_type = pd.DataFrame
Contributor:

Just wondering, why not just check assert isinstance(conf_mat, pd.DataFrame)? :o

Contributor Author (dsherry):

Oops, thanks, yeah this was left over from an earlier state

Contributor Author (dsherry):

Just pushed this too.

Comment on lines 395 to 427
range_index = pd.RangeIndex(start=0, stop=X_np.shape[1], step=1)
assert (X_df_no_col_names.columns == range_index).all()
Contributor:

I like this check, but I'm also conflicted because it seems more like we're testing pandas and what a pandas DataFrame does when we don't specify a columns parameter 🤷‍♀️

Contributor Author (dsherry):

You're right. This was a sanity check I left in from when I was setting up the test. I can clean this up. Thanks.

Contributor:

Not sure if I'm looking at an outdated version but just FYI!

Contributor Author (dsherry):

Augh! Yes, thanks for the reminder

components = all_components()
transformers = dict(filter(lambda el: issubclass(el[1], Transformer), components.items()))
for component_name, component_class in transformers.items():
print('Testing transformer {}'.format(component_class.name))
Contributor:

Is this intentionally left here o:

Contributor Author (dsherry):

I've found leaving prints in these tests which iterate over each component/pipeline is helpful when there are failures, because the last printed value is the one for which the failure occurred.

Contributor Author (dsherry):

I guess this particular print is redundant because there's another below. Will remove

Contributor:

Ah that makes sense, but do we want to keep them in when we push to master? :O

Contributor Author (dsherry):

Yeah. Is there a reason we shouldn't? My understanding is pytest will suppress print output unless there's a failure, which is when this information would become helpful. I say this because I don't see this print (or others) in the current test output!

@@ -59,4 +59,4 @@ def test_xgboost_feature_name_with_random_ascii(X_y):
predictions = clf.predict(X)
assert len(predictions) == len(y)
assert not np.isnan(predictions).all()
assert not np.isnan(clf.predict_proba(X)).all()
assert not np.isnan(clf.predict_proba(X)).all().all()
Contributor:

Why the second .all() call? I see that when running the tests, this generates a higher-dimensional result, so the second call is necessary to get something that assert will accept. But why take np.isnan(clf.predict_proba(X)) instead of np.isnan(clf.predict_proba(X).to_numpy()), which is generally what you have in the other files? Just wondering, in terms of consistency, if there was any reason for not doing the to_numpy in this case as opposed to the other cases.

Contributor:

Just in terms of my other comment, I think it makes sense to choose one or the other (numpy arrays or pandas dataframes) for most tests, or at least for classes of tests, so they're standardized.

@dsherry (Contributor Author), Jun 18, 2020:

First, predict_proba returns a 2d array of predicted probabilities, class vs row. Next, all estimators' predict() methods now return pd.Series, and all estimators' predict_proba() methods now return pd.DataFrame. Finally, in order to check if there are no nans in a 2d array, we have to do .all().all() to AND the check across every row and column. Does that make sense?
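
Concretely, with a toy frame:

import numpy as np
import pandas as pd

proba = pd.DataFrame([[0.2, 0.8], [0.6, 0.4]])

np.isnan(proba)                   # DataFrame of booleans, one per cell
np.isnan(proba).all()             # Series: one boolean per column
np.isnan(proba).all().all()       # single boolean across the whole frame
np.isnan(proba.to_numpy()).all()  # equivalent single boolean via numpy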

RE your second comment, I agree, and that's what this PR does! We've standardized component and pipeline predictions to pandas types. So all the tests check that the output is pandas. See the main new tests I added in test_components.py for type checking across all components

Contributor:

I think my question was more along the lines of: why do you do .all() twice instead of converting to numpy and just doing .all() once, which is more similar to the way some other tests are done?
And then re my second comment: it seems like that change made sense, but there were other tests that first converted to numpy and then checked the closeness of the arrays rather than just using the pandas assert_frame_equal. It was more of a consistency question (for both).

Contributor Author (dsherry):

Oh, got it, np.isnan(clf.predict_proba(X).to_numpy()).all(), yep that'd work.

RE assert_frame_equal, yeah I think whatever gets the job done! In this case, assert_frame_equal worked and didn't cause platform issues.

Contributor:

I guess for the future, does it matter if there is consistency in the testing between pd.testing.assert_frame_equal(a, b) vs np.testing.assert_almost_equal(a.to_numpy(), b.to_numpy()), or is either OK in regard to style/efficiency?

Contributor Author (dsherry):

Good question. As long as our tests provide good coverage, I think we're doing well. What I mean by that is: if we need the "almost equal" functionality to avoid numerical precision errors, we should use that, but otherwise it's just a style thing.
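
For example, pandas' own comparison can absorb small float noise too; the exact tolerance knobs vary by pandas version, so check_exact is the one shown here:

import pandas as pd

a = pd.DataFrame({"p0": [0.1, 0.9], "p1": [0.9, 0.1]})
b = a + 1e-9  # tiny float noise

pd.testing.assert_frame_equal(a, b, check_exact=False)   # passes: near-equal values allowed
# pd.testing.assert_frame_equal(a, b, check_exact=True)  # raises AssertionError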

@@ -716,4 +716,4 @@ def test_clone_fitted(X_y, lr_pipeline):
pipeline_clone.fit(X, y)
X_t_clone = pipeline_clone.predict_proba(X)

np.testing.assert_almost_equal(X_t, X_t_clone)
pd.testing.assert_frame_equal(X_t, X_t_clone)
Contributor:

Is there a reason you use pd.testing.assert_frame_equal here, but in other places convert to a numpy array and then use the similar numpy version, np.testing.assert_almost_equal? I think it might make sense to use this particular call elsewhere instead of converting to numpy arrays each time; especially since this PR is standardizing to pandas, it makes sense to test in that environment as well. It seems like this can also support the "almost equal" part of the numpy comparisons, since its documentation shows you can specify how close values need to be to count as equal.

Contributor Author (dsherry):

So, this test isn't doing any conversion to np array -- predict and predict_proba now always return pd dataframes. In light of that, we need to use pandas' comparison method, because np's method doesn't work with pd types. Try it out on the cmdline if you're curious.

Contributor:

Ah, sorry, my comment wasn't that clear. I think I was wondering why you do it this way here, but then convert to a numpy array and compare with the numpy method in other places, instead of doing it in this simpler way. I just left another comment on an example of using the numpy way (with the extra conversion) rather than the pandas way like you do here. I think the one here (this file, test_pipelines) makes more sense.

@@ -64,13 +64,13 @@ def test_catboost_objective_tuning(X_y):
# testing objective parameter passed in does not change results
objective = Precision()
y_pred_with_objective = clf.predict(X, objective)
np.testing.assert_almost_equal(y_pred, y_pred_with_objective, decimal=5)
np.testing.assert_almost_equal(y_pred.to_numpy(), y_pred_with_objective.to_numpy(), decimal=5)
@ctduffy (Contributor), Jun 18, 2020:

@dsherry here is an example of the place where you convert to numpy and then use the numpy function instead of just using the pandas assertion. (sorry probably should have commented on this one instead of the one with the pandas version of testing array equality)

@dsherry dsherry force-pushed the ds_236_standardize_component_output_type branch from fcfdf0e to e6199a7 on June 19, 2020 13:24
…unit test coverage for pd+np dtypes with objective scoring
@angela97lin (Contributor) left a comment:

Left a few small comments but otherwise LGTM!

Comment on lines +53 to +56
if isinstance(y_true, pd.Series):
y_true = y_true.to_numpy()
if isinstance(y_pred_proba, (pd.Series, pd.DataFrame)):
y_pred_proba = y_pred_proba.to_numpy()
Contributor:

This, along with the new static method you added in ObjectiveBase, makes me wonder if it'd be worth extracting into some util method but 🤷‍♀️

Contributor Author (dsherry):

I agree! It would definitely be worth it. In this PR I just wanted to get solid test coverage in place and apply minimal changes to get them passing. Once this is merged, yeah, we should get a better fix in place!
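
For what it's worth, the eventual shared helper could be as small as something like this (hypothetical name, not part of this PR):

import numpy as np
import pandas as pd


def _to_numpy(values):
    """Hypothetical util: pass numpy arrays through, convert pandas objects to numpy."""
    if isinstance(values, (pd.Series, pd.DataFrame)):
        return values.to_numpy()
    return np.asarray(values)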

@@ -238,7 +240,7 @@ class MockTransformerWithFitAndTransform(Transformer):
hyperparameter_ranges = {}

def fit(self, X, y=None):
return X
return self
Contributor:

👍


@dsherry dsherry merged commit da28481 into master Jun 19, 2020
@dsherry dsherry deleted the ds_236_standardize_component_output_type branch June 19, 2020 18:09
@angela97lin angela97lin mentioned this pull request Jun 30, 2020