
Clean up and add woodwork support for more methods #1544

Merged
angela97lin merged 14 commits into main from 1292_utils_graphs_ww on Dec 14, 2020

Conversation

angela97lin (Contributor)

Closes #1292

@angela97lin angela97lin added this to the December 2020 milestone Dec 10, 2020
@angela97lin angela97lin self-assigned this Dec 10, 2020
return conf_mat


def graph_confusion_matrix(y_true, y_pred, normalize_method='true', title_addition=None):
angela97lin (Contributor, Author)

Just moved this up, to be spatially connected with the other confusion matrix-related methods 😆

codecov bot commented Dec 10, 2020

Codecov Report

Merging #1544 (a37ab4f) into main (0f1f1c6) will increase coverage by 0.1%.
The diff coverage is 100.0%.


@@            Coverage Diff            @@
##             main    #1544     +/-   ##
=========================================
+ Coverage   100.0%   100.0%   +0.1%     
=========================================
  Files         234      234             
  Lines       16742    16773     +31     
=========================================
+ Hits        16734    16765     +31     
  Misses          8        8             
Impacted Files Coverage Δ
evalml/objectives/cost_benefit_matrix.py 100.0% <ø> (ø)
evalml/objectives/fraud_cost.py 100.0% <ø> (ø)
evalml/objectives/lead_scoring.py 100.0% <ø> (ø)
evalml/tests/conftest.py 100.0% <ø> (ø)
evalml/model_understanding/graphs.py 99.7% <100.0%> (+0.1%) ⬆️
...lml/tests/model_understanding_tests/test_graphs.py 100.0% <100.0%> (ø)
.../tests/objective_tests/test_cost_benefit_matrix.py 100.0% <100.0%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0f1f1c6...a37ab4f.

@angela97lin angela97lin marked this pull request as ready for review December 11, 2020 16:42
@@ -74,9 +74,21 @@
"from evalml.demos.churn import load_churn\n",
"from evalml.preprocessing import split_data\n",
"X, y = load_churn()\n",
"X = X.set_types({'PaymentMethod':'Categorical', 'Contract': 'Categorical'}) # Update data types Woodwork did not correctly infer\n",
angela97lin (Contributor, Author)
Woodwork inferred these columns as natural language, which caused a text featurizer to be added and significantly increased the runtime, so I'm manually updating the types here.

freddyaboulton (Contributor)
Nice! I don't think there are any other places left in the docs where we unintentionally use a text featurizer.
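For context, a minimal sketch of the type-override pattern the diff above uses. The .types inspection step is an assumption about the Woodwork DataTable API of this era, used only to illustrate the inference step; the diff itself only shows the override:

```python
from evalml.demos.churn import load_churn

X, y = load_churn()  # X is a Woodwork DataTable

# Inspect the inferred types; the .types property is an assumption about
# the DataTable API here, not something shown in the diff above.
print(X.types)

# Override the two columns Woodwork inferred as natural language, so a
# text featurizer is not added to the pipeline (as in the diff above).
X = X.set_types({'PaymentMethod': 'Categorical', 'Contract': 'Categorical'})
```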

@@ -45,13 +45,8 @@ def confusion_matrix(y_true, y_predicted, normalize_method='true'):
"""
y_true = _convert_to_woodwork_structure(y_true)
y_predicted = _convert_to_woodwork_structure(y_predicted)
y_true = _convert_woodwork_types_wrapper(y_true.to_series())
y_predicted = _convert_woodwork_types_wrapper(y_predicted.to_series())
if isinstance(y_true, pd.Series):
angela97lin (Contributor, Author)
Removed this, since _convert_woodwork_types_wrapper always returns a pandas object, so the isinstance check is unnecessary.
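A minimal sketch of the invariant behind this removal; the helper import path is an assumption about where these private utilities lived in evalml at the time:

```python
import pandas as pd
# Assumed location of the private helpers shown in the diff above:
from evalml.utils.gen_utils import (
    _convert_to_woodwork_structure,
    _convert_woodwork_types_wrapper,
)

y = [0, 1, 1, 0]  # could be a list, np.ndarray, pd.Series, or ww.DataColumn
y = _convert_to_woodwork_structure(y)                # -> ww.DataColumn
y = _convert_woodwork_types_wrapper(y.to_series())   # -> always pd.Series

# The wrapper always hands back pandas, so the removed
# isinstance(y, pd.Series) branch could never be False.
assert isinstance(y, pd.Series)
```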

assert isinstance(conf_mat, pd.DataFrame)

conf_mat = confusion_matrix(y_true, y_predicted, normalize_method='pred')
conf_mat_expected = np.array([[2 / 3.0, np.nan, 0], [0, np.nan, 1 / 3.0], [1 / 3.0, np.nan, 2 / 3.0]])
angela97lin (Contributor, Author)
Not sure why this wasn't caught, or whether this was expected behavior. Based on our code for confusion_matrix, if any of the axis sums computed while normalizing is zero, we should raise an error (test_normalize_confusion_matrix_error). I updated the matrix values here so no error is thrown.

freddyaboulton (Contributor) commented Dec 11, 2020

There's a bug in the implementation/test 😬 The warning is only raised if the input to normalize_confusion_matrix is a numpy array, but in confusion_matrix we always convert to a dataframe. This means there is no warning to catch, so we never raise the ValueError.


angela97lin (Contributor, Author)

Ah, you're absolutely right: it's only numpy that displays this warning 😢
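A standalone sketch (plain numpy/pandas, not evalml code) of the mismatch described above: numpy warns on 0/0 during normalization, while pandas arithmetic suppresses numpy's floating-point warnings, so warning-based handling never fires once the matrix is a DataFrame:

```python
import warnings
import numpy as np
import pandas as pd

# Confusion matrix whose first column sums to zero, as when a class is
# never predicted and we normalize with normalize_method='pred'.
arr = np.array([[0.0, 1.0],
                [0.0, 1.0]])
df = pd.DataFrame(arr)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    arr / arr.sum(axis=0)       # numpy: 0/0 -> RuntimeWarning
    print(len(caught))          # 1: "invalid value encountered"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    df / df.sum(axis=0)         # pandas: 0/0 -> NaN, silently
    print(len(caught))          # 0: nothing to turn into a ValueError
```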

freddyaboulton (Contributor) left a comment

@angela97lin Thanks! I think you found a bug in our implementation of normalize_confusion_matrix hehe

dates (pd.Series): Dates corresponding to target values and predictions.
X (ww.DataTable, pd.DataFrame): Features used to generate new predictions.
y (ww.DataColumn, pd.Series): Target values to compare predictions against.
dates (ww.DataColumn, pd.Series): Dates corresponding to target values and predictions.

Returns:
freddyaboulton (Contributor)
I think we need to convert dates and y to pandas here. I don't think reset_index will work on a DataColumn.

angela97lin (Contributor, Author)
@freddyaboulton Ah, you're right. Looks like there are no tests for this 🤔 I'll add something and check for ww!
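A hedged sketch of the conversion being discussed; the helper import path is assumed as above, and the point is that a ww.DataColumn does not expose pandas methods such as reset_index:

```python
import pandas as pd
import woodwork as ww
# Assumed location of the private helpers used elsewhere in this PR:
from evalml.utils.gen_utils import (
    _convert_to_woodwork_structure,
    _convert_woodwork_types_wrapper,
)

def _to_pandas_series(data):
    """Accept a ww.DataColumn or pd.Series; return a plain pd.Series."""
    data = _convert_to_woodwork_structure(data)
    return _convert_woodwork_types_wrapper(data.to_series())

dates = ww.DataColumn(pd.Series(pd.date_range("2020-01-01", periods=3)))
# dates.reset_index(drop=True) would fail here: DataColumn wraps a series
# and is not a pandas object. Convert first, then reset the index:
dates = _to_pandas_series(dates).reset_index(drop=True)
```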


@angela97lin angela97lin merged commit 3698425 into main Dec 14, 2020
@angela97lin angela97lin deleted the 1292_utils_graphs_ww branch December 14, 2020 21:57
@dsherry dsherry mentioned this pull request Dec 29, 2020
Successfully merging this pull request may close these issues:
#1292: Update all graph methods, util methods which accept data, to support DataTables