Clean up and add woodwork support for more methods #1544
Conversation
return conf_mat

def graph_confusion_matrix(y_true, y_pred, normalize_method='true', title_addition=None):
Just moved this up so it's spatially connected with the other confusion-matrix-related methods 😆
Codecov Report
@@ Coverage Diff @@
## main #1544 +/- ##
=========================================
+ Coverage 100.0% 100.0% +0.1%
=========================================
Files 234 234
Lines 16742 16773 +31
=========================================
+ Hits 16734 16765 +31
Misses 8 8
Continue to review full report at Codecov.
@@ -74,9 +74,21 @@
"from evalml.demos.churn import load_churn\n",
"from evalml.preprocessing import split_data\n",
"X, y = load_churn()\n",
"X = X.set_types({'PaymentMethod':'Categorical', 'Contract': 'Categorical'}) # Update data types Woodwork did not correctly infer\n",
Woodwork inferred these as natural language, which caused a text featurizer to be added and the runtime to increase significantly, so I'm manually updating the types here.
Nice! I don't think there are any other places left in the docs where we unintentionally use a text featurizer.
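The type override above can be illustrated with a plain-pandas analogue (a sketch only: the PR itself uses Woodwork's set_types, and the DataFrame below is hypothetical sample data, not the real churn demo):

```python
import pandas as pd

# Hypothetical churn-like frame; free-text-looking string columns like these
# are what Woodwork inferred as NaturalLanguage, triggering a text featurizer.
df = pd.DataFrame({
    "PaymentMethod": ["Electronic check", "Mailed check", "Electronic check"],
    "Contract": ["Month-to-month", "Two year", "One year"],
})

# Explicitly mark them categorical instead -- the pandas analogue of
# X.set_types({'PaymentMethod': 'Categorical', 'Contract': 'Categorical'}).
df = df.astype({"PaymentMethod": "category", "Contract": "category"})
print(df.dtypes["PaymentMethod"])  # category
```

The explicit override keeps the AutoML search from adding a text-featurization component for columns that are really low-cardinality categoricals.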
@@ -45,13 +45,8 @@ def confusion_matrix(y_true, y_predicted, normalize_method='true'):
"""
y_true = _convert_to_woodwork_structure(y_true)
y_predicted = _convert_to_woodwork_structure(y_predicted)
y_true = _convert_woodwork_types_wrapper(y_true.to_series())
y_predicted = _convert_woodwork_types_wrapper(y_predicted.to_series())
if isinstance(y_true, pd.Series):
Removed, since _convert_woodwork_types_wrapper will always return a pandas object, so there's no need for the if statement.
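The invariant behind removing that check can be sketched with a simplified, hypothetical analogue of the conversion helper (not evalml's actual implementation):

```python
import numpy as np
import pandas as pd

def to_pandas_series(y):
    """Coerce list/ndarray/Series input to a pandas Series.

    Simplified stand-in for evalml's conversion helpers: because the return
    type is always a pandas Series, downstream code never needs an
    isinstance(y, pd.Series) guard.
    """
    if isinstance(y, pd.Series):
        return y
    return pd.Series(y)
```

Centralizing the coercion in one helper is what lets the call sites drop their per-type branches.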
assert isinstance(conf_mat, pd.DataFrame)

conf_mat = confusion_matrix(y_true, y_predicted, normalize_method='pred')
conf_mat_expected = np.array([[2 / 3.0, np.nan, 0], [0, np.nan, 1 / 3.0], [1 / 3.0, np.nan, 2 / 3.0]])
Not sure why this wasn't caught, or whether it was expected behavior. Based on our code for confusion_matrix, if any of the axis sums during normalization is zero, we should raise an error (test_normalize_confusion_matrix_error). Updated the matrix values here so no error is thrown.
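The behavior described here can be sketched as a standalone helper (hypothetical names and error message; the real logic lives in evalml's confusion-matrix utilities):

```python
import numpy as np
import pandas as pd

def normalize_conf_mat_by_pred(conf_mat):
    """Normalize each column of a confusion matrix by its column sum,
    raising when any predicted-class column sums to zero -- a sketch of
    the check that test_normalize_confusion_matrix_error exercises."""
    col_sums = conf_mat.sum(axis=0)
    if (col_sums == 0).any():
        raise ValueError("Sum of given axis is 0 and normalization is not possible.")
    return conf_mat / col_sums

# Every column sums to 3, so normalization succeeds.
conf_mat = pd.DataFrame([[2, 0, 1], [0, 2, 1], [1, 1, 1]])
normalized = normalize_conf_mat_by_pred(conf_mat)
```

A matrix with an all-zero column (a class that was never predicted) would hit the ValueError instead of silently producing NaNs, which is the fix this comment thread is about.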
Ah, you're absolutely right: it's only numpy that displayed this warning 😢
@angela97lin Thanks! I think you found a bug in our implementation of normalize_confusion_matrix hehe
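The distinction behind this exchange can be demonstrated directly: dividing by a zero column sum triggers a RuntimeWarning in raw numpy, while pandas runs the same arithmetic inside an errstate-ignore context and produces NaN with no warning at all (a sketch of library behavior, not of evalml's code path):

```python
import warnings

import numpy as np
import pandas as pd

arr = np.array([[1.0, 0.0], [0.0, 0.0]])

# Raw numpy: 0/0 in the second column emits a RuntimeWarning
# ("invalid value encountered in divide") and produces NaN.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = arr / arr.sum(axis=0)
numpy_warned = any(issubclass(w.category, RuntimeWarning) for w in caught)

# pandas: the equivalent division is silent -- the NaN appears
# without any warning, so the zero-sum case goes unnoticed.
df = pd.DataFrame(arr)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    df_result = df / df.sum(axis=0)
pandas_warned = any(issubclass(w.category, RuntimeWarning) for w in caught)
```

This is why an explicit zero-sum check (rather than relying on a warning) is the safer behavior for the normalization code.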
dates (pd.Series): Dates corresponding to target values and predictions.
X (ww.DataTable, pd.DataFrame): Features used to generate new predictions.
y (ww.DataColumn, pd.Series): Target values to compare predictions against.
dates (ww.DataColumn, pd.Series): Dates corresponding to target values and predictions.

Returns:
I think we need to convert dates and y to pandas here. I don't think reset_index will work on a DataColumn.
@freddyaboulton Ah, you're right. Looks like there are no tests for this 🤔 I'll add something and check for ww!
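The fix under discussion can be sketched with a duck-typed guard. FakeDataColumn and _to_series here are hypothetical stand-ins, used only because ww.DataColumn exposes to_series() but no reset_index():

```python
import pandas as pd

class FakeDataColumn:
    """Hypothetical stand-in for ww.DataColumn: wraps a Series and, like
    the real class, exposes to_series() but not reset_index()."""
    def __init__(self, series):
        self._series = series

    def to_series(self):
        return self._series

def _to_series(data):
    """Coerce a DataColumn-like object to pandas; pass a Series through."""
    return data.to_series() if hasattr(data, "to_series") else data

dates = FakeDataColumn(
    pd.Series(pd.to_datetime(["2020-01-01", "2020-01-02"]), index=[5, 7]))
y = pd.Series([1.0, 2.0], index=[5, 7])

# Convert first, then reset: reset_index exists only on pandas objects.
dates = _to_series(dates).reset_index(drop=True)
y = _to_series(y).reset_index(drop=True)
```

Calling reset_index directly on the wrapped column would raise AttributeError, which is exactly the bug a ww-input test would catch.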
… 1292_utils_graphs_ww
Closes #1292