Add normalization option for confusion matrix #484
Note: sklearn adds a normalization parameter in versions >0.22(.2?), but since our current dependencies allow for anything greater than 0.21.3 and since it was easier to just add our own normalization calculations rather than changing how
Codecov Report
@@ Coverage Diff @@
## master #484 +/- ##
==========================================
+ Coverage 98.26% 98.26% +<.01%
==========================================
Files 104 104
Lines 3292 3352 +60
==========================================
+ Hits 3235 3294 +59
- Misses 57 58 +1
overall, the output looks good to me. some comments on implementation
A normalized version of the input confusion matrix.

"""
return conf_mat.astype('float') / conf_mat.sum(axis=1)[:, np.newaxis]
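For readers following along, here is a minimal sketch of what the row-wise division in the line above does on a plain NumPy array. The 2x2 matrix is made-up illustration data, not output from this PR, and the variable names are only for the example.

```python
import numpy as np

# Hypothetical 2x2 confusion matrix (rows = true labels, columns = predictions).
conf_mat = np.array([[3, 1],
                     [2, 6]])

row_sums = conf_mat.sum(axis=1)  # shape (2,): one total per true label

# [:, np.newaxis] reshapes the sums to (2, 1), so each row is divided by its own total.
normalized = conf_mat.astype("float") / row_sums[:, np.newaxis]

# Without the reshape, broadcasting divides column j by row_sums[j] instead.
without_newaxis = conf_mat.astype("float") / row_sums

print(normalized)
print(without_newaxis)
```

With the reshape, every row of the result sums to 1; without it, the rows generally do not.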
hmm. i see. looks like that's needed for supporting numpy arrays, but perhaps not needed for dataframes (my example).
i'm fine leaving it in to be safe, but perhaps leave a comment? This stack overflow resource was helpful for me: https://stackoverflow.com/questions/8904694/how-to-normalize-a-2-dimensional-numpy-array-in-python-less-verbose
Oo, nice resource! Just took a closer look at the example you posted but it seems like even with dataframes, it makes a difference? The columns are flipped but the numbers outputted are also different 🤔
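A small sketch of why the numbers can differ for dataframes too, assuming default integer row and column labels and made-up data. Dividing a DataFrame by a Series aligns the Series index with the DataFrame's columns, which is not row normalization. The sketch uses `DataFrame.div(..., axis=0)` and `.to_numpy()` as stand-ins for the expression in the PR, since indexing a Series with `[:, np.newaxis]` may not be supported in newer pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical confusion matrix with default integer labels.
df = pd.DataFrame([[2, 0],
                   [1, 3]])
row_sums = df.sum(axis=1)  # Series indexed by row label, values [2, 4]

# Plain division matches the Series index against the DataFrame's *columns*,
# so column j ends up divided by row_sums[j].
aligned = df / row_sums

# Dividing along axis=0 matches on row labels instead: each row by its own total.
by_rows = df.div(row_sums, axis=0)

# The same row normalization done on the underlying NumPy arrays.
by_newaxis = df.to_numpy().astype("float") / row_sums.to_numpy()[:, np.newaxis]

print(aligned)
print(by_rows)
```

Here `aligned` and `by_rows` disagree in the second row, which is consistent with the observation that the outputs differ even for dataframes.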
looking good. just some minor comments
evalml/utils/gen_utils.py
Outdated
"""
column_names = None
if isinstance(conf_mat, pd.DataFrame):
do you do this because of the call to np.nan_to_num(conf_mat) later? if so, I'd just end the function with
    if isinstance(conf_mat, pd.DataFrame):
        conf_mat = conf_mat.fillna(0)
    else:
        conf_mat = np.nan_to_num(conf_mat)
this would be cleaner than storing the column names
Ah, you're right. Thanks for the suggestion, updated!
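For context, a rough sketch of how a function could look with the NaN handling split by input type as suggested above. This is not the merged evalml code: the function name and body are assumptions for illustration, and it replaces NaN with 0 in both branches to match what `np.nan_to_num` does.

```python
import numpy as np
import pandas as pd


def normalize_confusion_matrix_sketch(conf_mat):
    """Row-normalize a confusion matrix (hypothetical helper, not evalml's API).

    Rows that sum to zero produce 0/0 = NaN, which is replaced with 0.
    """
    if isinstance(conf_mat, pd.DataFrame):
        # div(axis=0) divides each row by its own total and keeps the labels.
        normalized = conf_mat.astype("float").div(conf_mat.sum(axis=1), axis=0)
        return normalized.fillna(0)
    conf_mat = np.asarray(conf_mat, dtype="float")
    # Silence the divide warnings NumPy emits for all-zero rows.
    with np.errstate(divide="ignore", invalid="ignore"):
        normalized = conf_mat / conf_mat.sum(axis=1)[:, np.newaxis]
    return np.nan_to_num(normalized)


# Made-up inputs: the second row is all zeros to exercise the NaN path.
arr_result = normalize_confusion_matrix_sketch(np.array([[2, 2], [0, 0]]))
df_result = normalize_confusion_matrix_sketch(
    pd.DataFrame([[2, 2], [0, 0]], index=["a", "b"], columns=["a", "b"])
)
print(arr_result)
print(df_result)
```

Because the DataFrame branch never converts to a NumPy array, the row and column labels survive without being stored separately.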
Looks great!
Closes #432.
Without normalization:
With normalization:
===
Since the coloration is the same in the two plots above, I plugged in some hard-coded data to show how the colors differ between the normalized and non-normalized versions:
Without normalization:
With normalization: