
Add precision/recall curve #794

Merged
merged 13 commits into from May 28, 2020

Conversation

@eccabay (Contributor) commented May 21, 2020

Closes #792

codecov bot commented May 22, 2020

Codecov Report

Merging #794 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #794   +/-   ##
=======================================
  Coverage   99.52%   99.52%           
=======================================
  Files         159      159           
  Lines        6257     6306   +49     
=======================================
+ Hits         6227     6276   +49     
  Misses         30       30           
Impacted Files                                      Coverage Δ
evalml/pipelines/__init__.py                        100.00% <ø> (ø)
evalml/pipelines/graph_utils.py                     100.00% <100.00%> (ø)
evalml/tests/pipeline_tests/test_graph_utils.py     100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 68b9ef4...9b16c7d.

@eccabay eccabay marked this pull request as ready for review May 22, 2020
@eccabay eccabay requested review from dsherry and angela97lin May 22, 2020
@angela97lin (Contributor) left a comment

This is really neat! :D

I took a quick glance and this is looking pretty good. Would you be able to add an example to the documentation? We currently have the other graph utils listed under search_results.ipynb so that'd probably be a good place (https://evalml.featurelabs.com/en/latest/automl/search_results.html)

@angela97lin (Contributor) left a comment

Just left a comment on the docs formatting but otherwise, LGTM! Great work :D

"source": [
"## Precision-Recall Curve\n",
"\n",
"For binary classification, you can view the precision-recall curve of a classifier"
@angela97lin (Contributor) commented May 27, 2020

👍

dict: Dictionary containing metrics used to generate a precision-recall plot, with the following keys:
* `precision`: Precision values.
* `recall`: Recall values.
* `thresholds`: Threshold values used to produce the precision and recall.
* `auc_score`: The area under the precision-recall curve.
"""
precision, recall, thresholds = sklearn_precision_recall_curve(y_true, y_pred_proba)
auc_score = sklearn_auc(recall, precision)
return {'precision': precision,
        'recall': recall,
        'thresholds': thresholds,
        'auc_score': auc_score}
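For context, a minimal usage sketch of the same computation, calling scikit-learn directly (the wrapped calls are `sklearn.metrics.precision_recall_curve` and `sklearn.metrics.auc`; the toy labels and scores below are invented for illustration):

```python
# Hedged sketch: build the same dictionary by calling scikit-learn directly.
# The labels/scores are made-up toy data, not from the PR.
from sklearn.metrics import auc, precision_recall_curve

y_true = [0, 0, 1, 1]
y_pred_proba = [0.1, 0.4, 0.35, 0.8]

precision, recall, thresholds = precision_recall_curve(y_true, y_pred_proba)
pr_data = {
    'precision': precision,
    'recall': recall,
    'thresholds': thresholds,
    # auc(recall, precision) is the area under the precision-recall curve,
    # not the ROC curve.
    'auc_score': auc(recall, precision),
}
print(round(pr_data['auc_score'], 4))  # about 0.79 for this toy data
```

Note the argument order: `auc(x, y)` expects the x-axis values (recall) first, so swapping the arguments would silently compute the wrong area.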
@angela97lin (Contributor) commented May 27, 2020
This looks a little bit funky in the docs (just the line for "Dictionary containing metrics used to generate a precision-recall plot, with the following keys") being bolded the same way "Returns"/other headers are. I think maybe using - for bullets might fix this but not sure off the top of my head?

[screenshot: the docs rendering, with the dictionary description line bolded like a section header]

@eccabay (Contributor, Author) commented May 27, 2020
@angela97lin is there a spot in the docs I can use as a model? I can't seem to find anywhere that looks the way you describe; the other functions in graph_utils.py look the same to me.

@angela97lin (Contributor) commented May 27, 2020
Hmmm, not sure if there's an example in the docs currently. Let me try to create one, but otherwise this isn't a big deal at all 😆

@angela97lin (Contributor) commented May 27, 2020
Here, I just recreated it in a branch that I'm working on: https://evalml.featurelabs.com/en/710_target_check/generated/methods/evalml.data_checks.DetectInvalidTargetsDataCheck.validate.html#evalml.data_checks.DetectInvalidTargetsDataCheck.validate

(Adding a screenshot in case this disappears when I update my branch)
[screenshot of the rendered Returns section with the bullet list displayed correctly]

Here's the docstring I used to create this:

        Returns:
            list (DataCheckError): list with DataCheckErrors if any invalid data is found in target labels.

                - abc
                - def

I think the newline between the paragraph and the bullet list is important to it rendering correctly!
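Applied to the precision-recall function discussed above, that advice might look like the following sketch (the function name is a placeholder; the point is only the blank line before the bullet list in the Returns section):

```python
def example_docstring():
    # Placeholder function; only the docstring formatting is the point here.
    """Sketch of the suggested Returns formatting.

    Returns:
        dict: Dictionary containing metrics used to generate a precision-recall plot,
        with the following keys:

            - precision: Precision values.
            - recall: Recall values.
            - thresholds: Threshold values used to produce the precision and recall.
            - auc_score: The area under the precision-recall curve.
    """
```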

@eccabay (Contributor, Author) commented May 28, 2020
That works perfectly, thanks!

@eccabay eccabay merged commit fdae537 into master May 28, 2020
2 checks passed
@dsherry dsherry deleted the 792_precision_recall_curve branch Oct 29, 2020
Successfully merging this pull request may close these issues:

Add precision/recall curve for binary classification