
Remove calculating plot metrics from AutoML #615

Merged 6 commits from 607_remove_plot_metrics into improved_objectives on Apr 11, 2020

Conversation

angela97lin (Contributor):

As part of resolving #607, this removes plot-metric calculation from AutoML.

Note that this will currently break all calls to plotting methods!

@angela97lin angela97lin changed the base branch from master to improved_objectives April 10, 2020 17:23
codecov bot commented Apr 10, 2020

Codecov Report

Merging #615 into improved_objectives will increase coverage by 0.08%.
The diff coverage is 100.00%.


@@                   Coverage Diff                   @@
##           improved_objectives     #615      +/-   ##
=======================================================
+ Coverage                98.81%   98.89%   +0.08%     
=======================================================
  Files                      130      131       +1     
  Lines                     4713     4452     -261     
=======================================================
- Hits                      4657     4403     -254     
+ Misses                      56       49       -7     
Impacted Files Coverage Δ
evalml/automl/auto_base.py 96.02% <ø> (-0.02%) ⬇️
evalml/pipelines/binary_classification_pipeline.py 100.00% <ø> (ø)
evalml/pipelines/pipeline_base.py 98.22% <ø> (+1.48%) ⬆️
evalml/tests/automl_tests/test_autobase.py 100.00% <ø> (ø)
evalml/automl/pipeline_search_plots.py 100.00% <100.00%> (ø)
...l/tests/automl_tests/test_pipeline_search_plots.py 100.00% <100.00%> (+2.68%) ⬆️
evalml/tests/objective_tests/test_plot_metrics.py 100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c3b45dd...318d7f8.

@angela97lin angela97lin requested a review from dsherry April 10, 2020 18:31
@dsherry dsherry added the enhancement (An improvement to an existing feature.) and performance (Issues tracking performance improvements.) labels on Apr 10, 2020
import numpy as np  # imported at the top of the test module

fpr, tpr, thresholds = roc_metric.score(y_predict_proba, y_true)
assert not np.isnan(fpr).any()
assert not np.isnan(tpr).any()
assert not np.isnan(thresholds).any()
dsherry (Contributor):
Ah, this was to satisfy codecov?

There's a lot of negative testing we should do here: try passing in some nans/infs, all 0s/all 1s, probabilities outside the range 0 to 1, etc. To keep things moving, rather than doing it now, let's just include that in the issue we have to file to add ROC/confusion back in once this is on master.
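A minimal pytest sketch of the negative tests described above. This is an illustration only: `roc_metric` is assumed to be a fixture supplying the same plot-metric object used in the test, and the exact exception the metric should raise on bad input is an assumption to be settled when the follow-up issue is filed.

import numpy as np
import pytest

@pytest.mark.parametrize("y_predict_proba", [
    np.array([np.nan, 0.2, 0.8]),   # NaNs
    np.array([np.inf, 0.2, 0.8]),   # infs
    np.array([-0.5, 0.2, 1.5]),     # probabilities outside [0, 1]
])
def test_roc_rejects_invalid_probabilities(y_predict_proba, roc_metric):
    y_true = np.array([0, 1, 1])
    # placeholder exception type until the real behavior is decided
    with pytest.raises(Exception):
        roc_metric.score(y_predict_proba, y_true)

def test_roc_handles_constant_probabilities(roc_metric):
    # all 0s / all 1s are degenerate but arguably valid; mirror the
    # NaN-free assertions from the existing test
    y_true = np.array([0, 1, 1])
    for const in (0.0, 1.0):
        fpr, tpr, thresholds = roc_metric.score(np.full(3, const), y_true)
        assert not np.isnan(fpr).any()
        assert not np.isnan(tpr).any()
        assert not np.isnan(thresholds).any()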

angela97lin (Author):
Yup: because we removed the calls that calculate the confusion matrix and ROC, I added this test to satisfy codecov. I know we talked about adding accuracy as well, but I think we should better scope out exactly what type of test coverage we want to provide for these metrics (NaNs, inf, etc.) and then update that for all of our metrics?

dsherry (Contributor):
Yeah, for sure. We could even have a test which tries certain input on all classification objectives and other input on all regression objectives, and then does so with varieties of nans and infs and stuff.

However, ROC and confusion matrix aren't objectives; they're plot data util methods, so they don't need identical treatment and shouldn't be grouped in with the objectives.
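A rough sketch of the shared-test idea above. The `classification_objectives` fixture is hypothetical (evalml's real objective registry may differ), and whether each objective should raise or return on bad values is still to be decided; a matching test would cover the regression objectives.

import numpy as np
import pytest

@pytest.mark.parametrize("bad_value", [np.nan, np.inf, -np.inf])
def test_classification_objectives_reject_bad_values(bad_value, classification_objectives):
    y_true = np.array([0, 1, 1, 0])
    y_predicted = np.array([0.1, bad_value, 0.9, 0.4])
    for objective in classification_objectives:
        # placeholder expectation: each objective refuses NaN/inf input
        with pytest.raises(Exception):
            objective.score(y_predicted, y_true)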

angela97lin (Author):
True, the way I was thinking about it is so pre-improved-objectives :P

dsherry (Contributor) commented Apr 10, 2020

@angela97lin you missed one spot in the docs. From readthedocs:

WARNING: [autosummary] failed to import 'evalml.AutoClassificationSearch.plot.generate_confusion_matrix': no module named evalml.AutoClassificationSearch.plot.generate_confusion_matrix
WARNING: [autosummary] failed to import 'evalml.AutoClassificationSearch.plot.generate_roc_plot': no module named evalml.AutoClassificationSearch.plot.generate_roc_plot
WARNING: [autosummary] failed to import 'evalml.AutoClassificationSearch.plot.get_confusion_matrix_data': no module named evalml.AutoClassificationSearch.plot.get_confusion_matrix_data
WARNING: [autosummary] failed to import 'evalml.AutoClassificationSearch.plot.get_roc_data': no module named evalml.AutoClassificationSearch.plot.get_roc_data

git grep "generate_confusion_matrix" shows:

docs/source/api_reference.rst:58:    AutoClassificationSearch.plot.generate_confusion_matrix

We gotta find a way to have warnings fail the docs checkin test! (#586)
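One way to do that, sketched here as a unit test rather than a CI config change: run sphinx-build with -W, which promotes warnings to errors, and fail on a nonzero exit code. The docs/source and docs/build paths are assumptions about the repo layout.

import subprocess

def test_docs_build_without_warnings():
    # -W turns Sphinx warnings into errors; --keep-going reports all of
    # them before exiting nonzero instead of stopping at the first one
    result = subprocess.run(
        ["sphinx-build", "-W", "--keep-going", "docs/source", "docs/build"],
        capture_output=True,
        text=True,
    )
    assert result.returncode == 0, result.stderr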

dsherry (Contributor) left a comment:
Looks good! Once you fix the readthedocs warnings I just commented about, good to merge.

Please file an issue to add ROC and confusion matrix back in, so we can do that next week.

Here's how long the unit tests took to run on linux python 3.8 for each relevant branch:
master: 3m57s
improved_objectives: 7m22s 💣
this branch: 4m40s

Muuuch better! 😄

angela97lin (Author):
Wow, thanks for catching the warnings! Filed #619, #620, #621 to address each of your comments and merging now!

@angela97lin angela97lin merged commit a65ab45 into improved_objectives Apr 11, 2020
angela97lin added a commit that referenced this pull request Apr 13, 2020
* Objectives API: Create new binary / multiclass pipeline classes and remove objectives from pipeline classes (#405)

* Objectives API: Remove ROC and confusion matrix as objectives (#422)

* Change `score` output to return one dictionary (#429)

* Create binary and multiclass objective classes  (#504)

* Update dependencies  (#412)

* Hide features with zero importance in plot by default (#413)

* Update dependencies check: package whitelist (#417)

* Add fixes necessary for docs to build for improved objectives project (#605)

* Remove calculating plot metrics from AutoML  (#615)
@angela97lin angela97lin deleted the 607_remove_plot_metrics branch April 17, 2020 18:42