@chukarsten
Contributor

Pull Request Description

First crack at adding KNN classifier(/regressor) to the baseline.


After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of docs/source/release_notes.rst to include this pull request by adding :pr:123.

@alteryx alteryx deleted a comment from CLAassistant Jan 6, 2021
@chukarsten chukarsten force-pushed the 1640-knn_classifier branch from 4d8deac to cdefc9f Compare January 6, 2021 20:40
@CLAassistant
CLAassistant commented Jan 6, 2021

CLA assistant check
All committers have signed the CLA.

@codecov
codecov bot commented Jan 6, 2021

Codecov Report

Merging #1650 (5003d25) into main (f727942) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@            Coverage Diff            @@
##             main    #1650     +/-   ##
=========================================
+ Coverage   100.0%   100.0%   +0.1%     
=========================================
  Files         240      242      +2     
  Lines       18853    18942     +89     
=========================================
+ Hits        18845    18934     +89     
  Misses          8        8             
| Impacted Files | Coverage | Δ |
|---|---|---|
| evalml/pipelines/__init__.py | 100.0% <ø> | (ø) |
| evalml/pipelines/components/__init__.py | 100.0% <ø> | (ø) |
| evalml/pipelines/components/estimators/__init__.py | 100.0% <ø> | (ø) |
| ...valml/pipelines/components/estimators/estimator.py | 100.0% <ø> | (ø) |
| ...alml/tests/model_family_tests/test_model_family.py | 100.0% <ø> | (ø) |
| evalml/utils/gen_utils.py | 100.0% <ø> | (ø) |
| evalml/model_family/model_family.py | 100.0% <100.0%> | (ø) |
| ...ines/components/estimators/classifiers/__init__.py | 100.0% <100.0%> | (ø) |
| ...ts/estimators/classifiers/kneighbors_classifier.py | 100.0% <100.0%> | (ø) |
| evalml/tests/automl_tests/test_automl.py | 100.0% <100.0%> | (ø) |
... and 4 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f727942...5003d25.

@chukarsten chukarsten self-assigned this Jan 8, 2021
@chukarsten chukarsten force-pushed the 1640-knn_classifier branch from bc458cd to 7f3b1c3 Compare January 8, 2021 17:44
@chukarsten chukarsten marked this pull request as ready for review January 13, 2021 21:06
Contributor

@freddyaboulton freddyaboulton left a comment

@chukarsten This looks good to me! I want to resolve the discussion on prediction explanations before merging though.

Also, we typically exclude estimators from AutoMLSearch until we run the performance tests on them. We do this by adding the class name to _not_used_in_automl in gen_utils.py. I think we should follow the same pattern here, but what do you think, @dsherry?
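
For readers unfamiliar with the exclusion pattern described above, here is a minimal sketch of the idea. It assumes nothing about evalml's real implementation: the two classes, `ALL_ESTIMATORS`, and `estimators_for_automl` are hypothetical stand-ins, and only the name `_not_used_in_automl` is taken from the comment.

```python
# Hypothetical stand-ins; these are not evalml's real classes or helpers.
class LogisticRegressionClassifier:
    pass


class KNeighborsClassifier:
    pass


ALL_ESTIMATORS = [LogisticRegressionClassifier, KNeighborsClassifier]

# Class names withheld from automated search until performance tests have run.
# Only the variable name comes from the review comment; the contents are assumed.
_not_used_in_automl = {"KNeighborsClassifier"}


def estimators_for_automl(estimators, excluded=_not_used_in_automl):
    """Return the estimators whose class name is not on the exclusion list."""
    return [est for est in estimators if est.__name__ not in excluded]


allowed = estimators_for_automl(ALL_ESTIMATORS)
print([est.__name__ for est in allowed])
```

Under this sketch, admitting the new estimator to the search later would only require removing its class name from the exclusion set.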

assert msg in caplog.text


@patch('evalml.pipelines.BinaryClassificationPipeline.score', return_value={"Log Loss Binary": 0.8})
Contributor

How come splitting this test into many tests fixes the crashing worker problem?

Contributor

The dreaded Windows test bug D:

Contributor Author

That I'm not sure about. We can't say conclusively that it does. It fixed it "this time", but as @bchen1116 knows, sometimes just modifying something seems to affect whether the crashing worker shows up. We were discussing a nightly test run of main to see whether main, outside of the test runs triggered by merges, exhibits this crash going forward. Really, I guess the question is whether or not we accept the additional lines from splitting the test in the hope of fixing the bug going forward.

Contributor

@bchen1116 bchen1116 left a comment

Left a few comments. Don't forget to add the KNN classifier to the api_references.rst file and make sure the corresponding docs are updated to include the classifier.

Otherwise, agree with @freddyaboulton that this should be excluded from AutoMLSearch until perf tests are run.

Contributor

@angela97lin angela97lin left a comment

Agreed with what's already been said about perf testing and adding to AutoML, but otherwise just left some comments for cleanup! 😁

Contributor

@bchen1116 bchen1116 left a comment

Left one note, but looks good to me!

One thing I want to bring up is the possibility of standardizing the names. I suggested changing the string name to KNN Classifier, but I see the value of keeping the naming in line with how SKLearn names it, which is KNeighborsClassifier. I think we should make the class name and the string name match, like we do for all other components (i.e., class LogisticRegressionClassifier with name Logistic Regression Classifier). On that note, I think it might make sense to change the name to K Neighbors Classifier (no dash), but let me know what you think!
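
To make the naming convention concrete, here is a rough sketch of deriving the string name from the class name by splitting on word boundaries. The helper `display_name` is hypothetical and is not something evalml provides; it only illustrates why K Neighbors Classifier is the name that lines up with the class name KNeighborsClassifier.

```python
import re

# Insert a space where a lowercase letter meets an uppercase one ("sC"),
# or where an uppercase letter is followed by a capitalized word ("KNe").
_WORD_BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def display_name(class_name):
    """Hypothetical helper: turn a CamelCase class name into a spaced name."""
    return _WORD_BOUNDARY.sub(" ", class_name)


print(display_name("LogisticRegressionClassifier"))
print(display_name("KNeighborsClassifier"))
```

A check like this could also live in a unit test that compares each component's class name against its string name, assuming the project wants the two kept in sync.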

@chukarsten chukarsten merged commit 7fc9783 into main Jan 19, 2021
@chukarsten chukarsten deleted the 1640-knn_classifier branch January 19, 2021 18:21
@bchen1116 bchen1116 mentioned this pull request Jan 26, 2021
6 participants