
Initial KNN classifier add. #1650

Merged
merged 29 commits into main on Jan 19, 2021

Conversation

chukarsten
Collaborator

Pull Request Description

First crack at adding a KNN classifier (and eventually a regressor) to the baseline.


After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of docs/source/release_notes.rst to include this pull request by adding :pr:`123`.
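The merged component code itself isn't shown in this thread. As a rough sketch, a KNN component in the evalml style typically wraps scikit-learn's estimator; the class layout, `name` string, and parameters below are illustrative, not the actual merged implementation:

```python
# Illustrative sketch only -- not the merged evalml implementation.
from sklearn.neighbors import KNeighborsClassifier as SKKNeighborsClassifier


class KNeighborsClassifier:
    """Hypothetical wrapper around scikit-learn's KNeighborsClassifier."""

    name = "KNN Classifier"  # display name under discussion in this PR

    def __init__(self, n_neighbors=5, weights="uniform"):
        self.parameters = {"n_neighbors": n_neighbors, "weights": weights}
        self._component_obj = SKKNeighborsClassifier(
            n_neighbors=n_neighbors, weights=weights)

    def fit(self, X, y):
        self._component_obj.fit(X, y)
        return self

    def predict(self, X):
        return self._component_obj.predict(X)

    def predict_proba(self, X):
        return self._component_obj.predict_proba(X)
```

The wrapper delegates everything to the underlying scikit-learn object, which is the pattern evalml components generally follow.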

@CLAassistant

CLAassistant commented Jan 6, 2021

CLA assistant check
All committers have signed the CLA.

@codecov

codecov bot commented Jan 6, 2021

Codecov Report

Merging #1650 (5003d25) into main (f727942) will increase coverage by 0.1%.
The diff coverage is 100.0%.

Impacted file tree graph

@@            Coverage Diff            @@
##             main    #1650     +/-   ##
=========================================
+ Coverage   100.0%   100.0%   +0.1%     
=========================================
  Files         240      242      +2     
  Lines       18853    18942     +89     
=========================================
+ Hits        18845    18934     +89     
  Misses          8        8             
Impacted Files Coverage Δ
evalml/pipelines/__init__.py 100.0% <ø> (ø)
evalml/pipelines/components/__init__.py 100.0% <ø> (ø)
evalml/pipelines/components/estimators/__init__.py 100.0% <ø> (ø)
...valml/pipelines/components/estimators/estimator.py 100.0% <ø> (ø)
...alml/tests/model_family_tests/test_model_family.py 100.0% <ø> (ø)
evalml/utils/gen_utils.py 100.0% <ø> (ø)
evalml/model_family/model_family.py 100.0% <100.0%> (ø)
...ines/components/estimators/classifiers/__init__.py 100.0% <100.0%> (ø)
...ts/estimators/classifiers/kneighbors_classifier.py 100.0% <100.0%> (ø)
evalml/tests/automl_tests/test_automl.py 100.0% <100.0%> (ø)
... and 4 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f727942...5003d25. Read the comment docs.

@chukarsten chukarsten self-assigned this Jan 8, 2021
@chukarsten chukarsten marked this pull request as ready for review January 13, 2021 21:06
Contributor

@freddyaboulton freddyaboulton left a comment


@chukarsten This looks good to me! I want to resolve the discussion on prediction explanations before merging though.

Also, we typically exclude estimators from AutoMLSearch until we run the performance tests on them. We do this by adding the class name to _not_used_in_automl in gen_utils.py. I think we should follow the same pattern here but what do you think @dsherry ?
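For reference, the exclusion pattern described here can be sketched as a module-level collection checked when assembling the AutoML estimator list; the names and the filtering helper below are illustrative, not evalml's actual gen_utils.py:

```python
# Illustrative sketch of the _not_used_in_automl pattern -- not evalml's actual code.
_not_used_in_automl = {"BaselineClassifier", "KNeighborsClassifier"}


def allowed_estimators(estimator_classes):
    """Drop estimators excluded from AutoMLSearch by class name."""
    return [cls for cls in estimator_classes
            if cls.__name__ not in _not_used_in_automl]


class KNeighborsClassifier:
    pass


class RandomForestClassifier:
    pass
```

Adding a new estimator's class name to the set keeps it importable and usable in hand-built pipelines while leaving it out of automated search until performance testing is done.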

X, y = X_y_binary
msg = 'all your model are belong to us'
mock_fit.side_effect = Exception(msg)
automl = AutoMLSearch(X_train=X, y_train=y, problem_type="binary", error_callback=None, train_best_pipeline=False, n_jobs=1)
automl.search()
assert msg in caplog.text


@patch('evalml.pipelines.BinaryClassificationPipeline.score', return_value={"Log Loss Binary": 0.8})
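The key mechanics of the test snippet above are `side_effect` (the patched `fit` raises) and then asserting the message was surfaced. A self-contained sketch of that pattern, with the evalml-specific AutoMLSearch call replaced by a toy search function:

```python
# Self-contained sketch of the side_effect pattern; the real test patches
# evalml pipeline fit methods and asserts on caplog instead of a return value.
from unittest.mock import patch


class Pipeline:
    def fit(self, X, y):
        return self


def run_search(pipeline, X, y):
    """Toy stand-in for AutoMLSearch with error_callback=None:
    swallow the error and report its message."""
    try:
        pipeline.fit(X, y)
    except Exception as exc:
        return str(exc)
    return "ok"


msg = "all your model are belong to us"
with patch.object(Pipeline, "fit", side_effect=Exception(msg)):
    result = run_search(Pipeline(), None, None)
```

Inside the `with` block every call to `Pipeline.fit` raises the injected exception; outside it, the original method is restored.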
Contributor


How come splitting this test into many tests fixes the crashing worker problem?

Contributor


The dreaded Windows test bug D:

Collaborator Author


That I'm not sure about; we can't say conclusively that it does. It fixed it this time, but as @bchen1116 knows, sometimes just modifying something seems to affect whether the crashing worker appears. We were discussing running a nightly test of main to see whether main, outside of merge-triggered test runs, exhibits this crash going forward. Really, the question is whether we accept the additional lines from splitting the test in the hope of fixing the bug, or not.

Contributor

@bchen1116 bchen1116 left a comment


Left a few comments. Don't forget to add the KNN classifier to the api_references.rst file and make sure the corresponding docs are updated to include the classifier.

Otherwise, agree with @freddyaboulton that this should be excluded from AutoMLSearch until perf tests are run.

docs/source/release_notes.rst (comment outdated; resolved)
X, y = X_y_binary
msg = 'all your model are belong to us'
mock_fit.side_effect = Exception(msg)
automl = AutoMLSearch(X_train=X, y_train=y, problem_type="binary", error_callback=None, train_best_pipeline=False, n_jobs=1)
automl.search()
assert msg in caplog.text


@patch('evalml.pipelines.BinaryClassificationPipeline.score', return_value={"Log Loss Binary": 0.8})
Contributor


The dreaded Windows test bug D:

Contributor

@angela97lin angela97lin left a comment


Agreed with what's already been said about perf testing and adding to AutoML, but otherwise just left some comments for cleanup! 😁

Contributor

@bchen1116 bchen1116 left a comment


Left one note, but looks good to me!

One thing I want to bring to attention is the possibility of standardizing the names. I brought up the idea of changing the string name to KNN Classifier, but I see the value of keeping the naming in line with how scikit-learn names it, which is KNeighborsClassifier. I think we should make these names consistent, as we do for all other components (e.g. class LogisticRegressionClassifier with name Logistic Regression Classifier). On that note, I think it might make sense to change the name to K Neighbors Classifier (no dash), but let me know what you think!
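The convention described here (class `LogisticRegressionClassifier`, name `Logistic Regression Classifier`) amounts to splitting the CamelCase class name on capital letters, under which `KNeighborsClassifier` does map to `K Neighbors Classifier`. A quick illustrative helper showing that mapping; evalml actually stores each component's name explicitly rather than deriving it:

```python
import re


def class_to_display_name(class_name):
    """Insert a space before each capital letter after the first.

    Illustrative only -- shows the naming convention, not evalml code.
    """
    return re.sub(r"(?<!^)(?=[A-Z])", " ", class_name)
```

This is purely a sketch of the convention being discussed, not a proposal to generate names automatically.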

@chukarsten chukarsten merged commit 7fc9783 into main Jan 19, 2021
@chukarsten chukarsten deleted the 1640-knn_classifier branch January 19, 2021 18:21
@bchen1116 bchen1116 mentioned this pull request Jan 26, 2021