Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Redesign KNN classifier and regressor #39

Closed
TonyBagnall opened this issue Feb 8, 2023 · 1 comment · Fixed by #66
Closed

[ENH] Redesign KNN classifier and regressor #39

TonyBagnall opened this issue Feb 8, 2023 · 1 comment · Fixed by #66
Labels
classification Classification package enhancement New feature, improvement request or other non-bug code enhancement regression Regression package

Comments

@TonyBagnall
Copy link
Contributor

The KNN Regressor and Classifier were redesigned to replace the inheritance model with containment. The algorithms now always work out the pairwise distance for the train data, which is then never used, since the algorithms that can exploit the triangle inequality to find neighbours are only usable with hard coded scikit-learn distances, not our elastic distances. Any option other than algorithm="brute" breaks the current version. This was all done against my opinion, but I lost the energy to fight it.

They also tightly couple to distances with a string list, rather than use the distance factory as before. This is bad practice, since the addition of new distances will also require a change to the classifier/regressor. Better to use that.

I will dig a little deeper into the scikit algorithm, but my preference would be to abandon the scikit version all together and just implement knn.

@TonyBagnall TonyBagnall added enhancement New feature, improvement request or other non-bug code enhancement classification Classification package regression Regression package labels Feb 8, 2023
@TonyBagnall
Copy link
Contributor Author

just been looking, and one reason for our memory leak is that the current version always allocates and extra unused O(n^2) memory in fit, which is grossly inefficient and explains why we were running out of memory for larger problems

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
classification Classification package enhancement New feature, improvement request or other non-bug code enhancement regression Regression package
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant