Generalize KNNRegressor to multitarget case #328
Merged
Commits (23)
63004f7  ablaom        Merge pull request #313 from alan-turing-institute/dev
67e775a  ablaom        Merge pull request #316 from alan-turing-institute/dev
eb59c99  ablaom        Merge pull request #320 from alan-turing-institute/dev
e07b3d5  ablaom        Merge pull request #324 from alan-turing-institute/dev
b02cede  ablaom        Merge pull request #326 from alan-turing-institute/dev
a4c8610  mateuszbaran  support multivariate kNN regression
0b8c3bd  mateuszbaran  updated target of kNN regressor
865a352  mateuszbaran  changing target of KNNRegressor
3062a05  mateuszbaran  target of kNN regressor again
e52ea0a  mateuszbaran  trying to make the multi-target kNN regressor work with tables
a794ade  mateuszbaran  fixing kNN regressor
6f28040  mateuszbaran  code review fixes
9bc0dfc  ablaom        update model registry
16ac6bf  ablaom        update registry again
e7d5853  ablaom        fix check_registry issue
46fea19  OkonSamuel    Update NearestNeighbors.jl
d36948e  OkonSamuel    fix wrong call signature
5325d5b  OkonSamuel    Update NearestNeighbors.jl
a229988  OkonSamuel    replace `Tables.schema` with `MMI.schema`
a1a9ca2  OkonSamuel    Update NearestNeighbors.jl
ea2d9de  OkonSamuel    Update NearestNeighbors.jl
dedcaee  ablaom        Merge branch 'dev' of https://github.com/alan-turing-institute/MLJMod…
c89f593  ablaom        Merge branch 'dev' into multiple-regression-knn2
I'm having trouble understanding the formula on line 147 (which predates this PR). @tlienart, can you recall how this was deduced or where it came from?

Naively, it seems to me that we want to write the prediction as a weighted sum of the target values, where the kth weight is simultaneously proportional to the prescribed sample weight `w_[k]` and inversely proportional to the distance `dist_[k]`. That is, `prediction[i,:] = sum(w_ ./ dist_ .* values) / c`, where `c` is a normalisation constant chosen such that the weights (the coefficients of `values` in the sum) add up to one: `c = sum(w_ ./ dist_)`. However, I can't reconcile this with the given formula. Am I missing something?

cc @mateuszbaran @OkonSamuel
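For concreteness, the scheme described above can be sketched as follows. This is a hypothetical illustration, not the code in this PR; the function name and sample values are made up:

```julia
# Hypothetical sketch of the naive weighting described above: the kth
# coefficient is proportional to the sample weight w_[k] and inversely
# proportional to the distance dist_[k], normalised so that the
# coefficients of `values` sum to one.
function naive_weighted_prediction(w_, dist_, values)
    coeffs = w_ ./ dist_
    c = sum(coeffs)                  # normalisation constant
    return sum(coeffs .* values) / c
end

w_     = [1.0, 1.0, 2.0]    # prescribed sample weights
dist_  = [0.5, 1.0, 2.0]    # distances to the k nearest neighbours
values = [10.0, 20.0, 30.0] # neighbour target values
naive_weighted_prediction(w_, dist_, values)  # == 17.5
```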
Yeah, I had the same thoughts when I went through. I would propose avoiding passing weights to `fit`, i.e. setting the `supports_weights` trait to false, since the weights needed for kNN models are not per-observation weights but per-neighbor weights. So we should stick to using weights derived from the `weights` parameter passed during kNN model construction (i.e. `:uniform`, `:distance`).
I don't see anything wrong with mixing per-sample weights with an inverse square law for the "neighbour" weights (if it's done in a meaningful way). Also, presently, these two kNN models are among the few models that support sample weights and are therefore used for testing 🙌
The numerator in this formula does make sense to me: multiplying by `1 - dist[i]/sum(dist)` should behave better numerically when the distance to one of the neighbors is close to 0. The normalization constant in the denominator looks wrong, though. Here is a comparison of different distance-based weights: https://perun.pmf.uns.ac.rs/radovanovic/publications/2016-kais-knn-weighting.pdf
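The numerical point can be seen with a small made-up example (not code from this PR): with a neighbor at near-zero distance, `1/dist` blows up, while `1 - dist[i]/sum(dist)` keeps every weight bounded.

```julia
# Hypothetical illustration: compare inverse-distance weights with the
# bounded kernel when one neighbour is essentially at distance zero.
dist = [1e-12, 1.0, 2.0]
inverse = 1 ./ dist               # first entry is ~1e12 and dominates everything
bounded = 1 .- dist ./ sum(dist)  # every entry stays in [0, 1]
```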
@mateuszbaran Thanks for pointing out the paper. Worth noting that the evaluation there is for classifiers making point predictions (the MLJ classifier is probabilistic and so needs a normalisation not needed there), and that the testing was restricted to time-series applications. And the current PR is about regression. That said, the paper nicely summarises a number of weighting schemes that probably covers the cases in common use for both probabilistic classification and deterministic regression (the MLJ cases).

(Interestingly, I don't see the `1 - dist[i]/sum(dist)` weight in the paper, although maybe it's a special case of Macleod?)

It would be nice to implement them all and cite the paper for the definitions, but I would deem that out of scope for the current PR.

For the record, scikit-learn implements `1/dist[i]` (with no epsilon-smoothing) and uniform weighting, but also allows a custom function.

I propose we keep the current `1 - dist[i]/sum(dist)` weight (and the `1/dist[i]` weight currently used for the classifier) and do the normalisation post facto, as we do for classification. (I don't believe there's a more efficient way if we are mixing in sample weights.) We'll view this as a "bug fix" (patch release) and open a new issue to generalize and get consistency between regression and classification, which will be breaking. We can do that at the same time as migrating the interface to its own package (#244).

@OkonSamuel @mateuszbaran Happy with this?
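A minimal sketch of this proposal, under stated assumptions (the helper name is hypothetical, not the PR's actual code): keep the `1 - dist[i]/sum(dist)` neighbour kernel, fold in the per-sample weights, and normalise post facto so the final coefficients sum to one.

```julia
# Hypothetical helper: mix the bounded neighbour kernel with per-sample
# weights, then normalise after the fact.
function mixed_weights(dist, w)
    kernel = 1 .- dist ./ sum(dist)  # neighbour weights from distances
    mixed  = kernel .* w             # fold in the sample weights
    return mixed ./ sum(mixed)       # post-facto normalisation: sums to one
end

mixed_weights([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])  # == [5/12, 1/3, 1/4]
```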
@ablaom This can be merged for now. Further changes can be made when migrating to `MLJNearestNeighborsInterface.jl` (#331). There we may add more weighting schemes as kernels.
Okay, I'll just merge this as is, despite the mysterious normalization, and we'll fix that when we migrate.
Yes, that paper isn't about kNN regression, but it still nicely collects many weighting functions that can be used here as well.

Your plan sounds great 👍.
Thanks for your help with the PR; I realise it was a bit more involved than you probably thought it would be 😄
No problem! Also thank you for your quick help in getting this done. I learned quite a bit and I think it was worth it.