
Generalize KNNRegressor to multitarget case #328

Merged: 23 commits into dev on Oct 27, 2020
Conversation

@ablaom (Member) commented Oct 22, 2020:

Continuation of #327

@OkonSamuel (Member) commented:

@ablaom could you hold off on merging this while I review the PR?

(Two review threads on src/NearestNeighbors.jl: outdated, resolved.)
@ablaom (Member, Author) commented Oct 22, 2020:

@OkonSamuel I think those suggestions are fair. Are you happy to push the relevant changes, to save @mateuszbaran from opening a new PR?

@OkonSamuel (Member) commented:

@ablaom Ok

@ablaom (Member, Author) commented Oct 22, 2020:

@OkonSamuel Thanks for that!

Looking more carefully, it seems the testing around weights for regression is a bit light. I'm going to look into this, unless you want to do it?

```julia
if m.weights == :uniform
    preds[i,:] .= sum(values .* w_) / sum(w_)
else
    preds[i,:] .= sum(values .* w_ .* (1.0 .- dists_ ./ sum(dists_))) / (sum(w_) - 1)
end
```
@ablaom (Member, Author) commented:
I'm having trouble understanding the formula on line 147 (which predates this PR). @tlienart, can you recall how it was derived, or where it came from?

Naively, it seems to me that we want to write the prediction as a weighted sum of the target values, where the kth weight is simultaneously proportional to the prescribed sample weight w_[k] and inversely proportional to the distance dist_[k]. That is, prediction[i,:] = sum(w_ ./ dist_ .* values) / c, where c is a normalisation constant chosen so that the weights (the coefficients of values in the sum) add up to one: c = sum(w_ ./ dist_). However, I can't reconcile this with the given formula. Am I missing something?
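For concreteness, a minimal sketch of the naive scheme just described (a hypothetical helper, not the package code; `values`, `w_` and `dists_` mirror the names in the snippet above):

```julia
# Sketch of the naive weighting described above (hypothetical helper).
# `values` is a k × p matrix of neighbour targets, `w_` the per-sample
# weights and `dists_` the neighbour distances.
function naive_prediction(values::AbstractMatrix, w_::AbstractVector, dists_::AbstractVector)
    coeffs = w_ ./ dists_            # weight ∝ sample weight, ∝ 1/distance
    coeffs = coeffs ./ sum(coeffs)   # normalise so the coefficients sum to one
    return vec(sum(values .* coeffs, dims=1))  # weighted sum over the k neighbours
end
```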

cc @mateuszbaran @OkonSamuel

@OkonSamuel (Member) commented:

Yeah, I had the same thoughts when I went through it.
I would propose not passing weights to fit, i.e. setting the supports_weights trait to false, since the weights needed for KNN models are not per-observation weights but per-neighbour weights.
So we should stick to using weights derived from the weights parameter passed at KNN model construction (i.e. :uniform, :distance).
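As a toy illustration of the two construction-time modes (assumed semantics only: :uniform treats all k neighbours equally, while :distance down-weights more distant neighbours, here via inverse distance):

```julia
# Toy contrast of the two weighting modes (illustrative values only)
dists = [0.1, 0.4, 0.5]

uniform_w  = fill(1 / length(dists), length(dists))   # [0.333, 0.333, 0.333]
distance_w = (1 ./ dists) ./ sum(1 ./ dists)          # ≈ [0.69, 0.17, 0.14]: nearest dominates
```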

@ablaom (Member, Author) commented Oct 23, 2020:

> Yeah, I had the same thoughts when I went through it.
> I would propose not passing weights to fit, i.e. setting the supports_weights trait to false, since the weights …

I don't see anything wrong with mixing per-sample weights with an inverse square law for the "neighbour" weights (if it's done in a meaningful way). Also, these two KNN models are presently among the few models that support sample weights and are therefore used for testing 🙌

@mateuszbaran (Contributor) commented:

The numerator in this formula does make sense to me: multiplying by 1 - dist[i]/sum(dist) should behave better numerically when the distance to one of the neighbors is close to 0. The normalization constant in the denominator looks wrong, though. Here is a comparison of different distance-based weights: https://perun.pmf.uns.ac.rs/radovanovic/publications/2016-kais-knn-weighting.pdf
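A quick illustration of the stability point, with toy numbers: plain inverse-distance weights blow up as a neighbour distance approaches zero, while 1 .- dists ./ sum(dists) stays bounded:

```julia
# Toy numbers illustrating the comment above
dists = [1e-12, 0.5, 0.9]

inv_w    = 1 ./ dists                  # ≈ [1.0e12, 2.0, 1.11]: one term dominates
smooth_w = 1 .- dists ./ sum(dists)    # ≈ [1.0, 0.64, 0.36]: bounded in [0, 1]
```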

@ablaom (Member, Author) commented:

@mateuszbaran Thanks for pointing out the paper. Worth noting that the evaluation there is for classifiers making point predictions (the MLJ classifier is probabilistic and so needs a normalisation not needed there), and the testing was restricted to time-series applications. And the current PR is about regression. That said, the paper nicely summarises a number of weighting schemes that probably cover the cases in common use for both probabilistic classification and deterministic regression (the MLJ cases).

(Interestingly, I don't see the 1 - dist[i]/sum(dist) weight in the paper, although maybe it's a special case of Macleod?)

It would be nice to implement them all and cite the paper for the definitions. But I would deem that out of scope of the current PR.

For the record, scikit-learn implements 1/dist[i] (with no epsilon-smoothing) and uniform, but also allows a custom function.

I propose we keep the current 1 - dist[i]/sum(dist) weight (and the 1/dist[i] weight currently used for the classifier) and do the normalisation post facto, as we do for classification. (I don't believe there's a more efficient way, if we are mixing in sample weights.) We'll view this as a "bug fix" (patch release) and open a new issue to generalize and get consistency between regression and classification, which will be breaking. We can do that at the same time as migrating the interface to its own package (#244).

@OkonSamuel @mateuszbaran Happy with this?
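A sketch of the "normalise post facto" proposal under the stated assumptions (hypothetical helper; `values`, `w_` and `dists_` as in the earlier sketch):

```julia
# Post-facto normalisation (hypothetical helper): compute the unnormalised
# coefficients first, then divide by their sum, so the prediction is a
# convex combination of the neighbour targets.
function post_facto_prediction(values::AbstractMatrix, w_::AbstractVector, dists_::AbstractVector)
    coeffs = w_ .* (1 .- dists_ ./ sum(dists_))  # mix sample and neighbour weights
    return vec(sum(values .* coeffs, dims=1)) / sum(coeffs)
end
```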

@OkonSamuel (Member) commented:

@ablaom This can be merged for now. Further changes can be made when migrating to MLJNearestNeighborsInterface.jl (#331). There we may add more weighting schemes as kernels.

@ablaom (Member, Author) commented:

Okay, I'll just merge this as is, despite the mysterious normalization, and we'll fix that when we migrate.

@mateuszbaran (Contributor) commented:

Yes, that paper isn't about kNN regression, but it still nicely collects many weighting functions that could be used here as well.

Your plan sounds great 👍 .

@ablaom (Member, Author) commented:

Thanks for your help with the PR; I realise it was a bit more involved than you probably thought it would be 😄

@mateuszbaran (Contributor) commented:

No problem! Also thank you for your quick help in getting this done. I learned quite a bit and I think it was worth it.

@OkonSamuel self-assigned this on Oct 26, 2020
@OkonSamuel removed their assignment on Oct 27, 2020
@ablaom merged commit 2791e74 into dev on Oct 27, 2020
@ablaom mentioned this pull request on Oct 27, 2020