Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue #64: added n_init to kmeans #78

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

lbollar
Copy link

@lbollar lbollar commented Sep 30, 2016

Sorry, I am still figuring out the pull request process and think I accidentally closed last one. Here is new request with suggested cleanups.

Apologies.

EDIT: fixes #64, original PR #77

src/kmeans.jl Outdated
maxiter=maxiter,
tol=tol,
display=display)
n_init > 0 || throw(ArgumentError("n_init must be greater than 0"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything is still being indented with tabs. You can do a find-and-replace to change everything at once, if you wish; I do it in Vim like

:%s/\t/    /g

@ararslan
Copy link
Member

If you accidentally close a pull request, you can just click "reopen," no need to open a new one.

src/kmeans.jl Outdated
@@ -72,6 +92,8 @@ function _kmeans!{T<:AbstractFloat}(
tol::Real, # in: tolerance of change at convergence
displevel::Int) # in: the level of display



Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the excess whitespace here and above. The change is unrelated to the PR.

src/kmeans.jl Outdated
end

return bestresult

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the line breaks on lines 60, 73, and 75 for style consistency with the rest of the code.

src/kmeans.jl Outdated
tol::Real=_kmeans_default_tol,
display::Symbol=_kmeans_default_display)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another unrelated whitespace change

@ararslan
Copy link
Member

ararslan commented Oct 5, 2016

Sorry, I realize the whitespace pedantry is probably annoying. Once my outstanding style comments are addressed I think this will be good to go. Thanks for the contribution!

@lbollar
Copy link
Author

lbollar commented Oct 5, 2016

No I completely apologize for these small things that I didn't take care to consider. Thanks for helping me learn about this process. It is preparing me for future commits.

src/kmeans.jl Outdated
@@ -71,7 +87,7 @@ function _kmeans!{T<:AbstractFloat}(
maxiter::Int, # in: maximum number of iterations
tol::Real, # in: tolerance of change at convergence
displevel::Int) # in: the level of display

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trailing whitespace. I don't know what editor you use, but I think Atom trims trailing whitespace by default, and in Vim you can do :%s/\s\+$//g to remove it.

src/kmeans.jl Outdated
tol::Real=_kmeans_default_tol,
display::Symbol=_kmeans_default_display)


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last remaining extraneous newline.

@ararslan
Copy link
Member

ararslan commented Oct 5, 2016

This looks great to me, though I'd like a second pair of eyes on the implementation before it's merged. @KristofferC?

@kmsquire
Copy link
Contributor

This is rather old, but Clustering.jl hasn't had any active maintainers in a while. I just fixed the merge conflicts. This seems like a nice addition, but the documentation needs to be updated to reflect the interface change. @lbollar, are you still available to update this?

@lbollar
Copy link
Author

lbollar commented Aug 26, 2017 via email

@@ -44,20 +45,34 @@ function kmeans{T<:AbstractFloat}(X::Matrix{T}, k::Int;
weights=nothing,
init=_kmeans_default_init,
maxiter::Integer=_kmeans_default_maxiter,
n_init::Integer=_kmeans_default_n_init,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that n_init comes from Python's sklearn (#64), but it doesn't sound like a best choice for me.
Maybe something like n_tries to reflect that the parameter defines how many times the algorithm, rather than some initialization procedure, is run?

Copy link
Contributor

@wildart wildart Sep 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or ntries? And wouldn't be an overkill to run 10 times? I recommend default value 1, because usually a quick partitioning is required and not necessarily best one. And, if one needs to find a best clustering, this parameter can be set to larger value explicitly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

10 is what sklearn does at it sounds reasonable to me.
It isn't unusual to run 1000s of times, (that was done as the baseline for the affinity propagation paper)
If some need a quick partition they can ask for it.

The default shouldn't be so sensitive to random factors.

I think 10 strikes the right balance.
Though I could see argument for 3 or 30

@oxinabox
Copy link
Contributor

Any interest in merging this sometime?

@alyst
Copy link
Member

alyst commented Oct 3, 2018

I think it's worth merging. I'd rename the parameter to ntries. The default (10) seems reasonable given that the method is quite fast.
Obviously, the PR has to be updated to 1.0 (@compat), and the param docs added to kmeans.rst.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Nondetermanistic methods should take a n_init parameter, for how many times to run
6 participants