G-Means: Setting maximum number of clusters like for X-Means #602
Comments
Hi @tschechlovdev, the changes are introduced and available on |
Hi @annoviko, thank you for your fast support! That helps me a lot! However, I think it does not work 100% correctly: for some datasets I specify `k_max=200`, but I get results with more clusters than that. I am running this code:

```python
from pyclustering.cluster.gmeans import gmeans
import numpy as np

# data is a synthetically generated dataset created with make_blobs from sklearn:
# https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_blobs.html
gmeans_instance = gmeans(data, k_max=200)
labels = gmeans_instance.process().predict(data)
predicted_k = len(np.unique(labels))
print("Number of clusters: {}".format(predicted_k))
```

I am also a bit concerned because I am running gmeans on synthetic datasets with Gaussian distributions that have, for example, 25 clusters, generated using the make_blobs function from sklearn. Actually, I thought gmeans should be well suited for these datasets. Yet, gmeans always runs into k_max (or respectively a higher value) for me. Do you have any idea why this happens?
Hi @tschechlovdev. That said, I agree this is confusing behavior. I will interrupt the statistical optimization as well.
@tschechlovdev,
Output:
This happens because you currently perform cluster analysis of your data twice.
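The remark above presumably refers to `process()` doing the clustering and `predict()` re-assigning every point in a second pass. A minimal sketch of deriving per-point labels from a single run, assuming a `get_clusters()`-style accessor that returns one list of point indices per cluster (the `clusters` value below is illustrative, not real gmeans output):

```python
import numpy as np

# Hypothetical result of a single call such as gmeans_instance.get_clusters():
# one list of point indices per cluster (illustrative values)
clusters = [[0, 2, 4], [1, 3], [5]]

# Derive per-point labels without a second predict() pass over the data
labels = np.empty(6, dtype=int)
for cluster_id, point_indices in enumerate(clusters):
    labels[point_indices] = cluster_id

print(labels.tolist())  # [0, 1, 0, 1, 0, 2]
print("Number of clusters: {}".format(len(clusters)))
```

This way the number of clusters and the labels both come from the one `process()` run, so repeated calls cannot disagree with each other.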
Ah, I see, that makes sense.
I see; that's why I get different results when trying multiple times. However, I tried different datasets, and it seems that GMeans has a problem with high-dimensional (d > 25) datasets: for these, it always runs into k_max (or respectively higher). For datasets with d < 20 it seems to work fine, but I think that is rather a problem with the algorithm in general.
Thanks, I didn't know about that way :)
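For anyone trying to reproduce the high-dimensional case described above, such datasets can be generated with sklearn's `make_blobs`; a sketch with illustrative parameters (the sample count and dimensionality here are assumptions, not the reporter's exact setup):

```python
import numpy as np
from sklearn.datasets import make_blobs

# 25 Gaussian clusters in 30 dimensions (d > 25, the regime where
# gmeans reportedly runs into k_max); parameters are illustrative
data, true_labels = make_blobs(n_samples=2500, centers=25,
                               n_features=30, random_state=0)

print(data.shape)                   # (2500, 30)
print(len(np.unique(true_labels)))  # 25
```

Comparing the 25 ground-truth clusters against what gmeans reports on this data, while varying `n_features`, would isolate whether dimensionality alone triggers the behavior.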
@tschechlovdev,
I think tomorrow I will be able to deliver these changes to |
The changes regarded |
Hi,
I wanted to ask if it is possible to add a k_max parameter to the gmeans call, similar to xmeans, which supports this parameter. The reason is that gmeans returns a really large number of clusters for some datasets (sometimes even equal to the size of the dataset, which is the worst case).
I do not know the reason behind this, but it would be nice if I could limit the number of clusters, as I can for xmeans.