quantization error (theoretical question) #36
Hi @lachhebo, the quantization error simply tells you how much information you lose when you quantize your data with the SOM. To give you an idea: if the quantization error is 0, the weights of your network match the original data exactly. To know whether the SOM is reliable, you have to test it on your specific application.
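To make the definition concrete, here is a minimal NumPy sketch of the quantity described above: the mean distance between each sample and its closest weight vector. MiniSom exposes this as `som.quantization_error(data)`; the function, the random data and the ten-node codebook below are illustrative assumptions, not the library's code. The last line also shows that the value is expressed in the units of the input features, so it cannot be compared across datasets with different scaling.

```python
import numpy as np

def quantization_error(data, codebook):
    """Mean Euclidean distance from each sample to its closest codebook vector."""
    # distances[i, j] = distance between sample i and codebook vector j
    distances = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
    return distances.min(axis=1).mean()

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3))  # hypothetical dataset
codebook = data[:10]              # hypothetical weights of a 10-node map

qe = quantization_error(data, codebook)

# Weights identical to the data: nothing is lost, the error is 0
qe_exact = quantization_error(data, data)

# Same data and weights in units 10 times larger: the error is 10 times larger
qe_scaled = quantization_error(10 * data, 10 * codebook)
```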
In my case, I'm trying to assess the number of clusters in a dataset. What I'm thinking of doing is splitting my dataset in two: train and test. Do you think that is the way to get a SOM that is as reliable as possible?
Is your data labeled?
Yes, it is.
Then you can compare the clusters you obtain with your labels.
I can, but I'm more interested in the internal validity of my clusters. My plan is to use the clustering produced by the SOM to assess the number of clusters, and maybe to use this unsupervised clustering in a supervised model.
Then you can use a cluster quality measure. There are many; this is one example: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_score.html
IMHO, using the silhouette score directly on the clustering produced by the SOM is not appropriate: many nodes are next to each other, so the silhouette score will be low. The correct number of clusters is probably lower than the number of nodes.
It depends on how you derive your clusters. I usually recommend using small maps and assuming that each position in the map gives you a cluster. For example, a 2-by-2 map will give you 4 clusters. This way the silhouette score is suitable.
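The small-map approach can be sketched as follows. Everything here is an assumption made for illustration: the four synthetic blobs, the hand-set 2-by-2 `weights` array standing in for a trained map, and the compact `silhouette` helper, which computes the same mean silhouette coefficient as the scikit-learn function linked above but is not its implementation. It expects at least two clusters.

```python
import numpy as np

def silhouette(data, labels):
    """Mean silhouette coefficient with Euclidean distances (needs >= 2 clusters)."""
    dist = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=2)
    clusters = np.unique(labels)
    scores = np.zeros(len(data))
    for i in range(len(data)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # singleton cluster: score is 0 by convention
        # a: mean distance to the other members of the sample's own cluster
        a = dist[i, same].sum() / (same.sum() - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()

rng = np.random.default_rng(0)
centers = np.array([[0, 0], [0, 8], [8, 0], [8, 8]], dtype=float)
data = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

# Hypothetical weights of a trained 2-by-2 map, shape (2, 2, n_features)
weights = centers.reshape(2, 2, 2)

# Each sample's cluster is the flat index of its winning map position
flat = weights.reshape(-1, weights.shape[-1])
labels = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=2).argmin(axis=1)

score = silhouette(data, labels)
```

With a real map, the winning position of each sample would come from the trained SOM instead of the hand-set weights.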
It will work, but I will get a higher quantization error, and a simpler algorithm like affinity propagation will probably do as well in this case. I think it's better to use a bigger map with a lower quantization error and then try to interpret the distance map and see if it is reliable.
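For readers unfamiliar with the distance map (U-matrix) mentioned here, the sketch below shows the idea: each node gets the average distance between its weights and those of its grid neighbours, so borders between clusters show up as ridges of high values. MiniSom provides `som.distance_map()` for this; the simplified four-neighbour version and the hand-built `weights` array below are assumptions for illustration and do not reproduce the library's implementation.

```python
import numpy as np

def distance_map(weights):
    """Mean distance from each node's weights to its 4 grid neighbours, scaled to [0, 1]."""
    rows, cols, _ = weights.shape
    umatrix = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            neighbours = [
                weights[nr, nc]
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= nr < rows and 0 <= nc < cols
            ]
            umatrix[r, c] = np.mean(
                [np.linalg.norm(weights[r, c] - w) for w in neighbours]
            )
    return umatrix / umatrix.max()

# Hypothetical 4-by-6 map: the left half sits near (0, 0), the right half near (10, 10)
rng = np.random.default_rng(0)
weights = rng.normal(scale=0.1, size=(4, 6, 2))
weights[:, 3:, :] += 10.0

umap = distance_map(weights)

# Columns 2 and 3 face each other across the cluster border
border = umap[:, 2:4]
interior = umap[:, [0, 1, 4, 5]]
```

In this toy map the two border columns stand out clearly from the rest, which is the pattern one would look for when reading cluster boundaries off a larger map.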
Of course, thanks for using MiniSom. Leave a star if you like it!
Thanks for your time and your work. It is a great package and I already starred it!
Anyway, to go back to your initial question: you need to tune the SOM to get the quantization error that you want. More clusters means lower quantization error. The best solution depends only on how many clusters there are in your data.
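The relationship between map size and quantization error can be illustrated without training anything. In the sketch below, which is an assumption-laden stand-in rather than a SOM, the first `k` samples of a random dataset play the role of the weights of a `k`-node map. Because each larger codebook contains the smaller ones, the error can only go down as nodes are added, and it reaches 0 once there is one node per sample, even though this data has no cluster structure at all.

```python
import numpy as np

def quantization_error(data, codebook):
    """Mean Euclidean distance from each sample to its closest codebook vector."""
    distances = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
    return distances.min(axis=1).mean()

rng = np.random.default_rng(1)
data = rng.normal(size=(300, 2))  # hypothetical data with no cluster structure

# Stand-in for maps of growing size: the first k samples act as the weights
sizes = [4, 16, 64, 300]
errors = [quantization_error(data, data[:k]) for k in sizes]

for k, e in zip(sizes, errors):
    print(f"{k:>3} nodes: quantization error = {e:.3f}")
```

This is why a low quantization error on its own says little about whether the number of nodes matches the number of clusters in the data.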
I have a question about the interpretability of the quantization error.
How can we know that the SOM is reliable? Does the quantization error need to be lower than a certain value?
For example, in my case, I have a quantization error of 7.0, which is quite high compared to the example given in the documentation. Does that mean my SOM is not reliable?