quantization error (theoretical question) #36

Closed

lachhebo opened this issue Jul 12, 2019 · 13 comments
@lachhebo

I have a question about the interpretability of the quantization error.

How can we know that the SOM is reliable? Does the quantization error need to be lower than a certain value?

For example, in my case, I have a quantization error of 7.0, which is quite high compared to the example given in the documentation. Does that mean my SOM is not reliable?

@JustGlowing
Owner

hi @lachhebo, the quantization error simply tells you how much information you lose when you quantize your data with the SOM. Just to give you an idea, if the quantization error is 0, the weights of your network match the original data exactly. To know whether the SOM is reliable, you have to test it for your specific application.
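
A minimal sketch of what the quantization error measures, assuming the MiniSom API (the data, map size and hyperparameters below are illustrative, not a recommendation):

```python
import numpy as np
from minisom import MiniSom

data = np.random.rand(100, 4)  # toy data with 4 features

som = MiniSom(6, 6, 4, sigma=1.0, learning_rate=0.5, random_seed=42)
som.random_weights_init(data)
som.train_random(data, 1000)

# built-in metric
print(som.quantization_error(data))

# equivalent computation by hand: mean distance between each sample
# and the weights of its best matching unit (its "quantized" version)
manual_qe = np.linalg.norm(data - som.quantization(data), axis=1).mean()
print(manual_qe)
```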

@lachhebo
Author

In my case, I'm trying to assess the number of clusters in a dataset.

What I'm thinking of doing is to split my dataset in two: train and test.
Then train my SOM on the training set while optimising the quantization error.
Finally, I would compare the distance map of my SOM to the activation frequencies on the test set.

Do you think this is the way to go to get the most reliable SOM possible? A rough sketch of what I mean is below.
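
A minimal sketch of that workflow, assuming the MiniSom API (dataset, map size and hyperparameters are placeholders):

```python
import numpy as np
from minisom import MiniSom
from sklearn.model_selection import train_test_split

data = np.random.rand(500, 4)  # placeholder for the real dataset
train, test = train_test_split(data, test_size=0.3, random_state=0)

som = MiniSom(10, 10, train.shape[1], sigma=1.5, learning_rate=0.5, random_seed=0)
som.random_weights_init(train)
som.train_random(train, 5000)

print('train QE:', som.quantization_error(train))
print('test QE :', som.quantization_error(test))

u_matrix = som.distance_map()              # inter-neuron distances learned on train
test_hits = som.activation_response(test)  # how often each neuron wins on the test set
```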

@JustGlowing
Owner

Is your data labeled?

@lachhebo
Author

Yes, it is

@JustGlowing
Owner

Then you can compare the clusters you obtain with your labels.

@lachhebo
Author

I can, but I'm more interested in the internal validity of my clusters.

My plan is to use the clustering performed by the SOM as a way to assess the number of clusters, and maybe to use this unsupervised clustering in a supervised model.

@JustGlowing
Owner

Then you can use a cluster quality measure. There are many; this is an example: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_score.html
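
A minimal usage sketch of that metric (data and labels here are placeholders): `silhouette_score` takes the samples and one cluster label per sample and returns a value in [-1, 1], where higher means better-separated clusters.

```python
import numpy as np
from sklearn.metrics import silhouette_score

X = np.random.rand(200, 4)                  # placeholder data
labels = np.random.randint(0, 3, size=200)  # placeholder cluster assignments
print(silhouette_score(X, labels))
```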

@lachhebo
Author

IMHO, directly using the silhouette score on the clustering performed by the SOM is not pertinent, since many nodes are next to each other and the silhouette score will therefore be low. The correct number of clusters is probably lower than the number of nodes.

@JustGlowing
Owner

It depends on how you derive your clusters. I usually recommend using small maps and assuming that each position in the map gives you a cluster. For example, a 2-by-2 map will give you 4 clusters. This way the silhouette score is suitable.
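
A sketch of that "each neuron is a cluster" approach, assuming the MiniSom API (data, map size and hyperparameters are illustrative): the winner coordinates become the cluster label, so a 2-by-2 map yields at most 4 clusters and the silhouette score can be computed on those labels.

```python
import numpy as np
from minisom import MiniSom
from sklearn.metrics import silhouette_score

data = np.random.rand(300, 4)

som = MiniSom(2, 2, 4, sigma=0.5, learning_rate=0.5, random_seed=1)
som.random_weights_init(data)
som.train_random(data, 2000)

# turn the (row, col) winner of each sample into a single integer label
labels = np.array([r * 2 + c for r, c in (som.winner(x) for x in data)])
print(silhouette_score(data, labels))
```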

@lachhebo
Author

That will work, but I will get a higher quantization error, and a simpler algorithm like affinity propagation will probably do just as well in this case.

I think it's better to use a bigger map with a lower quantization error, and then try to interpret the distance map and see if it is reliable.
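
A sketch of inspecting a larger map's distance map (U-matrix) with matplotlib, assuming the MiniSom API (data, map size and hyperparameters are illustrative); light cells are neurons far from their neighbours, which often mark cluster borders.

```python
import numpy as np
import matplotlib.pyplot as plt
from minisom import MiniSom

data = np.random.rand(500, 4)

som = MiniSom(15, 15, 4, sigma=2.0, learning_rate=0.5, random_seed=0)
som.random_weights_init(data)
som.train_random(data, 10000)

plt.pcolor(som.distance_map().T, cmap='bone_r')  # U-matrix
plt.colorbar()
plt.show()
```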

@JustGlowing
Owner

Of course, thanks for using Minisom. Leave a star if you like it!

@lachhebo
Author

lachhebo commented Jul 12, 2019

Thanks for your time and your work, it is a great package and I have already starred it!

@JustGlowing
Owner

Anyway, to go back to your initial question: you need to tune the SOM to get the quantization error that you want. More clusters means a lower quantization error. The best solution depends only on how many clusters there are in your data.
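
A sketch of that tuning loop, assuming the MiniSom API (data, map sizes and hyperparameters are illustrative): larger maps have more units, hence more potential clusters, and tend to give a lower quantization error.

```python
import numpy as np
from minisom import MiniSom

data = np.random.rand(500, 4)

for n in (2, 4, 6, 8, 10):
    som = MiniSom(n, n, 4, sigma=1.0, learning_rate=0.5, random_seed=0)
    som.random_weights_init(data)
    som.train_random(data, 5000)
    print(f'{n}x{n} map -> quantization error {som.quantization_error(data):.3f}')
```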
