quantization error (theoretical question) #36

Closed

lachhebo opened this issue Jul 12, 2019 · 13 comments
@lachhebo

I have a question about the interpretability of the quantization error.

How can we know that the SOM is reliable? Does the quantization error need to be lower than a certain value?

For example, in my case, I have a quantization error of 7.0, which is quite high compared to the example given in the documentation. Does that mean my SOM is not reliable?

@JustGlowing
Owner

hi @lachhebo, the quantization error simply tells you how much information you lose when you quantize your data with the SOM. Just to give you an idea, if the quantization error is 0, the weights of your network match the original data exactly. To know whether the SOM is reliable, you have to test it for your specific application.
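
A minimal sketch of what the quantization error measures, assuming the MiniSom API (the data, map size and hyperparameters below are illustrative, not a recommendation):

```python
import numpy as np
from minisom import MiniSom

data = np.random.rand(100, 4)  # toy data with 4 features

som = MiniSom(6, 6, 4, sigma=1.0, learning_rate=0.5, random_seed=42)
som.random_weights_init(data)
som.train_random(data, 1000)

# built-in metric
print(som.quantization_error(data))

# equivalent computation by hand: mean distance between each sample
# and the weights of its best matching unit (its "quantized" version)
manual_qe = np.linalg.norm(data - som.quantization(data), axis=1).mean()
print(manual_qe)
```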

@lachhebo
Author

In my case, I'm trying to assess the number of clusters in a dataset.

What I'm thinking of doing is to split my dataset in two: train and test.
Then train my SOM on the training set while optimising the quantization error.
Finally, I would compare the distance map of my SOM to the activation frequencies on the test set.

Do you think this is the way to go to get the most reliable SOM possible? A rough sketch of what I mean is below.
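
A minimal sketch of that workflow, assuming the MiniSom API (dataset, map size and hyperparameters are placeholders):

```python
import numpy as np
from minisom import MiniSom
from sklearn.model_selection import train_test_split

data = np.random.rand(500, 4)  # placeholder for the real dataset
train, test = train_test_split(data, test_size=0.3, random_state=0)

som = MiniSom(10, 10, train.shape[1], sigma=1.5, learning_rate=0.5, random_seed=0)
som.random_weights_init(train)
som.train_random(train, 5000)

print('train QE:', som.quantization_error(train))
print('test QE :', som.quantization_error(test))

u_matrix = som.distance_map()              # inter-neuron distances learned on train
test_hits = som.activation_response(test)  # how often each neuron wins on the test set
```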

@JustGlowing
Owner

Is your data labeled?

@lachhebo
Author

Yes, it is

@JustGlowing
Owner

Then you can compare the clusters you obtain with your labels.

@lachhebo
Author

I can, but I'm more interested in the internal validity of my clusters.

My plan is to use the clustering performed by the SOM as a way to assess the number of clusters, and maybe to use this unsupervised clustering in a supervised model.

@JustGlowing
Owner

Then you can use a cluster quality measure. There are many; this is an example: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_score.html
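
A minimal usage sketch of that metric (data and labels here are placeholders): `silhouette_score` takes the samples and one cluster label per sample and returns a value in [-1, 1], where higher means better-separated clusters.

```python
import numpy as np
from sklearn.metrics import silhouette_score

X = np.random.rand(200, 4)                  # placeholder data
labels = np.random.randint(0, 3, size=200)  # placeholder cluster assignments
print(silhouette_score(X, labels))
```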

@lachhebo
Author

IMHO, directly using the silhouette score on the clustering performed by the SOM is not pertinent, since many nodes are next to each other and the silhouette score will therefore be low. The correct number of clusters is probably lower than the number of nodes.

@JustGlowing
Owner

It depends on how you derive your clusters. I usually recommend using small maps and assuming that each position in the map gives you a cluster. For example, a 2-by-2 map will give you 4 clusters. This way the silhouette score is suitable.
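
A sketch of that "each neuron is a cluster" approach, assuming the MiniSom API (data, map size and hyperparameters are illustrative): the winner coordinates become the cluster label, so a 2-by-2 map yields at most 4 clusters and the silhouette score can be computed on those labels.

```python
import numpy as np
from minisom import MiniSom
from sklearn.metrics import silhouette_score

data = np.random.rand(300, 4)

som = MiniSom(2, 2, 4, sigma=0.5, learning_rate=0.5, random_seed=1)
som.random_weights_init(data)
som.train_random(data, 2000)

# turn the (row, col) winner of each sample into a single integer label
labels = np.array([r * 2 + c for r, c in (som.winner(x) for x in data)])
print(silhouette_score(data, labels))
```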

@lachhebo
Author

That will work, but I will get a higher quantization error, and a simpler algorithm like affinity propagation will probably do just as well in this case.

I think it's better to use a bigger map with a lower quantization error, and then try to interpret the distance map and see if it is reliable.
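
A sketch of inspecting a larger map's distance map (U-matrix) with matplotlib, assuming the MiniSom API (data, map size and hyperparameters are illustrative); light cells are neurons far from their neighbours, which often mark cluster borders.

```python
import numpy as np
import matplotlib.pyplot as plt
from minisom import MiniSom

data = np.random.rand(500, 4)

som = MiniSom(15, 15, 4, sigma=2.0, learning_rate=0.5, random_seed=0)
som.random_weights_init(data)
som.train_random(data, 10000)

plt.pcolor(som.distance_map().T, cmap='bone_r')  # U-matrix
plt.colorbar()
plt.show()
```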

@JustGlowing
Owner

Of course, thanks for using Minisom. Leave a star if you like it!

@lachhebo
Author

lachhebo commented Jul 12, 2019

Thanks for your time and your work, it is a great package and I have already starred it!

@JustGlowing
Owner

Anyway, to go back to your initial question: you need to tune the SOM to get the quantization error that you want. More clusters means a lower quantization error. The best solution depends only on how many clusters there are in your data.
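
A sketch of that tuning loop, assuming the MiniSom API (data, map sizes and hyperparameters are illustrative): larger maps have more units, hence more potential clusters, and tend to give a lower quantization error.

```python
import numpy as np
from minisom import MiniSom

data = np.random.rand(500, 4)

for n in (2, 4, 6, 8, 10):
    som = MiniSom(n, n, 4, sigma=1.0, learning_rate=0.5, random_seed=0)
    som.random_weights_init(data)
    som.train_random(data, 5000)
    print(f'{n}x{n} map -> quantization error {som.quantization_error(data):.3f}')
```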
