Best practices for using the quantization error of very large datasets #53

Closed
TimNagle-McNaughton opened this issue Nov 29, 2019 · 3 comments

Comments

@TimNagle-McNaughton

TimNagle-McNaughton commented Nov 29, 2019

I'm hoping to apply MiniSom to a few very large datasets, on the order of 10e6 - 50e6 instances with 6 dimensions each. The data are normalized and are all floats.

Training time is no problem, typically under 10 minutes. However, I'm implementing a function that optimizes sigma and the learning rate by minimizing the quantization error, and at present computing the quantization error takes so long (more than 24 hours) that an iterative approach is not feasible.

If there is anything I can do to speed up the quantization error calculation, that would be great; otherwise I may have to use a different metric.
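
For readers facing the same bottleneck, one possible workaround is to compute the quantization error outside the library with NumPy, processing the data in chunks so that the full sample-by-node distance matrix never has to fit in memory. The sketch below is not MiniSom's own implementation; the function name `quantization_error_chunked` and the `chunk_size` parameter are hypothetical, and it assumes the codebook is an array of shape `(x, y, n_features)`, such as the one returned by `som.get_weights()`.

```python
import numpy as np

def quantization_error_chunked(data, weights, chunk_size=2_000):
    """Mean Euclidean distance from each sample to its nearest codebook vector.

    Hypothetical helper, not part of MiniSom. `weights` is assumed to have
    shape (x, y, n_features); `data` has shape (n_samples, n_features).
    """
    data = np.asarray(data, dtype=np.float64)
    # Flatten the map grid so that each row is one node's weight vector.
    codebook = np.asarray(weights, dtype=np.float64).reshape(-1, data.shape[1])
    codebook_sq = np.einsum("ij,ij->i", codebook, codebook)

    total = 0.0
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        chunk_sq = np.einsum("ij,ij->i", chunk, chunk)
        # Squared distances via ||a||^2 - 2 a.b + ||b||^2, one matrix product per chunk.
        d2 = chunk_sq[:, None] - 2.0 * (chunk @ codebook.T) + codebook_sq[None, :]
        # Rounding can push tiny distances slightly below zero, so clip before the root.
        total += np.sqrt(np.maximum(d2.min(axis=1), 0.0)).sum()
    return total / len(data)
```

With a 115 x 115 map (13225 nodes), a chunk of 2,000 samples needs a temporary array of roughly 2,000 x 13225 float64 values, about 210 MB, so `chunk_size` should be tuned to the available memory.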


I should note that I am using map side lengths of sqrt(5*sqrt(instances)) as recommended in: Rojas, Ignacio, Gonzalo Joya, and Andreu Catala, eds. Advances in Computational Intelligence: 13th International Work-Conference on Artificial Neural Networks, IWANN 2015, Palma de Mallorca, Spain, June 10-12, 2015. Proceedings. Vol. 9094. Springer, 2015.

So for a dataset with 7,069,696 instances the map is 115 x 115 (13225 nodes).
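
As a quick check of that arithmetic, the rule can be written as a short Python function. The name `som_side_length` is hypothetical, and rounding down to an integer is an assumption, since the post does not say how the fractional side length is rounded.

```python
import math

def som_side_length(n_instances):
    """Side of a square map holding roughly 5 * sqrt(n_instances) nodes.

    Hypothetical helper; rounding down is an assumption.
    """
    return math.floor(math.sqrt(5 * math.sqrt(n_instances)))

side = som_side_length(7_069_696)
print(side, side * side)  # → 115 13225
```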

@JustGlowing
Owner

JustGlowing commented Nov 30, 2019 via email

@TimNagle-McNaughton
Author

Cheers, my hero.

@JustGlowing
Owner

@TimNagle-McNaughton in the latest version of MiniSom the computation of the quantization error is much faster.
