Best practices for using the quantization error of very large datasets #53

TimNagle-McNaughton · 2019-11-29T16:52:40Z

I'm hoping to apply minisom to a few very large datasets. These are on the order of 10e6 - 50e6 instances with 6 dimensions. The data are normalized, and are all floats.

The training time is no problem, typically <10min. However, I'm implementing a function to optimize sigma and the learning rate by minimizing the quantization error. At present, calculating the quantization error takes so long (>24hrs) that an iterative approach is not feasible.

If there is anything I can do to speed up the quantization error calculation, that would be great. Otherwise I may have to use a different metric.

I should note that I am using map sizes of sqrt(5*sqrt(instances)) as recommended in: Rojas, Ignacio, Gonzalo Joya, and Andreu Catala, eds. Advances in Computational Intelligence: 13th International Work-Conference on Artificial Neural Networks, IWANN 2015, Palma de Mallorca, Spain, June 10-12, 2015. Proceedings. Vol. 9094. Springer, 2015.

So for a dataset with 7,069,696 instances the map is 115 x 115 (13225 nodes).
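The map-size arithmetic above can be checked with a short sketch. The function name `som_side_length` is hypothetical, and rounding down to a whole number is an assumption; the post only states the formula and the resulting 115 x 115 map.

```python
import math


def som_side_length(n_instances):
    """Side length of a square map from the sqrt(5*sqrt(instances)) heuristic."""
    # Assumption: round the heuristic down to a whole number of nodes per side.
    return int(math.sqrt(5 * math.sqrt(n_instances)))


side = som_side_length(7_069_696)
print(side, side * side)  # 115 13225
```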

JustGlowing · 2019-11-30T04:49:31Z

Hi Tim, you can compute the quantization error using only a sample of your data. You'll get a good estimate as long as the sample is representative.
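A minimal sketch of that sampling idea, assuming a uniform random sample without replacement is representative enough. The helper `sampled_quantization_error` and the toy `toy_qe` stand-in are hypothetical names. `qe_fn` can be any callable that maps a data array to a quantization error, for example the `quantization_error` method of a trained MiniSom instance.

```python
import numpy as np


def sampled_quantization_error(qe_fn, data, sample_size, seed=0):
    """Estimate the quantization error from a uniform random sample of rows."""
    rng = np.random.default_rng(seed)
    n = min(sample_size, len(data))
    # Sample row indices without replacement so no instance is counted twice.
    idx = rng.choice(len(data), size=n, replace=False)
    return qe_fn(data[idx])


# Toy stand-in for a trained map: 25 random codebook vectors in 6 dimensions.
rng = np.random.default_rng(42)
codebook = rng.random((25, 6))
data = rng.random((20_000, 6))


def toy_qe(x):
    # Mean Euclidean distance from each row to its nearest codebook vector.
    dists = np.linalg.norm(x[:, None, :] - codebook[None, :, :], axis=2)
    return dists.min(axis=1).mean()


full = toy_qe(data)
estimate = sampled_quantization_error(toy_qe, data, sample_size=2_000)
print(f"full: {full:.4f}  sampled: {estimate:.4f}")
```

With a trained map, the call would look like `sampled_quantization_error(som.quantization_error, data, sample_size)`. Repeating the estimate with a few different seeds gives a rough sense of how much it varies with the sample.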


TimNagle-McNaughton · 2019-11-30T05:58:47Z

Cheers, my hero.

JustGlowing · 2019-12-13T09:35:09Z

@TimNagle-McNaughton, in the latest version of MiniSom the computation of the quantization error is much faster.
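The thread does not say what changed in MiniSom, so the following is only a general sketch of one common way to speed up this kind of calculation, not a description of the library's implementation. It computes nearest-codebook distances in vectorized chunks using the expansion ||x - w||^2 = ||x||^2 - 2 x·w + ||w||^2. The function name and the `chunk_size` parameter are hypothetical. `codebook` is assumed to be a 2-D array with one row per map node, for example the array from `som.get_weights()` reshaped to `(-1, n_features)`.

```python
import numpy as np


def quantization_error_chunked(data, codebook, chunk_size=1024):
    """Mean distance from each row of `data` to its nearest codebook vector."""
    # Squared norms of the codebook vectors, computed once.
    w_sq = np.einsum("ij,ij->i", codebook, codebook)
    total = 0.0
    for start in range(0, len(data), chunk_size):
        x = data[start:start + chunk_size]
        x_sq = np.einsum("ij,ij->i", x, x)
        # Squared distances for this chunk, shape (len(x), n_nodes).
        d_sq = x_sq[:, None] - 2.0 * (x @ codebook.T) + w_sq[None, :]
        # Rounding can leave tiny negative values; clip before the square root.
        total += np.sqrt(np.maximum(d_sq.min(axis=1), 0.0)).sum()
    return total / len(data)


# Toy check against a direct computation on small random arrays.
rng = np.random.default_rng(0)
demo_data = rng.random((5_000, 6))
demo_codebook = rng.random((100, 6))
direct = np.linalg.norm(
    demo_data[:, None, :] - demo_codebook[None, :, :], axis=2
).min(axis=1).mean()
chunked = quantization_error_chunked(demo_data, demo_codebook)
print(f"direct: {direct:.6f}  chunked: {chunked:.6f}")
```

Chunking bounds the temporary distance matrix at `chunk_size` rows times the number of nodes, which matters for maps with thousands of nodes.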

TimNagle-McNaughton closed this as completed Nov 30, 2019
