Best practices for using the quantization error of very large datasets #53
Hi Tim,
You can compute the quantization error using only a sample of your data.
You'll have a good estimation as long as the sample is representative.
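A minimal sketch of that sampling approach in plain NumPy (the array sizes and the `quantization_error` helper here are illustrative, not MiniSom's own code; with a trained MiniSom instance you can pass the same random sample directly to `som.quantization_error(sample)`):

```python
import numpy as np

def quantization_error(data, weights):
    """Mean Euclidean distance from each sample to its closest weight vector,
    the same definition MiniSom uses for the quantization error."""
    # Pairwise distances of shape (n_samples, n_nodes) via broadcasting;
    # fine for a few thousand samples, too large for the full dataset.
    dists = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return dists.min(axis=1).mean()

rng = np.random.default_rng(0)
data = rng.random((100_000, 6))   # stand-in for the real multi-million-row dataset
weights = rng.random((100, 6))    # stand-in codebook (a real one might be 13225 x 6)

# Estimate on a representative random sample instead of every instance.
idx = rng.choice(len(data), size=2_000, replace=False)
qe_estimate = quantization_error(data[idx], weights)
```

With millions of instances, a sample of a few thousand rows usually gives a stable estimate; drawing two disjoint samples and comparing the two estimates is a quick check that the sample size is adequate.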
(Replying to Tim Nagle-McNaughton's message of Fri, Nov 29, 2019, 4:52 PM; the full question is quoted in the issue body below.)
Cheers, my hero.
@TimNagle-McNaughton in the latest version of MiniSom the computation of the quantization error is much faster.
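Independent of what MiniSom does internally, one way a full-dataset quantization error becomes tractable is chunked, vectorized distance evaluation. A standalone NumPy sketch (the `quantization_error_chunked` helper and the array sizes are illustrative, not MiniSom's code):

```python
import numpy as np

def quantization_error_chunked(data, weights, chunk_size=10_000):
    """Quantization error (mean distance to the closest weight vector) over the
    full dataset, evaluated in chunks so that only a (chunk_size x n_nodes)
    distance matrix is held in memory at a time."""
    total = 0.0
    w_sq = np.einsum('ij,ij->i', weights, weights)      # ||w||^2 per node
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        # ||x - w||^2 = ||x||^2 - 2 x.w + ||w||^2, without a 3D intermediate.
        d_sq = (np.einsum('ij,ij->i', chunk, chunk)[:, None]
                - 2.0 * (chunk @ weights.T)
                + w_sq[None, :])
        total += np.sqrt(np.maximum(d_sq, 0.0)).min(axis=1).sum()
    return total / len(data)

rng = np.random.default_rng(0)
data = rng.random((50_000, 6))    # stand-in for the real dataset
weights = rng.random((200, 6))    # stand-in codebook (real: 13225 x 6)
qe = quantization_error_chunked(data, weights)
```

The matrix-multiplication form of the squared distance avoids the per-sample Python loop that makes a naive implementation take hours.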
I'm hoping to apply MiniSom to a few very large datasets, on the order of 10e6–50e6 (10–50 million) instances with 6 dimensions. The data are normalized and are all floats.
The training time is no problem, typically <10 min. However, I'm implementing a function to optimize sigma and the learning rate by minimizing the quantization error, and at present the calculation time for the quantization error is so long (>24 hrs) that an iterative approach is not feasible.
If there is anything I can do to improve the calculation time for the quantization error, that would be great; otherwise I may have to use a different metric.
I should note that I am using a square map with side length sqrt(5*sqrt(instances)), i.e. roughly 5*sqrt(instances) nodes in total, as recommended in: Rojas, Ignacio, Gonzalo Joya, and Andreu Catala, eds. Advances in Computational Intelligence: 13th International Work-Conference on Artificial Neural Networks, IWANN 2015, Palma de Mallorca, Spain, June 10–12, 2015, Proceedings. Vol. 9094. Springer, 2015.
So for a dataset with 7,069,696 instances the map is 115 x 115 (13,225 nodes).
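The arithmetic behind that map size can be checked directly (a quick sketch; `som_grid_side` is a made-up helper name):

```python
import math

def som_grid_side(n_instances):
    """Side of a square SOM grid sized to roughly 5 * sqrt(N) neurons."""
    return round(math.sqrt(5 * math.sqrt(n_instances)))

side = som_grid_side(7_069_696)   # the dataset size quoted above
# side == 115, i.e. a 115 x 115 grid with 13,225 nodes
```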