Idea to Speed Similarity GMM Step #165

Open
spficklin opened this issue Jun 29, 2020 · 0 comments

@spficklin (Member)

When using Gaussian mixture models (GMMs), KINC can become extremely slow with large sample sizes (e.g., thousands of samples). However, it is probably not necessary to use every sample to identify the "modes". I propose the following:

  1. Rather than using all samples, fit the GMM to a randomly selected subset. Perhaps this could be as small as 30 samples, which should allow each GMM fit to run quickly.
  2. Repeat the GMM fit several times, each with a different randomly selected subset. This would allow different modes to be identified in different iterations.
  3. Select all non-overlapping clusters found across the iterations as the final set.
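A minimal sketch of steps 1 to 3, written in pure Python so it runs without dependencies. It is not KINC's implementation: it assumes each sample is a 2-D point (for example, the expression values of one gene pair), uses a hand-written EM fit with a fixed number of components `k`, and interprets "non-overlapping" as two clusters sharing at most 10% of the smaller cluster's samples. Clusters that reappear across iterations (Jaccard similarity of at least 0.9) are merged, and the most frequently rediscovered ones are selected first. All function names and defaults (`subsampled_gmm_clusters`, `subset_size=30`, `max_overlap`, and so on) are hypothetical.

```python
import math
import random

# Hypothetical sketch of the proposal: fit GMMs on small random subsets,
# repeat, then keep distinct, non-overlapping clusters. Not KINC code.


def gauss_pdf(x, y, mean, cov):
    """Density of a 2-D Gaussian; cov is ((sxx, sxy), (sxy, syy))."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    dx, dy = x - mean[0], y - mean[1]
    # Quadratic form dx^T * inv(cov) * dx via the closed-form 2x2 inverse.
    q = (d * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))


def fit_gmm_2d(points, k, rng, n_iter=50, reg=1e-6):
    """Plain EM for a k-component, full-covariance 2-D GMM."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    vx = sum((p[0] - mx) ** 2 for p in points) / n + reg
    vy = sum((p[1] - my) ** 2 for p in points) / n + reg
    # Start means at k random points, covariances at the overall spread.
    means = [tuple(p) for p in rng.sample(points, k)]
    covs = [((vx, 0.0), (0.0, vy)) for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x, y in points:
            r = [w * gauss_pdf(x, y, m, c) for w, m, c in zip(weights, means, covs)]
            s = sum(r) or 1e-300
            resp.append([ri / s for ri in r])
        # M-step: re-estimate weights, means and covariances.
        for j in range(k):
            rj = [r[j] for r in resp]
            nj = sum(rj) + 1e-12
            cx = sum(w * p[0] for w, p in zip(rj, points)) / nj
            cy = sum(w * p[1] for w, p in zip(rj, points)) / nj
            sxx = sum(w * (p[0] - cx) ** 2 for w, p in zip(rj, points)) / nj + reg
            syy = sum(w * (p[1] - cy) ** 2 for w, p in zip(rj, points)) / nj + reg
            sxy = sum(w * (p[0] - cx) * (p[1] - cy) for w, p in zip(rj, points)) / nj
            weights[j] = nj / n
            means[j] = (cx, cy)
            covs[j] = ((sxx, sxy), (sxy, syy))
    return weights, means, covs


def assign(points, model):
    """Label every sample with its most probable component."""
    weights, means, covs = model
    k = len(weights)
    return [max(range(k), key=lambda j: weights[j] * gauss_pdf(x, y, means[j], covs[j]))
            for x, y in points]


def jaccard(a, b):
    return len(a & b) / len(a | b)


def overlap(a, b):
    """Fraction of the smaller cluster's samples shared with the other."""
    return len(a & b) / min(len(a), len(b))


def subsampled_gmm_clusters(points, k, subset_size=30, n_rounds=10,
                            same_cluster=0.9, max_overlap=0.1, seed=0):
    rng = random.Random(seed)
    candidates = []  # [member index set, number of rounds that found it]
    for _ in range(n_rounds):
        # Steps 1-2: fit on a small random subset, then label all samples.
        subset = rng.sample(points, min(subset_size, len(points)))
        labels = assign(points, fit_gmm_2d(subset, k, rng))
        for j in range(k):
            members = frozenset(i for i, lab in enumerate(labels) if lab == j)
            if not members:
                continue
            for cand in candidates:
                if jaccard(cand[0], members) >= same_cluster:
                    cand[1] += 1  # same mode rediscovered in another round
                    break
            else:
                candidates.append([members, 1])
    # Step 3: take the most often rediscovered clusters that do not
    # overlap any cluster already kept.
    candidates.sort(key=lambda c: c[1], reverse=True)
    kept = []
    for members, _count in candidates:
        if all(overlap(members, other) <= max_overlap for other in kept):
            kept.append(members)
    return kept
```

For example, `subsampled_gmm_clusters(points, k=2, subset_size=30, n_rounds=10)` returns a list of sample-index sets. EM only ever runs on the 30-sample subset; the full data set is touched once per round, when every sample is labelled. Ranking candidates by how often they are rediscovered keeps a single unlucky subset (one that splits or merges modes) from displacing clusters that most iterations agree on.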

Just an idea...
