Idea to Speed Similarity GMM Step #165

spficklin · 2020-06-29T07:45:57Z

When using GMMs, KINC can get extremely slow with large sample sizes (e.g., thousands of samples). However, it is probably not necessary to use all of the samples to establish the "modes". I propose the following:

Rather than use all samples, use a randomly selected subset. Perhaps this could be as small as 30 samples? Using 30 should allow GMMs to run quickly.
Perform multiple GMM iterations with different randomly selected samples. This would allow for different modes to be identified in different iterations.
Select all non-overlapping clusters as the final set.
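
To make the three steps above concrete, here is a minimal Python sketch. It is not KINC code. Everything in it is an assumption made for illustration: the function names (`fit_gmm_1d`, `overlaps`, `subsampled_modes`), the use of one-dimensional toy data, the plain EM fit with a fixed number of components, and especially the overlap rule (two clusters overlap when their means are within `n_std` times the sum of their standard deviations). The proposal does not say how "non-overlapping" should be decided, so that rule is only a placeholder.

```python
import math
import random
import statistics


def fit_gmm_1d(values, k=2, n_iter=50):
    """Fit a k-component 1-D Gaussian mixture with plain EM.

    Returns a list of (weight, mean, std) tuples. Hypothetical helper,
    standing in for whatever GMM routine is actually used.
    """
    values = sorted(values)
    n = len(values)
    # Start the means at evenly spaced quantiles of the subset.
    means = [values[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    spread = statistics.pstdev(values) or 1.0
    stds = [spread] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value.
        # The 1/sqrt(2*pi) constant cancels in the normalization.
        resp = []
        for x in values:
            dens = [w * math.exp(-0.5 * ((x - m) / s) ** 2) / s
                    for w, m, s in zip(weights, means, stds)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and std of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-9:
                continue  # component is empty, leave it unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, values)) / nj
            stds[j] = max(math.sqrt(var), 1e-3)
    return list(zip(weights, means, stds))


def overlaps(a, b, n_std=2.0):
    """Assumed overlap rule: means closer than n_std * (std_a + std_b)."""
    return abs(a[1] - b[1]) < n_std * (a[2] + b[2])


def subsampled_modes(values, subset_size=30, n_rounds=10, k=2, seed=0):
    """Run a GMM on several small random subsets and keep the
    clusters that do not overlap any cluster kept so far."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_rounds):
        # Step 1: a random subset instead of all samples.
        subset = rng.sample(values, min(subset_size, len(values)))
        # Step 2: one GMM fit per round, each on a different subset.
        for cluster in fit_gmm_1d(subset, k):
            # Step 3: keep only clusters that are new (non-overlapping).
            if not any(overlaps(cluster, other) for other in kept):
                kept.append(cluster)
    return kept
```

Calling `subsampled_modes(data, subset_size=30, n_rounds=10)` on a list of floats returns the kept `(weight, mean, std)` tuples. Note that the merge step is greedy, so a poor fit in an early round can block better clusters found later; a real implementation would probably need a smarter rule for reconciling clusters across rounds.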

Just an idea....

spficklin added the discussion label Jun 29, 2020
