In 'multi' method, allow modes to disappear gracefully #8

kbarbary · 2015-09-04T21:59:43Z

In the multi-ellipsoidal algorithm, a cluster must have a minimum of ndim+1 member points. This means that two widely separated modes will not be split into separate clusters once one of the modes has fewer than ndim+1 points.
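A quick way to see where the ndim+1 limit comes from: with ndim or fewer points, the sample covariance is rank-deficient, so an ellipsoid built from it has zero volume. The sketch below uses only NumPy; `covariance_rank` is a hypothetical helper for illustration, not part of the library.

```python
import numpy as np

def covariance_rank(points):
    """Rank of the sample covariance of an (npoints, ndim) array.

    The centered data matrix has the same rank as the covariance,
    so we compute the rank of that directly.
    """
    points = np.atleast_2d(np.asarray(points, dtype=float))
    centered = points - points.mean(axis=0)
    return int(np.linalg.matrix_rank(centered))

ndim = 2
two_points = [[0.0, 0.0], [1.0, 1.0]]                 # ndim points
three_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]   # ndim + 1 points

# Two points in 2-d span only a line: the ellipsoid would be degenerate.
print(covariance_rank(two_points), "of", ndim)
# Three points in general position span the full plane.
print(covariance_rank(three_points), "of", ndim)
```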

For example, in 2-d, here is how two modes are separated when the lower peak has 3 points:

On a later iteration, one of the 3 points in the lower peak is discarded due to having the lowest likelihood. At that point, the separation looks like:

Possible Solution 1: "Freeze" the bounding ellipsoid for clusters that have ndim + 1 points. That ellipsoid will be used until all its points disappear. A little distasteful because it makes the ellipsoid decomposition "stateful": you can't just look at a set of points and see how the bounding ellipsoids will look; the answer depends on previous iterations.

Possible Solution 2: Relax the requirement that clusters have ndim + 1 points. The ellipsoid's dimensions would be expanded to fulfill a target volume. May lead to oversplitting into too many ellipsoids.
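One possible reading of "expand ellipsoid dimensions to fulfill a target volume" is sketched below. It assumes the ellipsoid is described by its semi-axis lengths, and that degenerate (zero-length) axes are filled in with a common length chosen so the volume matches the target. The function names, the `tol` threshold, and the filling rule are all assumptions made for illustration, not the library's implementation.

```python
import math
import numpy as np

def unit_ball_volume(ndim):
    """Volume of the unit ball in ndim dimensions."""
    return math.pi ** (ndim / 2) / math.gamma(ndim / 2 + 1)

def expand_axes_to_volume(axis_lengths, target_volume, tol=1e-12):
    """Return semi-axis lengths whose ellipsoid has at least target_volume.

    Hypothetical rule: axes shorter than `tol` are treated as degenerate
    and all set to one common length that makes the volume equal the
    target. If no axis is degenerate but the volume is too small, every
    axis is scaled up uniformly.
    """
    a = np.asarray(axis_lengths, dtype=float)
    ndim = a.size
    vball = unit_ball_volume(ndim)

    degenerate = a <= tol
    if degenerate.any():
        k = int(degenerate.sum())
        prod_rest = np.prod(a[~degenerate])  # empty product is 1.0
        fill = (target_volume / (vball * prod_rest)) ** (1.0 / k)
        out = a.copy()
        out[degenerate] = fill
        return out

    volume = vball * np.prod(a)
    if volume >= target_volume:
        return a.copy()
    return a * (target_volume / volume) ** (1.0 / ndim)

# Two points in 2-d give one real axis and one zero-length axis.
print(expand_axes_to_volume([2.0, 0.0], target_volume=math.pi))
```

How the target volume is chosen is the open question: too small a target is what would risk the oversplitting mentioned above.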

DBSCAN for mode identification might help with this. (If DBSCAN identifies a mode, always split, even if there are <= ndim points.)

A good test case to see how big a problem this is would be two N-d gaussians of different heights.
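A minimal sketch of such a test likelihood, assuming two isotropic Gaussians separated along the first axis. The separation, width, and height ratio are arbitrary illustrative values, and `make_two_gaussian_loglike` is a hypothetical helper.

```python
import numpy as np

def make_two_gaussian_loglike(ndim, sep=4.0, sigma=0.5, height_ratio=0.1):
    """Build a log-likelihood with two N-d Gaussian peaks of different heights.

    The first peak has height 1 (log-height 0); the second has height
    `height_ratio`. The peaks sit at -sep/2 and +sep/2 on the first axis.
    """
    mu1 = np.zeros(ndim)
    mu1[0] = -sep / 2.0
    mu2 = np.zeros(ndim)
    mu2[0] = sep / 2.0
    log_ratio = np.log(height_ratio)

    def loglike(x):
        x = np.asarray(x, dtype=float)
        r1 = np.sum((x - mu1) ** 2) / (2.0 * sigma ** 2)
        r2 = np.sum((x - mu2) ** 2) / (2.0 * sigma ** 2)
        # log(exp(-r1) + height_ratio * exp(-r2)), computed stably
        return np.logaddexp(-r1, log_ratio - r2)

    return loglike, mu1, mu2

loglike, mu1, mu2 = make_two_gaussian_loglike(ndim=2)
print(loglike(mu1), loglike(mu2))
```

As the run proceeds, the lower peak loses live points first, so the number of points it holds passes through ndim+1 and below, which is the situation described above.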

kbarbary modified the milestone: v0.2 Sep 4, 2015
