In the multi-ellipsoidal algorithm, a cluster must have a minimum of ndim+1 member points. This means that two widely separated modes will not be split into separate clusters if one mode has only a few points.
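Here is one way to see why ndim + 1 is the floor, assuming the bounding ellipsoid is derived from the points' sample covariance. With n points in ndim dimensions, the centered points span at most n - 1 directions. So for n <= ndim the covariance matrix is singular and the ellipsoid collapses to zero volume. The helper names below are hypothetical; this is a small NumPy check on fixed 2-d points:

```python
import numpy as np

def can_form_cluster(points):
    """Hypothetical check: True if `points` (shape (n, ndim)) meets the
    n >= ndim + 1 minimum for its own cluster."""
    n, ndim = points.shape
    return n >= ndim + 1

def covariance_rank(points):
    """Rank of the sample covariance. An ellipsoid built from it is
    degenerate (zero volume) when the rank is below ndim."""
    cov = np.cov(points, rowvar=False)
    return np.linalg.matrix_rank(np.atleast_2d(cov))

three = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # ndim + 1 points in 2-d
two = three[:2]                                          # after one is discarded

print(can_form_cluster(three), covariance_rank(three))  # True 2
print(can_form_cluster(two), covariance_rank(two))      # False 1
```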
For example, in 2-d, here is how the two modes are separated when the lower peak has 3 points:
On a later iteration, one of the 3 points in the lower peak is discarded because it has the lowest likelihood. The lower peak is then left with only 2 points (= ndim), too few to form its own cluster, and the separation looks like:
Possible solution 1: "Freeze" the bounding ellipsoid of any cluster that has ndim + 1 points. That ellipsoid would then be used until all of its points disappear. This is a little distasteful because it makes the ellipsoid decomposition "stateful": you can't just look at a set of points and see what the bounding ellipsoids will be; the answer depends on previous iterations.
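Below is a minimal sketch of the state that freezing would require. It assumes an ellipsoid is stored as a center `ctr` and a matrix `a`, with x inside iff (x - ctr)^T a (x - ctr) <= 1. `FrozenEllipsoid` and `prune_frozen` are hypothetical names. The list of frozen ellipsoids must be carried from one iteration to the next, which is exactly the statefulness concern above.

```python
import numpy as np

class FrozenEllipsoid:
    """Hypothetical record of an ellipsoid frozen when its cluster reached
    ndim + 1 points. It is kept until none of its member points remain."""

    def __init__(self, ctr, a, member_ids):
        self.ctr = np.asarray(ctr, dtype=float)  # center
        self.a = np.asarray(a, dtype=float)      # inside iff d^T a d <= 1
        self.member_ids = set(member_ids)        # ids of live points it bounds

    def contains(self, x):
        d = np.asarray(x, dtype=float) - self.ctr
        return float(d @ self.a @ d) <= 1.0

    def discard(self, point_id):
        """Forget a point that was discarded (lowest likelihood)."""
        self.member_ids.discard(point_id)


def prune_frozen(frozen, discarded_id):
    """Remove the discarded point from every frozen ellipsoid and keep only
    the ellipsoids that still have at least one member."""
    kept = []
    for e in frozen:
        e.discard(discarded_id)
        if e.member_ids:
            kept.append(e)
    return kept


frozen = [FrozenEllipsoid([5.0, 5.0], np.eye(2), member_ids=[0, 1, 2])]
for pid in (1, 0, 2):
    frozen = prune_frozen(frozen, pid)
    print(len(frozen))  # 1, 1, then 0 once the last member is gone
```

This sketch only lets membership shrink. It leaves open whether a newly sampled point that lands inside a frozen ellipsoid should join it.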
Possible solution 2: Relax the requirement that clusters have ndim + 1 points. The ellipsoids of small clusters would be expanded to fill a target volume. This may lead to oversplitting into too many ellipsoids.
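Here is one possible reading of solution 2, as a minimal sketch. It uses the standard fact that the ellipsoid {x : x^T a x <= 1} in n dimensions has volume V_n · det(a)^(-1/2), where V_n is the unit-ball volume. Scaling `a` by c therefore scales the volume by c^(-n/2). The covariance of <= ndim points is singular, so the sketch adds an isotropic `jitter` term before inverting. That regularization, and every name here, is an assumption rather than something specified in the issue.

```python
import math
import numpy as np

def unit_ball_volume(ndim):
    """Volume of the unit ball in `ndim` dimensions."""
    return math.pi ** (ndim / 2) / math.gamma(ndim / 2 + 1)

def ellipsoid_volume(a):
    """Volume of {x : x^T a x <= 1} for a positive-definite matrix `a`."""
    return unit_ball_volume(a.shape[0]) / math.sqrt(np.linalg.det(a))

def ellipsoid_for_small_cluster(points, target_volume, jitter=0.1):
    """Hypothetical sketch: bounding ellipsoid for a cluster that may have
    <= ndim points, expanded (never shrunk) to `target_volume`."""
    points = np.atleast_2d(np.asarray(points, dtype=float))
    n, ndim = points.shape
    ctr = points.mean(axis=0)
    cov = (np.cov(points, rowvar=False).reshape(ndim, ndim)
           if n > 1 else np.zeros((ndim, ndim)))
    # Singular when n <= ndim: add an isotropic term proportional to the
    # mean variance (or 1.0 for a single point).
    scale = np.trace(cov) / ndim
    cov = cov + jitter * (scale if scale > 0 else 1.0) * np.eye(ndim)
    a = np.linalg.inv(cov)
    # Rescale so every member point satisfies d^T a d <= 1.
    d = points - ctr
    m = np.max(np.einsum("ij,jk,ik->i", d, a, d))
    if m > 0:
        a = a / m
    # Multiplying a by c <= 1 can only enlarge the ellipsoid, so the points
    # stay inside; c is chosen so the volume reaches target_volume.
    c = min(1.0, (ellipsoid_volume(a) / target_volume) ** (2.0 / ndim))
    return ctr, c * a

ctr, a = ellipsoid_for_small_cluster([[0.0, 0.0], [1.0, 0.0]], target_volume=1.0)
print(ellipsoid_volume(a))  # ~1.0
```

The ellipsoid's extent along directions the few points don't span is set entirely by `jitter`, so it acts as a tuning knob. A generous `target_volume` applied to many tiny clusters is one way the oversplitting concern could show up.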
DBSCAN for mode identification might help with this: if DBSCAN identifies a mode, always split it off, even if it has <= ndim points.
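The issue doesn't say which DBSCAN implementation to use (scikit-learn's `sklearn.cluster.DBSCAN` is one option). The sketch below is a minimal O(n^2) NumPy version of the standard algorithm, so it stays self-contained. The `eps` and `min_samples` values in the example are assumptions. It shows that a 2-point mode in 2-d gets its own label even though 2 <= ndim.

```python
import numpy as np

def dbscan(points, eps, min_samples):
    """Minimal DBSCAN. Returns one label per point; -1 marks noise.
    A point's neighborhood includes itself, as in the usual definition."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    neighbors = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    is_core = np.array([len(nb) >= min_samples for nb in neighbors])
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        # Grow a new cluster from an unlabeled core point.
        labels[i] = cluster
        stack = [i]
        while stack:
            j = stack.pop()
            if not is_core[j]:
                continue  # border points join but do not expand the cluster
            for k in neighbors[j]:
                if labels[k] == -1:
                    labels[k] = cluster
                    stack.append(k)
        cluster += 1
    return labels

main = np.array([[0.2 * i, 0.2 * j] for i in range(5) for j in range(4)])
low = np.array([[10.0, 10.0], [10.3, 10.0]])  # a 2-point mode in 2-d
labels = dbscan(np.vstack([main, low]), eps=1.0, min_samples=2)
print(labels.max() + 1)  # 2
```

Under the rule proposed above, the second label would force a split even though the resulting cluster has only ndim points.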
A good test case for gauging how big a problem this is would be two N-d Gaussians with different peak heights.
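Here is a minimal sketch of such a test likelihood. The centers, width, and height ratio are illustrative assumptions. It returns the log of a sum of two isotropic Gaussians, computed stably with `np.logaddexp`. It could be paired with a uniform prior over a box that encloses both peaks.

```python
import numpy as np

def make_two_gaussian_loglike(ndim, sep=5.0, sigma=0.5, height_ratio=1e-3):
    """Hypothetical test likelihood: two isotropic Gaussians in `ndim`
    dimensions, centered at -sep/2 and +sep/2 along every axis, with the
    second peak `height_ratio` times as high as the first."""
    mu1 = np.full(ndim, -sep / 2)
    mu2 = np.full(ndim, sep / 2)
    log_h2 = np.log(height_ratio)

    def loglike(x):
        x = np.asarray(x, dtype=float)
        r1 = -0.5 * np.sum((x - mu1) ** 2) / sigma**2
        r2 = -0.5 * np.sum((x - mu2) ** 2) / sigma**2 + log_h2
        return np.logaddexp(r1, r2)  # log(exp(r1) + exp(r2))

    return loglike, mu1, mu2

loglike, mu1, mu2 = make_two_gaussian_loglike(ndim=2)
print(loglike(mu1), loglike(mu2))  # ~0.0 and ~log(1e-3)
```

Lowering `height_ratio` shrinks the share of live points that land in the second peak. That is the regime where it would hold only a handful of points and could fall below ndim + 1.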