Failure to split: component starvation #2

Closed
eywalker opened this issue Jan 4, 2014 · 4 comments

Comments

eywalker commented Jan 4, 2014

On some data sets, MoKSM fails to split a cluster because of premature component starvation, even when the data contain seemingly obvious split candidates. Running the following code produces the result shown in the MoKSM GUI below. Attempting to split the cluster further in the GUI fails with the same error: Splitting cluster.. aborted due to error: Component starvation: cluster 1. I've tried this under various parameter settings, but none of them appears to matter.

leo = fetch(acq.Sessions('session_datetime > "2010"','subject_id = 3',acq.Stimulation('exp_type like "ClassD%"')));
sortedSet = fetch(detect.Electrodes(leo) & sort.KalmanAutomatic);
electrode = sortedSet(50); % this is the electrode with particularly problematic result
% !! if you change above number from 50 -> 7, it shows a case where doing a
% manual split in GUI results in cluster with FP of 1282

% number of PCA components to be used
numComponent = 8;

%model = MoKsmInterface(electrodes(ind));
model = MoKsmInterface(electrode);
model = getFeatures(model, 'PCA', numComponent);

% Set up model parameters
params = model.params;
switch(numComponent)
    case 3 % this is the same setting as used in tetrodes
        params.ClusterCost = 0;
        params.Df = 9;
        params.CovRidge = 1.5;%%1.5;
        params.DriftRate = 300 / 3600 / 1000;
        params.DTmu = 100 * 1000;
        params.Tolerance = 0.0001;%0.0005;
        params.Verbose = true;
end
model.params = params;

fitted = fit(model);
m = ManualClustering(fitted);

[Image: screenshot of the MoKSM GUI result (not preserved)]

Owner

aecker commented Jan 4, 2014

This issue is most likely not a bug but a problem with the data. Component starvation means that one of the components had fewer data points assigned to it than are necessary to estimate its covariance matrix. This usually happens once you start overfitting (obviously not the case here) or if there are outliers in the data set.
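As a rough illustration of the starvation condition described above, the sketch below counts the points assigned to each component and flags any component with too few points for a full covariance estimate. The variable names, the hard assignments, and the threshold of D + 1 points are assumptions made for this example; the check MoKsm actually performs may differ.

```matlab
% Hypothetical example: hard assignments of N points to K components
% in a D-dimensional feature space (not taken from the MoKsm code).
D = 8;                                      % feature dimensionality
assign = [ones(1, 500), 2 * ones(1, 5)];    % component 2 gets only 5 points
K = max(assign);

% Number of points assigned to each component
counts = accumarray(assign(:), 1, [K 1]);

% A sample covariance of n points has rank at most n - 1, so at least
% D + 1 points are needed for a non-singular D-by-D estimate.
minPoints = D + 1;                          % assumed threshold
starved = find(counts < minPoints);

fprintf('Starved components: %s\n', mat2str(starved'));
```

With soft (posterior) assignments, the same idea would apply to the summed posterior weight per component instead of a hard count.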

There are two things you can try to diagnose the problem:

  1. Plot the peak-to-peak amplitudes or first PC versus time to check if there are obvious outliers. If that's the case, try removing them manually and re-run the algorithm.

  2. Run MoKsm with verbose=true and maybe have it plot the data after every few iterations (insert a call to plot() in the EM iteration) to see what's happening. Most likely one of the two components converges towards a small number of data points and gets a weird shape.
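The first diagnostic could look roughly like the sketch below. It uses synthetic data in place of real recordings, and the variable names (t, w), the MAD-based rule, and the threshold of 10 are assumptions for illustration, not part of MoKsm.

```matlab
% Hypothetical inputs: t (spike times, N-by-1) and w (waveforms, samples-by-N).
% Synthetic data stand in for real recordings here.
rng(0);
N = 1000;
t = sort(rand(N, 1)) * 3600;             % spike times in seconds
w = 50 * randn(32, N);                   % fake waveforms
w(:, [100 500]) = 5000 * randn(32, 2);   % two artificial outliers

% Peak-to-peak amplitude of every spike
p2p = (max(w, [], 1) - min(w, [], 1))';

% Robust z-score based on the median absolute deviation (MAD)
med = median(p2p);
madVal = median(abs(p2p - med));
z = (p2p - med) / (1.4826 * madVal);
isOut = abs(z) > 10;                     % threshold is an arbitrary choice

% Amplitude versus time, with flagged spikes highlighted
plot(t(~isOut), p2p(~isOut), '.k'); hold on
plot(t(isOut), p2p(isOut), 'or'); hold off
xlabel('time (s)'); ylabel('peak-to-peak amplitude');

keep = ~isOut;   % logical mask of spikes to retain before re-running the fit
```

The same plot with the first principal component on the y-axis would serve the same purpose; visual inspection is still advisable before discarding any spikes.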

Author

eywalker commented Jan 5, 2014

I've tried what you suggested: removing outliers and plotting during the EM iterations. However, even when using only the feature dimension with the greatest separation between the two clusters, I observed that the two cluster means converge towards each other, with one cluster eventually dropping out. I played around with the parameter settings, but again this doesn't seem to help.

I have created a test case on the debug branch of eywalker/moksm (eywalker/moksm@eb282e9). It would be great if you could take a look at it by running through the testMoKsm script.

Owner

aecker commented Jan 5, 2014

I think the problem is the scale of the data. It has to be in µV, but your data seem to be on a different scale (judging from the feature vs. time plot in the image above). In that case the CovRidge and DriftRate parameters (and possibly others that are sensitive to the scale of the data) need to be scaled down accordingly.
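To see why the scale matters, the sketch below compares a fixed ridge value with the variance of the same synthetic data at two scales. It assumes, as the name suggests but as this thread does not confirm, that CovRidge acts like a constant added to the diagonal of each covariance matrix. The data, the factor of 100, and the variable names are illustrative only.

```matlab
rng(1);
X = 50 * randn(8, 2000);     % synthetic 8-dimensional features, larger scale
Xsmall = X / 100;            % same data on a scale 100 times smaller

ridge = 1.5;                 % value of params.CovRidge quoted in this thread

C = cov(X');                 % 8-by-8 sample covariance, larger scale
Csmall = cov(Xsmall');       % covariance shrinks by a factor of 100^2

varRatio = mean(diag(C)) / mean(diag(Csmall));
fprintf('variance ratio between scales:  %.1f\n', varRatio);
fprintf('ridge / variance (large scale): %.4f\n', ridge / mean(diag(C)));
fprintf('ridge / variance (small scale): %.4f\n', ridge / mean(diag(Csmall)));
```

Because variance scales with the square of the data, a ridge that is negligible on one scale can dominate the estimated covariance on a much smaller one. Under the stated assumption, either the data or the scale-sensitive parameters would need to be adjusted so that the two match.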

Author

eywalker commented Jan 5, 2014

Scaling was indeed the problem: scaling up the data by a factor of 100 did the trick, and the data now cluster correctly with the parameter settings I had configured previously. It looks like, for some reason, this particular data set breaks assumptions made in the gain adjustment inside the SpikeSortingHelper. I'll work on correcting the issue there. Thanks!

eywalker closed this as completed Jan 5, 2014