
Spike cross-correlation histogram #40

Closed
wants to merge 1 commit

Conversation

mdenker
Member

@mdenker mdenker commented Jun 16, 2015

To complement and add to the discussion initiated by PR #39, this is an alternate draft implementation of the CCH function we have been preparing. Without having looked at #39 in detail yet, I think the main difference in terms of algorithm is that this implementation does not rely on creating a binary vector, but uses the information of the sparse matrix representation.

Altered function and parameter names of core CCH function

Clarified in documentation which spike train is lagging and which is
leading for CCH calculation.

Use new binning function in spikecorr.cch().

Removed old code from cch() function that is not needed with
BinnedSpikeTrain.

Simplified the way in which the smoothing kernel is supplied to the
cch() function.

Only arrays are allowed as kernels; the option to supply a string that
triggered a call to a specific numpy function to create the kernel
internally was removed.

Added TODOs for the future to cch() function header.

Long function name is better than cch

Changed the output of the cch to a neo.AnalogSignalArray and adjusted the
corresponding unit test

Fixed bug in the transformation of the histogram to AnalogSignalArray;
previously the axes of the signal were inverted. Corrected the unit test
accordingly

Fixed bug in the border_correction

Moved CCH function to the correct elephant module.

Added TODO where it is unclear how to compute the default window size.

Latest additions from code sprint.

Removed wrong auto-import

Fixed bug in the time axis of the cch: the center bin is now zero and the axis is symmetric with respect to the center bin. Also adjusted the time axis check in the unit test, which no longer fails

Changed the definition of the histogram length to the length of the longest spike train. Included two more tests, one for different bin sizes and one for when a window size is given.

Fixed bug where normalization and window options did not work together
(use of Hlen instead of Hbins)

3: Cross-correlation Histogram

Task-Url: http://github.com/INM-6/elephant/issues/issue/3
Signed-off-by: Michael Denker <m.denker@fz-juelich.de>

Put all data items of cch unit test into setup

Full coverage achieved with the test functions. Ready for pull request.

3: Cross-correlation Histogram

Task-Url: http://github.com/INM-6/elephant/issues/issue/3

Changed definition of time axis in CCH to be left bins. Smaller fixes in
documentation throughout.

Added an example for the CCH to the documentation.
3: Cross-correlation Histogram

Task-Url: http://github.com/INM-6/elephant/issues/issue/3

Changed all cch() calls to the full function name in unit tests
3: Cross-correlation Histogram

Task-Url: http://github.com/INM-6/elephant/issues/issue/3

Small changes to make unit tests more legible
3: Cross-correlation Histogram

Task-Url: http://github.com/INM-6/elephant/issues/issue/3

Change in error message
@btel

btel commented Jun 16, 2015

hi Michael - this is perfect. I was sure someone must have worked on something similar. It just proves that there is much need for this functionality.

Our implementations indeed differ in the representation of spikes. It would be nice to do some benchmarking to see which one is faster in several use cases (for short and long spike trains). I have another implementation which is closer to yours, but uses numba for better performance.

Another question I have is why you decided to take binned spike trains as the argument. Since you are working with a sparse spike time representation, you could just take the original spike train object.

Finally, I noticed that you can use kernel for smoothing. Another thing worth adding would be the spike dithering method.

@mdenker
Member Author

mdenker commented Jun 17, 2015

Regarding your question on the use of the binned object, we use this in order to define the binning in a consistent way with other functions. Here, the binning parameters are important for the "binary" option (or, clipped spike trains), and in order to make the normalization consistent with the classic xcorr()-with-a-binary-vector method.
Indeed, another thing to implement would be something like a cch_continuous() function, which does not bin at all and thus works directly on the neo.SpikeTrain object.

In terms of profiling, it's a good idea to get a feeling. However, the main motivation for the sparse representation function was also the memory efficiency aspect, so possibly it's even a good idea to consolidate the two functions in terms of input parameters, and keep both approaches.
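Just to fix the idea, a minimal sketch of what such a cch_continuous could look like (hypothetical function on plain unbinned spike-time arrays, with the binning applied only to the final lag histogram):

    import numpy as np

    def cch_continuous_sketch(times1, times2, binsize, window):
        # all pairwise lags t2 - t1; memory grows as len(times1) * len(times2)
        lags = (times2[np.newaxis, :] - times1[:, np.newaxis]).ravel()
        # keep only lags inside the requested window
        lags = lags[np.abs(lags) <= window]
        # bin the continuous lags only at the very end
        edges = np.arange(-window, window + binsize, binsize)
        counts, edges = np.histogram(lags, bins=edges)
        return counts, edges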

@btel

btel commented Jun 17, 2015

It makes sense to have two implementations of cch - one for SpikeTrain objects and one for BinnedSpikeTrain. However, using two different functions might be confusing. One could test in the implementation of cch which object was passed and then choose the right algorithm. As you say, the problem is then how to pass the extra binsize parameter to the SpikeTrain-based implementation.

I agree that each of the methods has its use cases - yours should work best for sparse firing (low firing rates) and very fine binning, mine for high firing rates and coarse binning. I think we can safely assume that people have >= 2 GB of RAM these days, which should set the limits of how much memory we are ok to allocate.

I will try to do some benchmarks later this week. BTW, it might be a good idea to have automated benchmarks akin to pandas.

@alperyeg
Contributor

I also think that having two functions for doing the same thing is confusing. Thus, I would opt for a version which can distinguish between BinnedSpikeTrain objects and SpikeTrain objects.
The parameters could be passed within a parameter dictionary (as **kwargs). This may not be a good design choice, since it packs together two different things, the binning parameters and the cch parameters. This can be discussed further.

With the newest optimisation of the binning class (see #32), speed at least shouldn't be a problem.

@btel it would be interesting to hear more about the automated pandas benchmarking. Maybe as a new suggestion (issue).

@btel

btel commented Jun 17, 2015

This would need some more design work, but I imagine one could match the positional arguments to the implementation in a multiple-dispatch way and raise an exception when a function with an appropriate signature is not found. For example,

cch(BinnedSpikeTrain1, BinnedSpikeTrain2) calls cch_binned
cch(SpikeTrain1, SpikeTrain2, binsize) calls cch_continuous
cch(SpikeTrain1, SpikeTrain2) raises an exception.

(There is a python library called multipledispatch, but I have no experience with it.)
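For illustration, a hand-rolled version of that dispatch could look like the following minimal sketch (hypothetical: plain isinstance checks instead of a dispatch library, assuming BinnedSpikeTrain is importable from elephant.conversion and that cch_binned / cch_continuous exist):

    import neo
    from elephant.conversion import BinnedSpikeTrain

    def cch(st1, st2, binsize=None, **kwargs):
        # dispatch on the argument types, as listed above
        if isinstance(st1, BinnedSpikeTrain) and isinstance(st2, BinnedSpikeTrain):
            return cch_binned(st1, st2, **kwargs)
        if isinstance(st1, neo.SpikeTrain) and isinstance(st2, neo.SpikeTrain):
            if binsize is None:
                raise ValueError("binsize is required for SpikeTrain inputs")
            return cch_continuous(st1, st2, binsize, **kwargs)
        raise TypeError("no cch implementation for these argument types")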

Regarding the performance, I think that the bottleneck is in the calculation of the cch. Michael uses a for loop to iterate over spikes. My implementation in #39 uses np.convolve on binned spike trains, so that the computation and memory scale with the number of bins.
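For reference, that convolution-based variant boils down to roughly this minimal sketch on plain binned count vectors (not the exact #39 code):

    import numpy as np

    def cch_conv_sketch(counts1, counts2, hist_bins):
        # counts1, counts2: binned spike counts of equal length
        # full discrete cross-correlation via convolution with the reversed train
        full = np.convolve(counts2, counts1[::-1], mode='full')
        center = len(counts1) - 1  # index of zero lag
        # keep only lags in -hist_bins, ..., +hist_bins
        return full[center - hist_bins:center + hist_bins + 1]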

    # Compute the CCH at lags in -hist_bins,...,hist_bins only
    for r, i in enumerate(st1_bin_idx_unique):
        timediff = st2_bin_idx_unique - i
        timediff_in_range = np.all(

This is rather inefficient, because for each spike you create a new boolean array timediff_in_range. Since the spikes are sorted, you could use np.searchsorted instead to find the indices of the first and last spike in the window.
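A minimal sketch of that suggestion, reusing the variable names from the hunk above (assuming st2_bin_idx_unique is a sorted 1-d array of bin indices):

    import numpy as np

    for r, i in enumerate(st1_bin_idx_unique):
        # binary search for the first/last st2 spike within +/- hist_bins of i
        left = np.searchsorted(st2_bin_idx_unique, i - hist_bins, side='left')
        right = np.searchsorted(st2_bin_idx_unique, i + hist_bins, side='right')
        timediff = st2_bin_idx_unique[left:right] - i  # only in-range lags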

@mdenker
Member Author

mdenker commented Jun 17, 2015

So, I think we need to clarify the terminology a bit. As I understand it, we have three concepts; let me suggest terms to distinguish them in order not to get confused:

  • cch_conv: binned, using binned binary vectors and convolve or similar
  • cch_sparse: binned, using the sparse binned representation of spike trains
  • cch_continuous: continuous, which bins only after calculating all continuous, non-binned spike time differences

Please correct me if you see it differently.

I think indeed cch_conv and cch_sparse should be in a single function if both are deemed desirable. Both should in my opinion take the BinnedSpikeTrain as input, in order to make this conversion transparent to the user.

cch_continuous is not only a different function call accepting Neo objects, but it is also conceptually different as it will lead to a different result (spikes are not "split" across bin borders). Therefore I opt to have this as a different function.

In either case, I would suggest focusing on the first two, cch_conv and cch_sparse, for now. I would opt for an option method='sparse' or method='conv' to allow selecting either, with one being the default.
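A minimal sketch of such a signature, just to fix the idea (hypothetical names; _cch_sparse and _cch_conv stand for the two algorithms discussed here):

    def cross_correlation_histogram(binned_st1, binned_st2, method='sparse',
                                    window=None, kernel=None):
        # both algorithms take BinnedSpikeTrain inputs; 'method' selects one
        if method == 'sparse':
            return _cch_sparse(binned_st1, binned_st2, window=window, kernel=kernel)
        elif method == 'conv':
            return _cch_conv(binned_st1, binned_st2, window=window, kernel=kernel)
        raise ValueError("method must be 'sparse' or 'conv'")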

@btel

btel commented Jun 17, 2015

I see the same three functions that you listed. However, for cch_continuous you also need to bin the cross-correlogram, and that's why you need the binning argument. Effectively, the functions won't be very different: all of them should return the lags and the correlogram, just calculated slightly differently. I guess when a user passes a SpikeTrain object, she or he should expect that the spikes won't be binned.

The selection between cch_conv vs cch_sparse could be automatic based on the size of data (number of spikes and number of bins). Plus the option to override the automatic selection.

We should probably start with just a single, most complete implementation, but keep the API open so that other implementations can be included in the future.

@alperyeg
Contributor

@mdenker and @btel please have a look here #63

@apdavison
Member

Replaced by #63

@apdavison apdavison closed this Jan 14, 2016
@pietroquaglio pietroquaglio deleted the feature/cch_proposed branch March 1, 2016 11:25