
Spike cross-correlation histogram #40

Closed
wants to merge 1 commit

Conversation

mdenker
Member

@mdenker mdenker commented Jun 16, 2015

To complement and add to the discussion initiated by PR #39, this is an alternate draft implementation of the CCH function we have been preparing. Without having looked at #39 in detail yet, I think the main difference in terms of algorithm is that this implementation does not rely on creating a binary vector, but uses the information of the sparse matrix representation.

Altered function and parameter names of core CCH function

Clarified in documentation which spike train is lagging and which is
leading for CCH calculation.

Use new binning function in spikecorr.cch().

Removed old code from cch() function that is not needed with
BinnedSpikeTrain.

Simplified the way in which the smoothing kernel is supplied to the
cch() function.

Only arrays are allowed as kernels; the option to supply a string that
triggered a call to a specific numpy function to create the kernel
internally was removed.

Added TODOs for the future to cch() function header.

Long function name is better than cch

Changed the output of the cch to a neo.AnalogSignalArray and adjusted the
corresponding unit test

Fixed bug in the transformation of the histogram to AnalogSignalArray;
previously the axes of the signal were inverted. Corrected the unit test
accordingly

Fixed bug in the border_correction

Moved CCH function to the correct elephant module.

Added TODO where it is unclear how to compute the default window size.

Latest additions from code sprint.

Removed wrong auto-import

Fixed bug in the time axis of the cch: the center bin is now zero and the axis is symmetric with respect to the center bin. Also adjusted the time axis check in the unit test, which no longer fails

Changed the definition of the histogram length to the length of the longest spike train. Included two more tests, one for different bin sizes and one for when a window size is given.

Fixed bug where normalization and window options did not work together
(use of Hlen instead of Hbins)

3: Cross-correlation Histogram

Task-Url: http://github.com/INM-6/elephant/issues/issue/3
Signed-off-by: Michael Denker <m.denker@fz-juelich.de>

Put all data items of cch unit test into setup

Full coverage achieved with the test functions. Ready for pull request.

3: Cross-correlation Histogram

Task-Url: http://github.com/INM-6/elephant/issues/issue/3

Changed definition of time axis in CCH to be left bins. Smaller fixes in
documentation throughout.

Added an example for the CCH to the documentation.
3: Cross-correlation Histogram

Task-Url: http://github.com/INM-6/elephant/issues/issue/3

Changed all cch() calls to the full function name in unit tests
3: Cross-correlation Histogram

Task-Url: http://github.com/INM-6/elephant/issues/issue/3

Small changes to make unit tests more legible
3: Cross-correlation Histogram

Task-Url: http://github.com/INM-6/elephant/issues/issue/3

Change in error message
@btel

btel commented Jun 16, 2015

hi Michael - this is perfect. I was sure someone must have worked on something similar. It just proves that there is much need for this functionality.

Our implementations indeed differ in the representation of spikes. It would be nice to do some benchmarking to see which one is faster in several use cases (for short and long spike trains). I have another implementation which is closer to yours, but uses numba for better performance.

Another question I have is why you decided to take binned spike trains as the argument. Since you are working with a sparse spike time representation, you could just take the original spike train object.

Finally, I noticed that you can use kernel for smoothing. Another thing worth adding would be the spike dithering method.

@mdenker
Member Author

mdenker commented Jun 17, 2015

Regarding your question on the use of the binned object, we use this in order to define the binning in a consistent way with other functions. Here, the binning parameters are important for the "binary" option (or, clipped spike trains), and in order to make the normalization consistent with the classic xcorr()-with-a-binary-vector method.
Indeed, another thing to implement would be something like a cch_continuous() function, which does not bin at all and thus works directly on the neo.SpikeTrain object.

In terms of profiling, it's a good idea to get a feeling. However, the main motivation for the sparse representation function was also the memory efficiency aspect, so possibly it's even a good idea to consolidate the two functions in terms of input parameters, and keep both approaches.
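Just to fix the idea, a minimal sketch of what such a cch_continuous could look like (hypothetical function on plain unbinned spike-time arrays, with the binning applied only to the final lag histogram):

    import numpy as np

    def cch_continuous_sketch(times1, times2, binsize, window):
        # all pairwise lags t2 - t1; memory grows as len(times1) * len(times2)
        lags = (times2[np.newaxis, :] - times1[:, np.newaxis]).ravel()
        # keep only lags inside the requested window
        lags = lags[np.abs(lags) <= window]
        # bin the continuous lags only at the very end
        edges = np.arange(-window, window + binsize, binsize)
        counts, edges = np.histogram(lags, bins=edges)
        return counts, edges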

@btel

btel commented Jun 17, 2015

It makes sense to have two implementations of cch - one for SpikeTrain objects and one for BinnedSpikeTrain. However, using two different functions might be confusing. One could test in the implementation of cch which object was passed and then choose the right algorithm. As you say, the problem is then how to pass the extra binsize parameter to the SpikeTrain-based implementation.

I agree that each of the methods has its use cases - yours should work best for sparse firing (low firing rates) and very fine binning, mine for high firing rates and coarse binning. I think we can safely assume that people have >= 2 GB of RAM these days, which should set the limits of how much memory we are ok to allocate.

I will try to do some benchmarks later this week. BTW, it might be a good idea to have automated benchmarks akin to pandas.

@alperyeg
Contributor

I also think that having two functions for doing the same thing is confusing. Thus, I would opt for a version which can distinguish between BinnedSpikeTrain objects and SpikeTrain objects.
The parameters could be passed within a parameter dictionary (as **kwargs). This may not be a good design choice, since it packs together two different things, the binning parameters and the cch parameters. This can be discussed further.

With the newest optimisation of the binning class (see #32), speed at least shouldn't be a problem.

@btel it would be interesting to hear more about the automated pandas benchmarking. Maybe as a new suggestion (issue).

@btel

btel commented Jun 17, 2015

This would need some more design work, but I imagine one could match the positional arguments to the implementation in a multiple-dispatch way and raise an exception when a function with an appropriate signature is not found. For example,

cch(BinnedSpikeTrain1, BinnedSpikeTrain2) calls cch_binned
cch(SpikeTrain1, SpikeTrain2, binsize) calls cch_continuous
cch(SpikeTrain1, SpikeTrain2) raises an exception.

(There is a python library called multipledispatch, but I have no experience with it.)
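For illustration, a hand-rolled version of that dispatch could look like the following minimal sketch (hypothetical: plain isinstance checks instead of a dispatch library, assuming BinnedSpikeTrain is importable from elephant.conversion and that cch_binned / cch_continuous exist):

    import neo
    from elephant.conversion import BinnedSpikeTrain

    def cch(st1, st2, binsize=None, **kwargs):
        # dispatch on the argument types, as listed above
        if isinstance(st1, BinnedSpikeTrain) and isinstance(st2, BinnedSpikeTrain):
            return cch_binned(st1, st2, **kwargs)
        if isinstance(st1, neo.SpikeTrain) and isinstance(st2, neo.SpikeTrain):
            if binsize is None:
                raise ValueError("binsize is required for SpikeTrain inputs")
            return cch_continuous(st1, st2, binsize, **kwargs)
        raise TypeError("no cch implementation for these argument types")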

Regarding the performance, I think that the bottleneck is in the calculation of the cch. Michael uses a for loop to iterate over spikes. My implementation in #39 uses np.convolve on binned spike trains, so that the computation and memory scale with the number of bins.
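For reference, that convolution-based variant boils down to roughly this minimal sketch on plain binned count vectors (not the exact #39 code):

    import numpy as np

    def cch_conv_sketch(counts1, counts2, hist_bins):
        # counts1, counts2: binned spike counts of equal length
        # full discrete cross-correlation via convolution with the reversed train
        full = np.convolve(counts2, counts1[::-1], mode='full')
        center = len(counts1) - 1  # index of zero lag
        # keep only lags in -hist_bins, ..., +hist_bins
        return full[center - hist_bins:center + hist_bins + 1]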

    # Compute the CCH at lags in -hist_bins,...,hist_bins only
    for r, i in enumerate(st1_bin_idx_unique):
        timediff = st2_bin_idx_unique - i
        timediff_in_range = np.all(

This is rather inefficient, because for each spike you create a new boolean array timediff_in_range. Since the spikes are sorted, you could use np.searchsorted instead to find the indices of the first and last spike in the window.
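A minimal sketch of that suggestion, reusing the variable names from the hunk above (assuming st2_bin_idx_unique is a sorted 1-d array of bin indices):

    import numpy as np

    for r, i in enumerate(st1_bin_idx_unique):
        # binary search for the first/last st2 spike within +/- hist_bins of i
        left = np.searchsorted(st2_bin_idx_unique, i - hist_bins, side='left')
        right = np.searchsorted(st2_bin_idx_unique, i + hist_bins, side='right')
        timediff = st2_bin_idx_unique[left:right] - i  # only in-range lags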

@mdenker
Member Author

mdenker commented Jun 17, 2015

So, I think we need to clarify the terminology a bit. As I understand it, we have three concepts; let me suggest terms to distinguish them in order not to get confused:

  • cch_conv: binned, using binned binary vectors and convolve or similar
  • cch_sparse: binned, using the sparse binned representation of spike trains
  • cch_continuous: continuous, which bins only after calculating all continuous, non-binned spike time differences

Please correct me if you see it differently.

I think indeed cch_conv and cch_sparse should be in a single function if both are deemed desirable. Both should in my opinion take the BinnedSpikeTrain as input, in order to make this conversion transparent to the user.

cch_continuous is not only a different function call accepting Neo objects, but it is also conceptually different as it will lead to a different result (spikes are not "split" across bin borders). Therefore I opt to have this as a different function.

In either case, I would suggest focusing on the first two, cch_conv and cch_sparse, for now. I would opt for an option method='sparse' or method='conv' to allow selecting either, with one being the default.
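A minimal sketch of such a signature, just to fix the idea (hypothetical names; _cch_sparse and _cch_conv stand for the two algorithms discussed here):

    def cross_correlation_histogram(binned_st1, binned_st2, method='sparse',
                                    window=None, kernel=None):
        # both algorithms take BinnedSpikeTrain inputs; 'method' selects one
        if method == 'sparse':
            return _cch_sparse(binned_st1, binned_st2, window=window, kernel=kernel)
        elif method == 'conv':
            return _cch_conv(binned_st1, binned_st2, window=window, kernel=kernel)
        raise ValueError("method must be 'sparse' or 'conv'")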

@btel

btel commented Jun 17, 2015

I see the same three functions that you listed. However, for cch_continuous you also need to bin the cross-correlogram, and that's why you need the binning argument. Effectively, the functions won't be very different: all of them should return the lags and the correlogram, just calculated slightly differently. I guess when a user passes a SpikeTrain object, she or he should expect that the spikes won't be binned.

The selection between cch_conv vs cch_sparse could be automatic based on the size of data (number of spikes and number of bins). Plus the option to override the automatic selection.

We should probably start with just a single, most complete implementation, but keep the API open so that other implementations can be included in the future.

@alperyeg
Contributor

@mdenker and @btel please have a look here #63

@apdavison
Member

Replaced by #63

@apdavison apdavison closed this Jan 14, 2016
@pietroquaglio pietroquaglio deleted the feature/cch_proposed branch March 1, 2016 11:25