# Example: Data conversion for SPOTDis and SpikeShip algorithms

Suppose your data come in the following format:
+ a `1xn` array called `raw_spike_times`, which contains the times of all $n$ spikes,
+ a `1xn` array called `neuron_ids`, which contains the neuron identity of each spike (with $N$ distinct neurons in total), and
+ an `Mx2` array called `event_intervals`, which contains the start (first column) and stop (second column) times of all $M$ trials.

In this tutorial, I introduce a simple method to convert and process an input dataset in this format.
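
To make the expected input concrete, here is a minimal sketch of a toy dataset in this format. The values are made up for illustration (6 spikes, 3 neurons, 2 trials); only the array names and shapes follow the description above.

```python
import numpy as np

# Toy recording (illustrative values): 6 spikes from 3 neurons, split into 2 trials.
raw_spike_times = np.array([0.10, 0.25, 0.40, 0.55, 0.70, 0.95])  # one time per spike
neuron_ids = np.array([0, 2, 1, 0, 1, 2])                         # one neuron label per spike
event_intervals = np.array([[0.0, 0.5],                           # trial 0: start, stop
                            [0.5, 1.0]])                          # trial 1: start, stop

n = raw_spike_times.shape[0]        # number of spikes
N = np.unique(neuron_ids).shape[0]  # number of distinct neurons
M = event_intervals.shape[0]        # number of trials

# The two per-spike arrays must be aligned element by element.
assert neuron_ids.shape == raw_spike_times.shape
# Every trial must have a positive duration.
assert np.all(event_intervals[:, 1] > event_intervals[:, 0])
print(n, N, M)
```

Note that `neuron_ids` has one entry per spike, so its length is $n$, while $N$ is the number of distinct labels it contains.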

In [1]:
import numpy as np

**Generating data**


In [2]:
num_spikes = 10_000 # number of spikes
N = 100  # number of neurons
M = 200 # number of trials

In [3]:
raw_spike_times = np.random.random(size=(num_spikes))
raw_spike_times.shape

(10000,)

In [4]:
neuron_ids = np.random.randint(low=0, high=N, size=(num_spikes), dtype=np.int16)
neuron_ids.shape

(10000,)

Defining epochs based on start and end times of trials:

In [5]:
intervals = np.linspace(start=0, stop=1, num=(M+1))
intervals

array([0.   , 0.005, 0.01 , 0.015, 0.02 , 0.025, 0.03 , 0.035, 0.04 ,
       0.045, 0.05 , 0.055, 0.06 , 0.065, 0.07 , 0.075, 0.08 , 0.085,
       0.09 , 0.095, 0.1  , 0.105, 0.11 , 0.115, 0.12 , 0.125, 0.13 ,
       0.135, 0.14 , 0.145, 0.15 , 0.155, 0.16 , 0.165, 0.17 , 0.175,
       0.18 , 0.185, 0.19 , 0.195, 0.2  , 0.205, 0.21 , 0.215, 0.22 ,
       0.225, 0.23 , 0.235, 0.24 , 0.245, 0.25 , 0.255, 0.26 , 0.265,
       0.27 , 0.275, 0.28 , 0.285, 0.29 , 0.295, 0.3  , 0.305, 0.31 ,
       0.315, 0.32 , 0.325, 0.33 , 0.335, 0.34 , 0.345, 0.35 , 0.355,
       0.36 , 0.365, 0.37 , 0.375, 0.38 , 0.385, 0.39 , 0.395, 0.4  ,
       0.405, 0.41 , 0.415, 0.42 , 0.425, 0.43 , 0.435, 0.44 , 0.445,
       0.45 , 0.455, 0.46 , 0.465, 0.47 , 0.475, 0.48 , 0.485, 0.49 ,
       0.495, 0.5  , 0.505, 0.51 , 0.515, 0.52 , 0.525, 0.53 , 0.535,
       0.54 , 0.545, 0.55 , 0.555, 0.56 , 0.565, 0.57 , 0.575, 0.58 ,
       0.585, 0.59 , 0.595, 0.6  , 0.605, 0.61 , 0.615, 0.62 , 0.625,
       0.63 , 0.635,

For this example, I'll consider consecutive epochs, each of length $1/M$.

In [6]:
event_intervals = np.zeros(shape=(M,2))
for i in range(M):
    event_intervals[i][0], event_intervals[i][1] = intervals[i], intervals[i+1]
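
The same array can probably be built without an explicit loop. The sketch below is one possible vectorized variant (the name `event_intervals_vec` is introduced here only for illustration), together with checks that the epochs are consecutive and of length $1/M$.

```python
import numpy as np

M = 200  # number of trials, as in the cells above
intervals = np.linspace(start=0, stop=1, num=M + 1)

# Pair each boundary with the next one: column 0 = start, column 1 = stop.
event_intervals_vec = np.column_stack((intervals[:-1], intervals[1:]))

assert event_intervals_vec.shape == (M, 2)
# Consecutive epochs: each stop time equals the next start time.
assert np.array_equal(event_intervals_vec[:-1, 1], event_intervals_vec[1:, 0])
# Every epoch has length 1/M (up to floating-point error).
assert np.allclose(event_intervals_vec[:, 1] - event_intervals_vec[:, 0], 1 / M)
```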

In [7]:
event_intervals

array([[0.   , 0.005],
       [0.005, 0.01 ],
       [0.01 , 0.015],
       [0.015, 0.02 ],
       [0.02 , 0.025],
       [0.025, 0.03 ],
       [0.03 , 0.035],
       [0.035, 0.04 ],
       [0.04 , 0.045],
       [0.045, 0.05 ],
       [0.05 , 0.055],
       [0.055, 0.06 ],
       [0.06 , 0.065],
       [0.065, 0.07 ],
       [0.07 , 0.075],
       [0.075, 0.08 ],
       [0.08 , 0.085],
       [0.085, 0.09 ],
       [0.09 , 0.095],
       [0.095, 0.1  ],
       [0.1  , 0.105],
       [0.105, 0.11 ],
       [0.11 , 0.115],
       [0.115, 0.12 ],
       [0.12 , 0.125],
       [0.125, 0.13 ],
       [0.13 , 0.135],
       [0.135, 0.14 ],
       [0.14 , 0.145],
       [0.145, 0.15 ],
       [0.15 , 0.155],
       [0.155, 0.16 ],
       [0.16 , 0.165],
       [0.165, 0.17 ],
       [0.17 , 0.175],
       [0.175, 0.18 ],
       [0.18 , 0.185],
       [0.185, 0.19 ],
       [0.19 , 0.195],
       [0.195, 0.2  ],
       [0.2  , 0.205],
       [0.205, 0.21 ],
       [0.21 , 0.215],
       [0.2

**Data transformation**

In [8]:
def to_spikeship_dataformat(raw_spike_times, neuron_ids, epoch_intervals):
    """
    Convert spike times to epoch-relative times and return `spike_times` and
    `ii_spike_times` indices in SpikeShip's data format.

    Parameters
    ----------
    raw_spike_times : numpy.ndarray
        `1xn` array with the times of all `n` spikes in the dataset.
    neuron_ids : numpy.ndarray
        `1xn` array with the identity of the neuron that fired each spike
        (`N` distinct neurons in total).
    epoch_intervals : numpy.ndarray
        `Mx2` array, where `M` is the number of trials or epochs.
        First and second columns are start and stop times, respectively.

    Returns
    -------
    spike_times : numpy.ndarray
        Array with spike times relative to the start of their epoch.
    ii_spike_times : numpy.ndarray
        `(M,N,2)`-Array with indices per epoch (M) and neuron (N).
    """

    M = epoch_intervals.shape[0]
    # distinct neuron labels; one pair of indices is built per label and epoch
    unique_ids = np.unique(neuron_ids)
    N = unique_ids.shape[0]

    spike_times = []
    ii_spike_times = []

    # running offset into the concatenated `spike_times` array
    count_st = 0

    for i_M in range(M):
        t_start = epoch_intervals[i_M][0]
        t_end = epoch_intervals[i_M][1]

        temp_ii_spike_times = []
        for i_N in range(N):
            # iterate over distinct neurons, not over the first N spikes
            neuron_id = unique_ids[i_N]
            indices = np.where(neuron_ids == neuron_id)[0]

            mask = ((raw_spike_times[indices] >= t_start) & (raw_spike_times[indices] < t_end))

            current_count = int(np.sum(mask))

            if current_count > 0:
                temp_spike_train = raw_spike_times[indices][mask]
                # spike times are relative to trial onset
                temp_spike_train = temp_spike_train - t_start
                spike_times.append(temp_spike_train)

            # start (inclusive) and stop (exclusive) positions of this spike
            # train within the concatenated `spike_times` array
            temp_ii_spike_times.append([count_st, (count_st+current_count)])
            count_st += current_count
        ii_spike_times.append(temp_ii_spike_times)

    spike_times = np.concatenate(spike_times)
    ii_spike_times = np.array(ii_spike_times)

    return spike_times, ii_spike_times
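
The nested loops above rescan the spike arrays once per epoch and neuron. As a possible alternative, the sketch below assigns every spike to its epoch with `np.searchsorted` and counts spikes per (epoch, neuron) cell with `np.bincount`. It is not part of SpikeShip: the function name `to_spikeship_dataformat_sorted` is hypothetical, and it assumes that epochs are non-overlapping and sorted by start time, and that each index pair means start (inclusive) and stop (exclusive) positions in `spike_times`.

```python
import numpy as np

def to_spikeship_dataformat_sorted(raw_spike_times, neuron_ids, epoch_intervals):
    """Hypothetical vectorized variant; assumes sorted, non-overlapping epochs."""
    # Map arbitrary neuron labels to 0..N-1.
    unique_ids, neuron_idx = np.unique(neuron_ids, return_inverse=True)
    M, N = epoch_intervals.shape[0], unique_ids.shape[0]

    # Index of the last epoch whose start is <= the spike time (-1 if none).
    starts = epoch_intervals[:, 0]
    epoch_idx = np.searchsorted(starts, raw_spike_times, side="right") - 1
    safe_idx = np.clip(epoch_idx, 0, None)
    # Keep only spikes that fall inside [start, stop) of that epoch.
    valid = (epoch_idx >= 0) & (raw_spike_times < epoch_intervals[safe_idx, 1])

    # Spike times relative to epoch onset, and a flat (epoch, neuron) cell index.
    rel_times = raw_spike_times[valid] - starts[epoch_idx[valid]]
    cell = epoch_idx[valid] * N + neuron_idx[valid]

    # Group spikes by cell: epoch-major, then neuron, keeping original order.
    order = np.argsort(cell, kind="stable")
    spike_times = rel_times[order]

    # Spike count per cell -> start/stop offsets into `spike_times`.
    counts = np.bincount(cell, minlength=M * N)
    stops = np.cumsum(counts)
    ii_spike_times = np.stack((stops - counts, stops), axis=1).reshape(M, N, 2)
    return spike_times, ii_spike_times

# Toy check (illustrative values): 6 spikes, 3 neurons, 2 epochs.
raw = np.array([0.10, 0.25, 0.40, 0.55, 0.70, 0.95])
ids = np.array([0, 2, 1, 0, 1, 2])
epochs = np.array([[0.0, 0.5], [0.5, 1.0]])
st, ii = to_spikeship_dataformat_sorted(raw, ids, epochs)
print(ii.shape)  # (2, 3, 2)
```

Within each (epoch, neuron) cell, spikes keep the order they had in `raw_spike_times`, because the sort is stable.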


In [9]:
%%time
spike_times, ii_spike_times = to_spikeship_dataformat(
    raw_spike_times, neuron_ids, event_intervals
    )

CPU times: user 403 ms, sys: 4.95 ms, total: 408 ms
Wall time: 407 ms


After the conversion, the spike times in `spike_times` are relative to trial onset, and `ii_spike_times` holds a pair of indices for each of the `M` trials and `N` neurons:

In [10]:
np.min(spike_times), np.max(spike_times)

(6.616630418010416e-07, 0.004999820382463521)

In [11]:
ii_spike_times.shape

(200, 100, 2)
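
To illustrate how such an index array might be used, here is a small sketch on hand-built toy arrays. It assumes that each pair in `ii_spike_times` holds the start (inclusive) and stop (exclusive) positions of one spike train inside `spike_times`; the helper `get_spike_train` is hypothetical and not part of SpikeShip.

```python
import numpy as np

# Hand-built toy output for M = 2 trials and N = 2 neurons (assumed layout).
spike_times = np.array([0.1, 0.3, 0.2, 0.4, 0.15])
ii_spike_times = np.array([[[0, 2], [2, 3]],   # trial 0: neuron 0 has 2 spikes, neuron 1 has 1
                           [[3, 3], [3, 5]]])  # trial 1: neuron 0 has none, neuron 1 has 2

def get_spike_train(spike_times, ii_spike_times, trial, neuron):
    """Return the relative spike times of one neuron in one trial."""
    start, stop = ii_spike_times[trial, neuron]
    return spike_times[start:stop]

# Spikes per (trial, neuron) cell; an empty train has start == stop.
counts = ii_spike_times[..., 1] - ii_spike_times[..., 0]

train = get_spike_train(spike_times, ii_spike_times, trial=1, neuron=1)
print(train)   # [0.4  0.15]
print(counts)  # [[2 1]
               #  [0 2]]

# Every spike belongs to exactly one cell.
assert counts.sum() == spike_times.shape[0]
```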

---