Intro

Extracellular recordings are taken using an electrode in the brain to monitor the firing activity of one or more neurons. A crucial problem in this field of research is that the signals from several neurons of varying proximity to the electrode are detected. Spike sorting is the process of deconstructing the recording into its constituent waveforms to attribute each spike (action potential) to a single neuron. This is important since different neurons typically fire for different functions, and accurate labelling is vital in determining the firing pattern of a collection of neurons and their relation to one another.

This report evaluates two spike sorting algorithms for the detection and classification of action potentials in a voltage recording made by a simple bipolar electrode inserted into the cortical region of the brain. Simulated data has been generated for training purposes and contains a single time-domain recording of the spikes from four types of neuron with a low level of noise. The algorithms are then validated on real data taken from a human subject, expected to contain spikes from four neurons of equivalent morphology to those in the synthetic data. The real data has a much higher degree of noise, related to non-stationary activity by the subject.

Spike Detection

Spike Detection: Manual Amplitude Thresholding

The various spike detection approaches reviewed so far have demonstrated that more advanced techniques can be used to generalise a solution to several datasets. For instance, adaptive amplitude thresholding does not require manual parameter setting, and algorithms suitable for online spike detection have also been discussed. Yet since the assignment included only two datasets, containing what are expected to be spikes from the same four types of neuron, these extensions are less relevant. As such, spikes were detected using amplitude thresholding, which could be set appropriately by inspecting the data and was favoured for its simplicity and fulfilment of the requirements of this problem. A comparator mask was applied to remove all amplitudes below the set threshold, separating the peaks from the noise. A comparison between neighbouring points then identified the peaks as those points larger than both of their neighbours. This was an effective solution in almost all cases, yet could be improved. One immediate pitfall could be addressed by refining the detection algorithm: peaks that returned more than one count, due to more than one local maximum as seen in Figure 4, were filtered to remove repeat counts (false positives) within a given window. To resemble the labelled dataset more closely, the detected spike indices were also shifted back by roughly 8 points. This allowed detected spikes to be used during training, ensuring inputs used for training and prediction were generated in the same way, whilst still fulfilling the requirement for labelled data. Measures were also taken to minimise false negatives, discussed below.
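
The detection logic described above can be summarised in a short sketch. This is illustrative only: the threshold, the refractory window (`min_gap`) and the index offset are assumptions to be tuned against the data, not the report's exact values.

```python
import numpy as np

def detect_spikes(trace, threshold, min_gap=30, offset=8):
    """Detect spike indices by amplitude thresholding (illustrative sketch)."""
    # Comparator mask: keep only samples exceeding the threshold.
    above = trace > threshold
    # A peak is a sample above threshold that is larger than both neighbours.
    peaks = np.where(above[1:-1]
                     & (trace[1:-1] > trace[:-2])
                     & (trace[1:-1] > trace[2:]))[0] + 1
    # Remove repeat counts (false positives) within a given window by
    # keeping a peak only if it falls at least min_gap samples after the last.
    accepted = []
    for p in peaks:
        if not accepted or p - accepted[-1] >= min_gap:
            accepted.append(p)
    # Shift indices back so they resemble the labelled data, which is
    # aligned at the start of the rising edge rather than at the peak.
    return np.array(accepted) - offset
```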

Filtering: Savitzky-Golay and Butterworth Filters

In order to reduce the noise in the signal, a bandpass filter was first used to remove the frequencies outside the range 300–3000 Hz prior to detecting the spikes. This range was taken from literature, but it was validated as containing the desired frequencies by reviewing the frequency spectrum of the original signal. As seen in the red trace in Figure 4, although this reduced noise, it also reduced the amplitude of the signal and thus changed the spike morphology, as forewarned by Quiroga (2009) in the section Spike Detection. A Savitzky-Golay (Savgol) filter could be used to smooth the signal whilst better maintaining the original spike shape. The mechanics of these two filters were combined in the following way to leverage the strengths of each. A high-pass filter was applied before peak detection, since its characteristics meant the filtered signal's amplitudes were always centred about zero, with any large signal deviations being removed. This was particularly important for processing the submission dataset, since the noise introduced by the subject's movement caused significant drift in the zero reference of the original signal. By using a high-pass filter, the constant amplitude threshold was still effective, and it was not necessary to adopt any of the alternatives discussed in Spike Detection. The Savgol filter was then applied to smooth the original signal, benefitting peak detection by reducing the number of false positives caused by excessive noise. In addition, a less conservative threshold could be set to minimise the number of missed spikes (false negatives). Finally, the spike waveforms were extracted from the HP-Savgol-filtered signal (without the bandpass filter), minimising the impact of phase shifting on the spike morphologies used for training the models. The benefits of this filtering approach can be seen in the filtered submission data in Figure 5.
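
A minimal sketch of this filtering stage using SciPy is given below. The 300 Hz corner follows the lower bound quoted above, but the sampling rate, filter order and Savgol window are assumptions rather than the report's exact settings.

```python
from scipy import signal

FS = 25_000  # sampling rate in Hz (assumed; use the recording's actual rate)

def filter_trace(raw, fs=FS):
    # High-pass filter: centres the trace about zero and removes the
    # baseline drift introduced by the subject's movement.
    sos = signal.butter(3, 300, btype="highpass", fs=fs, output="sos")
    high_passed = signal.sosfiltfilt(sos, raw)
    # Savitzky-Golay filter: smooths the trace while preserving spike
    # morphology better than a bandpass filter would.
    return signal.savgol_filter(high_passed, window_length=17, polyorder=3)
```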

Waveform Extraction

The putative waveforms were then extracted by retrieving the signal values for a window either side of each detected spike index. By detecting the spikes using the methodology discussed, the spike waveforms were automatically aligned at their peaks, as per the suggestion of Rey, et al., unlike the given labelled data, which was aligned at the start of the rising edge. This set of values formed the inputs to both solutions, and how closely they resembled the underlying neuron's action potential would heavily influence the final classification performance. The average waveforms were visualised for each class, as seen in Figure 6, to ensure the extracted waveforms were as expected. Additionally, these could have been used to create templates for post-processing template matching, as presented by Wouters, et al. and referred to in the section Classification using Neural Networks.
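
Extraction reduces to slicing a fixed window around each peak. In this sketch the window bounds are assumed values (the window size is one of the parameters optimised later), and `filtered` and `spike_indices` continue the hypothetical names from the sketches above.

```python
import numpy as np

def extract_waveforms(filtered, spike_indices, pre=15, post=26):
    """Slice a fixed window of samples either side of each detected peak."""
    waveforms = []
    for idx in spike_indices:
        # Skip peaks too close to the edges of the recording.
        if idx - pre >= 0 and idx + post <= len(filtered):
            waveforms.append(filtered[idx - pre : idx + post])
    return np.vstack(waveforms)  # one row per putative spike
```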

Solution 1: PCA + KNN

Justification of Solution Choice

As mentioned in Feature Extraction, more than half of the papers published and covered in an academic review used PCA for feature extraction. Since the first solution would be designed without any prior expertise in spike sorting, it seemed logical to use the most common approach seen in use so far. In addition, PCA algorithms have been used enough to become readily available in third-party packages such as Python's Scikit-Learn. PCA was applied to the detected spikes dataset, combining this with the labels from the labelled dataset as discussed in Spike Detection: Manual Amplitude Thresholding.

Explanation: PCA Feature Extraction

The first two components were used to create the two-dimensional component space seen below in Figure 7, which can already be seen to contain four clear clusters. Since the training data contained relatively little noise, the waveforms producing points further from the centres of these clusters most likely result from spike overlaps. This is expected when considering the number of waveforms in each class with non-characteristic morphologies, visualised in Figure 6. The cumulative explained variance was plotted to select the number of components to use when fitting the model to the data. As seen in Figure 9, 97% of the data's variance is explained by the first 8 components.
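
This step maps directly onto Scikit-Learn's PCA; a sketch, assuming `waveforms` is the matrix of extracted spikes from the previous stage:

```python
import numpy as np
from sklearn.decomposition import PCA

# Project the extracted waveforms onto the first 8 principal components,
# which are found to explain ~97% of the variance.
pca = PCA(n_components=8)
components = pca.fit_transform(waveforms)

# Cumulative explained variance, used to choose the number of components.
print(np.cumsum(pca.explained_variance_ratio_))
```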

Explanation: Classification with K-Nearest Neighbour

When reviewing the component space generated above, no clear distribution shape was recognised. As such, the irregular decision boundaries could be suitably addressed by using a K-Nearest Neighbour (KNN) classification algorithm. The discrete labelled data made the supervised learning approach of the KNN classifier a suitable choice, in addition to the fact that there were no online classification requirements. A range of values was tried for the number of neighbours (K) evaluated when classifying a test waveform, and for the number of components extracted during the PCA stage. A higher value of K is better able to suppress noise, at the cost of blurring the classification boundaries.
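
A sketch of the classification stage, again with Scikit-Learn; the train/test split ratio is an assumption, while `components` and `labels` continue the hypothetical names used above:

```python
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X_train, X_test, y_train, y_test = train_test_split(
    components, labels, test_size=0.2, random_state=0)

# K=6 neighbours gave the best result reported below.
knn = KNeighborsClassifier(n_neighbors=6)
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))  # classification accuracy
```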

Evaluation of Classification Performance

The best result of 93.9% accuracy was achieved using the first 8 principal components and evaluating the 6 nearest neighbours. A significant portion of false positives is attributable to class 2 spikes being classified as class 3. False positives could occur for a number of reasons, such as inaccuracies in the labelled data (the labelled data was found to contain 9 spikes given two different labels). More likely, when two or more neurons fire in close succession, or even in bursts as described by Rey, et al., the extracted spike waveform can contain values from another spike, shifting its position in the feature space and causing it to be assigned to the wrong class. The lower recall for class 2 may simply be because the similarity in spike morphology between classes 2 and 3, seen in Figure 6, makes the two difficult to distinguish, and vice versa for class 4, which has a spike shape most unlike the others. These points are referred to in Conclusion & Confidence Level.
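
Per-class errors such as the class 2/3 confusion described above can be inspected with a confusion matrix; a sketch continuing the names from the previous snippet:

```python
from sklearn.metrics import classification_report, confusion_matrix

y_pred = knn.predict(X_test)
print(confusion_matrix(y_test, y_pred))       # rows: true class, columns: predicted
print(classification_report(y_test, y_pred))  # per-class precision and recall
```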

Solution 2: MLP

Justification of Solution Choice

As discussed in the previous section, spikes that fire in close succession to one another lead to spike overlap. The extracted spike waveform then contains segments from another spike, leading to a different position in the feature space. It is believed the problem of spike overlap cannot be solved in the feature space. [16] [22] As such, a neural network-based approach was implemented to avoid extending the traditional spike sorting pipeline centred on feature extraction, as many have tried previously. Furthermore, ANNs are increasingly justified by their ability to process large amounts of data in a semi-automatic way, a growing need as studies with multiple electrodes simultaneously measuring several hundred data channels become increasingly frequent. [3] Accordingly, a multi-layer perceptron (MLP) model was used for the classification stage in the second solution.

Explanation: MLP Architecture

The size of the extracted waveform window determined the number of neurons in the input layer of the MLP, as the signal values formed the inputs to the network. Only one hidden layer was used, and a variety of sizes were tried. Again, the network was trained on the detected spikes dataset, attaching the labels from the given labelled dataset. The output layer contained 4 nodes, one per spike class. A network with more hidden layers would have been desirable, as deeper networks have been seen in studies to achieve higher performance.
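
The report implements its own network, but an equivalent architecture can be sketched with Scikit-Learn's MLPClassifier as a stand-in. The hidden-layer size here is an assumption (it is the parameter optimised later), and `waveforms` and `labels` are the hypothetical names from the earlier sketches.

```python
from sklearn.neural_network import MLPClassifier

# Input size = window length (41 samples in the extraction sketch above);
# one hidden layer of assumed size; 4 output classes, one per neuron type.
mlp = MLPClassifier(hidden_layer_sizes=(20,),
                    activation="logistic",  # sigmoid activation, as described
                    learning_rate_init=0.1,
                    max_iter=500)
mlp.fit(waveforms, labels)
```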

Explanation: Training Process

The network used the sigmoid activation function for each neuron, with a basic difference error function. The learning rate was initially set at 0.1 but decayed by 40% every 6 epochs in order to more precisely target the local minimum. Training was stopped early once the patience window of 4 epochs had passed and the variance of the trailing 5 performance measures had fallen below 0.05. The first 80% of the full training dataset of 1.44M rows was used for training, with the remaining 20% used for validation during training. Results show the network achieving over 97.0% accuracy on the validation dataset.
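
The step-decay schedule and stopping rule described here reduce to a few lines; a sketch, with the decay, patience and variance thresholds taken directly from the text:

```python
import numpy as np

def learning_rate(epoch, lr0=0.1, decay=0.4, step=6):
    # Decay the learning rate by 40% every 6 epochs.
    return lr0 * (1.0 - decay) ** (epoch // step)

def should_stop(val_history, patience=4, window=5, var_threshold=0.05):
    # Stop once the patience window has passed and the trailing
    # validation measures have settled to a low variance.
    return (len(val_history) > patience
            and np.var(val_history[-window:]) < var_threshold)
```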

Optimisation Algorithm: SA

Justification of Optimisation Approach

The optimisation was performed on the neural network solution to find the model architecture able to achieve the highest classification accuracy. In order to minimise the dimensions of the configuration space, parameters that principally affect the training time, such as the learning rate and the number of epochs, were not included, although they should be considered in succeeding development efforts. The numbers of neurons in the input layer and the hidden layer were optimised. For the hidden layer, this is set by a configurable parameter upon instantiation of the network, whereas for the input layer it is determined by the size of the window used to extract the spike waveforms from the signal. These are two closely related parameters, since the weights connecting the two layers must be able to translate the complexity of the data in the first layer to the next in order to propagate the information forward and train the network effectively. A variable learning rate is implemented through the step decay previously described in Explanation: Training Process, although this is considered a conventional form of neural network optimisation. Other hyperparameters that could be optimised are the number of hidden layers and their connectivity (convolutional layers, fully connected layers, pooling layers), the way the weights are initialised, and the training algorithms.

Evaluation of Alternative Optimisation Algorithms

Several papers were reviewed to identify the following three optimisation approaches as the most commonly used with neural networks (a small subset of the many metaheuristic frameworks): particle swarm optimisation (PSO), genetic algorithms (GA), and simulated annealing (SA). Gradient-based methods, such as stochastic gradient descent, were not considered since the solution space is expected to contain several local minima, increasing the likelihood of gradient-based approaches finding a sub-optimal solution, whereas a global search of the problem space is desired. [23]

In GAs, an initially random population of candidate solutions (genes) is created and evaluated using an objective function. The best performing candidates are selected, mutated and made subject to a mechanism equivalent to gene crossover to form the next generation. Studies show that GAs do not guarantee global convergence and are recommended to be compared with other optimisation methods. [23] GAs also require a minimum population size to exhibit enough diversity within the gene pool for an acceptable portion of the solution space to be explored. As described by Bergstra, et al. (2011), a budgeting choice must be made of how many CPU cycles will be spent on hyperparameter exploration. It was decided that training this many networks over several generations would be too costly, and that a near-optimal solution would instead be targeted using a less computationally intensive optimisation method. However, a GA may be suitable in future work where the activation function of the neurons could also be considered, due to its superior ability to explore discontinuous spaces.

PSO models the behaviour seen in schools of fish and flocks of birds. Several candidate solutions (particles) are created to leverage both their own knowledge of the local solution space and a "social" knowledge gained from communication with the other particles. PSO often converges to better solutions more quickly than a GA and has a superior ability to find the global optimum.

Simulated annealing was chosen, however, for its simplicity over both of the aforementioned approaches, since it generates only a single solution as opposed to a population or a swarm, and since the solution space is relatively small. Further, prior knowledge about the problem, built up over the course of the assignment, can be used to start the optimisation from a known well-performing network configuration. In simulated annealing, the performance of an initial candidate solution is quantified using the given cost function. A change is then made to the solution and the performance of the new solution is compared. The new solution is accepted based on an acceptance probability, which is determined by the difference in performance and the current temperature. The temperature reduces from its initial value as solution changes are made; as it cools, the likelihood of accepting new solutions reduces and the search converges towards the best solution found within the explored space. Because there remains a chance that a less-than-optimal solution is taken forward, the chance of becoming stuck in a local minimum is reduced.
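
The acceptance rule and cooling schedule described above can be captured in a compact loop. This is a generic SA sketch rather than the report's exact implementation: the initial temperature, cooling factor and neighbour function are assumptions, and `cost` would be something like one minus the validation accuracy of a trained network.

```python
import math
import random

def simulated_annealing(initial, cost, neighbour,
                        t0=1.0, cooling=0.9, iterations=20):
    """Minimise `cost` over network configurations via simulated annealing."""
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    temperature = t0
    for _ in range(iterations):
        candidate = neighbour(current)    # e.g. perturb hidden-layer or window size
        candidate_cost = cost(candidate)  # e.g. 1 - validation accuracy
        delta = candidate_cost - current_cost
        # Always accept improvements; accept worse solutions with a
        # probability that shrinks as the temperature cools, reducing the
        # chance of becoming stuck in a local minimum.
        if delta < 0 or random.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
        if current_cost < best_cost:
            best, best_cost = current, current_cost
        temperature *= cooling
    return best, best_cost
```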

Evaluation of Results

A simulated annealing optimisation was run, generating roughly 20 model iterations over a range of hidden-node counts and window sizes. Due to the dependency on the initial conditions, several runs were performed and the best final result selected, seen in Figure 11 to achieve a final accuracy of 97.98%, an increase in the maximum performance achieved of roughly 1%. The optimisation took under 15 minutes to train 20 networks for 25 epochs each, and it is thought that a population-based optimisation using PSO or GA may be more suitable for circumstances that permit dedicated hardware. In that case, more hyperparameters could be included in the optimisation, such as those listed in Justification of Optimisation Approach. Between consecutive optimisation runs, the solutions were found to converge to very different network architectures. This highlighted the number of local minima present within the parameter space, which may be a result of the interrelatedness of the two parameters. Therefore, further work should be carried out to optimise the parameters of the optimiser itself, ensuring the greatest portion of the parameter space is explored and maximising the probability of finding the global optimum. The remaining wrongly classified spikes may be addressable with post-processing methods.

Conclusion & Confidence Level

The neural network has been shown to train to a higher performance than the K-nearest neighbour algorithm on the supplied synthesised extracellular neuron recordings (see Figure 8 and Figure 10). The higher precision and recall of the MLP solution has been attributed to its ability to resolve spike overlap (see Figure 6) outside of the feature space. A simulated annealing optimisation approach was then used, chosen largely for its simplicity of implementation and the relatively small parameter space it would optimise, and was able to improve the performance of the network classifier by a further 1%, reaching a final accuracy of 97.98%.

Confidence Level

The spike sorting algorithm applied to the submission dataset returned 3519 spikes, classified in the following proportions: 961 to class 1, 834 to class 2, 955 to class 3, and 769 to class 4. This class distribution resembles that of the synthetic recording, as does the total number of spikes, which is taken as a positive sign since the synthetic data is a simulation based on the original raw data. In order to validate the classification performance, the average waveforms of each class in the submission data (Figure 12) can be compared to those seen in Figure 6. These show large numbers of waveforms distinct from the class average, which could indicate that several spikes have been wrongly classified. This is likely due to the higher levels of noise, which could be validated in future runs by applying noise to the synthetic data and comparing results.

Thank you for your attention.