# neuro boot camp day 1
## wrangling ephys data

## contents
* [1. types of data neuroscientists acquire](#data)
* [2. voltage trace as a vector (everything is a vector)](#vect)
* [3. peak detection (from scratch)](#peak)
* [4. simple spike sorting](#sort)
* [5. filtering: low, high pass; notch; baseline](#filt)
* [6. spike trains, raster plots](#plot)



## 0. preliminaries

(FYI: if you are an advanced student and you breeze through these exercises, I would point you at [Neural Data Science](https://www.sciencedirect.com/book/9780128040430/neural-data-science) by Nylen and Wallisch. You can push yourself to work through some of the more advanced examples there. Everyone else may find it a useful set of prompts to consider down the road.)

In [None]:
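import numpy as np
import matplotlib
# open figures in separate interactive Tk windows (IPython magic; notebook only)
%matplotlib tk

import matplotlib.pyplot as pl  # note the alias: pl, not the more common plt

import scipy.io  # readers for MATLAB .mat files (e.g. loadmat)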

<a id="data"></a>
## 1. types of data neuroscientists acquire

**Exercise 1**: Load dataset.npy, and tell me what's in it.
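
A minimal sketch of how one might start poking at a `.npy` file, assuming nothing about what `dataset.npy` actually holds. The helper name `summarize` and the stand-in file `demo.npy` are made up for illustration; swap in the real path when you do the exercise.

```python
import os
import tempfile

import numpy as np

def summarize(obj):
    """Return a one-line description of whatever np.load handed back."""
    if isinstance(obj, np.ndarray):
        return f"ndarray shape={obj.shape} dtype={obj.dtype}"
    return type(obj).__name__

# Hypothetical stand-in file so the sketch runs anywhere;
# for the exercise, replace demo_path with 'dataset.npy'.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.npy")
np.save(demo_path, np.zeros((3, 1000)))

demo = np.load(demo_path)
print(summarize(demo))
```

If `np.load` complains about pickled objects, the file holds Python objects rather than a plain numeric array; `np.load(path, allow_pickle=True)` is the usual workaround for files you trust.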

<a id="vect"></a>
## 2. voltage trace as a vector (everything is a vector)

<a id="peak"></a>
## 3. peak detection (from scratch)

Python has libraries for peak detection, but I think doing some manual peak finding is a useful way to hone your Python skills and to think about ephys traces.

For the early parts of this exercise, we'll start with a simple sine wave. But even this step requires a little bit of thought. We're not going to find peaks on an abstract or analog sine wave, but rather on one that is explicitly sampled over time. Play with the sampling frequency (Fs) and the sine wave frequency (f) in the code block below to build some intuition about how Fs needs to relate to f in order to pick out individual peaks in our sine wave.

(add during lecture: link to discussion of Nyquist limit)
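
Here is a rough sketch for building that intuition, under the assumption that "peak" means a sample strictly larger than both of its neighbours. The helper names `sampled_sine` and `count_local_maxima` are invented for this example. The Nyquist limit says Fs has to exceed 2f for the frequency to be represented at all; in practice you want many samples per cycle before the peaks look like peaks.

```python
import numpy as np

def sampled_sine(f, Fs, duration=1.0):
    """Return (t, y) for a sine of frequency f (Hz) sampled at Fs (Hz)."""
    n = int(round(duration * Fs))      # number of samples
    t = np.arange(n) / Fs              # sample times in seconds
    return t, np.sin(2 * np.pi * f * t)

def count_local_maxima(y):
    """Count interior samples that are strictly larger than both neighbours."""
    mid = y[1:-1]
    return int(np.sum((mid > y[:-2]) & (mid > y[2:])))

f = 8  # sine frequency in Hz
for Fs in (1000, 100, 20, 10):
    t, y = sampled_sine(f, Fs)
    print(f"Fs = {Fs:4d} Hz: {Fs / f:6.1f} samples per cycle, "
          f"{count_local_maxima(y)} local maxima found")
```

One second of an 8 Hz sine has 8 true peaks, so any Fs that reports a different count is telling you something about undersampling.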

**Exercise 3:** write a function that will find the local maxima in a sine wave.

In [None]:
# use this code block to develop your function

def find_maxima():
    
    result = None  # your code here
    
    return result  # what should result look like?



In [None]:
# interlude
#
# what should the input sinewave look like??

# exercise 3.1a: plot 1 second of a sine wave of frequency 8 Hz.

In [None]:
# run your function and print output describing local maxima

In [None]:
# let's embellish find_maxima by having it plot the input together with the detected peaks.

Let's extend our use of this function to actual ephys data. Will it work on one of the traces in _dataset.npy_? What additional concerns crop up?

For reference, see the [MATLAB findpeaks documentation](http://www.mathworks.com/help/signal/ref/findpeaks.html) and their [peak finding tutorial](https://www.mathworks.com/help/signal/examples/peak-analysis.html). Features worth borrowing:
* prominence
* min peak distance
* height threshold

![alt text](http://www.mathworks.com/help/examples/signal/win64/DeterminePeakWidthsExample_02.png "peak features")
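
As one possible way to bolt two of those features onto a from-scratch peak finder, here is a sketch that adds a height threshold and a minimum peak distance (prominence is left out). The function name `find_peaks_simple` and its parameters are made up for this example, and it is not a reimplementation of MATLAB's `findpeaks`.

```python
import numpy as np

def find_peaks_simple(y, min_height=None, min_distance=1):
    """Indices of strict local maxima, filtered by height and spacing."""
    y = np.asarray(y, dtype=float)
    # interior samples strictly larger than both neighbours
    idx = np.flatnonzero((y[1:-1] > y[:-2]) & (y[1:-1] > y[2:])) + 1
    if min_height is not None:
        idx = idx[y[idx] >= min_height]
    # visit candidates from tallest to shortest; drop any that sit
    # closer than min_distance samples to a peak already kept
    keep = []
    for i in idx[np.argsort(y[idx])[::-1]]:
        if all(abs(i - j) >= min_distance for j in keep):
            keep.append(int(i))
    return np.array(sorted(keep), dtype=int)

y_demo = np.array([0, 1, 0, 3, 0, 2, 0, 0.5, 0])
print(find_peaks_simple(y_demo))
print(find_peaks_simple(y_demo, min_height=1))
print(find_peaks_simple(y_demo, min_distance=3))
```

Because the comparison is strict, a flat-topped peak (two equal samples at the summit) is missed entirely, which is one of the concerns that shows up on real, clipped or quantized traces.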

In practice, you will probably end up using other functions out there.  To point you toward some resources, here are:
* a [blog post](https://blog.ytotech.com/2015/11/01/findpeaks-in-python/) discussing various packages containing a peak finding function;
* [peakutils](https://bitbucket.org/lucashnegri/peakutils), the package we'll use for consistency going forward in this exercise (with a [tutorial](https://peakutils.readthedocs.io/en/latest/tutorial_a.html));
* a [jupyter notebook](http://nbviewer.jupyter.org/github/demotu/BMC/blob/master/notebooks/DetectPeaks.ipynb) that was the basis for the peakutils package.  The peak detection function is listed inside the notebook if you want to see how the approach compares to what we came up with.
* a [built-in scipy function](https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.signal.find_peaks_cwt.html) based on wavelets, a different approach that lacks the features we were seeking to build out from the MATLAB model.
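
One more pointer, offered as a sketch rather than a recommendation: SciPy 1.1 and later also ship `scipy.signal.find_peaks`, which accepts `height`, `distance`, and `prominence` arguments much like the MATLAB model. The sampling rate and test signal below are assumptions chosen to match the 8 Hz sine from earlier.

```python
import numpy as np
from scipy.signal import find_peaks

Fs = 1000                          # sampling rate in Hz (assumed)
t = np.arange(Fs) / Fs             # 1 second of sample times
y = np.sin(2 * np.pi * 8 * t)      # 8 Hz test sine

# height: minimum peak value; distance: minimum spacing in samples;
# prominence: how far a peak must stand out from its surroundings
peaks, props = find_peaks(y, height=0.5, distance=50, prominence=0.5)

print(peaks)                       # sample indices of the detected peaks
print(props["peak_heights"])       # returned because height was given
```

Note that `distance` is in samples, not seconds, so it has to be rescaled whenever Fs changes.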

In [None]:
import peakutils as pu
from peakutils.plot import plot as puplot


<a id="sort"></a>
## 4. simple spike sorting

Can we separate our spikes into different classes?  Let's start with making a histogram of peak amplitudes.
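
A minimal sketch of that histogram, using made-up amplitudes because the real `peak_heights` comes from your own peak detection. The array `peak_heights_demo` and its two simulated "units" are purely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical stand-in for peak_heights: two units with different amplitudes
peak_heights_demo = np.concatenate([
    rng.normal(1.0, 0.1, 200),   # 200 small spikes
    rng.normal(3.0, 0.2, 100),   # 100 large spikes
])

counts, edges = np.histogram(peak_heights_demo, bins=30)
centers = 0.5 * (edges[:-1] + edges[1:])

# quick text histogram; in the notebook, pl.hist(peak_heights_demo, bins=30)
# gives the graphical version
for c, n in zip(centers, counts):
    print(f"{c:5.2f} {'#' * int(n // 2)}")
```

Separate bumps in the histogram suggest separate classes of spikes; the number of bumps is a reasonable first guess for the number of clusters.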

One option is a k-means approach: cluster the peak amplitudes and treat each cluster as a putative unit.

In [None]:
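from sklearn.cluster import KMeans

n_clusters = 5  # a guess; compare with the number of bumps in your histogram

# peak_heights is assumed to be a 1-D NumPy array from the peak detection step
# reshape the data to the shape (n_samples, n_features) -- required for scikit-learn
X = peak_heights.reshape(-1, 1)
# run k-means clustering; km.labels_ holds one cluster index per peak
km = KMeans(n_clusters=n_clusters).fit(X)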

In [None]:
# display the nerve and the peaks colored by cluster
pl.figure(3)
pl.plot(nerve, color='gray', lw=1)
pl.scatter(peak_idxs, peak_heights, c=km.labels_, s=20, zorder=10)

For future reference: play with https://github.com/tridesclous/tridesclous.

<a id="filt"></a>
## 5. filtering: low, high pass; notch; baseline
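
As a starting point, here is a sketch of the three filter types on a synthetic trace, assuming a 1 kHz sampling rate and Butterworth designs from `scipy.signal`. The cutoff frequencies, filter order, and notch Q are arbitrary illustrative choices, not values tuned for any real recording.

```python
import numpy as np
from scipy import signal

Fs = 1000.0                                  # sampling rate in Hz (assumed)
t = np.arange(2000) / Fs                     # 2 seconds of sample times
slow = np.sin(2 * np.pi * 2 * t)             # 2 Hz baseline drift
fast = 0.5 * np.sin(2 * np.pi * 200 * t)     # 200 Hz "signal of interest"
hum = 0.5 * np.sin(2 * np.pi * 60 * t)       # 60 Hz line noise
x = slow + fast + hum

nyq = Fs / 2                                 # cutoffs are given as a fraction of Nyquist
b_lo, a_lo = signal.butter(4, 20 / nyq, btype="low")     # keep below ~20 Hz
b_hi, a_hi = signal.butter(4, 100 / nyq, btype="high")   # keep above ~100 Hz
b_notch, a_notch = signal.iirnotch(60 / nyq, 30)         # narrow notch at 60 Hz, Q = 30

# filtfilt runs each filter forward and backward, so features are not shifted in time
low_passed = signal.filtfilt(b_lo, a_lo, x)
high_passed = signal.filtfilt(b_hi, a_hi, x)
notched = signal.filtfilt(b_notch, a_notch, x)
```

High-pass filtering doubles as a baseline correction here, since the slow drift is exactly what it removes. Plot `x` against each output to see which component survives.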

<a id="plot"></a>
## 6. spike trains, raster plots

In [None]:
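# 10 trials x 100 time bins; each bin holds a spike (1) with probability 0.1
A = np.random.choice([0,1], 1000, p=[0.9,0.1]).reshape(10,100)

fig6,ax6 = pl.subplots()

# spike times for the first trial = indices of the bins that contain a 1
spiketimes = [i for i,x in enumerate(A[0]) if x==1]
# draw on the existing axes; assigning the result to ax6 would overwrite the axes handle
ax6.vlines(spiketimes,0,1)

print(spiketimes)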

**Exercise 6**: show me a plot with 10 spike train rasters in it that looks like this:

![alt_text](Figure_1.png)
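
For a cross-check on your own solution, here is a sketch that takes a different route: `eventplot`, matplotlib's built-in raster helper, instead of stacking `vlines` calls. The names `A_demo` and `spiketrains` and the fixed random seed are assumptions for this example, and the styling will not necessarily match the target figure.

```python
import numpy as np
import matplotlib.pyplot as pl

rng = np.random.default_rng(0)
# 10 trials x 100 time bins, spike probability 0.1 per bin
A_demo = rng.choice([0, 1], size=(10, 100), p=[0.9, 0.1])

# one array of spike times (bin indices) per trial
spiketrains = [np.flatnonzero(row) for row in A_demo]

fig, ax = pl.subplots()
# each entry of spiketrains is drawn as its own row of tick marks
rasters = ax.eventplot(spiketrains, colors="k", linelengths=0.8)
ax.set_xlabel("time bin")
ax.set_ylabel("trial")
```

Writing the loop over `vlines` yourself is still worth doing, since it forces you to think about the vertical offset of each trial.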

**provocation/challenge**: let's look all the way back at part 1. Can you use vlines to add a color-coded y scale bar for each trace in your plot?

Here's a reminder of the scale units for each trace:

## *neo*?