# Spike Tutorial -- Part 2

## Analyzing Populations

In part 1 we had to do quite a bit of work to get our data into a workable format. Now that we understand the data structures and how to manipulate them, we can focus more on scientific questions. In this tutorial, we'll focus on analyzing multiple neurons at once and on organizing the resulting data. There are many ways to do this in MATLAB, and it seems like even more in Python, so we'll explore a few of them.

First import our favorite modules!

In [None]:
# import necessary modules
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import os
import pickle


%matplotlib inline

We pickled our session info last time, so let's load it up:

In [None]:
root = '_data'
fn = os.path.join(root,'sessionInfo.pkl')

with open(fn,'rb') as f:
    [order, clustInfo, fs, stimOn] = pickle.load(f)

Ok, we have almost everything we need except our spikes. Load the `'spike_times.npy'` and `'spike_clusters.npy'` to `spikes` and `clust`, respectively. Convert your spikes to seconds using `fs`.
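
As a rough sketch of the load-and-convert step, the snippet below writes two tiny made-up files to a temporary folder and reads them back. The folder, the sample values, and the sampling rate `fs_demo` are all stand-ins for illustration; with the real data you would point `np.load` at the files in `root` and divide by the `fs` loaded above.

```python
import os
import tempfile
import numpy as np

# Hypothetical stand-ins so the sketch runs without the real recording
fs_demo = 30000.0                      # made-up sampling rate (samples per second)
root_demo = tempfile.mkdtemp()
np.save(os.path.join(root_demo, 'spike_times.npy'),
        np.array([[1500], [30000], [45000]]))   # spike times in samples, shape (n, 1)
np.save(os.path.join(root_demo, 'spike_clusters.npy'),
        np.array([26, 26, 32]))                 # cluster ID for each spike

# Load, flatten to 1-D, and convert samples to seconds
spikes_demo = np.load(os.path.join(root_demo, 'spike_times.npy')).squeeze() / fs_demo
clust_demo = np.load(os.path.join(root_demo, 'spike_clusters.npy')).squeeze()
```

The `.squeeze()` call is only there in case the arrays were saved as column vectors; it does nothing to arrays that are already 1-D.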

In [None]:
# load spikes and clusters


We have all of our raw data, so let's use what we learned last time to make PSTHs and rasters for each cell! Because we need to make a PSTH for every neuron, we should write a function to do that. Write a function called `makePSTH` that takes 3 inputs (`spikes`, `triggers`, and `edges`) and returns `psth`, `raster`, and `trials`.

TIP: make the function as general as possible... to compute the firing rate per bin you need to divide by the bin time. Is there a way to do that from just the three variables we've passed?
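
One way to act on the tip is to take the bin widths from `np.diff(edges)`, so nothing about the bin size is hard-coded. The function below is a sketch of one possible solution, not the only one; the name `makePSTH_sketch` and the toy inputs are made up for illustration.

```python
import numpy as np

def makePSTH_sketch(spikes, triggers, edges):
    """Firing rate per trial and bin, plus spike times and trial numbers for a raster."""
    binSize = np.diff(edges)                 # bin widths in seconds, from edges alone
    psth = np.empty([len(triggers), len(edges) - 1])
    raster, trials = [], []
    for i, t in enumerate(triggers):
        rel = spikes - t                     # spike times relative to this trigger
        rel = rel[(rel >= edges[0]) & (rel <= edges[-1])]
        counts, _ = np.histogram(rel, bins=edges)
        psth[i, :] = counts / binSize        # spike counts -> spikes per second
        raster.extend(rel)
        trials.extend([i] * len(rel))
    return psth, np.array(raster), np.array(trials)

# Toy check with made-up numbers: 2 triggers, 3 spikes, 100 ms bins
demoSpikes = np.array([1.05, 1.25, 5.15])
demoTriggers = np.array([1.0, 5.0])
demoEdges = np.array([0.0, 0.1, 0.2, 0.3])
demoPSTH, demoRaster, demoTrials = makePSTH_sketch(demoSpikes, demoTriggers, demoEdges)
```

Scatter plotting `demoRaster` against `demoTrials` gives the raster; each row of `demoPSTH` is one trial.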

In [None]:
# makePSTH function
def makePSTH(spikes,triggers,edges):
    
    # preallocate
    raster = []
    trials = []
    psth = np.empty([len(triggers),len(edges)-1])

    # function body here

Like last time, let's define `edges` in steps of 0.001 s between -0.1 and 3.1 s, and then define a vector called `time` with the bin centers. Take the spikes from unit 26 and use your function to make `psth`, `raster`, and `trials`. Scatter plot the raster to make sure this worked well.
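
A small sketch of the edge and bin-center bookkeeping, using `_demo` names so it does not clash with your own variables. It uses `np.linspace` with an explicit bin count because `np.arange` with a float step can gain or lose a bin to rounding.

```python
import numpy as np

binSize = 0.001                                        # 1 ms bins
nBins = int(round((3.1 - (-0.1)) / binSize))           # number of bins between the limits
edges_demo = np.linspace(-0.1, 3.1, nBins + 1)         # one more edge than bins
time_demo = edges_demo[:-1] + np.diff(edges_demo) / 2  # center of each bin
```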

In [None]:
# compute psth, raster
c = 26


# plot


Next we want to compute PSTHs for all of our neurons and save them to a data struct. There are a lot of ways to do this, but the way we can first try is by saving them in a 3d array (that is, n trials x m timepoints x p neurons).

**Try this:**
1. Make an **array** called `cellID` that contains the unique cell ID numbers for all of our neurons in `clustInfo`.
2. Loop through this variable. On each loop, extract the spikes for that cell and use your PSTH function to make a PSTH (ignore `raster` and `trials` for now).
3. Save each cell's PSTH to a new array called `allPSTH`, as described above. Remember to preallocate an **empty** version of this array.
4. This might take a little while to run, so print the iteration each time the loop runs to keep track of your progress.
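
The steps above can be sketched on made-up data as follows. Everything here is synthetic: the spike times, the three cluster IDs, the triggers, and the coarse 100 ms bins are placeholders, and `np.unique` on the cluster vector stands in for pulling IDs out of `clustInfo` (whose column names are not shown here).

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up data: 500 spikes spread over 3 clusters, 4 triggers
spikes_demo = np.sort(rng.uniform(0, 20, 500))
clust_demo = rng.choice([3, 26, 32], size=500)
triggers_demo = np.array([2.0, 6.0, 10.0, 14.0])
edges_demo = np.linspace(-0.1, 3.1, 33)        # coarse bins to keep the demo fast

cellID_demo = np.unique(clust_demo)
# Preallocate: trials x timepoints x neurons
allPSTH_demo = np.empty([len(triggers_demo), len(edges_demo) - 1, len(cellID_demo)])

for n, c in enumerate(cellID_demo):
    cellSpikes = spikes_demo[clust_demo == c]  # spikes belonging to this cell
    for i, t in enumerate(triggers_demo):
        counts, _ = np.histogram(cellSpikes - t, bins=edges_demo)
        allPSTH_demo[i, :, n] = counts / np.diff(edges_demo)
    print(f'cell {c} done ({n + 1}/{len(cellID_demo)})')
```

In your own version the inner loop would be replaced by a call to your `makePSTH` function.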

In [None]:
# answer here

The benefit of putting everything in an array is that we can really easily index and average data using numpy. 

**Try this:**
1. As a first pass, make an index for laser ON/OFF trials.
2. Then index `allPSTH` and average to find the mean response over all neurons when the laser is on vs when it is off. You can save your responses to `meanLaserOff` and `meanLaserOn`. HINT: you need to average over more than 1 dimension!
3. Plot these cell averages on top of each other. Give them a legend and axis labels.
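
For the hint about averaging over more than one dimension: numpy's `mean` accepts a tuple of axes. The sketch below uses a random array and a made-up per-trial laser flag (`laser_demo`); how the laser condition is stored in your session info may differ.

```python
import numpy as np

rng = np.random.default_rng(1)
allPSTH_demo = rng.poisson(5, size=(6, 10, 4)).astype(float)  # trials x time x neurons
laser_demo = np.array([0, 1, 0, 1, 1, 0])                     # hypothetical laser flag per trial

laserOn = laser_demo == 1
laserOff = ~laserOn

# Average over trials (axis 0) and neurons (axis 2), keeping the time axis
meanLaserOn_demo = allPSTH_demo[laserOn].mean(axis=(0, 2))
meanLaserOff_demo = allPSTH_demo[laserOff].mean(axis=(0, 2))
```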

In [None]:
# index and means

# plot


Whoa! It should look pretty funky, with the laser-on trace showing very stereotyped bursting activity. This is because the laser was pulsed at 25 Hz, and it looks like there are some optotagged units in this recording (that is, units that are activated on each laser pulse)!

Let's try to separate these units out from the rest.

**Try this:**
1. Compute new averages, averaging over the entire trace on laser off vs laser on trials for each cell. Your ending vectors will be length of the number of neurons, 49. Call them `cellLaserOff` and `cellLaserOn`.
2. Compute a difference vector, subtracting the firing rate when the laser is off from when it is on, call this `dLaser`
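
A sketch of these two steps on synthetic data, where neuron 0 is forced to be activated and neuron 1 to be silenced on laser trials so the sign of the difference is easy to check. The array sizes and the alternating laser pattern are made up.

```python
import numpy as np

rng = np.random.default_rng(2)
allPSTH_demo = rng.poisson(5, size=(8, 10, 3)).astype(float)  # trials x time x neurons
laserOn = np.array([0, 1, 0, 1, 0, 1, 0, 1], dtype=bool)      # made-up laser trials
allPSTH_demo[laserOn, :, 0] += 20     # neuron 0: pretend it is activated by the laser
allPSTH_demo[laserOn, :, 1] = 0       # neuron 1: pretend it is silenced by the laser

# Average over trials (axis 0) and time (axis 1): one value per neuron
cellLaserOn_demo = allPSTH_demo[laserOn].mean(axis=(0, 1))
cellLaserOff_demo = allPSTH_demo[~laserOn].mean(axis=(0, 1))
dLaser_demo = cellLaserOn_demo - cellLaserOff_demo            # on minus off
```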

In [None]:
# compute the average response difference between laser off and laser on trials for each neuron


Now you can use `dLaser` as an index to separate cells that are activated by the laser vs suppressed by the laser.

1. Make two new PSTHs, one called `supPSTH` which contains neurons that are suppressed by laser, and another called `actPSTH` which contains only activated neurons.
2. Get the population means for laser on and off trials for each of these sets of neurons.
3. On two subplots, plot laser on vs laser off traces for each population. Color the activated subplot in black for laser off and black, dashed lines for on. Do the same for the suppressed neurons subplot, but in blue.
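
Splitting the array by the sign of the difference is a boolean index on the neuron axis. In this sketch both the PSTH array and the `dLaser_demo` values are made up.

```python
import numpy as np

rng = np.random.default_rng(5)
allPSTH_demo = rng.poisson(5, size=(6, 10, 4)).astype(float)  # trials x time x neurons
dLaser_demo = np.array([3.2, -1.5, -0.7, 8.0])                # made-up on-minus-off differences

# Boolean masks select along the third (neuron) axis
actPSTH_demo = allPSTH_demo[:, :, dLaser_demo > 0]   # activated neurons
supPSTH_demo = allPSTH_demo[:, :, dLaser_demo < 0]   # suppressed neurons
```

Note that a cell with a difference of exactly zero would land in neither array with these masks.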

In [None]:
# separate out these cells into new PSTH matrices for optotagged vs suppressed neurons

# get new means

# plot


This simple way of filtering the data looks like it did an ok job of filtering out the different cell types. However, it's not perfect (see in the laser on condition in suppressed cells, there is still some laser-locked activity). Maybe we can try to refine this method a bit by using statistical tests to tell whether a cell significantly changes its activity.

We can do this by running a rank-sum test for each cell, comparing laser-on and laser-off trials, and then asking whether the difference is significant and in what direction. We can use the `ranksums` function in the `scipy.stats` module to do this. To run the test, let's just look at the time window from 0 to 3 s.
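
Here is a sketch of the test for a single made-up neuron. The PSTH, the 100 ms bin centers, and the alternating laser pattern are synthetic, and the laser trials are given an artificially higher rate so the outcome is predictable.

```python
import numpy as np
import scipy.stats

rng = np.random.default_rng(3)
# Made-up PSTH for one neuron: 40 trials x 32 bins of 100 ms
time_demo = np.linspace(-0.05, 3.05, 32)        # bin centers
psth_demo = rng.poisson(5, size=(40, 32)).astype(float)
laserOn = np.arange(40) % 2 == 1                # hypothetical: every other trial has laser
psth_demo[laserOn] += 3                         # pretend the laser raises the rate

win = (time_demo >= 0) & (time_demo <= 3)       # analysis window, 0 to 3 s
trialMean = psth_demo[:, win].mean(axis=1)      # one mean rate per trial

# Laser-on trials are passed first, so a positive statistic means
# "higher rate with the laser on"
f_demo, p_demo = scipy.stats.ranksums(trialMean[laserOn], trialMean[~laserOn])
```

Swapping the two arguments flips the sign of the statistic but leaves the p-value unchanged.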

In [None]:
import scipy.stats

# get a mean per trial for one neuron in the time window
c = 32

# run a rank sum test for this neuron


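Here, `f` and `p` are the results of the test. `f` is the test statistic, and its sign indicates the direction of the effect: if you pass the laser-on trials as the first argument and the laser-off trials as the second, a positive `f` means the cell was activated by the laser and a negative `f` means it was suppressed. The p-value tells us whether the cell was significantly modulated by the laser.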

That said, we want information for each cell. Use a for loop to calculate the test statistics and p-values for all the neurons. Save your results to variables called `fVal` and `pVal`.

In [None]:
fVal = np.empty(cellID.shape)
pVal = np.empty(cellID.shape)

# loop

Now we have another cell selection criterion we can use: whether the cell was significantly modulated by the laser. However, we just ran quite a few tests, so by chance some of them may be false positives. There are many ways to correct for this, but let's use the simplest one, which is also quite strict: the Bonferroni correction. It states that our new significance threshold is our previous criterion for significance divided by the number of tests we ran.
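
As a worked example with four made-up p-values and an assumed starting threshold of 0.05:

```python
import numpy as np

alpha = 0.05                                         # assumed uncorrected criterion
pVal_demo = np.array([0.0001, 0.02, 0.0009, 0.2])    # made-up p-values, one per test
alphaCorr = alpha / len(pVal_demo)                   # Bonferroni: 0.05 / 4 = 0.0125
sig_demo = pVal_demo < alphaCorr                     # which tests survive correction
```

The second p-value (0.02) would pass an uncorrected 0.05 threshold but fails the corrected one.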

In [None]:
# corrected alpha

Using this new criterion, nearly all of the cells are still significant (this is probably due to the large number of trials we had). By my count, only one cell failed the test. Let's look at that cell:

**Try this:**
1. Make an index that finds cells violating our corrected alpha level, call it `pI`.
2. Plot the average traces for this cell, does the test result make sense to you?

In [None]:
# answer here

It looks like that previous cell didn't have much activity, so it probably had quite a bit of variability over trials that contributed to it failing the test.

So far, we've organized our data in arrays, but once we get to the population level, pandas dataframes may be a more useful way to organize and select neurons for analysis. Let's try this out. Remember, we already have some info about our neurons in a dataframe called `clustInfo`. Let's look at that again:

In [None]:
clustInfo.head()

Next, let's try adding information about the activity of our neurons to this dataframe. First, make a `.copy()` of your dataframe called `cellData`, then add columns for the results of our rank-sum test. We'll call them `laserFval`, `laserPval`, and `laserSig` for the test statistics, the p-values, and whether each p-value is significant, respectively.
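
A sketch with a three-row stand-in dataframe. The `id` column and all the values are made up; the one thing to check in your own version is that the arrays are in the same cell order as the dataframe rows.

```python
import numpy as np
import pandas as pd

clustInfo_demo = pd.DataFrame({'id': [3, 26, 32]})   # hypothetical stand-in for clustInfo
fVal_demo = np.array([4.1, -3.2, 0.4])               # made-up test statistics
pVal_demo = np.array([1e-5, 1e-3, 0.7])              # made-up p-values
alphaCorr = 0.05 / len(pVal_demo)                    # Bonferroni-corrected threshold

cellData_demo = clustInfo_demo.copy()                # copy, so the original is untouched
cellData_demo['laserFval'] = fVal_demo
cellData_demo['laserPval'] = pVal_demo
cellData_demo['laserSig'] = pVal_demo < alphaCorr    # boolean column
```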

In [None]:
# answer here

Let's also add some information about the activity of each cell in each condition. We have 2x2 conditions: laser on/off and contrast low/high. Let's populate new columns in our dataframe with the mean firing rate in each unique condition, calling our columns `frLowOff`, `frLowOn`, `frHighOff`, and `frHighOn`.

**Try this:**
1. Create indices for all 4 conditions.
2. Loop through each neuron, get the mean firing rate from 0 - 3s for each unique condition.
3. Add the new data to the columns mentioned above.

In [None]:
# create trial indexes

# index out the time period we want

# for each neuron, find mean fr in each condition
frLowOff = []
frLowOn = []
frHighOff = []
frHighOn = []
# loop here
    
# add to data frame


Ok, remember before when we wanted to test whether a neuron was significantly affected by the laser? This can come in handy now when we want to summarize data across the population. Common practice is to exclude cells that are either non-responsive, have low firing rates, etc.

The nice thing about the dataframe object is that we have all the information about our cells in one place, and it makes it simple to filter out neurons depending on your criteria.

**Try this:**
1. First create a copy of your unfiltered dataframe called `cellData_orig`, in case you want to revert back to your original data.
2. While we could filter by our rank-sum test results, looking at the data you can see quite a few cells with very low firing rates, less than 1 Hz. For now, let's modify `cellData` to remove cells with a rate below 1 Hz.
3. Note that the `firing_rate` column is stored as strings. To deal with this, you will need to split each entry and convert the numeric part to a floating point number. I managed to do this in one messy line, but feel free to make masks or take whatever strategy you want.
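
One possible approach uses the pandas string accessor. This sketch assumes each entry looks like a number followed by a space and a unit (for example `'12.7 spk/s'`); that format is a guess, so check what your `firing_rate` column actually contains before borrowing it.

```python
import pandas as pd

# Made-up stand-in for cellData; the 'firing_rate' string format is an assumption
cellData_demo = pd.DataFrame({'id': [3, 26, 32],
                              'firing_rate': ['0.4 spk/s', '12.7 spk/s', '3.0 spk/s']})
cellData_orig_demo = cellData_demo.copy()            # keep an unfiltered copy

# Split on whitespace, keep the first piece, convert to float
rate = cellData_demo['firing_rate'].str.split().str[0].astype(float)
cellData_demo = cellData_demo[rate >= 1].reset_index(drop=True)
```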

In [None]:
# answer here

First, let's make a scatter plot to summarize the effect of our conditions on firing rate.

**Try this:**
1. Scatter plot the laser off (x) vs. laser on (y) firing rates separately for low and high contrast. Color the points to indicate the contrast condition (red for high, blue for low).
2. Add a unity line from (0, 0) to (25, 25).
3. Add a legend, axis labels, and title

**Bonus:** the current plot has two dots for each cell, one for low and one for high contrast. To visualize the contrast change, plot lines between each red and blue dot to indicate cell identity. Make sure the lines are underneath (i.e., occluded by) the scatter points (look up `zorder`).
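
A sketch of the layering idea with random firing rates for ten pretend cells: artists with a lower `zorder` are drawn first, so the connecting lines end up underneath the dots. The colors and labels for the lines are arbitrary choices.

```python
import matplotlib
matplotlib.use('Agg')   # lets this run outside a notebook; omit this line with %matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(4)
# Made-up mean rates for 10 cells in each of the 4 conditions
frLowOff, frLowOn = rng.uniform(2, 25, 10), rng.uniform(0, 15, 10)
frHighOff, frHighOn = rng.uniform(2, 20, 10), rng.uniform(0, 12, 10)

fig, ax = plt.subplots()
# Passing 2 x nCells arrays draws one line per cell (one per column)
ax.plot([frLowOff, frHighOff], [frLowOn, frHighOn], color='gray', zorder=1)
ax.scatter(frLowOff, frLowOn, color='b', label='low contrast', zorder=2)
ax.scatter(frHighOff, frHighOn, color='r', label='high contrast', zorder=2)
ax.plot([0, 25], [0, 25], 'k--', label='unity', zorder=0)
ax.set_xlabel('laser off (spikes/s)')
ax.set_ylabel('laser on (spikes/s)')
ax.set_title('Effect of laser on firing rate')
ax.legend()
```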

In [None]:
# answer here

In the plot above, you can see that nearly all the neurons are suppressed, with the exception of those above the unity line. If you draw lines identifying each neuron, you can also see that high contrast generally elicits lower firing rates, as most of the red dots are shifted towards zero, relative to the blue dots.

We can also plot the data another way, where our axes show the effect of contrast.

**Try this:**
Recreate the same plot, but scatter low vs. high contrast and color the dots black for no laser and cyan for laser. As usual, add a legend, title, and axis labels.

As a **bonus**, you can also add lines connecting each cell, but set their transparency to .3.

In [None]:
# answer here

Remember we had a bunch of neurons that were optotagged? We may want to remove those from our current analysis to look only at the effect of activating VGAT neurons on excitatory cells.

**Try this:**
1. Filter `cellData` to remove cells that are activated by the laser.
2. Replot the previous two graphs with our newly filtered data, but make them subplots this time.

In [None]:
# filter out activated neurons

# plots


So far we've visualized our data, but what about running statistics?

To look at the joint effect of contrast and laser on firing rate, the best test to run is a two-way repeated measures ANOVA. This is because we are measuring a firing rate for each of our neurons in each condition. Like in MATLAB, this is kind of a pain to do... we're going to need to reformat our data and also install a special package to do it.

Most repeated measures ANOVA packages that I could find take pandas dataframes as inputs, where one column is the dependent variable (firing rate), one column is the subject (cell ID), and each other column is an independent variable (for us, laser and contrast). 

While we do have our data in a dataframe already, the formatting is not quite right. We need to reformat it so that each row no longer holds all the info for a neuron; instead, each row should contain the mean firing rate for one specific condition, along with labels indicating the condition and the cell ID. So, because we have 4 conditions, our new dataframe should have 4 x nNeurons rows. For example, for our first neuron, the dataframe should look something like this:
<img src="cellDF.png">



To make this, I found it easiest to make lists called `cell`, `FR`, `laser`, and `contrast`, then loop through each neuron and each of the 4 conditions to append to these lists. After creating the lists, I added them to a dataframe. I'll give you some useful variables that you may want to use in your loop.
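
The list-building loop might look something like the sketch below. The two-row `cellData_demo` with an `id` column is made up, and the output name `anovaData_demo` is just a placeholder; the three label lists match the ones provided in this tutorial.

```python
import pandas as pd

colLabels = ['frLowOff', 'frLowOn', 'frHighOff', 'frHighOn']
laserLabels = ['off', 'on', 'off', 'on']
contrastLabels = ['low', 'low', 'high', 'high']

# Made-up wide-format data: one row per neuron, one column per condition
cellData_demo = pd.DataFrame({'id': [3, 26],
                              'frLowOff': [5.0, 9.0], 'frLowOn': [2.0, 4.0],
                              'frHighOff': [4.0, 7.0], 'frHighOn': [1.0, 3.0]})

cell, FR, laser, contrast = [], [], [], []
for _, row in cellData_demo.iterrows():
    # zip walks the three label lists in step, one condition at a time
    for col, las, con in zip(colLabels, laserLabels, contrastLabels):
        cell.append(int(row['id']))
        FR.append(row[col])
        laser.append(las)
        contrast.append(con)

# Long format: 4 rows per neuron
anovaData_demo = pd.DataFrame({'cell': cell, 'FR': FR,
                               'laser': laser, 'contrast': contrast})
```

pandas can also do this kind of wide-to-long reshaping with `DataFrame.melt`, but the explicit loop makes the condition labels easier to follow.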

In [None]:
colLabels = ['frLowOff','frLowOn','frHighOff','frHighOn']
laserLabels = ['off','on','off','on']
contrastLabels = ['low','low','high','high']

# make lists in loop

# add to data frame

Ok, that was a pain... now we're ready to do an ANOVA. First we need to install a new package using `pip`, Python's package installation tool.

You can either run the following in your terminal:
`pip install pingouin`

Or you can run the next cell, which uses a magic (`%`) command and should also work:

In [None]:
%pip install pingouin

Now we can import the package and run the ANOVA. If this step doesn't work, you may need to restart your kernel to access the newly installed package.

In [None]:
import pingouin as pg

To run the ANOVA, we'll use the `rm_anova` function in pingouin. You can either look at the help or use Google to figure out how to format this command. Basically, we want to test whether the independent variables `laser` and `contrast` affect our dependent variable `FR` over repeated measures in `cell`.

In [None]:
# anova


We can also run post hoc tests, look at the methods in `pg` to see what command to use for this.

In [None]:
# post-hoc tests
