# Final Project
*Replace this with the name of your project*

## Team Member Names & Contributions
*Feel free to name your team, but please also include your real names and IDs here. Please specify who in your group worked on which parts of the project.*

- **Captain Marvel**: You know, blowing up things and such.
- **Ant Man**: Cleverly sneaking into small spaces
- **Hulk**: AKA The "Muscle"

## Abstract

*Fill in your 3-4 sentence abstract here*

## Research Question

*Fill in your research question here*

# Background and Prior Work

*Fill in your background and prior work here (~500 words). You are welcome to use additional subheadings. You should also include a paragraph describing each dataset and how you'll be using them.* 

### References (include links):
(1)

(2)

## Hypothesis


*Fill in your hypotheses here*

## Setup
*Are there packages that need to be imported, or datasets that need to be downloaded?*

```python
# Import the necessary toolboxes from the AllenSDK
from allensdk.core.cell_types_cache import CellTypesCache
from allensdk.api.queries.cell_types_api import CellTypesApi

# Import necessary packages
# Plotting packages
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns

# For manipulating data
from scipy import stats
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
```


## Data Wrangling

First, we load the data from the Allen Cell Types Database.

```python
# Initialize the cache as 'ctc' (cell types cache); files are downloaded on
# first use and stored locally under the manifest's directory
ctc = CellTypesCache(manifest_file='cell_types/manifest.json')

# We will use Cre lines to select excitatory and inhibitory cells from the Cell Types Database.
# Cre lines only exist for the mouse cells (there are no transgenic human lines),
# so we only load the mouse cells.
# Each row of mouse_df is one cell, identified by its 'id' column
mouse_df = pd.DataFrame(ctc.get_cells(species=[CellTypesApi.MOUSE]))

# Get the electrophysiology features; each row is identified by 'specimen_id',
# which matches the cell 'id' above, so the two DataFrames can be joined
ephys_features = pd.DataFrame(ctc.get_ephys_features())

#ephys_features.head()
```

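The cell above loads two tables from the Cell Types Database: metadata for every mouse cell (`mouse_df`) and the electrophysiology features computed for each cell (`ephys_features`). Next, we rename the ID column so that both tables share the same key, and keep only the metadata columns we need.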

```python
# Rename the 'id' column to 'specimen_id' so it matches the key in ephys_features when we merge
mouse_df = mouse_df.rename(columns={"id": "specimen_id"})

# Keep only the Cre line and brain-location columns we need from the mouse DataFrame
mouse_crelines_df = mouse_df[['transgenic_line', 'specimen_id', 'structure_layer_name', 'structure_area_abbrev']]

mouse_crelines_df.head()
```

| | transgenic_line | specimen_id | structure_layer_name | structure_area_abbrev |
|---|---|---|---|---|
| 0 | Oxtr-T2A-Cre | 565871768 | 5 | VISp |
| 1 | Pvalb-IRES-Cre | 469801138 | 4 | VISp |
| 2 | Slc32a1-T2A-FlpO\|Vipr2-IRES2-Cre | 605889373 | 2/3 | VISp |
| 3 | Cux2-CreERT2 | 485909730 | 5 | VISp |
| 4 | Scnn1a-Tg3-Cre | 323865917 | 5 | VISp |


Next, we merge the cell metadata with the electrophysiology features on `specimen_id`, so that all the data for each cell is in one table. Both tables come from the Cell Types Database; the inner join keeps only cells that appear in both.

```python
# Merge metadata and ephys features on specimen_id (inner join: keep cells present in both)
mouse_crelines_ephys_data = mouse_crelines_df.merge(ephys_features, on='specimen_id', how='inner')
mouse_crelines_ephys_data.head()
```

| | transgenic_line | specimen_id | structure_layer_name | structure_area_abbrev | adaptation | avg_isi | electrode_0_pa | f_i_curve_slope | fast_trough_t_long_square | fast_trough_t_ramp | ... | trough_t_ramp | trough_t_short_square | trough_v_long_square | trough_v_ramp | trough_v_short_square | upstroke_downstroke_ratio_long_square | upstroke_downstroke_ratio_ramp | upstroke_downstroke_ratio_short_square | vm_for_sag | vrest |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Oxtr-T2A-Cre | 565871768 | 5 | VISp | | | -2.825 | 0.020384 | 1.0725 | 14.73798 | ... | 14.738 | 1.391268 | -59.281254 | -57.468754 | -75.756252 | 1.564027 | 1.304349 | 1.67955 | -87.906258 | -74.926987 |
| 1 | Pvalb-IRES-Cre | 469801138 | 4 | VISp | 0.000643 | 12.5075 | 27.185625 | 1.156789 | 1.272425 | 11.763725 | ... | 11.763808 | 1.290815 | -55.875 | -52.515627 | -69.109379 | 1.162618 | 1.197155 | 1.369171 | -80.15625 | -72.042976 |
| 2 | Slc32a1-T2A-FlpO\|Vipr2-IRES2-Cre | 605889373 | 2/3 | VISp | -0.015098 | 78.950909 | 17.437501 | 0.191853 | 1.87984 | 8.427893 | ... | 8.43294 | 1.31551 | -48.1875 | -54.364586 | -72.640628 | 3.379321 | 4.108774 | 2.680139 | -83.593758 | -72.712036 |
| 3 | Cux2-CreERT2 | 485909730 | 5 | VISp | 0.03234 | 55.895 | -55.964379 | 0.25 | 1.112495 | 2.853377 | ... | 2.888133 | 1.520193 | -54.031254 | -57.385419 | -77.750005 | 3.042933 | 3.517684 | 3.274181 | -101.0 | -76.928391 |
| 4 | Scnn1a-Tg3-Cre | 323865917 | 5 | VISp | 0.026732 | 94.2335 | 96.42187 | 0.164286 | 1.197855 | 3.4231 | ... | 3.467847 | 1.317042 | -57.281254 | -56.895833 | -70.218751 | 2.974194 | 3.156117 | 2.946463 | -88.40625 | -69.402855 |
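To see what `how='inner'` does, here is a minimal sketch on two tiny hand-built DataFrames. The specimen IDs and `vrest` values are copied from the output above; the IDs `111` and `222`, the placeholder line name, and the `-70.0` value are made up so that each table has one cell the other lacks. The `validate='one_to_one'` argument is an addition not used in the cell above: it makes pandas raise a `MergeError` if either table unexpectedly contains a duplicated ID.

```python
import pandas as pd

# Cell metadata: two real IDs from the output above, plus a made-up ID 111 with no ephys data
cells = pd.DataFrame({
    "specimen_id": [469801138, 323865917, 111],
    "transgenic_line": ["Pvalb-IRES-Cre", "Scnn1a-Tg3-Cre", "(placeholder)"],
})

# Ephys features: real vrest values from above, plus a made-up ID 222 with no metadata
ephys = pd.DataFrame({
    "specimen_id": [469801138, 323865917, 222],
    "vrest": [-72.042976, -69.402855, -70.0],
})

# Inner join keeps only IDs present in both tables (in the order of the left table);
# validate="one_to_one" raises an error if an ID is duplicated on either side
merged = cells.merge(ephys, on="specimen_id", how="inner", validate="one_to_one")
print(merged)

# An outer join with indicator=True shows which table each row came from,
# which is a quick way to see which cells the inner join dropped
check = cells.merge(ephys, on="specimen_id", how="outer", indicator=True)
print(check[["specimen_id", "_merge"]])
```

Renaming `id` to `specimen_id` first, as the notebook does, lets both tables use `on=`; the alternative is `merge(..., left_on='id', right_on='specimen_id')`, which keeps both key columns in the result.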


Next, we isolate the Cre line, layer, and brain area we want to compare. Here we select the electrophysiology data for inhibitory neurons labeled by the Pvalb-IRES-Cre line and for excitatory neurons labeled by the Scnn1a-Tg3-Cre line, both in layer 4 of the primary visual cortex (VISp).

```python
# Select the inhibitory (Pvalb-IRES-Cre) cells, then restrict to VISp, then to layer 4.
# Layer names are stored as strings (e.g. '2/3'), so compare with '4', not 4.
mouse_inhib_df1 = mouse_crelines_ephys_data[mouse_crelines_ephys_data['transgenic_line'] == 'Pvalb-IRES-Cre']
mouse_inhib_df2 = mouse_inhib_df1[mouse_inhib_df1['structure_area_abbrev'] == 'VISp']
mouse_inhib_ephys = mouse_inhib_df2[mouse_inhib_df2['structure_layer_name'] == '4']

mouse_inhib_ephys.head()
```

| | transgenic_line | specimen_id | structure_layer_name | structure_area_abbrev | adaptation | avg_isi | electrode_0_pa | f_i_curve_slope | fast_trough_t_long_square | fast_trough_t_ramp | ... | trough_t_ramp | trough_t_short_square | trough_v_long_square | trough_v_ramp | trough_v_short_square | upstroke_downstroke_ratio_long_square | upstroke_downstroke_ratio_ramp | upstroke_downstroke_ratio_short_square | vm_for_sag | vrest |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Pvalb-IRES-Cre | 469801138 | 4 | VISp | 0.000643 | 12.5075 | 27.185625 | 1.156789 | 1.272425 | 11.763725 | ... | 11.763808 | 1.290815 | -55.875 | -52.515627 | -69.109379 | 1.162618 | 1.197155 | 1.369171 | -80.15625 | -72.042976 |
| 117 | Pvalb-IRES-Cre | 487405644 | 4 | VISp | 0.000996 | 19.362255 | -33.476246 | 0.772348 | 1.09425 | 8.71885 | ... | 8.718913 | 1.454521 | -58.15625 | -56.854168 | -70.156253 | 1.27207 | 1.26661 | 1.404371 | -85.468758 | -69.590126 |
| 152 | Pvalb-IRES-Cre | 478793814 | 4 | VISp | 0.003198 | 17.355714 | 11.16375 | 0.716402 | 1.056425 | 11.89613 | ... | 11.896288 | 1.531172 | -59.531254 | -59.354168 | -78.781253 | 1.393663 | 1.402116 | 1.609452 | -95.34375 | -78.447914 |
| 210 | Pvalb-IRES-Cre | 484744867 | 4 | VISp | 0.006846 | 4.658478 | 72.551243 | 2.111776 | 1.024225 | | ... | | 1.285388 | -74.0 | | -71.387502 | 0.935694 | | 0.997145 | -100.875008 | -70.245308 |
| 214 | Pvalb-IRES-Cre | 475894121 | 4 | VISp | | 14.98 | 5.5025 | 0.780911 | 1.05067 | | ... | | 1.729035 | -56.281254 | | -77.864586 | 1.265073 | | 1.635448 | -99.187508 | -78.509483 |


```python
# Select the excitatory (Scnn1a-Tg3-Cre) cells, then restrict to VISp, then to layer 4.
# Layer names are stored as strings (e.g. '2/3'), so compare with '4', not 4.
mouse_excit_df1 = mouse_crelines_ephys_data[mouse_crelines_ephys_data['transgenic_line'] == 'Scnn1a-Tg3-Cre']
mouse_excit_df2 = mouse_excit_df1[mouse_excit_df1['structure_area_abbrev'] == 'VISp']
mouse_excit_ephys = mouse_excit_df2[mouse_excit_df2['structure_layer_name'] == '4']

mouse_excit_ephys.head()
```

| | transgenic_line | specimen_id | structure_layer_name | structure_area_abbrev | adaptation | avg_isi | electrode_0_pa | f_i_curve_slope | fast_trough_t_long_square | fast_trough_t_ramp | ... | trough_t_ramp | trough_t_short_square | trough_v_long_square | trough_v_ramp | trough_v_short_square | upstroke_downstroke_ratio_long_square | upstroke_downstroke_ratio_ramp | upstroke_downstroke_ratio_short_square | vm_for_sag | vrest |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | Scnn1a-Tg3-Cre | 476135066 | 4 | VISp | 0.098665 | 113.287143 | -68.804996 | 0.139337 | 1.151005 | 5.805325 | ... | 5.83725 | 1.416055 | -54.281254 | -55.656254 | -75.0625 | 3.386317 | 3.171764 | 3.857029 | -87.000008 | -75.066132 |
| 22 | Scnn1a-Tg3-Cre | 470098860 | 4 | VISp | 0.028417 | 61.052667 | -3.065625 | 0.227151 | 1.13955 | 4.441972 | ... | 4.464322 | 1.495018 | -60.656254 | -58.97917 | -80.862502 | 3.278334 | 3.049282 | 3.712753 | -99.187508 | -80.129494 |
| 25 | Scnn1a-Tg3-Cre | 479091820 | 4 | VISp | 0.258177 | 104.791667 | -10.13375 | -0.003381 | 1.18913 | 5.958838 | ... | 5.989438 | 1.352083 | -56.15625 | -52.583335 | -78.898438 | 3.139299 | 2.89833 | 3.893947 | -94.625 | -78.553154 |
| 65 | Scnn1a-Tg3-Cre | 515202564 | 4 | VISp | 0.015462 | 71.547692 | -12.66 | 0.141274 | 1.23948 | 5.741353 | ... | 5.782207 | 1.605676 | -59.250004 | -55.645833 | -78.375005 | 3.824073 | 3.997622 | 3.83065 | -92.0625 | -77.858864 |
| 211 | Scnn1a-Tg3-Cre | 476269122 | 4 | VISp | | | -14.635 | 0.114082 | 1.1819 | 4.681013 | ... | 4.71725 | 1.424983 | -54.71875 | -57.208337 | -76.156255 | 3.214515 | 2.78979 | 3.783612 | -88.6875 | -76.060822 |
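The two cells above repeat the same three filtering steps. A minimal sketch of folding them into a single boolean mask inside a small helper follows; the name `select_cells` is hypothetical (it is not part of the AllenSDK or pandas), and the toy rows are copied from the tables above. The helper converts the layer to a string because `structure_layer_name` holds values such as `'2/3'`.

```python
import pandas as pd

def select_cells(df, line, area, layer):
    """Hypothetical helper: rows matching one Cre line, brain area and layer.

    The layer is compared as a string, because layer names such as '2/3'
    make the whole structure_layer_name column strings.
    """
    mask = (
        (df["transgenic_line"] == line)
        & (df["structure_area_abbrev"] == area)
        & (df["structure_layer_name"] == str(layer))
    )
    return df[mask]

# A few rows copied from the tables above
data = pd.DataFrame({
    "transgenic_line": ["Oxtr-T2A-Cre", "Pvalb-IRES-Cre", "Scnn1a-Tg3-Cre", "Scnn1a-Tg3-Cre"],
    "specimen_id": [565871768, 469801138, 323865917, 476135066],
    "structure_layer_name": ["5", "4", "5", "4"],
    "structure_area_abbrev": ["VISp", "VISp", "VISp", "VISp"],
})

inhib = select_cells(data, "Pvalb-IRES-Cre", "VISp", 4)
excit = select_cells(data, "Scnn1a-Tg3-Cre", "VISp", 4)
print(inhib["specimen_id"].tolist())  # [469801138]
print(excit["specimen_id"].tolist())  # [476135066]
```

With the real data, the same call would read `select_cells(mouse_crelines_ephys_data, 'Pvalb-IRES-Cre', 'VISp', 4)`, which makes it easy to try other Cre lines or layers without copying three lines each time.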


Not every recording yields every feature, so some columns contain missing values (NaN) instead of numbers; for example, cell 476269122 above has no `adaptation` or `avg_isi` value. These missing values need to be handled before the analysis.
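A minimal sketch of one way to handle the missing values, using a few rows copied from the Pvalb-IRES-Cre table above (blanks there are `NaN` here). It first counts the missing values per column, then drops only the rows that lack the features to be analyzed. The choice of `adaptation` and `avg_isi` as those features is an assumption for illustration; a plain `dropna()` with no `subset` would drop a cell that is missing *any* of the many feature columns, which can discard more cells than necessary.

```python
import numpy as np
import pandas as pd

# Rows copied from the Pvalb-IRES-Cre output above; blank entries there are NaN here
ephys = pd.DataFrame({
    "specimen_id": [469801138, 487405644, 484744867, 475894121],
    "adaptation": [0.000643, 0.000996, 0.006846, np.nan],
    "avg_isi": [12.5075, 19.362255, 4.658478, 14.98],
    "trough_v_ramp": [-52.515627, -56.854168, np.nan, np.nan],
})

# How many values are missing in each column?
print(ephys.isna().sum())

# Keep only cells that have every feature we plan to analyze
# (this feature list is an assumption for the example)
features = ["adaptation", "avg_isi"]
clean = ephys.dropna(subset=features)
print(clean["specimen_id"].tolist())  # [469801138, 487405644, 484744867]
```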

## Data Analysis & Results

Include cells that describe the steps in your data analysis.

```python
## YOUR CODE HERE
## FEEL FREE TO ADD MULTIPLE CELLS PER SECTION
```

## Conclusion & Discussion

*Fill in your discussion information here*