# Generate correlations dataset

This notebook creates a tidy CSV file with the following features for mouse or human cells in the Allen Cell Types data:
- (estimated) cell surface area
- dendrite type (spiny, aspiny, or sparsely spiny)
- input resistance
- rheobase
- tau (the membrane time constant)

If you want to skip straight to working with this data (which you probably should), go to the **Compare cell features** notebook instead! This notebook is here so you know where the data came from.

## Setup

In [None]:
# Import our plotting package from matplotlib
import matplotlib.pyplot as plt

# Specify that all plots will happen inline & in high resolution
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

# Import pandas for working with dataframes
import pandas as pd

# Import numpy for numerical operations
import numpy as np

The Allen Institute has compiled a set of code and tools called a **Software Development Kit** (SDK). We need to make sure that you have this installed in your environment.

See [Technical Notes](#technical) at the end of this notebook for more information about working with the AllenSDK.

If you receive an error, there are additional instructions on how to install the SDK locally [here](https://allensdk.readthedocs.io/en/latest/install.html).

In [None]:
# Check whether the AllenSDK is installed, and install it if it is missing.
try:
    import allensdk
    if allensdk.__version__ == '2.2.0':
        print('allensdk version ' + allensdk.__version__ + ' already installed')
    else:
        # The installed version may be older or newer than 2.2.0
        print('allensdk version ' + allensdk.__version__ + ' installed, which differs from 2.2.0. some features may not work.')
except ImportError:
    !pip install allensdk

allensdk version 2.2.0 already installed
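
The check above only recognizes one exact release string. Below is a minimal sketch of an ordered comparison, assuming plain dotted version strings such as `2.2.0`. The helper names `version_tuple` and `describe_install` are hypothetical and are not part of the AllenSDK.

```python
import re

def version_tuple(version):
    """Turn a dotted version string like '2.2.0' into a tuple of three ints.

    Assumption: only the leading digits of the first three dot-separated
    parts matter, so '2.2.0rc1' is treated like '2.2.0'.
    """
    parts = []
    for piece in version.split('.')[:3]:
        match = re.match(r'\d+', piece)
        parts.append(int(match.group()) if match else 0)
    # Pad short versions such as '2.2' to three components
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)

def describe_install(installed, tested='2.2.0'):
    """Report whether `installed` is older than, newer than, or the same as `tested`."""
    if version_tuple(installed) < version_tuple(tested):
        return 'older'
    if version_tuple(installed) > version_tuple(tested):
        return 'newer'
    return 'same'

print(describe_install('2.2.0'))
print(describe_install('2.16.2'))
```

Comparing tuples of integers avoids the string-comparison trap where `'2.16.2'` sorts before `'2.2.0'`.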


Now that we have the AllenSDK installed, we can `import` the `CellTypesCache` class, which provides tools for getting information from the Cell Types database.

We're giving it a **manifest** filename as well. CellTypesCache will create this manifest file, which contains metadata about the cache. If you want, you can look in the cell_types folder in your code directory and take a look at the file.

In [None]:
#Import the "CellTypesCache" and "CellTypesApi" from the AllenSDK core package
from allensdk.core.cell_types_cache import CellTypesCache
from allensdk.api.queries.cell_types_api import CellTypesApi


#Initialize the cache as 'ctc' (cell types cache)
ctc = CellTypesCache(manifest_file='cell_types/manifest.json')

print('CellTypesCache imported.')

CellTypesCache imported.
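
Once the manifest exists, it can be inspected like any other JSON file. The following is a minimal sketch, assuming the manifest sits at the `cell_types/manifest.json` path used above. `top_level_keys` is a hypothetical helper, and nothing is assumed about the manifest's layout beyond it being valid JSON.

```python
import json
import os

def top_level_keys(path):
    """Return the sorted top-level keys of a JSON file, or [] if it is not an object."""
    with open(path) as f:
        content = json.load(f)
    if isinstance(content, dict):
        return sorted(content.keys())
    return []

manifest_path = 'cell_types/manifest.json'  # same path passed to CellTypesCache above
if os.path.exists(manifest_path):
    print(top_level_keys(manifest_path))
else:
    print('No manifest found yet at ' + manifest_path)
```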


In [None]:
# Let's now get all of the electrophysiology data for the mouse and human cells, separately.

# Get the ephys features and make a dataframe out of them
ephys_features = ctc.get_ephys_features()
ephys_features_df = pd.DataFrame.from_records(ephys_features)

# grab mouse data and merge with dataframe
mouse_df = pd.DataFrame(ctc.get_cells(species=[CellTypesApi.MOUSE]))
mouse_ephys_df = pd.merge(mouse_df,ephys_features_df,left_on='id',right_on='specimen_id',how='left')

# grab human data and merge with dataframe
human_df = pd.DataFrame(ctc.get_cells(species=[CellTypesApi.HUMAN]))
human_ephys_df = pd.merge(human_df,ephys_features_df,left_on='id',right_on='specimen_id',how='left')

# Show the first five rows of the human dataframe
human_ephys_df.head()

Unnamed: 0,reporter_status,cell_soma_location,species,id_x,name,structure_layer_name,structure_area_id,structure_area_abbrev,transgenic_line,dendrite_type,...,trough_t_ramp,trough_t_short_square,trough_v_long_square,trough_v_ramp,trough_v_short_square,upstroke_downstroke_ratio_long_square,upstroke_downstroke_ratio_ramp,upstroke_downstroke_ratio_short_square,vm_for_sag,vrest
0,,"[273.0, 354.0, 216.0]",Homo Sapiens,525011903,H16.03.003.01.14.02,3,12113,FroL,,spiny,...,4.134987,1.375253,-53.968754,-59.51042,-71.197919,2.895461,2.559876,3.099787,-88.843758,-70.561035
1,,"[69.0, 254.0, 96.0]",Homo Sapiens,528642047,H16.06.009.01.02.06.05,5,12141,MTG,,aspiny,...,,1.05116,-67.468758,,-70.875002,1.891881,,1.989616,-101.0,-69.20961
2,,"[322.0, 255.0, 92.0]",Homo Sapiens,537256313,H16.03.006.01.05.02,4,12141,MTG,,spiny,...,5.694547,1.3899,-52.125004,-51.520836,-72.900002,3.121182,3.464528,3.054681,-87.53125,-72.628105
3,,"[79.0, 273.0, 91.0]",Homo Sapiens,519832676,H16.03.001.01.09.01,3,12141,MTG,,spiny,...,9.96278,1.21102,-53.875004,-52.416668,-73.693753,4.574865,3.817988,4.980603,-84.218758,-72.547661
4,,"[66.0, 220.0, 105.0]",Homo Sapiens,596020931,H17.06.009.11.04.02,4,12141,MTG,,aspiny,...,14.66734,1.336668,-63.593754,-63.239583,-75.518753,1.45289,1.441754,1.556087,-82.53125,-74.260269
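
The `id_x` column in the output above suggests that both tables carry an `id` column, which pandas renames during the merge. Here is a small sketch of the same left merge on toy tables; all values are made up for illustration.

```python
import pandas as pd

# Toy stand-ins for the cell metadata and ephys feature tables (made-up values)
cells = pd.DataFrame({'id': [101, 102, 103],
                      'dendrite_type': ['spiny', 'aspiny', 'spiny']})
features = pd.DataFrame({'id': [9001, 9002],
                         'specimen_id': [101, 103],
                         'tau': [20.5, 31.0]})

# A left merge keeps every row of `cells`; cells with no match get NaN features
merged = pd.merge(cells, features, left_on='id', right_on='specimen_id', how='left')

# Both inputs have an 'id' column, so pandas renames them 'id_x' and 'id_y'
print(list(merged.columns))
print(merged)
```

Here `id_x` is the cell's own id (from the left table) and `id_y` is the id of the feature record (from the right table).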


## Get the good stuff out of the dataset

We'll have to do a bit of work to get interesting features like estimated cell surface area and the rheobase out of the data. The cell below will download *a lot* of data. You should only run it if you'd really like to save a bunch of data on your computer. This repository already contains .csv files with the output of this!

In [None]:
# Choose a dataframe we created above, mouse or human
df = mouse_ephys_df

cell_surface_area = []
dendrite_type = []
input_resistance = []
tau = []
rheobase = []

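for i in range(len(df)):
    this_cell = df['specimen_id'][i]
    if df['reconstruction_type'][i] == 'full':
        try:
            # Estimate the surface area from the soma radius (sphere: 4*pi*r^2)
            morphology = ctc.get_reconstruction(this_cell)
            this_cell_radius = morphology.soma['radius']
            this_surface_area = 4*np.pi*this_cell_radius*this_cell_radius

            # All of this to get the rheobase.
            rheobase_sweep_number = df['rheobase_sweep_number'][i]
            this_data = ctc.get_ephys_data(this_cell)
            rheo_sweep_meta = this_data.get_sweep_metadata(rheobase_sweep_number)
            this_rheobase = rheo_sweep_meta['aibs_stimulus_amplitude_pa']

        except Exception as e:
            # Skip cells whose data cannot be retrieved, but report them
            print('Skipping cell ' + str(this_cell) + ': ' + str(e))
            continue

        # Append only after every value was retrieved, so the lists stay aligned
        cell_surface_area.append(this_surface_area)
        dendrite_type.append(df['dendrite_type'][i])
        input_resistance.append(df['input_resistance_mohm'][i])
        tau.append(df['tau'][i])
        rheobase.append(this_rheobase)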

2020-09-30 16:40:41,401 allensdk.api.api.retrieve_file_over_http INFO     Downloading URL: http://api.brain-map.org/api/v2/well_known_file_download/659446331
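
The surface area in the loop above comes only from the soma radius, treating the soma as a sphere with area $4\pi r^2$, so dendrites are not included. A minimal sketch of that calculation follows. `sphere_surface_area` is a hypothetical helper, the radius is made up, and the assumption that reconstruction radii are in micrometers (giving an area in square micrometers) should be checked against the AllenSDK documentation.

```python
import math

def sphere_surface_area(radius):
    """Surface area of a sphere, 4 * pi * r**2, in squared units of `radius`."""
    if radius < 0:
        raise ValueError('radius must be non-negative')
    return 4 * math.pi * radius ** 2

# Hypothetical soma radius (made up for illustration)
example_radius = 5.0
print(round(sphere_surface_area(example_radius), 2))
```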


In [None]:
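if len(rheobase) == len(tau) == len(cell_surface_area) == len(input_resistance) == len(dendrite_type):
    # Get organized
    fields = ['cell_surface_area','dendrite_type','rheobase','input_resistance','tau']
    dataset = pd.DataFrame(list(zip(cell_surface_area,dendrite_type,rheobase,input_resistance,tau)),columns=fields)
else:
    # The lists are misaligned, so report their lengths and stop here
    print('rheobase:', len(rheobase))
    print('input_resistance:', len(input_resistance))
    print('cell_surface_area:', len(cell_surface_area))
    print('tau:', len(tau))
    print('dendrite_type:', len(dendrite_type))
    raise ValueError('Feature lists have different lengths; cannot build the dataset.')

dataset.head()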

Unnamed: 0,cell_surface_area,dendrite_type,rheobase,input_resistance,tau
0,542.046417,aspiny,629.999997,107.630696,5.515311
1,494.03623,aspiny,29.999999,209.605296,62.705039
2,302.248338,aspiny,50.000001,594.843904,10.239005
3,605.346305,aspiny,50.000001,218.831968,45.660687
4,298.096557,spiny,30.000002,338.62912,35.177373
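
The length check above guards against the five lists drifting out of step. An alternative sketch, not the approach this notebook uses, is to build one record per cell so that a failed cell is dropped as a whole. `collect_records` and `toy_extract` are hypothetical names, and the toy cells are made up.

```python
def collect_records(cells, extract):
    """Build one dict per cell, skipping any cell whose extraction fails.

    `cells` is any iterable of cell descriptions and `extract` is a function
    that returns a dict of features or raises an exception.
    """
    records, skipped = [], []
    for cell in cells:
        try:
            records.append(extract(cell))
        except Exception as error:
            skipped.append((cell, repr(error)))
    return records, skipped

# Made-up example: the second cell is missing its radius
toy_cells = [{'id': 1, 'radius': 5.0, 'tau': 20.0},
             {'id': 2, 'tau': 31.0},
             {'id': 3, 'radius': 4.0, 'tau': 12.5}]

def toy_extract(cell):
    # Raises KeyError when a field is missing, so the whole cell is skipped
    return {'id': cell['id'], 'radius': cell['radius'], 'tau': cell['tau']}

records, skipped = collect_records(toy_cells, toy_extract)
print(len(records), 'kept;', len(skipped), 'skipped')
```

A list of dicts like `records` can be passed straight to `pd.DataFrame(records)`, with the dict keys becoming column names.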


## Save the dataset to a csv

In [None]:
# Change the filename if you chose the human dataframe above
dataset.to_csv('mouse_cell_metrics.csv',index=False)
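
Passing `index=False` keeps the row index out of the file. The sketch below shows the difference on a made-up two-row table, writing to in-memory buffers rather than to disk.

```python
import io
import pandas as pd

# Made-up two-row table standing in for `dataset`
toy = pd.DataFrame({'rheobase': [30.0, 50.0], 'tau': [35.2, 10.2]})

# Default: the row index is written as an extra, unnamed first column
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
cols_with_index = list(pd.read_csv(with_index).columns)
print(cols_with_index)

# index=False: only the data columns are written
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = list(pd.read_csv(without_index).columns)
print(cols_without_index)
```

When a file written with the default settings is read back, pandas labels the unnamed index column `Unnamed: 0`.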