# Integrating Allen Brain Atlas Data with structural and functional MRI

## Presented on behalf of Petra and Tim

### Petra Vertes ![](http://www.cnn.group.cam.ac.uk/directory/pv226@cam.ac.uk/image_normal) 
* MRC Research Fellow

### Timothy Rittman ![](http://www.neuroscience.cam.ac.uk/uploadedFiles/sm_tr332_phpeiafhB.jpg) 
* Clinical Fellow


![](https://github.com/KirstieJane/NSPN_WhitakerVertes_PNAS2016/blob/nhw-presentation/PRESENTATIONS/images/whitakervertes_etal_PNAS.PNG?raw=true)
http://dx.doi.org/10.1073/pnas.1601745113

![](https://github.com/KirstieJane/NSPN_WhitakerVertes_PNAS2016/blob/nhw-presentation/PRESENTATIONS/images/vertes_etal_RoyalSociety.PNG?raw=true)
http://dx.doi.org/10.1098/rstb.2015.0362

## MRI results

Showed cortical thinning and increases in intracortical myelination with age
![](https://github.com/KirstieJane/NSPN_WhitakerVertes_PNAS2016/blob/nhw-presentation/CT_MT_ANALYSES/COMPLETE/FIGS/COVARS_none/Figure2_LowRes.jpg?raw=true)

## Wanted to know how these changes related to gene expression

#### Important to know that we used a parcellation
![](https://github.com/KirstieJane/NSPN_WhitakerVertes_PNAS2016/blob/nhw-presentation/CT_MT_ANALYSES/COMBINED_FIGURES/PARCELLATION/PNGS/Parcellation_308_random_matched_hemis_FourHorBrains.png?raw=true)
#### 308 regions within Desikan-Killiany atlas regions, each no more than 500 mm<sup>2</sup> surface area

## Downloaded data from 6 brains from the Allen Brain Institute website

![](https://github.com/KirstieJane/NSPN_WhitakerVertes_PNAS2016/blob/nhw-presentation/PRESENTATIONS/images/ABI_downloadpage.PNG?raw=true)



### Each brain has 5 files:
* **MicroarrayExpression.csv**
    * Contains normalized expression values
    * row: probes, column: samples
* **SampleAnnot.csv**
    * Contains location information for the samples
    * Native MRI voxel coordinates and MNI coordinates
    * Rows correspond to columns in MicroarrayExpression file.
* Ontology
* PACall
* Probes

In [1]:
import pandas as pd

df = pd.read_csv('sample_data/MicroarrayExpression_100lines.csv', header=None)
df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,937,938,939,940,941,942,943,944,945,946
0,1058685,3.615792,2.138074,2.480542,2.964972,2.679803,1.856238,2.280435,3.080857,2.628575,...,3.852665,3.849358,3.018556,3.203562,2.050227,3.48788,2.354469,2.586168,3.495279,3.306209
1,1058684,1.57438,1.687217,1.975735,2.089475,1.912586,1.601138,1.626724,1.855901,1.858343,...,1.698639,2.106493,1.573482,2.028703,2.058318,1.620506,1.802832,1.698847,1.83929,1.703562
2,1058683,1.596431,1.948371,2.19191,2.224042,2.223798,1.557563,1.940634,2.337132,2.253177,...,1.879796,1.576539,1.835648,1.664253,2.195771,1.832431,1.993473,1.864939,2.073033,1.907132
3,1058682,4.482883,6.606044,5.261559,4.013277,5.600743,5.624775,4.552105,4.276418,5.675885,...,4.336135,4.904766,4.305006,5.202678,4.121053,4.507,4.123025,4.020838,4.222393,4.523669
4,1058681,6.291312,8.14989,7.948218,6.964453,8.682156,7.753634,7.462767,6.998209,7.565414,...,6.999358,6.289043,6.515205,6.893379,6.47362,6.326008,6.264416,5.800701,5.901888,6.491646


In [2]:
df = pd.read_csv('sample_data/SampleAnnot.csv')

df.head()

Unnamed: 0,structure_id,slab_num,well_id,slab_type,structure_acronym,structure_name,polygon_id,mri_voxel_x,mri_voxel_y,mri_voxel_z,mni_x,mni_y,mni_z
0,4077,22,594,CX,PCLa-i,"paracentral lobule, anterior part, right, infe...",37470,87,52,116,5.9,-27.7,49.7
1,4323,11,2985,CX,Cl,"claustrum, right",40517,66,92,63,29.2,17.0,-2.9
2,4323,18,2801,CX,Cl,"claustrum, right",41516,66,81,104,28.2,-22.8,16.8
3,4440,18,2273,CX,LGd,"dorsal lateral geniculate nucleus, left",41473,116,94,101,-24.6,-24.6,1.3
4,4266,17,2785,CX,CA4,"CA4 field, right",41142,63,104,106,31.1,-31.3,-7.3
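
The link between the two files can be sketched as below. This is a minimal example on made-up values. It assumes (as the `header=None` output above suggests) that the first column of MicroarrayExpression holds the probe ID and every remaining column is one sample, in the same order as the rows of SampleAnnot. The helper name `label_expression` is hypothetical and is not part of any AIBS tooling.

```python
import pandas as pd


def label_expression(expr, annot):
    """Return a probes x samples table whose columns are labelled by well_id.

    Assumption: column 0 of `expr` is the probe ID and the remaining
    columns follow the row order of `annot`.
    """
    probes = expr.iloc[:, 0]
    values = expr.iloc[:, 1:].copy()
    if values.shape[1] != len(annot):
        raise ValueError("sample columns and annotation rows do not match")
    values.index = probes.values               # rows: probe IDs
    values.columns = annot["well_id"].values   # columns: one per sample
    return values


# Toy stand-ins for the two CSV files (values are illustrative only)
expr = pd.DataFrame([[1058685, 3.6, 2.1, 2.5],
                     [1058684, 1.6, 1.7, 2.0]])
annot = pd.DataFrame({"well_id": [594, 2985, 2801],
                      "mni_x": [5.9, 29.2, 28.2]})

labelled = label_expression(expr, annot)
print(labelled)
```

Labelling the columns this way makes it harder to lose track of which sample a value came from once samples are filtered or reordered.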


## Match up the MRI regions with the AIBS data

Used: https://github.com/rittman/maybrain


1. Flip all the data into the left hemisphere for all AIBS participants
    * *optional step: remove all samples in SampleAnnot that are not from cortex*
2. For each MRI region, find the sample with the closest MNI coordinate across all AIBS participants
3. Figure out the structure name for that region
4. Find ***all samples within that structure*** and average for all genes
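
The four steps above can be sketched roughly as follows. This is not the maybrain implementation, only an illustration under stated assumptions: each MRI region is represented by a single left-hemisphere MNI centroid, "closest" means Euclidean distance, and left is negative x in MNI space. The function name `match_regions` and the toy values are hypothetical. How hemisphere labels inside the structure names are handled is deliberately left out.

```python
import numpy as np
import pandas as pd


def match_regions(centroids, annot, expr, key="structure_name"):
    """Average expression per MRI region (genes x regions).

    centroids: (n_regions, 3) MNI coordinates, assumed left hemisphere
    annot:     one row per sample, with mni_x, mni_y, mni_z and `key`
    expr:      genes x samples, columns in the same order as annot rows
    """
    coords = annot[["mni_x", "mni_y", "mni_z"]].to_numpy(dtype=float).copy()
    # Step 1: flip every sample into the left hemisphere (x <= 0)
    coords[:, 0] = -np.abs(coords[:, 0])

    # Step 2: nearest sample to each region centroid (Euclidean distance)
    centroids = np.asarray(centroids, dtype=float)
    dists = np.linalg.norm(centroids[:, None, :] - coords[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)

    structures = annot[key].to_numpy()
    values = expr.to_numpy(dtype=float)
    out = np.empty((values.shape[0], len(centroids)))
    for region, sample in enumerate(nearest):
        # Step 3: structure that the nearest sample belongs to
        in_structure = structures == structures[sample]
        # Step 4: average all samples in that structure, for every gene
        out[:, region] = values[:, in_structure].mean(axis=1)
    return pd.DataFrame(out, index=expr.index)


# Toy data: three samples, two structures, two genes
annot = pd.DataFrame({"structure_name": ["A", "A", "B"],
                      "mni_x": [30.0, -28.0, -10.0],
                      "mni_y": [0.0, 2.0, 50.0],
                      "mni_z": [0.0, 0.0, 20.0]})
expr = pd.DataFrame([[1.0, 3.0, 10.0],
                     [2.0, 4.0, 20.0]], index=["geneA", "geneB"])
centroids = [[-29.5, 0.0, 0.0], [-12.0, 48.0, 18.0]]

region_expr = match_regions(centroids, annot, expr)
print(region_expr)
```

Because step 4 averages over a whole structure, neighbouring regions that fall in the same structure receive identical values, which is why repeated values appear across columns in the output file.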

#### Output is a data file that has 308 columns (regions) and >20,000 rows (genes)

In [3]:
df = pd.read_csv('../DATA/PLS_gene_predictor_vars.csv')
df.head()

Unnamed: 0,Gene,0,1,2,3,4,5,6,7,8,...,298,299,300,301,302,303,304,305,306,307
0,61E3.4,-0.005118,-0.005118,0.062122,-0.467841,-0.230468,-0.230468,-0.467841,0.627615,0.627615,...,-0.005118,-0.083713,-0.083713,-0.679836,-0.465603,0.712125,-0.013261,0.263997,-0.152726,-0.152726
1,A1BG,0.282118,0.282118,0.031367,0.003455,-0.12091,-0.12091,0.003455,-0.065166,-0.065166,...,0.282118,0.194098,0.194098,0.135712,0.497182,0.01743,0.272689,0.202941,0.355027,0.355027
2,A1CF,0.35029,0.35029,-0.194133,0.117526,-0.149531,-0.149531,0.117526,0.20994,0.20994,...,0.35029,0.180962,0.180962,-0.934635,-0.212663,0.276695,0.230091,0.015073,-0.162013,-0.162013
3,A26C1B,-0.328495,-0.328495,-0.137907,-0.233823,0.124165,0.124165,-0.233823,1.115994,1.115994,...,-0.328495,-0.318345,-0.318345,0.577353,-1.677182,0.449524,-0.388438,-0.466139,-0.622249,-0.622249
4,A2BP1,0.141661,0.141661,-0.944449,-0.327485,0.001236,0.001236,-0.327485,0.872317,0.872317,...,0.141661,0.350908,0.350908,0.454547,-0.828417,0.404121,-0.392952,-0.511171,-0.578898,-0.578898


## Woah that's a lot of data

### Run Partial Least Squares regression

#### PLS is like principal component analysis, except that you're looking for combinations of ***predictor variables*** (genes) that explain variance in the ***response variables***.

* Whitaker, Vertes et al: response variables were intercept and slope with age for cortical thickness and magnetisation transfer
    * 4 columns, 308 rows
* Vertes et al: response variables were intra-modular degree, inter-modular degree and mean connection distance of each node from functional network analysis
    * 3 columns, 308 rows
    
* Two outputs of PLS:
    * a weighting for each gene for each component
    * a score for each region
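
To make the two outputs concrete, here is a minimal NumPy sketch of the *first* PLS component only, on synthetic data. It is not the analysis code used in either paper. It takes the leading singular vector of the cross-covariance between mean-centred predictors and responses, which is one common way of defining the first PLS weight vector. The function name `pls_first_component` and all values are hypothetical, and the sign of the weights is arbitrary.

```python
import numpy as np


def pls_first_component(X, Y):
    """First PLS component via SVD of the cross-covariance X'Y.

    X: regions x genes (predictors), Y: regions x responses.
    Returns one weight per gene and one score per region.
    """
    Xc = np.asarray(X, dtype=float)
    Yc = np.asarray(Y, dtype=float)
    Xc = Xc - Xc.mean(axis=0)   # mean-centre each gene
    Yc = Yc - Yc.mean(axis=0)   # mean-centre each response
    U, s, Vt = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
    weights = U[:, 0]           # a weighting for each gene
    scores = Xc @ weights       # a score for each region
    return weights, scores


# Synthetic example: 308 regions, 20 genes, 4 response variables.
# Only gene 0 is built to drive the responses.
rng = np.random.default_rng(0)
X = rng.normal(size=(308, 20))
Y = np.outer(X[:, 0], [1.0, 2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=(308, 4))

weights, scores = pls_first_component(X, Y)
print(weights.shape, scores.shape)
```

With the gene-by-region file shown earlier, `X` would be the transpose of the expression table so that rows are regions.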

## Gene Ontology analyses

* GOrilla and REVIGO
    * http://cbl-gorilla.cs.technion.ac.il/
    * http://revigo.irb.hr/

![](https://github.com/KirstieJane/NSPN_WhitakerVertes_PNAS2016/blob/nhw-presentation/PRESENTATIONS/images/GOrilla_homepage.PNG?raw=true)
![](https://github.com/KirstieJane/NSPN_WhitakerVertes_PNAS2016/blob/nhw-presentation/PRESENTATIONS/images/Revigo_homepage.PNG?raw=true)

## Results - structure

![](https://github.com/KirstieJane/NSPN_WhitakerVertes_PNAS2016/blob/nhw-presentation/CT_MT_ANALYSES/COMPLETE/FIGS/COVARS_none/Figure3_LowRes.jpg?raw=true)

## Results - function

![](https://github.com/KirstieJane/NSPN_WhitakerVertes_PNAS2016/blob/nhw-presentation/PRESENTATIONS/images/vertes_etal_RoyalSociety_Figure3_cropped.jpg?raw=true)
![](https://github.com/KirstieJane/NSPN_WhitakerVertes_PNAS2016/blob/nhw-presentation/PRESENTATIONS/images/vertes_etal_RoyalSociety_Figure4.jpg?raw=true)


## All the code is available on GitHub:

https://github.com/KirstieJane/NSPN_WhitakerVertes_PNAS2016/

# Thank you!