# Downloading Experimental Data 

This section is a tutorial on how to access and download experimental data from the Allen Mouse Brain Connectivity Atlas. You will learn to download metadata by transgenic line and by injection structure. You will also learn about structure sets and the StructureTree class. By the end of this tutorial, you will be ready to use the downloaded data in your own analyses.

In [1]:
# Import common packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline 
print('Packages imported.')

Packages imported.


To work with the connectivity data through the SDK, we first need to `import` the [MouseConnectivityCache class](https://alleninstitute.github.io/AllenSDK/connectivity.html). This class provides metadata about the mouse connectivity database and will enable us to work with the data.

In [2]:
# Import the MouseConnectivityCache
import allensdk
from allensdk.core.mouse_connectivity_cache import MouseConnectivityCache

# Create an instance of the class and assign it to a variable, mcc
mcc = MouseConnectivityCache(manifest_file='connectivity/mouse_connectivity_manifest.json')
print(mcc)

<allensdk.core.mouse_connectivity_cache.MouseConnectivityCache object at 0x7ffe07783610>


## Download experimental metadata by transgenic line

Now that we have our instance of the mouse connectivity cache, we can start downloading our experimental metadata. To do this, we will call `get_experiments()` on our connectivity instance. We'll use the argument `dataframe=True` to automatically make this a dataframe when it is created. 

In [3]:
# Gather all the experiments with transgenic as well as wildtype mice
mouse_exp_df = mcc.get_experiments(dataframe=True)
mouse_exp_df.head()

id,gender,injection_structures,injection_volume,injection_x,injection_y,injection_z,product_id,specimen_name,strain,structure_abbrev,structure_id,structure_name,transgenic_line,transgenic_line_id,id,primary_injection_structure
527712447,F,"[502, 926, 1084, 484682470]",0.006655,9240,3070,8990,5,Penk-IRES2-Cre-neo-249961,C57BL/6J,SUB,502,Subiculum,Penk-IRES2-Cre-neo,298725927.0,527712447,502
301875966,M,"[574, 931]",0.105746,9170,6850,6200,5,Gabrr3-Cre_KC112-3467,C57BL/6J,PG,931,Pontine gray,Gabrr3-Cre_KC112,177838877.0,301875966,574
520336173,M,"[1, 210, 491, 525, 1004]",0.025762,7810,6550,6450,5,Hdc-Cre_IM1-204103,,TMv,1,"Tuberomammillary nucleus, ventral part",Hdc-Cre_IM1,177839494.0,520336173,1
307160976,F,[304325711],0.01359,5580,7040,6270,31,Cdh4-Cre-215,,retina,304325711,retina,Cdh4-CreER,308603472.0,307160976,304325711
644250774,F,[329],0.006724,6990,2150,2130,36,A930038C07Rik-Tg1-Cre-347488,,SSp-bfd,329,"Primary somatosensory area, barrel field",A930038C07Rik-Tg1-Cre,177838542.0,644250774,329


This gives us metadata on all the experiments in the dataset. Alternatively, you can filter experiments by `transgenic_line` directly in the method call. Let's take a look at which transgenic lines are available to us.

In [4]:
transgenic_lines = mouse_exp_df['transgenic_line'].unique()
print(transgenic_lines)

['Penk-IRES2-Cre-neo' 'Gabrr3-Cre_KC112' 'Hdc-Cre_IM1' 'Cdh4-CreER'
 'A930038C07Rik-Tg1-Cre' 'Ai75(RCL-nT)' 'Scnn1a-Tg2-Cre' 'Etv1-CreERT2'
 None 'Slc17a6-IRES-Cre' 'Nos1-CreERT2' 'Gad2-IRES-Cre' 'Grm2-Cre_MR90'
 'Prkcd-GluCla-CFP-IRES-Cre' 'Gpr26-Cre_KO250' 'Tlx3-Cre_PL56'
 'Chrna2-Cre_OE25' 'Ins2-Cre_25' 'Syt6-Cre_KI148' 'Calb2-IRES-Cre'
 'Rbp4-Cre_KL100' 'Cux2-IRES-Cre' 'Emx1-IRES-Cre' 'Dbh-Cre_KH212'
 'Slc6a4-CreERT2_EZ13' 'Grp-Cre_KH288' 'Slc32a1-IRES-Cre' 'Gng7-Cre_KH71'
 'Oxt-IRES-Cre' 'Ntsr1-Cre_GN220' 'Pdzk1ip1-Cre_KD31' 'Slc17a8-IRES2-Cre'
 'Grik4-Cre' 'Cart-Tg1-Cre' 'Ntrk1-IRES-Cre' 'Chat-IRES-Cre-neo'
 'Scnn1a-Tg3-Cre' 'Ppp1r17-Cre_NL146' 'Gnrh1-Cre' 'Npr3-IRES2-Cre'
 'Sim1-Cre_KJ18' 'Pvalb-IRES-Cre' 'Hcrt-Cre' 'Cort-T2A-Cre'
 'Cnnm2-Cre_KD18' 'Drd1a-Cre_EY262' 'Calb1-T2A-dgCre' 'Syt17-Cre_NO14'
 'Pcp2-Cre_GN135' 'Drd3-Cre_KI196' 'Slc6a4-Cre_ET33' 'Htr2a-Cre_KM207'
 'Slc17a7-IRES2-Cre' 'Crh-IRES-Cre_BL' 'Nxph4-T2A-CreERT2' 'Vipr2-Cre_KE2'
 'Sst-IRES-Cre' 'Oxtr-T2A-Cre' 'Gal

Let's start by creating a dataframe that only contains experiments with the first three Cre lines in the list above *(Penk-IRES2-Cre-neo, Gabrr3-Cre_KC112, Hdc-Cre_IM1)*. You can change the Cre lines by changing the values in the list assigned to `transgenic_lines`. Remember to copy the Cre line of interest exactly, including the single quotes. We'll then use this list in the argument `cre = ` in our call to `get_experiments`.

In [5]:
# Choose your Cre lines
transgenic_lines = ['Penk-IRES2-Cre-neo','Gabrr3-Cre_KC112','Hdc-Cre_IM1'] 

# Filter experiments from only the first 3 cre lines 
transgenic_line_df = mcc.get_experiments(cre = transgenic_lines, dataframe=True)

# Print the length of our dataframe 
print(f'There are {len(transgenic_line_df)} experiments in these Cre lines: {transgenic_lines}')

transgenic_line_df.head()

There are 39 experiments in these Cre lines: ['Penk-IRES2-Cre-neo', 'Gabrr3-Cre_KC112', 'Hdc-Cre_IM1']


id,gender,injection_structures,injection_volume,injection_x,injection_y,injection_z,product_id,specimen_name,strain,structure_abbrev,structure_id,structure_name,transgenic_line,transgenic_line_id,id,primary_injection_structure
527712447,F,"[502, 926, 1084, 484682470]",0.006655,9240,3070,8990,5,Penk-IRES2-Cre-neo-249961,C57BL/6J,SUB,502,Subiculum,Penk-IRES2-Cre-neo,298725927,527712447,502
301875966,M,"[574, 931]",0.105746,9170,6850,6200,5,Gabrr3-Cre_KC112-3467,C57BL/6J,PG,931,Pontine gray,Gabrr3-Cre_KC112,177838877,301875966,574
520336173,M,"[1, 210, 491, 525, 1004]",0.025762,7810,6550,6450,5,Hdc-Cre_IM1-204103,,TMv,1,"Tuberomammillary nucleus, ventral part",Hdc-Cre_IM1,177839494,520336173,1
602828622,F,[993],0.014113,2510,2440,3850,45,Penk-IRES2-Cre-neo-321021,,MOs,993,Secondary motor area,Penk-IRES2-Cre-neo,298725927,602828622,993
168664192,F,"[91, 217, 372, 867, 920, 928, 589508455]",0.062617,10890,3280,7430,5,Gabrr3-Cre-123,,IP,91,Interposed nucleus,Gabrr3-Cre_KC112,177838877,168664192,91
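
Once experiments are loaded, a quick tally per Cre line can help you judge which lines have enough data to analyze. The sketch below assumes the list-of-dictionaries form that `get_experiments()` returns when `dataframe=True` is omitted. The `records` list is a hand-copied subset of rows shown in this notebook, not a live query, and the label used for a missing `transgenic_line` is an illustrative choice.

```python
from collections import Counter

# Hand-copied subset of rows from the tables in this notebook (illustrative only)
records = [
    {"id": 527712447, "transgenic_line": "Penk-IRES2-Cre-neo"},
    {"id": 602828622, "transgenic_line": "Penk-IRES2-Cre-neo"},
    {"id": 301875966, "transgenic_line": "Gabrr3-Cre_KC112"},
    {"id": 168664192, "transgenic_line": "Gabrr3-Cre_KC112"},
    {"id": 520336173, "transgenic_line": "Hdc-Cre_IM1"},
    {"id": 272697944, "transgenic_line": None},
]

# Label missing lines explicitly so they are not silently dropped
line_counts = Counter(r["transgenic_line"] or "no transgenic line" for r in records)

# Print lines from most to least frequent
for line, n in line_counts.most_common():
    print(f"{line}: {n}")
```

With a dataframe, `value_counts(dropna=False)` on the `transgenic_line` column would give a comparable tally.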


## Download experimental metadata by injection structure 

We can also filter experiments by `injection_structure_ids`. If the IDs of the injection structures are already known, pass them as a list like so:

In [6]:
# Primary Motor Area experiments have ID 985
MOp_df = mcc.get_experiments(injection_structure_ids = [985], dataframe=True)
MOp_df.head()

id,gender,injection_structures,injection_volume,injection_x,injection_y,injection_z,product_id,specimen_name,strain,structure_abbrev,structure_id,structure_name,transgenic_line,transgenic_line_id,id,primary_injection_structure
272697944,M,"[985, 993]",0.371044,3940,3410,7780,5,378-1862,C57BL/6J,MOp,985,Primary motor area,,,272697944,985
180720175,M,"[337, 361, 894, 985, 993]",1.015068,5760,820,6460,5,378-1823,C57BL/6J,MOp,985,Primary motor area,,,180720175,337
100141563,M,"[337, 985]",0.172596,5370,1590,7490,5,378-697,C57BL/6J,MOp,985,Primary motor area,,,100141563,337
288169135,M,"[985, 993]",0.316101,4200,1670,7070,5,Efr3a-Cre_NO108-84,FVB.CD1(ICR),MOp,985,Primary motor area,Efr3a-Cre_NO108,182761781.0,288169135,985
127084296,M,[985],0.344318,5160,2270,6980,5,378-1401,C57BL/6J,MOp,985,Primary motor area,,,127084296,985
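
Several rows above list 985 among `injection_structures` while `primary_injection_structure` holds a different ID (337). The following sketch shows how the two notions could be separated by hand on plain records. It only illustrates the distinction and is not a description of how the SDK filters internally; the `records` list copies three rows from the table above.

```python
# Three rows copied from the MOp table above (illustrative subset of fields)
records = [
    {"id": 272697944, "injection_structures": [985, 993],
     "primary_injection_structure": 985},
    {"id": 180720175, "injection_structures": [337, 361, 894, 985, 993],
     "primary_injection_structure": 337},
    {"id": 100141563, "injection_structures": [337, 985],
     "primary_injection_structure": 337},
]

MOP_ID = 985

# Experiments where the injection touched MOp at all
touching = [r["id"] for r in records if MOP_ID in r["injection_structures"]]

# Experiments where MOp is recorded as the primary injection structure
primary = [r["id"] for r in records if r["primary_injection_structure"] == MOP_ID]

print(len(touching), "touch MOp;", len(primary), "have MOp as primary")
```

Which of the two criteria is appropriate depends on how strictly your analysis needs the injection to be confined to one structure.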


## Get structure IDs

To use the `injection_structure_ids` argument above, you need the structure IDs. The MouseConnectivityCache can retrieve the adult mouse structure tree as a StructureTree class instance: call the `get_structure_tree()` method on your MouseConnectivityCache instance (`mcc`). The StructureTree class has many methods that allow you to look up brain structures by ID, name, acronym, and other properties.

Below we will access information on the hypothalamus via its name by calling `get_structures_by_name()` on our StructureTree instance. 

In [7]:
# Grab the StructureTree instance
structure_tree = mcc.get_structure_tree()

# Get info on hypothalamus by its name 
hypothalamus = structure_tree.get_structures_by_name(['Hypothalamus'])[0]
hypothalamus

{'acronym': 'HY',
 'graph_id': 1,
 'graph_order': 715,
 'id': 1097,
 'name': 'Hypothalamus',
 'structure_id_path': [997, 8, 343, 1129, 1097],
 'structure_set_ids': [2,
  112905828,
  691663206,
  12,
  184527634,
  112905813,
  687527670,
  114512891,
  114512892],
 'rgb_triplet': [230, 68, 56]}

This gives us a dictionary with metadata about our brain structure of interest. 
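
Two fields in this dictionary are handy on their own. Judging by the output above, `structure_id_path` ends with the structure's own `id`, so the preceding entries can be read as its ancestors, and `rgb_triplet` can be turned into a hex color for plotting. The sketch below works on a hand-copied subset of the dictionary; the variable names are illustrative.

```python
# Subset of the dictionary printed above
hypothalamus = {
    "acronym": "HY",
    "id": 1097,
    "name": "Hypothalamus",
    "structure_id_path": [997, 8, 343, 1129, 1097],
    "rgb_triplet": [230, 68, 56],
}

path = hypothalamus["structure_id_path"]

# ID just before the structure itself, if the path has more than one entry
parent_id = path[-2] if len(path) > 1 else None

# Number of ancestors above the structure
depth = len(path) - 1

# Convert the RGB triplet into a hex string that matplotlib accepts as a color
hex_color = "#{:02x}{:02x}{:02x}".format(*hypothalamus["rgb_triplet"])

print(parent_id, depth, hex_color)
```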

So far, we know that the Primary Motor Area is a brain structure available to us in our experiments, but what about the rest? To find all the brain structures available to us, we can look at the unique values in the `name` column of the summary structures.

**Note:** We will cover structure set IDs, `get_structure_sets()`, and `get_structures_by_set_id()` later in this notebook. For now, we only use `get_structures_by_set_id()` to access the summary structures.

In [8]:
# The "Brain - Summary Structures" set has ID 167587189
summary_structures = structure_tree.get_structures_by_set_id([167587189])
summary_structures_df = pd.DataFrame(summary_structures)

# Determine how many different structures are within our experiments 
structure_name = summary_structures_df['name'].unique()
print("%d Total Available Brain Structures" % len(structure_name))

# Print the first 19 brain structures in our data
print(structure_name[:19])

316 Total Available Brain Structures
['Frontal pole, cerebral cortex' 'Primary motor area'
 'Secondary motor area' 'Primary somatosensory area, nose'
 'Primary somatosensory area, barrel field'
 'Primary somatosensory area, lower limb'
 'Primary somatosensory area, mouth'
 'Primary somatosensory area, upper limb'
 'Primary somatosensory area, trunk'
 'Primary somatosensory area, unassigned'
 'Supplemental somatosensory area' 'Gustatory areas' 'Visceral area'
 'Dorsal auditory area' 'Primary auditory area' 'Posterior auditory area'
 'Ventral auditory area' 'Anterolateral visual area'
 'Anteromedial visual area']


We know that the Primary Motor Area has ID 985, but what if we do not know a structure's ID? That is not a hard problem to fix. As we did earlier, we can retrieve a dictionary of metadata for our structure of interest using the StructureTree helper methods and read its `id`.

In [9]:
# get info on Ventral tegmental area by its name 
VTA = structure_tree.get_structures_by_name(['Ventral tegmental area'])[0]

# Specify the structure ID by indexing into the 'id' key of `VTA`
VTA_df = pd.DataFrame(mcc.get_experiments(injection_structure_ids = [VTA['id']]))
VTA_df.head()

,gender,injection_structures,injection_volume,injection_x,injection_y,injection_z,product_id,specimen_name,strain,structure_abbrev,structure_id,structure_name,transgenic_line,transgenic_line_id,id,primary_injection_structure
0,M,"[12, 100, 128, 214, 749, 795, 886, 946]",0.778492,8160,5040,5690,5,Erbb4-2A-CreERT2-D-5682,C57BL/6J,VTA,749,Ventral tegmental area,Erbb4-T2A-CreERT2,177838266.0,127867804,12
1,F,"[12, 128, 246, 591, 749]",0.252069,8630,5090,5930,5,Cck-IRES-Cre-91,,VTA,749,Ventral tegmental area,Cck-IRES-Cre,177839159.0,171021829,12
2,F,"[128, 214, 749, 795]",0.010615,8410,4830,6180,5,Th-Cre_FI172-135967,B6.FVB,VTA,749,Ventral tegmental area,Th-Cre_FI172,177837797.0,304337288,128
3,M,"[58, 128, 246, 374, 381, 749]",0.164738,8560,5140,6770,5,Slc18a2-Cre_OZ14-3970,,VTA,749,Ventral tegmental area,Slc18a2-Cre_OZ14,177837324.0,292958638,58
4,M,"[12, 100, 128, 197, 749, 795, 946, 607344830]",0.060049,8280,5220,5940,5,378-1474,C57BL/6J,VTA,749,Ventral tegmental area,,,127796728,12
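
If you look up many structures, a small lookup table keyed by acronym can save repeated calls. This is a minimal sketch, assuming structure dictionaries shaped like the ones the StructureTree returns. The entries are copied from outputs in this notebook, and `id_for` is a hypothetical helper, not part of the SDK.

```python
# Structure records copied from outputs in this notebook (subset of fields)
structures = [
    {"acronym": "HY", "id": 1097, "name": "Hypothalamus"},
    {"acronym": "MOp", "id": 985, "name": "Primary motor area"},
    {"acronym": "VTA", "id": 749, "name": "Ventral tegmental area"},
]

# Build the acronym -> ID mapping once, then reuse it
id_by_acronym = {s["acronym"]: s["id"] for s in structures}


def id_for(acronym):
    """Return the structure ID for an acronym, with a readable error if unknown."""
    try:
        return id_by_acronym[acronym]
    except KeyError:
        known = sorted(id_by_acronym)
        raise KeyError(f"Unknown acronym {acronym!r}; known: {known}") from None


print(id_for("VTA"))
```

In a live session, the list of structure dictionaries could come from any StructureTree query, such as the `get_structures_by_set_id()` call used earlier.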


## Putting it All Together 

Below is an example of how we can combine both filtering by Cre line and by injection structure to get a more refined set of data.

In [10]:
# select cortical experiments 
isocortex = structure_tree.get_structures_by_name(['Isocortex'])[0]

# same as before, but restrict the cre line
rbp4_cortical_experiments = mcc.get_experiments(cre=[ 'Rbp4-Cre_KL100' ], 
                                                injection_structure_ids=[isocortex['id']])

# convert to a dataframe 
rbp4_cortical_df = pd.DataFrame(rbp4_cortical_experiments).set_index('id')
rbp4_cortical_df.head()

id,gender,injection_structures,injection_volume,injection_x,injection_y,injection_z,product_id,specimen_name,strain,structure_abbrev,structure_id,structure_name,transgenic_line,transgenic_line_id,primary_injection_structure
249402048,M,"[104, 345, 378, 1057]",0.321212,4410,4110,9390,5,Rbp4-Cre_KL100-217,C57BL/6J,GU,1057,Gustatory areas,Rbp4-Cre_KL100,177838435,104
523718823,M,"[385, 533, 879, 886, 894]",0.075229,8680,1050,4050,36,Rbp4-Cre_KL100-245457,,VISpm,533,posteromedial visual area,Rbp4-Cre_KL100,177838435,385
294481346,F,"[385, 533]",0.129882,8650,710,7560,5,Rbp4-Cre_KL100-125732,,VISpm,533,posteromedial visual area,Rbp4-Cre_KL100,177838435,385
657046319,M,"[385, 879, 894]",0.020779,9600,1260,3800,35,Rbp4-Cre_KL100-363600,,VISp,385,Primary visual area,Rbp4-Cre_KL100,177838435,385
166153483,M,"[104, 119, 583, 985]",0.415792,3530,4540,8340,5,Rbp4-Cre-103,,AIv,119,"Agranular insular area, ventral part",Rbp4-Cre_KL100,177838435,104
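
The rows above list structures such as Gustatory areas and Primary visual area rather than Isocortex itself, which suggests that filtering by a structure ID also matches structures nested beneath it. One way to reason about that nesting is through `structure_id_path`. The sketch below is a hand-rolled check, not the SDK's implementation. The paths are copied from outputs in this notebook, and 315 is assumed to be the value of `isocortex['id']`.

```python
ISOCORTEX_ID = 315  # assumed value of isocortex['id']

# structure_id_path values copied from outputs in this notebook
paths = {
    "Primary motor area": [997, 8, 567, 688, 695, 315, 500, 985],
    "Primary somatosensory area, barrel field": [997, 8, 567, 688, 695, 315, 453, 322, 329],
    "Hypothalamus": [997, 8, 343, 1129, 1097],
}


def is_within(path, ancestor_id):
    """True if ancestor_id appears anywhere on the structure's path."""
    return ancestor_id in path


# Keep only the structures whose path passes through the Isocortex
cortical = [name for name, path in paths.items() if is_within(path, ISOCORTEX_ID)]
print(cortical)
```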


As a convenience, structures are grouped into named collections called "structure sets". These sets can be used to quickly gather a useful subset of structures from the tree. The criteria used to define structure sets are eclectic; a structure set might list:

- structures that were used in a particular project.
- structures that coarsely partition the brain.
- structures that bear functional similarity.

To see only structure sets relevant to the adult mouse brain, use the StructureTree:

In [11]:
from allensdk.api.queries.ontologies_api import OntologiesApi

oapi = OntologiesApi()

# get the ids of all the structure sets in the tree
structure_set_ids = structure_tree.get_structure_sets()

# query the API for information on those structure sets
pd.DataFrame(oapi.get_structure_sets(structure_set_ids))

,description,id,name
0,List of structures in Isocortex layer 5,667481446,Isocortex layer 5
1,List of structures in Isocortex layer 6b,667481450,Isocortex layer 6b
2,Summary structures of the cerebellum,688152368,Cerebellum
3,List of structures for ABA Differential Search,12,ABA - Differential Search
4,List of valid structures for projection target...,184527634,Mouse Connectivity - Target Search
5,Structures whose surfaces are represented by a...,691663206,Mouse Brain - Has Surface Mesh
6,Summary structures of the midbrain,688152365,Midbrain
7,Summary structures of the medulla,688152367,Medulla
8,Summary structures of the striatum,688152361,Striatum
9,Structures representing subdivisions of the mo...,687527945,Mouse Connectivity - Summary


As you can see from the table above, our brain structures can be grouped into many different sets. Below we look at the Mouse Connectivity Summary structures by passing the set ID to the `get_structures_by_set_id()` method.

In [12]:
# From the above table, "Mouse Connectivity - Summary" has id 687527945
summary_connectivity = structure_tree.get_structures_by_set_id([687527945])
summary_connectivity_df = pd.DataFrame(summary_connectivity)
summary_connectivity_df.head()

,acronym,graph_id,graph_order,id,name,structure_id_path,structure_set_ids,rgb_triplet
0,FRP,1,6,184,"Frontal pole, cerebral cortex","[997, 8, 567, 688, 695, 315, 184]","[3, 112905828, 688152357, 691663206, 687527945...","[38, 143, 69]"
1,MOp,1,18,985,Primary motor area,"[997, 8, 567, 688, 695, 315, 500, 985]","[112905828, 688152357, 691663206, 687527945, 1...","[31, 157, 90]"
2,MOs,1,24,993,Secondary motor area,"[997, 8, 567, 688, 695, 315, 500, 993]","[112905828, 688152357, 691663206, 687527945, 1...","[31, 157, 90]"
3,SSp-n,1,44,353,"Primary somatosensory area, nose","[997, 8, 567, 688, 695, 315, 453, 322, 353]","[112905828, 688152357, 691663206, 687527945, 1...","[24, 128, 100]"
4,SSp-bfd,1,51,329,"Primary somatosensory area, barrel field","[997, 8, 567, 688, 695, 315, 453, 322, 329]","[112905828, 688152357, 691663206, 687527945, 1...","[24, 128, 100]"
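
Each structure dictionary also carries its own `structure_set_ids`, so set membership can be checked on records you already hold. This is a minimal sketch on hand-copied data: the hypothalamus entry comes from the earlier dictionary output, the MOp entry lists only the IDs visible in the truncated table output, and `in_set` is a hypothetical helper.

```python
SUMMARY_SET_ID = 687527945  # "Mouse Connectivity - Summary" in the structure set table

structures = [
    {"acronym": "HY",
     "structure_set_ids": [2, 112905828, 691663206, 12, 184527634,
                           112905813, 687527670, 114512891, 114512892]},
    # Only the IDs visible in the truncated table output are listed here
    {"acronym": "MOp",
     "structure_set_ids": [112905828, 688152357, 691663206, 687527945]},
]


def in_set(structure, set_id):
    """True if the structure lists set_id among its structure sets."""
    return set_id in structure["structure_set_ids"]


# Collect the acronyms of structures that belong to the summary set
summary_acronyms = [s["acronym"] for s in structures if in_set(s, SUMMARY_SET_ID)]
print(summary_acronyms)
```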


## Additional Resources 

For more information on the StructureTree class and its methods for accessing information on brain structures, see the [StructureTree documentation](https://alleninstitute.github.io/AllenSDK/allensdk.core.structure_tree.html).