# Downloading Experimental Data 

This section will serve as a tutorial on how to access and download experimental data from the Allen Brain Mouse Connectivity Atlas. In this tutorial you will learn to download metadata by transgenic line as well as by injection structure. By the end of this tutorial you will be ready to use this downloaded data to check for connections between brain areas and build out analyses of your own.  

In [1]:
import numpy as np
import pandas as pd 

## Setup
To access the mouse connectivity data through the SDK, we first need to `import` the [MouseConnectivityCache class](https://alleninstitute.github.io/AllenSDK/connectivity.html). This class caches metadata about the mouse connectivity database and provides the methods needed to download and analyze the data. For a full list of the methods available on a `MouseConnectivityCache` object, please visit the <a href = 'https://alleninstitute.github.io/AllenSDK/allensdk.core.mouse_connectivity_cache.html'>original documentation</a>.

In [2]:
# Import the MouseConnectivityCache
from allensdk.core.mouse_connectivity_cache import MouseConnectivityCache

# Create an instance of the class and assign it to a variable, mcc
mcc = MouseConnectivityCache(manifest_file='manifest.json')
print(mcc)

<allensdk.core.mouse_connectivity_cache.MouseConnectivityCache object at 0x7fb2a819d950>


## Download experimental metadata by transgenic line

Now that we have our instance of the mouse connectivity cache (`mcc`), we can download our experimental metadata. To do this, we will call `get_experiments()` on our connectivity instance. The method takes the arguments `cre` and `injection_structure_ids`, which filter the downloaded data to match your criteria (Cre line and injection structure, respectively). We'll also use the argument `dataframe=True` to return the downloaded data as a dataframe.

In [3]:
# Download metadata for all of the experiments, without filtering for Cre line or injection structure
mouse_exp_df = mcc.get_experiments(dataframe=True)
mouse_exp_df.head()

id,gender,injection_structures,injection_volume,injection_x,injection_y,injection_z,product_id,specimen_name,strain,structure_abbrev,structure_id,structure_name,transgenic_line,transgenic_line_id,id,primary_injection_structure
527712447,F,"[502, 926, 1084, 484682470]",0.006655,9240,3070,8990,5,Penk-IRES2-Cre-neo-249961,C57BL/6J,SUB,502,Subiculum,Penk-IRES2-Cre-neo,298725927.0,527712447,502
301875966,M,"[574, 931]",0.105746,9170,6850,6200,5,Gabrr3-Cre_KC112-3467,C57BL/6J,PG,931,Pontine gray,Gabrr3-Cre_KC112,177838877.0,301875966,574
520336173,M,"[1, 210, 491, 525, 1004]",0.025762,7810,6550,6450,5,Hdc-Cre_IM1-204103,,TMv,1,"Tuberomammillary nucleus, ventral part",Hdc-Cre_IM1,177839494.0,520336173,1
307160976,F,[304325711],0.01359,5580,7040,6270,31,Cdh4-Cre-215,,retina,304325711,retina,Cdh4-CreER,308603472.0,307160976,304325711
644250774,F,[329],0.006724,6990,2150,2130,36,A930038C07Rik-Tg1-Cre-347488,,SSp-bfd,329,"Primary somatosensory area, barrel field",A930038C07Rik-Tg1-Cre,177838542.0,644250774,329


This gives us metadata on all the experiments in the dataset. Let's take a look at which transgenic lines are available in these experiments.

In [4]:
transgenic_lines = mouse_exp_df['transgenic_line'].unique()
print(transgenic_lines)

['Penk-IRES2-Cre-neo' 'Gabrr3-Cre_KC112' 'Hdc-Cre_IM1' 'Cdh4-CreER'
 'A930038C07Rik-Tg1-Cre' 'Ai75(RCL-nT)' 'Scnn1a-Tg2-Cre' 'Etv1-CreERT2'
 None 'Slc17a6-IRES-Cre' 'Nos1-CreERT2' 'Gad2-IRES-Cre' 'Grm2-Cre_MR90'
 'Prkcd-GluCla-CFP-IRES-Cre' 'Gpr26-Cre_KO250' 'Tlx3-Cre_PL56'
 'Chrna2-Cre_OE25' 'Ins2-Cre_25' 'Syt6-Cre_KI148' 'Calb2-IRES-Cre'
 'Rbp4-Cre_KL100' 'Cux2-IRES-Cre' 'Emx1-IRES-Cre' 'Dbh-Cre_KH212'
 'Slc6a4-CreERT2_EZ13' 'Grp-Cre_KH288' 'Slc32a1-IRES-Cre' 'Gng7-Cre_KH71'
 'Oxt-IRES-Cre' 'Ntsr1-Cre_GN220' 'Pdzk1ip1-Cre_KD31' 'Slc17a8-IRES2-Cre'
 'Grik4-Cre' 'Cart-Tg1-Cre' 'Ntrk1-IRES-Cre' 'Chat-IRES-Cre-neo'
 'Scnn1a-Tg3-Cre' 'Ppp1r17-Cre_NL146' 'Gnrh1-Cre' 'Npr3-IRES2-Cre'
 'Sim1-Cre_KJ18' 'Pvalb-IRES-Cre' 'Hcrt-Cre' 'Cort-T2A-Cre'
 'Cnnm2-Cre_KD18' 'Drd1a-Cre_EY262' 'Calb1-T2A-dgCre' 'Syt17-Cre_NO14'
 'Pcp2-Cre_GN135' 'Drd3-Cre_KI196' 'Slc6a4-Cre_ET33' 'Htr2a-Cre_KM207'
 'Slc17a7-IRES2-Cre' 'Crh-IRES-Cre_BL' 'Nxph4-T2A-CreERT2' 'Vipr2-Cre_KE2'
 'Sst-IRES-Cre' 'Oxtr-T2A-Cre' 'Gal

Let's start by creating a dataframe that only contains experiments with the first three Cre lines in the list above *(Penk-IRES2-Cre-neo, Gabrr3-Cre_KC112, Hdc-Cre_IM1)*. You can change the Cre lines by changing the values in the list assigned to `transgenic_lines`. Remember to copy the Cre line of interest exactly, including the single quotes. We'll then use this list in the argument `cre = ` in our call to `get_experiments`.

*Note*: We could have also selected for transgenic lines by [subsetting the dataframe](https://neuraldatascience.github.io/Chapter_2/Pandas.html#subsetting) containing all of the experiments.
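
The subsetting approach in the note can be sketched with plain pandas. The small dataframe here is a hand-copied stand-in for a few rows of `mouse_exp_df` (only two columns, with values taken from the output shown earlier), so it runs without downloading anything. The names `sample_df`, `lines_of_interest`, and `subset_df` are illustrative only; with the real data, the same mask would be applied to `mouse_exp_df`.

```python
import pandas as pd

# Hand-copied stand-in for a few rows of mouse_exp_df (assumption: the real
# dataframe has many more rows and columns)
sample_df = pd.DataFrame(
    {
        "transgenic_line": [
            "Penk-IRES2-Cre-neo",
            "Gabrr3-Cre_KC112",
            "Hdc-Cre_IM1",
            "Cdh4-CreER",
            "A930038C07Rik-Tg1-Cre",
        ],
        "structure_abbrev": ["SUB", "PG", "TMv", "retina", "SSp-bfd"],
    },
    index=pd.Index(
        [527712447, 301875966, 520336173, 307160976, 644250774], name="id"
    ),
)

# Cre lines to keep
lines_of_interest = ["Penk-IRES2-Cre-neo", "Gabrr3-Cre_KC112", "Hdc-Cre_IM1"]

# Boolean mask: True where the row's transgenic line is in our list
mask = sample_df["transgenic_line"].isin(lines_of_interest)
subset_df = sample_df[mask]
print(subset_df)
```

Subsetting like this works on data you have already downloaded, whereas the `cre` argument filters at the time of the call.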

In [5]:
# Choose your Cre lines
transgenic_lines = ['Penk-IRES2-Cre-neo','Gabrr3-Cre_KC112','Hdc-Cre_IM1'] 

# Keep only experiments from the three chosen Cre lines
transgenic_line_df = mcc.get_experiments(cre=transgenic_lines, dataframe=True)

# Print the number of experiments in our dataframe
print(f'There are {len(transgenic_line_df)} experiments in these Cre lines: \n{transgenic_lines}')

transgenic_line_df.head()

There are 39 experiments in these Cre lines: 
['Penk-IRES2-Cre-neo', 'Gabrr3-Cre_KC112', 'Hdc-Cre_IM1']


id,gender,injection_structures,injection_volume,injection_x,injection_y,injection_z,product_id,specimen_name,strain,structure_abbrev,structure_id,structure_name,transgenic_line,transgenic_line_id,id,primary_injection_structure
527712447,F,"[502, 926, 1084, 484682470]",0.006655,9240,3070,8990,5,Penk-IRES2-Cre-neo-249961,C57BL/6J,SUB,502,Subiculum,Penk-IRES2-Cre-neo,298725927,527712447,502
301875966,M,"[574, 931]",0.105746,9170,6850,6200,5,Gabrr3-Cre_KC112-3467,C57BL/6J,PG,931,Pontine gray,Gabrr3-Cre_KC112,177838877,301875966,574
520336173,M,"[1, 210, 491, 525, 1004]",0.025762,7810,6550,6450,5,Hdc-Cre_IM1-204103,,TMv,1,"Tuberomammillary nucleus, ventral part",Hdc-Cre_IM1,177839494,520336173,1
602828622,F,[993],0.014113,2510,2440,3850,45,Penk-IRES2-Cre-neo-321021,,MOs,993,Secondary motor area,Penk-IRES2-Cre-neo,298725927,602828622,993
168664192,F,"[91, 217, 372, 867, 920, 928, 589508455]",0.062617,10890,3280,7430,5,Gabrr3-Cre-123,,IP,91,Interposed nucleus,Gabrr3-Cre_KC112,177838877,168664192,91


## Download experimental metadata by injection structure 

To use the `injection_structure_ids` argument above, you need the structure IDs. The MouseConnectivityCache has a method for retrieving the adult mouse structure tree as a StructureTree class instance. The StructureTree class has many methods that allow you to look up brain structures by their ID, name, acronym, and other properties. To get the tree, call the `get_structure_tree()` method on your MouseConnectivityCache instance (`mcc`).

Below we will access information on the hypothalamus via its name by calling `get_structures_by_name()` on our StructureTree instance. 

In [6]:
# Grab the StructureTree instance
structure_tree = mcc.get_structure_tree()

# Get info on hypothalamus by its name 
hypothalamus = structure_tree.get_structures_by_name(['Hypothalamus'])[0]
hypothalamus

{'acronym': 'HY',
 'graph_id': 1,
 'graph_order': 715,
 'id': 1097,
 'name': 'Hypothalamus',
 'structure_id_path': [997, 8, 343, 1129, 1097],
 'structure_set_ids': [2,
  112905828,
  691663206,
  12,
  184527634,
  112905813,
  687527670,
  114512891,
  114512892],
 'rgb_triplet': [230, 68, 56]}

This gives us a dictionary with info about our brain structure of interest. The value stored under `id` is the injection structure ID. We can download our experimental metadata by injection structure by passing this value, inside a list, to `injection_structure_ids =` when calling `get_experiments`.
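
The other fields in this dictionary are useful too. As a minimal sketch, assuming `structure_id_path` lists the IDs from the root of the structure tree down to the structure itself (which matches the output above, where the path ends in 1097), you can test whether one structure sits inside another. The records below are trimmed copies of outputs from this tutorial, and `is_within` is a hypothetical helper, not part of the SDK.

```python
# Records trimmed from outputs in this tutorial (only the fields used here)
hypothalamus = {
    "acronym": "HY",
    "id": 1097,
    "structure_id_path": [997, 8, 343, 1129, 1097],
}
frontal_pole = {
    "acronym": "FRP",
    "id": 184,
    "structure_id_path": [997, 8, 567, 688, 695, 315, 184],
}


def is_within(structure, ancestor_id):
    """Return True if ancestor_id appears anywhere on the structure's path.

    Hypothetical helper; note that a structure counts as within itself.
    """
    return ancestor_id in structure["structure_id_path"]


# The hypothalamus sits under the root structure (ID 997)
print(is_within(hypothalamus, 997))                 # True

# The frontal pole is not part of the hypothalamus
print(is_within(frontal_pole, hypothalamus["id"]))  # False
```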

In [7]:
# The hypothalamus has structure ID 1097
injection_structure = hypothalamus['id']
hyp_df = mcc.get_experiments(injection_structure_ids = [injection_structure], 
                             dataframe=True)

hyp_df.head()

id,gender,injection_structures,injection_volume,injection_x,injection_y,injection_z,product_id,specimen_name,strain,structure_abbrev,structure_id,structure_name,transgenic_line,transgenic_line_id,id,primary_injection_structure
520336173,M,"[1, 210, 491, 525, 1004]",0.025762,7810,6550,6450,5,Hdc-Cre_IM1-204103,,TMv,1,"Tuberomammillary nucleus, ventral part",Hdc-Cre_IM1,177839494.0,520336173,1
286318327,M,"[118, 223, 693, 830, 10671]",0.074341,6910,6830,5830,5,Ins2-Cre-138,C57BL/6J,ARH,223,Arcuate hypothalamic nucleus,Ins2-Cre_25,177837788.0,286318327,118
127710392,M,"[194, 491, 525, 830, 879, 886, 946, 980, 1004,...",0.348138,7680,5950,5690,5,378-1442,C57BL/6J,PH,946,Posterior hypothalamic nucleus,,,127710392,194
305379705,M,"[194, 356, 364, 470, 685, 797]",0.420432,6990,5660,7160,5,Slc32a1-IRES-Cre-123530,B6.129,LHA,194,Lateral hypothalamic area,Slc32a1-IRES-Cre,177839090.0,305379705,194
113037759,F,"[614, 693]",0.002397,6540,6890,6620,5,Oxt-IRES-Cre-1161,B6.129,TU,614,Tuberal nucleus,Oxt-IRES-Cre,177838953.0,113037759,614


## Putting it All Together 

Below is an example of how we can filter by transgenic line and by injection structure at the same time to get a more refined set of data.

In [8]:
# Choose desired structure 
hypothalamus = structure_tree.get_structures_by_name(['Hypothalamus'])[0]
injection_structure = hypothalamus['id']

# Choose your Cre line
transgenic_line = 'Hdc-Cre_IM1' 

# Filter experiments using Cre line and injection structure 
hdc_hypothalamus_exps = mcc.get_experiments(cre=[transgenic_line],
                                            injection_structure_ids=[injection_structure])

# Convert the returned list of dictionaries to a dataframe indexed by experiment ID
hdc_hypothalamus_df = pd.DataFrame(hdc_hypothalamus_exps).set_index('id')
hdc_hypothalamus_df

id,gender,injection_structures,injection_volume,injection_x,injection_y,injection_z,product_id,specimen_name,strain,structure_abbrev,structure_id,structure_name,transgenic_line,transgenic_line_id,primary_injection_structure
520336173,M,"[1, 210, 491, 525, 1004]",0.025762,7810,6550,6450,5,Hdc-Cre_IM1-204103,,TMv,1,"Tuberomammillary nucleus, ventral part",Hdc-Cre_IM1,177839494,1
157952068,M,"[1, 126, 210, 491, 980, 1004]",0.02094,7740,6520,6460,5,Hdc-Cre-88,,PMv,1004,Ventral premammillary nucleus,Hdc-Cre_IM1,177839494,1
519164644,F,"[1, 210, 491, 525, 1004]",0.013686,8190,6090,6490,5,Hdc-Cre_IM1-204100,,SUM,525,Supramammillary nucleus,Hdc-Cre_IM1,177839494,1
168616827,M,"[194, 830, 946]",0.004834,7300,5930,6090,5,Hdc-Cre-168,,PH,946,Posterior hypothalamic nucleus,Hdc-Cre_IM1,177839494,194
157952778,M,"[194, 210, 525]",0.005338,8070,5990,6570,5,Hdc-Cre-90,,LM,210,Lateral mammillary nucleus,Hdc-Cre_IM1,177839494,194
520619072,M,"[1, 210, 491, 525, 1004]",0.02839,7790,6580,6390,5,Hdc-Cre_IM1-204106,,LM,210,Lateral mammillary nucleus,Hdc-Cre_IM1,177839494,1
520342605,M,"[1, 210, 491, 525, 1004]",0.022724,7810,6560,6460,5,Hdc-Cre_IM1-204105,,PMv,1004,Ventral premammillary nucleus,Hdc-Cre_IM1,177839494,1
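
In the cell above, `get_experiments` is called without `dataframe=True`, so it returns a list of dictionaries (one per experiment) rather than a dataframe. That is why the result is passed through `pd.DataFrame(...)` and `set_index('id')`. The sketch below repeats that conversion on a tiny hand-made list; the two records are trimmed copies of rows from the table above, and the variable names are illustrative only.

```python
import pandas as pd

# Stand-in for the list of dictionaries returned without dataframe=True
# (assumption: real records carry many more keys than shown here)
experiments = [
    {"id": 520336173, "structure_abbrev": "TMv", "transgenic_line": "Hdc-Cre_IM1"},
    {"id": 157952068, "structure_abbrev": "PMv", "transgenic_line": "Hdc-Cre_IM1"},
]

# Build a dataframe, then move the experiment ID out of the columns
# and into the index
experiments_df = pd.DataFrame(experiments).set_index("id")
print(experiments_df)
```

Because `id` becomes the index, it no longer appears as a regular column, which matches the table above.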


## Structure Tree

So far, we know that the hypothalamus is a brain structure available to us in our experiments, but what about the rest of the brain? To find all of the brain structures available to us, we can look at the unique values in the `name` column of our summary of brain structures.

In [9]:
# The "Brain - Summary Structures" set has ID 167587189
summary_structures = structure_tree.get_structures_by_set_id([167587189])
summary_structures_df = pd.DataFrame(summary_structures)

# Determine how many different structures are within our experiments 
structure_name = summary_structures_df['name'].unique()
print("%d Total Available Brain Structures" % len(structure_name))

# Print the first 19 brain structures in our data
print(structure_name[:19])

316 Total Available Brain Structures
['Frontal pole, cerebral cortex' 'Primary motor area'
 'Secondary motor area' 'Primary somatosensory area, nose'
 'Primary somatosensory area, barrel field'
 'Primary somatosensory area, lower limb'
 'Primary somatosensory area, mouth'
 'Primary somatosensory area, upper limb'
 'Primary somatosensory area, trunk'
 'Primary somatosensory area, unassigned'
 'Supplemental somatosensory area' 'Gustatory areas' 'Visceral area'
 'Dorsal auditory area' 'Primary auditory area' 'Posterior auditory area'
 'Ventral auditory area' 'Anterolateral visual area'
 'Anteromedial visual area']


As a convenience, structures are grouped into named collections called "structure sets". These sets can be used to quickly gather a useful subset of structures from the tree. The criteria used to define structure sets are eclectic; a structure set might list:

- structures that were used in a particular project.
- structures that coarsely partition the brain.
- structures that bear functional similarity.

To see only structure sets relevant to the adult mouse brain, use the [StructureTree](https://allensdk.readthedocs.io/en/latest/allensdk.core.structure_tree.html):

In [10]:
from allensdk.api.queries.ontologies_api import OntologiesApi
import pandas as pd

oapi = OntologiesApi()

# get the ids of all the structure sets in the tree
structure_set_ids = structure_tree.get_structure_sets()

# query the API for information on those structure sets
pd.DataFrame(oapi.get_structure_sets(structure_set_ids)).head()

,description,id,name
0,List of structures in Isocortex layer 5,667481446,Isocortex layer 5
1,List of structures in Isocortex layer 6b,667481450,Isocortex layer 6b
2,Summary structures of the cerebellum,688152368,Cerebellum
3,List of structures for ABA Differential Search,12,ABA - Differential Search
4,List of valid structures for projection target...,184527634,Mouse Connectivity - Target Search


As you can see from the table above, our available brain structures can be grouped into many different sets, either by brain area (e.g. isocortex) or by dataset (e.g. ABA - Differential Search). Below we will look into our Mouse Connectivity Summary data by passing the set ID to the `get_structures_by_set_id()` method.

In [11]:
# From the above table, "Mouse Connectivity - Summary" has id 687527945
summary_connectivity = structure_tree.get_structures_by_set_id([687527945])
summary_connectivity_df = pd.DataFrame(summary_connectivity)
summary_connectivity_df.head()

,acronym,graph_id,graph_order,id,name,structure_id_path,structure_set_ids,rgb_triplet
0,FRP,1,6,184,"Frontal pole, cerebral cortex","[997, 8, 567, 688, 695, 315, 184]","[3, 112905828, 688152357, 691663206, 687527945...","[38, 143, 69]"
1,MOp,1,18,985,Primary motor area,"[997, 8, 567, 688, 695, 315, 500, 985]","[112905828, 688152357, 691663206, 687527945, 1...","[31, 157, 90]"
2,MOs,1,24,993,Secondary motor area,"[997, 8, 567, 688, 695, 315, 500, 993]","[112905828, 688152357, 691663206, 687527945, 1...","[31, 157, 90]"
3,SSp-n,1,44,353,"Primary somatosensory area, nose","[997, 8, 567, 688, 695, 315, 453, 322, 353]","[112905828, 688152357, 691663206, 687527945, 1...","[24, 128, 100]"
4,SSp-bfd,1,51,329,"Primary somatosensory area, barrel field","[997, 8, 567, 688, 695, 315, 453, 322, 329]","[112905828, 688152357, 691663206, 687527945, 1...","[24, 128, 100]"
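
A table like this can double as a lookup from acronyms to structure IDs, which is handy when building the list passed to `injection_structure_ids`. The following is a minimal sketch using three rows copied from the table above; `sample_structures_df`, `acronym_to_id`, and `motor_ids` are illustrative names, and with the real data you would build the lookup from `summary_connectivity_df`.

```python
import pandas as pd

# Three rows copied from the summary table (only the columns used here)
sample_structures_df = pd.DataFrame(
    {
        "acronym": ["FRP", "MOp", "MOs"],
        "id": [184, 985, 993],
        "name": [
            "Frontal pole, cerebral cortex",
            "Primary motor area",
            "Secondary motor area",
        ],
    }
)

# Map each acronym to its structure ID (tolist() yields plain Python ints)
acronym_to_id = dict(
    zip(sample_structures_df["acronym"], sample_structures_df["id"].tolist())
)

# IDs for the two motor areas, in the list form that
# injection_structure_ids expects
motor_ids = [acronym_to_id[acronym] for acronym in ["MOp", "MOs"]]
print(motor_ids)
```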


For more information on the StructureTree class and its methods for accessing information on brain structures, see the <a href="https://alleninstitute.github.io/AllenSDK/allensdk.core.structure_tree.html">StructureTree documentation</a>.