In [1]:
import sys
import numpy as np
import time
import pickle
import pandas as pd
import copy
import matplotlib.pyplot as plt

import allensdk
import pandas as pd

In [2]:
from allensdk.core.mouse_connectivity_cache import MouseConnectivityCache

# Mouse Connectivity

This notebook analyzes the mouse connectivity data for the purpose of calculating the information flow in the brain. Some definitions are in order

projection density = sum of detected pixels / sum of all pixels in division  
projection energy = sum of detected pixel intensity / sum of all pixels in division  
injection_fraction = fraction of pixels belonging to manually annotated injection site  
injection_density = density of detected pixels within the manually annotated injection site  
injection_energy = energy of detected pixels within the manually annotated injection site  
data_mask = binary mask indicating if a voxel contains valid data (0=invalid, 1=valid). Only valid voxels should be used for analysis  


## Fetch all the Experiment list


In [5]:
# MouseConnectivityCache allows access to data stored within
mcc=MouseConnectivityCache(manifest_file='connectivity/mouse_connectivity_manifest.json')

#get the id of all experiments.
#Create one for dataframe and the other in terms of dictionary
experiment_list = mcc.get_experiments(dataframe=True)
experiment_list_dict=mcc.get_experiments(dataframe=False)

where the former is a pandas.dataframe while the latter is a list that contains dictionary with respect to each experiment where the keys describe their properties

In [6]:
experiment_list[:3]

Unnamed: 0_level_0,gender,id,injection_structures,injection_volume,injection_x,injection_y,injection_z,primary_injection_structure,product_id,specimen_name,strain,structure_abbrev,structure_id,structure_name,transgenic_line,transgenic_line_id
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1
527712447,F,527712447,"[502, 926, 1084, 484682470]",0.006655,9240,3070,8990,502,5,Penk-IRES2-Cre-neo-249961,C57BL/6J,SUB,502,Subiculum,Penk-IRES2-Cre-neo,298725927.0
301875966,M,301875966,"[574, 931]",0.105746,9170,6850,6200,574,5,Gabrr3-Cre_KC112-3467,C57BL/6J,PG,931,Pontine gray,Gabrr3-Cre_KC112,177838877.0
520336173,M,520336173,"[1, 210, 491, 525, 1004]",0.025762,7810,6550,6450,1,5,Hdc-Cre_IM1-204103,,TMv,1,"Tuberomammillary nucleus, ventral part",Hdc-Cre_IM1,177839494.0


Each experiment contains the following properties

In [7]:
for i in experiment_list_dict[0].keys():
    print(i)

injection_y
injection_x
injection_z
product_id
structure_name
gender
transgenic_line
injection_volume
structure_abbrev
transgenic_line_id
primary_injection_structure
strain
id
specimen_name
injection_structures
structure_id


We wish to create a list that contains the experiment id

In [8]:
exp_id=[]
for i in experiment_list_dict:
    exp_id.append(i['id'])

In [9]:
print '%s number of experiments' %len(exp_id)

2995 number of experiments


## Annotation Volume
We download annotation volume, a matrix that denotes the structure type at a specific spatial coordinate (index of the matrix)

In [10]:
annt=mcc.get_annotation_volume()[0]

the annotation volume has the shape

In [11]:
annt.shape

(528L, 320L, 456L)

In [12]:
annt_structures=[]
for i in range(annt.shape[0]):
    for j in range(annt.shape[1]):
        for k in range(annt.shape[2]):
            if annt[i,j,k] not in annt_structures:
                annt_structures.append(annt[i,j,k])


## Structure_tree

In [13]:
structure_tree=mcc.get_structure_tree()

Allenmouse structure ontology is organized as a tree where each edge indicates the physical containment  

Structure tree can be used to fetch properties of a given structure, for example for thalamus, we can retrieve:

In [14]:
pd.DataFrame(structure_tree.get_structures_by_name(['Thalamus','Hypothalamus']))

Unnamed: 0,acronym,graph_id,graph_order,id,name,rgb_triplet,structure_id_path,structure_set_ids
0,TH,1,641,549,Thalamus,"[255, 112, 128]","[997, 8, 343, 1129, 549]","[2, 112905828, 691663206, 12, 184527634, 11290..."
1,HY,1,715,1097,Hypothalamus,"[230, 68, 56]","[997, 8, 343, 1129, 1097]","[2, 112905828, 691663206, 12, 184527634, 11290..."


In [15]:
for i in structure_tree.get_structures_by_name(['Thalamus'])[0]:
    print(i)

rgb_triplet
graph_id
name
acronym
graph_order
structure_id_path
structure_set_ids
id


## structure set
AllenMouse SDK provides a convenient way to group structures together through the use of structure sets. Therefore fetching a structure is relatively easy.

In [16]:
from allensdk.api.queries.ontologies_api import OntologiesApi

#initialize oapi_inst instance
oapi_inst=OntologiesApi()

In [17]:
structure_set_ids=structure_tree.get_structure_sets()
structure_set=pd.DataFrame(oapi_inst.get_structure_sets(structure_set_ids))
structure_set

Unnamed: 0,description,id,name
0,List of structures in Isocortex layer 5,667481446,Isocortex layer 5
1,List of structures in Isocortex layer 6b,667481450,Isocortex layer 6b
2,Summary structures of the cerebellum,688152368,Cerebellum
3,List of structures for ABA Differential Search,12,ABA - Differential Search
4,List of valid structures for projection target...,184527634,Mouse Connectivity - Target Search
5,Structures whose surfaces are represented by a...,691663206,Mouse Brain - Has Surface Mesh
6,Summary structures of the midbrain,688152365,Midbrain
7,Summary structures of the medulla,688152367,Medulla
8,Summary structures of the striatum,688152361,Striatum
9,Structures representing subdivisions of the mo...,687527945,Mouse Connectivity - Summary


Here our interest is in the projection research, therefore we fetch the summary structures from 184527634 and 184527634	, for injection structures and projection target structures respectively.

In [18]:
projection_structures_inj=pd.DataFrame(structure_tree.get_structures_by_set_id([184527634]))
projection_structures_target=pd.DataFrame(structure_tree.get_structures_by_set_id([184527634]))

In [19]:
inj_values=projection_structures_inj["id"].values.tolist()
target_values=projection_structures_target["id"].values.tolist()

In [20]:
inj_values==target_values

True

184527634 and 184527634 seem to share the same structures

### how many structures in each?

In [21]:
printme={'experiment id':[],'number of structures':[]}

for i in structure_set[structure_set.columns[1]]:
    printme['experiment id'].append(i)
    printme['number of structures'].append(len(structure_tree.get_structures_by_set_id([i])))
print pd.DataFrame(printme)

    experiment id  number of structures
0       667481446                    43
1       667481450                    43
2       688152368                    18
3              12                   843
4       184527634                   840
5       691663206                   840
6       688152365                    39
7       688152367                    45
8       688152361                    14
9       687527945                   293
10      688152359                    15
11      514166994                     7
12      688152358                    11
13      167587189                   316
14      667481445                    26
15      687527670                    12
16      688152362                     9
17      114512892                    79
18      112905813                   111
19             10                    74
20      112905828                   397
21      667481449                    43
22              3                    51
23      667481440                    43


# Maps

Each structure can be denoted by all kinds of variables such as its id, acronym, or its full name. Therefore it is convenient to define maps that will take care of this. We want the map to function as a converter between variables.


In [24]:
projection_structures_inj=pd.DataFrame(projection_structures_inj)

acronym=projection_structures_inj['acronym'].values.tolist()
acronym_to_id={}
acronym_to_name={}
acronym_to_volume={}
id_to_idpath={}
id_to_name={}

for i in range(len(acronym)):
    a=projection_structures_inj['acronym'][i] #structure acronym
    b=projection_structures_inj['structure_id_path'][i][-1] #sturcture id
    c=projection_structures_inj['name'][i] #structure name
    d=projection_structures_inj['structure_id_path'][i] #structure id path
    acronym_to_id[a]=b
    acronym_to_name[a]=c
    id_to_name[b]=c
    id_to_idpath[b]=d
    id_to_injcoordinates=


### structure volumes

as far as I am aware, getting the structure volume is not straight forward

define inverse map for the dictionary

In [25]:
def invert_me(arg):
    """inverts a map
        args:
            arg(dictionary): input dictionary
        returns:
            inverted input dictionary"""
    return {v: k for k,v in arg.iteritems()}

In [26]:
id_to_acronym=invert_me(acronym_to_id)
name_to_acronym=invert_me(acronym_to_name)
name_to_id=invert_me(id_to_name)

Since the structures organized as trees, it would be nice to have a dictionary that can return all its daughter structures (or all structures beneath it). Hence here we define a dictionary that contains its direct substructures (only one level below): Substr

In [27]:
projection_structures_inj.columns[-2]

'structure_id_path'

In [28]:
Substr={}
Substr[997]=[]
for acr in acronym:
    Substr[acronym_to_id[acr]]=[]

    
unknown_structures=[]


for i in projection_structures_inj['structure_id_path']:
    try:
        Substr[i[-2]].append(i[-1])
    except:
        unknown_structures.append(i[-2])

In [29]:
unknown_structures

[81, 81]

This structure shows up twice. what is this? move on for now

Now we define functions that unpacks a list of all structures below a particular structure using Substr

In [30]:
def Unpack(structure_id,MST,include_self=False):
    """returns a list of substructures of a given structure
        args:
            structures_id(int): id of the structure
            MST(dictionary): My Structure Tree
            include_self(bool): decide whether to include the initial structure, default=False
        
        returns:
            king(list) """
    #Check for Error
    #if int(structure_id)!=structure_id:
    #    raise ValueError("not a valid structure_id. structure_id must be an integer")
    
    
    unpacked=[]
    
    
    
    def Unpack_(structure_id,MST):
        """Primarily used for 'recursion' of the unpack"""
        
        #check error
        #if structure_id  in structure_id_list_exception:
        #    raise ValueError("structure id not recognized")
         
        leng=len(MST[structure_id])

        if leng>=1:
            unpacked.append(structure_id)
            for i in MST[structure_id]:
                Unpack_(i,MST)

        elif leng==0:
            unpacked.append(structure_id)
        

    Unpack_(structure_id,MST)
    
    if include_self:
        pass
    elif not include_self:
        unpacked.remove(structure_id)
    
    return unpacked

In [31]:
name_to_id['Thalamus']

549

In [32]:
len(Unpack(997,Substr,include_self=False))

838

## Injection Fraction 

We wish to find how many fractions of the total injection volume of an experiment belong to the structure of interest.

id_to_idpath defined earlier is not complete without structure id 997 and 0

In [33]:
id_to_idpath[0]=[]
id_to_idpath[997]=[997]

structure_id_list_exception=[81]

In [34]:
def create_structure_dictionary(annt_values,inj_density_values,include_density=True):
    """args:
        annt_values: (array, int 32): array of structure ids obtained from annotation volume[injection_mask]
        inj_density_values (array): 
        
        returns:
            structuer_dict(dictionary): a map from a structure id to its fraction in the injection volume
            """
    
    #check that annt_values and inj_density are the same size
    if len(annt_values)!=len(inj_density_values):
        raise ValueError('annt volume and inj density must be the same size')
    
    structure_dict={}
    leng=len(annt_values)
    exception_count=0
    exception_id_list=[]
    
    if include_density:
        
        for i in range(leng):
            
            if annt_values[i] in structure_id_list_exception:
                exception_id_list.append(annt)
                pass
            elif annt_values[i] not in structure_id_list_exception:
                if annt_values[i] not in structure_dict:
                    structure_dict[annt_values[i]]=0
                elif annt_values[i] in structure_dict:
                    structure_dict[annt_values[i]]+=inj_density_values[i]
                    
                    
                    
                    
                    
        #need to add the id of each superstructures of structure_dict
        extra=[]
        
        for i in structure_dict:
            try:
                sup=id_to_id_path[i]
                for j in sup:
                    if j not in structure_dict:
                        extra.append(j)
            except:
                exception_count+=1
                pass
        for i in extra:
            structure_dict[i]=0
                
        return structure_dict,exception_count,exception_id_list
    
    
    elif not include_density:
        for i in arg:
            try:
                if i not in structure_dict:
                    structure_dict[i]=0
            except:
                exception_count+=1
                pass

            if i in structure_dict:
                structure_dict[i]+=1
        return structure_dict,exception_count,exception_id_list
    
def generate_experiment_structure_fractions( experiment_list, annt,MST,include_density=True):
    """returns a dictionary that contains fraction information about the injection volume of the structure"""
    
    """args:
        experiment_list(list): list of experiment ids
        annt (ndarray) annotation volume
        include_density (bool): decide whether to calculate using the density of simple volume number.
        Default=True"""
    
    #returning dictionary
    return_dict={}
    
    #iterate over the given experiment list.
    for i in experiment_list:
        
        #fetch injection density information
        inj_density=mcc.get_injection_density(i)[0]
        #create booleean array that is True on the injection sites
        inj_density_bool=inj_density!=0.0
        
        #np.where returns a tuple (x,y,z) for 3d, which are to be unpacked immediately
        a,b,c=np.where(inj_density_bool)
        #using the voxel coordinates a,b,c, find density values at each coordinate. 
        inj_density_values=inj_density[a,b,c]
        #and annotation values that contain id list
        annt_values=annt[a,b,c]
        
        
        
        #call the function create_structure_dictionary that analyzies the fraction information
        sub_dict,exception_count_sub,exception_id_list_sub=create_structure_dictionary(annt_values,inj_density_values,include_density=True)
        
        
        
        for p in sub_dict:
            if not include_density:
                leng=annt_values.shape[0]
                sub_dict[p]=float(sub_dict[p])/leng
            elif include_density:
                #normalize the fraction
                sub_dict[p]=float(sub_dict[p])/sum(inj_density_values)
        
        return_dict[i]=sub_dict
    
    exception_count=0
    exception_id_list=[]
    
    # now add the fraction of substructures to the superstructure
    #normal copy would not really 'copy' the dictionary. Thus use copy.deepcopy
    return_dict_modified=copy.deepcopy(return_dict)
    for i in return_dict:
        for j in return_dict[i]:
            #call unpack function to list all the substructures
            if j in projection_structures_inj:
                unpackedpre=Unpack(j,MST,include_self=True)

                for k in unpackedpre:
                    if j!=k:
                        try:
                            aa=return_dict[i][k].copy()
                            return_dict_modified[i][j]+=aa
                        except:
                            exception_count+=1
                            pass
            else:
                exception_count+=1
                exception_id_list.append(j)
                pass
    return return_dict_modified,exception_count,exception_id_list,exception_count_sub,exception_id_list_sub

def map_to_acronym(dict_):
    return_dict={}
    exception_count,exception_id=(0,[])
    for i in dict_:
        try:
            return_dict[id_to_acronym[i]]=dict_[i]
        except:
            exception_count+1
            exception_id.append(i)
            pass
    return return_dict

In [37]:
run=raw_input("Computationally intensive (roughly 6-8 hours) Run? (y/n)")
if run=="y":
    Master_injection_fraction_dictionary={}
    ec=0
    ecl=[]

    for i in experiment_list['id'].tolist():
        ecp,ec,ecl,ecs,ecls=generate_experiment_structure_fractions([i],annt,Substr,include_density=True)
        Master_injection_fraction_dictionary[i]=ecp[i]
        
elif run=="n":
    print "operation cancelled"


Computationally intensive (roughly 6-8 hours) Run? (y/n)n
operation cancelled


### Save & Load station

In [38]:
"""SAVE"""
run=raw_input("Save? (y/n)")

if run=="y":
    with open ("Master_injection_fraction_dictionary",'wb') as f:
        pickle.dump(Master_injection_fraction_dictionary,f)
    
elif run=="n":
    print "operation cancelled"

Save? (y/n)n
operation cancelled


In [45]:
"""Load"""
run=raw_input("Load? (y/n)")

if run=="y":
    with open("Master_injection_fraction_dictionary",'r') as f:
        Master_injection_fraction_dictionary=pickle.load(f)
elif run=="n":
    print "operation cancelled"

Load? (y/n)y


## Reality Check

In [68]:
Master_injection_fraction_dictionary[129564675]

{313: 0.05470134485921032,
 795: 0.15601174794869346,
 872: 0.7890787559832352,
 614454277: 1.61519661731961e-05}

#### Filter 


In [87]:
exception_count=0
exception_list=[]

for i in experiment_list['id'].values.tolist():
    try:
        for j in Master_injection_fraction_dictionary[i]:
            if Master_injection_fraction_dictionary[i][j]>1:
                raise ValueError("A fraction cannot be larger than 1")
    except:
        exception_count+=1
        exception_list.append(i)

In [93]:
len(exception_list)

77

77 experiments are not valid keys for Master_injection_fraction_dictionary. Investigate

For good reasons, we wish to remove experiments that are too close to one another. First get a map from experiment id to their injection_coordinates

In [95]:
len(mcc.get_experiments(cre=True))

2497

In [90]:
"""
import mcmodels
from mcmodels.core import VoxelModelCache

cache = VoxelModelCache()
"""
None

In [91]:
'voxel_array, source_mask, target_mask=cache.get_voxel_connectivity_array()'
None

In [92]:
'mcmodels.models.voxel.VoxelConnectivityArray'
None

# Comparison between annotation volume and structure_tree

First we make sure the set of structures obtained from the structure_tree and the annotation volume contain each other:

In [None]:
for i in 

In [206]:
projection_structures=pd.DataFrame(projection_structures)

for index,row in projection_structures.iterrows():
    print(row['id'])
    if row['id'] not in annt_structures:
        print row['id']

1
2
4
4
6
7
8
8
9
10
12
15
17
19
20
21
21
22
22
23
26
27
28
30
31
31
33
35
36
38
39
39
41
42
44
44
46
46
48
48
50
51
51
52
54
56
58
59
62
63
64
66
67
68
72
74
75
78
83
84
88
91
93
95
95
96
97
98
100
101
102
104
104
105
106
108
111
111
113
114
115
117
118
119
119
120
121
122
123
125
126
127
127
128
131
132
133
135
135
136
138
138
139
141
141
143
146
148
149
151
151
154
154
155
156
157
157
158
159
162
163
165
165
14
14
169
170
170
171
173
177
178
180
181
184
184
186
187
188
189
607344830
194
196
197
607344838
201
607344842
203
204
607344846
207
209
210
211
214
215
689
218
607344862
223
225
226
229
230
231
233
234
235
235
312782546
312782546
238
239
239
242
242
243
246
247
247
249
250
251
252
254
254
255
257
258
260
312782550
262
263
266
268
269
271
272
728
274
275
275
278
278
279
280
281
312782554
286
287
288
289
290
290
292
294
294
295
295
296
298
302
302
303
304
305
307
312782558
310
311
313
314
315
315
318
319
319
320
321
322
322
323
323
325
326
327
328
329
329
330
331
331
332
3127825