This notebook demonstrates how to load data for the platform paper analysis using the SDK and VBA functions.

In [1]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import seaborn as sns
sns.set_context('notebook', font_scale=1.5, rc={'lines.markeredgewidth': 2})

In [2]:
%load_ext autoreload
%autoreload 2

%matplotlib inline

In [3]:
import visual_behavior.data_access.loading as loading 

### Get `VisualBehaviorOphysProjectCache` using a `cache_dir` containing NWB files downloaded from AWS

In [4]:
from allensdk.brain_observatory.behavior.behavior_project_cache import VisualBehaviorOphysProjectCache

#### This cache directory contains the final manifest and NWB files downloaded from AWS

In [5]:
cache_dir = loading.get_platform_analysis_cache_dir()
print(cache_dir)

//allen/programs/braintv/workgroups/nc-ophys/visual_behavior/platform_paper_cache


#### Create a cache object using this `cache_dir`

In [6]:
cache = VisualBehaviorOphysProjectCache.from_s3_cache(cache_dir=cache_dir)

## Load the `ophys_experiment_table` from the cache

This table includes ALL released data

In [7]:
experiments_table = cache.get_ophys_experiment_table()

In [8]:
len(experiments_table)

1941

#### Remove VisualBehaviorMultiscope4areasx2d and Ai94 (GCaMP6s) data

These experiments should not be included in the platform paper analysis

In [9]:
# remove 4x2 and Ai94 data
experiments_table = experiments_table[(experiments_table.project_code!='VisualBehaviorMultiscope4areasx2d')&
                                     (experiments_table.reporter_line!='Ai94(TITL-GCaMP6s)')]

In [10]:
len(experiments_table)

1249

#### Add useful columns for analysis

In [11]:
import visual_behavior.data_access.utilities as utilities

In [12]:
experiments_table = utilities.add_cell_type_column(experiments_table)
experiments_table.cell_type.unique()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.obj[key] = _infer_fill_value(value)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.obj[item] = s


array(['Sst Inhibitory', 'Vip Inhibitory', 'Excitatory'], dtype=object)
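The `SettingWithCopyWarning` above is most likely raised because `experiments_table` was produced by boolean indexing and a new column was then assigned to it. Below is a minimal sketch of one common way to avoid the warning: take an explicit `.copy()` after filtering. The toy table and the `is_multiscope` column are invented for illustration, and this is not the code inside `add_cell_type_column`.

```python
import pandas as pd

# Toy stand-in for the experiments table; all rows are invented for illustration
toy = pd.DataFrame({
    'ophys_experiment_id': [1, 2, 3, 4],
    'project_code': ['VisualBehavior', 'VisualBehaviorMultiscope4areasx2d',
                     'VisualBehaviorTask1B', 'VisualBehaviorMultiscope'],
    'reporter_line': ['Ai93(TITL-GCaMP6f)', 'Ai93(TITL-GCaMP6f)',
                      'Ai94(TITL-GCaMP6s)', 'Ai93(TITL-GCaMP6f)'],
}).set_index('ophys_experiment_id')

# Same boolean mask as in the cell above
keep = ((toy.project_code != 'VisualBehaviorMultiscope4areasx2d') &
        (toy.reporter_line != 'Ai94(TITL-GCaMP6s)'))

# .copy() makes the filtered table an independent DataFrame,
# so assigning a new column to it does not touch the original table
filtered = toy[keep].copy()
filtered['is_multiscope'] = filtered.project_code == 'VisualBehaviorMultiscope'
```

The original table is left unchanged, and the new column exists only on the filtered copy.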

### Load experiments table using VBA

This function does the same thing as the code above: it returns only the experiments we want to analyze for the platform paper and adds useful columns such as `cell_type`, `n_relative_to_first_novel`, and `last_familiar`.

In [13]:
experiments_table = loading.get_platform_paper_experiment_table()
len(experiments_table)

1249

In [14]:
experiments_table.cell_type.unique()

array(['Excitatory', 'Vip Inhibitory', 'Sst Inhibitory'], dtype=object)

In [15]:
# this table also includes several additional columns for filtering sessions based on their experience level and relationship to the first novel session
experiments_table.keys()[-9:]

Index(['cell_type', 'depth', 'first_novel', 'n_relative_to_first_novel',
       'last_familiar', 'last_familiar_active', 'second_novel',
       'second_novel_active', 'experience_exposure'],
      dtype='object')
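As a rough sketch of how these columns might be used, the example below selects first novel sessions and counts them per cell type. It assumes `first_novel` and `last_familiar` are boolean columns (as they appear in the cell table output later in this notebook) and uses a toy table with invented rows rather than the real experiments table.

```python
import pandas as pd

# Toy table with invented rows; only the column names come from the notebook
toy = pd.DataFrame({
    'ophys_experiment_id': [10, 11, 12, 13],
    'cell_type': ['Excitatory', 'Excitatory', 'Vip Inhibitory', 'Sst Inhibitory'],
    'first_novel': [False, True, True, False],
    'last_familiar': [True, False, False, False],
}).set_index('ophys_experiment_id')

# Keep only experiments flagged as the first novel session
first_novel_expts = toy[toy.first_novel]

# Count the selected experiments per cell type
counts = first_novel_expts.groupby('cell_type').size()
```

The same pattern should apply to the other flag columns, such as `last_familiar_active` or `second_novel`, if they are also boolean.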

Use `loading.get_filtered_ophys_experiment_table()` if you want to load data from lims and/or get a broader set of experiments, for example to include failed experiments in addition to passed experiments.

#### Check what project_codes are included

In [16]:
experiments_table.project_code.unique()

array(['VisualBehavior', 'VisualBehaviorTask1B',
       'VisualBehaviorMultiscope'], dtype=object)

#### This means there is a mix of data from mice trained on image set A and mice trained on image set B

This is justified because novelty effects are observed regardless of which image set was used for training.

## Load a `behavior_ophys_experiment` (aka the dataset object) 



### Using SDK directly

In [17]:
from allensdk.brain_observatory.behavior.behavior_project_cache import VisualBehaviorOphysProjectCache

cache_dir = loading.get_platform_analysis_cache_dir()
cache = VisualBehaviorOphysProjectCache.from_s3_cache(cache_dir=cache_dir)

experiments_table = cache.get_ophys_experiment_table()

In [18]:
experiment_id = experiments_table.index.values[0]

In [19]:
dataset = cache.get_behavior_ophys_experiment(experiment_id)

In [20]:
dataset.stimulus_presentations

stimulus_presentations_id,start_time,stop_time,duration,image_name,image_index,is_change,omitted,start_frame,end_frame,image_set
0,309.27537,309.52557,0.25020,im065,0,False,False,17986,18001.0,Natural_Images_Lum_Matched_set_training_2017.0...
1,310.02598,310.27619,0.25021,im065,0,False,False,18031,18046.0,Natural_Images_Lum_Matched_set_training_2017.0...
2,310.77660,311.02680,0.25020,im065,0,False,False,18076,18091.0,Natural_Images_Lum_Matched_set_training_2017.0...
3,311.52721,311.77740,0.25019,im065,0,False,False,18121,18136.0,Natural_Images_Lum_Matched_set_training_2017.0...
4,312.27782,312.52806,0.25024,im065,0,False,False,18166,18181.0,Natural_Images_Lum_Matched_set_training_2017.0...
...,...,...,...,...,...,...,...,...,...,...
4796,3909.81737,3910.06757,0.25020,im065,0,False,False,233842,233857.0,Natural_Images_Lum_Matched_set_training_2017.0...
4797,3910.56798,3910.81819,0.25021,im065,0,False,False,233887,233902.0,Natural_Images_Lum_Matched_set_training_2017.0...
4798,3911.31860,3911.56880,0.25020,im065,0,False,False,233932,233947.0,Natural_Images_Lum_Matched_set_training_2017.0...
4799,3912.06921,3912.31939,0.25018,im065,0,False,False,233977,233992.0,Natural_Images_Lum_Matched_set_training_2017.0...
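A small sketch of typical operations on a table like this one: computing the onset-to-onset interval between presentations, dropping omitted presentations, and counting image changes. The toy table below copies a few values from the output above; it is a stand-in, not the real `dataset.stimulus_presentations`.

```python
import pandas as pd

# First four rows of the output above, reduced to the columns used here
stim = pd.DataFrame({
    'start_time': [309.27537, 310.02598, 310.77660, 311.52721],
    'stop_time': [309.52557, 310.27619, 311.02680, 311.77740],
    'image_name': ['im065'] * 4,
    'is_change': [False] * 4,
    'omitted': [False] * 4,
})
stim.index.name = 'stimulus_presentations_id'

# Time between successive stimulus onsets (NaN for the first presentation)
onset_interval = stim.start_time.diff()

# Presentations where an image was actually shown
image_presentations = stim[~stim.omitted]

# Number of image changes in this subset
n_changes = int(stim.is_change.sum())
```

In these rows the onsets are about 0.75 s apart, and each presentation lasts about 0.25 s according to the `duration` column.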


### Using VBA

This function loads the dataset from NWB files using the SDK method shown above, then adds `extended_stimulus_presentations` and `behavior_movie_timestamps`; this is its default behavior. See the function's documentation for alternate parameter settings such as `load_from_lims`. If you do not need `extended_stimulus_presentations`, set `get_extended_stimulus_presentations` to False to speed up loading of the dataset.

In [21]:
import visual_behavior.data_access.loading as loading

In [22]:
dataset = loading.get_ophys_dataset(experiment_id)

In [23]:
dataset.extended_stimulus_presentations

stimulus_presentations_id,start_time,stop_time,duration,image_name,image_index,is_change,omitted,start_frame,end_frame,image_set,...,flash_after_omitted,flash_after_change,image_name_next_flash,image_index_next_flash,image_name_previous_flash,image_index_previous_flash,lick_on_next_flash,lick_rate_next_flash,lick_on_previous_flash,lick_rate_previous_flash
0,309.27537,309.52557,0.25020,im065,0,False,False,17986,18001.0,Natural_Images_Lum_Matched_set_training_2017.0...,...,,,im065,0.0,,,False,0.000000,,
1,310.02598,310.27619,0.25021,im065,0,False,False,18031,18046.0,Natural_Images_Lum_Matched_set_training_2017.0...,...,False,False,im065,0.0,im065,0.0,True,0.148148,False,0.000000
2,310.77660,311.02680,0.25020,im065,0,False,False,18076,18091.0,Natural_Images_Lum_Matched_set_training_2017.0...,...,False,False,im065,0.0,im065,0.0,False,0.250000,False,0.000000
3,311.52721,311.77740,0.25019,im065,0,False,False,18121,18136.0,Natural_Images_Lum_Matched_set_training_2017.0...,...,False,False,im065,0.0,im065,0.0,False,0.266667,True,0.148148
4,312.27782,312.52806,0.25024,im065,0,False,False,18166,18181.0,Natural_Images_Lum_Matched_set_training_2017.0...,...,False,False,im065,0.0,im065,0.0,False,0.259259,False,0.250000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4796,3909.81737,3910.06757,0.25020,im065,0,False,False,233842,233857.0,Natural_Images_Lum_Matched_set_training_2017.0...,...,False,True,im065,0.0,im065,0.0,False,0.001432,False,0.001536
4797,3910.56798,3910.81819,0.25021,im065,0,False,False,233887,233902.0,Natural_Images_Lum_Matched_set_training_2017.0...,...,False,False,im065,0.0,im065,0.0,False,0.001380,False,0.001484
4798,3911.31860,3911.56880,0.25020,im065,0,False,False,233932,233947.0,Natural_Images_Lum_Matched_set_training_2017.0...,...,False,False,im065,0.0,im065,0.0,False,0.001328,False,0.001432
4799,3912.06921,3912.31939,0.25018,im065,0,False,False,233977,233992.0,Natural_Images_Lum_Matched_set_training_2017.0...,...,False,False,im065,0.0,im065,0.0,False,0.001276,False,0.001380


## Load the `ophys_cells_table`

### Using the SDK

This gets a table with all `cell_roi_ids`, `cell_specimen_ids`, and `ophys_experiment_ids` in the released dataset

This is the preferred method for loading the cell table

In [24]:
cell_table = cache.get_ophys_cells_table()

In [25]:
len(cell_table)

133227

In [26]:
cell_table.head()

cell_roi_id,cell_specimen_id,ophys_experiment_id
1080884343,1086496928,775614751
1080884173,1086496914,775614751
1080883843,1086496838,775614751
1080886674,1086491756,775614751
1080885658,1086491699,775614751


#### Merge with experiments_table to get metadata for cells

In [27]:
cell_table = cell_table.merge(experiments_table, on='ophys_experiment_id')

In [28]:
cell_table.head(3)

,cell_specimen_id,ophys_experiment_id,equipment_name,full_genotype,mouse_id,reporter_line,driver_line,sex,age_in_days,cre_line,...,ophys_container_id,project_code,imaging_depth,targeted_structure,date_of_acquisition,session_type,experience_level,passive,image_set,file_id
0,1086496928,775614751,CAM2P.5,Slc17a7-IRES2-Cre/wt;Camk2a-tTA/wt;Ai93(TITL-G...,403491,Ai93(TITL-GCaMP6f),"[Slc17a7-IRES2-Cre, Camk2a-tTA]",F,160.0,Slc17a7-IRES2-Cre,...,782536745,VisualBehavior,375,VISp,2018-11-08 18:38:05.000000,OPHYS_1_images_A,Familiar,False,A,945253901
1,1086496914,775614751,CAM2P.5,Slc17a7-IRES2-Cre/wt;Camk2a-tTA/wt;Ai93(TITL-G...,403491,Ai93(TITL-GCaMP6f),"[Slc17a7-IRES2-Cre, Camk2a-tTA]",F,160.0,Slc17a7-IRES2-Cre,...,782536745,VisualBehavior,375,VISp,2018-11-08 18:38:05.000000,OPHYS_1_images_A,Familiar,False,A,945253901
2,1086496838,775614751,CAM2P.5,Slc17a7-IRES2-Cre/wt;Camk2a-tTA/wt;Ai93(TITL-G...,403491,Ai93(TITL-GCaMP6f),"[Slc17a7-IRES2-Cre, Camk2a-tTA]",F,160.0,Slc17a7-IRES2-Cre,...,782536745,VisualBehavior,375,VISp,2018-11-08 18:38:05.000000,OPHYS_1_images_A,Familiar,False,A,945253901
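Note that the merged table above no longer has `cell_roi_id` as its index (the rows are numbered 0, 1, 2). If that index is needed downstream, one option is to reset it before merging and restore it afterwards. The sketch below shows the idea on toy tables built from a few values in the outputs above; it is an illustration, not what `loading.get_cell_table()` does internally.

```python
import pandas as pd

# Toy cells table indexed by cell_roi_id, as returned by the SDK
cells = pd.DataFrame({
    'cell_roi_id': [1080884343, 1080884173, 1080883843],
    'cell_specimen_id': [1086496928, 1086496914, 1086496838],
    'ophys_experiment_id': [775614751] * 3,
}).set_index('cell_roi_id')

# Toy experiments table indexed by ophys_experiment_id
expts = pd.DataFrame({
    'ophys_experiment_id': [775614751],
    'cre_line': ['Slc17a7-IRES2-Cre'],
    'targeted_structure': ['VISp'],
}).set_index('ophys_experiment_id')

# Move cell_roi_id into a column so it survives the merge, then restore it
merged = (cells.reset_index()
               .merge(expts, on='ophys_experiment_id')
               .set_index('cell_roi_id'))
```

Each cell row now carries its experiment metadata and can still be looked up by `cell_roi_id`, for example `merged.loc[1080884173, 'cre_line']`.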


### Using VBA

There are two ways to load a cell table in VBA. The first, `get_cell_table`, loads the cells table from the SDK using the same code as shown above and optionally filters for platform paper experiments only (no 4x2, no Ai94). The second, `get_cell_table_from_lims`, gets cell ROI information from lims for all experiments, regardless of whether they passed or failed, unless otherwise specified by the input parameters.

#### From SDK for platform paper cache

In [29]:
cell_table = loading.get_cell_table()
print(len(cell_table.ophys_experiment_id.unique()))

1249


In [30]:
# setting platform_paper_only to True filters out Ai94 and 4x2 data
cell_table = loading.get_cell_table(platform_paper_only=True)
print(len(cell_table.ophys_experiment_id.unique()))

1249


In [31]:
cell_table.head(3)

cell_roi_id,cell_specimen_id,ophys_experiment_id,equipment_name,full_genotype,mouse_id,reporter_line,driver_line,sex,age_in_days,cre_line,...,file_id,cell_type,depth,first_novel,n_relative_to_first_novel,last_familiar,last_familiar_active,second_novel,second_novel_active,experience_exposure
1080884343,1086496928,775614751,CAM2P.5,Slc17a7-IRES2-Cre/wt;Camk2a-tTA/wt;Ai93(TITL-G...,403491,Ai93(TITL-GCaMP6f),"[Slc17a7-IRES2-Cre, Camk2a-tTA]",F,160.0,Slc17a7-IRES2-Cre,...,945253901,Excitatory,350,False,,False,False,False,False,Familiar 5
1080884173,1086496914,775614751,CAM2P.5,Slc17a7-IRES2-Cre/wt;Camk2a-tTA/wt;Ai93(TITL-G...,403491,Ai93(TITL-GCaMP6f),"[Slc17a7-IRES2-Cre, Camk2a-tTA]",F,160.0,Slc17a7-IRES2-Cre,...,945253901,Excitatory,350,False,,False,False,False,False,Familiar 5
1080883843,1086496838,775614751,CAM2P.5,Slc17a7-IRES2-Cre/wt;Camk2a-tTA/wt;Ai93(TITL-G...,403491,Ai93(TITL-GCaMP6f),"[Slc17a7-IRES2-Cre, Camk2a-tTA]",F,160.0,Slc17a7-IRES2-Cre,...,945253901,Excitatory,350,False,,False,False,False,False,Familiar 5


#### From lims 

This function loads a cell table that includes all available data in lims, unless you provide a list of `ophys_experiment_ids`, or set `platform_paper_only` to True. It will include invalid ROIs unless you set `valid_rois_only` to True.

In [32]:
lims_cell_table = loading.get_cell_table_from_lims(ophys_experiment_ids=None, valid_rois_only=False, platform_paper_only=False)

In [33]:
len(lims_cell_table)

253537

The table includes `cell_roi_ids`, `cell_specimen_ids`, and `ophys_experiment_ids`, as well as information about the ROI masks, such as their x and y location in the FOV.   

In [34]:
lims_cell_table.head(3)

,cell_roi_id,cell_specimen_id,ophys_experiment_id,x,y,width,height,valid_roi,mask_matrix,max_correction_up,max_correction_down,max_correction_right,max_correction_left,mask_image_plane,ophys_cell_segmentation_run_id
0,1080887111,1086497181,775614751,261,130,31,26,False,"[[False, False, False, False, False, False, Fa...",13.0,16.0,8.0,23.0,2,1080818871
1,1080886741,1086497151,775614751,318,263,19,15,False,"[[False, False, False, False, False, False, Fa...",13.0,16.0,8.0,23.0,0,1080818871
2,1080886246,1086497095,775614751,108,399,36,21,False,"[[False, False, False, False, False, False, Fa...",13.0,16.0,8.0,23.0,2,1080818871
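As a hedged sketch of how the ROI location columns might be used, the example below filters for valid ROIs and estimates the center of each ROI's bounding box. It assumes that `x` and `y` give the corner of the bounding box and that `width` and `height` give its extent in pixels; that interpretation is an assumption and should be checked against the lims or SDK documentation. The toy table copies the three rows shown above.

```python
import pandas as pd

# The three rows from the output above, reduced to the columns used here
rois = pd.DataFrame({
    'cell_roi_id': [1080887111, 1080886741, 1080886246],
    'x': [261, 318, 108],
    'y': [130, 263, 399],
    'width': [31, 19, 36],
    'height': [26, 15, 21],
    'valid_roi': [False, False, False],
})

# Assumption: (x, y) is the bounding-box corner, so the center is offset by half the size
rois['center_x'] = rois.x + rois.width / 2
rois['center_y'] = rois.y + rois.height / 2

# Keep only ROIs flagged as valid (none of these three rows are)
valid = rois[rois.valid_roi]
```

Passing `valid_rois_only=True` to `get_cell_table_from_lims`, as described above, should make the explicit `valid_roi` filter unnecessary.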
