# Using [International Brain Laboratory](https://www.internationalbrainlab.com/) behavior data for an example analysis

Christopher S Krasniak, Cold Spring Harbor Laboratory, 2020-01-22

To encourage access to and use of the IBL data released with the [bioRxiv](https://www.biorxiv.org/content/10.1101/2020.01.17.909838v2) paper describing the IBL's standardized training, the Outreach Working Group of the IBL created this tutorial. The purpose of this document is to promote the use of IBL data, specifically as a resource for teaching the use of Python for analyzing neuroscience and psychology data. Many simple data analysis questions can be explored with this dataset, a few examples of which are in the accompanying document DOCUMENT. We hope these questions will help future neuroscientists and psychologists explore this dataset and perhaps make their own discoveries as they learn to use Python for data analysis.

To proceed with the tutorial, make sure you have completed the installation steps in the [README](https://github.com/cskrasniak/behavior_analysis_demo/blob/master/README.md).

What follows is an example of how to access the IBL data and perform a simple analysis to answer a simple question. The data used in this tutorial are from mice that have been trained on a basic visual detection task. Please read the [paper](https://www.biorxiv.org/content/10.1101/2020.01.17.909838v2) to understand the dataset you will be working with.

## Question: Who performs more trials, male or female mice?

### Import packages
The first step, as with any Python code, is to import the packages we will need to work with the data. The set below is a good starting point for working with IBL data; you may need more or fewer for specific questions.

In [2]:
import numpy as np
import pandas as pd
import sys
import matplotlib.pyplot as plt
import seaborn as sns
import datajoint as dj
import os
import matplotlib as mpl
from ibl_pipeline import subject, behavior, acquisition
from paper_behavior_functions import query_sessions

Please enter DataJoint username: cskrasniak
Connecting cskrasniak@datajoint.internationalbrainlab.org:3306
Connected to https://alyx.internationalbrainlab.org as chrisk


### Fetch the data we'll need
Now that everything is set up, the next step is to retrieve the data from the database. To do that we will use [DataJoint](https://docs.datajoint.io/python/): we write queries in Python that return the data we are looking for, and DataJoint translates them into SQL and runs them on a MySQL database. Read more about DataJoint in the link above, and about the IBL Data Architecture [here](https://www.biorxiv.org/content/10.1101/827873v1). If you are going to run your own analyses, it will be useful to familiarize yourself with DataJoint through the tutorials on [using DataJoint with IBL data](https://github.com/int-brain-lab/IBL-pipeline/tree/master/notebooks/notebooks_tutorial/202001_behavior_paper).
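
As a rough illustration of what a query does, the sketch below mimics two common DataJoint operations, restriction (keeping rows that match a condition) and projection (keeping selected attributes), on a small pandas table. The table contents are invented, and pandas stands in for the database so that the example runs without a connection. The DataJoint expressions in the comments are given only for comparison and assume the `subject` schema imported above.

```python
import pandas as pd

# Toy stand-in for subject.Subject; nicknames and values are invented.
subjects = pd.DataFrame({
    "subject_nickname": ["mouse_a", "mouse_b", "mouse_c", "mouse_d"],
    "sex": ["F", "M", "U", "F"],
    "protocol_number": [2, 2, 1, 3],
})

# Restriction: keep only the rows that satisfy a condition.
# DataJoint counterpart: subject.Subject & 'sex = "F"'
females = subjects[subjects["sex"] == "F"]

# Projection: keep only some attributes.
# DataJoint counterpart: subject.Subject.proj("sex")
# (DataJoint always keeps the primary key; here we pick the columns by hand.)
nickname_and_sex = females[["subject_nickname", "sex"]]

print(nickname_and_sex)
```

In DataJoint these operations are evaluated by the database, so only the rows and attributes you ask for are transferred when you fetch.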

Included in the _simple_analysis_demo_ folder is the list of universally unique identifiers (UUIDs) of the mice we will use to answer our question. We have also already imported a function, `query_sessions` (the last import above), to query the sessions whose data we will use.

To compare the sex of a mouse with the number of trials it completed, we need to know how many trials there were in each session and whether each mouse was male or female. Here is where to find this information.

In [6]:
# The describe method is handy for DataJoint objects: it shows what type of data each table holds
behavior.TrialSet.describe()  # the number of trials completed per session is in behavior.TrialSet

# information about behavioral trials
-> acquisition.Session
---
n_trials             : int                          # total trial numbers in this set
n_correct_trials=null : int                          # number of the correct trials
trials_start_time    : float                        # start time of the trial set (seconds)
trials_end_time      : float                        # end time of the trial set (seconds)




In [10]:
subject.Subject.describe()  # the sex of the mouse is in subject.Subject


subject_uuid         : uuid                         
---
subject_nickname     : varchar(255)                 # nickname
sex                  : enum('M','F','U')            # sex
subject_birth_date=null : date                         # birth date
ear_mark=null        : varchar(255)                 # ear mark
-> [nullable] subject.Line.proj(subject_line="line_name")
-> [nullable] subject.Source.proj(subject_source="source_name")
protocol_number      : tinyint                      # protocol number
subject_description=null : varchar(1024)                
subject_ts=CURRENT_TIMESTAMP : timestamp                    




Now that we know where the information is, we can combine it and retrieve the data.

In [26]:
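# The * operator joins the two tables on their shared attribute (subject_uuid)
data_to_fetch = behavior.TrialSet * subject.Subject
data_to_fetch  # displaying the query shows a preview of its rows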

| subject_uuid | session_start_time (start time) | n_trials (total trial numbers in this set) | n_correct_trials (number of the correct trials) | trials_start_time (start time of the trial set, seconds) | trials_end_time (end time of the trial set, seconds) | subject_nickname (nickname) | sex | subject_birth_date (birth date) | ear_mark (ear mark) | subject_line (name) | subject_source (name of source) | protocol_number (protocol number) | subject_description | subject_ts |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 00778394-c956-408d-8a6c-ca3b05a611d5 | 2019-08-10 11:24:59 | 196 | 72 | 0.0 | 2764.65 | KS019 | F | 2019-06-18 | | C57BL/6J | | 2 | | 2019-08-13 17:07:33 |
| 00778394-c956-408d-8a6c-ca3b05a611d5 | 2019-08-12 09:21:03 | 140 | 56 | 0.0 | 2775.9 | KS019 | F | 2019-06-18 | | C57BL/6J | | 2 | | 2019-08-13 17:07:33 |
| 00778394-c956-408d-8a6c-ca3b05a611d5 | 2019-08-13 10:28:45 | 223 | 91 | 0.0 | 3265.18 | KS019 | F | 2019-06-18 | | C57BL/6J | | 2 | | 2019-08-13 17:07:33 |
| 00778394-c956-408d-8a6c-ca3b05a611d5 | 2019-08-14 09:37:17 | 55 | 22 | 0.0 | 1306.78 | KS019 | F | 2019-06-18 | | C57BL/6J | | 2 | | 2019-08-13 17:07:33 |
| 00778394-c956-408d-8a6c-ca3b05a611d5 | 2019-08-14 11:35:16 | 289 | 134 | 0.0 | 2759.22 | KS019 | F | 2019-06-18 | | C57BL/6J | | 2 | | 2019-08-13 17:07:33 |
| 00778394-c956-408d-8a6c-ca3b05a611d5 | 2019-08-15 10:29:21 | 141 | 60 | 0.0 | 2713.52 | KS019 | F | 2019-06-18 | | C57BL/6J | | 2 | | 2019-08-13 17:07:33 |
| 00778394-c956-408d-8a6c-ca3b05a611d5 | 2019-08-16 17:00:11 | 340 | 159 | 0.0 | 2692.64 | KS019 | F | 2019-06-18 | | C57BL/6J | | 2 | | 2019-08-13 17:07:33 |
| 00778394-c956-408d-8a6c-ca3b05a611d5 | 2019-08-19 09:12:02 | 224 | 95 | 0.0 | 2688.67 | KS019 | F | 2019-06-18 | | C57BL/6J | | 2 | | 2019-08-13 17:07:33 |
| 00778394-c956-408d-8a6c-ca3b05a611d5 | 2019-08-20 12:06:48 | 252 | 111 | 0.0 | 2915.64 | KS019 | F | 2019-06-18 | | C57BL/6J | | 2 | | 2019-08-13 17:07:33 |
| 00778394-c956-408d-8a6c-ca3b05a611d5 | 2019-08-21 11:21:03 | 269 | 85 | 0.0 | 2701.59 | KS019 | F | 2019-06-18 | | C57BL/6J | | 2 | | 2019-08-13 17:07:33 |
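
To make the join more concrete, here is a minimal sketch of the same idea in pandas. Joining `behavior.TrialSet` with `subject.Subject` matches rows on the attribute the two tables share, `subject_uuid`, which is roughly what an inner `merge` does. The two small tables, their UUIDs, and their nicknames are invented for illustration.

```python
import pandas as pd

# Toy stand-ins for the two tables; all values are invented.
trial_sets = pd.DataFrame({
    "subject_uuid": ["uuid-1", "uuid-1", "uuid-2"],
    "session_start_time": pd.to_datetime(
        ["2019-08-10 11:24:59", "2019-08-12 09:21:03", "2019-08-11 10:00:00"]
    ),
    "n_trials": [196, 140, 310],
})
subjects = pd.DataFrame({
    "subject_uuid": ["uuid-1", "uuid-2", "uuid-3"],
    "subject_nickname": ["mouse_a", "mouse_b", "mouse_c"],
    "sex": ["F", "M", "F"],
})

# Inner join on the shared column: one row per session,
# with the matching subject's information attached.
joined = trial_sets.merge(subjects, on="subject_uuid", how="inner")

print(joined[["subject_nickname", "sex", "n_trials"]])
```

Note that `mouse_c` has no sessions in the toy data, so it does not appear in the joined result. The subject information is repeated on every session row, just as KS019's details are repeated in the preview above.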


In [36]:
# Fetch the selected attributes as a list of dicts, then convert it to a pandas DataFrame
data = data_to_fetch.fetch('n_trials', 'n_correct_trials', 'trials_end_time', 'sex', as_dict=True)
data = pd.DataFrame(data)
print(data)


       n_trials  n_correct_trials  trials_end_time sex
0           196                72          2764.65   F
1           140                56          2775.90   F
2           223                91          3265.18   F
3            55                22          1306.78   F
4           289               134          2759.22   F
...         ...               ...              ...  ..
13902       387               221          2721.40   U
13903      1155               757          5703.48   U
13904       902               644          4495.77   U
13905       821               551          4386.03   U
13906       427               328          3130.53   U

[13907 rows x 4 columns]
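
One possible next step toward the question, sketched here on invented numbers rather than the real data: build a DataFrame from a list of dictionaries (the shape that `fetch(..., as_dict=True)` returns), drop the sessions from mice whose sex is recorded as `'U'`, and summarize the number of trials per session for each sex. The variable names `rows`, `toy`, `known_sex`, and `summary` are illustrative only.

```python
import pandas as pd

# Toy rows shaped like the output of fetch(..., as_dict=True):
# a list with one dictionary per session. Values are invented.
rows = [
    {"n_trials": 200, "sex": "F"},
    {"n_trials": 300, "sex": "F"},
    {"n_trials": 250, "sex": "M"},
    {"n_trials": 450, "sex": "M"},
    {"n_trials": 900, "sex": "U"},
]
toy = pd.DataFrame(rows)

# Drop sessions from mice of unknown sex ('U'),
# since the question compares males and females.
known_sex = toy[toy["sex"].isin(["M", "F"])]

# Per-sex summary of the number of trials per session.
summary = known_sex.groupby("sex")["n_trials"].agg(["count", "mean", "median"])

print(summary)
```

On the real `data` DataFrame the same three steps should apply unchanged. Keep in mind that each mouse contributes many sessions, so the rows are not independent; averaging per mouse before comparing the sexes may be a fairer comparison.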


To inspire original questions for students, the following line can be run to list the data types available for analysis.