### SparklyRGT Template: Baseline and Acquisition Analysis 

**Requirements**
* The data must be an Excel file from MEDPC2XL (trial-by-trial data)
* The data, sparklyRGT.py file, and this notebook must all be in the same folder

**Getting started: Please make a copy of this (sparklyRGT_template_2) for each analysis**
- Refer to sparklyRGT_documentation for function information
- Note: depending on your analysis, you will only have to complete certain sections of the sparklyRGT_documentation
- Note: feel free to create a personal template once you've become comfortable - this is just an example

In [1]:
import os
os.chdir('..')
import sparklyRGT as rgt 
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
import scipy.stats as stats
pd.options.mode.chained_assignment = None
pd.set_option('display.max_rows',100)

I am being executed!


***

# 1) Load data into Python



In [3]:
#remove the leading 'M' from the TF file subject numbers and convert to integer/float 
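
One possible way to do this step is sketched below. The helper name `strip_subject_prefix` and the toy dataframe are hypothetical and are not part of sparklyRGT; the sketch assumes the subject numbers live in a `Subject` column and that `M` is the only non-numeric part of each value.

```python
import re
import pandas as pd

def strip_subject_prefix(df, column="Subject", prefix="M"):
    """Return a copy of df with a leading `prefix` removed from `column`, cast to int."""
    out = df.copy()
    # Work on strings so that mixed columns (e.g. 'M101' and 103) are handled alike
    cleaned = out[column].astype(str).str.strip()
    cleaned = cleaned.str.replace("^" + re.escape(prefix), "", regex=True)
    # to_numeric raises if anything other than a number is left over
    out[column] = pd.to_numeric(cleaned).astype(int)
    return out

# Toy data standing in for the TF file (made-up values)
tf_example = pd.DataFrame({"Subject": ["M101", "M102", "103"], "Session": [1, 1, 1]})
tf_clean = strip_subject_prefix(tf_example)
```

Casting to `int` rather than `float` keeps the subject numbers comparable with the integer IDs in the other files; swap the final cast if floats are preferred.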

In [4]:
#add header names from BH03 to other files so they can be concatenated properly 
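
A minimal sketch of this step follows, assuming the other files contain the same columns in the same order as BH03 and were read without a header row. That assumption is not confirmed by the data, so check it before relying on this. `apply_reference_header` and the toy dataframes are hypothetical names used only for illustration.

```python
import pandas as pd

def apply_reference_header(reference, headerless):
    """Return a copy of `headerless` whose columns are named after `reference`."""
    if headerless.shape[1] != reference.shape[1]:
        raise ValueError(
            f"column count mismatch: {headerless.shape[1]} vs {reference.shape[1]}"
        )
    out = headerless.copy()
    out.columns = reference.columns
    return out

# Toy stand-ins: one file with headers (like BH03) and one without (made-up values)
bh03_example = pd.DataFrame({"Subject": [173], "Session": [1], "Trial": [1]})
other_example = pd.DataFrame([[174, 1, 1], [174, 1, 2]])

other_named = apply_reference_header(bh03_example, other_example)
combined = pd.concat([bh03_example, other_named], ignore_index=True)
```

With matching column names, `pd.concat` stacks the rows instead of creating a second set of columns filled with NaN.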

In [10]:
from os import listdir
#the data needs to be loaded in from OSF:
#either download the files from OSF and upload them to GitHub, or load them in directly from OSF

#OSF files are CSVs (except BH03, which is xlsx), but load_multiple_data loads Excel files
#sparklyRGT.py needs to be edited so that either Excel files or CSVs can be loaded in

path = '../sparklyRGT_tutorial/data/'
file_names = listdir(path)

df = rgt.load_multiple_data(file_names, path, reset_sessions = False)

df.head()

Unnamed: 0,MSN,StartDate,StartTime,Subject,Group,Box,Experiment,Comment,Session,Trial,...,Pun_Persev_H5,Pun_HeadEntry,Pun_Dur,Premature_Resp,Premature_Hole,Rew_Persev_H1,Rew_Persev_H2,Rew_Persev_H3,Rew_Persev_H4,Rew_Persev_H5
0,rGT_A-cue,01/23/16,8:13:19,173,0.0,1,0.0,,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,rGT_A-cue,01/23/16,8:13:19,173,0.0,1,0.0,,1.0,2.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,rGT_A-cue,01/23/16,8:13:19,173,0.0,1,0.0,,1.0,3.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,rGT_A-cue,01/23/16,8:13:19,173,0.0,1,0.0,,1.0,4.0,...,2.0,1.0,10.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,rGT_A-cue,01/23/16,8:13:19,173,0.0,1,0.0,,1.0,5.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
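
Until sparklyRGT.py supports both formats, the mixed CSV/xlsx files could be read with a small helper like the one sketched here. `read_table` and `load_mixed_data` are hypothetical names, not sparklyRGT functions, and the sketch does not reproduce whatever `rgt.load_multiple_data` does beyond reading and concatenating (for example its `reset_sessions` option).

```python
import os
import pandas as pd

def read_table(file_path):
    """Read one data file, choosing the pandas reader from the file extension."""
    ext = os.path.splitext(file_path)[1].lower()
    if ext == ".csv":
        return pd.read_csv(file_path)
    if ext in (".xlsx", ".xls"):
        return pd.read_excel(file_path)
    raise ValueError(f"Unsupported file type: {file_path}")

def load_mixed_data(file_names, path):
    """Read every file in `file_names` from `path` and stack them into one dataframe."""
    frames = [read_table(os.path.join(path, name)) for name in file_names]
    return pd.concat(frames, ignore_index=True)
```

Usage would look like `df = load_mixed_data(file_names, path)`. Raising on unknown extensions means stray files in the data folder are reported rather than silently skipped.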


## Data cleaning

### Check session numbers for each rat and drop subjects

In [11]:
#missing data

#we only want rats that have consecutive sessions from 1 to 5
#some rats will be missing sessions and need to be excluded

#one row per (Subject, Session) pair, sorted so each rat's sessions are in order
subjects_n = df[['Subject', 'Session']].drop_duplicates()
subjects_n = subjects_n.sort_values(by=['Subject', 'Session'])

#flag subjects that have fewer than 5 sessions or whose first 5 sessions are not consecutive
#grouping by Subject keeps the check within one rat; calling head() on the whole table
#pulls in the next rat's rows whenever a rat has fewer than 5 sessions
drop_subs = []
for num, sessions in subjects_n.groupby('Subject')['Session']:
    con_list = sessions.head().astype(int).tolist()
    if len(con_list) < 5 or con_list != list(range(con_list[0], con_list[-1] + 1)):
        drop_subs.append(num)
print(drop_subs)

#to drop subjects:
df2 = rgt.drop_subjects(df, drop_subs)
df2

[165, 166, 167, 168, 169, 170, 171, 172, 303, 304, 305, 306, 309, 310, 311, 312, 313, 314, 315, 316, 525, 526, 527, 528, 707, 717, 723, 801, 802, 803, 804, 805, 806, 807, 808, 809, 810, 811, 812, 813, 814, 815, 816, 817, 818, 819, 820, 821, 822, 823, 824, 826, 1025, 1026, 1027, 1029, 1030, 1109, 1110, 1111, 1112, 1113, 1114, 1115, 1116, 1209, 1210, 1211, 1212, 1213, 1214, 1215, 1216, 1218, 1418, 1422, 1426]
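
Note that the cell above tests whether the first five sessions are consecutive, not whether they start at session 1. If the stricter reading of "sessions 1 to 5" is wanted, a check along the following lines could be used instead. This is only a sketch: `subjects_missing_sessions` is a hypothetical helper, and the toy dataframe contains made-up values.

```python
import pandas as pd

def subjects_missing_sessions(df, required=range(1, 6)):
    """Return the subjects that lack at least one session number in `required`."""
    required = set(required)
    missing = []
    # groupby iterates subjects in sorted order
    for subject, sessions in df.groupby("Subject")["Session"]:
        seen = set(sessions.dropna().astype(int))
        if not required <= seen:
            missing.append(subject)
    return missing

# Toy data: rat 1 has sessions 1-5, rat 2 skips session 3, rat 3 starts at session 2
toy = pd.DataFrame({
    "Subject": [1] * 5 + [2] * 4 + [3] * 5,
    "Session": [1, 2, 3, 4, 5, 1, 2, 4, 5, 2, 3, 4, 5, 6],
})
missing = subjects_missing_sessions(toy)
```

Rat 3 passes a pure consecutiveness test but is flagged here because session 1 is absent. The returned list can be passed to `rgt.drop_subjects` in the same way as `drop_subs`.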



In [7]:
#check if there are any subjects that were run on more than one task version (including 5CSRT, FC)

#to display all the task names in the dataframe:
# df2.MSN.unique()

#one row per (Subject, MSN) pair
tasks_n = df2[['Subject', 'MSN']].drop_duplicates()
tasks_n = tasks_n.sort_values(by=['Subject'])

#subjects that appear with more than one MSN
tasks_n_dup = tasks_n[tasks_n.duplicated(['Subject'], keep=False)]
duplicate_tasks = tasks_n_dup.Subject.unique().tolist()
print(duplicate_tasks)

#drop any subjects that were run on more than one task
final_subjects = rgt.drop_subjects(df2, duplicate_tasks)

[708, 711, 715, 724, 727, 731, 732, 1404, 1407, 1408, 1412, 1419, 1420, 1424, 1425, 1427, 1429, 1430]

In [7]:
final_subjects.to_csv('sockeye_data.csv')


In [None]:
# split the dataframe by cued and classic rats, and save them as two separate CSVs (both with column headers)

#all cued tasks will have 'cue' in the MSN (A and B version)
#all classic tasks should have 'Classic' (A and B) - either rGT or RGT 

#upload to sparklyRGT/data 
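
A sketch of how the split might be done is given below. `split_by_task`, `save_split`, the output file names, and the toy MSN values are all hypothetical. The match is made case-insensitive as an assumption, so that `cue`, `Cue`, `classic`, and `Classic` are all caught. Rows whose MSN matches neither pattern (for example 5CSRT sessions) end up in neither file.

```python
import os
import pandas as pd

def split_by_task(df, msn_column="MSN"):
    """Return (cued, classic) dataframes based on substrings of the MSN column."""
    msn = df[msn_column].astype(str)
    is_cued = msn.str.contains("cue", case=False)
    is_classic = msn.str.contains("classic", case=False)
    # If an MSN somehow matched both patterns, treat it as cued only
    return df[is_cued], df[is_classic & ~is_cued]

def save_split(df, out_dir, cued_name="cued_data.csv", classic_name="classic_data.csv"):
    """Write the cued and classic rows to two CSV files, each with column headers."""
    cued, classic = split_by_task(df)
    # index=False avoids an extra 'Unnamed: 0' column when the CSV is read back
    cued.to_csv(os.path.join(out_dir, cued_name), index=False)
    classic.to_csv(os.path.join(out_dir, classic_name), index=False)
    return cued, classic

# Toy data with made-up MSN values
toy = pd.DataFrame({
    "Subject": [173, 201, 305],
    "MSN": ["rGT_A-cue", "rGT_ClassicA", "5CSRT"],
})
cued_example, classic_example = split_by_task(toy)
```

In the notebook this would be called as `save_split(final_subjects, out_dir)`, with `out_dir` pointing at the local copy of sparklyRGT/data.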