# MEDPC Data Tutorial

**About:** This tutorial shows how to load, tidy, summarize, and plot MED-PC rat gambling task (rGT) data in Python.

**Contact:**
* Dexter Kim: dexterkim2000@gmail.com
* Brett Hathaway: bretthathaway@psych.ubc.ca

**Requirements**
* The data must be an Excel file exported by MEDPC2XL
* The data, the `rgt_functions.py` file, and this notebook must all be in your current working directory

**Note: This tutorial is split into multiple sections**
* Section 1: Setting Variables (objects) 
* Section 2: Loading Data into Python 
* Section 3: Acquisition analysis and plotting
* Section 4: Latin Square analysis and plotting
* Section 5: Miscellaneous

**Please run the following cell!**

In [2]:
%load_ext autoreload
%autoreload 2

In [3]:
#MEDPC rat gambling task functions imports, will print "I am being executed!" if functional
import rgt_functions as rgt

#main imports 
import os
import pandas as pd
import numpy as np

# plotting imports 
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator

# stats imports 
import scipy.stats as stats

#the following line stops pandas from printing unnecessary SettingWithCopyWarning messages
pd.options.mode.chained_assignment = None

I am being executed!


# 1) Setting Variables (objects) 

Set your variables! These will be used later in the code. Example arguments are left in for clarity.

###Brett will edit

In [30]:
#we need to set a few variables for loading in the data - these will change depending on the dataset

#in file_names (List[str]), add the file names you wish to read into Python 
# file_names = ['BH07_raw_free_S29-30.xlsx']

# #in group_names (List[str]), add the names of the control and experimental group, respectively 
# group_names = ['Tg negative','Tg positive'] 

# title = 'Nigrostriatal activation during acquisition' #title for figures, describing the experiment
# startsess = 29 #first session in this dataset
# endsess = 30 #last session in this dataset

# group_names = {0: 'tg negative',
#               1: 'tg positive'} ###group_names is used twice (as a list, and as a dict)

# #the following two lines of code assign the rat subject numbers to the experimental and control group lists
# exp_group = [1, 2, 7, 8, 11, 12, 16, 19, 20, 21, 22, 25, 26, 29, 32] #Tg positive

# control_group = [3, 4, 5, 6, 9, 13, 14, 15, 17, 18, 23, 24, 27, 28, 30, 31] #Tg negative

###for Latin Square analysis
file_names = ['BH06_raw_round1-infusions.xlsx', 'BH06_raw_round1-makeup.xlsx']

ls_group_names = {0:'lOFC',
               1:'PrL'} 

lOFC = [1,2,3,4,5,8,9,11,12,13,14,15,16]

PrL = [6,7] ##have to change this back with the new data

groups = [lOFC, PrL]

# 2) Load data into Python
* `load_data()` takes one argument: `file_names`
* `load_data()` returns a table similar to the Excel sheet(s) you loaded, in the order established in `file_names`
* Note: `df` is short for dataframe; it is the object that stores your data as a table
* Passing `reset_sessions = True` renumbers the sessions so that they start from 1 again (you may want to do this for baseline acquisition analysis)
* `load_multiple_data()` loads multiple cohorts (with the same subject numbers) and assigns them unique subject numbers (ex. subject 1 of cohort 1 --> subject 101)
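
The renumbering described for `load_multiple_data()` can be pictured with a small pandas sketch. This is not the code in `rgt_functions.py`: the helper name `combine_cohorts` and the rule `cohort * 100 + subject` are assumptions inferred from the example above (subject 1 of cohort 1 becomes 101).

```python
import pandas as pd

def combine_cohorts(cohort_frames):
    """Stack cohorts and give every rat a unique subject number.

    Hypothetical helper: cohort k, subject s -> k * 100 + s.
    """
    renumbered = []
    for cohort_number, frame in enumerate(cohort_frames, start=1):
        frame = frame.copy()  # leave the caller's dataframe untouched
        frame['Subject'] = frame['Subject'] + cohort_number * 100
        renumbered.append(frame)
    return pd.concat(renumbered, ignore_index=True)

# Two toy cohorts that reuse subject numbers 1 and 2.
cohort1 = pd.DataFrame({'Subject': [1, 2], 'Session': [1, 1]})
cohort2 = pd.DataFrame({'Subject': [1, 2], 'Session': [1, 1]})

combined = combine_cohorts([cohort1, cohort2])
print(combined['Subject'].tolist())  # [101, 102, 201, 202]
```

A rule like this only keeps subjects unique while each cohort has fewer than 100 rats.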

In [5]:
df = rgt.load_data(file_names)

#load_data does not display the dataframe itself. Use df.head() to view its first rows; they should match the top of your first Excel file.

df.head()

Unnamed: 0,MSN,StartDate,StartTime,Subject,Group,Box,Experiment,Comment,Session,Trial,...,Pun_Persev_H5,Pun_HeadEntry,Pun_Dur,Premature_Resp,Premature_Hole,Rew_Persev_H1,Rew_Persev_H2,Rew_Persev_H3,Rew_Persev_H4,Rew_Persev_H5
0,rGT_A-cue,02/13/20,15:35:41,9,1,1,0,,55,1.1,...,0,0,0,1,5,0,0,0,0,0
1,rGT_A-cue,02/13/20,15:35:41,9,1,1,0,,55,1.1,...,0,0,0,1,4,0,0,0,0,0
2,rGT_A-cue,02/13/20,15:35:41,9,1,1,0,,55,1.0,...,2,1,30,0,0,0,0,0,0,0
3,rGT_A-cue,02/13/20,15:35:41,9,1,1,0,,55,2.1,...,0,0,0,1,5,0,0,0,0,0
4,rGT_A-cue,02/13/20,15:35:41,9,1,1,0,,55,2.1,...,0,0,0,1,5,0,0,0,0,0


# 3A) Acquisition Analysis Section (Analysis by Session)

**Check your session data**
* `check_sessions()` gives us a summary for each rat (subject), including session numbers, session dates and the number of trials in each session.
* This allows us to see if there are any missing/incorrect session numbers, and if MED-PC exported all of the desired data into the Excel file you loaded in (`file_names`). 
* `edit_sessions()` can change session numbers (not included in the tutorial) 
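
To make the idea of a per-subject session summary concrete, here is a minimal pandas sketch on toy data. It is not the implementation of `check_sessions()`; the column names come from the table above, and treating the largest `Trial` value as the trial count is an assumption.

```python
import pandas as pd

# Toy trial-level data: subject 1 lacks session 31, subject 2 lacks session 30.
trials = pd.DataFrame({
    'Subject':   [1, 1, 1, 2, 2],
    'Session':   [29, 29, 30, 29, 31],
    'StartDate': ['02/13/20', '02/13/20', '02/14/20', '02/13/20', '02/15/20'],
    'Trial':     [1.0, 2.0, 1.0, 1.0, 1.0],
})

# One row per subject/session pair: the date and the highest trial number reached.
session_summary = trials.groupby(['Subject', 'Session']).agg(
    StartDate=('StartDate', 'first'),
    Trials=('Trial', 'max'),
)
print(session_summary)

# Sessions each subject is missing, relative to every session seen in the file.
all_sessions = set(trials['Session'].tolist())
missing = {}
for subject, rows in trials.groupby('Subject'):
    attended = set(rows['Session'].tolist())
    missing[int(subject)] = sorted(all_sessions - attended)
print(missing)  # {1: [31], 2: [30]}
```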

In [6]:
# rgt.check_sessions(df)

**To drop/remove data from certain session(s)**
* Replace `session_num` with a list of the session numbers you want to remove
* For example, to remove all data from sessions 28 and 29, I would write: `rgt.drop_sessions(df, [28, 29])`
* **Make sure to remove the correct session(s).** If you remove the wrong session and want the data back, you'll have to restart the kernel and start again from `load_data()`
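
Because a wrong drop can only be undone by reloading, one cautious pattern is to filter into a new variable and leave the loaded dataframe alone. The sketch below shows that pattern in plain pandas; `drop_sessions_copy` is a hypothetical helper, not a function from `rgt_functions.py`.

```python
import pandas as pd

def drop_sessions_copy(data, sessions_to_drop):
    """Return a new dataframe without the listed sessions (hypothetical helper)."""
    keep = ~data['Session'].isin(sessions_to_drop)
    return data[keep].reset_index(drop=True)

# Toy data: two subjects, three sessions each.
raw = pd.DataFrame({'Subject': [1, 1, 1, 2, 2, 2],
                    'Session': [28, 29, 30, 28, 29, 30]})

trimmed = drop_sessions_copy(raw, [28, 29])
print(trimmed['Session'].tolist())  # [30, 30]
print(len(raw))                     # 6, the original rows are still available
```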

###do you want drop_sessions to be part of the tutorial? Or edit_sessions()?

In [7]:
# rgt.drop_sessions(df, session_num) #session_num is a list of integers
# df2 = rgt.drop_sessions(df, [28])

In [9]:
### this is a cell showing that it works --> 30 becomes 29, and 31 becomes 30, for subjects 17 to 32

# rgt.edit_sessions(df2, [30, 31], [29, 30], subs = list(range(17,33)))

**Check that you dropped the desired session (in this example, we dropped data from session 28)**

In [8]:
# rgt.check_sessions(df2)

**Run the following cell to acquire a summary of your data.**

The rows represent subjects (rats 1 to n)

The columns are explained below:
* `##P#` represents the percent choice of each option. For example, `29P1` represents the percentage of times P1 was selected during the 29th session. 
* `risk##` represents the risk score for each session: (P1 + P2) - (P3 + P4) 
* `collect_lat##` represents the mean collect latency for each session
* `choice_lat##` represents the mean choice latency for each session 
* `trial##` represents the number of trials (not including premature responses or omissions) for each session
* `prem##` represents the number of premature responses for each session
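
As a worked example of the risk score formula above, the sketch below applies (P1 + P2) - (P3 + P4) to made-up percent-choice values for one rat. The numbers and the function name `risk_score` are illustrative only.

```python
def risk_score(p1, p2, p3, p4):
    """Risk score as defined above: (P1 + P2) - (P3 + P4)."""
    return (p1 + p2) - (p3 + p4)

# Hypothetical percent choice of P1-P4 for one rat in session 29.
choices = {'29P1': 10.0, '29P2': 60.0, '29P3': 25.0, '29P4': 5.0}

score = risk_score(choices['29P1'], choices['29P2'],
                   choices['29P3'], choices['29P4'])
print(score)  # 40.0
```

Because the four values are percentages that sum to 100, the score ranges from -100 (only P3/P4 chosen) to 100 (only P1/P2 chosen).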

In [10]:
# df_sum = rgt.get_summary_data(df2)
# df_sum #prints the dataset 

**Get the risk status of the rats using the following code**
* Note: 
    * `risk_status == 1` indicates a positive risk score (optimal) 
    * `risk_status == 2` indicates a negative risk score (risky)
    * `mean_risk` is the mean risk score averaged across the sessions between `startsess` and `endsess` for a given subject
        * you can change `startsess` and `endsess` by passing the session numbers instead. For example, `rgt.get_risk_status(df_sum, 28, 30)`
    * `print(risky, optimal)` prints out 2 lists of rat subjects: the risky rats, and the optimal rats 
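
The classification described above can be sketched in plain Python. This is an illustration rather than the code behind `get_risk_status()`: the function name, the toy scores, and the handling of a mean of exactly 0 are all assumptions.

```python
from statistics import mean

def classify_risk(risk_by_session, startsess, endsess):
    """Average the risk score over startsess..endsess and label the subject."""
    scores = [risk_by_session[s] for s in range(startsess, endsess + 1)]
    mean_risk = mean(scores)
    # Assumption: a mean of exactly 0 is labelled risky; the tutorial does not say.
    status = 1 if mean_risk > 0 else 2
    return mean_risk, status

# Hypothetical risk scores per subject and session.
risk_scores = {
    3: {29: 40.0, 30: 20.0},
    7: {29: -10.0, 30: -30.0},
}

optimal, risky = [], []
for subject, by_session in risk_scores.items():
    mean_risk, status = classify_risk(by_session, 29, 30)
    (optimal if status == 1 else risky).append(subject)

print(risky, optimal)  # [7] [3]
```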

In [11]:
# df_summary, risky, optimal = rgt.get_risk_status(df_sum, startsess, endsess)

# print(df_summary[['mean_risk','risk_status']]) #printed 2 of many columns in df_summary ###this could be removed?
# print(risky, optimal) #prints 2 lists: the subject numbers of the risky rats, and the subject numbers of the optimal rats

**Export your data to an Excel file!** 
* Note: `'tg_status'` is the column name that specifies the control vs. experimental group
* Note2: `'BH07_free_S29-30.xlsx'` specifies the name of the **new** Excel file 

###may want to change file_name to new_file_name for clarity 

In [12]:
# rgt.export_to_excel(df_summary, [control_group, exp_group], column_name = 'tg_status', file_name = 'BH07_free_S29-30.xlsx')

# ##Brett will edit control_group

**Summarize your data by experimental/control set**
* If you only want to view certain columns, select them from `mean_scores`
    * For example, `mean_scores[['risk29', 'risk30']]` will create a table with only those 2 columns
    * Each value is the mean for that column (ex. `29P1`) within the set (`tg negative` or `tg positive`)
    
###in addition, it's strange that we have control_group and exp_group as the objects for 'tg_negative' and 'tg_positive', but I think you already noticed this. 

In [13]:
# mean_scores, stderror = rgt.get_group_means_sem(df_summary, [control_group, exp_group], group_names)
# mean_scores #all mean scores
# # mean_scores[['risk29', 'risk30']] #specify columns

# 3B) Acquisition Analysis (Plotting Section)

**Graph of the table above**
* `variable` specifies the variable you want to plot. 
    * For example, if I want to plot `choice_lat` over sessions for the experimental and control group, I would replace `variable` with `'choice_lat'`
* `startsess` and `endsess` can also be replaced with the range of session numbers you'd like to plot 
    * For example, if I want to plot `choice_lat` over sessions 29 to 31, I would replace `startsess` and `endsess` with `29` and `31` respectively
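
Since the summary columns are named as a variable followed by a session number (for example `risk29`), a variable name plus a session range is enough to pick out the columns to plot. The sketch below shows that naming step only; whether `rgt_plot()` builds its column names this way is an assumption, and `session_columns` is a hypothetical helper.

```python
def session_columns(variable, startsess, endsess):
    """Build summary column names such as 'risk29' for a range of sessions."""
    return [f"{variable}{session}" for session in range(startsess, endsess + 1)]

print(session_columns('choice_lat', 29, 31))
# ['choice_lat29', 'choice_lat30', 'choice_lat31']
```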

###this could be improved in description - 
###why does typing 'risk' work?

In [14]:
# rgt.rgt_plot('risk', startsess, endsess, ['tg negative','tg positive'], title, mean_scores, stderror, var_title = 'risk score')

**Plot the same data as a bar plot instead of a line plot**
* Pass the same arguments that you used for the line plot above

In [15]:
# rgt.rgt_bar_plot('risk', startsess, endsess, group_names, title, mean_scores, stderror, var_title = None)

**Bar plot of P1-P4 choices**
* The following bar plot plots the mean P1-P4 choices for the tg negative and tg positive groups 

In [16]:
# rgt.choice_bar_plot(startsess, endsess, mean_scores, stderror)

# 4A) Latin Square Analysis (Analysis by Group) 

* Please note! This is the same workflow as the acquisition analysis 

**Check your group data**
* This will show you the number of trials performed by each subject-group pairing. 

In [17]:
rgt.check_groups(df)

Subject  Group
1        1         50.1
         2         79.1
         3         40.0
         4        111.0
2        1         77.0
         2         74.0
         3         70.0
         4         94.0
3        1         50.1
         2         38.1
         3         45.0
         4         40.0
4        1         86.0
         2         82.0
         3         95.0
         4         79.0
5        0         87.0
         1         79.1
         2         79.0
         4        128.0
6        1        125.1
         2         98.0
         3        105.0
         4        121.0
7        1         76.1
         2         80.1
         3         68.0
         4         57.1
8        1         75.0
         2         58.0
         3         57.1
         4         66.0
9        1         65.1
         2         69.0
         3         55.0
         4         48.0
11       1         56.1
         2         58.0
         3         61.0
         4         55.0
12       1         63.0
 

**Edit your group data**
* For example, if I want to change group 0 to group 1 for all subjects, I would write `rgt.edit_groups(df, orig_group = [0], new_group = [1], subs = "all")`
    * To do the same thing for subjects 2 and 3 only, change `subs = "all"` to `subs = [2,3]`
* To remove the data for subjects 5, 9 and 12, I would write `rgt.drop_subjects(df, subs = [5, 9, 12])`
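
For readers who want to see what relabelling a group involves, here is a minimal pandas sketch. It mirrors the `orig_group`, `new_group` and `subs` arguments used in the cell below, but `relabel_groups` is a hypothetical stand-in and not the code of `rgt.edit_groups()`.

```python
import pandas as pd

def relabel_groups(data, orig_group, new_group, subs='all'):
    """Return a copy with group labels swapped for the chosen subjects."""
    mapping = dict(zip(orig_group, new_group))
    result = data.copy()
    if isinstance(subs, str) and subs == 'all':
        rows = pd.Series(True, index=result.index)  # every row qualifies
    else:
        rows = result['Subject'].isin(subs)         # only the listed subjects
    result.loc[rows, 'Group'] = result.loc[rows, 'Group'].replace(mapping)
    return result

# Toy data: subject 5 and subject 6 both have rows labelled group 0.
data = pd.DataFrame({'Subject': [5, 5, 6], 'Group': [0, 1, 0]})

relabelled = relabel_groups(data, orig_group=[0], new_group=[3], subs=[5])
print(relabelled['Group'].tolist())  # [3, 1, 0]
```

Only subject 5's group 0 rows change; subject 6 keeps its label because it is not listed in `subs`.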

In [18]:
rgt.edit_groups(df, orig_group = [0], new_group = [3], subs = [5])

# rgt.drop_subjects(df, subs = [5, 9, 12])

Unnamed: 0,MSN,StartDate,StartTime,Subject,Group,Box,Experiment,Comment,Session,Trial,...,Pun_Persev_H5,Pun_HeadEntry,Pun_Dur,Premature_Resp,Premature_Hole,Rew_Persev_H1,Rew_Persev_H2,Rew_Persev_H3,Rew_Persev_H4,Rew_Persev_H5
0,rGT_A-cue,02/13/20,15:35:41,9,1,1,0,,55,1.1,...,0,0,0,1,5,0,0,0,0,0
1,rGT_A-cue,02/13/20,15:35:41,9,1,1,0,,55,1.1,...,0,0,0,1,4,0,0,0,0,0
2,rGT_A-cue,02/13/20,15:35:41,9,1,1,0,,55,1.0,...,2,1,30,0,0,0,0,0,0,0
3,rGT_A-cue,02/13/20,15:35:41,9,1,1,0,,55,2.1,...,0,0,0,1,5,0,0,0,0,0
4,rGT_A-cue,02/13/20,15:35:41,9,1,1,0,,55,2.1,...,0,0,0,1,5,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6415,rGT_A-cue,03/19/20,13:48:44,9,4,1,0,,78,46.0,...,0,0,0,0,0,0,0,0,0,0
6416,rGT_A-cue,03/19/20,13:48:44,9,4,1,0,,78,47.1,...,0,0,0,1,5,0,0,0,0,0
6417,rGT_A-cue,03/19/20,13:48:44,9,4,1,0,,78,47.1,...,0,0,0,1,5,0,0,0,0,0
6418,rGT_A-cue,03/19/20,13:48:44,9,4,1,0,,78,47.0,...,0,0,0,0,0,0,0,0,0,0


**Check that you edited the desired group**

In [19]:
rgt.check_groups(df)

Subject  Group
1        1         50.1
         2         79.1
         3         40.0
         4        111.0
2        1         77.0
         2         74.0
         3         70.0
         4         94.0
3        1         50.1
         2         38.1
         3         45.0
         4         40.0
4        1         86.0
         2         82.0
         3         95.0
         4         79.0
5        1         79.1
         2         79.0
         3         87.0
         4        128.0
6        1        125.1
         2         98.0
         3        105.0
         4        121.0
7        1         76.1
         2         80.1
         3         68.0
         4         57.1
8        1         75.0
         2         58.0
         3         57.1
         4         66.0
9        1         65.1
         2         69.0
         3         55.0
         4         48.0
11       1         56.1
         2         58.0
         3         61.0
         4         55.0
12       1         63.0
 

**Run the following cell to acquire a summary of your data.**

The rows represent subjects (rats 1 to n)

The columns are explained below:
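* `##P#` represents the percent choice of each option, where `##` is now the group number. For example, `1P1` represents the percentage of times P1 was selected in group 1.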
* `risk##` represents the risk score for each group: (P1 + P2) - (P3 + P4) 
* `collect_lat##` represents the mean collect latency for each group
* `choice_lat##` represents the mean choice latency for each group
* `trial##` represents the number of trials (not including premature responses or omissions) for each group
* `prem##` represents the number of premature responses for each group

In [20]:
df1 = rgt.get_summary_data(df, mode = 'Group')
df1

Unnamed: 0,1P1,1P2,1P3,1P4,2P1,2P2,2P3,2P4,3P1,3P2,...,omit3,omit4,trial1,trial2,trial3,trial4,prem1,prem2,prem3,prem4
1,0.0,34.0,64.0,2.0,0.0,57.1429,35.0649,7.79221,5.0,40.0,...,0,2,50.1,79.1,40.0,111.0,54.954955,32.478632,70.588235,14.615385
2,3.8961,59.7403,7.79221,28.5714,12.3288,61.6438,6.84932,19.1781,8.57143,45.7143,...,0,1,77.0,74.0,70.0,94.0,23.0,28.846154,31.372549,17.54386
3,0.0,18.0,78.0,4.0,0.0,13.8889,80.5556,5.55556,2.22222,20.0,...,0,0,50.1,38.1,45.0,40.0,57.264957,70.4,64.0,56.989247
4,1.23457,67.9012,0.0,30.8642,1.21951,65.8537,0.0,32.9268,0.0,77.6596,...,1,2,86.0,82.0,95.0,79.0,24.561404,21.904762,16.666667,15.957447
5,5.12821,35.8974,55.1282,3.84615,3.84615,61.5385,28.2051,6.41026,17.4419,40.6977,...,1,0,79.1,79.0,87.0,128.0,33.898305,41.044776,21.621622,19.496855
6,1.62602,85.3659,0.0,13.0081,2.06186,72.1649,0.0,25.7732,1.90476,69.5238,...,0,0,125.1,98.0,105.0,121.0,13.888889,15.517241,10.25641,1.626016
7,5.33333,62.6667,16.0,16.0,21.25,58.75,2.5,17.5,7.35294,58.8235,...,0,0,76.1,80.1,68.0,57.1,42.307692,40.740741,50.0,52.892562
8,4.0,44.0,0.0,52.0,6.89655,31.0345,5.17241,56.8966,0.0,42.8571,...,0,0,75.0,58.0,57.1,66.0,18.478261,42.574257,47.169811,41.071429
9,0.0,1.5625,98.4375,0.0,0.0,14.9254,82.0896,2.98507,1.85185,11.1111,...,1,2,65.1,69.0,55.0,48.0,26.436782,24.175824,46.078431,54.716981
11,3.63636,1.81818,94.5455,0.0,1.72414,3.44828,93.1034,1.72414,1.63934,4.91803,...,0,0,56.1,58.0,61.0,55.0,35.294118,31.764706,11.594203,29.487179


**Impute your missing data**
* For example, if you have missing data for subject 12, session 2, you can impute it (replace it with the mean of the sessions before and after).
    * Code: `rgt.impute_missing_data(df1, session = 2, subject = 12, choice = 'all', vars = 'all')`, where `df1` is the summary table created above
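
The imputation rule described above (the mean of the session before and the session after) comes down to one line of arithmetic. The sketch below applies it to made-up values for a single variable; the function name and the numbers are illustrative only.

```python
def impute_from_neighbours(values_by_session, session):
    """Fill a missing session with the mean of the sessions before and after."""
    before = values_by_session[session - 1]
    after = values_by_session[session + 1]
    return (before + after) / 2

# Hypothetical risk scores for one subject; session 2 is missing.
risk_by_session = {1: 20.0, 2: None, 3: 30.0}

risk_by_session[2] = impute_from_neighbours(risk_by_session, 2)
print(risk_by_session[2])  # 25.0
```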

In [21]:
df_group_summary = rgt.impute_missing_data(df1, session = 2, subject = 12, choice = 'all', vars = 'all')
df_group_summary

Unnamed: 0,1P1,1P2,1P3,1P4,2P1,2P2,2P3,2P4,3P1,3P2,...,omit3,omit4,trial1,trial2,trial3,trial4,prem1,prem2,prem3,prem4
1,0.0,34.0,64.0,2.0,0.0,57.1429,35.0649,7.79221,5.0,40.0,...,0,2,50.1,79.1,40.0,111.0,54.954955,32.478632,70.588235,14.615385
2,3.8961,59.7403,7.79221,28.5714,12.3288,61.6438,6.84932,19.1781,8.57143,45.7143,...,0,1,77.0,74.0,70.0,94.0,23.0,28.846154,31.372549,17.54386
3,0.0,18.0,78.0,4.0,0.0,13.8889,80.5556,5.55556,2.22222,20.0,...,0,0,50.1,38.1,45.0,40.0,57.264957,70.4,64.0,56.989247
4,1.23457,67.9012,0.0,30.8642,1.21951,65.8537,0.0,32.9268,0.0,77.6596,...,1,2,86.0,82.0,95.0,79.0,24.561404,21.904762,16.666667,15.957447
5,5.12821,35.8974,55.1282,3.84615,3.84615,61.5385,28.2051,6.41026,17.4419,40.6977,...,1,0,79.1,79.0,87.0,128.0,33.898305,41.044776,21.621622,19.496855
6,1.62602,85.3659,0.0,13.0081,2.06186,72.1649,0.0,25.7732,1.90476,69.5238,...,0,0,125.1,98.0,105.0,121.0,13.888889,15.517241,10.25641,1.626016
7,5.33333,62.6667,16.0,16.0,21.25,58.75,2.5,17.5,7.35294,58.8235,...,0,0,76.1,80.1,68.0,57.1,42.307692,40.740741,50.0,52.892562
8,4.0,44.0,0.0,52.0,6.89655,31.0345,5.17241,56.8966,0.0,42.8571,...,0,0,75.0,58.0,57.1,66.0,18.478261,42.574257,47.169811,41.071429
9,0.0,1.5625,98.4375,0.0,0.0,14.9254,82.0896,2.98507,1.85185,11.1111,...,1,2,65.1,69.0,55.0,48.0,26.436782,24.175824,46.078431,54.716981
11,3.63636,1.81818,94.5455,0.0,1.72414,3.44828,93.1034,1.72414,1.63934,4.91803,...,0,0,56.1,58.0,61.0,55.0,35.294118,31.764706,11.594203,29.487179


**Summarize your data by set (here, brain region: lOFC or PrL)**
* If you only want to view certain columns, select them from `group_means`
    * For example, `group_means[['1P1', '1P2']]` will create a table with only those 2 columns
    * Each value is the mean for that column (ex. `1P1`) within the set (`lOFC` or `PrL`)
* Note: `lOFC` and `PrL` cannot contain subject numbers that aren't present in the data, otherwise you will get a `KeyError`
* Note2: `lOFC` and `PrL` must each contain at least 2 subjects
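
The two constraints noted above follow from how a group mean and standard error are computed. The plain-Python sketch below uses made-up scores and the standard formula SEM = sample standard deviation / sqrt(n); it is not the code of `get_group_means_sem()`, and the names `mean_and_sem`, `scores` and `toy_groups` are hypothetical.

```python
from math import sqrt
from statistics import StatisticsError, mean, stdev

def mean_and_sem(values):
    """Return (mean, SEM); the sample standard deviation needs at least 2 values."""
    return mean(values), stdev(values) / sqrt(len(values))

# Hypothetical score per subject, and the subjects that belong to each set.
scores = {1: 2.0, 2: 4.0, 6: 6.0, 7: 8.0}
toy_groups = {'lOFC': [1, 2], 'PrL': [6, 7]}

for name, subjects in toy_groups.items():
    # scores[s] raises KeyError if a listed subject is not in the data.
    group_values = [scores[s] for s in subjects]
    print(name, mean_and_sem(group_values))

# A set with a single subject has no sample standard deviation.
try:
    mean_and_sem([5.0])
except StatisticsError as error:
    print('Cannot compute SEM:', error)
```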

In [31]:
group_means, sem = rgt.get_group_means_sem(df1, groups, ls_group_names) 
group_means

Unnamed: 0,1P1,1P2,1P3,1P4,2P1,2P2,2P3,2P4,3P1,3P2,...,omit3,omit4,trial1,trial2,trial3,trial4,prem1,prem2,prem3,prem4
lOFC,2.72999,38.9398,45.2701,13.0601,4.87173,41.9868,38.7247,14.4168,4.81256,42.0943,...,0.538462,1.23077,69.3462,70.3769,71.0923,77.4846,28.7019,32.3625,33.5166,27.5362
PrL,3.47967,74.0163,8.0,14.5041,11.6559,65.4575,1.25,21.6366,4.62885,64.1737,...,0.0,0.0,100.6,89.05,86.5,89.05,28.0983,28.129,30.1282,27.2593


**Export your data to an Excel file!** 
* Note: `'brain region'` is the column name that specifies each subject's set (here, `lOFC` or `PrL`)
* Note2: `'BH06_all-data2.xlsx'` specifies the name of the **new** Excel file 

###may want to change file_name to new_file_name for clarity 
###doesn't work yet - have to check the workflow 

In [32]:
rgt.export_to_excel(df1,groups,'brain region','BH06_all-data2.xlsx')

**Get risk status of the vehicle**

###This skipped a lot of steps included in the BH06 LS analysis
###This is obviously not exported... but I can include it before export

In [27]:
df1,risky,optimal = rgt.get_risk_status_vehicle(df1) 
df1

Unnamed: 0,1P1,1P2,1P3,1P4,2P1,2P2,2P3,2P4,3P1,3P2,...,omit4,trial1,trial2,trial3,trial4,prem1,prem2,prem3,prem4,risk_status
1,0.0,34.0,64.0,2.0,0.0,57.1429,35.0649,7.79221,5.0,40.0,...,2,50.1,79.1,40.0,111.0,54.954955,32.478632,70.588235,14.615385,2.0
2,3.8961,59.7403,7.79221,28.5714,12.3288,61.6438,6.84932,19.1781,8.57143,45.7143,...,1,77.0,74.0,70.0,94.0,23.0,28.846154,31.372549,17.54386,1.0
3,0.0,18.0,78.0,4.0,0.0,13.8889,80.5556,5.55556,2.22222,20.0,...,0,50.1,38.1,45.0,40.0,57.264957,70.4,64.0,56.989247,2.0
4,1.23457,67.9012,0.0,30.8642,1.21951,65.8537,0.0,32.9268,0.0,77.6596,...,2,86.0,82.0,95.0,79.0,24.561404,21.904762,16.666667,15.957447,1.0
5,5.12821,35.8974,55.1282,3.84615,3.84615,61.5385,28.2051,6.41026,17.4419,40.6977,...,0,79.1,79.0,87.0,128.0,33.898305,41.044776,21.621622,19.496855,2.0
6,1.62602,85.3659,0.0,13.0081,2.06186,72.1649,0.0,25.7732,1.90476,69.5238,...,0,125.1,98.0,105.0,121.0,13.888889,15.517241,10.25641,1.626016,1.0
7,5.33333,62.6667,16.0,16.0,21.25,58.75,2.5,17.5,7.35294,58.8235,...,0,76.1,80.1,68.0,57.1,42.307692,40.740741,50.0,52.892562,1.0
8,4.0,44.0,0.0,52.0,6.89655,31.0345,5.17241,56.8966,0.0,42.8571,...,0,75.0,58.0,57.1,66.0,18.478261,42.574257,47.169811,41.071429,2.0
9,0.0,1.5625,98.4375,0.0,0.0,14.9254,82.0896,2.98507,1.85185,11.1111,...,2,65.1,69.0,55.0,48.0,26.436782,24.175824,46.078431,54.716981,2.0
11,3.63636,1.81818,94.5455,0.0,1.72414,3.44828,93.1034,1.72414,1.63934,4.91803,...,0,56.1,58.0,61.0,55.0,35.294118,31.764706,11.594203,29.487179,2.0


# 4B) Plotting Section (by Groups) 

**Graph of the table above**
* `variable` specifies the variable you want to plot. 
* For example, if I want to plot `choice_lat` over sessions for the experimental and control group, I would replace `variable` with `'choice_lat'`

##this could be improved in description

In [None]:
rgt.ls_bar_plot

# Functions with no section yet... :( 

# 5) Miscellaneous Section (Advanced Code) 

**Change your working directory**

Instructions: 
* Check your current working directory by running `os.getcwd()` (line 2 of the cell below).
* Inside your working directory, make a folder called `data` and move your .xlsx file into it.
* Change `'C:\\Users\\dexte\\hathaway_1\\data'` to your own working directory followed by `'\\data'`.
* For example, my current working directory is `'C:\\Users\\dexte\\hathaway_1'`, so I enter `'C:\\Users\\dexte\\hathaway_1\\data'` into the brackets (the slashes will be different if you are not using Windows).
* Files are then read from and saved to your `data` folder instead of your original working directory.
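
If you would rather not type the path by hand, `os.path.join` can build it from your current working directory and picks the correct slashes for your operating system. This is a small sketch that assumes the folder is called `data`, as in the instructions above.

```python
import os

# Build "<current working directory>/data" with the right separator for this OS.
data_dir = os.path.join(os.getcwd(), 'data')
print(data_dir)

# Uncomment the next line to make the data folder the working directory.
# os.chdir(data_dir)
```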

##default: just have their data in their cwd (easier option)
##future: write a function that will save files in separate folder (for them)

In [None]:
#checks current working directory
os.getcwd()

#changes working directory to whatever is included in brackets
os.chdir('C:\\Users\\dexte\\hathaway_1\\data') 