T. Tarantola, D. Kumaran, P. Dayan, & B. De Martino. (in press) Prior preferences beneficially influence social and non-social learning. *Nature Communications*.

# Data Extraction & Exclusion Tests: Pilot study

In [1]:
import glob
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import matplotlib.gridspec as gridspec
import statsmodels.api as sm
import scipy.stats as ss
from scipy.optimize import curve_fit
import pylab as pl
import math
import seaborn as sns
import rpy2
import pystan
from pystan.external.pymc import plots
import pickle

%matplotlib inline
pd.options.display.max_rows = 999 # Set the maximum display to 999 rows so the entire dataframe is visible in the notebook
pd.options.display.max_columns = 999 # Same for columns



## Data extraction (behavioral)

### Import Data
First, we import and concatenate the CSV files that PsychoPy generates for each experimental session. We store the result in a data frame called "data".

In [2]:
path = r'../data/social_pilot/raw_data' # This is the folder that all data CSV files are saved in
allFiles = glob.glob(path + '/*.csv')
frames = [] # One data frame per CSV file (named "frames" to avoid shadowing the built-in "list")
for files in allFiles: # Read all CSV files in the "path" folder
    df = pd.read_csv(files, index_col=None, header=0)
    frames.append(df)
data = pd.concat(frames) # Concatenate all CSV files into one big data frame called "data"

# Replace dots in variable names with underscores, to make it easier for pandas to handle
data.rename(columns=lambda x: x.replace('.', '_'), inplace=True)
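
The import step can be tried end to end on throwaway files. The following is a minimal sketch, assuming two tiny CSV files written to a temporary folder; the column name `key_resp.rt` and the variable `toy` are hypothetical stand-ins, not names taken from the PsychoPy output.

```python
import glob
import os
import tempfile

import pandas as pd

# Write two tiny CSV files to a temporary folder to stand in for the PsychoPy output
tmp_dir = tempfile.mkdtemp()
pd.DataFrame({'participant': ['P1'], 'key_resp.rt': [0.8]}).to_csv(
    os.path.join(tmp_dir, 'P1.csv'), index=False)
pd.DataFrame({'participant': ['P2'], 'key_resp.rt': [1.1]}).to_csv(
    os.path.join(tmp_dir, 'P2.csv'), index=False)

# Read every CSV in the folder (sorted for a reproducible order) and stack them
csv_files = sorted(glob.glob(os.path.join(tmp_dir, '*.csv')))
frames = [pd.read_csv(f) for f in csv_files]
toy = pd.concat(frames, ignore_index=True)

# Replace dots in column names with underscores
toy = toy.rename(columns=lambda name: name.replace('.', '_'))
```

Sorting the file list is a design choice made here for reproducibility; `glob.glob` itself does not guarantee any particular order.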

### Calculate and create new variable columns in the data frame
We sort the data first by participant, then by the participant's experimental session (there may be more than one if, for example, the task crashed), the image pair, the block in which the pair was presented, and lastly the order of presentation within that block.

We then index the data frame by this new order.

In [5]:
data = data.sort_values(['participant','session','img_correct','block_loop_thisN','outcome_loop_thisN']) # Sort the data (sort_values replaces the deprecated DataFrame.sort)
data.index = range(1,len(data)+1) # Re-do the index so it conforms to the sorted data



#### Bid and choice tasks
First, create two variables to explicitly indicate the participant's chosen and unchosen items in the binary choice task:

In [8]:
for x in range(1,len(data)+1):
    if data.loc[x,'key_resp_choice1_keys']=='left':
        data.loc[x,'chosen']=data.loc[x,'choice_left']
        data.loc[x,'unchosen']=data.loc[x,'choice_right']
    elif data.loc[x,'key_resp_choice1_keys']=='right':
        data.loc[x,'chosen']=data.loc[x,'choice_right']
        data.loc[x,'unchosen']=data.loc[x,'choice_left']

Next, calculate the z-scores for each bid, by participant, using the first and second bid task as separate populations.

In [9]:
bid_means = data.groupby(['participant']).bdm_bid1_response.mean() # Calculate each participant's mean bid
bid_sds = data.groupby(['participant']).bdm_bid1_response.std(ddof=0) # Calculate each participant's standard deviation of their bids

for x in range(1,len(data)+1): # For all rows...
    p = data.loc[x,'participant'] # Variable for the participant
    p_mean = bid_means[p] # Variable for that participant's mean bid
    p_sd = bid_sds[p] # That participant's bid SD
    bid = data.loc[x,'bdm_bid1_response'] # If that row is from the bid task, variable for that bid
    data.loc[x,'bdm_bid_response_zscore'] = (bid - p_mean)/p_sd # Add a new variable to the dataframe with that bid's z-score at the participant level
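
The row-by-row loop above can also be expressed as a single grouped transformation. The following is a minimal sketch on a toy data frame with made-up values; the helper name `zscore_by_participant` and the variable `toy` are hypothetical and not part of the original analysis.

```python
import numpy as np
import pandas as pd

def zscore_by_participant(df, column):
    """Return within-participant z-scores of `column` (population SD, ddof=0)."""
    grouped = df.groupby('participant')[column]
    means = grouped.transform('mean')                 # participant mean, repeated on each row
    sds = grouped.transform(lambda s: s.std(ddof=0))  # participant SD, repeated on each row
    return (df[column] - means) / sds

# Toy data (illustrative values only, not from the study)
toy = pd.DataFrame({
    'participant': ['P1', 'P1', 'P1', 'P2', 'P2', 'P2'],
    'bdm_bid1_response': [1.0, 2.0, 3.0, 0.5, 1.5, np.nan],
})
toy['bdm_bid_response_zscore'] = zscore_by_participant(toy, 'bdm_bid1_response')
```

Rows without a bid (here the last row) keep a missing z-score, as in the loop. The same helper could in principle be applied to the confidence ratings and response times that are standardized further down.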

Import the item familiarity and consumption questionnaire data, including their within-participant z-scores, into the BDM task rows.

The value labels are:

Familiarity: "How familiar are you with this item?"

    1 = Not at all
    2 = Somewhat
    3 = Very

Consumption: "How often have you consumed this item?"

    1 = Never
    2 = Sometimes
    3 = Frequently

Then move the BDM bid data, and their z-scores, into the binary choice rows:

In [11]:
for x in range(1,len(data)+1): # For all rows...
    if not pd.isnull(data.loc[x,'chosen']): # ... if there is a value in 'chosen' indicating that row is from the choice task
        bidmatch_all = data[data.loc[x,'chosen']==data.bdm_img] # find all BDM rows in the dataset that match that chosen item
        bidmatch_participant = bidmatch_all[data['participant']==data.loc[x,'participant']] # find the subset of bids from the same participant
        bidmatch_participant_notnull = bidmatch_participant[bidmatch_participant['bdm_bid1_response'].notnull()] # from those data, pick the BDM task
        bidmatch_participant_notnull_index = bidmatch_participant_notnull['bdm_bid1_response'].idxmax() # find the maximum to define as a single value index
        data.loc[x,'chosen_bid'] = bidmatch_participant_notnull.loc[bidmatch_participant_notnull_index,'bdm_bid1_response'] # define 'chosen_bid' on the choice row as the bid for that item
        data.loc[x,'chosen_bid_zscore'] = bidmatch_participant_notnull.loc[bidmatch_participant_notnull_index,'bdm_bid_response_zscore'] # define 'chosen_bid_zscore' on the choice row as the participant's z-score for the bid for that item from the same (first/last) part of the experiment
        
# Do the same for the unchosen items in each choice
for x in range(1,len(data)+1):
    if not pd.isnull(data.loc[x,'unchosen']):
        bidmatch_all = data[data.loc[x,'unchosen']==data.bdm_img]
        bidmatch_participant = bidmatch_all[data['participant']==data.loc[x,'participant']]
        bidmatch_participant_notnull = bidmatch_participant[bidmatch_participant['bdm_bid1_response'].notnull()]
        bidmatch_participant_notnull_index = bidmatch_participant_notnull['bdm_bid1_response'].idxmax()
        data.loc[x,'unchosen_bid'] = bidmatch_participant_notnull.loc[bidmatch_participant_notnull_index,'bdm_bid1_response']
        data.loc[x,'unchosen_bid_zscore'] = bidmatch_participant_notnull.loc[bidmatch_participant_notnull_index,'bdm_bid_response_zscore']
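
The two lookup loops above amount to a join between choice rows and bid rows on participant and item. The following is a minimal sketch of that join using `merge`, assuming each participant bid on each item exactly once; the helper `lookup_bid`, the variable `toy`, and the item file names are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy long-format data: two bid rows followed by one choice row (illustrative only)
toy = pd.DataFrame({
    'participant': ['P1', 'P1', 'P1'],
    'bdm_img': ['apple.jpg', 'crisps.jpg', np.nan],
    'bdm_bid1_response': [2.0, 0.5, np.nan],
    'chosen': [np.nan, np.nan, 'apple.jpg'],
    'unchosen': [np.nan, np.nan, 'crisps.jpg'],
})

# One row per (participant, item) bid; assumes a single bid per item per participant
bids = (toy.loc[toy['bdm_bid1_response'].notnull(),
                ['participant', 'bdm_img', 'bdm_bid1_response']]
           .rename(columns={'bdm_img': 'item', 'bdm_bid1_response': 'bid'}))

def lookup_bid(df, item_column):
    """Return, for each row, the same participant's bid for the item in `item_column`."""
    keys = df[['participant', item_column]].rename(columns={item_column: 'item'})
    merged = keys.merge(bids, on=['participant', 'item'], how='left')
    return merged['bid'].values  # a left merge keeps the row order of `keys`

toy['chosen_bid'] = lookup_bid(toy, 'chosen')
toy['unchosen_bid'] = lookup_bid(toy, 'unchosen')
toy['choice_dv'] = toy['chosen_bid'] - toy['unchosen_bid']
```

Rows that are not from the choice task have no item to look up and therefore keep missing values in the new columns.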
        



Generate a variable ('choice_dv') for the difference in bids between the chosen and unchosen items in each pair, and a variable ('choice_dv_zscore') for the difference between the bids' z-scores:

In [12]:
data['choice_dv'] = data['chosen_bid'] - data['unchosen_bid'] # Difference between the bid amounts
data['choice_dv_zscore'] = data['chosen_bid_zscore'] - data['unchosen_bid_zscore'] # Difference between the z-scores of the items' bids

Generate similar variables, but for the difference between the left and right choices:

In [13]:
data.loc[data['chosen']==data['choice_left'],'choice_dv_left_minus_right'] = data[data['chosen']==data['choice_left']]['choice_dv']
data.loc[data['chosen']==data['choice_right'],'choice_dv_left_minus_right'] = -data[data['chosen']==data['choice_right']]['choice_dv']

data.loc[data['chosen']==data['choice_left'],'choice_dv_zscore_left_minus_right'] = data[data['chosen']==data['choice_left']]['choice_dv_zscore']
data.loc[data['chosen']==data['choice_right'],'choice_dv_zscore_left_minus_right'] = -data[data['chosen']==data['choice_right']]['choice_dv_zscore']

Generate a dummy variable indicating whether the participant chose the item on the left:

In [14]:
data.loc[data['chosen']==data['choice_left'],'left_chosen'] = 1
data.loc[data['chosen']==data['choice_right'],'left_chosen'] = 0

Then calculate the z-scores for the choice confidence ratings, by participant, by task (first or second choice task).

In [15]:
conf_means = data.groupby(['participant']).confidence_rating1_response.mean() # Calculate each participant's mean confidence rating (from 1 to 6) during the choice task
conf_sds = data.groupby(['participant']).confidence_rating1_response.std(ddof=0) # Calculate each participant's standard deviation of their confidence rating in the choice task

for x in range(1,len(data)+1): # For all rows...
    p = data.loc[x,'participant'] # Variable for the participant
    p_mean = conf_means[p] # Variable for that participant's mean confidence in the choice task
    p_sd = conf_sds[p] # That participant's confidence SD for the choice task
    conf = data.loc[x,'confidence_rating1_response'] # If that row is from the choice task, variable for that confidence rating
    data.loc[x,'confidence_rating_response_zscore'] = (conf - p_mean)/p_sd # Add a new variable to the dataframe with that confidence rating's z-score at the participant level

Create a variable for the z-scores, by participant and by choice task, for the response times in each choice task.

In [18]:
choice_rt_means = data.groupby(['participant']).key_resp_choice1_rt.mean()
choice_rt_sds = data.groupby(['participant']).key_resp_choice1_rt.std(ddof=0)

for x in range(1,len(data)+1):
    p = data.loc[x,'participant']
    p_mean = choice_rt_means[p]
    p_sd = choice_rt_sds[p]
    rt = data.loc[x,'key_resp_choice1_rt']
    data.loc[x,'key_resp_choice_rt_zscore'] = (rt - p_mean)/p_sd

#### Learning task
Now we create a variable "pair_rep" to indicate how many times each image pair has been presented so far in the experiment.

In [19]:
data_infer = data[data['img_correct'].notnull()] # Subset of "data" where 'img_correct' has a value
data_infer = data_infer[data_infer['practice_loop_thisRepN']!=0] # Subset of "data_infer" that does NOT include practice trials
data_infer['pair_rep'] = list(range(1,31)) * (len(data_infer)//30) # Note: Change range and denominator if number of item presentations differs from 30
data['pair_rep'] = data_infer['pair_rep'] # Populate main data frame with this new variable
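
A presentation counter can also be derived without hard-coding the number of presentations per pair. The following is a minimal sketch using `groupby(...).cumcount()`, assuming the rows are already sorted in presentation order within each pair; the toy values and file names are made up.

```python
import pandas as pd

# Toy learning-task rows, already sorted by participant, pair and trial (illustrative only)
toy = pd.DataFrame({
    'participant': ['P1'] * 5,
    'img_correct': ['a.jpg', 'a.jpg', 'a.jpg', 'b.jpg', 'b.jpg'],
})

# cumcount() numbers the rows within each group from 0, so add 1 for a 1-based count
toy['pair_rep'] = toy.groupby(['participant', 'img_correct']).cumcount() + 1
```

Unlike tiling `range(1, 31)`, this does not require every pair to have the same number of presentations.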

Next, we generate a variable "response_correct" to indicate whether the subject responded correctly on each given trial.

In [20]:
# Create series variables from the sorted dataframe for easy handling in the loop below
arr_img_correct = data['img_correct']
arr_img_left = data['img_left']
arr_img_right = data['img_right']
arr_infer_resp = data['infer_resp_keys']

# Generate a variable indicating whether the response was correct on each given trial
for x in range(1,len(data)+1): 
    if arr_img_correct[x]==arr_img_left[x] and arr_infer_resp[x]=='left':
        data.at[x,'response_correct'] = 1
    elif arr_img_correct[x]==arr_img_left[x] and arr_infer_resp[x]=='right':
        data.at[x,'response_correct'] = 0
    elif arr_img_correct[x]==arr_img_right[x] and arr_infer_resp[x]=='right':
        data.at[x,'response_correct'] = 1
    elif arr_img_correct[x]==arr_img_right[x] and arr_infer_resp[x]=='left':
        data.at[x,'response_correct'] = 0
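
The four branches above reduce to one comparison: did the pressed key match the side on which the correct image appeared? The following is a minimal vectorized sketch on toy data; the variable names `toy` and `correct_side` are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy rows: four learning trials and one row from another task (illustrative only)
toy = pd.DataFrame({
    'img_correct': ['a.jpg', 'a.jpg', 'b.jpg', 'b.jpg', np.nan],
    'img_left': ['a.jpg', 'a.jpg', 'c.jpg', 'c.jpg', np.nan],
    'img_right': ['d.jpg', 'd.jpg', 'b.jpg', 'b.jpg', np.nan],
    'infer_resp_keys': ['left', 'right', 'right', 'left', np.nan],
})

# Side of the screen on which the correct image appeared
correct_side = np.where(toy['img_correct'] == toy['img_left'], 'left', 'right')

# 1 if the pressed key matches that side, 0 otherwise
toy['response_correct'] = (toy['infer_resp_keys'] == correct_side).astype(float)

# Rows without a response (i.e. not learning trials) stay missing
toy.loc[toy['infer_resp_keys'].isnull(), 'response_correct'] = np.nan
```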

Then, create a variable "feedback_correct" to indicate whether the box appeared around the correct (=1) or incorrect (=0) item after the response was collected on that trial.

In [23]:
for x in range(1,len(data)+1):
    if pd.isnull(data.loc[x,'practice_loop_thisRepN']): # If the row is NOT part of the learning practice block
        if data.loc[x,'set_outcome_outcm_img']==data.loc[x,'img_correct']:
            data.at[x,'feedback_correct'] = 1
        elif data.loc[x,'set_outcome_outcm_img']==data.loc[x,'img_wrong']:
            data.at[x,'feedback_correct'] = 0

Then, create a variable "reward" to indicate whether the outcome yellow box on a particular trial matched the participant's response on that trial. So for example, if the participant chose the correct item, and then the yellow box was displayed around the correct item, that reward would be coded as 1 for that trial. Similarly, if the participant chose the wrong item, but the yellow box then also appeared around the wrong item, the reward would also be coded as 1.

If, however, the participant chose a different item from the one the yellow box then displays around (e.g. the participant chose the correct item but the box displays around the wrong item, or vice versa), the reward is coded as 0 for that trial.

In [24]:
for x in range(1,len(data)+1):
    if pd.isnull(data.loc[x,'practice_loop_thisRepN']): # If the row is not part of the inference practice block
        if data.loc[x,'response_correct']==data.loc[x,'feedback_correct']:
            data.at[x,'reward'] = 1
        elif data.loc[x,'response_correct']!=data.loc[x,'feedback_correct']:
            data.at[x,'reward'] = 0
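
One detail worth checking in the loop above: in Python, `NaN != NaN` evaluates to `True`, so a non-practice row where both `response_correct` and `feedback_correct` are missing would fall into the `elif` branch and receive a reward of 0. The following is a minimal sketch of a vectorized version that leaves such rows missing instead; the toy values are made up.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'response_correct': [1.0, 1.0, 0.0, 0.0, np.nan],
    'feedback_correct': [1.0, 0.0, 0.0, 1.0, np.nan],
})

# Rows where both the response and the feedback are defined
both_known = toy['response_correct'].notnull() & toy['feedback_correct'].notnull()

# 1 when the box surrounded the item the participant picked, 0 when it did not,
# and NaN when either value is undefined
toy['reward'] = np.where(
    both_known,
    (toy['response_correct'] == toy['feedback_correct']).astype(float),
    np.nan,
)
```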

Then calculate the number of correct feedback boxes (where the yellow box appeared around the correct item) seen so far for that pair.

In [25]:
for x in range(1,len(data)+1):
    if not pd.isnull(data.loc[x,'img_correct']) and pd.isnull(data.loc[x,'practice_loop_thisTrialN']): # If in the main inference task (not the practice block)
        part = data.loc[x,'participant']
        itm = data.loc[x,'img_correct']
        pair = data.loc[x,'pair_rep']
        data_part = data[data['participant']==part]
        data_part_itm = data_part[data_part['img_correct']==itm]
        feedbck_corr_sum = data_part_itm[data_part_itm['pair_rep']<pair].feedback_correct.sum() # Sum of the correct feedback (yellow boxes around the correct answer) of trials in that item for that participant with lower pair_rep number than the current row
        data.loc[x,'feedback_correct_sum'] = feedbck_corr_sum 
        data.loc[x,'feedback_wrong_sum'] = (pair - 1) - feedbck_corr_sum # Subtract to get the number of wrong feedback up until that point (yellow boxes around the wrong answer)
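
The running counts above can be obtained with a grouped cumulative sum. The following is a minimal sketch, assuming rows are sorted by `pair_rep` within each participant and pair; the toy values are made up.

```python
import pandas as pd

# Four presentations of one pair for one participant (illustrative only)
toy = pd.DataFrame({
    'participant': ['P1'] * 4,
    'img_correct': ['a.jpg'] * 4,
    'pair_rep': [1, 2, 3, 4],
    'feedback_correct': [1, 0, 1, 1],
})

grouped = toy.groupby(['participant', 'img_correct'])['feedback_correct']

# Running total within each pair, minus the current trial, gives the count BEFORE this trial
toy['feedback_correct_sum'] = grouped.cumsum() - toy['feedback_correct']

# Every earlier presentation that was not correct feedback was wrong feedback
toy['feedback_wrong_sum'] = (toy['pair_rep'] - 1) - toy['feedback_correct_sum']
toy['feedback_correct_sum_diff'] = toy['feedback_correct_sum'] - toy['feedback_wrong_sum']
```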

Create a variable for the difference between the number of correct and wrong feedback boxes displayed so far.

In [26]:
data['feedback_correct_sum_diff'] = data['feedback_correct_sum'] - data['feedback_wrong_sum']

Calculate the trial number, and the number of intervening trials (presentations of other item pairs) between the current and the previous presentation of that item pair (interference).

In [28]:
# Calculate the trial number, for all three blocks combined
for x in range(1,len(data)+1):
    if data.loc[x,'block_loop_thisN']==0:
        data.loc[x,'learning_trial'] = data.loc[x,'outcome_loop_thisN']
    elif data.loc[x,'block_loop_thisN']==1:
        data.loc[x,'learning_trial'] = data.loc[x,'outcome_loop_thisN'] + 200
    elif data.loc[x,'block_loop_thisN']==2:
        data.loc[x,'learning_trial'] = data.loc[x,'outcome_loop_thisN'] + 400

for x in range(1,len(data)+1):
    if data.loc[x,'pair_rep'] > 1:
        data.loc[x,'interference'] = data.loc[x,'learning_trial'] - data.loc[x-1,'learning_trial'] - 1
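
Both steps above can be written without explicit loops. The following is a minimal sketch, assuming (as the offsets of 200 and 400 imply) 200 trials per block and rows sorted in presentation order within each pair; the toy values are made up.

```python
import pandas as pd

# Toy rows: three presentations of pair "a" and two of pair "b" (illustrative only)
toy = pd.DataFrame({
    'participant': ['P1'] * 5,
    'img_correct': ['a.jpg', 'a.jpg', 'a.jpg', 'b.jpg', 'b.jpg'],
    'block_loop_thisN': [0, 0, 1, 0, 0],
    'outcome_loop_thisN': [0, 3, 4, 1, 2],
})

# Trial number across blocks: within-block index plus 200 per completed block
toy['learning_trial'] = toy['outcome_loop_thisN'] + 200 * toy['block_loop_thisN']

# Trials between successive presentations of the same pair; the first presentation has none
toy['interference'] = (
    toy.groupby(['participant', 'img_correct'])['learning_trial'].diff() - 1
)
```

Grouping by participant and pair means the difference is never taken across two different pairs, so it does not depend on the previous row of the data frame belonging to the same pair.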

Calculate the z-scores, by participant, for their reaction times in the inference task.

In [29]:
rt_means = data.groupby(['participant']).infer_resp_rt.mean()
rt_sds = data.groupby(['participant']).infer_resp_rt.std(ddof=0)

for x in range(1,len(data)+1):
    p = data.loc[x,'participant']
    p_mean = rt_means[p]
    p_sd = rt_sds[p]
    rt = data.loc[x,'infer_resp_rt']
    data.loc[x,'infer_resp_rt_zscore'] = (rt - p_mean)/p_sd

Next, move the participant's item bids into new variables aligned with the inference task. These will be signed to indicate congruency with the choice being inferred; the participant's bid for the incorrect item is subtracted from the bid for the correct item to create the inf_bid_dv variable. Strongly negative values of this variable, therefore, indicate a strong preference opposite to the choice being inferred. Strongly positive values indicate strong congruency with the choice being inferred.

In [30]:
for x in range(1,len(data)+1): # For all rows...
    if not pd.isnull(data.loc[x,'img_correct']) and pd.isnull(data.loc[x,'practice_loop_thisTrialN']): # ... if there is a value in 'img_correct' indicating that row is from the inference task, and it is not in the practice block
        bid_inf_all = data[data.loc[x,'img_correct']==data.bdm_img] # find all BDM rows in the dataset that match that correct item
        bid_inf_participant = bid_inf_all[data['participant']==data.loc[x,'participant']] # find the subset of bids from the same participant
        bid_inf_participant_notnull = bid_inf_participant[bid_inf_participant['bdm_bid1_response'].notnull()] # from those data, pick the BDM task
        bid_inf_participant_notnull_index = bid_inf_participant_notnull['bdm_bid1_response'].idxmax() # find the maximum to define as a single value index
        data.loc[x,'correct_bid'] = bid_inf_participant_notnull.loc[bid_inf_participant_notnull_index,'bdm_bid1_response'] # define 'correct_bid' on the inference row as the bid for that item
        data.loc[x,'correct_bid_zscore'] = bid_inf_participant_notnull.loc[bid_inf_participant_notnull_index,'bdm_bid_response_zscore'] # and add a variable for that participant's zscore of the bid for the correct item
        
# Do the same for the incorrect items in each inference trial
for x in range(1,len(data)+1): # For all rows...
    if not pd.isnull(data.loc[x,'img_wrong']) and pd.isnull(data.loc[x,'practice_loop_thisTrialN']):
        bid_infwr_all = data[data.loc[x,'img_wrong']==data.bdm_img] 
        bid_infwr_participant = bid_infwr_all[data['participant']==data.loc[x,'participant']] 
        bid_infwr_participant_notnull = bid_infwr_participant[bid_infwr_participant['bdm_bid1_response'].notnull()] 
        bid_infwr_participant_notnull_index = bid_infwr_participant_notnull['bdm_bid1_response'].idxmax() 
        data.loc[x,'wrong_bid'] = bid_infwr_participant_notnull.loc[bid_infwr_participant_notnull_index,'bdm_bid1_response']
        data.loc[x,'wrong_bid_zscore'] = bid_infwr_participant_notnull.loc[bid_infwr_participant_notnull_index,'bdm_bid_response_zscore']
        
# Define congruency variables for each BDM task:
data['inf_bid_dv']=data['correct_bid'] - data['wrong_bid'] # Difference in absolute bid amounts from the bid task
data['inf_bid_dv_zscore']=data['correct_bid_zscore'] - data['wrong_bid_zscore'] # Difference in bid amount z-scores from the bid task


Was the correct answer in the learning task pair chosen by the participant in the choice task? Create variables that indicate the number of times it was chosen (each pair was presented in the choice task twice; 0=never chosen; 1=chosen once but not chosen the second time; 2=chosen twice).

In [31]:
chosen_data = pd.DataFrame(data.groupby('participant').chosen.value_counts()) # The number of times each item was chosen in the choice task, grouped by participant
chosen_data.columns = ['number'] # Give the column with this number a name
chosen_data.index.names = ['participant','item'] # Name the hierarchical index of this dataframe
for x in range(1,len(data)+1): # For all rows...
    if not pd.isnull(data.loc[x,'img_correct']) and pd.isnull(data.loc[x,'practice_loop_thisTrialN']): # If the row is part of the inference task, and not part of the practice block...
        part = data.loc[x,'participant'] # The participant
        itm = data.loc[x,'img_correct'] # The chosen item (correct answer) they're trying to learn
        if not chosen_data.xs((part,itm), level=('participant','item')).values: # If the correct answer was not chosen by the participant at least once during the first choice task
            data.loc[x,'choice_correct_congruence'] = 0 # ...the choice congruence score is 0
        else: # Otherwise, it's the number of times that item was chosen in the choice task (either 1 or 2)
            data.loc[x,'choice_correct_congruence'] = chosen_data.number.xs((part,itm), level=('participant','item')).values
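
The congruence score is a count of how often the participant chose the to-be-learned item, with 0 for items never chosen. The following is a minimal sketch that builds the counts once and joins them onto the learning rows, so the zero case is handled by `fillna` rather than by how `xs` treats missing keys; the data frames `choices` and `learning` are toy stand-ins with made-up values.

```python
import pandas as pd

# Toy choice-task rows: pair (a, b) chosen a twice; pair (c, d) chosen c once and d once
choices = pd.DataFrame({
    'participant': ['P1', 'P1', 'P1', 'P1'],
    'chosen': ['a.jpg', 'a.jpg', 'c.jpg', 'd.jpg'],
})

# Toy learning-task rows with the item to be learned
learning = pd.DataFrame({
    'participant': ['P1', 'P1', 'P1'],
    'img_correct': ['a.jpg', 'b.jpg', 'c.jpg'],
})

# Number of times each participant chose each item
counts = (choices.groupby(['participant', 'chosen']).size()
                 .reset_index(name='choice_correct_congruence')
                 .rename(columns={'chosen': 'img_correct'}))

# Left-merge onto the learning rows; items that were never chosen get a count of 0
learning = learning.merge(counts, on=['participant', 'img_correct'], how='left')
learning['choice_correct_congruence'] = learning['choice_correct_congruence'].fillna(0)
```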

Move the choice confidence ratings, their within-participant (grouped by first and second choice task) z-scores, and response times and their z-scores, into the inference task rows. Since each participant indicated their choice for each item pair twice (with the items' screen sides swapped), four variables are stored for each measure: the value and its z-score from the presentation with the correct item on the left, and the same two from the presentation with the correct item on the right.

In [34]:
for x in range(1,len(data)+1): # For all rows...
    if not pd.isnull(data.loc[x,'img_correct']) and pd.isnull(data.loc[x,'practice_loop_thisTrialN']): # ... if there is a value in 'img_correct' indicating that row is from the inference task, and it is not in the practice block
    # CONFIDENCE 
    # Get the confidence ratings from the choice task, from the trials where the correct item appears on the LEFT   
        conf_left_inf_all = data[data.loc[x,'img_correct']==data.choice_left] # find all choice rows in the dataset that match that correct item, where the correct item was shown on the LEFT
        conf_left_inf_participant = conf_left_inf_all[data['participant']==data.loc[x,'participant']] # find the subset of confidence ratings from the same participant
        conf_left_inf_participant_notnull = conf_left_inf_participant[conf_left_inf_participant['confidence_rating1_response'].notnull()] # from those data, pick the choice task
        conf_left_inf_participant_notnull_index = conf_left_inf_participant_notnull['confidence_rating1_response'].idxmax() # find the maximum to define as a single value index
        data.loc[x,'conf_correct_on_left'] = conf_left_inf_participant_notnull.loc[conf_left_inf_participant_notnull_index,'confidence_rating1_response'] # define 'conf_correct_on_left' on the inference row as the confidence rating for that item where the correct item was displayed on the LEFT
        data.loc[x,'conf_correct_on_left_zscore'] = conf_left_inf_participant_notnull.loc[conf_left_inf_participant_notnull_index,'confidence_rating_response_zscore'] # and add a variable for that participant's zscore of the confidence rating for the correct item from the choice task
    # Get the confidence ratings from the choice task, where the correct item appears on the RIGHT
        conf_right_inf_all = data[data.loc[x,'img_correct']==data.choice_right] # find all choice rows in the dataset that match that correct item, where the correct item was shown on the RIGHT
        conf_right_inf_participant = conf_right_inf_all[data['participant']==data.loc[x,'participant']] # find the subset of confidence ratings from the same participant
        conf_right_inf_participant_notnull = conf_right_inf_participant[conf_right_inf_participant['confidence_rating1_response'].notnull()] # from those data, pick the choice task
        conf_right_inf_participant_notnull_index = conf_right_inf_participant_notnull['confidence_rating1_response'].idxmax() # find the maximum to define as a single value index
        data.loc[x,'conf_correct_on_right'] = conf_right_inf_participant_notnull.loc[conf_right_inf_participant_notnull_index,'confidence_rating1_response'] # define 'conf_correct_on_right' on the inference row as the confidence rating for that item where the correct item was displayed on the RIGHT
        data.loc[x,'conf_correct_on_right_zscore'] = conf_right_inf_participant_notnull.loc[conf_right_inf_participant_notnull_index,'confidence_rating_response_zscore'] # and add a variable for that participant's zscore of the confidence rating for the correct item from the choice task
   

    # CHOICE RESPONSE TIMES
    # Get the choice response times from the choice task, from the trials where the correct item appears on the LEFT
        chc_left_inf_all = data[data.loc[x,'img_correct']==data.choice_left] 
        chc_left_inf_participant = chc_left_inf_all[data['participant']==data.loc[x,'participant']]
        chc_left_inf_participant_notnull = chc_left_inf_participant[chc_left_inf_participant['key_resp_choice1_rt'].notnull()] 
        chc_left_inf_participant_notnull_index = chc_left_inf_participant_notnull['key_resp_choice1_rt'].idxmax()
        data.loc[x,'chc_correct_on_left_rt'] = chc_left_inf_participant_notnull.loc[chc_left_inf_participant_notnull_index,'key_resp_choice1_rt'] 
        data.loc[x,'chc_correct_on_left_rt_zscore'] = chc_left_inf_participant_notnull.loc[chc_left_inf_participant_notnull_index,'key_resp_choice_rt_zscore'] 
    # Get the choice response times from the choice task, from the trials where the correct item appears on the RIGHT
        chc_right_inf_all = data[data.loc[x,'img_correct']==data.choice_right] 
        chc_right_inf_participant = chc_right_inf_all[data['participant']==data.loc[x,'participant']]
        chc_right_inf_participant_notnull = chc_right_inf_participant[chc_right_inf_participant['key_resp_choice1_rt'].notnull()] 
        chc_right_inf_participant_notnull_index = chc_right_inf_participant_notnull['key_resp_choice1_rt'].idxmax()
        data.loc[x,'chc_correct_on_right_rt'] = chc_right_inf_participant_notnull.loc[chc_right_inf_participant_notnull_index,'key_resp_choice1_rt'] 
        data.loc[x,'chc_correct_on_right_rt_zscore'] = chc_right_inf_participant_notnull.loc[chc_right_inf_participant_notnull_index,'key_resp_choice_rt_zscore'] 

Calculate the mean confidence z-score and the mean response time z-score across the two presentations in the choice task:

In [35]:
data['conf_mean_zscore'] = (data['conf_correct_on_left_zscore'] + data['conf_correct_on_right_zscore'])/2
data['chc_rt_mean_zscore'] = (data['chc_correct_on_left_rt_zscore'] + data['chc_correct_on_right_rt_zscore'])/2

Create two more dummy variables indicating whether the participant chose the left choice on the first presentation of that pair, and the right choice on the second presentation of that pair.

In [44]:
data.loc[((data['left_chosen']==1) & (data['choice_presentation']==1)),'left_chosen_firstpres'] = 1
data.loc[((data['left_chosen']==0) & (data['choice_presentation']==1)),'left_chosen_firstpres'] = 0
data.loc[((data['left_chosen']==1) & (data['choice_presentation']==2)),'right_chosen_secondpres'] = 0
data.loc[((data['left_chosen']==0) & (data['choice_presentation']==2)),'right_chosen_secondpres'] = 1

Create a dummy variable to indicate whether the correct image in the learning task was displayed on the left side of the screen.

In [46]:
data.loc[(data['img_correct']==data['img_left']), 'correct_on_left'] = 1
data.loc[(data['img_correct']==data['img_right']), 'correct_on_left'] = 0

Finally, create a dummy variable to indicate whether the participant responded "left".

In [47]:
data.loc[data['infer_resp_keys']=='left', 'infer_response_left'] = 1
data.loc[data['infer_resp_keys']=='right', 'infer_response_left'] = 0
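
Dummy coding of this kind can also be done in one step with `Series.map`. The following is a minimal sketch on a toy column with made-up values.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({'infer_resp_keys': ['left', 'right', np.nan]})

# map() leaves values that are not in the dictionary (including NaN) as NaN
toy['infer_response_left'] = toy['infer_resp_keys'].map({'left': 1, 'right': 0})
```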

Save the data frame as CSV and PKL files:

In [48]:
data.to_csv(path_or_buf=r'../data/social_pilot/data_processed_social_pilot.csv')
data.to_pickle(r'../data/social_pilot/data_processed_social_pilot.pkl')

## Exclusion criteria

We use the same exclusion criteria that we kept for the eye tracking group:

Participants’ data will be excluded if:

1)	They indicate at any point before, during, or after testing that they do not meet the following eligibility criteria:
a.	Are at least 18 years old,
b.	Are proficient in English, and
c.	Do not suffer from any psychiatric or neurological disorder.

2)	They end their participation before the conclusion of the study, including by asking to leave before the post-experiment wait period has ended, or by consuming food or non-water drink that was not purchased during the auction.

3)	They indicate that they did not comply with the requirement to drink only water and refrain from eating for 3 hours before attending the session.

4)	Their responses in the choice task are inconsistent to the extent that a response on the first presentation of an item pair does not significantly and positively predict their response during the second presentation of that item pair. This will be tested using a logistic regression analysis predicting the response during the second presentation of the item pairs, using the response during the first presentation of that pair as the independent variable. The regression coefficient weighting this variable must be greater than zero with a probability of more than 95%.

5 & 6) Not used for eye tracking group

7)	If their bids do not predict their choices. Specifically, participants will be excluded if the inverse temperature parameter for a participant is five or more times larger than the mean for the entire group. (See De Martino, Fleming, Garrett, and Dolan, 2012, Nature Neuroscience.)

8)	If their mean performance on the learning task is not significantly above chance. 


### Criteria 1-3: Failure to abide by study requirements
No participants met any of these criteria. (See participant log at 'recruitment/participant_log.xlsx')

### Criterion 4: Choice consistency

In [51]:
def check_choice_consistency(subj):
    data_subset = data[data['participant']==subj]
    data_subset = data_subset[data_subset['chosen'].notnull()]
    data_subset['intercept'] = 1.0
    logit = sm.Logit(data_subset['right_chosen_secondpres'], data_subset[['left_chosen_firstpres','intercept']])
    result = logit.fit()
    print subj, result.summary()

def check_choice_consistency_linear(subj):
    data_subset = data[data['participant']==subj]
    data_subset = data_subset[data_subset['chosen'].notnull()]
    data_subset['intercept'] = 1.0
    x = np.array(data_subset['left_chosen_firstpres'])
    y = np.array(data_subset['right_chosen_secondpres'])
    slope, intercept, r_value, p_value, slope_std_error = ss.linregress(x, y)
    return subj, slope, p_value

for x in data['participant'].unique():
    if check_choice_consistency_linear(x)[1]<0 or check_choice_consistency_linear(x)[2]>0.05:
        print 'Exclude ',x

For all participants, the first choice significantly predicts the second choice with greater than 95% probability. Because many participants showed perfect consistency, logistic regressions were impossible to estimate, so linear regressions were substituted. 

No participants would be excluded under this criterion.
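
As a complement to the regression check, consistency can be counted directly per item pair. Note that `left_chosen_firstpres` and `right_chosen_secondpres` are defined above on rows with different `choice_presentation` values, so any comparison between them needs the two presentations of a pair aligned first. The following is a minimal sketch of that alignment, assuming one row per presentation; the `pair_id` column, the `paired` frame, and the toy values are hypothetical.

```python
import pandas as pd

# Toy choice-task rows: two pairs, each presented twice with sides flipped (illustrative only)
toy = pd.DataFrame({
    'participant': ['P1'] * 4,
    'choice_left': ['a.jpg', 'b.jpg', 'c.jpg', 'd.jpg'],
    'choice_right': ['b.jpg', 'a.jpg', 'd.jpg', 'c.jpg'],
    'choice_presentation': [1, 2, 1, 2],
    'left_chosen': [1, 0, 1, 1],
})

# Order-independent label for each item pair (hypothetical helper column)
toy['pair_id'] = ['|'.join(sorted(pair))
                  for pair in zip(toy['choice_left'], toy['choice_right'])]

index_cols = ['participant', 'pair_id']
first = toy[toy['choice_presentation'] == 1].set_index(index_cols)['left_chosen']
second = toy[toy['choice_presentation'] == 2].set_index(index_cols)['left_chosen']

# Sides are flipped on the second presentation, so a consistent participant
# who chose left the first time should choose right the second time
paired = pd.DataFrame({'left_first': first, 'right_second': 1 - second})
paired['consistent'] = (paired['left_first'] == paired['right_second']).astype(int)

# Number of pairs on which each participant switched items
inconsistent_per_participant = (1 - paired['consistent']).groupby(level='participant').sum()
```

In this toy example the participant is consistent on the first pair and switches on the second, giving one inconsistent choice. A regression on the aligned columns `left_first` and `right_second` would then use one observation per pair.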

In [52]:
for x in data['participant'].unique():
    print 'Number of inconsistent choices for participant ', x, len(data.loc[data['participant']==x, 'chosen'].unique())-21
#data.groupby(['participant']).chosen.value_counts()
len(data['chosen'].unique())

Number of inconsistent choices for participant  P1 1
Number of inconsistent choices for participant  P10 1
Number of inconsistent choices for participant  P11 1
Number of inconsistent choices for participant  P12 0
Number of inconsistent choices for participant  P2 0
Number of inconsistent choices for participant  P3 1
Number of inconsistent choices for participant  P4 2
Number of inconsistent choices for participant  P5 0
Number of inconsistent choices for participant  P6 2
Number of inconsistent choices for participant  P7 20
Number of inconsistent choices for participant  P8 0
Number of inconsistent choices for participant  P9 0


41

We exclude participant P7 for having made perfectly inconsistent choices.

In [57]:
data = data[data['participant']!='P7']
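data = data.reset_index(drop=True) # Assign the result; reset_index returns a new data frame and does not modify "data" in place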
data.to_csv(path_or_buf=r'../data/social_pilot/data_processed_social_pilot_wexclusions.csv')
data.to_pickle(r'../data/social_pilot/data_processed_social_pilot_wexclusions.pkl')