# ST-IAT (Single-Target Implicit Association Test) Analysis Toolkit

This toolkit is intended for a Single-Target IAT built in Gorilla with participants recruited through Prolific, but it is applicable to similar experiments.

Some of the tools here are specific to the input files and data structure from my experiment. The D score calculation and means comparison algorithms are the most useful for general ST-IAT analysis.

Some variable names will need to be adjusted depending on your input files and your specific experiment design.

## Experiment Design

This toolkit was built to analyse an ST-IAT in a five-block configuration as follows:
1. Training - Two concepts
2. Training - Congruent
3. Test - Congruent
4. Training - Incongruent
5. Test - Incongruent

Participants were shown either the congruent condition first or the incongruent condition first, randomly assigned and split evenly.

# Split results into individual CSVs

Gorilla exports the experiment data for all participants in each condition as one CSV. This step splits that file into individual participant CSVs.
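
As a minimal in-memory sketch of what this split does, the toy DataFrame below stands in for a Gorilla export (the IDs and reaction times are invented for illustration). It uses `groupby` instead of an explicit loop over unique IDs, writes no files, and omits the extra first row that the cell below prepends to each participant file:

```python
import pandas as pd

# Toy stand-in for a Gorilla export; IDs and reaction times are invented
export = pd.DataFrame({
    "Participant Private ID": [101, 101, 102, 102, 102],
    "Reaction Time": [512.0, 634.0, 488.0, 701.0, 559.0],
})

# One DataFrame per participant, keyed by participant ID
per_participant = {
    participant_id: trials.reset_index(drop=True)
    for participant_id, trials in export.groupby("Participant Private ID")
}

for participant_id, trials in per_participant.items():
    print(f"Participant {participant_id}: {len(trials)} trials")
```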

## Runs on output file from Gorilla

## Run on both output files - Congruent & Incongruent - to generate the individual participant files


In [None]:
import pandas as pd

# Read the CSV file into a pandas DataFrame
file_path = "yourfile.csv"
data = pd.read_csv(file_path)

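# Take the first row of the export (the column names themselves are written
# automatically by to_csv). This row is prepended to every participant file,
# and the D score step skips it again with skiprows=[1].
header = data.iloc[0:1]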

# Extract all unique Participant Private IDs
unique_participants = data['Participant Private ID'].unique()

# Loop through each unique participant ID and create a separate CSV file
for participant in unique_participants:
    # Subset data for the current participant
    participant_data = data[data['Participant Private ID'] == participant]
    
    # Prepend the header to the participant's data
    participant_data_with_header = pd.concat([header, participant_data])
    
    # Define the output file path
    output_file_path = f"Participant_{participant}.csv"
    
    # Write the participant's data to a CSV file
    participant_data_with_header.to_csv(output_file_path, index=False)
    
    print(f"Written file for Participant ID: {participant}")

print("Done!")

# Calculate D Score & Means for Each Block

This is based on the improved IAT D score algorithm (Greenwald et al., 2003).
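
The arithmetic implemented in the cell below can be sketched on toy data. The function name `d_score` and the reaction times are hypothetical and only illustrate the calculation: each incongruent-minus-congruent mean difference is divided by the standard deviation of the pooled trials from that pair of blocks, and the two ratios are averaged. The sketch assumes the trials have already been filtered.

```python
import pandas as pd

def d_score(congruent_1, incongruent_1, congruent_2, incongruent_2):
    """Average of two mean differences, each scaled by its inclusive SD."""
    c1, i1 = pd.Series(congruent_1), pd.Series(incongruent_1)
    c2, i2 = pd.Series(congruent_2), pd.Series(incongruent_2)

    # Inclusive SD: computed over the pooled trials of both blocks in a pair
    sd_1 = pd.concat([c1, i1]).std()
    sd_2 = pd.concat([c2, i2]).std()

    ratio_1 = (i1.mean() - c1.mean()) / sd_1
    ratio_2 = (i2.mean() - c2.mean()) / sd_2
    return (ratio_1 + ratio_2) / 2

# Invented reaction times (ms): slower responses in the incongruent blocks
d = d_score(
    congruent_1=[600, 650, 700],
    incongruent_1=[800, 850, 900],
    congruent_2=[620, 640, 660],
    incongruent_2=[780, 820, 860],
)
print(f"D = {d:.3f}")
```

With this sign convention, a positive D means responses were slower in the incongruent blocks than in the congruent blocks.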

## Run on folder with all individual participant files - creates unified CSV with D score

## Creates 2 files: D score & Block mean calculation

In [None]:
import pandas as pd
import os
import chardet

def detect_encoding(file):
    with open(file, 'rb') as f:
        result = chardet.detect(f.read())
        return result['encoding']

def process_file(file):
    if not file.endswith('.csv'):
        return None, "File is not a CSV"

    # Detect file encoding
    encoding = detect_encoding(file)

    # Process .CSV file with detected encoding
    df = pd.read_csv(file, header=0, skiprows=[1], encoding=encoding)

    # Check if 'Reaction Time' exists in the columns
    if 'Reaction Time' not in df.columns:
        return None, "'Reaction Time' column not found"

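    # 1. Delete all trials with RT greater than 10,000 msec
    #    (rows with a missing RT are also dropped by this comparison)
    df = df[df['Reaction Time'] <= 10000]

    # Guard against files with no usable trials (avoids division by zero below)
    if df.empty:
        return None, "No trials with a valid Reaction Time"

    # 2. Delete subjects for whom more than 10% of trials have RT less than 300 msec
    if (df['Reaction Time'] < 300).mean() > 0.1:
        return None, "Excluded due to more than 10% of trials with RT < 300 ms"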

    # 3. Delete all "incorrect = 1" results
    df = df[df['Incorrect'] != 1]

    # Compute SD & Mean for each block
    results = {}
    mean_results = {}
    # Skip rows with no block label so the string concatenation below cannot fail on NaN
    for condition in df['metaVirtual'].dropna().unique():
        condition_data = df[df['metaVirtual'] == condition]['Reaction Time']

        # Check for NaN or non-numeric values
        if condition_data.isnull().values.any() or not pd.api.types.is_numeric_dtype(condition_data):
            return None, f"Non-numeric or NaN values found in condition: {condition}"

        # Check if we have at least two data points to calculate standard deviation
        if condition_data.size < 2:
            return None, f"Not enough data points to compute standard deviation for condition: {condition}"

        results["stdev_" + condition] = condition_data.std()
        results["mean_" + condition] = condition_data.mean()
        
        # Save means to mean_results for CSV
        mean_results["mean_" + condition] = condition_data.mean()
        
    # Ensure all necessary conditions were found
    required_conditions = ["congruent_1", "congruent_2", "incongruent_1", "incongruent_2"]
    for condition in required_conditions:
        if f"mean_{condition}" not in results or f"stdev_{condition}" not in results:
            return None, f"Missing mean or standard deviation for condition: {condition}"

    # 4. Compute the inclusive SD for each pair of blocks
    #    (congruent_1 with incongruent_1, congruent_2 with incongruent_2)
    sd24_data = pd.concat([df[df['metaVirtual'] == "congruent_1"]['Reaction Time'], 
                           df[df['metaVirtual'] == "incongruent_1"]['Reaction Time']])
    results["sd24"] = sd24_data.std()

    sd35_data = pd.concat([df[df['metaVirtual'] == "congruent_2"]['Reaction Time'], 
                           df[df['metaVirtual'] == "incongruent_2"]['Reaction Time']])
    results["sd35"] = sd35_data.std()

    # 5. Compute two mean differences
    results["mean_diff_1"] = results["mean_incongruent_1"] - results["mean_congruent_1"]
    results["mean_diff_2"] = results["mean_incongruent_2"] - results["mean_congruent_2"]

    # 6. Divide each difference score by its associated "inclusive" standard deviation
    results["ratio_1"] = results["mean_diff_1"] / results["sd24"]
    results["ratio_2"] = results["mean_diff_2"] / results["sd35"]

    # 7. D = the equal weight average of the two resulting ratios
    output = {}
    output["D"] = (results["ratio_1"] + results["ratio_2"]) / 2
    output["Participant Private ID"] = int(df["Participant Private ID"].unique()[0])
    output["Spreadsheet"] = df["Spreadsheet"].unique()[0]

    # Include the mean results in the output
    output.update(mean_results)
    
    return output, None

if __name__ == '__main__':
    results = []
    data_path = '/directory_path' #enter directory path
    for file in os.listdir(data_path):
        # full path to the file
        full_file_path = os.path.join(data_path, file)
        result, error_message = process_file(full_file_path)
        if result:
            results.append(result)
        else:
            print(f"File {full_file_path} excluded: {error_message}")

    results_df = pd.DataFrame(results)
    results_df.to_csv('results_mean_calc.csv', index=False)

    # Separate CSV for mean values (may be redundant: results_mean_calc.csv already contains them)
    mean_columns = [col for col in results_df.columns if col.startswith('mean_')]
    mean_df = results_df[["Participant Private ID", "Spreadsheet"] + mean_columns]
    mean_df.to_csv('mean_results.csv', index=False)

# D Score Significance & Conditions Stats

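Calculates the mean D score overall and for each condition (congruent first vs. incongruent first), tests whether the overall mean differs from zero, and checks for a significant difference between the two conditions. I have not seen this used in the literature but wanted to check it on my own data.

## Run on D score result file - results_mean_calc.csv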

In [None]:
import numpy as np
import pandas as pd
from scipy import stats

# Path to the CSV file
file_path = "results_mean_calc.csv"

# Load the CSV file into a DataFrame
data = pd.read_csv(file_path)

# Extract the D scores and conditions
d_scores = data['D']
conditions = data['Spreadsheet']

# Count the number of D scores
num_d_scores = len(d_scores)
print(f"Number of D scores: {num_d_scores}")

# Calculate the overall mean and standard deviation of D scores
mean_d = np.mean(d_scores)
std_d = np.std(d_scores, ddof=1)
print(f"Overall mean D score: {mean_d}")
print(f"Overall standard deviation of D scores: {std_d}")

# Perform a one-sample t-test against the population mean of 0
t_stat, p_value = stats.ttest_1samp(d_scores, 0)
print(f"t-statistic: {t_stat}")
print(f"p-value: {p_value}")

alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis. The mean D score is significantly different from zero.")
else:
    print("Fail to reject the null hypothesis. There is no significant difference from zero.")

# Calculate the Mean, SD and count for each condition
conditions = data['Spreadsheet'].unique()
for condition in conditions:
    condition_scores = data[data['Spreadsheet'] == condition]['D']
    mean_condition = np.mean(condition_scores)
    std_condition = np.std(condition_scores, ddof=1)
    count_condition = len(condition_scores)
    print(f"\nCondition: {condition}")
    print(f"Number of D scores: {count_condition}")
    print(f"Mean D score: {mean_condition}")
    print(f"Standard Deviation: {std_condition}")

# Perform independent samples t-test between the two conditions
congruent_scores = data[data['Spreadsheet'] == 'Congruent_First']['D']
incongruent_scores = data[data['Spreadsheet'] == 'Incongruent_First']['D']

# Number of D scores in each condition
num_congruent = len(congruent_scores)
num_incongruent = len(incongruent_scores)

print(f"\nNumber of D scores in 'Congruent_First': {num_congruent}")
print(f"Number of D scores in 'Incongruent_First': {num_incongruent}")

# Perform t-test
t_statistic, p_value = stats.ttest_ind(congruent_scores, incongruent_scores)

# Print the results of the t-test
print("\nIndependent Samples t-Test:")
print(f"t-statistic: {t_statistic}")
print(f"p-value: {p_value}")

# Interpret the result
if p_value < alpha:
    print("Reject the null hypothesis. The difference between the means of the two conditions is significant.")
else:
    print("Fail to reject the null hypothesis. The difference between the means of the two conditions is not significant.")

# Compare condition means

## Runs on results_mean_calc.csv

This step compares the mean RT of each condition block across all participants. Ideally you want to see a significant difference between the congruent and incongruent conditions regardless of the order in which they were presented.
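
To make the comparison concrete, here is a small sketch with invented per-participant block means. A paired test is used because each participant contributes both a congruent and an incongruent mean. It is equivalent to a one-sample t-test on the per-participant differences, which the last lines illustrate.

```python
import numpy as np
from scipy.stats import ttest_1samp, ttest_rel

# Invented per-participant mean RTs (ms); one entry per participant
mean_congruent = np.array([640.0, 702.0, 615.0, 688.0, 731.0])
mean_incongruent = np.array([695.0, 740.0, 660.0, 690.0, 802.0])

paired = ttest_rel(mean_congruent, mean_incongruent)

# Same test expressed on the within-participant differences
differences = mean_congruent - mean_incongruent
one_sample = ttest_1samp(differences, 0)

print(f"paired t = {paired.statistic:.3f}, p = {paired.pvalue:.4f}")
print(f"one-sample t on differences = {one_sample.statistic:.3f}")
```

The t-statistic is negative here only because the congruent means are passed first and are smaller. Swapping the argument order flips the sign but not the p-value.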

In [None]:
import pandas as pd
from scipy.stats import ttest_rel

def main():
    # Load the CSV file with mean results
    df = pd.read_csv('results_mean_calc.csv')
    
    # Calculate the combined mean for congruent conditions
    df['mean_congruent_combined'] = df[['mean_congruent_1', 'mean_congruent_2']].mean(axis=1)
    
    # Calculate the combined mean for incongruent conditions
    df['mean_incongruent_combined'] = df[['mean_incongruent_1', 'mean_incongruent_2']].mean(axis=1)

    # Calculate the aggregated mean across all participants for congruent conditions
    total_mean_congruent_combined = df['mean_congruent_combined'].mean()
    
    # Calculate the aggregated mean across all participants for incongruent conditions
    total_mean_incongruent_combined = df['mean_incongruent_combined'].mean()
    
    # Calculate standard deviations
    sd_congruent_combined = df['mean_congruent_combined'].std()
    sd_incongruent_combined = df['mean_incongruent_combined'].std()

    # Perform paired sample t-test
    t_statistic, p_value = ttest_rel(df['mean_congruent_combined'], df['mean_incongruent_combined'])

    # Output the results
    print("Combined Mean for Congruent Conditions (mean congruent_1 & mean_congruent_2):", total_mean_congruent_combined)
    print("Combined Mean for Incongruent Conditions (mean incongruent_1 & mean_incongruent_2):", total_mean_incongruent_combined)
    print("Standard Deviation for Congruent Conditions:", sd_congruent_combined)
    print("Standard Deviation for Incongruent Conditions:", sd_incongruent_combined)
    
    # Report the t-test results
    print("Paired Sample t-test:")
    print("t-statistic:", t_statistic)
    print("p-value:", p_value)
    
    # Determine significance
    alpha = 0.05
    if p_value < alpha:
        print("The difference between the combined means is statistically significant (p < 0.05).")
    else:
        print("The difference between the combined means is not statistically significant (p >= 0.05).")
    
    # Optionally, write the combined means and additional statistics to a new CSV file for further usage
    combined_means = {
        'mean_congruent_combined': [total_mean_congruent_combined],
        'mean_incongruent_combined': [total_mean_incongruent_combined],
        'sd_congruent_combined': [sd_congruent_combined],
        'sd_incongruent_combined': [sd_incongruent_combined],
        't_statistic': [t_statistic],
        'p_value': [p_value],
        'significance': ['Significant' if p_value < alpha else 'Not Significant']
    }
    combined_means_df = pd.DataFrame(combined_means)
    combined_means_df.to_csv('combined_means_comparison.csv', index=False)
    
if __name__ == "__main__":
    main()

# Compare demographics & survey questions

## Run on participant demographic file

I included a demographic questionnaire in Gorilla following the experimental task, as well as a three-question survey. This step generates some demographic statistics and creates a new file with each participant's responses to the survey. Feel free to modify this step or ignore it.
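
The questionnaire export is in long format: one row per participant per question. As an alternative sketch to the row-by-row loop in the cell below, the same reshaping can be done with `DataFrame.pivot`. The column names match those used in this notebook, but the rows are invented, and the sketch assumes each participant answered each question at most once (`pivot` raises an error on duplicate entries).

```python
import pandas as pd

# Invented long-format questionnaire rows
long_df = pd.DataFrame({
    "Participant Private ID": [101, 101, 101, 102, 102, 102],
    "Question Key": ["age", "gender", "response-use"] * 2,
    "Response": ["34", "Female", "4", "27", "Male", "2"],
})

# One row per participant, one column per question
wide_df = long_df.pivot(
    index="Participant Private ID",
    columns="Question Key",
    values="Response",
)

# Responses arrive as text; convert age before computing statistics
wide_df["age"] = pd.to_numeric(wide_df["age"], errors="coerce")

print(wide_df)
print("Mean age:", wide_df["age"].mean())
```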

## Gives demographic statistics & creates file with survey responses for each participant 

In [None]:
import numpy as np
import pandas as pd

# Path to the input CSV file
input_file_path = "/Users/Michael/Library/Mobile Documents/com~apple~CloudDocs/Documents/School/PHD/Extended Essay/Implicit Association Test/Data/IAT Data/UK/Full 100/data_exp_176416-v11 (5)/data_exp_176416-v11_questionnaire-7y5d.csv"
# Path to the output CSV file
output_file_path = "/Users/Michael/IAT/UKRun/UK_Demographic_Final100.CSV"

# Read the input CSV file into a DataFrame
data = pd.read_csv(input_file_path)

# Prepare a dictionary to collect the necessary information for each participant
participant_info = {}

# Fill the dictionary with participant information
for index, row in data.iterrows():
    participant_id = row['Participant Private ID']
    
    if participant_id not in participant_info:
        participant_info[participant_id] = {
            'Participant ID': participant_id,
            'Participant Age': None,
            'Participant Gender': None,
            'response-familiar': None,
            'response-purchase': None,
            'response-use': None,
            'cronbach_alpha': None  # Initialize Cronbach's alpha column
        }
    
    question_key = row['Question Key']
    response = row['Response']
    
    if question_key == 'age':
        participant_info[participant_id]['Participant Age'] = response
    elif question_key == 'gender':
        participant_info[participant_id]['Participant Gender'] = response
    elif question_key == 'response-familiar':
        participant_info[participant_id]['response-familiar'] = response
    elif question_key == 'response-purchase':
        participant_info[participant_id]['response-purchase'] = response
    elif question_key == 'response-use':
        participant_info[participant_id]['response-use'] = response

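# Calculate Cronbach's alpha for the three survey items.
# Alpha is a scale-level statistic, so it is computed once across all
# participants with complete numeric responses rather than per participant.
def cronbach_alpha(items):
    """Return Cronbach's alpha for a DataFrame (rows = participants, columns = items)."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

survey_items = ['response-familiar', 'response-purchase', 'response-use']
survey_df = pd.DataFrame.from_dict(participant_info, orient='index')[survey_items]
survey_df = survey_df.apply(pd.to_numeric, errors='coerce').dropna()

# At least two complete respondents are needed to estimate variances
if len(survey_df) > 1:
    scale_alpha = cronbach_alpha(survey_df)
    print(f"Cronbach's alpha (n = {len(survey_df)}): {scale_alpha}")
    # Store the scale-level value on every participant row
    for info in participant_info.values():
        info['cronbach_alpha'] = scale_alpha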

# Create a DataFrame from the participant_info dictionary
output_data = pd.DataFrame.from_dict(participant_info, orient='index')

# Convert 'Participant Age' to numeric (non-numeric entries become NaN)
output_data['Participant Age'] = pd.to_numeric(output_data['Participant Age'], errors='coerce')

# Calculate mean age and standard deviation
mean_age = output_data['Participant Age'].mean()
std_age = output_data['Participant Age'].std()

# Count number of participants by gender
gender_counts = output_data['Participant Gender'].value_counts()

# Write the output DataFrame to a new CSV file
output_data.to_csv(output_file_path, index=False)

# Print the results
print("Data extraction complete. New CSV file created.")
print(f"Mean age: {mean_age}")
print(f"Age standard deviation: {std_age}")
print("Gender counts:")
print(gender_counts)