# Experiment Design and Analysis

The objective of this notebook is to explore Trust Game and first-price auction theory using real-world experimental data.

We'll use the dataset associated with the following article:
https://www.sciencedirect.com/science/article/abs/pii/S0022053107000178

## Local files

Set the variables below to the local paths of the Trust Game data, the first-price auction (FPA) data, and the dataset's README file.



In [None]:
TRUST_DATA_PATH = ""
FPA_DATA_PATH = ""
README_PATH = ""
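If any of these paths is left blank, the loading cell below fails. A small guard can surface the problem with a clearer message. This is a minimal sketch and not part of the original workflow: the helper name `check_paths` is hypothetical, and it only checks that each path is non-empty and points to an existing file.

```python
from pathlib import Path


def check_paths(**paths: str) -> None:
    """Raise a clear error if any configured path is blank or not a file.

    Hypothetical helper: the keyword names only appear in error messages.
    """
    for name, value in paths.items():
        if not value:
            raise ValueError(f"{name} is empty; set it to a local file path.")
        if not Path(value).is_file():
            raise FileNotFoundError(f"{name} does not point to a file: {value!r}")


# Usage, assuming the variables defined in the cell above:
# check_paths(TRUST_DATA_PATH=TRUST_DATA_PATH,
#             FPA_DATA_PATH=FPA_DATA_PATH,
#             README_PATH=README_PATH)
```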

In [None]:
import pandas as pd
import numpy as np
import statsmodels.api as sm
import statsmodels.stats.api as sms
from scipy import stats

trust_data = pd.read_csv(TRUST_DATA_PATH)
fpa_data = pd.read_csv(FPA_DATA_PATH)





In [None]:
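# Uncomment the line below to view the dataset's README (it explains the variable names).
# IPython expands {README_PATH} to the value of the Python variable.
# !cat {README_PATH}

# Uncomment one of the lines below to preview the first rows of a CSV file
# (only the last expression in a cell is displayed).
# trust_data.head()
# fpa_data.head()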

In the Trust Game, subjects are grouped in pairs: one is assigned the role of investor and the other the role of recipient. Let's examine the correlation between the amounts investors invest and the amounts recipients return.

In [None]:
def inv_rec_corrcoef(provided_data, dec_col, decision_1, decision_2, cols_to_check):
    """
    Compute the Pearson correlation between two decisions (e.g., the investor's
    investment and the recipient's return) made by the same pair in the same period.

    Parameters:
    provided_data (pd.DataFrame): Long-format data with one row per decision.
    dec_col (str): Column identifying the decision type (e.g., 'decision type').
    decision_1 (str): Value of dec_col for the first decision (e.g., "INVEST").
    decision_2 (str): Value of dec_col for the second decision (e.g., "RETURN").
    cols_to_check (list): Key columns identifying a pair in a given round
        (e.g., ['Period', 'group #']), followed by the column holding the
        decision amount as the LAST element (e.g., 'decision').
    Returns:
    np.float64: The correlation coefficient, rounded to two decimals.
    """
    df = provided_data.copy()
    # Normalize column names: strip edges and collapse repeated whitespace
    df.columns = df.columns.str.strip()
    df.columns = df.columns.str.replace(r"\s+", " ", regex=True)

    keys, value_col = cols_to_check[:-1], cols_to_check[-1]

    dec_1 = df.loc[df[dec_col] == decision_1, cols_to_check]
    dec_2 = df.loc[df[dec_col] == decision_2, cols_to_check]

    # Match each decision_1 row with the decision_2 row from the same group and
    # period (merging on the period alone would pair every group with every other)
    merged = dec_1.merge(dec_2, on=keys, suffixes=('_1', '_2'))

    # Correlate the decision amounts, not the key columns
    r = np.corrcoef(merged[f'{value_col}_1'], merged[f'{value_col}_2'])[0, 1]
    return np.float64(round(r, 2))

In [None]:
inv_rec_corrcoef(provided_data=trust_data,
                dec_col='decision type',
                decision_1="INVEST",
                decision_2="RETURN",
                cols_to_check=['Period','group #','decision']
                )


For the first-price auction experiment, there are ten experimental sessions with eight subjects per session. Subjects complete an auction task and a lottery task (Holt and Laury 2002) in one of two orders. In five of the ten sessions, subjects first complete the lottery task, followed by 30 rounds of auctions. In the other five sessions, subjects first complete 30 rounds of auctions, followed by the lottery task. At the end of each session, subjects complete a demographics survey. The dataset contains the first-period auction data for each treatment.

In [None]:
def ate_fpa_payoff(provided_data, bid, val, treatment, k1_8_lot_exp='k1_8_lot_exp', k1_8_exp_lot='k1_8_exp_lot'):
    """
    Compute the average treatment effect (ATE) on the bid-value ratio (bid / value)
    as a difference in means between the two treatment groups.

    Parameters
    ----------
    provided_data : pd.DataFrame
        The input data containing the bid, value, and treatment columns.
    bid : str
        Name of the bid column.
    val : str
        Name of the value column.
    treatment : str
        Name of the column holding the treatment label.
    k1_8_lot_exp : str
        Label of the control group (lottery task first, then auctions).
    k1_8_exp_lot : str
        Label of the treated group (auctions first, then lottery task).

    Returns
    -------
    pd.DataFrame
        Group means and their difference (treated minus control).
    """
    # Ensure input is a DataFrame
    if not isinstance(provided_data, pd.DataFrame):
        raise TypeError("Input must be a pandas DataFrame.")

    # Ensure bid, val, treatment are strings
    if not all(isinstance(col, str) for col in [bid, val, treatment]):
        raise TypeError("Column names must be strings.")

    # 1) Create bidval_ratio on a copy so the caller's DataFrame is unchanged
    data = provided_data.copy()
    data['bidval_ratio'] = data[bid] / data[val]

    # 2) Compute group means using the treatment column passed in
    lot_auc = data.loc[data[treatment] == k1_8_lot_exp, 'bidval_ratio'].mean()
    auc_lot = data.loc[data[treatment] == k1_8_exp_lot, 'bidval_ratio'].mean()

    # 3) Return results; the ATE is the treated mean minus the control mean
    return pd.DataFrame({
        'lot_auc_mean':  [round(lot_auc, 2)],
        'auc_lot_mean':  [round(auc_lot, 2)],
        'diff in means': [round(auc_lot - lot_auc, 2)]
    })


Here, we treat the order in which subjects complete the lottery task before the auction task (k1_8_lot_exp) as the control, so the auction-first order (k1_8_exp_lot) is the treatment. The outcome variable is the bid-value ratio (b/v).
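Under random assignment, the difference in group means (treated minus control) is the standard ATE estimator. The sketch below shows that arithmetic on made-up bid-value ratios (hypothetical numbers, not the experiment's), together with the usual unequal-variance (Neyman) standard error of a difference in means.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical bid-value ratios (b / v) for each group; not from the dataset
control = [0.70, 0.80, 0.75, 0.85]   # k1_8_lot_exp: lottery first
treated = [0.80, 0.90, 0.85, 0.85]   # k1_8_exp_lot: auctions first

# ATE estimate under random assignment: treated mean minus control mean
ate = mean(treated) - mean(control)

# Unequal-variance (Neyman) standard error of the difference in means
se = sqrt(variance(treated) / len(treated) + variance(control) / len(control))
print(f"ATE = {ate:.3f}, SE = {se:.3f}")
```

Because the order is assigned by session, all subjects in a session share the same treatment status, so a subject-level standard error like this one likely understates the uncertainty; clustering by session would be the more conservative choice.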

In [None]:
display(ate_fpa_payoff(provided_data=fpa_data,
                       bid='b',
                       val='v',
                       treatment='treatment',
                       k1_8_lot_exp='k1_8_lot_exp',
                       k1_8_exp_lot='k1_8_exp_lot'))

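We can also check more formally whether the treatment groups were properly randomized: if randomization worked, pre-treatment characteristics such as demographics should not differ systematically between the groups, which we can test with two-sample t-tests.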
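For reference, `stats.ttest_ind(..., equal_var=False)` runs Welch's t-test, which does not assume equal variances in the two groups. The statistic and its Welch-Satterthwaite degrees of freedom can be computed by hand as in this stdlib sketch on hypothetical ages; the p-value, which needs the t distribution's CDF, is left to SciPy.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # squared standard errors
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    dof = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, dof


# Hypothetical ages in the two orders (illustrative numbers only)
ages_lot_exp = [20, 22, 21, 23, 19]
ages_exp_lot = [21, 20, 22, 24, 25, 23]
t, dof = welch_t(ages_lot_exp, ages_exp_lot)
print(round(t, 3), round(dof, 2))
```

When several covariates are tested, an occasional p-value below 0.05 is expected by chance alone even if randomization succeeded, so a single small p-value is weak evidence of imbalance.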



In [None]:
def objective_randomization(provided_data, variables, treatment='treatment',
                            group_1='k1_8_lot_exp', group_2='k1_8_exp_lot'):
    """
    Perform Welch's two-sample t-tests (unequal variances) comparing each variable
    between two treatment groups, as a randomization (balance) check.

    Args:
        provided_data (pd.DataFrame): DataFrame containing the data.
        variables (list): Numeric variables to test (e.g., demographics).
        treatment (str): Name of the treatment column.
        group_1 (str): Label of the first treatment group.
        group_2 (str): Label of the second treatment group.

    Returns:
        pd.DataFrame: t-statistics and p-values, indexed by variable.
    """
    g1 = provided_data[provided_data[treatment] == group_1]
    g2 = provided_data[provided_data[treatment] == group_2]

    rows = []
    for var in variables:
        # Drop missing values separately within each group
        t_stat, p_val = stats.ttest_ind(g1[var].dropna(), g2[var].dropna(), equal_var=False)
        rows.append({'variable': var,
                     't-statistic': round(t_stat, 2),
                     'p-value': round(p_val, 2)})

    return pd.DataFrame(rows).set_index('variable')


Let's compare the two treatment groups (k1_8_exp_lot and k1_8_lot_exp) on the female, age, and hispanic demographic variables using the objective_randomization function.

In [None]:
objective_randomization(provided_data=fpa_data,
                        variables=['female', 'age', 'hispanic'])


In a first-price sealed-bid auction, we would expect subjects to bid a fraction of their value that depends on their risk attitudes (e.g., risk neutral vs. risk averse). Let's explore what effect gender has on bid-value ratios while controlling for risk attitudes. This time, we estimate the effect with an ordinary least-squares (OLS) regression. Note that gender is not randomly assigned, so its coefficient measures a conditional association rather than an average treatment effect in the experimental sense.
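As a benchmark for "a fraction of their value": in the textbook independent-private-values model with n bidders, values uniform on [0, v_max], and constant relative risk aversion utility u(x) = x**r with 0 < r <= 1, the symmetric equilibrium bid is b(v) = (n - 1) / (n - 1 + r) * v. Risk neutrality (r = 1) gives (n - 1) / n, and more risk aversion (smaller r) raises the ratio, which is why risk attitudes belong in the regression. The model, the utility form, and the group size of 4 below are assumptions for illustration; the source does not state them for this experiment.

```python
def crra_bid_fraction(n, r):
    """Equilibrium bid/value ratio b(v)/v = (n - 1) / (n - 1 + r).

    Assumes n symmetric bidders, i.i.d. private values uniform on [0, v_max],
    and CRRA utility u(x) = x**r with 0 < r <= 1 (r = 1 is risk neutral).
    """
    if n < 2 or not 0 < r <= 1:
        raise ValueError("need n >= 2 and 0 < r <= 1")
    return (n - 1) / (n - 1 + r)


# Illustrative group size of 4 (hypothetical, not taken from the study)
for r in (1.0, 0.5):
    print(r, round(crra_bid_fraction(4, r), 3))
```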

In [None]:
def ols_riskgender_on_bidvalue(df, bid='b', val='v', female='female', ra='ra'):
    """
    Simple OLS regression to evaluate how subjects' risk attitudes and gender affect their bid/value ratio.
    inputs:
    df (pd.DataFrame): DataFrame containing the data.
    bid (str): Column name for bids.
    val (str): Column name for values.
    female (str): Column name for the female indicator.
    ra (str): Column name for risk attitudes.
    returns:
    statsmodels RegressionResults for the fitted model; call .summary() on it
    to display the full regression table.
    """
    df = df.copy()
    # Normalize column names: strip edges and collapse repeated whitespace
    df.columns = df.columns.str.strip()
    df.columns = df.columns.str.replace(r"\s+", " ", regex=True)
    required = {bid, val, female, ra}
    # 1) Ensure required columns are present
    if not required.issubset(df.columns):
        missing = required - set(df.columns)
        raise ValueError(f"Missing required columns: {missing}")

    # 2) Drop any rows where those columns are NaN
    df = df.dropna(subset=list(required))

    # 3) Create bid/value ratio
    if 'bidval_ratio' not in df.columns:
        df = df.assign(bidval_ratio=lambda d: d[bid] / d[val])

    # 4) Build design matrix (with intercept) and response
    X = sm.add_constant(df[[female, ra]])
    y = df['bidval_ratio']

    # 5) Fit OLS and return the fitted results (not the summary table),
    #    so callers can use .params, .pvalues, and .summary()
    return sm.OLS(y, X).fit()

In [None]:
ols_riskgender_on_bidvalue(fpa_data).summary()

Next, let's write a small helper that extracts the model's coefficients and p-values and returns them as a presentable DataFrame.


In [None]:
def ols_riskgender_on_bidvalue_df(data):
    """
    Fit the risk-and-gender OLS model and return its coefficients and p-values
    as a presentable DataFrame.
    inputs:
    data: the DataFrame containing the data.
    returns:
    a DataFrame indexed by variable with the model's coefficients and p-values.
    """
    model = ols_riskgender_on_bidvalue(data)
    results_df = pd.DataFrame({
        'coefficient': model.params.round(2),
        'p-value':     model.pvalues.round(2)
    }).rename_axis('variable')
    return results_df

In [None]:
ols_riskgender_on_bidvalue_df(fpa_data)

Let's check what happens when we remove the risk-attitude variable from the model: does omitting it materially change the estimated relationship between gender and bid-value ratios?
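One way to anticipate the answer: for OLS estimated on the same sample, dropping a regressor changes the remaining coefficient by a known amount (the omitted-variable-bias identity). Writing the long and short models as below (notation introduced here for illustration):

$$
\frac{b_i}{v_i} = \beta_0 + \beta_1\,\mathrm{female}_i + \beta_2\,\mathrm{ra}_i + \varepsilon_i,
\qquad
\frac{b_i}{v_i} = \tilde\beta_0 + \tilde\beta_1\,\mathrm{female}_i + u_i,
$$

$$
\hat{\tilde\beta}_1 = \hat\beta_1 + \hat\beta_2\,\hat\delta_1,
$$

where $\hat\delta_1$ is the slope from an OLS regression of $\mathrm{ra}_i$ on an intercept and $\mathrm{female}_i$. The gender coefficient therefore moves only to the extent that risk attitudes both predict bid-value ratios ($\hat\beta_2 \neq 0$) and differ by gender ($\hat\delta_1 \neq 0$). The identity holds exactly only when both models use the same rows, which matters here because rows with a missing risk-attitude value may be dropped from the long model but not from the short one.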

In [None]:
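def ols_gender_on_bidvalue_df_no_ra(data, bid='b', value='v', gender='female'):
    """
    Fit an OLS regression of the bid/value ratio on gender alone (no risk-attitude
    control) and return its coefficients and p-values as a presentable DataFrame.
    inputs:
    data: the DataFrame containing the data.
    bid (str): Column name for bids.
    value (str): Column name for values.
    gender (str): Column name for the female indicator.
    returns:
    a DataFrame indexed by variable with the model's coefficients and p-values.
    """
    # statsmodels' OLS does not drop NaNs by default, so drop incomplete rows here.
    # Note: this sample can differ from the model with risk attitudes, which also
    # drops rows where the risk-attitude column is missing.
    df = data.dropna(subset=[bid, value, gender]).copy()
    # Reuse an existing bid/value ratio if present, otherwise compute it
    if 'bidval_ratio' not in df.columns:
        df['bidval_ratio'] = df[bid] / df[value]
    X = sm.add_constant(df[[gender]])
    y = df['bidval_ratio']
    model = sm.OLS(y, X).fit()
    results_df = pd.DataFrame({
        'coefficient': model.params.round(2),
        'p-value':     model.pvalues.round(2)
    }).rename_axis('variable')
    return results_df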

In [None]:
ols_gender_on_bidvalue_df_no_ra(fpa_data)
