In [24]:
version = "REPLACE_PACKAGE_VERSION"

# Experiment Design and Analysis
## School of Information, University of Michigan

## Week 2: 


## Assignment Overview
### The objective of this assignment is to:

- Apply theory of experiment design and knowledge of analysis techniques to real experiment data.


### The total score of this assignment will be 19 points


### Resources:
- StatsModels, Scipy.stats, Numpy
    - We recommend using two python libraries called [StatsModels](https://www.statsmodels.org/stable/index.html) and [Scipy.stats](https://docs.scipy.org/doc/scipy/reference/tutorial/stats.html) for data analysis. For this dataset, we'll be using [Numpy](https://numpy.org/devdocs/reference/index.html) as well.
    
- Optional Reading: [Holt C.A, & Laury S.K. Risk Aversion and Incentive Effects. (2002).](https://www.jstor.org/stable/3083270)  


### Datasets used in this assignment:
- Trust data [download csv file](assets/assignment2_data1.csv)
- First-Price Auction data [download csv file](assets/assignment2_data2.csv)
    - Source for dataset: [Chen, Y., et al. Sealed bid auctions with ambiguity: Theory and experiments. (2007).](https://www.sciencedirect.com/science/article/pii/S0022053107000178)

In [25]:
import pandas as pd
import numpy as np
import statsmodels.api as sm
import statsmodels.stats.api as sms
from scipy import stats
#you may or may not use all of the above libraries, and that is OK!

trust_data = pd.read_csv('assets/assignment2_data1.csv') #Trust Game data for this assignment
fpa_data = pd.read_csv('assets/assignment2_data2.csv') #First-price auction data for this assignment - this is the same dataset from last week

In [32]:
pd.set_option('display.max_columns', None)
#view the readme files for these datasets (they explain the variable names); comment these lines out to hide them
!cat assets/assignment2_data1_readme.md
!cat assets/assignment2_data2_readme.md

#view a snippet of each csv file (only the last expression in a cell is displayed)
trust_data.head()
fpa_data.head()


### Assignment Topic: Data analysis of a laboratory experiment on trust

### Background:
We provide data files from laboratory experiments conducted at the University of Michigan.

Subjects are grouped in pairs; within each pair, one subject is assigned the role of investor and the other the role of recipient.

- The investor holds a set amount of money and can choose to give any fraction of that amount to the recipient – or none.

- The amount given is multiplied by a set amount and the recipient can choose to give any fraction of the multiplied amount back to the investor – or none.

The data given was collected from an experiment involving the Trust Game and it contains decisions from the “investors” and “recipients.”
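
To make the payoff structure concrete, here is a minimal sketch of a single round. The endowment of 10 and the multiplier of 3 are illustrative assumptions only (the text above just says "a set amount"), and `trust_game_payoffs` is a hypothetical helper, not part of the assignment.

```python
def trust_game_payoffs(endowment, multiplier, invest_fraction, return_fraction):
    """Return (investor_payoff, recipient_payoff) for one round of the Trust Game."""
    sent = endowment * invest_fraction       # amount the investor gives to the recipient
    received = sent * multiplier             # amount the recipient holds after multiplication
    returned = received * return_fraction    # amount the recipient sends back
    investor_payoff = endowment - sent + returned
    recipient_payoff = received - returned
    return investor_payoff, recipient_payoff


# Hypothetical values: endowment 10, multiplier 3, half invested, half returned
print(trust_game_payoffs(10, 3, 0.5, 0.5))
```

In the dataset, the INVEST and RETURN decisions correspond to the amounts called `sent` and `returned` in this sketch.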

### Data:
The Trust Game data has the following variables:
   - Period: period in which the game was played
   - group #: pair the player was in
   - player #: order or role the player had
   - player role: first if investor, second if recipient
   - decision type: INVEST if investor, RETURN if recipient

Unnamed: 0,treatment,session,period,subject,disttype,highdist,lowdist,group,v,b,highbid,lowbid,buy,buy_yes,buy_no,profit,cumprof,timeb,new,lottery_profit,choice1,choice2,choice3,choice4,choice5,choice6,choice7,choice8,choice9,choice10,error,ra,ra_adj,ra1,ra2,ra3,ra4,ra5,pclab,gender,male,female,ethnic,white,asian,african,hispanic,native,other,age,siblings,personality,optim,pessim,neither,emotions,anger,anxiety,confusion,contentment,fatigue,happiness,irritation,moodswings,withdrawal,major,sdmajor,major1,major2,major3,major4,major5
75,k1_8_lot_exp,061102_1,1,4,High,1,0,1,30,25,25,25,Did Not Buy,0,1,0,0,-42,1,200,1,1,1,2,2,2,1,1,2,2,1,5,5,0,1,0,0,0,4,female,0,1,African American;,0.0,0.0,1.0,0.0,0,0,21,2,optimistic,1,0,0,contentment;,0,0,0,1,0,0,0,0,0,5,English,0,0,0,0,1
76,k1_8_lot_exp,061102_1,1,5,Low,0,1,2,5,3,20,3,Did Not Buy,0,1,0,0,-34,1,385,1,1,1,1,2,2,2,2,2,2,0,4,4,1,0,0,0,0,5,male,1,0,White;,1.0,0.0,0.0,0.0,0,0,18,0,optimistic,1,0,0,anxiety;contentment;happiness;,0,1,0,1,0,1,0,0,0,3,Business,0,0,1,0,0
77,k1_8_lot_exp,061102_1,1,6,Low,0,1,1,33,25,25,25,Did Buy,1,0,8,8,-6,1,10,1,1,1,1,1,1,1,2,2,2,0,7,7,0,0,0,1,0,6,female,0,1,White;,1.0,0.0,0.0,0.0,0,0,20,1,optimistic,1,0,0,contentment;happiness;irritation;,0,0,0,1,0,1,1,0,0,3,Economics/ German,0,0,1,0,0
78,k1_8_lot_exp,061102_1,1,7,High,1,0,3,94,70,70,20,Did Buy,1,0,24,24,0,1,385,1,1,1,2,1,1,1,2,2,2,1,6,6,0,0,1,0,0,7,female,0,1,Other;,0.0,0.0,0.0,0.0,0,1,20,3,optimistic,1,0,0,confusion;,0,0,1,0,0,0,0,0,0,5,italian,0,0,0,0,1
79,k1_8_lot_exp,061102_1,1,8,Low,0,1,3,54,20,70,20,Did Not Buy,0,1,0,0,2,1,160,1,1,1,1,1,1,1,2,2,2,0,7,7,0,0,0,1,0,8,male,1,0,African American;,0.0,0.0,1.0,0.0,0,0,20,1,pessimistic,0,1,0,anxiety;confusion;contentment;happiness;,0,1,1,1,0,1,0,0,0,2,Biopsychology and Cognitive Science,0,1,0,0,0


In [37]:
fpa_data['treatment'].unique()

array(['k1_8_exp_lot', 'k1_8_lot_exp'], dtype=object)

## Part A (3 points)


1. In the Trust Game, each pair consists of one investor and one recipient. Let's examine the correlation between the amounts the investors invest and the amounts the recipients return. Complete the function below to return the correlation coefficient. (3 points)

**Round any calculations to the hundredth decimal. Do not use percentages.**
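
As a reminder of what `np.corrcoef` returns, here is a small sketch on made-up numbers (not the assignment data). It returns a 2x2 correlation matrix whose off-diagonal entry is Pearson's r; the second calculation recovers the same value from the definition, covariance divided by the product of the standard deviations.

```python
import numpy as np

# Toy, made-up amounts for five investor/recipient pairs (not the assignment data)
invested = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
returned = np.array([1.0, 3.0, 2.0, 5.0, 4.0])

corr_matrix = np.corrcoef(invested, returned)  # 2x2 symmetric matrix
r = corr_matrix[0, 1]                          # off-diagonal entry is Pearson's r

# Same quantity from the definition: cov(x, y) / (sd(x) * sd(y))
r_manual = np.cov(invested, returned)[0, 1] / (invested.std(ddof=1) * returned.std(ddof=1))

print(round(r, 2), round(r_manual, 2))
```

Note that the two arrays must be aligned pair by pair; the correlation is only meaningful if row i of one array belongs to the same pair as row i of the other.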

In [28]:
def inv_rec_corrcoef(provided_data):
    """ Later in this problem set, you will be modeling OLS regressions on your data. For now, we'll calculate just
    the correlation coefficient using numpy. If needed, refer to the numpy documentation linked above.
    """
    # YOUR CODE HERE
    
    investor_data = provided_data[provided_data['decision type']=='INVEST']
    
    return_data =  provided_data[provided_data['decision type']=='RETURN']
    
    answer = np.corrcoef(investor_data['decision'], return_data['decision'])
    
    answer = round(answer[1][0], 2)
    
    return answer

Your function should return a float with the correct coefficient value. Check that it does:

In [29]:
inv_rec_corrcoef(trust_data)

0.82

In [30]:
assert type(inv_rec_corrcoef(trust_data)) == np.float64

In [31]:
"""Part A #1: Checking value of correlation coefficient"""
# Hidden tests

'Part A #1: Checking value of correlation coefficient'

## Part B (4 points)

For the first-price auctions experiment, there are ten experimental sessions, with eight subjects per session. In this context, subjects are tasked with completing auction and lottery (Holt-Laury 2002) tasks in two orders. In five of the ten sessions, subjects first complete a lottery task, followed by 30 rounds of auctions. In the other five sessions, subjects first complete 30 rounds of auctions, followed by a lottery task. At the end of each session, subjects complete a demographics survey. The data sets extract the first period auction data for each treatment.


In this case, say that the control for the first-price auction experiment is the order in which subjects complete the lottery task followed by the auction task (k1_8_lot_exp) and the outcome variable we want to measure is the bid-value ratio (b/v).

1. Using differences-in-means, what is the average treatment effect for the first-price auction experiment? (4 points)

**Round any calculations to the hundredth decimal. Do not use percentages.**
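
For reference, a difference-in-means estimate is usually taken as the treatment-group mean minus the control-group mean. The sketch below shows that calculation on made-up numbers; the column names `group` and `outcome` are hypothetical and do not appear in the assignment data.

```python
import pandas as pd

# Made-up bid/value ratios for illustration only (not the assignment data)
toy = pd.DataFrame({
    "group": ["control", "control", "control", "treatment", "treatment", "treatment"],
    "outcome": [0.80, 0.90, 1.00, 0.60, 0.70, 0.80],
})

group_means = toy.groupby("group")["outcome"].mean()
ate = group_means["treatment"] - group_means["control"]  # treatment minus control

print(group_means.round(2).to_dict(), round(ate, 2))
```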

In [69]:
def ate_fpa_payoff(provided_data):
    """
    Write the function to manually check the differences in means of bid-value ratios across the different groups explained above.
    To do this, please create a new dataframe column called 'bidval_ratio' in the provided data.
    Your function should output a dataframe with the following columns: 'lot_auc_mean', 'auc_lot_mean', 'diff in means'.
    """
    
    provided_data['bidval_ratio'] = provided_data['b']/provided_data['v']
    
    provided_data_lot_auc = provided_data[provided_data['treatment']=='k1_8_lot_exp']
    
    provided_data_auc_lot = provided_data[provided_data['treatment']=='k1_8_exp_lot']
    
    lot_auc_mean = round(provided_data_lot_auc['bidval_ratio'].mean() , 2)

    auc_lot_mean = round(provided_data_auc_lot['bidval_ratio'].mean(), 2)
    
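    # Note: this is control (lottery first) minus treatment (auction first);
    # a treatment-minus-control effect has the opposite sign.
    diff_in_means = round((lot_auc_mean - auc_lot_mean), 2)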
    
    column_names = ["lot_auc_mean", "auc_lot_mean", "diff in means"]
    
    ate_fpa_payoff_df = pd.DataFrame(columns = column_names)
    
    ate_fpa_payoff_df.loc[ate_fpa_payoff_df.shape[0]] = [lot_auc_mean, auc_lot_mean, diff_in_means]
    
    
    return ate_fpa_payoff_df


Your function should return a dataframe with the correct values and columns. Check that it does:

In [70]:
ate_fpa_payoff(fpa_data)

Unnamed: 0,lot_auc_mean,auc_lot_mean,diff in means
0,0.95,0.75,0.2


In [71]:
assert isinstance (ate_fpa_payoff(fpa_data), pd.core.frame.DataFrame), "checking your data is in a dataframe"

In [72]:
assert ate_fpa_payoff(fpa_data).columns[0] == 'lot_auc_mean', "checking df column names"
assert ate_fpa_payoff(fpa_data).columns[1] == 'auc_lot_mean', "checking df column names"
assert ate_fpa_payoff(fpa_data).columns[2] == 'diff in means', "checking df column names"

In [73]:
"Part B: lot_auc_mean value"
# Hidden tests

'Part B: lot_auc_mean value'

In [74]:
"Part B: auc_lot_mean value"
# Hidden tests

'Part B: auc_lot_mean value'

## Part C (12 points)

Continuing with the ```fpa_data``` dataset from last week, we would expect subjects to bid a certain fraction of their value in a first-price sealed bid auction depending on their risk attitudes (e.g., risk neutral, risk averse). Let's explore what effect gender has on bid-value ratios when controlling for risk attitudes. This time, let's calculate this average treatment effect using an ordinary least-squares regression.

1. Using the ```fpa_data``` dataframe and an ordinary least-squares regression model, complete the ```ols_riskgender_on_bidvalue``` function to evaluate how subjects’ risk attitudes and gender (in the form of the _female_ variable) affect their bid/value ratio. For now, we'll just return a summary view of our data. (2 points)

**Round any calculations to the hundredth decimal. Do not use percentages.**
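
One detail that matters when fitting this kind of model: `sm.OLS` does not add an intercept on its own, so the design matrix needs an explicit column of ones (for example via `sm.add_constant`) if an intercept is wanted. The sketch below illustrates the effect with plain NumPy least squares rather than statsmodels, on made-up data; the numbers are assumptions chosen for illustration only.

```python
import numpy as np

# Made-up data generated from y = 2 + 0.5 * x (no noise), for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 + 0.5 * x

# Without a constant column, the fitted line is forced through the origin
X_no_const = x.reshape(-1, 1)
slope_only, *_ = np.linalg.lstsq(X_no_const, y, rcond=None)

# With a column of ones, the intercept is estimated alongside the slope
X_const = np.column_stack([np.ones_like(x), x])
coefs, *_ = np.linalg.lstsq(X_const, y, rcond=None)

print("no constant:  ", slope_only.round(2))
print("with constant:", coefs.round(2))
```

A model fitted without a constant is also why statsmodels labels its fit statistic "R-squared (uncentered)".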

In [77]:
def ols_riskgender_on_bidvalue(provided_data):
    
    """
    The easiest way to evaluate how subjects' risk attitudes and gender affect their bid/value ratio is to run an OLS linear
    regression on your fpa_data dataframe. Use the statsmodels library to run an OLS linear regression, and return the summary
    view of your results.

    """
    provided_data['bidval_ratio'] = provided_data['b']/provided_data['v']
    
    X = provided_data[['female', 'ra']]
    
    Y = provided_data['bidval_ratio']
    
    # YOUR CODE HERE
   
    model = sm.OLS(Y,X).fit()

    model_summary = model.summary()

    return model_summary


Your function should return a summary view of your results. Check that it does:

In [78]:
print(ols_riskgender_on_bidvalue(fpa_data)) #we've wrapped this in a print statement to preserve the original statsmodels layout -- if you get a deprecation warning, that is fine.

                                 OLS Regression Results                                
Dep. Variable:           bidval_ratio   R-squared (uncentered):                   0.372
Model:                            OLS   Adj. R-squared (uncentered):              0.356
Method:                 Least Squares   F-statistic:                              23.15
Date:                Sun, 07 Nov 2021   Prob (F-statistic):                    1.28e-08
Time:                        03:44:13   Log-Likelihood:                         -119.25
No. Observations:                  80   AIC:                                      242.5
Df Residuals:                      78   BIC:                                      247.3
Df Model:                           2                                                  
Covariance Type:            nonrobust                                                  
                 coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------

In [79]:
assert isinstance (ols_riskgender_on_bidvalue(fpa_data), sm.iolib.summary.Summary), "checking your summary is output as a statsmodel summary object"

2. Now, modify the ols_riskgender_on_bidvalue function to access the model's coefficients (parameters) and associated p-values, instead of printing out the entire summary view. For now, we won't worry about rounding. (3 points)
    

In [143]:
def ols_riskgender_on_bidvalue(provided_data):
    
    """
    The easiest way to evaluate how subjects' risk attitudes and gender affect their bid/value ratio is to run an OLS linear
    regression on your fpa_data dataframe. Use the statsmodels library to run an OLS linear regression, and this time return
    the coefficients and the p-values for your model.

    """
    provided_data['bidval_ratio'] = provided_data['b']/provided_data['v']
    
    X = provided_data[['female','ra']]
    
    X = sm.add_constant(X)
    
    Y = provided_data['bidval_ratio']
    
    # complete the function by assigning your Y, and fitting your model.
    # YOUR CODE HERE
    
    model = sm.OLS(Y,X).fit()
    
    model_params = model.params
    
    pvals = model.pvalues
    
    return model_params, pvals #we're returning a tuple of a series here -- pretty ugly, right?

In [144]:
params = ols_riskgender_on_bidvalue(fpa_data)

params


(const     0.983914
 female   -0.210121
 dtype: float64,
 const     0.000004
 female    0.398988
 dtype: float64)

Your function should return a raw tuple of your results in pandas Series form. Check that it does:

In [100]:
"checking your return value is a tuple of type pandas series"
assert isinstance (ols_riskgender_on_bidvalue(fpa_data)[0], pd.core.series.Series)
assert isinstance (ols_riskgender_on_bidvalue(fpa_data)[1], pd.core.series.Series)

In [101]:
"checking the value of 'const' for both values"
# Hidden tests

"checking the value of 'const' for both values"

3. Now, let's make our results more readable. Let's modify our function once again to this time create a dataframe that has the coefficients and p-values for the control variables and constant, **rounding to the nearest hundredth decimal**. (4 points)
    

In [135]:
def ols_riskgender_on_bidvalue_df(provided_data):
    
    """
    This function should use the results of the ols_riskgender_on_bidvalue function above to output a dataframe
    that has the coefficients and p-values for the control variables and constant. 
    The dataframe should have the following columns: 'variable', 'coefficient', and 'p-value'
    
    """
    # Reuse the fitted model above, which returns (coefficients, p-values) as two pandas Series.
    model_params, pvals = ols_riskgender_on_bidvalue(provided_data)

    # Row order for the output dataframe.
    variables = ['const', 'ra', 'female']

    # Build the dataframe from the fitted values, rounded to the nearest hundredth,
    # instead of typing the numbers in by hand.
    ols_gender_model_df = pd.DataFrame({
        'variable': variables,
        'coefficient': model_params[variables].round(2).values,
        'p-value': pvals[variables].round(2).values,
    })
    
    return ols_gender_model_df

Your function should return a dataframe of your results. Check that it does:

In [137]:
ols_riskgender_on_bidvalue_df(fpa_data)

Unnamed: 0,variable,coefficient,p-value
0,const,1.06,0.07
1,ra,-0.01,0.89
2,female,-0.21,0.41


In [125]:
"""Part C: Check the dataframe outputs the correct p-values from OLS model"""
# Hidden tests

'Part C: Check the dataframe outputs the correct p-values from OLS model'

In [126]:
"""Part C: Check the dataframe outputs the correct coefficients from OLS model"""
# Hidden tests

'Part C: Check the dataframe outputs the correct coefficients from OLS model'

4. If you remove the risk attitudes variable from the model, does it have a significant effect on how gender contributes to bid/value ratios? Complete the ```ols_gender_on_bidvalue_df``` function to assess this. Part of the function has already been completed for you. (3 points)

**Round any calculations to the hundredth decimal. Do not use percentages.**
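
One way to think about this question is to fit the model with and without the control and compare the coefficient on the variable of interest. The sketch below does this with NumPy least squares on made-up data; `d` stands in for a 0/1 indicator and `c` for a control variable, both hypothetical and unrelated to the assignment data.

```python
import numpy as np

# Made-up data: d is a 0/1 indicator, c is a control that is balanced across the two groups
d = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
c = np.array([1, 2, 3, 4, 1, 2, 3, 4], dtype=float)
y = 1.0 - 0.2 * d + 0.1 * c

ones = np.ones_like(d)
full, *_ = np.linalg.lstsq(np.column_stack([ones, d, c]), y, rcond=None)
reduced, *_ = np.linalg.lstsq(np.column_stack([ones, d]), y, rcond=None)

print("coefficient on d with control:   ", round(full[1], 2))
print("coefficient on d without control:", round(reduced[1], 2))
```

Because `c` is balanced across the two groups in this toy example, dropping it leaves the coefficient on `d` unchanged. With real data the two estimates can differ, so compare both the coefficients and their p-values.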

In [145]:
def ols_gender_on_bidvalue_df(provided_data):
    
    """
    Complete the function that takes the provided data and creates an OLS model that determines the effect of 
    gender (using the female variable) on subjects' bid/value ratios. It should output (by filling in the missing values) 
    a dataframe that has the coefficients and p-values for the gender variable and intercept.
    
    """
    provided_data['bidval_ratio'] = provided_data['b']/provided_data['v']
    
    # Risk attitudes ('ra') are removed from this model, so 'female' is the only regressor.
    X = provided_data[['female']]
    
    X = sm.add_constant(X)
    
    Y = provided_data['bidval_ratio']
    
    model = sm.OLS(Y,X).fit()
    
    # Round the fitted coefficients and p-values instead of typing them in by hand.
    model_params = model.params.round(2)
    
    pvals = model.pvalues.round(2)

    ols_gender_model_df = pd.DataFrame({'variable': model_params.index,
                                        'coefficient': model_params.values,
                                        'p-value': pvals.values})


    return ols_gender_model_df


Your function should return a dataframe with each of the variables and their completed coefficient and p-value for the OLS model. 

Check that it does:

In [146]:
ols_gender_on_bidvalue_df(fpa_data)


Unnamed: 0,variable,coefficient,p-value
0,const,0.98,0.0
1,female,-0.21,0.4


In [147]:
assert ols_gender_on_bidvalue_df(fpa_data).iloc[0][2] == 0, "checking the const p-value value"

In [148]:
"""Check that the dataframe outputs the correct values from the OLS model"""
# Hidden tests

'Check that the dataframe outputs the correct values from the OLS model'