# Bruhin, Fehr, and Schunk, 2019, "Many Faces of Human Sociality: Uncovering the Distribution and Stability of Social Preferences", Table 1

#### Authors:  

- Massimiliano Pozzi (Bocconi University, pozzi.massimiliano@studbocconi.it)
- Salvatore Nunnari (Bocconi University, salvatore.nunnari@unibocconi.it)

#### Description:

The code in this Jupyter notebook computes the aggregate estimates that replicate Table 1.

This notebook was tested with the following package versions:
- Pozzi:   (Anaconda 4.10.3 on Windows 10 Pro) : python 3.8.3, numpy 1.18.5, pandas 1.0.5, scipy 1.5.0, numdifftools 0.9.40
- Nunnari: (Anaconda 4.10.1 on macOS 10.15.7): python 3.8.10, numpy 1.20.2, pandas 1.2.4, scipy 1.6.2, numdifftools 0.9.39

In [1]:
# Import the necessary libraries

import numpy as np
import pandas as pd
import scipy.optimize as opt
import numdifftools as nd
from scipy.stats import norm

## 1. Data Cleaning and Data Preparation

We import the relevant datasets containing the data on the 39 dictator games and 78 reciprocity games in Session 1 and Session 2. As the authors do, we remove from these datasets the subjects who behaved very inconsistently throughout the games. These subjects are identified from the individual estimates, which we do not run in this notebook. The file "dropped_subjects_section4paragraph2.csv" contains the IDs of these subjects.

In [2]:
# Import the three datasets and drop the inconsistent subjects

dt1 = pd.read_csv('../input/choices_exp1.csv') # import data on session 1
dt2 = pd.read_csv('../input/choices_exp2.csv') # import data on session 2

# import data with ID of subjects to drop
data_drop = pd.read_csv('../input/dropped_subjects_section4paragraph2.csv', usecols=[0], names=['sid'], header=None) 

dt1=dt1[~dt1.sid.isin(data_drop.sid)] # drop the session 1 subjects whose IDs are listed in the data_drop dataframe (14 individuals)
dt2=dt2[~dt2.sid.isin(data_drop.sid)] # drop the session 2 subjects whose IDs are listed in the data_drop dataframe (14 individuals)

We now create the indicators for allocation x and allocation y: indicators_x consists of the columns s_x, r_x, q, v of the dataframe dt1, while indicators_y consists of the columns s_y, r_y, q, v.

In [3]:
# Create indicators_x and indicators_y. These are the columns s, r, q, v

indicators_x = np.column_stack((dt1.s_x,dt1.r_x,dt1.q,dt1.v))
indicators_y = np.column_stack((dt1.s_y,dt1.r_y,dt1.q,dt1.v))

## 2. Define the Model and the Likelihood (Sections 2 and 3 in Paper)

A generic player A's utility is given by:

$$ U^A = (1-\alpha s -\beta r - \gamma q - \delta v)⋅\Pi^A + (\alpha s +\beta r + \gamma q + \delta v)⋅\Pi^B $$ 

where &Pi;<sup>A</sup> represents player A's payoff, &Pi;<sup>B</sup> represents player B's payoff, s=1 if &Pi;<sup>A</sup>&lt; &Pi;<sup>B</sup> and 0 otherwise, r=1 if &Pi;<sup>A</sup>&gt; &Pi;<sup>B</sup> and 0 otherwise. For example &alpha; < 0  implies that the subject is behindness averse, while a value of &beta; > 0 implies that the subject is aheadness averse; q is an indicator that takes value one if B behaved kindly (capturing positive reciprocity), while v is an indicator that takes value one if B behaved unkindly (capturing negative reciprocity).
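As a quick illustration of the utility function above, the following minimal sketch evaluates it for a single allocation. The function name `utility_A`, the parameter values and the payoffs are hypothetical and chosen only for illustration; they are not estimates from the paper.

```python
import numpy as np

def utility_A(theta, indicators, pi_a, pi_b):
    """U^A = (1 - w) * Pi^A + w * Pi^B, with w = alpha*s + beta*r + gamma*q + delta*v."""
    w = np.dot(indicators, theta)  # weight that player A places on player B's payoff
    return (1 - w) * pi_a + w * pi_b

# Hypothetical values for (alpha, beta, gamma, delta), for illustration only
theta = np.array([0.1, 0.25, 0.05, -0.05])

# Player A is ahead (s=0, r=1) and there is no reciprocity motive (q=0, v=0)
indicators = np.array([0, 1, 0, 0])
u_ahead = utility_A(theta, indicators, pi_a=800, pi_b=400)
print(u_ahead)
```

In this example only &beta; is active, so player A places weight 0.25 on B's payoff and weight 0.75 on her own.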

The authors model heterogeneity with a Random Utility Model, so that individual A prefers allocation X to allocation Y if and only if: 

$$ U^A(X_g;\theta)+\epsilon_{X_g} \geq U^A(Y_g;\theta)+\epsilon_{Y_g} $$ 

where X<sub>g</sub> is the allocation (&Pi;<sup>A</sup><sub>Xg</sub>, &Pi;<sup>B</sup><sub>Xg</sub>, r<sub>Xg</sub>, s<sub>Xg</sub>, q<sub>Xg</sub>, v<sub>Xg</sub>) in game g; &theta; is a vector containing the parameters &alpha;, &beta;, &gamma;, &delta;; and &epsilon;<sub>Xg</sub> is a random noise term which follows a type-1 extreme value distribution with the reciprocal of &sigma; as scale parameter. 

Thus, the probability that individual A chooses allocation X in game g is given by: 

$$ Pr(C_g= X_g ; \theta, \sigma, X_g, Y_g)=Pr(U^A(X_g;\theta)- U^A(Y_g;\theta) \geq \epsilon_{Y_g}-\epsilon_{X_g})=\frac{exp(\sigma U^A(X_g;\theta))}{exp(\sigma U^A(X_g;\theta))+exp(\sigma U^A(Y_g;\theta))} $$
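The choice probability above depends only on the scaled utility difference, so it can be written equivalently as a logistic function of &sigma;(U<sup>A</sup>(X<sub>g</sub>) - U<sup>A</sup>(Y<sub>g</sub>)). The sketch below checks this numerically; the function name `prob_choose_x` and the numbers are hypothetical and used only for illustration.

```python
import numpy as np

def prob_choose_x(u_x, u_y, sigma):
    """Logit probability of choosing X, written as a sigmoid of the scaled utility difference."""
    return 1.0 / (1.0 + np.exp(-sigma * (u_x - u_y)))

sigma = 0.02  # hypothetical choice sensitivity

p_equal = prob_choose_x(500.0, 500.0, sigma)   # equal utilities: both allocations equally likely
p_better = prob_choose_x(600.0, 500.0, sigma)  # X gives higher utility: X is chosen more often

# Same probability computed with the ratio-of-exponentials form used in the text
p_direct = np.exp(sigma * 600.0) / (np.exp(sigma * 600.0) + np.exp(sigma * 500.0))
```

A larger &sigma; makes choices more responsive to utility differences, while &sigma; close to zero makes them close to random.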

We can then write subject i's contribution to the total likelihood of observing the data given the parameters as follows: 

$$ f(\theta,\sigma;X,Y,C_i)=\prod_{g=1}^{G} Pr(C_g= X_g ; \theta, \sigma, X_g, Y_g)^{I(C_{ig}=X_g)}⋅Pr(C_g= Y_g ; \theta, \sigma, X_g, Y_g)^{1-I(C_{ig}=X_g)} $$

where G is the number of games each individual plays (117) and I is an indicator function that is equal to one if player A chooses allocation X. Our goal is to minimize the negative log-likelihood, that is, the negative of the sum over individuals of the logarithm of the individual likelihood contributions written above.

In [4]:
# Define the function to minimize. This is the negative of the log likelihood of observing our data. 
# v is the vector of parameters (θ,σ)
# y is the choice of the player
# self_x is the payoff when choosing x (left)
# self_y is the payoff when choosing y (right)
# indicators are the ones explained above

def loglike_dummy(v,y,self_x, other_x, self_y,other_y,indicators_x,indicators_y):
    beta = v[0:-1]              # parameter vector θ. a 1x4 vector
    sigma = np.exp(v[-1])       # choice sensitivity σ. We take the exp since it can only take positive values
    
    lli = (indicators_x @ beta) # @ is matrix product. we obtain a 1x18720 vector. Each element is (αs+βr+γq+δv) for the single game
    rli = (indicators_y @ beta)
    uleft  = sigma*( (1-lli) * self_x + lli * other_x) # utility when choosing allocation X (left)
    uright = sigma*( (1-rli) * self_y + rli * other_y) # utility when choosing allocation Y (right)
    
    # probs is a 1x18720 vector containing the likelihood of each observation in the data (a single game for a generic player A) 
    probs  = (np.exp(uleft)/(np.exp(uleft)+np.exp(uright)))**y * (np.exp(uright)/(np.exp(uleft)+np.exp(uright)))**(1-y) 

    nll = - np.sum(np.log(probs)) # negative log-likelihood
    return nll
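Because `np.exp(uleft)` and `np.exp(uright)` can overflow when utilities are large, one possible alternative is to work directly with log-probabilities through `np.logaddexp`. The sketch below is an assumption about how this could be done, not the authors' implementation; the name `loglike_stable` and the synthetic data are hypothetical.

```python
import numpy as np

def loglike_stable(v, y, self_x, other_x, self_y, other_y, indicators_x, indicators_y):
    """Negative log-likelihood of the logit model, computed in log space."""
    theta = np.asarray(v[:-1])
    sigma = np.exp(v[-1])
    y = np.asarray(y)
    lli = np.asarray(indicators_x) @ theta
    rli = np.asarray(indicators_y) @ theta
    uleft = sigma * ((1 - lli) * np.asarray(self_x) + lli * np.asarray(other_x))
    uright = sigma * ((1 - rli) * np.asarray(self_y) + rli * np.asarray(other_y))
    log_denom = np.logaddexp(uleft, uright)  # log(exp(uleft) + exp(uright)) without overflow
    loglik = y * (uleft - log_denom) + (1 - y) * (uright - log_denom)
    return -np.sum(loglik)

# Small synthetic dataset, for illustration only
rng = np.random.default_rng(0)
n = 6
ind_x = rng.integers(0, 2, size=(n, 4))
ind_y = rng.integers(0, 2, size=(n, 4))
sx, ox, sy, oy = (rng.uniform(100, 1000, n) for _ in range(4))
y = rng.integers(0, 2, n)
v = np.array([0.1, 0.2, 0.05, -0.05, np.log(0.01)])  # hypothetical (theta, log sigma)

nll = loglike_stable(v, y, sx, ox, sy, oy, ind_x, ind_y)
```

For moderate utilities this returns the same value as `loglike_dummy`, and it stays finite when the exponentials in the direct formula would overflow.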

## 3. Estimation

### Point Estimates

We now estimate the model. First, we need to initialize a vector with the starting parameters for the minimization algorithm. We then minimize the negative log-likelihood function using the scipy.optimize package and the BFGS algorithm.

In [5]:
# initialize random starting guesses in an interval. 

beta_init = np.random.uniform(0.01,0.02,4) # close to zero for α, β, γ, δ
sigma_init = np.log(np.random.uniform(0.05,0.3,1)/np.mean([np.mean(dt1.self_x),np.mean(dt1.other_x),np.mean(dt1.self_y),
                                                           np.mean(dt1.other_y)])) # log since in function we take exp
v0 = [*beta_init,*sigma_init]

# opt.minimize takes as arguments the function to minimize, the initial guesses, other arguments of the function to minimize and
# the method used for minimization. We do not provide the analytical gradient since the one computed numerically by the algorithm
# is precise enough for this problem

import warnings
warnings.filterwarnings('ignore') # Hide the RuntimeWarnings about overflow and divide by zero raised during the 
                                  # optimization routine. These warnings do not affect the results. We could avoid them by
                                  # adding checks on the values of uright, uleft and probs in the loglike_dummy function, but 
                                  # that would only make the function heavier to read without other advantages. We suppress
                                  # warnings here because we are sure of the results; we do not recommend doing so otherwise.
            
sol = opt.minimize(loglike_dummy,v0,
                   args=(dt1['choice_x'],dt1['self_x'],dt1['other_x'],dt1['self_y'],dt1['other_y'],indicators_x,indicators_y),
                   method='BFGS')
res_s1 = sol.x # the vector containing our estimates for α, β, γ, δ, log(σ)
results_s1=[*res_s1[0:-1], np.exp(res_s1[-1])]

### Standard Errors

We now estimate individual cluster robust standard errors. These are computed by taking the square root of the diagonal elements of the following matrix: 

$$ Adj⋅(H^{-1} @ G @ H^{-1}) $$ 

where Adj is an adjustment for the degrees of freedom and the number of clusters:

$$ Adj = \frac{\text{Nr. observations}-1}{\text{Nr. observations}-\text{Nr. parameters}}⋅\frac{\text{Nr. clusters}}{\text{Nr. clusters}-1} $$ 
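To get a feeling for the size of this adjustment, the short sketch below evaluates it with the sample sizes reported at the end of this notebook (18,720 observations, 160 subjects) and the five estimated parameters. The variable names are chosen here for illustration.

```python
n_obs = 18720      # 160 subjects x 117 games
n_params = 5       # alpha, beta, gamma, delta, log(sigma)
n_clusters = 160   # one cluster per subject

adj = (n_obs - 1) / (n_obs - n_params) * n_clusters / (n_clusters - 1)
print(round(adj, 4))
```

With this many observations the degrees-of-freedom term is almost exactly one, so the adjustment is driven mainly by the number of clusters.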

H<sup>-1</sup> is the inverse of the Hessian of the negative log-likelihood evaluated at the minimum (our estimates), @ stands for matrix multiplication, and G is a 5x5 matrix of gradient contributions. 

We denote the gradient of the log-likelihood function for a generic observation i as follows:

$$  g_i(y|\theta) = [log f_i(y|\theta)]' = \frac{\partial}{\partial \theta} log f_i(y|\theta) $$

where &theta; is the parameter vector and f<sub>i</sub>(y|&theta;) is the likelihood of observation i. Then G is defined as follows:

$$ G = \sum_{j=1}^{J} \left[\sum_{i \in c_j}g_i(y|\hat{\theta})\right]^T\left[\sum_{i \in c_j}g_i(y|\hat{\theta})\right] $$

where J is the number of clusters (the number of unique individuals) and c<sub>j</sub> is a generic cluster j, which includes all observations for a specific individual. For more information on how to compute standard errors when using maximum likelihood, we refer the reader to David A. Freedman, 2006, ["On The So-Called 'Huber Sandwich Estimator' and 'Robust Standard Errors'"](https://snunnari.github.io/freedman.pdf), *The American Statistician*, 60:4, 299-302.
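The structure of the matrix H<sup>-1</sup> @ G @ H<sup>-1</sup> can be sketched on a toy example. The helper `cluster_sandwich` and the numbers below are hypothetical and only illustrate the formula: the rows of `scores` play the role of the observation-level gradients, which are first summed within each cluster and then combined as outer products.

```python
import numpy as np

def cluster_sandwich(hessian, scores, clusters):
    """Return H^{-1} G H^{-1}, with G built from per-cluster sums of observation-level scores."""
    scores = np.asarray(scores, dtype=float)
    labels, idx = np.unique(np.asarray(clusters), return_inverse=True)
    summed = np.zeros((len(labels), scores.shape[1]))
    np.add.at(summed, idx, scores)   # add each score row to the row of its cluster
    G = summed.T @ summed            # sum over clusters of the outer products of the summed scores
    H_inv = np.linalg.inv(hessian)
    return H_inv @ G @ H_inv

# Toy data: 4 observations, 2 parameters, 2 clusters (illustrative numbers)
scores = np.array([[1.0, 0.0], [0.5, 1.0], [-1.0, 2.0], [0.0, -1.0]])
clusters = np.array([10, 10, 20, 20])
H = np.array([[2.0, 0.0], [0.0, 4.0]])

V = cluster_sandwich(H, scores, clusters)
print(V)
```

Multiplying `V` by the adjustment factor and taking the square root of its diagonal would give the cluster robust standard errors for this toy problem.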

In [6]:
# Define the function that computes the matrix of individual gradient contribution G
# Variables have the same meanings as before. clusters is the column containing the individual ids (dt1.sid)
# We transform all arguments into np.arrays to avoid mismatches in how matrices are represented between arrays and dataframes
# Be careful on the dimensions of the vectors/matrices and how Python does/does not do broadcast 

def congradloglike(v,y,self_x,other_x,self_y,other_y,indicators_x,indicators_y,clusters):
    y = np.array(y)
    self_x = np.array(self_x)
    other_x = np.array(other_x)
    self_y = np.array(self_y)
    other_y = np.array(other_y)
    indicators_x = np.array(indicators_x)
    indicators_y = np.array(indicators_y)
    clusters = np.array(clusters)
    
    # similar to the loglike_dummy function
    beta = v[0:-1]
    sigma = np.exp(v[-1])
    lli = (indicators_x @ beta)
    rli = (indicators_y @ beta)
    utl  = sigma*( (1-lli) * self_x + lli * other_x)
    utr = sigma*( (1-rli) * self_y + rli * other_y)
    probs  = (np.exp(utl)/(np.exp(utl)+np.exp(utr)))**y * (np.exp(utr)/(np.exp(utl)+np.exp(utr)))**(1-y)
    
    # compute the analytical gradient
    probsm = ((np.ones((len(indicators_x), len(indicators_x[0])))).T * probs).T
    u = ((np.ones((len(indicators_x), len(indicators_x[0])))).T * (np.exp(utl))).T
    w = ((np.ones((len(indicators_x), len(indicators_x[0])))).T * (np.exp(utl) + np.exp(utr))).T
    up = sigma * indicators_x * (((np.ones((len(indicators_x), len(indicators_x[0])))).T * (other_x - self_x))).T * u
    wp = up + sigma * indicators_y * ((np.ones((len(indicators_x), len(indicators_x[0])))).T * ((other_y-self_y) * np.exp(utr))).T
    bgradi = ((-1)**(1-y)* ((up*w - u*wp)/w**2).T).T /probsm
    up2 = utl * u[:,0]
    wp2 = up2 + utr * np.exp(utr)
    ggradi = (-1)**(1-y) * (((up2*w[:,0]-u[:,0]*wp2) /w[:,0]**2)/probs)
    
    gradi = np.column_stack((bgradi,ggradi)) # 18720x5 matrix of partial derivatives for each obs
    
    cl = np.unique(clusters) 
    j = len(cl) # nr of clusters is length of unique individual = 160
    k = len(gradi[0]) # nr of columns in gradi = 5
    sandwich = np.zeros((k,k))
    for i in range(0,j): # sum the rows with the same individual id
        sel = (clusters == cl[i]) # boolean mask selecting the observations of individual i
        gradsel = np.sum(gradi[sel,:],axis=0) # Sum column by column the gradients of the single obs. We obtain a vector of length 5
        sandwich += np.outer(gradsel,gradsel)
    return sandwich

In [7]:
# Compute the individual cluster robust standard errors.

hess_fun = nd.Hessian(loglike_dummy) # Function computing the numerical hessian. We use the numdifftools package.
inv_hess = np.linalg.inv(hess_fun(res_s1,dt1.choice_x,dt1.self_x,dt1.other_x,dt1.self_y,dt1.other_y,indicators_x,indicators_y)) # inverse of the hessian 

adj = (len(dt1.self_x)-1) / (len(dt1.self_x)-len(res_s1)) * (len(np.unique(dt1.sid))/(len(np.unique(dt1.sid))-1)) # degree of freedom and nr. of clusters adjustment

grad_contribution = congradloglike(res_s1,dt1.choice_x,dt1.self_x,dt1.other_x,dt1.self_y,dt1.other_y,indicators_x,indicators_y,dt1.sid)

varcov_s1 = adj * (inv_hess @ grad_contribution @ inv_hess) # var-cov of our estimates
se_s1 = np.sqrt(np.diag(varcov_s1)) # standard errors for α, β, γ, δ, log(σ)
se_s1 = [*se_s1[0:-1],np.sqrt(np.exp(res_s1[-1])**2*se_s1[-1]**2)] # use delta method to retrieve standard error for σ
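The delta method step used for &sigma; reduces to a simple product: since &sigma; = exp(log &sigma;), its derivative with respect to log &sigma; is &sigma; itself, so se(&sigma;) is approximately &sigma; times se(log &sigma;). A minimal sketch follows, where the standard error of log(&sigma;) is a hypothetical number used only for illustration.

```python
import numpy as np

log_sigma_hat = np.log(0.016)  # log of a choice sensitivity of 0.016
se_log_sigma = 0.05            # hypothetical standard error of log(sigma)

sigma_hat = np.exp(log_sigma_hat)
se_sigma = sigma_hat * se_log_sigma  # delta method: |d exp(x)/dx| * se(x)

# Same quantity written as in the cell above
se_sigma_check = np.sqrt(np.exp(log_sigma_hat)**2 * se_log_sigma**2)
```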

In [8]:
# We do the same for Session 2

# Create indicators for the second dataframe
indicators_x2 = np.column_stack((dt2.s_x,dt2.r_x,dt2.q,dt2.v))
indicators_y2 = np.column_stack((dt2.s_y,dt2.r_y,dt2.q,dt2.v))

# Compute the estimates
sol2 = opt.minimize(loglike_dummy,v0,
                   args=(dt2['choice_x'],dt2['self_x'],dt2['other_x'],dt2['self_y'],dt2['other_y'],indicators_x2,indicators_y2),
                   method='BFGS')
res_s2 = sol2.x # the vector containing our estimates for α, β, γ, δ, log(σ)
results_s2=[*res_s2[0:-1], np.exp(res_s2[-1])]

# Compute the individual cluster robust standard errors
inv_hess2 = np.linalg.inv(hess_fun(res_s2,dt2.choice_x,dt2.self_x,dt2.other_x,dt2.self_y,dt2.other_y,indicators_x2,indicators_y2)) # inverse of the hessian 
adj = (len(dt2.self_x)-1) / (len(dt2.self_x)-len(res_s2)) * (len(np.unique(dt2.sid))/(len(np.unique(dt2.sid))-1)) # degree of freedom adjustment
grad_contribution2 = congradloglike(res_s2,dt2.choice_x,dt2.self_x,dt2.other_x,dt2.self_y,dt2.other_y,indicators_x2,indicators_y2,dt2.sid)

varcov_s2 = adj * (inv_hess2 @ grad_contribution2 @ inv_hess2) # var-cov of our estimates
se_s2 = np.sqrt(np.diag(varcov_s2)) # standard errors for α, β, γ, δ, log(σ)
se_s2 = [*se_s2[0:-1],np.sqrt(np.exp(res_s2[-1])**2*se_s2[-1]**2)] # use delta method to retrieve standard error for σ

### Hypothesis Testing

We now do some hypothesis testing on the parameters we obtained for session 1 and session 2. We first compute the z-test statistics and the corresponding p-values to check if each parameter we obtained is statistically different from zero. We then compute the p-value of a z-test to check if the parameters we obtained in session 1 are statistically different from the parameters we obtained in session 2.

In [9]:
# Compute the z-test statistics and the corresponding p-values to check if the parameters are statistically different from zero

# Session 1. np.array since it supports element-wise operations

zvalues_s1 = np.array(results_s1)/np.array(se_s1)
pvalues_s1 = 2*(1-norm.cdf(np.abs(zvalues_s1),0,1))

# Session 2

zvalues_s2 = np.array(results_s2)/np.array(se_s2)
pvalues_s2 = 2*(1-norm.cdf(np.abs(zvalues_s2),0,1))

# Check if parameters obtained in session 1 are statistically different from parameters in session 2

# First we need the variance for the parameters in session 1 and session2

var_s1 = np.array(se_s1)**2
var_s2 = np.array(se_s2)**2

# Now we compute the p-values of the z-test statistics 

zvalues_s1s2 = (np.abs(np.array(results_s1)-np.array(results_s2))) / np.sqrt(var_s1+var_s2)
pvalues_s1s2 = 2*(1-norm.cdf(zvalues_s1s2,0,1))
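The cross-session test can be illustrated on a single parameter. The sketch below uses the rounded &gamma; estimates and standard errors printed at the end of this notebook, so the result only approximately matches the reported p-value. The helper `two_sided_p` is hypothetical; like the formula above, it treats the two estimates as independent. It uses `norm.sf`, the survival function, which equals 1 - `norm.cdf` and is more accurate in the far tails.

```python
from scipy.stats import norm

def two_sided_p(est1, se1, est2, se2):
    """Two-sided p-value of a z-test for H0: est1 = est2, assuming independent estimates."""
    z = abs(est1 - est2) / (se1**2 + se2**2) ** 0.5
    return 2 * norm.sf(z)

# Rounded gamma estimates and standard errors for session 1 and session 2
p_gamma = two_sided_p(0.072, 0.014, 0.029, 0.010)
print(round(p_gamma, 3))
```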

## 4. Print and Save Estimation Results

We create a table with point estimates, individual cluster robust standard errors, z-statistics and p-values for Session 1 and Session 2, together with the p-values for the null hypothesis that the parameters in Session 1 are equal to the parameters in Session 2. We then save the results as a csv file in the output folder and print the results. This replicates Table 1 in the paper.

In [10]:
# Create a new DataFrame with the results and save it as a csv file in output. We round the results to three decimals.

parameters_name = ["α: Weight on other's payoff when behind",
                   "β: Weight on other's payoff when ahead",
                   "γ: Measure of positive reciprocity",
                   "δ: Measure of negative reciprocity",
                   "σ: Choice sensitivity"]

Table_1 = pd.DataFrame({'parameters':parameters_name,'estimates_s1':np.round(results_s1,3),'standarderr_s1':np.round(se_s1,3),
                        'z-stat_s1':np.round(zvalues_s1,3),'p-val_s1':np.round(pvalues_s1,3),'estimates_s2':np.round(results_s2,3),
                        'standarderr_s2':np.round(se_s2,3),'z-stat_s2':np.round(zvalues_s2,3),'p-val_s2':np.round(pvalues_s2,3),
                        'p-val_s1s2':np.round(pvalues_s1s2,3)})

Table_1.to_csv('../output/table1_python.csv')

In [11]:
# Print the results

from IPython.display import display

# Create a table for session 1 and session 2. The last column in each table is the p-value of z-test with H0: session 1 = session 2

print("Table 1. Estimated preferences of the representative agent in session 1:")
table_s1 = pd.DataFrame({'parameters':parameters_name,'estimates_s1':np.round(results_s1,3),'standarderr_s1':np.round(se_s1,3),
                        'z-stat_s1':np.round(zvalues_s1,3),'p-val_s1':np.round(pvalues_s1,3),'p-val_s1s2':np.round(pvalues_s1s2,3)})

display(table_s1)
print("Number of observations:","{:,}".format(len(dt1.sid)))
print("Number of subjects:","{:,}".format(len(np.unique(dt1.sid))))
print("Log likelihood:","{:,.2f}".format(-sol.fun))

print("")
print("Table 1. Estimated preferences of the representative agent in session 2:")
table_s2 = pd.DataFrame({'parameters':parameters_name,'estimates_s2':np.round(results_s2,3),'standarderr_s2':np.round(se_s2,3),
                         'z-stat_s2':np.round(zvalues_s2,3),'p-val_s2':np.round(pvalues_s2,3),'p-val_s1s2':np.round(pvalues_s1s2,3)})
display(table_s2)
print("Number of observations:","{:,}".format(len(dt2.sid)))
print("Number of subjects:","{:,}".format(len(np.unique(dt2.sid))))
print("Log likelihood:","{:,.2f}".format(-sol2.fun))

Table 1. Estimated preferences of the representative agent in session 1:


| | parameters | estimates_s1 | standarderr_s1 | z-stat_s1 | p-val_s1 | p-val_s1s2 |
|---|---|---|---|---|---|---|
| 0 | α: Weight on other's payoff when behind | 0.083 | 0.015 | 5.635 | 0.0 | 0.468 |
| 1 | β: Weight on other's payoff when ahead | 0.261 | 0.019 | 13.868 | 0.0 | 0.551 |
| 2 | γ: Measure of positive reciprocity | 0.072 | 0.014 | 5.311 | 0.0 | 0.01 |
| 3 | δ: Measure of negative reciprocity | -0.042 | 0.011 | -3.687 | 0.0 | 0.919 |
| 4 | σ: Choice sensitivity | 0.016 | 0.001 | 21.172 | 0.0 | 0.006 |


Number of observations: 18,720
Number of subjects: 160
Log likelihood: -5,472.31

Table 1. Estimated preferences of the representative agent in session 2:


| | parameters | estimates_s2 | standarderr_s2 | z-stat_s2 | p-val_s2 | p-val_s1s2 |
|---|---|---|---|---|---|---|
| 0 | α: Weight on other's payoff when behind | 0.098 | 0.013 | 7.659 | 0.0 | 0.468 |
| 1 | β: Weight on other's payoff when ahead | 0.245 | 0.019 | 13.216 | 0.0 | 0.551 |
| 2 | γ: Measure of positive reciprocity | 0.029 | 0.01 | 3.014 | 0.003 | 0.01 |
| 3 | δ: Measure of negative reciprocity | -0.043 | 0.008 | -5.101 | 0.0 | 0.919 |
| 4 | σ: Choice sensitivity | 0.019 | 0.001 | 20.172 | 0.0 | 0.006 |


Number of observations: 18,720
Number of subjects: 160
Log likelihood: -4,540.74
