# Randomization inference for spillovers in networks

This uses data from:
Cai, Jing, Alain De Janvry, and Elisabeth Sadoulet. 2015. "Social Networks and the Decision to Insure." American Economic Journal: Applied Economics, 7(2): 81-108.
https://www.aeaweb.org/articles.php?doi=10.1257/app.20130442

This paper examines spillover effects among rural Chinese farmers who were encouraged to sign up for insurance. Households were randomly assigned to the period in which they were encouraged to get insurance and to whether that encouragement was 'intensive'.

"The social network survey asked household heads to list five close friends, either within or outside the village, with whom they most frequently discuss rice production or financial issues. Respondents were asked to rank these friends based on which one would be consulted first, second, etc."

We are essentially re-doing Table 2 column 2 (there are some minor differences because of how we have simplified things a bit).


In [1]:
options(digits = 3)
library(ggplot2)
theme_set(theme_bw())
options(repr.plot.width = 6)
options(repr.plot.height = 4)

library(icsw)
library(foreach)
library(Matrix)
library(lfe)

In [2]:
cai <- read.table("cai_data/cai.main.tsv", sep = "\t")

In [3]:
head(cai)

Unnamed: 0,id,address,region,village,takeup_survey,age,male,delay,intensive,info_none,intensive.nondelay.peers,n.peers
1,1111385,fusheng67,1,fusheng,1,37,1,0,1,1,0,4
2,1111035,fusheng21,1,fusheng,1,60,1,0,1,1,0,0
3,1111363,fusheng5,1,fusheng,0,56,1,0,0,1,0,3
4,1111042,fusheng21,1,fusheng,1,57,1,0,0,1,1,1
5,1111045,fusheng21,1,fusheng,1,45,1,1,0,1,2,4
6,1111038,fusheng21,1,fusheng,1,61,1,1,1,1,0,4


The main outcome is whether the household signs up for insurance, `takeup_survey`.

In [4]:
load("cai_data/cai.adjacency.RData")
A[1:10,1:10]

   [[ suppressing 10 column names ‘1111385’, ‘1111035’, ‘1111363’ ... ]]


10 x 10 sparse Matrix of class "dgCMatrix"
                           
1111385 . . . . . . . . . .
1111035 . . . . . . . . . .
1111363 . . . . . . . . . .
1111042 . . . . . . . . . .
1111045 . . . . . . . 1 . .
1111038 . . . . . . . 1 1 .
1111034 . . . . . . . . . 1
1111055 . . . . 1 . . . . .
1111050 . . . . 1 1 . 1 . 1
1111031 . . . . 1 . . 1 . .
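
The adjacency matrix is what lets us recompute peer treatment counts: if row i marks the households that i named, then multiplying the matrix by a treatment vector counts i's treated peers. Here is a minimal sketch on a hypothetical 4-household network. It uses a base matrix rather than the sparse `Matrix` class, and it assumes rows are egos, which is how the permutation code later in this notebook uses `A`. All `*.toy` names are made up for illustration.

```r
# Hypothetical toy network: A.toy[i, j] = 1 if household i named household j.
A.toy <- matrix(
  c(0, 1, 1, 0,
    0, 0, 0, 1,
    1, 0, 0, 0,
    0, 0, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("h", 1:4), paste0("h", 1:4))
)
z.toy <- c(1, 0, 1, 1)  # hypothetical treatment indicators

n.peers.toy <- rowSums(A.toy)                        # number of named peers
treated.peers.toy <- as.vector(A.toy %*% z.toy)      # number of treated peers
frac.treated.toy <- treated.peers.toy / n.peers.toy  # NaN when no peers were named

print(data.frame(n.peers.toy, treated.peers.toy, frac.treated.toy))
```

Households that named no peers get an undefined fraction, which is one reason a regression on the fraction of treated peers drops some observations.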

Now let's estimate the relationship between the number of peers who were given the strong encouragement to sign up for insurance in the prior period, `intensive.nondelay.peers`, and the outcome `takeup_survey`. The paper does this only for egos who were not treated in the prior period and who did not receive, as part of their treatment, information about the adoption rates in their area. As a first pass, the regression below uses all households; we impose that restriction later.

Now we can get a point estimate for the effects of peer treatments:

In [6]:
lm.1 <- lm(
    takeup_survey ~ intensive + I(intensive.nondelay.peers/n.peers) + factor(n.peers),
    data = cai
)
summary(lm.1)


Call:
lm(formula = takeup_survey ~ intensive + I(intensive.nondelay.peers/n.peers) + 
    factor(n.peers), data = cai)

Residuals:
   Min     1Q Median     3Q    Max 
-0.576 -0.451 -0.373  0.533  0.688 

Coefficients:
                                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)                           0.3116     0.0340    9.16  < 2e-16 ***
intensive                             0.0779     0.0148    5.26  1.5e-07 ***
I(intensive.nondelay.peers/n.peers)   0.0508     0.0295    1.72  0.08469 .  
factor(n.peers)2                      0.0563     0.0389    1.45  0.14796    
factor(n.peers)3                      0.0610     0.0361    1.69  0.09071 .  
factor(n.peers)4                      0.0833     0.0350    2.38  0.01746 *  
factor(n.peers)5                      0.1356     0.0354    3.83  0.00013 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.494 on 4514 degrees of freedom
  (256 observations deleted due to miss

There could be something here. We see both an effect of one's own treatment and a marginally significant (p ≈ 0.085) association with the fraction of peers treated in the early period.

Note that, depending on your number of peers, the fraction of treated peers can take on only some values. This violates the 'positivity' (common support) assumption for causal inference. Even if we instead worked with the number of treated peers, and compared only zero versus at least one treated peer, the propensity scores would be heterogeneous. The authors attempt to deal with this by adding indicators for each number of friends.
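
To make the support problem concrete, here is a small sketch that enumerates which fractions are attainable for each number of peers. It is illustrative only: the range 1 to 5 reflects the survey's cap of five listed friends, and the object names are hypothetical.

```r
# For each possible number of peers k, list the attainable fractions treated.
possible.fractions <- lapply(1:5, function(k) (0:k) / k)
names(possible.fractions) <- paste0("n.peers=", 1:5)
print(possible.fractions)

# A fraction of exactly 1/2 is attainable only when the number of peers is even.
has.half <- sapply(possible.fractions, function(f) any(abs(f - 0.5) < 1e-12))
print(has.half)
```

Because a value such as 1/2 has zero probability for households with an odd number of peers, comparisons at that value rely on the model's functional form rather than on overlap in the data.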

Make a function that, given the data (or permuted data), computes our test statistic: the regression coefficient from above. We can see that it gives the same result as before:

In [49]:
peer.regression.coef <- function(z, z.peers, n.peers, y) {
  coef(lm(y ~ z + I(z.peers / n.peers) + factor(n.peers)))[3]
}

obs.coef <- with(
  cai,
  peer.regression.coef(intensive, intensive.nondelay.peers, n.peers, takeup_survey)
)
print(obs.coef)

I(z.peers/n.peers) 
            0.0508 


Now we write a function to do the focal-auxiliary permutation and compute the test statistic for each permutation.

In [45]:
do.focal.aux.permutation <- function(adj.mat, z, n.peers, y,
                           is.focal, R = 1e3,
                           fnc = peer.regression.coef) {
  foreach(i = 1:R, .combine = 'c') %do% {
    zp <- z
    zp[!is.focal] <- sample(z[!is.focal]) # permute treatments for auxiliary vertices only
    zp.peers <- as.vector(adj.mat %*% zp) # re-compute number of peers treated
    fnc(z[is.focal], zp.peers[is.focal], n.peers[is.focal], y[is.focal])
  }
}
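
Before applying this to the real data, the logic can be checked on simulated data in which there is a direct effect but, by construction, no spillover. The following is a self-contained sketch under stated assumptions, not the notebook's own procedure: the network, treatments, and outcomes are all simulated, every `*.sim` name is hypothetical, and base R's `replicate` stands in for `foreach` to avoid extra dependencies.

```r
set.seed(1)
n <- 200
k <- 3  # each hypothetical unit names exactly k peers

# Simulated directed network: A.sim[i, j] = 1 if unit i named unit j.
A.sim <- matrix(0, n, n)
for (i in 1:n) A.sim[i, sample(setdiff(1:n, i), k)] <- 1

z.sim <- rbinom(n, 1, 0.5)                # own treatment
y.sim <- rbinom(n, 1, 0.3 + 0.1 * z.sim)  # direct effect only, no spillover
focal.sim <- sample(rep(c(TRUE, FALSE), each = n / 2))

null.sim <- replicate(500, {
  zp <- z.sim
  zp[!focal.sim] <- sample(z.sim[!focal.sim])  # shuffle auxiliary units only
  frac.p <- as.vector(A.sim %*% zp) / k        # re-compute fraction of treated peers
  # Focal units keep their own treatments and outcomes fixed.
  d <- data.frame(y = y.sim, z = z.sim, frac = frac.p)[focal.sim, ]
  unname(coef(lm(y ~ z + frac, data = d))[3])  # coefficient on the peer fraction
})
summary(null.sim)
```

Only the auxiliary units' treatments are shuffled, so the focal units' own treatments and outcomes never change across draws; what varies is their exposure to treated peers.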


We can now call this function to draw from the distribution of the test statistic under the null of no spillovers (but possible direct effects).

For illustration, let's start by selecting 2000 units at random as focal units.

The paper, however, does not focus on contemporaneous influence. Rather, the authors look for effects of the treatment assignment of peers treated in period 1 on egos who were assigned only in period 2.

In [60]:

cai$is.focal <- sample(c(rep(TRUE, 2000), rep(FALSE, nrow(cai) - 2000)))

null.coefs <- do.focal.aux.permutation(
  A,
  cai$intensive,
  cai$n.peers, cai$takeup_survey,
  cai$is.focal,
  R = 1e3
)

summary(null.coefs)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-0.0744  0.0161  0.0353  0.0366  0.0562  0.1420 

In [62]:
obs.coef <- with(
  subset(cai, is.focal),
  peer.regression.coef(intensive, intensive.nondelay.peers, n.peers, takeup_survey)
)
print(obs.coef)

I(z.peers/n.peers) 
             0.076 


In [63]:
two.sided.p.value.perm <- function(obs, null.draws) {
    lower.p <- mean(obs > null.draws)
    upper.p <- mean(obs < null.draws)
    2 * min(lower.p, upper.p)
}

two.sided.p.value.perm(obs.coef, null.coefs)

So actually this doesn't look statistically significant...

In the paper, however, the authors narrow their analysis in two ways. First, they restrict attention to households that were not, as part of the treatment, randomly assigned to get information about insurance adoption in their area; that social information might reduce the impact of other social information. Second, they restrict attention to households treated in the second period.

In [70]:
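# a relevant subset: assigned in the second period and given no information on local adoption
cai$relevant.case <- with(cai, delay == 1 & info_none == 1)

# estimate on the relevant subset only (not on all of cai)
obs.coef <- with(
  subset(cai, relevant.case),
  peer.regression.coef(intensive, intensive.nondelay.peers, n.peers, takeup_survey)
)
print(obs.coef)

I(z.peers/n.peers) 
             0.197 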


Since only some units have relevant outcomes now, we can make all of them the focal units.

In [76]:
cai$is.focal <- cai$relevant.case

null.coefs <- do.focal.aux.permutation(
  A,
  cai$intensive,
  cai$n.peers, cai$takeup_survey,
  cai$is.focal,
  R = 1e3
)

summary(null.coefs)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-0.1020 -0.0016  0.0311  0.0295  0.0599  0.1600 

In [73]:
obs.coef <- with(
  subset(cai, is.focal),
  peer.regression.coef(intensive, intensive.nondelay.peers, n.peers, takeup_survey)
)
print(obs.coef)

I(z.peers/n.peers) 
             0.197 


In [77]:
two.sided.p.value.perm(obs.coef, null.coefs)

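So the p-value is essentially 0: the observed coefficient (0.197) lies above every one of the 1000 null draws summarized above (maximum 0.160). For this subpopulation, there is strong evidence of spillovers.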
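
With a finite number of draws, a p-value computed as a raw proportion can come out exactly 0, which overstates the evidence. A common adjustment, which is not what the notebook above uses, counts the observed statistic as one of the draws, so that the smallest attainable one-sided p-value is 1/(R + 1). Here is a minimal sketch; `two.sided.p.value.addone` and `toy.draws` are hypothetical names introduced for illustration.

```r
two.sided.p.value.addone <- function(obs, null.draws) {
  R <- length(null.draws)
  # One-sided p-values with the observed statistic counted among the draws.
  lower.p <- (1 + sum(null.draws <= obs)) / (R + 1)
  upper.p <- (1 + sum(null.draws >= obs)) / (R + 1)
  # Double the smaller tail and cap at 1.
  min(1, 2 * min(lower.p, upper.p))
}

# Toy check: an observed value beyond all 999 hypothetical null draws.
toy.draws <- seq(-1, 1, length.out = 999)
print(two.sided.p.value.addone(2, toy.draws))
```

In the toy check the result is small but strictly positive, reflecting that 999 draws cannot support a claim stronger than roughly 2 in 1000.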