# Mean and Variance Estimates for Multiple Randomization Designs

In a two-sided marketplace, multiple sellers and multiple buyers interact. For instance, on an e-commerce platform, a typical two-sided marketplace, buyers see sellers' offers and choose whether to buy. Suppose the platform wants to assess the impact of a new information policy on a buyer's likelihood of buying an item. In a traditional A/B test, researchers randomize either buyers or sellers into A (treatment group) or B (control group). These single-randomization designs are traditional RCTs, and they can yield a biased estimate of the average treatment effect.

For example, suppose the platform wants to recommend more TVs to customers during Black Friday and uses an A/B test to check whether the strategy is effective. If researchers run a user-side experiment, users in the treatment and control groups compete for the same products. If treatment-group users buy all the TVs, leaving control-group users nothing to buy, the average treatment effect is overestimated. A two-sided A/B test is therefore preferable.

We follow the formulas of Bajari et al. (2021) (https://arxiv.org/pdf/2112.13495) and provide treatment effect estimators of the following types.


### Estimators

We have the following types of estimators:
* $\tau(p^B,p^S)$: the treatment effect of treated-buyer and treated-seller pairs versus control-buyer and control-seller pairs. The estimator is $\bar{\bar{Y_t}}-\bar{\bar{Y_{c}}}$, and its variance is 
\begin{equation}
\begin{aligned}
V(\tau(p^B, p^S)) = V(\bar{\bar{Y_t}}-\bar{\bar{Y_{c}}})=
&V(\bar{\bar{Y_t}})+V(\bar{\bar{Y_{c}}})-2C(\bar{\bar{Y_t}},\bar{\bar{Y_{c}}})\\
\end{aligned}
\end{equation}
* $\tau_{direct}$: the treatment effect of treated-buyer and treated-seller pairs versus control-buyer and control-seller pairs, net of the spill-over effects. The estimator is $\bar{\bar{Y_t}}-\bar{\bar{Y_{ib}}}-\bar{\bar{Y_{is}}}+\bar{\bar{Y_{c}}}$, and its variance is 
\begin{equation}
\begin{aligned}
V(\tau_{direct}) = V(\bar{\bar{Y_t}}-\bar{\bar{Y_{ib}}}-\bar{\bar{Y_{is}}}+\bar{\bar{Y_{c}}})=
&V(\bar{\bar{Y_t}})+V(\bar{\bar{Y_{ib}}})+V(\bar{\bar{Y_{is}}})+V(\bar{\bar{Y_{c}}})\\
&-2C(\bar{\bar{Y_t}},\bar{\bar{Y_{ib}}})-2C(\bar{\bar{Y_t}},\bar{\bar{Y_{is}}})+2C(\bar{\bar{Y_t}},\bar{\bar{Y_{c}}})\\
&+2C(\bar{\bar{Y_{ib}}},\bar{\bar{Y_{is}}})-2C(\bar{\bar{Y_{ib}}},\bar{\bar{Y_{c}}})\\
&-2C(\bar{\bar{Y_{is}}},\bar{\bar{Y_{c}}})\\
\end{aligned}
\end{equation}
* $\tau^b_{spillover}$: the spill-over effect from treated users to control users. The estimator is $\bar{\bar{Y_{ib}}}-\bar{\bar{Y_{c}}}$, and its variance is 
\begin{equation}
\begin{aligned}
V(\tau^b_{spillover}) = V(\bar{\bar{Y_{ib}}}-\bar{\bar{Y_{c}}})=
&V(\bar{\bar{Y_{ib}}})+V(\bar{\bar{Y_{c}}})-2C(\bar{\bar{Y_{ib}}},\bar{\bar{Y_{c}}})\\
\end{aligned}
\end{equation}
* $\tau^s_{spillover}$: the spill-over effect from treated sellers to control sellers. The estimator is $\bar{\bar{Y_{is}}}-\bar{\bar{Y_{c}}}$, and its variance is 
\begin{equation}
\begin{aligned}
V(\tau^s_{spillover}) = V(\bar{\bar{Y_{is}}}-\bar{\bar{Y_{c}}})=
&V(\bar{\bar{Y_{is}}})+V(\bar{\bar{Y_{c}}})-2C(\bar{\bar{Y_{is}}},\bar{\bar{Y_{c}}})\\
\end{aligned}
\end{equation}
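
The four point estimators above are simple contrasts of cell means. The sketch below illustrates this on made-up data, assuming that each $\bar{\bar{Y}}$ is the plain average of outcomes over buyer-seller pairs in the corresponding cell, and that the cells are labelled `t` (both treated), `ib` (only the buyer treated), `is` (only the seller treated), and `c` (both control). The data values and variable names are hypothetical, and this is not necessarily how `causalmatch` computes the estimates internally.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical buyer-seller pair outcomes: (treatment_u, treatment_s, y).
pairs = [
    (1, 1, 2.0), (1, 0, 0.4), (0, 1, 0.5), (0, 0, 0.1),
    (1, 1, 2.2), (1, 0, 0.2), (0, 1, 0.3), (0, 0, -0.1),
]

# Assumed cell labels: t = both treated, ib = only buyer treated,
# is = only seller treated, c = both control.
cell_of = {(1, 1): "t", (1, 0): "ib", (0, 1): "is", (0, 0): "c"}

# Collect outcomes by cell, then average within each cell.
outcomes = defaultdict(list)
for tb, ts, y in pairs:
    outcomes[cell_of[(tb, ts)]].append(y)
ybar = {cell: fmean(ys) for cell, ys in outcomes.items()}

# The four contrasts, written exactly as in the list of estimators.
estimates = {
    "tau": ybar["t"] - ybar["c"],
    "tau_direct": ybar["t"] - ybar["ib"] - ybar["is"] + ybar["c"],
    "tau_b_spillover": ybar["ib"] - ybar["c"],
    "tau_s_spillover": ybar["is"] - ybar["c"],
}

for name, value in estimates.items():
    print(f"{name}: {value:.3f}")
```

By construction, the total effect equals the direct effect plus the two spill-over effects, which is a convenient sanity check on any implementation.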

In the formulas above, $V(\bar{\bar{Y_t}})$, $V(\bar{\bar{Y_{ib}}})$, $V(\bar{\bar{Y_{is}}})$, and $V(\bar{\bar{Y_c}})$ follow Lemma A.4, and the covariance terms follow Lemma A.5.
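
Each variance formula is the variance of a linear combination of the four cell means, so it can be written as a quadratic form $w^\top \Sigma w$, where $\Sigma$ collects the variances and covariances and $w$ holds the contrast weights. The sketch below checks the expanded $\tau_{direct}$ formula against that quadratic form. The covariance matrix is made up for illustration; in practice its entries would come from Lemmas A.4 and A.5.

```python
import math

# Hypothetical covariance matrix of the cell means, ordered (t, ib, is, c).
# Diagonal entries are variances; off-diagonal entries are covariances.
T, IB, IS, C = 0, 1, 2, 3
cov = [
    [0.040, 0.010, 0.012, 0.004],
    [0.010, 0.030, 0.003, 0.008],
    [0.012, 0.003, 0.035, 0.009],
    [0.004, 0.008, 0.009, 0.025],
]

def variance_of_contrast(weights, cov):
    """Variance of sum_i weights[i] * Ybar_i, i.e. w' Sigma w."""
    n = len(weights)
    return sum(weights[i] * weights[j] * cov[i][j]
               for i in range(n) for j in range(n))

# tau_direct = Y_t - Y_ib - Y_is + Y_c, so the weights are (1, -1, -1, 1).
var_direct = variance_of_contrast([1, -1, -1, 1], cov)

# The same variance, expanded term by term as in the formula for tau_direct.
var_direct_expanded = (
    cov[T][T] + cov[IB][IB] + cov[IS][IS] + cov[C][C]
    - 2 * cov[T][IB] - 2 * cov[T][IS] + 2 * cov[T][C]
    + 2 * cov[IB][IS] - 2 * cov[IB][C]
    - 2 * cov[IS][C]
)

# tau = Y_t - Y_c uses the weights (1, 0, 0, -1).
var_tau = variance_of_contrast([1, 0, 0, -1], cov)

print(math.isclose(var_direct, var_direct_expanded))
print(math.isclose(var_tau, cov[T][T] + cov[C][C] - 2 * cov[T][C]))
```

The sign of each covariance term is the product of the two weights involved, which is why $C(\bar{\bar{Y_t}},\bar{\bar{Y_{c}}})$ and $C(\bar{\bar{Y_{ib}}},\bar{\bar{Y_{is}}})$ enter the $\tau_{direct}$ variance with a plus sign.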



In [None]:
import sys

# Optional: make a local checkout of CausalMatch importable.
# Adjust or remove this line if the package is already installed.
sys.path.append('/Users/bytedance/PycharmProjects/github/CausalMatch')

import causalmatch
from causalmatch import matching, gen_test_data, gen_test_data_mrd
from causalmatch import mrd

print('current version is: ', causalmatch.__version__)


import pandas as pd
import numpy as np
import statsmodels.api as sm


In [2]:
# STEP 1: generate synthetic data for a two-sided experiment.
#         In the test dataset, shop_id plays the role of the seller id in Bajari et al. (2021)
#         and user_id the role of the buyer id. treatment_u is the treatment status of user_id,
#         treatment_s is the treatment status of shop_id, and y_overflow is the outcome
#         (dependent) variable whose treatment effects are estimated.

df_raw = gen_test_data_mrd(n_shops = 5
                          , n_users = 10
                          , ate = 1.5
                          , uflow = 0.2
                          , sflow = 0.3)
df_raw.head()

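| | shop_id | user_id | treatment | treatment_u | treatment_s | error | status | y_clean | y_overflow |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 0 | 0 | 1 | -0.12244 | is | -0.12244 | 0.17756 |
| 1 | 2 | 1 | 0 | 0 | 0 | 0.087021 | c | 0.087021 | 0.087021 |
| 2 | 3 | 1 | 0 | 0 | 1 | -0.016797 | is | -0.016797 | 0.283203 |
| 3 | 4 | 1 | 0 | 0 | 0 | -0.098401 | c | -0.098401 | -0.098401 |
| 4 | 5 | 1 | 0 | 0 | 0 | -0.155775 | c | -0.155775 | -0.155775 |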
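
In the five rows shown above, `y_overflow` differs from `y_clean` only where `status` is `is`, and the gap matches the `sflow = 0.3` argument. The sketch below re-checks this on those five rows, copied by hand. It is an observation read off the displayed output, not a statement about how `gen_test_data_mrd` is implemented, and it says nothing about the `t` and `ib` rows, which do not appear in the preview.

```python
# The five rows displayed by df_raw.head(), copied by hand.
rows = [
    {"status": "is", "y_clean": -0.12244,  "y_overflow": 0.17756},
    {"status": "c",  "y_clean": 0.087021,  "y_overflow": 0.087021},
    {"status": "is", "y_clean": -0.016797, "y_overflow": 0.283203},
    {"status": "c",  "y_clean": -0.098401, "y_overflow": -0.098401},
    {"status": "c",  "y_clean": -0.155775, "y_overflow": -0.155775},
]

sflow = 0.3  # value passed to gen_test_data_mrd in STEP 1

# Difference between the outcome with spill-over and the clean outcome.
shifts = [(r["status"], r["y_overflow"] - r["y_clean"]) for r in rows]

for status, shift in shifts:
    # Observed pattern: seller-only-treated ("is") rows shift by sflow,
    # pure control ("c") rows do not shift at all.
    expected = sflow if status == "is" else 0.0
    print(status, round(shift, 6), abs(shift - expected) < 1e-6)
```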


In [3]:
# STEP 2: initialize an mrd object
mrd_obj = mrd(data = df_raw,
              idb = 'user_id',
              ids = 'shop_id',
              tb  = 'treatment_u',
              ts  = 'treatment_s',
              y   = 'y_overflow')

In [4]:
# STEP 3: calculate the average treatment effect for each of the four estimator types
mrd_obj.ate()

| | parameters | mean | variance | t_stat | p_values |
|---|---|---|---|---|---|
| 0 | tau | 2.082982 | 0.127165 | 41.303428 | 9.676286e-40 |
| 1 | tau_tdirect | 1.487907 | 0.072563 | 39.057345 | 1.3841649999999998e-38 |
| 2 | tau_b_spillover | 0.278886 | 0.003458 | 33.535917 | 1.860179e-35 |
| 3 | tau_s_spillover | 0.316188 | 0.003518 | 37.695863 | 7.453885999999999e-38 |
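
Because $\tau = \tau_{direct} + \tau^b_{spillover} + \tau^s_{spillover}$ follows directly from the estimator definitions, the reported means can be sanity-checked against each other. The sketch below does this with the values copied by hand from the output above; the dictionary keys mirror the `parameters` column as printed.

```python
# Point estimates copied from the `mean` column of the mrd_obj.ate() output.
reported = {
    "tau": 2.082982,
    "tau_tdirect": 1.487907,
    "tau_b_spillover": 0.278886,
    "tau_s_spillover": 0.316188,
}

# Direct effect plus both spill-over effects should reproduce tau.
recomposed = (
    reported["tau_tdirect"]
    + reported["tau_b_spillover"]
    + reported["tau_s_spillover"]
)

# The printed values are rounded to six decimals, so allow a small gap.
gap = abs(reported["tau"] - recomposed)
print(f"recomposed tau = {recomposed:.6f}, gap = {gap:.1e}")
```

The remaining gap is of the order of the display rounding, so the four reported means are mutually consistent.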
