<a href="https://colab.research.google.com/github/Olowookere-O-O/Python-for-RCT-and-Impact-Evaluation/blob/main/Python_Pract_for_RCT.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [3]:
import pandas as pd # for data manipulation

In [4]:
import numpy as np # for numerical computation

In [8]:
# Set seed for reproducibility
np.random.seed(42)

To arrive at answers close to the author's, I used the statistical moments reported by the author in my data-generating process (DGP). While this should keep the answers close, the random draws mean the results will not match exactly.
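
As a rough illustration of that trade-off, the sketch below draws from normal distributions with the copied moments and compares the sample moments with the targets. The names `rng`, `target_mean`, `target_std`, and `sample` are illustrative and not part of the notebook, and `np.random.default_rng` is used instead of `np.random.seed`, so the draws will not match the cells here.

```python
import numpy as np

# Target moments copied from desired_mean and desired_std in this notebook.
target_mean = np.array([0.575, 20.744, 0.074])
target_std = np.array([0.042, 2.013, 0.040])

rng = np.random.default_rng(42)  # illustrative seed, an assumption

# One column per variable, 324 rows (the notebook's sample size).
sample = rng.normal(loc=target_mean, scale=target_std, size=(324, 3))

# Sample moments land close to the targets but not exactly on them.
sample_mean = sample.mean(axis=0)
sample_std = sample.std(axis=0, ddof=1)
print(np.round(sample_mean - target_mean, 4))
print(np.round(sample_std - target_std, 4))
```

The printed gaps are the sampling error that remains even when the moments are fixed in advance.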

In [9]:
desired_mean = [0.575, 20.744, 0.074] ## Mean copied from the author

In [10]:
desired_std = [0.042, 2.013, 0.040] ## Std copied from the author

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [11]:
# Generate random data
num_rows = 324  # to match the provided DataFrame
gender = np.random.choice([0, 1], size=num_rows)
cross_sell_email = np.random.choice(['short', 'long', 'no_email'], size=num_rows)
age = np.random.normal(loc=desired_mean[1], scale=desired_std[1], size=num_rows)
conversion = np.random.normal(loc=desired_mean[2], scale=desired_std[2], size=num_rows)
# gender is already coded as 0 and 1, so no further mapping is needed


In [13]:
# Clip age values to ensure they are within a reasonable range
#data['age'] = np.clip(data['age'], 15, 40)

In [14]:
# Create DataFrame
data = pd.DataFrame({'gender': gender, 'cross_sell_email': cross_sell_email, 'age': age, 'conversion': conversion})

cross_sell_email is the treatment variable, with the treatment arms typical of an RCT; the customer is the unit of analysis; and the marginal effect on conversion is the outcome of interest. Age is a unit characteristic.

In [None]:
print(data)

     gender cross_sell_email        age  conversion
0         1            short  20.094895    0.017182
1         0         no_email  20.576842    0.080169
2         0            short  22.441448    0.088470
3         1            short  22.102103    0.072325
4         0             long  16.149769    0.091419
..      ...              ...        ...         ...
319       1             long  20.930821    0.022617
320       1         no_email  21.253771    0.082308
321       1         no_email  21.253870    0.082682
322       0         no_email  21.828827    0.074768
323       0            short  20.676718    0.062375

[324 rows x 4 columns]


In [None]:
(data.groupby(["cross_sell_email"]).mean())

cross_sell_email,gender,age,conversion
long,0.514286,20.639692,0.077049
no_email,0.540984,21.122053,0.07419
short,0.494845,21.058725,0.072464


In [15]:
## ATE = E[Y|T= 1] - E[Y|T = 0]
Trt_Grp_mean_long = data[data['cross_sell_email'] == 'long']['conversion'].mean() ##E[Y|T= 1]
CtlGrp_mean_no_email = data[data['cross_sell_email'] == 'no_email']['conversion'].mean() ##E[Y|T = 0]
secd_trt_arms = data[data['cross_sell_email'] == 'short']['conversion'].mean() ##E[Y|T= 1]
First_ATE = Trt_Grp_mean_long - CtlGrp_mean_no_email
Second_ATE = secd_trt_arms - CtlGrp_mean_no_email

print(First_ATE, Second_ATE)

0.00889261084819884 0.00040954024638872877
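
Under randomisation, the simple difference in group means estimates the ATE of each arm relative to the control. A minimal sketch of the same calculation for all arms at once follows; the helper `ate_vs_control` and the toy data are hypothetical and only illustrate the idea.

```python
import pandas as pd

def ate_vs_control(df, treatment_col, outcome_col, control):
    """Difference in mean outcome between each arm and the control arm."""
    means = df.groupby(treatment_col)[outcome_col].mean()
    return (means - means.loc[control]).drop(control)

# Tiny made-up dataset, for illustration only.
toy = pd.DataFrame({
    "cross_sell_email": ["long", "long", "short", "short", "no_email", "no_email"],
    "conversion": [0.10, 0.12, 0.07, 0.09, 0.06, 0.08],
})
ate = ate_vs_control(toy, "cross_sell_email", "conversion", control="no_email")
print(ate)
```

On the notebook's data the call would be `ate_vs_control(data, "cross_sell_email", "conversion", control="no_email")`.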


In [None]:
## Relative to no email, the long email raises average conversion by about 0.29 percentage points,
## while the short email lowers it by about 0.17 percentage points (see the group means below)
data.groupby(["cross_sell_email"]).mean()

cross_sell_email,gender,age,conversion
long,0.514286,20.639692,0.077049
no_email,0.540984,21.122053,0.07419
short,0.494845,21.058725,0.072464


While an RCT ensures baseline equivalence through randomisation, this holds only in theory. In practice, a given sample may still be imbalanced. To check baseline equivalence, we compute the normalised difference between the treatment group and the control group: $\frac{\mu_{tr} - \mu_{ctr}}{\sqrt{(\sigma^2_{tr} + \sigma^2_{ctr})/2}}$

In [None]:
X = ["gender", "age"]

In [None]:
mu = data.groupby("cross_sell_email")[X].mean()
var = data.groupby("cross_sell_email")[X].var()
norm_diff = ((mu - mu.loc["no_email"])/np.sqrt((var + var.loc["no_email"])/2))
norm_diff ## A normalised difference below 0.5 in absolute value is a common rule of thumb, so there is little to worry about here

cross_sell_email,gender,age
long,-0.053259,-0.248957
no_email,0.0,0.0
short,-0.092005,-0.033881


Note: If the differences between the groups are within the acceptance region, there is no problem. Moreover, purely random differences are not a concern: by the law of large numbers they shrink as the sample size grows, and by the central limit theorem (CLT) the sampling distribution of the group means approaches a normal distribution.  
Now, since uncertainty is inherent in data, it is essential to compute the margin of error and to test whether the difference between the groups' outcomes is indeed significant. To do this, we compute the standard error of the estimate: $\sigma/\sqrt{n}$.
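
To see where $\sigma/\sqrt{n}$ comes from, the sketch below repeatedly draws samples and compares the spread of the sample means with the formula. The values of `sigma`, `n`, and `n_repeats`, and the seed, are assumptions chosen only for this demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)            # illustrative seed
sigma, n, n_repeats = 0.04, 100, 2000     # assumed values for the demonstration

# Draw n_repeats independent samples of size n and record each sample mean.
sample_means = rng.normal(loc=0.074, scale=sigma, size=(n_repeats, n)).mean(axis=1)

empirical_se = sample_means.std(ddof=1)   # spread of the simulated sample means
theoretical_se = sigma / np.sqrt(n)       # sigma / sqrt(n)
print(empirical_se, theoretical_se)
```

The two numbers should be close, which is why a single sample's `std() / sqrt(n)` is a reasonable estimate of how much the mean would vary across repeated experiments.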

In [16]:
## Begin by subsetting the data
short_email = data.query("cross_sell_email=='short'")["conversion"]
long_email = data.query("cross_sell_email=='long'")["conversion"]
email = data.query("cross_sell_email!='no_email'")["conversion"]
no_email = data.query("cross_sell_email=='no_email'")["conversion"]

In [17]:
data.groupby("cross_sell_email").size() ## Check the sample size of each group

cross_sell_email
long        111
no_email     98
short       115
dtype: int64

In [None]:
## Write a function for the standard error (SE)
## Note that pandas has its own standard error method, .sem()
def se(y: pd.Series): return y.std() / np.sqrt(len(y)) ## Written by hand for practice

In [None]:
print("SE Long Email:", se(long_email), "SE Short Email:", se(short_email))

SE Long Email: 0.003963125760570597 SE Short Email: 0.003711898313291993


In [None]:
exp_se = short_email.sem()
exp_mu = short_email.mean()
ci = (exp_mu - 2 * exp_se, exp_mu + 2 * exp_se)
print(ci)

(0.06504023064720528, 0.07988782390037326)
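
The factor of 2 used above is a rounded version of the 97.5th percentile of the standard normal distribution. A small sketch with the standard library shows the exact multiplier; the helper name `normal_ci` is hypothetical.

```python
from statistics import NormalDist

def normal_ci(mean, se, level=0.95):
    """Normal-approximation confidence interval around a mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided critical value
    return (mean - z * se, mean + z * se)

z95 = NormalDist().inv_cdf(0.975)
print(round(z95, 2))  # 1.96
```

Using 2 instead of 1.96 makes the interval slightly wider, so it is a mildly conservative shortcut.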


In [None]:
def ci(y: pd.Series):
    # Approximate 95% confidence interval: mean plus or minus 2 standard errors
    return (y.mean() - 2 * y.sem(), y.mean() + 2 * y.sem())

In [None]:
print("95% CI for Short Email:", ci(short_email))
print("95% CI for Long Email:", ci(long_email))
print("95% CI for No Email:", ci(no_email))

95% CI for Short Email: (0.06504023064720528, 0.07988782390037326)
95% CI for Long Email: (0.06912303758857229, 0.08497554063085468)
95% CI for No Email: (0.06677101690836355, 0.08160947775983803)
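
The intervals above describe each group separately. To judge whether the difference between two groups is significant, one common approach is a z-test on the difference in means, with the standard error of the difference taken as $\sqrt{SE_1^2 + SE_2^2}$. The sketch below is one way to do this under a normal approximation; the function name `diff_in_means_test` and the toy series are hypothetical.

```python
import math
import pandas as pd

def diff_in_means_test(treated: pd.Series, control: pd.Series):
    """Difference in means, its standard error, z statistic, two-sided p-value, 95% CI."""
    diff = treated.mean() - control.mean()
    se_diff = math.sqrt(treated.sem() ** 2 + control.sem() ** 2)
    z = diff / se_diff
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value under the normal approximation
    ci_diff = (diff - 1.96 * se_diff, diff + 1.96 * se_diff)
    return diff, se_diff, z, p_value, ci_diff

# Made-up values, for illustration only.
toy_control = pd.Series([0.10, 0.12, 0.11, 0.13])
toy_treated = toy_control + 0.5
print(diff_in_means_test(toy_treated, toy_control))
```

On the notebook's data the call would be `diff_in_means_test(long_email, no_email)`; if the resulting interval contains zero, the difference is not significant at the 5% level.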


In [None]:
pwd

'/content'

In [None]:
from google.colab import drive

In [None]:
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
drive.mount("/content/drive", force_remount=True)

Mounted at /content/drive

In [None]:
!git clone https://github.com/Olowookere-O-O/Python-for-RCT-and-Impact-Evaluation.git

Cloning into 'Python-for-RCT-and-Impact-Evaluation'...


In [26]:
ls

sample_data/


In [None]:
!git add cross_sell_email.csv

In [None]:
!git commit -m "Add cross_sell_email.csv"

Author identity unknown

*** Please tell me who you are.

Run

  git config --global user.email "you@example.com"
  git config --global user.name "Your Name"

to set your account's default identity.
Omit --global to set the identity only in this repository.

fatal: unable to auto-detect email address (got 'root@98b747a19c6b.(none)')


In [None]:
!git config --global user.mail "olowookereolawale1993@gmail.com"  # typo: git reads user.email, so the identity stays unset and the next commit still fails

In [None]:
!git config --global user.name "Olowookere-O-O"

In [None]:
!git push origin main  # or the branch you are working on

error: src refspec main does not match any
error: failed to push some refs to 'https://github.com/Olowookere-O-O/Python-for-RCT-and-Impact-Evaluation.git'

In [None]:
!git commit -m "first commit"

Author identity unknown

*** Please tell me who you are.

Run

  git config --global user.email "you@example.com"
  git config --global user.name "Your Name"

to set your account's default identity.
Omit --global to set the identity only in this repository.

fatal: unable to auto-detect email address (got 'root@98b747a19c6b.(none)')


In [None]:
!git config --global user.email "olowookereolawale1993@gmail.com"

In [None]:
!git config --global user.name "Olowookere-O-O"

In [None]:
!git branch -M main

In [None]:
!git remote add origin https://github.com/Olowookere-O-O/Python-for-RCT-and-Impact-Evaluation.git

error: remote origin already exists.


In [None]:
!git branch -M main

In [None]:
!git push -u origin main

error: src refspec main does not match any
error: failed to push some refs to 'https://github.com/Olowookere-O-O/Python-for-RCT-and-Impact-Evaluation.git'

In [None]:
!git branch your-branch-DGP

fatal: Not a valid object name: 'main'.


In [None]:
ls

sample_data/


In [None]:
pwd

'/content'

In [None]:
! git

usage: git [--version] [--help] [-C <path>] [-c <name>=<value>]
           [--exec-path[=<path>]] [--html-path] [--man-path] [--info-path]
           [-p | --paginate | -P | --no-pager] [--no-replace-objects] [--bare]
           [--git-dir=<path>] [--work-tree=<path>] [--namespace=<name>]
           [--super-prefix=<path>] [--config-env=<name>=<envvar>]
           <command> [<args>]

These are common Git commands used in various situations:

start a working area (see also: git help tutorial)
   clone     Clone a repository into a new directory
   init      Create an empty Git repository or reinitialize an existing one

work on the current change (see also: git help everyday)
   add       Add file contents to the index
   mv        Move or rename a file, a directory, or a symlink
   restore   Restore working tree files
   rm        Remove files from the working tree and from the index

examine the history and state (see also: git help revisions)
   bisect    Use binary search to find th

In [None]:
! git init

hint: Using 'master' as the name for the initial branch. This default branch name
hint: is subject to change. To configure the initial branch name to use in all
hint: 
hint: 	git config --global init.defaultBranch <name>
hint: 
hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
hint: 'development'. The just-created branch can be renamed via this command:
hint: 
hint: 	git branch -m <name>
Initialized empty Git repository in /content/.git/


In [None]:
! git clone https://github.com/Olowookere-O-O/Python-for-RCT-and-Impact-Evaluation.git

Cloning into 'Python-for-RCT-and-Impact-Evaluation'...


In [None]:
pwd

'/content'

In [None]:
cd Python-for-RCT-and-Impact-Evaluation/

/content/Python-for-RCT-and-Impact-Evaluation


In [None]:
! git remote -v

origin	https://github.com/Olowookere-O-O/Python-for-RCT-and-Impact-Evaluation.git (fetch)
origin	https://github.com/Olowookere-O-O/Python-for-RCT-and-Impact-Evaluation.git (push)


In [None]:
! git status

On branch main

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	Python_notebook

nothing added to commit but untracked files present (use "git add" to track)


In [None]:
! git add .

In [None]:
! git status

On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   Python_notebook



In [None]:
!git commit -a -m "first commit"

Author identity unknown

*** Please tell me who you are.

Run

  git config --global user.email "you@example.com"
  git config --global user.name "Your Name"

to set your account's default identity.
Omit --global to set the identity only in this repository.

fatal: unable to auto-detect email address (got 'root@8cf9cbc23c70.(none)')


In [None]:
! git config --global user.email "olowookereolawale1993@gmail.com"
! git config --global user.name "Olowookere-O-O"

In [None]:
! git config --list

filter.lfs.clean=git-lfs clean -- %f
filter.lfs.smudge=git-lfs smudge -- %f
filter.lfs.process=git-lfs filter-process
filter.lfs.required=true
user.email=olowookereolawale1993@gmail.com
user.name=Olowookere-O-O
core.repositoryformatversion=0
core.filemode=true
core.bare=false
core.logallrefupdates=true
remote.origin.url=https://github.com/Olowookere-O-O/Python-for-RCT-and-Impact-Evaluation.git
remote.origin.fetch=+refs/heads/*:refs/remotes/origin/*
branch.main.remote=origin
branch.main.merge=refs/heads/main


In [None]:
! git commit -a -m "first commit"

[main (root-commit) a72481f] first commit
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 Python_notebook


In [None]:
username = input("Enter username: ")

Enter username: Olowookere-O-O


In [None]:
from getpass import getpass

In [None]:
password = getpass("Enter password: ")

Enter password: ··········


In [None]:
! git remote add origin https://$username:$password@github.com/Olowookere-O-O/Python-for-RCT-and-Impact-Evaluation.git

In [None]:
! git remote rm origin

In [None]:
! git remote add origin https://$username:$password@github.com/Olowookere-O-O/Python-for-RCT-and-Impact-Evaluation.git

In [None]:
!git push origin main

remote: Support for password authentication was removed on August 13, 2021.
remote: Please see https://docs.github.com/en/get-started/getting-started-with-git/about-remote-repositories#cloning-with-https-urls for information on currently recommended modes of authentication.
fatal: Authentication failed for 'https://github.com/Olowookere-O-O/Python-for-RCT-and-Impact-Evaluation.git/'


In [None]:
print(data)

NameError: name 'data' is not defined

In [23]:
print(no_email)

2      0.099734
3      0.136588
6      0.048805
9      0.055189
10     0.057902
         ...   
297    0.049849
306    0.056033
308    0.051293
313    0.078307
322    0.029337
Name: conversion, Length: 98, dtype: float64


In [24]:
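## Power calculation: sample size per group needed to detect a difference of 0.08
## with 80% power at the 5% significance level, using the rule of thumb n = 16 * sigma^2 / delta^2.
## Only the last expression in a cell is displayed, so wrap the next line in print() to see its value.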
np.ceil(16 * (no_email.std()/0.08)**2)
data.groupby("cross_sell_email").size()

cross_sell_email
long        111
no_email     98
short       115
dtype: int64
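
The `16` in the power calculation is a rounded constant. For a two-sided test at the 5% level with 80% power, the usual normal-approximation formula for the per-group sample size is $n = 2(z_{1-\alpha/2} + z_{power})^2 \sigma^2 / \delta^2$, and $2(1.96 + 0.84)^2 \approx 15.7$. The sketch below checks this with the standard library; the function name `sample_size_per_group` is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided test of a difference in means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * (sigma / delta) ** 2)

# The constant that the rule of thumb rounds up to 16.
multiplier = 2 * (NormalDist().inv_cdf(0.975) + NormalDist().inv_cdf(0.80)) ** 2
print(round(multiplier, 1))  # 15.7
```

With the notebook's data the call would be `sample_size_per_group(no_email.std(), 0.08)`. Comparing that number with the group sizes shown above indicates whether the experiment is large enough to detect an effect of that size.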

In [25]:
print(data)


     gender cross_sell_email        age  conversion
0         0             long  19.505511    0.063247
1         1             long  21.514575    0.056405
2         0         no_email  21.734680    0.099734
3         0         no_email  17.669238    0.136588
4         0            short  21.107331    0.122199
..      ...              ...        ...         ...
319       1             long  21.401090    0.083732
320       1            short  19.924821    0.060252
321       0            short  17.820713    0.135968
322       1         no_email  22.381955    0.029337
323       1            short  17.722156    0.027146

[324 rows x 4 columns]
