[Probability and Statistics for Data Science](https://www.ps4ds.net/) 

Code for Examples 7.64 and 7.66

Causal inference analysis of the effect of private lessons on student grades, adjusting for a possible confounding factor: whether the student has previously failed the course.\
Topics: Causal inference, average treatment effect, confounding factor, adjusting for confounders 

Author: Carlos Fernandez-Granda\
Data source: https://archive.ics.uci.edu/dataset/320/student+performance
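
The two estimates computed in the code below can be summarized as follows. The notation is an assumption introduced here for illustration and may differ from the book's: $\hat{\mu}_{t}$ is the sample mean grade of students with treatment $t$ (1 = private lessons, 0 = none), $\hat{\mu}_{t,c}$ is the same mean restricted to students with confounder value $c$ (1 = previously failed, 0 = did not), and $\hat{p}_c$ is the fraction of all students with confounder value $c$.

$$
\widehat{\mathrm{ATE}}_{\text{naive}} = \hat{\mu}_{1} - \hat{\mu}_{0},
\qquad
\widehat{\mathrm{ATE}}_{\text{adj}} = \sum_{c \in \{0,1\}} \hat{p}_c \left( \hat{\mu}_{1,c} - \hat{\mu}_{0,c} \right)
$$

The adjusted estimate weights each stratum by its share of the whole sample, so both treatment groups are compared on the same mix of students.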

In [15]:
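import pandas as pd
import numpy as np

# UCI Student Performance data (math course); the file is semicolon-separated
data = pd.read_csv("../data/student_grades/student-mat.csv", encoding="latin-1", sep=";")

np.set_printoptions(precision=3)

# G3 is the final grade
grades = data["G3"]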

In [16]:
# Treatment: whether the student received paid private classes ("paid" column)
private = np.sum(data["paid"] == "yes")
prob_private = private / len(data["paid"])
print("Fraction with private classes",np.round(prob_private,4))

# Naive estimate: difference in mean grade between treated and untreated students
mean_private = np.mean(grades[data["paid"] == "yes"])
mean_noprivate = np.mean(grades[data["paid"] == "no"])
print("Private lessons, sample mean of grade: ",np.round(mean_private,4))
print("No private lessons, sample mean of grade: ",np.round(mean_noprivate,4))

print("Naive average treatment effect: ",np.round(mean_private - mean_noprivate,4))

# Possible confounder: whether the student has any previous failures
mean_failures = np.mean(grades[data["failures"] > 0])
mean_nofailures = np.mean(grades[data["failures"] == 0])

print("Sample mean if failed: ",np.round(mean_failures,4))
print("Sample mean if didn't fail: ",np.round(mean_nofailures,4))

# Fraction of all students with at least one previous failure
failures = np.sum(data["failures"] > 0)
no_failures = np.sum(data["failures"] == 0)
prob_failures = failures / (failures + no_failures)
print("Fraction of failures:",np.round(prob_failures,4))

# Stratum 1: students with at least one previous failure
grades_cond = grades[data["failures"] > 0]
paid_cond = data["paid"][data["failures"] > 0]

print("\nStudents who previously failed:")
private_failures = np.sum(paid_cond == "yes")
prob_private_failures = private_failures / failures
print("Received private classes",private_failures)
print("Didn't receive private classes",len(paid_cond) - np.sum(paid_cond == "yes"))
print("Fraction with private classes",np.round(prob_private_failures,4))
mean_private_failures = np.mean(grades_cond[paid_cond == "yes"])
mean_noprivate_failures = np.mean(grades_cond[paid_cond == "no"])
print("Sample mean grade (private lessons): ",np.round(mean_private_failures,4))
print("Sample mean grade (no private lessons): ",np.round(mean_noprivate_failures,4))

# Stratum 2: students with no previous failures
grades_cond = grades[data["failures"] == 0]
paid_cond = data["paid"][data["failures"] == 0]

print("\nStudents who did not previously fail:")
private_nofailures = np.sum(paid_cond == "yes")
prob_private_nofailures = private_nofailures / no_failures
print("Received private classes",private_nofailures)
print("Didn't receive private classes",len(paid_cond) - np.sum(paid_cond == "yes"))
print("Fraction with private classes",np.round(prob_private_nofailures,4))
mean_private_nofailures = np.mean(grades_cond[paid_cond == "yes"])
mean_noprivate_nofailures = np.mean(grades_cond[paid_cond == "no"])
print("Sample mean grade (private lessons): ",np.round(mean_private_nofailures,4))
print("Sample mean grade (no private lessons): ",np.round(mean_noprivate_nofailures,4))

# Adjust for the confounder: average the stratum means of each treatment group,
# weighted by the overall fraction of students in each stratum
# (not by the fraction within that treatment group)
adjusted_mean_private = prob_failures * mean_private_failures + (1 - prob_failures) * mean_private_nofailures
print("\nAdjusted mean grade (private lessons)",np.round(adjusted_mean_private,4))
adjusted_mean_noprivate = prob_failures * mean_noprivate_failures + (1 - prob_failures) * mean_noprivate_nofailures
print("Adjusted mean grade (no private lessons)",np.round(adjusted_mean_noprivate,4))
print("Adjusted average treatment effect",np.round(adjusted_mean_private - adjusted_mean_noprivate,4))

Fraction with private classes 0.4582
Private lessons, sample mean of grade:  10.9227
No private lessons, sample mean of grade:  9.986
Naive average treatment effect:  0.9367
Sample mean if failed:  7.2651
Sample mean if didn't fail:  11.2532
Fraction of failures: 0.2101

Students who previously failed:
Received private classes 22
Didn't receive private classes 61
Fraction with private classes 0.2651
Sample mean grade (private lessons):  8.9545
Sample mean grade (no private lessons):  6.6557

Students who did not previously fail:
Received private classes 159
Didn't receive private classes 153
Fraction with private classes 0.5096
Sample mean grade (private lessons):  11.195
Sample mean grade (no private lessons):  11.3137

Adjusted mean grade (private lessons) 10.7242
Adjusted mean grade (no private lessons) 10.335
Adjusted average treatment effect 0.3892
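
As a sanity check on the numbers above, the following sketch recomputes both estimates from the printed stratum counts and means alone, without reloading the data. The `strata` dictionary and the helper names are introduced here for illustration and are not part of the original notebook. Because the inputs are rounded to four decimals, the results should agree with the printed values only up to rounding.

```python
# Per-stratum counts and sample mean grades, copied from the printed output.
# Keys: (previously_failed, private_lessons) -> (count, mean grade)
strata = {
    (True, True): (22, 8.9545),
    (True, False): (61, 6.6557),
    (False, True): (159, 11.195),
    (False, False): (153, 11.3137),
}


def group_mean(private):
    """Mean grade of a treatment group: stratum means weighted by the
    stratum counts *within that group*."""
    rows = [(n, m) for (_, p), (n, m) in strata.items() if p == private]
    return sum(n * m for n, m in rows) / sum(n for n, _ in rows)


def adjusted_mean(private):
    """Mean grade of a treatment group: stratum means weighted by the
    stratum shares *in the whole sample*."""
    total = sum(n for n, _ in strata.values())
    result = 0.0
    for failed in (True, False):
        stratum_size = sum(n for (f, _), (n, _) in strata.items() if f == failed)
        result += stratum_size / total * strata[(failed, private)][1]
    return result


naive_ate = group_mean(True) - group_mean(False)
adjusted_ate = adjusted_mean(True) - adjusted_mean(False)

# Share of previously failed students within each treatment group
failed_share_private = 22 / (22 + 159)
failed_share_noprivate = 61 / (61 + 153)

print("Naive ATE recomputed from strata:", round(naive_ate, 3))
print("Adjusted ATE recomputed from strata:", round(adjusted_ate, 3))
print("Previously failed, among private:", round(failed_share_private, 3))
print("Previously failed, among no private:", round(failed_share_noprivate, 3))
```

The two helpers differ only in their weights. Students with private lessons include a smaller share of previously failed students (22 of 181) than students without (61 of 214), so the naive difference partly reflects that imbalance and is larger than the adjusted estimate.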
