# Simulating a pricing problem - a walkthrough

# 1. Data simulation: a predictor of the outcome

In [1]:
from numpy.random import choice, normal, seed
import statsmodels.formula.api as smf
import random
import pandas as pd

seed(1234)

n = 100

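# z predicts the outcome y but is drawn independently of the treatment x
z = normal(loc=0, scale=.5, size=n)
x = normal(loc=0, scale=.5, size=n)
# True effect of x on y is 3; z adds its own effect of 3 plus noise
y = 3*x + 3*z + normal(loc=0, scale=.5, size=n)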

df = pd.DataFrame({'x': x, 'y': y, 'z': z})

# 2. Controlling for a predictor of the outcome can reduce the ATE’s standard error

In [2]:
smf.ols(formula='y ~ x', data=df).fit().summary()

In [3]:
smf.ols(formula="y ~ x + z", data=df).fit().summary()

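The standard error of the coefficient on ‘x’ (the square root of the
estimator's variance) is much lower in the second model than in the first.
Because ‘z’ explains a large share of the variation in ‘y’ and is independent
of ‘x’, including it shrinks the residual variance and gives a more precise
estimate of the effect of ‘x’.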
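
To compare the two fits directly instead of reading both summary tables, one option is to pull the standard error of ‘x’ from each result's `bse` attribute. The sketch below is self-contained and regenerates data of the same shape with its own random generator (`default_rng(1234)` and the name `sim` are assumptions made here, so the numbers will not match the cells above exactly).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Assumption: a fresh generator, so draws differ from the notebook's seed(1234)
rng = np.random.default_rng(1234)
n = 100

# z drives the outcome y but is independent of the treatment x
z = rng.normal(loc=0, scale=.5, size=n)
x = rng.normal(loc=0, scale=.5, size=n)
y = 3*x + 3*z + rng.normal(loc=0, scale=.5, size=n)
sim = pd.DataFrame({'x': x, 'y': y, 'z': z})

# bse holds the standard error of each coefficient, indexed by term name
se_without_z = smf.ols(formula='y ~ x', data=sim).fit().bse['x']
se_with_z = smf.ols(formula='y ~ x + z', data=sim).fit().bse['x']

print(f"SE of x without z: {se_without_z:.3f}")
print(f"SE of x with z:    {se_with_z:.3f}")
```

With this data-generating process the second number should come out clearly smaller, since ‘z’ absorbs variance that would otherwise sit in the residual.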

# 3. Data simulation: a predictor of the treatment variable

In [4]:
n = 100

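# z now predicts the treatment: x is z plus very small noise
z = normal(loc=0, scale=.5, size=n)
x = z + normal(loc=0, scale=.01, size=n)
# z affects y only through x; the true effect of x is still 3
y = 3*x + normal(loc=0, scale=.5, size=n)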

df = pd.DataFrame({'x': x, 'y': y, 'z': z})

# 4. Controlling for a predictor of the treatment increases the ATE’s standard error

In [5]:
smf.ols(formula='y ~ x', data=df).fit().summary()

# 5. Obtaining a linear model of bonuses and ghosts

In [6]:
smf.ols(formula='y ~ x + z', data=df).fit().summary()

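The standard error of the coefficient on ‘x’ is higher when controlling
for ‘z’. Here ‘x’ is almost an exact copy of ‘z’ (it is ‘z’ plus noise with
standard deviation 0.01), so the two regressors are nearly collinear and very
little independent variation in ‘x’ is left to estimate its effect.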
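
One way to see how much precision is lost is to compare the two standard errors side by side and relate them to the correlation between ‘x’ and ‘z’. The sketch below is self-contained and uses its own random generator (`default_rng(1234)` and the names `sim`, `corr_xz` and `vif_x` are assumptions made here, so the values differ from the cells above). It relies on the fact that, with exactly two regressors, the variance inflation factor of ‘x’ equals 1 / (1 - corr(x, z)^2).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Assumption: a fresh generator, so draws differ from the notebook's seed(1234)
rng = np.random.default_rng(1234)
n = 100

# x is z plus very small noise, so the two regressors are nearly collinear
z = rng.normal(loc=0, scale=.5, size=n)
x = z + rng.normal(loc=0, scale=.01, size=n)
y = 3*x + rng.normal(loc=0, scale=.5, size=n)
sim = pd.DataFrame({'x': x, 'y': y, 'z': z})

# Standard error of the coefficient on x, with and without the control
se_without_z = smf.ols(formula='y ~ x', data=sim).fit().bse['x']
se_with_z = smf.ols(formula='y ~ x + z', data=sim).fit().bse['x']

# Two-regressor case: VIF of x is 1 / (1 - corr(x, z)^2)
corr_xz = sim['x'].corr(sim['z'])
vif_x = 1 / (1 - corr_xz**2)

print(f"SE of x without z: {se_without_z:.3f}")
print(f"SE of x with z:    {se_with_z:.3f}")
print(f"corr(x, z): {corr_xz:.5f}, VIF of x: {vif_x:.0f}")
```

Because ‘z’ adds almost no explanatory power for ‘y’ once ‘x’ is in the model, the residual variance barely changes between the two fits, so the ratio of the two standard errors should be roughly the square root of `vif_x`.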