# Pre vs Post Treatment Tests
- **Objective**: This test compares two states of the samples: before and after an intervention or treatment has been applied.
- Reference: 
    - [My Easy Guide to Pre vs. Post Treatment Tests](https://towardsdatascience.com/my-easy-guide-to-pre-vs-post-treatment-tests-0206f56f83a4)
## Problem Statement
- A grocery store chain observes a sales spike for some coffee brands and wants to test whether doubling the shelf facings of these new best-performing brands increases sales.
    - To do so, it selects some stores at random as a treatment group and applies the change there.
- Summarizing
    - **Control Group**: Stores without change in the coffee section
    - **Treatment Group**: Stores with the redesigned coffee section
    - **Pre-Period**: a period before the intervention. 
        - The size of this period must be chosen taking into consideration the seasonality of the business and any other aspects that can affect the results, like sales, promotions, holidays, and weekends.
    - **Post-Period**: the period after the intervention.
        - Its length should be chosen with the same considerations as the pre-period.
- Hypothesis:
    - H0 (null hypothesis): the treatment is not effective, i.e. it does not increase sales.
    - Ha (alternative hypothesis): the treatment is effective, i.e. it increases sales.
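
One way to make these hypotheses concrete, assuming the test compares the average per-store change in sales (post minus pre) between the two groups. The symbols $\Delta_T$, $\Delta_C$, and $\delta$ are notation introduced here for illustration, not taken from the source:

```latex
\delta = \mathbb{E}[\Delta_T] - \mathbb{E}[\Delta_C], \qquad
H_0 : \delta = 0, \qquad
H_a : \delta > 0
```

Here $\Delta_T$ and $\Delta_C$ denote the post-minus-pre change in mean sales for a treatment store and a control store. The one-sided alternative reflects the question of whether sales increase; a two-sided alternative ($\delta \neq 0$) would instead test for any change.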

In [2]:
import numpy as np
import pandas as pd
import scipy.stats as scs

## Dataset
- Simulated data that mimics a pre/post intervention scenario.

In [11]:
# Dates: 30 days, 2024-01-01 through 2024-01-30
dates = pd.date_range(start='2024-01-01', end='2024-01-30')

# Store IDs: 1000 stores in total
store_ids = range(1, 1001)

# Randomly assign each store to the Control or Treatment group (50/50)
np.random.seed(42)
group = np.random.choice(a=['Control', 'Treatment'], size=len(store_ids), p=[0.50, 0.50])

# Create dataframe of daily revenue: 30,000 rows built by tiling dates, store IDs, and groups
df = pd.DataFrame({'dt': list(dates)*1000, 'store_id': list(store_ids)*30, 'group': list(group)*30})

# Simulated sales: Control and pre-period Treatment share the same distribution
sales_control = np.random.normal(loc=100, scale=20, size=len(df[df['group'] == 'Control']))
sales_treatment_before = np.random.normal(loc=100, scale=20, size=len(df[(df.group == 'Treatment') & (df.dt <= '2024-01-15')]))
# After the cut-off, the Treatment mean rises (loc=104) and the spread narrows (scale=10)
sales_treatment_after = np.random.normal(loc=104, scale=10, size=len(df[(df.group == 'Treatment') & (df.dt > '2024-01-15')]))

# Add sales to the data: sort so the row order matches the concatenation order
# (all Control rows first, then Treatment rows by date: pre-period, then post-period)
df = df.sort_values(['group', 'dt'])
df['sales'] = np.concatenate([sales_control, sales_treatment_before, sales_treatment_after])
df = df.sort_values(['dt', 'store_id'])

# View a random sample of the dataset
df.sample(8).sort_values(['dt', 'store_id'])

| | dt | store_id | group | sales |
|---|---|---|---|---|
| 1112 | 2024-01-03 | 113 | Treatment | 93.02296 |
| 23432 | 2024-01-03 | 433 | Treatment | 128.985849 |
| 27555 | 2024-01-16 | 556 | Control | 85.306578 |
| 619 | 2024-01-20 | 620 | Treatment | 115.289692 |
| 4012 | 2024-01-23 | 13 | Treatment | 109.965434 |
| 7494 | 2024-01-25 | 495 | Control | 78.30064 |
| 1794 | 2024-01-25 | 795 | Control | 93.938146 |
| 9898 | 2024-01-29 | 899 | Control | 83.07306 |
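
Note that tiling a 30-element list and a 1000-element list to the same length does not pair every store with every date: a (date, store) combination repeats every 3,000 rows (the least common multiple of 30 and 1,000), so each store ends up with only a few distinct dates. If a full store-by-day panel is wanted, a minimal sketch is to build the Cartesian product explicitly. The names `tiled`, `panel`, and `group_by_store` below are illustrative and not from the source:

```python
import numpy as np
import pandas as pd

dates = pd.date_range(start='2024-01-01', end='2024-01-30')
store_ids = range(1, 1001)

np.random.seed(42)
group = np.random.choice(a=['Control', 'Treatment'], size=len(store_ids), p=[0.50, 0.50])

# Tiled construction (as in the cell above): count the distinct (date, store) pairs
tiled = pd.DataFrame({'dt': list(dates) * 1000, 'store_id': list(store_ids) * 30})
n_tiled_pairs = len(tiled.drop_duplicates())

# Cartesian product: exactly one row per (date, store) pair
panel = (pd.MultiIndex.from_product([dates, store_ids], names=['dt', 'store_id'])
         .to_frame(index=False))

# Map each store to its group so the assignment stays constant per store
group_by_store = pd.Series(group, index=list(store_ids))
panel['group'] = panel['store_id'].map(group_by_store)

# Each store should now appear on all 30 days
days_per_store = panel.groupby('store_id')['dt'].nunique()
print(n_tiled_pairs, len(panel), days_per_store.min(), days_per_store.max())
```

Swapping this construction in would change the random draws, so the figures shown in this section would no longer match exactly.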


## Power Analysis
- Find the minimum sample size (number of stores per group) needed to detect the chosen effect size.

In [8]:
from statsmodels.stats.power import TTestIndPower

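# Parameters for the power analysis
effect = 0.2  # standardized effect size (Cohen's d); must be positive
alpha = 0.05  # significance level
power = 0.8   # desired statistical power

# Solve for the required number of observations per group (nobs1)
pwr = TTestIndPower()

result = pwr.solve_power(effect_size=effect, nobs1=None, alpha=alpha,
                         power=power, ratio=1)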

print(result)

393.4056989990348


- The analysis calls for about 394 stores per group. Since we have around 500 stores in each group, we’re covered.
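
To see where a number of this size comes from, the normal-approximation formula for a two-sided, two-sample test, $n \approx 2\,((z_{1-\alpha/2} + z_{\text{power}})/d)^2$ per group, can be evaluated with the standard library alone. This is a rough cross-check under that approximation, not what `TTestIndPower` computes internally (it works with the t-distribution, which gives a slightly larger answer). `approx_n_per_group` is a hypothetical helper name:

```python
import math
from statistics import NormalDist

def approx_n_per_group(effect_size, alpha=0.05, power=0.8):
    """Normal-approximation sample size per group for a two-sided two-sample test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_power = NormalDist().inv_cdf(power)          # quantile matching the desired power
    return 2 * ((z_alpha + z_power) / effect_size) ** 2

n_approx = approx_n_per_group(0.2)
print(round(n_approx, 1), math.ceil(n_approx))

# Cohen's d expressed in sales units, assuming a standard deviation of 20
# (the scale used for the simulated control and pre-period sales)
min_detectable_lift = 0.2 * 20
print(min_detectable_lift)
```

Under that assumed standard deviation, an effect size of 0.2 corresponds to a difference of 4 sales units, which is the size of the simulated lift (a mean of 100 rising to 104).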

## Data Pre-processing
- The cut-off between the pre and post periods is 2024-01-15: dates up to and including it are pre-period, and later dates are post-period.

In [9]:
# Mean sales by group over the whole window (pre and post periods combined)
df.groupby('group')[["sales"]].mean()

| group | sales |
|---|---|
| Control | 100.032725 |
| Treatment | 102.014091 |


In [13]:
# split between Pre and Post periods
df['after'] = np.where(df['dt'] > '2024-01-15', 1, 0)

# pre_post data
df_pre_post = (df # dataset
               .groupby(['store_id','group','after']) # groupings
               .sales.mean() # calculate sales means
               .reset_index()
               .pivot(index=['store_id', 'group'], columns='after', values='sales') # pivot the data to put pre and post in columns
               .reset_index()
               .rename(columns={0:'pre', 1:'post'}) # rename
               )

# create col difference post-pre
df_pre_post = df_pre_post.assign(dif_pp= df_pre_post.post - df_pre_post.pre)

# View
df_pre_post.head()

| | store_id | group | pre | post | dif_pp |
|---|---|---|---|---|---|
| 0 | 1 | Control | 98.14678 | 108.272529 | 10.125749 |
| 1 | 2 | Treatment | 103.114093 | 100.591466 | -2.522627 |
| 2 | 3 | Treatment | 106.356929 | 103.304819 | -3.05211 |
| 3 | 4 | Treatment | 89.385473 | 107.779255 | 18.393783 |
| 4 | 5 | Control | 97.79614 | 94.102268 | -3.693872 |
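
As a small illustration of the reshaping above, the same groupby-and-pivot steps can be run on a toy dataset where the result is easy to verify by hand. The data and the names `toy`, `toy_pre_post`, and `did` are made up for this sketch. The last lines compare the average change between groups, which is one common way to read the `dif_pp` column and is assumed here rather than taken from the source:

```python
import pandas as pd

# Toy data: two stores, each with two pre-period days and two post-period days
toy = pd.DataFrame({
    'store_id': [1, 1, 1, 1, 2, 2, 2, 2],
    'group':    ['Control'] * 4 + ['Treatment'] * 4,
    'after':    [0, 0, 1, 1, 0, 0, 1, 1],
    'sales':    [100.0, 102.0, 101.0, 103.0, 98.0, 100.0, 104.0, 106.0],
})

# Same steps as above: mean sales per store and period, then pre/post side by side
toy_pre_post = (toy
                .groupby(['store_id', 'group', 'after'])
                .sales.mean()
                .reset_index()
                .pivot(index=['store_id', 'group'], columns='after', values='sales')
                .reset_index()
                .rename(columns={0: 'pre', 1: 'post'}))
toy_pre_post = toy_pre_post.assign(dif_pp=toy_pre_post.post - toy_pre_post.pre)
print(toy_pre_post)

# Average change per group, and the gap between Treatment and Control
mean_change = toy_pre_post.groupby('group')['dif_pp'].mean()
did = mean_change['Treatment'] - mean_change['Control']
print(did)
```

In the toy data, store 1 moves from a pre-period mean of 101 to 102 and store 2 from 99 to 105, so the treatment store's change exceeds the control store's change by 5.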
