# A/B Testing

## Test of Means Analysis
A test of means is a statistical test of whether the mean values of the treatment and control groups are the same.

## T-Test
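Gives a p-value: the probability of observing a difference between the means at least as large as the one measured, assuming the true difference is zero.  
A p-value < 0.05 is conventionally treated as statistically significant.  
The t-test is especially important when working with small data sets.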
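
To make the statistic concrete, here is a minimal sketch that computes the unequal-variance t-statistic by hand and compares it with SciPy. The two samples and the helper `welch_t` are hypothetical and are not taken from any dataset in this notebook.

```python
import numpy as np
from scipy.stats import ttest_ind

# Hypothetical toy samples, invented for illustration only.
control = np.array([12.1, 11.8, 12.4, 12.0, 11.9, 12.3])
treatment = np.array([12.6, 12.9, 12.2, 13.1, 12.7, 12.8])

def welch_t(a, b):
    """Difference in sample means divided by its standard error."""
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    return (a.mean() - b.mean()) / se

manual_stat = welch_t(control, treatment)
result = ttest_ind(control, treatment, equal_var=False)

print('Manual statistic:', manual_stat)
print('SciPy statistic: ', result.statistic)
print('p-value:', result.pvalue)
```

The two statistics should agree; the p-value is then read against the 0.05 threshold.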

There are three types of t-tests you can use:  
1. Paired  
2. Equal variance  
3. Unequal variance  

In a randomized experiment you usually cannot assume that the two groups have the same variance, so we’ll use an unequal variance (Welch's) t-test.
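
As a rough illustration of how the three variants map onto SciPy calls, the sketch below runs each one on randomly generated data. All arrays here are hypothetical; only `ttest_ind` with `equal_var=False` is used in the rest of this notebook.

```python
import numpy as np
from scipy.stats import ttest_ind, ttest_rel

rng = np.random.default_rng(0)

# 1. Paired: the same 20 subjects measured twice.
before = rng.normal(50, 5, size=20)
after = before + rng.normal(1.0, 2.0, size=20)
paired = ttest_rel(before, after)

# Two independent groups with different sizes and spreads.
group_a = rng.normal(50, 5, size=30)
group_b = rng.normal(52, 12, size=15)

# 2. Equal variance (pooled) and 3. unequal variance (Welch).
pooled = ttest_ind(group_a, group_b, equal_var=True)
welch = ttest_ind(group_a, group_b, equal_var=False)

for name, res in [('paired', paired),
                  ('equal variance', pooled),
                  ('unequal variance', welch)]:
    print(name, res.statistic, res.pvalue)
```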

In [None]:
# Load packages
import pandas as pd
import numpy as np
from scipy.stats import ttest_ind

## Super Hero Data

In [None]:
super_hero = pd.read_excel('superherodata.xlsx')
super_hero

In [None]:
useful_columns = list(super_hero.columns)[:8]
useful_columns

In [None]:
super_hero_clean_data = super_hero[useful_columns].dropna()
super_hero_clean_data

In [None]:
good = super_hero_clean_data[super_hero_clean_data['Alignment']=='good'].select_dtypes(exclude='O')
good

In [None]:
bad = super_hero_clean_data[super_hero_clean_data['Alignment']=='bad'].select_dtypes(exclude='O')
bad

In [None]:
sp = ttest_ind(good, bad, equal_var=False)
print('Statistic:', sp.statistic)
print('pvalue:', sp.pvalue)
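
Because `good` and `bad` hold several numeric columns, `ttest_ind` returns one statistic and one p-value per column. A minimal sketch of labelling those results by column name follows. The frames `good_demo` and `bad_demo` and their column names are hypothetical stand-ins, not the real super hero data.

```python
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)

# Hypothetical stand-ins for the numeric columns of `good` and `bad`.
good_demo = pd.DataFrame({'Intelligence': rng.normal(60, 10, 40),
                          'Strength': rng.normal(40, 15, 40)})
bad_demo = pd.DataFrame({'Intelligence': rng.normal(58, 12, 25),
                         'Strength': rng.normal(50, 20, 25)})

# The test runs down axis 0, so each column is tested separately.
res = ttest_ind(good_demo.to_numpy(), bad_demo.to_numpy(), equal_var=False)

# Attach the column names so each p-value can be read off directly.
table = pd.DataFrame({'statistic': res.statistic, 'pvalue': res.pvalue},
                     index=good_demo.columns)
table['significant'] = table['pvalue'] < 0.05
print(table)
```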

## Customer Support Time Study Data

In [None]:
customer_data = pd.read_excel('customersupporttimestudydata.xlsx', header=2)
customer_data

In [None]:
customer_data = customer_data.dropna().drop(index=8).drop(columns='Customer Onboarding Process').astype('float')
customer_data

In [None]:
nataly = customer_data[:4].sum().values
joe = customer_data[4:].sum().values
joe

In [None]:
sp = ttest_ind(joe, nataly, equal_var=False)
print('Statistic:', sp.statistic)
print('pvalue:', sp.pvalue)

## Customer Service AB Testing Data 

In [None]:
customer_service_data = pd.read_csv('customerserviceabtestdata.csv')
customer_service_data.head()

In [None]:
phone_count = customer_service_data['Phone Number'].value_counts()
phone_count[:10]

In [None]:
AutomatedFlag_0 = customer_service_data[customer_service_data['AutomatedFlag']==0]['CS Rating'].values
AutomatedFlag_1 = customer_service_data[customer_service_data['AutomatedFlag']==1]['CS Rating'].values
AutomatedFlag_1

In [None]:
sp = ttest_ind(AutomatedFlag_0, AutomatedFlag_1, equal_var=False)
print('Statistic:', sp.statistic)
print('pvalue:', sp.pvalue)

## Grocery Website AB Test Data

In [None]:
data = pd.read_csv('grocerywebsiteabtestdata.csv')
data.head()

In [None]:
data = data.drop(columns='RecordID')

In [None]:
data['Group'] = np.where(data['ServerID'] == 1, 'Treatment', 'Control')
data.head()

In [None]:
# Total page visits per visitor; select the column explicitly before summing
data_summary = data.groupby(['IP Address', 'LoggedInFlag', 'Group', 'ServerID'], as_index=False)['VisitPageFlag'].sum()
data_summary.head(10)

In [None]:
# Multiple-visit fix: collapse repeat visits so each visitor counts at most once
data_summary['VisitPageFlag'] = np.where(data_summary['VisitPageFlag'] == 0, 0, 1)
data_summary.head()
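
The effect of the multiple-visit fix is easier to see on a tiny example. The sketch below uses a hypothetical `visits` frame with made-up IP addresses: summing per visitor gives visit counts, and `np.where` turns any non-zero count into a single 0/1 flag.

```python
import numpy as np
import pandas as pd

# Hypothetical log: two visitors appear twice, one appears once.
visits = pd.DataFrame({
    'IP Address': ['1.1.1.1', '1.1.1.1', '2.2.2.2', '3.3.3.3', '3.3.3.3'],
    'Group': ['Control', 'Control', 'Treatment', 'Control', 'Control'],
    'VisitPageFlag': [1, 1, 0, 0, 1],
})

# One row per visitor, holding the total number of page visits.
per_visitor = visits.groupby(['IP Address', 'Group'],
                             as_index=False)['VisitPageFlag'].sum()

# Any count above zero becomes 1, so repeat visits are not double counted.
per_visitor['VisitPageFlag'] = np.where(per_visitor['VisitPageFlag'] == 0, 0, 1)
print(per_visitor)
```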

In [None]:
non_logged_user = data_summary[data_summary['LoggedInFlag'] != 1]
non_logged_user.head(10)

In [None]:
treatment = non_logged_user[non_logged_user['Group']=='Treatment']['VisitPageFlag'].values
control = non_logged_user[non_logged_user['Group']=='Control']['VisitPageFlag'].values
control

In [None]:
sp = ttest_ind(control, treatment, equal_var=False)
print('Statistic:', sp.statistic)
print('pvalue:', sp.pvalue)

In [None]:
non_logged_user.head()

In [None]:
summary_non_logged_user = non_logged_user.groupby(['Group', 'VisitPageFlag'], as_index=False).size()
summary_non_logged_user

In [None]:
# Counts read from summary_non_logged_user above
perc_control_increase = 6131 / (6131 + 26839) * 100
perc_treatment_increase = 3847 / (3847 + 12696) * 100

print('Control visit rate (%):', perc_control_increase)
print('Treatment visit rate (%):', perc_treatment_increase)

In [None]:
print('Conclusion:', perc_treatment_increase - perc_control_increase, 'percentage-point increase in visits if the company changes the link to a picture of the app')
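
The rates above are typed in by hand from the summary table. One possible alternative, sketched here on a hypothetical `demo` frame, is to take the mean of the 0/1 flag per group, which gives the same percentage without copying counts.

```python
import pandas as pd

# Hypothetical per-visitor data: 1 of 4 control and 2 of 4 treatment visitors clicked.
demo = pd.DataFrame({
    'Group': ['Control'] * 4 + ['Treatment'] * 4,
    'VisitPageFlag': [1, 0, 0, 0, 1, 1, 0, 0],
})

# The mean of a 0/1 flag is the share of visitors who visited the page.
rates = demo.groupby('Group')['VisitPageFlag'].mean() * 100
lift = rates['Treatment'] - rates['Control']

print(rates)
print('Difference in percentage points:', lift)
```

Applied to `non_logged_user`, the same `groupby` would replace the hard-coded counts, assuming `VisitPageFlag` has already been reduced to 0/1.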

## Matched Pair Design