# Workshop 5: Formulating Hypothesis and MVT

__A/B Testing__
Suppose you have a churn prediction model with 95% accuracy. By calling the customers who are likely to churn and giving them
attractive offers, you assume that 10% of them will be retained and that each retained customer will bring 20 USD of revenue. That is a lot of assumptions:
- First, the model accuracy is 95%. Next month there will be new campaigns, new product features, different marketing and brand activities, new seasonality and so on. Historical and current data rarely match in these scenarios, so we cannot expect the same outcome under different conditions.
- Next, we assume a 10% conversion rate. But we cannot be sure that the new action will reach 10% conversion, even without the factors mentioned above. Moreover, since this is a new group of customers, their reactions are unpredictable.
- Finally, we assume that each of these customers will bring 20 USD in monthly revenue. That does not mean each retained customer will bring the same amount after the new action.

To see what actually happens, we perform an A/B test.
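
The arithmetic behind these assumptions can be sketched in a few lines. This is only an illustration: the campaign size of 1,000 contacted customers and the alternative retention rates of 5% and 15% are hypothetical values, not figures from the workshop. It shows how strongly the revenue estimate depends on the assumed 10% retention.

```python
# Hypothetical campaign size; not taken from the workshop data.
n_contacted = 1_000
revenue_per_customer = 20.0  # USD per retained customer, as assumed in the text


def expected_revenue(n_contacted, retention_rate, revenue_per_customer):
    """Expected monthly revenue if a fixed share of contacted customers stays."""
    return n_contacted * retention_rate * revenue_per_customer


# Baseline assumption from the text: 10% of contacted customers are retained.
baseline = expected_revenue(n_contacted, 0.10, revenue_per_customer)

# Sensitivity check: hypothetical lower and higher retention rates.
scenarios = {rate: expected_revenue(n_contacted, rate, revenue_per_customer)
             for rate in (0.05, 0.10, 0.15)}

for rate, revenue in scenarios.items():
    print(f"retention {rate:.0%}: {revenue:,.0f} USD")
```

Halving the retention rate halves the expected revenue, which is why the assumption needs to be tested rather than taken for granted.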

Even with the issues above in mind, our hypothesis test can still be wrong because of random variation in the underlying distributions.

Let us set up two test groups that receive different treatments and are expected to show different retention:

- Group A → Offer → Higher Retention
- Group B → No offer → Lower Retention

The simulation helps us test the model's accuracy.

If group B’s retention rate is 50% and group A’s is 60%, the offer added only 10 percentage points, and half of the customers flagged as likely to churn stayed even without an offer, which shows that our model is sometimes wrong. The same approach applies to measuring the revenue coming from those users. In this case our success metric is the retention rate of both groups.
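
Whether a 60% versus 50% retention gap is more than noise depends on the group sizes. Here is a minimal sketch of a two-proportion z-test, assuming a hypothetical 1,000 customers per group; the function name `two_proportion_ztest` is made up for this illustration, and the notebook itself uses a t-test on purchase counts instead.

```python
import math


def two_proportion_ztest(retained_a, n_a, retained_b, n_b):
    """Two-sided z-test for the difference between two retention rates."""
    p_a = retained_a / n_a
    p_b = retained_b / n_b
    # Pooled rate under the null hypothesis that both groups share one rate.
    p_pool = (retained_a + retained_b) / (n_a + n_b)
    std_error = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a - p_b, z, p_value


# Hypothetical counts: 1,000 customers per group, 60% vs 50% retained.
lift, z, p_value = two_proportion_ztest(600, 1000, 500, 1000)
print(f"lift = {lift:.1%}, z = {z:.2f}, p = {p_value:.2g}")
```

With much smaller groups the same 10-point gap could easily fail to reach significance, so the group size should be decided before the experiment starts.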

In [None]:
#import libraries
from datetime import datetime, timedelta, date
import pandas as pd
%matplotlib inline
from sklearn.metrics import classification_report, confusion_matrix
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn.cluster import KMeans
import sklearn
from sklearn.model_selection import KFold, cross_val_score, train_test_split

Now we are going to create our own dataset. The dataset will contain the columns below:
- customer_id: the unique identifier of the customer
- segment: customer’s segment; high-value or low-value
- group: indicates whether the customer is in the test or control group
- purchase_count: # of purchases completed by the customer

In [None]:
df_hv = pd.DataFrame()
df_hv['customer_id'] = np.array([count for count in range(20000)])
df_hv['segment'] = np.array(['high-value' for _ in range(20000)])
df_hv['group'] = 'control'
df_hv.loc[df_hv.index<10000,'group'] = 'test'

Ideally, purchase count is drawn from a Poisson distribution. There will be customers with no purchases, and fewer customers with high purchase counts. Let’s use numpy.random.poisson() for that and assign different
Poisson means to the test and control groups:

In [None]:
df_hv.loc[df_hv.group == 'test', 'purchase_count'] = np.random.poisson(0.7, 10000)
df_hv.loc[df_hv.group == 'control', 'purchase_count'] = np.random.poisson(0.5, 10000)
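
As a quick sanity check on the Poisson assumption, the sketch below draws samples with the two rates used above and compares the share of customers with no purchase against the theoretical value exp(-rate). The seed and the `results` dictionary are choices made for this illustration only.

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded so the check is repeatable

results = {}
for rate in (0.5, 0.7):
    sample = rng.poisson(rate, 10_000)
    # Under a Poisson(rate) model, P(count = 0) = exp(-rate).
    share_zero = np.mean(sample == 0)
    results[rate] = (sample.mean(), share_zero)
    print(f"rate={rate}: sample mean={sample.mean():.3f}, "
          f"share with no purchase={share_zero:.3f}, "
          f"theoretical={np.exp(-rate):.3f}")
```

The sample mean should sit close to the rate itself, since the mean of a Poisson distribution equals its rate parameter.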

In [None]:
from scipy.special import factorial
rate_1 = .5  # Poisson mean of the control group
rate_2 = .7  # Poisson mean of the test group
t = np.arange(0, 6, 0.01)

# Poisson density rate**t * exp(-rate) / t!, evaluated on a fine grid for plotting
d_1 = rate_1**(t)/factorial(t)*np.exp(-rate_1)
plt.plot(t, d_1, color='green', marker='.', linestyle='dashed', linewidth=1, markersize=3)

d_2 = rate_2**(t)/factorial(t)*np.exp(-rate_2)
plt.plot(t, d_2, color='red', marker='.', linestyle='dashed', linewidth=1, markersize=3)

plt.show()

In [None]:
df_hv.head(10)

In [None]:
df_hv.tail(10)

__High Value Customer Test Vs Control Group Comparison__

In [None]:
# Assume we applied an offer to 50% of high-value users and observed their purchases in a given period.
# A good way to visualize the result is to compare the distributions:
test_results = df_hv[df_hv.group == 'test'].purchase_count
control_results = df_hv[df_hv.group == 'control'].purchase_count

hist_data = pd.DataFrame(list(zip(test_results, control_results)),columns=['test', 'control'])
group_labels = ['test', 'control']

In [None]:
fig,ax = plt.subplots(figsize=(14,8))
table = pd.crosstab(df_hv["group"],df_hv["purchase_count"])
pd.crosstab(df_hv["group"],df_hv["purchase_count"]).div(table.sum(1).astype(float), axis=0).T.plot(kind='bar',ax=ax)
plt.title("Proportion Plot High Value Customer Test Vs Control Group")
plt.xlabel("Purchase Count")
plt.legend(["Control","Test"],loc='lower left',frameon=False)
for p in ax.patches:
    width, height = p.get_width(), p.get_height()
    x, y = p.get_xy() 
    ax.annotate('{:.2%}'.format(height), (x, y + height + 0.01))
plt.show() 

In [None]:
table

The results look good: the test group has a higher share of customers at every purchase count from 1 upward.
But how can we be sure that the test succeeded and that the difference is not due to chance or other factors?
Let us check whether the uptick in the test group is statistically significant. For this we will perform
a t-test.

In [None]:
from scipy import stats 
test_result = stats.ttest_ind(test_results, control_results)
print(test_result)

The ttest_ind() method returns two outputs:

1. t-statistic: represents the difference between the averages of the test and control groups in units of standard error. A t-statistic with a larger absolute value means a bigger difference and stronger support for our hypothesis.

2. p-value: the probability of observing a difference at least as large as the one in our sample if the null hypothesis were true. The null hypothesis states that there is no difference between the test and control groups. A low p-value therefore means the observed difference would be unlikely under the null hypothesis, which is evidence against it. As the industry standard, we treat a p-value < 5% as statistically significant
(but it depends on your business logic; some teams use 10% or even 1%).
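
To make the two outputs concrete, here is a sketch that computes the t-statistic and p-value by hand and compares them with `stats.ttest_ind`. It uses freshly simulated samples with a fixed seed rather than the notebook's `test_results` and `control_results`, and it assumes the pooled-variance (equal variance) form, which is the default of `ttest_ind`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
test_sample = rng.poisson(0.7, 10_000)     # simulated test group
control_sample = rng.poisson(0.5, 10_000)  # simulated control group

n1, n2 = len(test_sample), len(control_sample)
# Pooled variance, matching the equal-variance default of ttest_ind.
pooled_var = ((n1 - 1) * test_sample.var(ddof=1)
              + (n2 - 1) * control_sample.var(ddof=1)) / (n1 + n2 - 2)
std_error = np.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Difference of the group means in units of standard error.
t_manual = (test_sample.mean() - control_sample.mean()) / std_error
# Two-sided p-value from the t distribution with n1 + n2 - 2 degrees of freedom.
p_manual = 2 * stats.t.sf(abs(t_manual), df=n1 + n2 - 2)

t_scipy, p_scipy = stats.ttest_ind(test_sample, control_sample)
print(t_manual, t_scipy)
print(p_manual, p_scipy)
```

The manual and library values should agree up to floating-point error.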

In [None]:
def eval_test(test_results,control_results):
    test_result = stats.ttest_ind(test_results, control_results)
    if test_result[1] < 0.05:
        print('result is significant')
    else:
        print('result is not significant')

In [None]:
# pass the raw samples (test_results), not the earlier t-test output (test_result)
eval_test(test_results,control_results)

__Blocking__

Create two groups of customers based on their previous purchase patterns: a high-value group (20%) and a low-value group (80%). The underlying factors that make a customer high-value or low-value are not stated explicitly; we only simulate their previous purchase patterns, in which the high-value group purchased more.

In [None]:
#create hv segment
df_hv = pd.DataFrame()
df_hv['customer_id'] = np.array([count for count in range(20000)])
df_hv['segment'] = np.array(['high-value' for _ in range(20000)])
df_hv['prev_purchase_count'] = np.random.poisson(0.9, 20000)

df_lv = pd.DataFrame()
df_lv['customer_id'] = np.array([count for count in range(20000,100000)])
df_lv['segment'] = np.array(['low-value' for _ in range(80000)])
df_lv['prev_purchase_count'] = np.random.poisson(0.3, 80000)

df_customers = pd.concat([df_hv,df_lv],axis=0)

In [None]:
df_customers.head()
#df_customers.shape

In [None]:
df_customers.tail()

In [None]:
print(f"Total sample size = {len(df_customers)}")

Separate the sample into a 90% test group and a 10% control group.

In [None]:
df_test = df_customers.sample(frac=0.9)
df_control = df_customers[~df_customers.customer_id.isin(df_test.customer_id)]

In [None]:
df_test.segment.value_counts()

In the test group there are around 72K low-value customers and 18K high-value customers.

In [None]:
df_control.segment.value_counts()

In the control group there are around 8K low-value customers and 2K high-value customers.

Unless you sample within each segment, the test and control groups will not have exactly the same proportion of high-value to low-value customers.

In [None]:
df_test_hv = df_customers[df_customers.segment == 'high-value'].sample(frac=0.9)
df_test_lv = df_customers[df_customers.segment == 'low-value'].sample(frac=0.9)

df_test = pd.concat([df_test_hv,df_test_lv],axis=0)
df_control = df_customers[~df_customers.customer_id.isin(df_test.customer_id)]

In [None]:
df_test.segment.value_counts()

In [None]:
df_control.segment.value_counts()
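
The per-segment sampling above can also be written with `groupby(...).sample(...)`, available in pandas 1.1 and later. The sketch below is an alternative formulation, not the notebook's method, and uses a small stand-in frame (`customers`, 1,000 rows) instead of `df_customers` so the exact counts are easy to verify.

```python
import numpy as np
import pandas as pd

# Small stand-in for df_customers: 20% high-value, 80% low-value.
customers = pd.DataFrame({
    "customer_id": np.arange(1_000),
    "segment": ["high-value"] * 200 + ["low-value"] * 800,
})

# Draw 90% of each segment, so the split is exact within every block.
test_group = customers.groupby("segment").sample(frac=0.9, random_state=1)
# Everyone not drawn into the test group forms the control group.
control_group = customers.drop(test_group.index)

print(test_group.segment.value_counts())
print(control_group.segment.value_counts())
```

Both groups keep the 20/80 ratio exactly, which simple random sampling over the whole table only achieves approximately.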

In [None]:
# create groups of customers with different "current" purchasing patterns: Poisson means A (0.4), B (0.6), C (0.2)
# within the hv segment, which is defined only by the "previous" purchasing pattern
df_hv = pd.DataFrame()
df_hv['customer_id'] = np.array([count for count in range(30000)])
df_hv['segment'] = np.array(['high-value' for _ in range(30000)])
df_hv['group'] = 'A'
df_hv.loc[df_hv.index>=10000,'group'] = 'B' 
df_hv.loc[df_hv.index>=20000,'group'] = 'C'

df_hv.group.value_counts()

df_hv.loc[df_hv.group == 'A', 'purchase_count'] = np.random.poisson(0.4, 10000)
df_hv.loc[df_hv.group == 'B', 'purchase_count'] = np.random.poisson(0.6, 10000)
df_hv.loc[df_hv.group == 'C', 'purchase_count'] = np.random.poisson(0.2, 10000)

In [None]:
a_stats = df_hv[df_hv.group=='A'].purchase_count
b_stats = df_hv[df_hv.group=='B'].purchase_count
c_stats = df_hv[df_hv.group=='C'].purchase_count

hist_data = [a_stats, b_stats, c_stats]

group_labels = ['A', 'B','C']

fig,ax = plt.subplots(figsize=(14,8))
table = pd.crosstab(df_hv["group"],df_hv["purchase_count"])
pd.crosstab(df_hv["group"],df_hv["purchase_count"]).div(table.sum(1).astype(float), axis=0).T.plot(kind='bar',ax=ax)
plt.title("Proportion Plot High Value Customer amongst Different Groups")
plt.xlabel("Purchase Count")
plt.legend(["Group A","Group B","Group C"],loc='best',frameon=False)
for p in ax.patches:
    width, height = p.get_width(), p.get_height()
    x, y = p.get_xy() 
    ax.annotate('{:.1%}'.format(height), (x, y + height + 0.01))
plt.show() 

## Multivariate Test

An ANOVA test (an F-test), a form of MVT, is conducted to find out whether groups A, B and C differ with respect to their current purchasing patterns.

For example, a group of psychiatric patients tries three different therapies: counseling, medication and biofeedback, and one wants to see whether one kind of therapy is better than the others. Or students from different colleges take the same exam, and one wants to know whether one college outperforms the others.
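
The F statistic of a one-way ANOVA compares the variation between group means with the variation within groups. Here is a minimal sketch that computes it by hand and checks it against `stats.f_oneway`, using freshly simulated groups with a fixed seed and the Poisson means 0.4, 0.6 and 0.2 from the cell above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
groups = [rng.poisson(rate, 10_000) for rate in (0.4, 0.6, 0.2)]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k, n_total = len(groups), len(all_values)

# Between-group sum of squares: how far each group mean is from the grand mean.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of values around their own group mean.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F = mean square between / mean square within.
f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)

f_scipy, p_scipy = stats.f_oneway(*groups)
print(f_manual, f_scipy)
```

A large F means the group means are spread out far more than the within-group noise would explain.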

### Interpreting the MANOVA results
If the multivariate F value is statistically significant, it only tells you that some difference exists, not where it lies. In the example below, a significant result alone would not tell you which groups' mean purchase frequencies differ. Once you have a significant result, you then have to look at each individual component (the univariate F tests) to see which dependent variable(s) contributed to it.
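
For the one-way ANOVA used in this notebook, a common follow-up is a set of pairwise t-tests with a Bonferroni correction. Note that this is a different technique from the univariate F tests described above for MANOVA. The sketch assumes simulated groups with a fixed seed; the names `samples` and `pairwise` are made up for illustration.

```python
from itertools import combinations

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
samples = {
    "A": rng.poisson(0.4, 10_000),
    "B": rng.poisson(0.6, 10_000),
    "C": rng.poisson(0.2, 10_000),
}

pairs = list(combinations(samples, 2))
alpha = 0.05
# Bonferroni: divide the significance level by the number of comparisons.
alpha_adjusted = alpha / len(pairs)

pairwise = {}
for first, second in pairs:
    _, p_value = stats.ttest_ind(samples[first], samples[second])
    pairwise[(first, second)] = p_value < alpha_adjusted
    print(f"{first} vs {second}: p = {p_value:.3g}, "
          f"significant = {p_value < alpha_adjusted}")
```

The stricter threshold keeps the overall chance of a false positive across all three comparisons at or below 5%.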

### Advantages and Disadvantages of MANOVA vs. ANOVA
#### Advantages
- MANOVA enables you to test multiple dependent variables.
- MANOVA can protect against Type I errors.
#### Disadvantages
- MANOVA is many times more complicated than ANOVA, making it a challenge to see which independent variables are affecting dependent variables.
- One degree of freedom is lost with the addition of each new variable.
- The dependent variables should be uncorrelated as much as possible. If they are correlated, the loss in degrees of freedom means that there is not much advantage in including more than one dependent variable in the test.

In [None]:
def one_anova_test(a_stats,b_stats,c_stats):
    test_result = stats.f_oneway(a_stats, b_stats, c_stats)
    if test_result[1] < 0.05:
        print('result is significant')
    else:
        print('result is not significant')

one_anova_test(a_stats,b_stats,c_stats)

In [None]:
df_hv.loc[df_hv.group == 'A', 'purchase_count'] = np.random.poisson(0.5, 10000)
df_hv.loc[df_hv.group == 'B', 'purchase_count'] = np.random.poisson(0.5, 10000)
df_hv.loc[df_hv.group == 'C', 'purchase_count'] = np.random.poisson(0.5, 10000)

a_stats = df_hv[df_hv.group=='A'].purchase_count
b_stats = df_hv[df_hv.group=='B'].purchase_count
c_stats = df_hv[df_hv.group=='C'].purchase_count

hist_data = [a_stats, b_stats, c_stats]

group_labels = ['A', 'B','C']

In [None]:
fig,ax = plt.subplots(figsize=(18,8))
table = pd.crosstab(df_hv["group"],df_hv["purchase_count"])
pd.crosstab(df_hv["group"],df_hv["purchase_count"]).div(table.sum(1).astype(float), axis=0).T.plot(kind='bar',ax=ax)
plt.title("Test Vs Control Stats")
plt.xlabel("Purchase Count")
plt.legend(["Group A","Group B","Group C"],loc='best',frameon=False)
for p in ax.patches:
    width, height = p.get_width(), p.get_height()
    x, y = p.get_xy() 
    ax.annotate('{:.1%}'.format(height), (x, y + height + 0.01))
plt.show() 

In [None]:
one_anova_test(a_stats,b_stats,c_stats)

In [None]:
# create control and test groups for the hv and lv segments;
# the hv test group is assumed to have a higher purchase rate than the lv test group

df_hv = pd.DataFrame()
df_hv['customer_id'] = np.array([count for count in range(20000)])
df_hv['segment'] = np.array(['high-value' for _ in range(20000)])
df_hv['group'] = 'control'
df_hv.loc[df_hv.index<10000,'group'] = 'test' 
df_hv.loc[df_hv.group == 'control', 'purchase_count'] = np.random.poisson(0.6, 10000)
df_hv.loc[df_hv.group == 'test', 'purchase_count'] = np.random.poisson(0.8, 10000)


df_lv = pd.DataFrame()
df_lv['customer_id'] = np.array([count for count in range(20000,100000)])
df_lv['segment'] = np.array(['low-value' for _ in range(80000)])
df_lv['group'] = 'control'
df_lv.loc[df_lv.index<40000,'group'] = 'test' 
df_lv.loc[df_lv.group == 'control', 'purchase_count'] = np.random.poisson(0.3, 40000)
df_lv.loc[df_lv.group == 'test', 'purchase_count'] = np.random.poisson(0.4, 40000)

df_customers = pd.concat([df_hv,df_lv],axis=0)

In [None]:
df_customers.head()

## Regression

Estimate a regression (MVT) of the current purchase count on the segment label and the group.

In [None]:
import statsmodels.formula.api as smf 
from statsmodels.stats.anova import anova_lm
model = smf.ols(formula='purchase_count ~ segment + group ', data=df_customers).fit()
print(model.summary())
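
To make the coefficients in the summary easier to read, here is a sketch of the same kind of regression done with plain NumPy on 0/1 dummy variables. The data are freshly simulated with hypothetical true effects (baseline 0.6, minus 0.3 for low-value, plus 0.2 for test), not taken from `df_customers`. The dummy coding mirrors the formula interface, which treats the alphabetically first level (`high-value`, `control`) as the reference.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000
# Dummy coding: 1 = low-value segment (0 = high-value), 1 = test group (0 = control).
is_low_value = rng.integers(0, 2, n)
is_test = rng.integers(0, 2, n)

# Hypothetical true means: baseline 0.6, minus 0.3 for low-value, plus 0.2 for test.
true_mean = 0.6 - 0.3 * is_low_value + 0.2 * is_test
purchase_count = rng.poisson(true_mean)

# Design matrix with an intercept column, then ordinary least squares.
X = np.column_stack([np.ones(n), is_low_value, is_test])
coef, *_ = np.linalg.lstsq(X, purchase_count, rcond=None)
intercept, segment_effect, group_effect = coef

print(f"intercept={intercept:.3f}, low-value={segment_effect:.3f}, "
      f"test={group_effect:.3f}")
```

The intercept estimates the mean purchase count of high-value control customers, and each coefficient estimates the shift from that baseline.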

Simulate test and control groups without any difference, and see whether the MVT correctly reports no effect.

In [None]:
#create hv segment
df_hv = pd.DataFrame()
df_hv['customer_id'] = np.array([count for count in range(20000)])
df_hv['segment'] = np.array(['high-value' for _ in range(20000)])
df_hv['group'] = 'control'
df_hv.loc[df_hv.index<10000,'group'] = 'test' 
df_hv.loc[df_hv.group == 'control', 'purchase_count'] = np.random.poisson(0.8, 10000)
df_hv.loc[df_hv.group == 'test', 'purchase_count'] = np.random.poisson(0.8, 10000)


df_lv = pd.DataFrame()
df_lv['customer_id'] = np.array([count for count in range(20000,100000)])
df_lv['segment'] = np.array(['low-value' for _ in range(80000)])
df_lv['group'] = 'control'
df_lv.loc[df_lv.index<40000,'group'] = 'test' 
df_lv.loc[df_lv.group == 'control', 'purchase_count'] = np.random.poisson(0.2, 40000)
df_lv.loc[df_lv.group == 'test', 'purchase_count'] = np.random.poisson(0.2, 40000)

df_customers = pd.concat([df_hv,df_lv],axis=0)

In [None]:
import statsmodels.formula.api as smf 
from statsmodels.stats.anova import anova_lm
model = smf.ols(formula='purchase_count ~ segment + group ', data=df_customers).fit()
print(model.summary())
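
Even when test and control are identical, a test at the 5% level will still flag a "significant" difference in roughly 5% of experiments. Here is a sketch that repeats such a no-difference experiment many times. The number of repetitions, the group size of 1,000 and the use of a t-test instead of the regression are choices made for this illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n_experiments = 1_000
alpha = 0.05

false_positives = 0
for _ in range(n_experiments):
    # Both groups come from the same Poisson(0.8) distribution, so any
    # "significant" result is a false positive by construction.
    group_a = rng.poisson(0.8, 1_000)
    group_b = rng.poisson(0.8, 1_000)
    _, p_value = stats.ttest_ind(group_a, group_b)
    false_positives += p_value < alpha

false_positive_rate = false_positives / n_experiments
print(f"false positive rate: {false_positive_rate:.3f}")
```

A single non-significant result on identical groups is therefore expected most of the time, but not guaranteed.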

THE END