## Analyze A/B Test Results



## Table of Contents
- [Introduction](#intro)
- [Part I - Probability](#probability)
- [Part II - A/B Test](#ab_test)


<a id='intro'></a>
### Introduction

For this project, I analyze the results of an A/B test run by Tactile Entertainment on the game Cookie Cats, which examined what happens when the first gate in the game is moved from level 30 to level 40. When a player installed the game, he or she was randomly assigned to either gate_30 or gate_40. My goal is to help the company decide whether to keep the gate at level 30, move it to level 40, or run the experiment longer before making a decision.

The dataset comes from Kaggle and contains A/B test data for the 90,189 players who installed the game while the test was running. The variables are:

- userid: A unique number that identifies each player.
- version: Whether the player was put in the control group (gate_30 - a gate at level 30) or the group with the moved gate (gate_40 - a gate at level 40).
- sum_gamerounds: The number of game rounds played by the player during the first 14 days after install.
- retention_1: Did the player come back and play 1 day after installing?
- retention_7: Did the player come back and play 7 days after installing?

For the purposes of this analysis, gate_30 is the control group and gate_40 is the treatment group.

<a id='probability'></a>
#### Part I - Probability

Importing libraries.

In [1]:
import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
import statsmodels.stats.api as sms
from statsmodels.stats.proportion import proportions_ztest, proportion_confint
from math import ceil

Reading the dataset into a pandas DataFrame.

In [2]:
df = pd.read_csv('cookie_cats.csv')

In [3]:
df.head()

| | userid | version | sum_gamerounds | retention_1 | retention_7 |
|---|---|---|---|---|---|
| 0 | 116 | gate_30 | 3 | False | False |
| 1 | 337 | gate_30 | 38 | True | False |
| 2 | 377 | gate_40 | 165 | True | False |
| 3 | 483 | gate_40 | 1 | False | False |
| 4 | 488 | gate_40 | 179 | True | True |


Unique users

In [4]:
df['userid'].nunique()

90189

Number of users in each group

In [5]:
df.groupby(by='version').count()['userid']

version
gate_30    44700
gate_40    45489
Name: userid, dtype: int64

Proportion of all users who returned 1 day after installing

In [6]:
sum(df['retention_1'])/ df.shape[0]

0.4452095044850259

Proportion of all users who returned 7 days after installing

In [7]:
sum(df['retention_7'])/ df.shape[0]

0.1860648194347426

Probability that a user in gate_30 returns after 1 day

In [8]:
control = df[df['version'] == 'gate_30']
ctr_p1 = sum(control['retention_1'])/ control.shape[0]

Probability that a user in gate_30 returns after 7 days

In [9]:
ctr_p2 = sum(control['retention_7'])/ control.shape[0]

Probability that a user in gate_40 returns after 1 day

In [10]:
treatment = df[df['version'] == 'gate_40']
trt_p1 = sum(treatment['retention_1'])/ treatment.shape[0]

Probability that a user in gate_40 returns after 7 days

In [11]:
trt_p2 = sum(treatment['retention_7'])/ treatment.shape[0]

In [27]:
diff_p1 = ctr_p1 - trt_p1

In [28]:
diff_p2 = ctr_p2 - trt_p2

In [29]:
diff_p1, diff_p2

(0.005905169787341458, 0.008201298315205913)

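**In this preliminary analysis, observed retention is slightly higher in the control group (by about 0.6 percentage points after 1 day and 0.8 percentage points after 7 days), so there is no evidence that the treatment group retains more players.**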
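
The per-group retention rates computed cell by cell above can also be obtained in one pass with `groupby`. The following is a minimal sketch on a small hypothetical DataFrame (`toy` and its values are made up for illustration; only the column names mirror the dataset), not a replacement for the cells above.

```python
import pandas as pd

# Hypothetical toy data with the same column names as cookie_cats.csv
toy = pd.DataFrame({
    'version': ['gate_30'] * 4 + ['gate_40'] * 4,
    'retention_1': [True, True, False, False, True, False, False, False],
    'retention_7': [True, False, False, False, False, False, False, False],
})

# The mean of a boolean column is the proportion of True values
rates = toy.groupby('version')[['retention_1', 'retention_7']].mean()

# Control minus treatment, the same sign convention as diff_p1 and diff_p2
diff = rates.loc['gate_30'] - rates.loc['gate_40']

print(rates)
print(diff)
```

A positive difference means the control group retained a larger share of players than the treatment group.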

<a id='ab_test'></a>
#### Part II - A/B Test

**$H_0$ : $p_{30} =  p_{40}$** <br>
**$H_1$ : $p_{30} \neq  p_{40}$**

In [37]:
effect_size = sms.proportion_effectsize(0.45, 0.47)    # Calculating effect size based on our expected rates

required_n = sms.NormalIndPower().solve_power(
    effect_size, 
    power=0.9, 
    alpha=0.05, 
    ratio=1
    )                                                  # Calculating sample size needed

required_n = ceil(required_n)                          # Rounding up to next whole number                          

print(required_n)

13049


We need at least 13,049 observations in each group.<br>
The power is set to 90% and the expected retention rates are 45% for gate 30 and 47% for gate 40. In other words, if the true rates differ by that much, a two-sided test at alpha = 0.05 needs at least this many observations per group to have a 90% chance of detecting the difference as statistically significant.
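
To make the sample-size calculation less of a black box, here is a rough sketch of the arithmetic using only the standard library. It assumes Cohen's h as the effect size and the usual normal-approximation formula for two equal-sized groups, which ignores the negligible contribution of the opposite tail. The helper names `cohens_h` and `approx_n_per_group` are hypothetical, and the result should land on or very near the figure reported above.

```python
from math import asin, sqrt, ceil
from statistics import NormalDist


def cohens_h(p1, p2):
    """Effect size for two proportions via the arcsine transformation."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))


def approx_n_per_group(p1, p2, alpha=0.05, power=0.9):
    """Approximate per-group sample size for a two-sided test, equal groups."""
    h = cohens_h(p1, p2)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    # With n per group the test statistic is centred at h * sqrt(n / 2)
    return ceil(2 * ((z_alpha + z_power) / h) ** 2)


print(approx_n_per_group(0.45, 0.47))
```

Lowering the power or widening the expected gap between the two rates reduces the required sample size, which is why the assumed rates matter as much as the chosen power.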

In [21]:
control_sample = df[df['version'] == 'gate_30'].sample(44000)
treatment_sample = df[df['version'] == 'gate_40'].sample(44000)

ab_test = pd.concat([control_sample, treatment_sample], axis=0)
ab_test.reset_index(drop=True, inplace=True)

In [22]:
ab_test

| | userid | version | sum_gamerounds | retention_1 | retention_7 |
|---|---|---|---|---|---|
| 0 | 6058009 | gate_30 | 6 | False | False |
| 1 | 3129796 | gate_30 | 2 | False | False |
| 2 | 6143201 | gate_30 | 32 | True | False |
| 3 | 1247537 | gate_30 | 10 | False | False |
| 4 | 8312326 | gate_30 | 2 | False | False |
| ... | ... | ... | ... | ... | ... |
| 87995 | 7329616 | gate_40 | 1 | False | False |
| 87996 | 5325946 | gate_40 | 3 | True | False |
| 87997 | 4622531 | gate_40 | 7 | False | False |
| 87998 | 3563811 | gate_40 | 4 | False | False |


### Retention 1 : Players coming back to play the game after the first day.

In [23]:
control_results = ab_test[ab_test['version'] == 'gate_30']['retention_1']
treatment_results = ab_test[ab_test['version'] == 'gate_40']['retention_1']

In [38]:
n_con = control_results.count()
n_treat = treatment_results.count()
successes = [control_results.sum(), treatment_results.sum()]
nobs = [n_con, n_treat]

z_stat, pval = proportions_ztest(successes, nobs=nobs)
(lower_con, lower_treat), (upper_con, upper_treat) = proportion_confint(successes, nobs=nobs, alpha=0.05)

print(f'z statistic: {z_stat:.2f}')
print(f'p-value: {pval:.3f}')
print(f'ci 95% for control group: [{lower_con:.3f}, {upper_con:.3f}]')
print(f'ci 95% for treatment group: [{lower_treat:.3f}, {upper_treat:.3f}]')

z statistic: 1.89
p-value: 0.058
ci 95% for control group: [0.443, 0.453]
ci 95% for treatment group: [0.437, 0.446]
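
For intuition about what `proportions_ztest` computes with its default settings, the sketch below works through a pooled two-proportion z-test by hand using only the standard library. The function name `two_proportion_ztest` and the counts passed to it are hypothetical and chosen for illustration; they are not the counts from this sample.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for equal proportions using the pooled variance."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Under H0 both groups share one rate, estimated from the combined data
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 480 of 1000 retained in group A, 450 of 1000 in group B
z, p = two_proportion_ztest(480, 1000, 450, 1000)
print(f'z statistic: {z:.2f}')
print(f'p-value: {p:.3f}')
```

The sign of the z statistic follows the order of the groups, so a positive value means the first group has the higher observed rate.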


### Conclusion

With a p-value of 0.058, which is greater than alpha (0.05), we fail to reject the null hypothesis. <br>
This does not prove **$H_0$ : $p_{30} =  p_{40}$**; it means the data do not show a statistically significant difference between the two gates in retaining players 1 day after installation. <br>
The confidence interval for the control group (44.3% - 45.3%) sits slightly above that of the treatment group (43.7% - 44.6%), but the two intervals overlap, which is consistent with the non-significant test result: the higher observed retention rate for gate 30 is not statistically significant at the 5% level.

### Retention 7 : Players coming back to play the game after the seventh day.

In [40]:
control_results_7 = ab_test[ab_test['version'] == 'gate_30']['retention_7']
treatment_results_7 = ab_test[ab_test['version'] == 'gate_40']['retention_7']

In [41]:
n_control = control_results_7.count()
n_treatment = treatment_results_7.count()
successes_7 = [control_results_7.sum(), treatment_results_7.sum()]
nobs_7 = [n_control, n_treatment]

z_stat_7, pval_7 = proportions_ztest(successes_7, nobs=nobs_7)
(lower_con_7, lower_treat_7), (upper_con_7, upper_treat_7) = proportion_confint(successes_7, nobs=nobs_7, alpha=0.05)

print(f'z statistic: {z_stat_7:.2f}')
print(f'p-value: {pval_7:.3f}')
print(f'ci 95% for control group: [{lower_con_7:.3f}, {upper_con_7:.3f}]')
print(f'ci 95% for treatment group: [{lower_treat_7:.3f}, {upper_treat_7:.3f}]')

z statistic: 2.96
p-value: 0.003
ci 95% for control group: [0.186, 0.194]
ci 95% for treatment group: [0.179, 0.186]


### Conclusion

With a p-value of 0.003, which is less than alpha (0.05), we reject the null hypothesis in favour of the alternative. <br>
The data therefore support **$H_1$ : $p_{30} \neq  p_{40}$**: the two gates differ in their ability to retain players 7 days after installation. <br>
The confidence interval for the control group (18.6% - 19.4%) lies above that of the treatment group (17.9% - 18.6%), touching only at the rounded boundary, so the 7-day retention rate of gate 30 is significantly higher than that of gate 40.
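
Comparing two per-group intervals by eye is an indirect way to judge a difference. A common alternative, not used in the analysis above, is a single confidence interval for the difference in proportions. The sketch below assumes a simple Wald (normal-approximation) interval with unpooled variance; the function name `diff_confint` and the counts are hypothetical and for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def diff_confint(successes_a, n_a, successes_b, n_b, alpha=0.05):
    """Wald confidence interval for the difference p_a - p_b."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Unpooled standard error: each group contributes its own variance
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    diff = p_a - p_b
    return diff - z_crit * se, diff + z_crit * se


# Hypothetical counts: 190 of 1000 retained in group A, 150 of 1000 in group B
lower, upper = diff_confint(190, 1000, 150, 1000)
print(f'ci 95% for the difference: [{lower:.3f}, {upper:.3f}]')
```

If the interval excludes zero, the difference is significant at the chosen alpha, and its width shows how precisely the size of the difference is estimated.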