## Analyze A/B Test Results



## Table of Contents
- [Introduction](#intro)
- [Part I - Probability](#probability)
- [Part II - A/B Test](#ab_test)
- [Part III - Regression](#regression)


<a id='intro'></a>
### Introduction

For this project, I will work to understand the results of an A/B test run by Tactile Entertainment on a game called Cookie Cats, examining what happens when the first gate in the game is moved from level 30 to level 40. When a player installed the game, he or she was randomly assigned to either gate_30 or gate_40. My goal is to help the company decide whether to keep the gate at level 30, move it to level 40, or run the experiment longer before deciding.

The dataset was obtained from Kaggle and contains the A/B test data for 90,189 players who installed the game while the test was running. The variables are:

- userid: A unique number that identifies each player.
- version: Whether the player was put in the control group (gate_30 - a gate at level 30) or the group with the moved gate (gate_40 - a gate at level 40).
- sum_gamerounds: The number of game rounds played by the player during the first 14 days after install.
- retention_1: Did the player come back and play 1 day after installing?
- retention_7: Did the player come back and play 7 days after installing?

For the purposes of this analysis, gate_30 is the control group and gate_40 is the treatment group.

<a id='probability'></a>
#### Part I - Probability

Importing libraries.

In [23]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm


Reading the dataset into a pandas DataFrame.

In [2]:
df = pd.read_csv('cookie_cats.csv')

In [22]:
df.head()

| | userid | version | sum_gamerounds | retention_1 | retention_7 |
|---|---|---|---|---|---|
| 0 | 116 | gate_30 | 3 | False | False |
| 1 | 337 | gate_30 | 38 | True | False |
| 2 | 377 | gate_40 | 165 | True | False |
| 3 | 483 | gate_40 | 1 | False | False |
| 4 | 488 | gate_40 | 179 | True | True |


Number of unique users

In [7]:
df['userid'].nunique()

90189

Number of users assigned to each version

In [13]:
df.groupby(by='version').count()['userid']

version
gate_30    44700
gate_40    45489
Name: userid, dtype: int64
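
As a quick sanity check on the random assignment, the group sizes above can be turned into shares of all players. This is a minimal sketch that only reuses the two counts printed above; the variable names are illustrative and not part of the original notebook.

```python
# Group sizes copied from the groupby output above.
group_sizes = {"gate_30": 44700, "gate_40": 45489}

# The total should match the number of unique users (one row per player).
total_players = sum(group_sizes.values())

# Share of all players assigned to each version.
group_shares = {version: n / total_players for version, n in group_sizes.items()}

for version, share in group_shares.items():
    print(f"{version}: {share:.2%}")
```

On the full DataFrame, `df['version'].value_counts(normalize=True)` should give the same shares directly.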

Proportion of all users who returned 1 day after installing

In [19]:
sum(df['retention_1'])/ df.shape[0]

0.4452095044850259

Proportion of all users who returned 7 days after installing

In [20]:
sum(df['retention_7'])/ df.shape[0]

0.1860648194347426

Probability that a user in the gate_30 (control) group returns after 1 day

In [28]:
control = df[df['version'] == 'gate_30']
sum(control['retention_1'])/ control.shape[0]

0.4481879194630872

Probability that a user in the gate_30 (control) group returns after 7 days

In [29]:
sum(control['retention_7'])/ control.shape[0]

0.19020134228187918

Probability that a user in the gate_40 (treatment) group returns after 1 day

In [33]:
treatment = df[df['version'] == 'gate_40']
sum(treatment['retention_1'])/ treatment.shape[0]

0.44228274967574577

Probability that a user in the gate_40 (treatment) group returns after 7 days

In [34]:
sum(treatment['retention_7'])/ treatment.shape[0]

0.18200004396667327
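
The four conditional probabilities above can also be produced in a single table, because the mean of a boolean column is the share of `True` values. The sketch below is illustrative only: `sample` is a stand-in DataFrame built from the five rows shown by `df.head()`, so its numbers will not match the full-data results.

```python
import pandas as pd

# Stand-in data: the five rows displayed by df.head() earlier (not the full dataset).
sample = pd.DataFrame({
    "userid": [116, 337, 377, 483, 488],
    "version": ["gate_30", "gate_30", "gate_40", "gate_40", "gate_40"],
    "sum_gamerounds": [3, 38, 165, 1, 179],
    "retention_1": [False, True, True, False, True],
    "retention_7": [False, False, False, False, True],
})

# Mean of a boolean column = proportion of True values, computed per version.
retention_by_group = sample.groupby("version")[["retention_1", "retention_7"]].mean()
print(retention_by_group)
```

Replacing `sample` with the full `df` should give one row per version and one column per retention measure.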

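**From this preliminary analysis, retention in the treatment group (gate_40) is slightly lower than in the control group (gate_30) at both 1 day (44.2% vs. 44.8%) and 7 days (18.2% vs. 19.0%), so there isn't sufficient evidence to suggest that the treatment leads to more retention.**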
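
To get a rough sense of how large these gaps are relative to sampling noise, one option is a pooled two-proportion z-test. The sketch below is an assumption on my part, not necessarily the method used later in this notebook; it relies on a normal approximation and plugs in only the group sizes and retention rates printed above. The helper name `two_proportion_z` is hypothetical.

```python
from math import erfc, sqrt


def two_proportion_z(p_control, n_control, p_treatment, n_treatment):
    """Return (z, two-sided p-value) for H0: both groups share one retention rate."""
    # Pooled rate under the null hypothesis of no difference.
    pooled = (p_control * n_control + p_treatment * n_treatment) / (n_control + n_treatment)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_treatment))
    z = (p_treatment - p_control) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# Group sizes and rates copied from the outputs above.
n_control, n_treatment = 44700, 45489
z_1, p_1 = two_proportion_z(0.4481879194630872, n_control, 0.44228274967574577, n_treatment)
z_7, p_7 = two_proportion_z(0.19020134228187918, n_control, 0.18200004396667327, n_treatment)

print(f"1-day retention: z = {z_1:.3f}, p = {p_1:.4f}")
print(f"7-day retention: z = {z_7:.3f}, p = {p_7:.4f}")
```

A negative z here means retention is lower in gate_40 than in gate_30. This is only a quick check and not a substitute for a formal hypothesis test.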

<a id='ab_test'></a>
#### Part II - A/B Test