# Mobile Games A/B Test 

Cookie Cats is a hugely popular mobile puzzle game developed by Tactile Entertainment. It is a classic connect-three puzzle game in which players connect blocks of the same color to clear the board and win the level.
As players progress through the levels, they occasionally encounter gates that force them to wait a considerable amount of time or make an in-app purchase to proceed. Besides driving in-app purchases, these gates give players an enforced break from the game, which hopefully increases and prolongs their enjoyment of it.
But where should the gates be placed? Originally the first gate was at level 30. In this notebook we analyze an A/B test in which the first gate in Cookie Cats was moved from level 30 to level 40, and in particular we examine the impact on player retention. Before any analysis, though, it is important to understand the data, so let's load it and take a look!

### 1. Importing libraries and reading the data

In [30]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.stats.proportion as proportion
import statsmodels.stats.power as smp

In [8]:
df = pd.read_csv(r'C:\Users\SilkRIT\Desktop\разное\ab_test\cookie_cats.csv')

In [10]:
df.head()

| | userid | version | sum_gamerounds | retention_1 | retention_7 |
|---|---|---|---|---|---|
| 0 | 116 | gate_30 | 3 | False | False |
| 1 | 337 | gate_30 | 38 | True | False |
| 2 | 377 | gate_40 | 165 | True | False |
| 3 | 483 | gate_40 | 1 | False | False |
| 4 | 488 | gate_40 | 179 | True | True |


The data contains information about 90,189 players who installed the game while the A/B test was running.
The variables are:
* userid - a unique number that identifies each player.
* version - whether the player was put in the control group (gate_30 - a gate at level 30) or the group with the moved gate (gate_40 - a gate at level 40).
* sum_gamerounds - the number of game rounds played by the player during the first 14 days after install.
* retention_1 - did the player come back and play 1 day after installing?
* retention_7 - did the player come back and play 7 days after installing?

In [13]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 90189 entries, 0 to 90188
Data columns (total 5 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   userid          90189 non-null  int64 
 1   version         90189 non-null  object
 2   sum_gamerounds  90189 non-null  int64 
 3   retention_1     90189 non-null  bool  
 4   retention_7     90189 non-null  bool  
dtypes: bool(2), int64(2), object(1)
memory usage: 2.2+ MB


Let's convert the values in the *retention_1* and *retention_7* columns to integers, which makes the calculations below easier.

In [18]:
df[['retention_1','retention_7']] = df[['retention_1','retention_7']].astype(int)

In [19]:
df.head()

| | userid | version | sum_gamerounds | retention_1 | retention_7 |
|---|---|---|---|---|---|
| 0 | 116 | gate_30 | 3 | 0 | 0 |
| 1 | 337 | gate_30 | 38 | 1 | 0 |
| 2 | 377 | gate_40 | 165 | 1 | 0 |
| 3 | 483 | gate_40 | 1 | 0 | 0 |
| 4 | 488 | gate_40 | 179 | 1 | 1 |


### 2. A/B testing

The next step is to calculate the target metric, the *retention rate*, for both the control (gate_30) and test (gate_40) groups.
The resulting table also shows that the two groups contain roughly the same number of users, which is good news for us.

In [22]:
metrics = df.groupby('version', as_index = False).agg({'userid':'count','retention_1':'sum','retention_7':'sum'})
metrics['retention_rate_1'] = round(metrics['retention_1']/metrics['userid']*100,2)
metrics['retention_rate_7'] = round(metrics['retention_7']/metrics['userid']*100,2)
metrics

| | version | userid | retention_1 | retention_7 | retention_rate_1 | retention_rate_7 |
|---|---|---|---|---|---|---|
| 0 | gate_30 | 44700 | 20034 | 8502 | 44.82 | 19.02 |
| 1 | gate_40 | 45489 | 20119 | 8279 | 44.23 | 18.2 |


For both 1-day and 7-day retention, the rate is lower when the gate is moved to level 40.
To check whether these differences are statistically significant, we need to look at the p-value.
Because the target values are binary (a player either came back or did not), we can use a chi-square test to calculate it.

In [41]:
chi2stat, pval, table = proportion.proportions_chisquare(metrics['retention_1'], metrics['userid'])

We set the significance level to 0.05.

In [26]:
alpha = 0.05

In [42]:
print(pval < alpha)

False


In [45]:
print("p-value for retention_1 is", round(pval,3))

p-value for retention_1 is 0.074


The p-value is greater than the significance level, so we cannot reject the null hypothesis.
This means the difference in 1-day retention could be due to chance.
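
As a complementary view of the same result, the sketch below computes an approximate 95% confidence interval for the difference in 1-day retention between the two groups. It is not part of the original analysis: the helper name `diff_confidence_interval` is hypothetical, the interval is a simple Wald (normal approximation) interval, and the counts are copied from the metrics table.

```python
import math
from statistics import NormalDist


def diff_confidence_interval(success_a, n_a, success_b, n_b, level=0.95):
    """Wald confidence interval for p_a - p_b (normal approximation)."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Unpooled standard error of the difference between two proportions
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    # Two-sided critical value, e.g. about 1.96 for a 95% interval
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = p_a - p_b
    return diff - z * se, diff + z * se


# 1-day retention counts: gate_30 vs gate_40 (from the metrics table)
low, high = diff_confidence_interval(20034, 44700, 20119, 45489)
print(f"95% CI for the 1-day retention difference: [{low:.4f}, {high:.4f}]")
```

An interval that contains zero is consistent with failing to reject the null hypothesis at the 0.05 level. The same helper can be applied to the 7-day counts.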

In [46]:
chi2stat, pval, table = proportion.proportions_chisquare(metrics['retention_7'], metrics['userid'])

In [47]:
print(pval < alpha)

True


In [49]:
print("p-value for retention_7 is", round(pval,3))

p-value for retention_7 is 0.002
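
To make the test less of a black box, here is a minimal sketch that reproduces both p-values by hand using only the standard library. It assumes a Pearson chi-square test on a 2x2 table without continuity correction, which for two groups is equivalent to a pooled two-proportion z-test. The function name `two_proportion_chi2` is hypothetical, and the counts are copied from the metrics table.

```python
import math


def two_proportion_chi2(success_a, n_a, success_b, n_b):
    """Chi-square test (1 degree of freedom, no continuity correction)
    for equality of two proportions. Returns (chi2, p_value)."""
    # Pooled retention rate under the null hypothesis of no difference
    pooled = (success_a + success_b) / (n_a + n_b)
    diff = success_a / n_a - success_b / n_b
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se
    # For a 2x2 table the chi-square statistic equals the squared z statistic
    chi2 = z ** 2
    # Upper tail probability of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts from the metrics table: gate_30 first, gate_40 second
chi2_1, p_1 = two_proportion_chi2(20034, 44700, 20119, 45489)
chi2_7, p_7 = two_proportion_chi2(8502, 44700, 8279, 45489)
print("retention_1:", round(p_1, 3))
print("retention_7:", round(p_7, 3))
```

The rounded values should agree with the p-values printed by `proportions_chisquare` above.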


The p-value is lower than the significance level, so we can reject the null hypothesis: the data support the hypothesis that moving the gate to level 40 decreases 7-day retention.
To back up this result, we will estimate the test's power.
Test power is the probability of correctly rejecting the null hypothesis when it is false.

In [53]:
chipower = smp.GofChisquarePower()
retention_7_30 = metrics['retention_rate_7'].values[0]/100
retention_7_40 = metrics['retention_rate_7'].values[1]/100
nobs = min(metrics['userid'])

Here we define an effect size function to use with GofChisquarePower.

In [51]:
def chi2_effect_size(p0,p1):
    return np.sqrt((p0 - p1)**2 / p0)

In [52]:
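# power=None tells solve_power to solve for the power.
# Note: alpha is set to the observed p-value here, not to the 0.05 significance level.
chipower.solve_power(effect_size = chi2_effect_size(retention_7_30, retention_7_40),
                    nobs = nobs,
                    alpha = pval,
                    power = None)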

0.7912804005639859

With alpha set to the observed p-value, the test's power is about 79%, just under the conventional 80% benchmark. This means that the probability of missing a difference that really exists is about 21%.
So we may rely on the results of the A/B test.
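
For comparison, the sketch below estimates power at the conventional alpha of 0.05 with a different, simpler method than the one used above: a normal approximation for a two-sided two-proportion z-test. It assumes the observed 7-day retention rates are the true rates, which makes this a rough post-hoc estimate. The function name `two_proportion_power` is hypothetical, and the result is not expected to match the `GofChisquarePower` figure.

```python
import math
from statistics import NormalDist


def two_proportion_power(p_a, p_b, n_a, n_b, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test,
    treating p_a and p_b as the true rates (normal approximation)."""
    norm = NormalDist()
    # Standard error of the difference, using the unpooled variance as a simplification
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    # Expected value of the z statistic under the alternative hypothesis
    shift = abs(p_a - p_b) / se
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # Probability that the z statistic lands in either rejection region
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


# Observed 7-day retention rates and group sizes from the metrics table
power = two_proportion_power(8502 / 44700, 8279 / 45489, 44700, 45489)
print(round(power, 3))
```

When the two rates are equal, this function returns alpha itself, which is a quick sanity check on the formula.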

### 3. Conclusion

The chi-square test gives strong evidence that 7-day retention is higher when the gate is at level 30 than when it is at level 40. The conclusion: if we want to keep 7-day retention high, we should not move the gate from level 30 to level 40.