## A/B Test Analysis

We need to analyze the A/B test data from the popular game Cookie Cats. This is a classic “match-three” puzzle game where players must connect tiles of the same color to clear the board and win a level. The board also features singing cats :)

During gameplay, players encounter gates that force them either to wait a certain amount of time or to make an in-app purchase before they can progress. In this task, we will analyze the results of an A/B test where the first gate in Cookie Cats was moved from level 30 to level 40. Specifically, we aim to assess the impact on player retention. In other words, we want to understand whether moving the gate 10 levels later affects whether players are still returning to the game a given number of days after installing it.

We will work with data from the cookie_cats.csv file. The variables in the dataset are as follows:
	•	userid: A unique identifier for each player.
	•	version: Indicates whether the player was in the control group (gate_30 - gate at level 30) or the test group (gate_40 - gate at level 40).
	•	sum_gamerounds: The number of game rounds played by the player during the first week after installation.
	•	retention_1: Whether the player returned to play the game one day after installation.
	•	retention_7: Whether the player returned to play the game seven days after installation.

When a player installed the game, they were randomly assigned to either the gate_30 or gate_40 group.

<span style="font-size: 16px; font-weight: bold;">1. We load the A/B test data into the variable df and output the average value of the retention_7 metric (7-day retention) by game version.</span>

In [1]:
import pandas as pd

In [3]:
df = pd.read_csv('../Data/data_statistics/cookie_cats.csv')

In [5]:
df.head()

Unnamed: 0,userid,version,sum_gamerounds,retention_1,retention_7
0,116,gate_30,3,False,False
1,337,gate_30,38,True,False
2,377,gate_40,165,True,False
3,483,gate_40,1,False,False
4,488,gate_40,179,True,True


In [7]:
session_counts = df['userid'].value_counts(ascending=False)
multi_users = session_counts[session_counts > 1].count()

print(f'There are {multi_users} users who appear more than once in the dataset.')

There are 0 users who appear more than once in the dataset.


In [9]:
average_retention_7 = df.groupby('version')['retention_7'].mean()

In [11]:
print(average_retention_7)

version
gate_30    0.190201
gate_40    0.182000
Name: retention_7, dtype: float64


<span style="color: #388E3C; font-size: 16px; font-weight: bold;">gate_30 has a mean 7-day retention of 0.190201 (about 19.0%): 19.0% of users who played the version with the gate at level 30 returned to the game 7 days after installation. gate_40 has a mean 7-day retention of 0.182000 (about 18.2%): 18.2% of users who played the version with the gate at level 40 returned to the game 7 days after installation. Hypothesis: the gate_30 version retains players better 7 days after installation, by about 0.8 percentage points.</span>
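The gap between the two group means can be stated either in percentage points or as a relative change, and the two are easy to confuse. A small sketch using the two means from the groupby output (the variable names are illustrative and not part of the notebook):

```python
# 7-day retention means copied from the groupby output.
retention_gate_30 = 0.190201
retention_gate_40 = 0.182000

# Absolute difference, expressed in percentage points.
abs_diff_pp = (retention_gate_30 - retention_gate_40) * 100

# Relative difference, expressed as a percentage of the gate_40 rate.
rel_diff_pct = (retention_gate_30 - retention_gate_40) / retention_gate_40 * 100

print(f"Absolute difference: {abs_diff_pp:.2f} percentage points")
print(f"Relative difference: {rel_diff_pct:.1f}% of the gate_40 rate")
```

Whether such a gap is more than sampling noise is a separate question, which is what the significance tests address.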

<span style="font-size: 16px; font-weight: bold;">2. We check using a z-test whether one version of the game provides a better retention_7 rate at a significance level of 0.05. We also calculate confidence intervals for the two samples.</span>

In [13]:
import numpy as np
import statsmodels.stats.api as sms
from statsmodels.stats.proportion import proportions_ztest, proportion_confint

In [15]:
df_gate_30 = df[df['version'] == 'gate_30']
df_gate_40 = df[df['version'] == 'gate_40']

n_gate_30 = df_gate_30['retention_7'].count()
n_gate_40 = df_gate_40['retention_7'].count()

successes_gate_30 = df_gate_30['retention_7'].sum()
successes_gate_40 = df_gate_40['retention_7'].sum()

successes = [successes_gate_30, successes_gate_40]
nobs = [n_gate_30, n_gate_40]

z_stat, pval = proportions_ztest(successes, nobs=nobs)

# proportion_confint returns (lower bounds, upper bounds), with one entry per group
ci_lower, ci_upper = proportion_confint(np.array(successes), np.array(nobs), alpha=0.05)
lower_gate_30, lower_gate_40 = ci_lower
upper_gate_30, upper_gate_40 = ci_upper

print(f'z statistic: {z_stat:.2f}')
print(f'p-value: {pval:.3f}')
print(f'The 95% confidence interval for the gate_30 group is: [{lower_gate_30:.3f}, {upper_gate_30:.3f}]')
print(f'The 95% confidence interval for the gate_40 group is: [{lower_gate_40:.3f}, {upper_gate_40:.3f}]')

z statistic: 3.16
p-value: 0.002
The 95% confidence interval for the gate_30 group is: [0.187, 0.194]
The 95% confidence interval for the gate_40 group is: [0.178, 0.186]


<span style="color: #388E3C; font-size: 16px; font-weight: bold;">The p-value (0.002) is well below 0.05, so we reject the null hypothesis (which states that there is no difference in 7-day retention between the two groups). There is a statistically significant difference in user behavior between the two versions of the game. The z statistic is positive and the gate_30 mean is higher, so the difference is in favor of gate_30. The two confidence intervals do not overlap: the lower bound for the gate_30 group (0.187) lies just above the upper bound for the gate_40 group (0.186). The intervals therefore indicate that the gate_30 group has the higher retention rate, which further supports our conclusion that we can reject H₀.</span>
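To make the two calculations concrete, here is a minimal standard-library sketch of a pooled two-proportion z-test and a normal-approximation (Wald) interval. To my understanding these correspond to the default behavior of `proportions_ztest` and `proportion_confint`, but treat that as an assumption. The function names and the counts are hypothetical and are not taken from cookie_cats.csv.

```python
import math


def two_proportion_ztest(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for equal proportions using the pooled standard error."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Under H0 both groups share one proportion, estimated from the pooled data.
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def wald_interval(successes, n, z_crit=1.959964):
    """Normal-approximation interval for one proportion (default z_crit gives 95%)."""
    p = successes / n
    half_width = z_crit * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


# Hypothetical counts for illustration only.
z, p_value = two_proportion_ztest(190, 1000, 182, 1000)
low_a, upp_a = wald_interval(190, 1000)
low_b, upp_b = wald_interval(182, 1000)

print(f"z = {z:.3f}, p-value = {p_value:.3f}")
print(f"group A 95% CI: [{low_a:.3f}, {upp_a:.3f}]")
print(f"group B 95% CI: [{low_b:.3f}, {upp_b:.3f}]")
```

With only 1,000 users per group, the same 0.8-point gap is far from significant. The result in the notebook comes from the much larger sample, which shrinks the standard error.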

<span style="font-size: 16px; font-weight: bold;">3. Chi-square test.</span>

In [17]:
from scipy.stats import chi2_contingency

In [19]:
contingency_table = pd.crosstab(df['version'], df['retention_7'])

chi2_stat, p_value, dof, expected = chi2_contingency(contingency_table)

print(f'p-value: {p_value:.3f}')

alpha = 0.05
if p_value < alpha:
    print("We reject the null hypothesis: there is a dependency between the game version and the retention_7 value.")
else:
    print("We fail to reject the null hypothesis: the retention_7 value does not depend on the game version.")

p-value: 0.002
We reject the null hypothesis: there is a dependency between the game version and the retention_7 value.


<span style="color: #388E3C; font-size: 16px; font-weight: bold;">Rejecting the null hypothesis indicates that there is a statistically significant relationship between the game version and user retention after 7 days. This agrees with the z-test result. The chi-square test itself does not show the direction of the effect; the group means (0.190 for gate_30 versus 0.182 for gate_40) show that the original version, gate_30, retains players better. Moving the first gate to level 40 therefore lowered 7-day retention.</span>
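The agreement between the chi-square test and the z-test is expected: for a 2x2 table, the uncorrected Pearson chi-square statistic equals the square of the pooled z statistic. The sketch below computes it by hand with the standard library. The function name and the counts are hypothetical, not the notebook's data. Note also that `chi2_contingency` applies Yates' continuity correction to 2x2 tables by default, so its statistic is slightly smaller than the uncorrected one computed here.

```python
import math


def pearson_chi2_2x2(table):
    """Pearson chi-square statistic for a 2x2 table of counts (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2


# Hypothetical counts: rows are versions, columns are (not retained, retained).
table = [[810, 190],
         [818, 182]]

chi2 = pearson_chi2_2x2(table)
# Survival function of the chi-square distribution with 1 degree of freedom.
p_value = math.erfc(math.sqrt(chi2 / 2))

# Pooled two-proportion z statistic for the same counts, for comparison.
p_pool = (190 + 182) / 2000
se = math.sqrt(p_pool * (1 - p_pool) * (1 / 1000 + 1 / 1000))
z = (190 / 1000 - 182 / 1000) / se

print(f"chi2 = {chi2:.4f}, z^2 = {z ** 2:.4f}, p-value = {p_value:.3f}")
```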