## About the data set

Cookie Cats is a hugely popular mobile puzzle game developed by Tactile Entertainment. It is a classic "match-three" puzzle game in which the player connects tiles of the same color to clear the board and win the level. With engaging levels, charming characters, and vibrant visuals, the game focuses on casual gameplay designed to retain players through progression and social features.

As players progress through the game, they encounter gates that force them to either wait some time or make an in-app purchase before they can continue. In this project, we analyze the results of an A/B test in which the first gate in Cookie Cats was moved from level 30 to level 40. In particular, we look at the impact on player retention and on the number of game rounds played.

### What the data looks like

The data covers 90,189 players who installed the game while the A/B test was running. The variables are:

1. `userid` - a unique number that identifies each player.<br>
2. `version` - whether the player was put in the control group (`gate_30`, a gate at level 30) or the test group (`gate_40`, a gate at level 40).<br>
3. `sum_gamerounds` - the number of game rounds played by the player during the first week after installation.<br>
4. `retention_1` - whether the player came back and played 1 day after installing.<br>
5. `retention_7` - whether the player came back and played 7 days after installing.<br>

When a player installed the game, they were randomly assigned to either gate_30 or gate_40.
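
Because `retention_1` and `retention_7` are boolean, the mean of either column within a group is the share of players in that group who came back. The following is a minimal sketch of that idea on a small made-up frame; the rows in `toy` are illustrative only and are not taken from the real data set.

```python
import pandas as pd

# Illustrative rows only (not the real data); same columns as described above.
toy = pd.DataFrame({
    "userid": [1, 2, 3, 4, 5, 6],
    "version": ["gate_30", "gate_30", "gate_30", "gate_40", "gate_40", "gate_40"],
    "sum_gamerounds": [3, 38, 0, 165, 1, 179],
    "retention_1": [False, True, False, True, False, True],
    "retention_7": [False, False, False, False, False, True],
})

# The mean of a boolean column is the fraction of True values,
# so a grouped mean gives the retention rate per version.
retention_by_version = toy.groupby("version")[["retention_1", "retention_7"]].mean()
print(retention_by_version)
```

The same `groupby` call should work unchanged on the full data set once it is loaded.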

## Importing the data set and basic data exploration


In [1]:
import pandas as pd

file_path = r"C:\Users\Lachu\OneDrive\Documents\Visual Studio 2019\projecct_draft\cookie_cats - cookie_cats.csv.csv"
data = pd.read_csv(file_path)

# Display the first few rows and structure of the data
print(data.head())
print(data.columns)


   userid  version  sum_gamerounds  retention_1  retention_7
0     116  gate_30               3        False        False
1     337  gate_30              38         True        False
2     377  gate_40             165         True        False
3     483  gate_40               1        False        False
4     488  gate_40             179         True         True
Index(['userid', 'version', 'sum_gamerounds', 'retention_1', 'retention_7'], dtype='object')


In [2]:
# Check basic DataFrame information (column names, non-null counts, data types)

data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 90189 entries, 0 to 90188
Data columns (total 5 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   userid          90189 non-null  int64 
 1   version         90189 non-null  object
 2   sum_gamerounds  90189 non-null  int64 
 3   retention_1     90189 non-null  bool  
 4   retention_7     90189 non-null  bool  
dtypes: bool(2), int64(2), object(1)
memory usage: 2.2+ MB


In [3]:
# Check the DataFrame shape (number of rows and columns)

data.shape

(90189, 5)

In [6]:
# Describe the numerical columns in the data set
data.describe()

          userid  sum_gamerounds
count    90189.0         90189.0
mean   4998412.0       51.872457
std    2883286.0      195.050858
min        116.0             0.0
25%    2512230.0             5.0
50%    4995815.0            16.0
75%    7496452.0            51.0
max    9999861.0         49854.0
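
The summary shows a maximum of 49,854 game rounds against a 75th percentile of 51 and a mean (about 51.9) well above the median (16), so `sum_gamerounds` is strongly right-skewed. One way to look at such a tail is to request upper quantiles. The sketch below uses a short synthetic series (`rounds`, made up for illustration) rather than the real column.

```python
import pandas as pd

# Synthetic, illustrative values: mostly small counts plus one extreme value.
rounds = pd.Series([0, 1, 3, 5, 16, 38, 51, 165, 179, 49854], name="sum_gamerounds")

# Upper quantiles show how far the tail extends beyond the bulk of players.
tail = rounds.quantile([0.5, 0.75, 0.9, 0.99])
print(tail)

# The mean is pulled up by the extreme value; the median is not.
print("mean:", rounds.mean(), "median:", rounds.median())
```

On the real data, the same call would be `data['sum_gamerounds'].quantile([...])`.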


### Cleaning Data

In [4]:
# Check for missing (NaN) values in each column

data.isnull().sum()

userid            0
version           0
sum_gamerounds    0
retention_1       0
retention_7       0
dtype: int64
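
Since `userid` is described as a unique identifier, a natural companion to the missing-value check is a check for repeated IDs. This is a hedged sketch on a made-up frame that deliberately contains one repeated `userid`; whether the real data contains duplicates is not shown in this notebook.

```python
import pandas as pd

# Illustrative frame with one deliberately repeated userid.
toy = pd.DataFrame({
    "userid": [116, 337, 377, 377],
    "version": ["gate_30", "gate_30", "gate_40", "gate_40"],
})

# Number of rows whose userid already appeared earlier in the frame.
n_duplicate_ids = toy["userid"].duplicated().sum()
print(n_duplicate_ids)

# Equivalent yes/no check: True only if no value repeats.
print(toy["userid"].is_unique)
```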

## Exploratory Data Analysis

#### Understanding the player distribution
Here we learn more about the data through exploratory data analysis.<br>
1. Find the total number of players in the data set.<br>
2. Find the number of players in each version of the game (`gate_30`, where the gate is at level 30, and `gate_40`, where the gate is moved to level 40).<br>
3. Find the number of players who never played a round (`sum_gamerounds` = 0).<br>
4. Find the number of players who did not come back 1 day or 7 days after installing.<br>
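
For the last item in the list, the count of players who did not return is the number of `False` values in the corresponding retention column. A minimal sketch, assuming boolean retention columns as in this data set and using made-up rows:

```python
import pandas as pd

# Illustrative rows only (not the real data).
toy = pd.DataFrame({
    "version": ["gate_30", "gate_30", "gate_40", "gate_40", "gate_40"],
    "retention_1": [False, True, True, False, True],
    "retention_7": [False, False, False, False, True],
})

# ~ inverts the boolean column, so the sum counts players who did NOT return.
not_back_day_1 = (~toy["retention_1"]).sum()
not_back_day_7 = (~toy["retention_7"]).sum()
print(not_back_day_1, not_back_day_7)
```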



In [8]:
# Total number of players across both versions
number_of_players = data['userid'].count()
print(number_of_players)

90189


In [17]:
#Number of players for each version

player_in_version = data.groupby('version')['userid'].size()
print(player_in_version)


version
gate_30    44700
gate_40    45489
Name: userid, dtype: int64
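
The two groups are close in size, which is what random assignment would lead one to expect. To express the split as proportions, the counts printed above can be divided by their total. The sketch below hard-codes those two counts in a hypothetical `player_counts` series so that it runs on its own.

```python
import pandas as pd

# Group sizes as printed above.
player_counts = pd.Series({"gate_30": 44700, "gate_40": 45489}, name="userid")

# Share of all players assigned to each version.
share = player_counts / player_counts.sum()
print(share.round(4))
```

On the loaded data, `data['version'].value_counts(normalize=True)` should give the same proportions directly.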


In [None]:
# Players who never played a single round after installing the game

never_played = data[data['sum_gamerounds']==0].count()
print(never_played)

userid            3994
version           3994
sum_gamerounds    3994
retention_1       3994
retention_7       3994
dtype: int64
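
`DataFrame.count()` repeats the same number once per column. The 3,994 players with zero rounds are about 4.4% of the 90,189 players. A boolean mask gives the count as a single number, and also makes it easy to compute the share and the split by version. The following is a sketch on made-up rows; the names `never_played_mask`, `n_never_played`, `share_never_played`, and `by_version` are introduced here for illustration.

```python
import pandas as pd

# Illustrative rows only (not the real data).
toy = pd.DataFrame({
    "version": ["gate_30", "gate_30", "gate_40", "gate_40", "gate_40"],
    "sum_gamerounds": [0, 38, 165, 0, 0],
})

# True for players with zero game rounds.
never_played_mask = toy["sum_gamerounds"] == 0

# Sum of a boolean mask = number of True values; mean = their share.
n_never_played = int(never_played_mask.sum())
share_never_played = never_played_mask.mean()

# Count of zero-round players within each version.
by_version = never_played_mask.groupby(toy["version"]).sum()

print(n_never_played, share_never_played)
print(by_version)
```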
