# Mobile Game Analysis

### An analysis of user progress data

The data is available on Kaggle at [this link](https://www.kaggle.com/datasets/manchvictor/prediction-of-user-loss-in-mobile-games).

level_seq.csv - This is the core data file. It contains each user's record of playing each level, and each row is one attempt at a level. The meaning of each column is as follows:
'user_id' : user id, which can be matched with the ids in the training, validation, and test sets;
'level_id' : level id;
'f_success' : whether the level was cleared (1: cleared, 0: failed);
'f_duration' : duration of the attempt (unit: s);
'f_reststep' : ratio of the remaining steps to the step limit (0 on failure);
'f_help' : whether extra help, such as props and hints, was used (1: used, 0: not used);
'time' : timestamp of the attempt.

level_meta.csv - Summary statistics for each level, which can be used to characterize the level. The meaning of each column is as follows:
'f_avg_duration' : average time spent on each attempt (unit: s, including successful and failed attempts);
'f_avg_passrate' : average clearance rate;
'f_avg_win_duration' : average time spent on each clearance (unit: s, including only successful attempts);
'f_avg_retrytimes' : average number of retries (a second attempt at the same level counts as the first retry);
'level_id' : level id, which can be matched with 'level_id' in level_seq.csv.
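
Because both files share 'level_id', each attempt can be enriched with the statistics of its level. The following is a minimal sketch of that join on tiny made-up frames; the values and the `duration_ratio` feature are illustrative assumptions, not part of the dataset.

```python
import pandas as pd

# Tiny made-up stand-ins for level_seq.csv and level_meta.csv (illustrative values only)
level_seq = pd.DataFrame({
    "user_id": [1, 1, 2],
    "level_id": [1, 2, 1],
    "f_success": [1, 0, 1],
    "f_duration": [127.0, 69.0, 80.0],
})
level_meta = pd.DataFrame({
    "level_id": [1, 2],
    "f_avg_duration": [100.0, 90.0],
    "f_avg_passrate": [0.9, 0.6],
})

# Left join keeps every attempt and attaches the statistics of its level;
# validate raises if level_meta has more than one row per level_id
attempts = level_seq.merge(level_meta, on="level_id", how="left", validate="many_to_one")

# Hypothetical derived feature: attempt duration relative to the level average
attempts["duration_ratio"] = attempts["f_duration"] / attempts["f_avg_duration"]
print(attempts)
```

A left join is used so that attempts on levels missing from the metadata would survive with NaN statistics rather than being dropped silently.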

In [1]:
#Import libraries
import pandas as pd
import glob
import os
import datetime as dt
import numpy as np
import matplotlib as mpl
from matplotlib import pyplot as plt
from matplotlib.ticker import (MultipleLocator, FormatStrFormatter, AutoMinorLocator)
import matplotlib.dates
import matplotlib.dates as mdates
import seaborn as sns

In [27]:
#Import csv files
path = "/Users/raws/Downloads/mobile_game_data"
csv_files = glob.glob(path + "/*.csv")

csv_files

['/Users/raws/Downloads/mobile_game_data/test.csv',
 '/Users/raws/Downloads/mobile_game_data/level_seq.csv',
 '/Users/raws/Downloads/mobile_game_data/dev.csv',
 '/Users/raws/Downloads/mobile_game_data/train.csv',
 '/Users/raws/Downloads/mobile_game_data/level_meta.csv']
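
The order of the paths returned by `glob.glob` depends on the file system, so looking tables up by position can break on another machine. One possible alternative, sketched here with a hypothetical helper `load_tables` and a throwaway folder of made-up files, is to key each DataFrame by its file name.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_tables(folder):
    """Read every tab-separated .csv in `folder` into a dict keyed by file stem.

    Hypothetical helper for illustration; it is not part of the original notebook.
    """
    return {p.stem: pd.read_csv(p, sep="\t") for p in sorted(Path(folder).glob("*.csv"))}


# Demo on a temporary folder holding two small made-up files
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "level_seq.csv").write_text("user_id\tlevel_id\n1\t1\n1\t2\n")
    Path(tmp, "level_meta.csv").write_text("level_id\tf_avg_passrate\n1\t0.9\n")
    tables = load_tables(tmp)

print(sorted(tables))
print(tables["level_seq"])
```

With this layout the attempt log would be reached as `tables["level_seq"]` regardless of listing order.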

In [28]:
#Read each tab-separated file into a DataFrame using a list comprehension
df_list = [pd.read_csv(filename, delimiter='\t', index_col=None, header=0) for filename in csv_files]
df_list

[      user_id
 0           1
 1           2
 2           3
 3           4
 4           5
 ...       ...
 2768     2769
 2769     2770
 2770     2771
 2771     2772
 2772     2773
 
 [2773 rows x 1 columns],
          user_id  level_id  f_success  f_duration  f_reststep  f_help  \
 0          10932         1          1       127.0    0.500000       0   
 1          10932         2          1        69.0    0.703704       0   
 2          10932         3          1        67.0    0.560000       0   
 3          10932         4          1        58.0    0.700000       0   
 4          10932         5          1        83.0    0.666667       0   
 ...          ...       ...        ...         ...         ...     ...   
 2194346    10931        40          1       111.0    0.250000       1   
 2194347    10931        41          1        76.0    0.277778       0   
 2194348    10931        42          0       121.0    0.000000       1   
 2194349    10931        42          0       115.0  

In [31]:
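#Select tables by file name, since the order returned by glob is not guaranteed
users = df_list[csv_files.index(path + "/level_seq.csv")]
meta = df_list[csv_files.index(path + "/level_meta.csv")]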

In [32]:
users

,user_id,level_id,f_success,f_duration,f_reststep,f_help,time
0,10932,1,1,127.0,0.500000,0,2020-02-01 00:05:51
1,10932,2,1,69.0,0.703704,0,2020-02-01 00:08:01
2,10932,3,1,67.0,0.560000,0,2020-02-01 00:09:50
3,10932,4,1,58.0,0.700000,0,2020-02-01 00:11:16
4,10932,5,1,83.0,0.666667,0,2020-02-01 00:13:12
...,...,...,...,...,...,...,...
2194346,10931,40,1,111.0,0.250000,1,2020-02-03 16:26:37
2194347,10931,41,1,76.0,0.277778,0,2020-02-03 16:28:06
2194348,10931,42,0,121.0,0.000000,1,2020-02-03 16:30:17
2194349,10931,42,0,115.0,0.000000,0,2020-02-03 16:33:40


In [42]:
users.describe()

,user_id,level_id,f_success,f_duration,f_reststep,f_help
count,2194351.0,2194351.0,2194351.0,2194351.0,2194351.0,2194351.0
mean,6745.03,96.836,0.5283216,108.1228,0.1678471,0.04415565
std,3942.094,84.10689,0.4991974,53.61323,0.226146,0.2054409
min,1.0,1.0,0.0,1.0,0.0,0.0
25%,3287.0,41.0,0.0,77.0,0.0,0.0
50%,6688.0,80.0,1.0,100.0,0.04545455,0.0
75%,10163.0,142.0,1.0,127.0,0.2857143,0.0
max,13589.0,1509.0,1.0,600.0,1.0,1.0
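
`describe()` summarizes numeric columns only, so the `time` column is not covered by the counts above. A per-column null count is one way to check every column directly. The sketch below runs on a small made-up frame that deliberately contains missing values.

```python
import numpy as np
import pandas as pd

# Made-up frame with one missing duration and one missing timestamp (illustrative only)
sample = pd.DataFrame({
    "user_id": [1, 1, 2],
    "f_duration": [127.0, np.nan, 80.0],
    "time": ["2020-02-01 00:05:51", "2020-02-01 00:08:01", None],
})

# isna() marks missing cells; summing per column counts them
null_counts = sample.isna().sum()
print(null_counts)
print("any nulls:", bool(null_counts.any()))
```

On the real data the same check would be `users.isna().sum()`.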


In [39]:
users.duplicated().sum()

69322

In [45]:
dupes = users[users.duplicated()]
dupes

,user_id,level_id,f_success,f_duration,f_reststep,f_help,time
64,10932,50,0,153.0,0.000000,0,2020-02-01 17:05:43
65,10932,50,0,153.0,0.000000,0,2020-02-01 17:05:43
665,2774,42,1,130.0,0.045455,0,2020-02-01 10:56:34
667,2774,44,1,116.0,0.214286,0,2020-02-01 11:02:32
707,2774,62,1,89.0,0.428571,0,2020-02-02 09:10:50
...,...,...,...,...,...,...,...
2192592,10924,36,1,110.0,0.500000,0,2020-02-02 00:45:41
2193091,13586,104,0,155.0,0.000000,0,2020-02-03 21:13:35
2193569,10927,137,1,73.0,0.285714,0,2020-02-02 22:36:23
2193666,10927,207,0,166.0,0.000000,0,2020-02-04 20:59:36
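
By default, `duplicated()` marks only the second and later copies of a row, so the count above excludes each first occurrence. The sketch below, on a made-up frame modelled loosely on the rows shown, contrasts the default with `keep=False` and shows one way to drop the repeats.

```python
import pandas as pd

# Made-up frame: the first three rows are identical (illustrative only)
sample = pd.DataFrame({
    "user_id": [10932, 10932, 10932, 2774],
    "level_id": [50, 50, 50, 42],
    "f_success": [0, 0, 0, 1],
    "time": ["2020-02-01 17:05:43"] * 3 + ["2020-02-01 10:56:34"],
})

later_copies = sample.duplicated().sum()          # copies after the first occurrence
all_copies = sample.duplicated(keep=False).sum()  # every row that has an identical twin
deduped = sample.drop_duplicates().reset_index(drop=True)  # keeps the first occurrence

print(later_copies, all_copies, len(deduped))
```

Whether identical rows are logging errors or genuine repeated attempts is a judgment call about the data, so dropping them is an assumption to confirm rather than a required step.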


<div class="alert alert-warning">
  <strong>Summary of Findings</strong>
    <li>No null values were found, but <code>users.duplicated()</code> flags 69,322 duplicated rows that need review.</li>
    <li>The <code>time</code> column needs to be converted to a datetime data type.</li>
</div>
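
Timestamps such as `2020-02-01 00:05:51` are read from CSV as plain strings. A minimal sketch of the conversion follows, using made-up rows and assuming the timestamps live in the `time` column shown earlier. The explicit format string and `errors="coerce"` are choices made for this example.

```python
import pandas as pd

# Made-up rows; the last value is deliberately invalid
sample = pd.DataFrame({"time": ["2020-02-01 00:05:51", "2020-02-01 00:08:01", "not a date"]})

# errors="coerce" turns unparseable strings into NaT instead of raising
sample["time"] = pd.to_datetime(sample["time"], format="%Y-%m-%d %H:%M:%S", errors="coerce")

print(sample.dtypes)
print("unparseable rows:", sample["time"].isna().sum())
```

After conversion, the `.dt` accessor (for example `sample["time"].dt.date`) becomes available for grouping attempts by day or hour.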