# Fugo Games - Profit Prediction

## 1) Business Problem Understanding

<p>The objective is to estimate how much revenue the company will earn from each user by the end of their first 90 days, based on that user's activity data from the first 15 days of play.</p>

## 2) Data Understanding
- ID: Unique ID for every installation of the game (User ID)

- first_open_date: Date of the first launch of the game

- first_open_timestamp: Timestamp of the first launch of the game, in UTC timezone, Unix time in microseconds

- local_first_open_timestamp: First open timestamp in local timezone of the user

- country: Country of the user

- platform: Platform of the user; Android or iOS

- device_category: Category of the device; mobile or tablet

- device_brand: Brand of the device

- device_model: Model of the device

- has_ios_att_permission: Whether the iOS user has given ATT permission (true or false), false for Android users

- ad_network: Ad network the user has come from, null for organic users

- first_prediction: Initial predicted value of the user (in USD)

- RetentionD{i}: Whether the user launched the game on day i (true or false)

- LevelAdvancedCountD{i}: Number of levels the user completed on day i

- Level_{i}_Duration: Time the user took to complete level i (null if the user has not completed level i)

- AdRevenueD{i}: Ad revenue (in USD) the user generated on day i

- IAPRevenueD{i}: IAP (in-app purchase) revenue the user generated on day i

- TARGET: Total revenue the user generated in their first 90 days; this is the value to predict
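Because both timestamp columns are Unix time in microseconds, they can be parsed with `unit="us"`. The sketch below is only an illustration on the first sample row of the training data; the derived column names (`first_open_utc`, `utc_offset_hours`, `local_open_hour`) are hypothetical and are not used elsewhere in this notebook.

```python
import pandas as pd

# Toy frame holding the two timestamps of the first sample row.
users = pd.DataFrame({
    "first_open_timestamp": [1709355895042000],
    "local_first_open_timestamp": [1709334295042000],
})

# Both columns are Unix time in microseconds, hence unit="us".
users["first_open_utc"] = pd.to_datetime(
    users["first_open_timestamp"], unit="us", utc=True
)

# Hypothetical feature: the user's UTC offset in hours (local minus UTC).
users["utc_offset_hours"] = (
    users["local_first_open_timestamp"] - users["first_open_timestamp"]
) / 3_600_000_000

# Hypothetical feature: local hour of day at first launch.
users["local_open_hour"] = pd.to_datetime(
    users["local_first_open_timestamp"], unit="us"
).dt.hour

print(users[["first_open_utc", "utc_offset_hours", "local_open_hour"]])
```

For this row the local timestamp is six hours behind UTC, which matches a user in Mexico.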

### 2.1) Import of necessary libraries and datasets

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split, KFold, cross_val_score

from sklearn.preprocessing import OrdinalEncoder, OneHotEncoder, LabelEncoder, RobustScaler, MinMaxScaler

from sklearn.linear_model import LinearRegression, Lasso, Ridge

import warnings
warnings.filterwarnings("ignore")
pd.set_option("display.max_columns", None)

In [2]:
user_feature_train = pd.read_csv("user_features_train.csv")
user_train = pd.read_csv("users_train.csv")
targets_train = pd.read_csv("targets_train.csv")

user_feature_test = pd.read_csv("user_features_test.csv")
user_test = pd.read_csv("users_test.csv")

submission = pd.read_csv("sample_submission.csv")

### 2.2) Combining related datasets

In [3]:
# Merge the training datasets
user_train = user_train.merge(user_feature_train, on="ID")
train = user_train.merge(targets_train, on="ID")

train.head()

Unnamed: 0,ID,first_open_date,first_open_timestamp,local_first_open_timestamp,country,platform,device_category,device_brand,device_model,has_ios_att_permission,ad_network,first_prediction,RetentionD0,RetentionD1,RetentionD2,RetentionD3,RetentionD4,RetentionD5,RetentionD6,RetentionD7,RetentionD8,RetentionD9,RetentionD10,RetentionD11,RetentionD12,RetentionD13,RetentionD14,RetentionD15,LevelAdvancedCountD0,LevelAdvancedCountD1,LevelAdvancedCountD2,LevelAdvancedCountD3,LevelAdvancedCountD4,LevelAdvancedCountD5,LevelAdvancedCountD6,LevelAdvancedCountD7,LevelAdvancedCountD8,LevelAdvancedCountD9,LevelAdvancedCountD10,LevelAdvancedCountD11,LevelAdvancedCountD12,LevelAdvancedCountD13,LevelAdvancedCountD14,LevelAdvancedCountD15,Level_1_Duration,Level_2_Duration,Level_3_Duration,Level_4_Duration,Level_5_Duration,Level_6_Duration,Level_7_Duration,Level_8_Duration,Level_9_Duration,Level_10_Duration,AdRevenueD0,AdRevenueD1,AdRevenueD2,AdRevenueD3,AdRevenueD4,AdRevenueD5,AdRevenueD6,AdRevenueD7,AdRevenueD8,AdRevenueD9,AdRevenueD10,AdRevenueD11,AdRevenueD12,AdRevenueD13,AdRevenueD14,AdRevenueD15,IAPRevenueD0,IAPRevenueD1,IAPRevenueD2,IAPRevenueD3,IAPRevenueD4,IAPRevenueD5,IAPRevenueD6,IAPRevenueD7,IAPRevenueD8,IAPRevenueD9,IAPRevenueD10,IAPRevenueD11,IAPRevenueD12,IAPRevenueD13,IAPRevenueD14,IAPRevenueD15,TARGET
0,0,2024-03-02,1709355895042000,1709334295042000,Mexico,Android,mobile,Xiaomi,Redmi A2,False,unityads_int,3.314099,True,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,26.0,69.0,36.0,,,,,,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,1,2024-03-19,1710824539731000,1710806539731000,Peru,Android,mobile,Samsung,Galaxy A13,False,applovin_int,1.681524,True,False,False,False,False,False,True,True,False,True,False,False,False,False,False,False,5,0,0,0,0,0,3,5,0,2,0,0,0,0,0,0,13.0,91.0,39.0,79.0,180.0,89.0,124.0,118.0,35.0,117.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.008674,0.0,0.010218,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018892
2,2,2024-03-18,1710731043082000,1710720243082000,Brazil,Android,mobile,Xiaomi,Redmi 12,False,applovin_int,10.71875,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,8.0,63.0,86.0,,,,,,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,3,2024-03-03,1709455862260000,1709441462260000,Dominican Republic,iOS,mobile,Apple,iPhone 11 Pro Max,False,,5.1,True,True,True,False,False,False,False,False,False,False,False,False,False,False,False,False,19,8,9,0,0,0,0,0,0,0,0,0,0,0,0,0,23.0,141.0,131.0,118.0,77.0,107.0,77.0,182.0,42.0,156.0,0.00215,0.019159,0.025341,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04665
4,4,2024-04-30,1714482477190000,1714464477190000,Ecuador,Android,mobile,Motorola,Moto E22,False,applovin_int,2.091409,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,15,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,8.0,52.0,30.0,84.0,139.0,96.0,268.0,97.0,44.0,122.0,0.01468,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01468
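The merges above rely on `ID` appearing exactly once in each file. A minimal sketch of how that assumption could be checked is shown below, using small toy frames in place of the real CSV files; the `validate` argument is a standard `DataFrame.merge` parameter, while the frame contents and the `rows_lost` name are made up for illustration.

```python
import pandas as pd

# Toy stand-ins for users_train.csv, user_features_train.csv and targets_train.csv.
users = pd.DataFrame({"ID": [0, 1, 2], "country": ["Mexico", "Peru", "Brazil"]})
features = pd.DataFrame({"ID": [0, 1, 2], "RetentionD0": [True, True, True]})
targets = pd.DataFrame({"ID": [0, 1, 2], "TARGET": [0.0, 0.018892, 0.0]})

# validate="one_to_one" raises pandas.errors.MergeError if an ID repeats on either side.
merged = users.merge(features, on="ID", how="inner", validate="one_to_one")
merged = merged.merge(targets, on="ID", how="inner", validate="one_to_one")

# An inner join silently drops unmatched IDs, so compare row counts afterwards.
rows_lost = len(users) - len(merged)
print(f"rows lost in merge: {rows_lost}")
```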


In [4]:
# Merge the test datasets (both files were already loaded in cell 2)

test = user_test.merge(user_feature_test, on="ID")
test.head()

Unnamed: 0,ID,first_open_date,first_open_timestamp,local_first_open_timestamp,country,platform,device_category,device_brand,device_model,has_ios_att_permission,ad_network,first_prediction,RetentionD0,RetentionD1,RetentionD2,RetentionD3,RetentionD4,RetentionD5,RetentionD6,RetentionD7,RetentionD8,RetentionD9,RetentionD10,RetentionD11,RetentionD12,RetentionD13,RetentionD14,RetentionD15,LevelAdvancedCountD0,LevelAdvancedCountD1,LevelAdvancedCountD2,LevelAdvancedCountD3,LevelAdvancedCountD4,LevelAdvancedCountD5,LevelAdvancedCountD6,LevelAdvancedCountD7,LevelAdvancedCountD8,LevelAdvancedCountD9,LevelAdvancedCountD10,LevelAdvancedCountD11,LevelAdvancedCountD12,LevelAdvancedCountD13,LevelAdvancedCountD14,LevelAdvancedCountD15,Level_1_Duration,Level_2_Duration,Level_3_Duration,Level_4_Duration,Level_5_Duration,Level_6_Duration,Level_7_Duration,Level_8_Duration,Level_9_Duration,Level_10_Duration,AdRevenueD0,AdRevenueD1,AdRevenueD2,AdRevenueD3,AdRevenueD4,AdRevenueD5,AdRevenueD6,AdRevenueD7,AdRevenueD8,AdRevenueD9,AdRevenueD10,AdRevenueD11,AdRevenueD12,AdRevenueD13,AdRevenueD14,AdRevenueD15,IAPRevenueD0,IAPRevenueD1,IAPRevenueD2,IAPRevenueD3,IAPRevenueD4,IAPRevenueD5,IAPRevenueD6,IAPRevenueD7,IAPRevenueD8,IAPRevenueD9,IAPRevenueD10,IAPRevenueD11,IAPRevenueD12,IAPRevenueD13,IAPRevenueD14,IAPRevenueD15
0,878594,2024-05-12,1715478163668000,1715467363668000,Argentina,Android,mobile,Motorola,Moto G32,False,applovin_int,1.444805,True,True,True,True,True,False,False,False,False,False,False,False,False,False,False,False,10,5,1,0,0,0,0,0,0,0,0,0,0,0,0,0,16.0,147.0,81.0,68.0,265.0,184.0,142.0,133.0,59.0,157.0,0.001595,0.009382,0.00174,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,878595,2024-01-26,1706254855890000,1706233255890000,Mexico,Android,mobile,OnePlus,Nord N20 SE,False,applovin_int,9.147972,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,17.0,209.0,84.0,,,,,,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,878596,2024-02-13,1707778260263000,1707781860263000,France,Android,mobile,Motorola,moto g13,False,applovin_int,40.731158,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,19.0,73.0,130.0,,,,,,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,878597,2024-02-21,1708530744695000,1708519944695000,Brazil,Android,mobile,Samsung,Galaxy A03,False,applovin_int,4.967959,True,False,False,True,False,True,True,False,True,True,False,False,False,False,False,False,2,0,0,1,0,4,13,0,2,0,0,0,0,0,0,0,66.0,896.0,562.0,840.0,412.0,1001.0,530.0,536.0,85.0,562.0,0.0,0.0,0.0,0.0,0.0,0.0,0.156159,0.0,0.112458,0.000451,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,878598,2024-05-03,1714706093405000,1714688093405000,Peru,Android,mobile,Xiaomi,Redmi 13C,False,applovin_int,2.445842,True,False,True,True,True,True,False,False,False,False,False,False,False,False,False,False,21,0,5,18,26,9,0,0,0,0,0,0,0,0,0,0,32.0,99.0,38.0,46.0,175.0,48.0,72.0,59.0,37.0,69.0,0.0,0.0,0.004202,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### 2.3) Overview of the datasets

#### For Train dataset

In [5]:
print(f"Train dataset shape: {train.shape}")

Train dataset shape: (878594, 87)


In [6]:
train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 878594 entries, 0 to 878593
Data columns (total 87 columns):
 #   Column                      Non-Null Count   Dtype  
---  ------                      --------------   -----  
 0   ID                          878594 non-null  int64  
 1   first_open_date             878594 non-null  object 
 2   first_open_timestamp        878594 non-null  int64  
 3   local_first_open_timestamp  878594 non-null  int64  
 4   country                     878512 non-null  object 
 5   platform                    878594 non-null  object 
 6   device_category             878594 non-null  object 
 7   device_brand                872754 non-null  object 
 8   device_model                878594 non-null  object 
 9   has_ios_att_permission      878594 non-null  bool   
 10  ad_network                  568124 non-null  object 
 11  first_prediction            852859 non-null  float64
 12  RetentionD0                 878594 non-null  bool   
 13  RetentionD1   

In [7]:
train.head(3)

Unnamed: 0,ID,first_open_date,first_open_timestamp,local_first_open_timestamp,country,platform,device_category,device_brand,device_model,has_ios_att_permission,ad_network,first_prediction,RetentionD0,RetentionD1,RetentionD2,RetentionD3,RetentionD4,RetentionD5,RetentionD6,RetentionD7,RetentionD8,RetentionD9,RetentionD10,RetentionD11,RetentionD12,RetentionD13,RetentionD14,RetentionD15,LevelAdvancedCountD0,LevelAdvancedCountD1,LevelAdvancedCountD2,LevelAdvancedCountD3,LevelAdvancedCountD4,LevelAdvancedCountD5,LevelAdvancedCountD6,LevelAdvancedCountD7,LevelAdvancedCountD8,LevelAdvancedCountD9,LevelAdvancedCountD10,LevelAdvancedCountD11,LevelAdvancedCountD12,LevelAdvancedCountD13,LevelAdvancedCountD14,LevelAdvancedCountD15,Level_1_Duration,Level_2_Duration,Level_3_Duration,Level_4_Duration,Level_5_Duration,Level_6_Duration,Level_7_Duration,Level_8_Duration,Level_9_Duration,Level_10_Duration,AdRevenueD0,AdRevenueD1,AdRevenueD2,AdRevenueD3,AdRevenueD4,AdRevenueD5,AdRevenueD6,AdRevenueD7,AdRevenueD8,AdRevenueD9,AdRevenueD10,AdRevenueD11,AdRevenueD12,AdRevenueD13,AdRevenueD14,AdRevenueD15,IAPRevenueD0,IAPRevenueD1,IAPRevenueD2,IAPRevenueD3,IAPRevenueD4,IAPRevenueD5,IAPRevenueD6,IAPRevenueD7,IAPRevenueD8,IAPRevenueD9,IAPRevenueD10,IAPRevenueD11,IAPRevenueD12,IAPRevenueD13,IAPRevenueD14,IAPRevenueD15,TARGET
0,0,2024-03-02,1709355895042000,1709334295042000,Mexico,Android,mobile,Xiaomi,Redmi A2,False,unityads_int,3.314099,True,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,26.0,69.0,36.0,,,,,,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,1,2024-03-19,1710824539731000,1710806539731000,Peru,Android,mobile,Samsung,Galaxy A13,False,applovin_int,1.681524,True,False,False,False,False,False,True,True,False,True,False,False,False,False,False,False,5,0,0,0,0,0,3,5,0,2,0,0,0,0,0,0,13.0,91.0,39.0,79.0,180.0,89.0,124.0,118.0,35.0,117.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.008674,0.0,0.010218,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018892
2,2,2024-03-18,1710731043082000,1710720243082000,Brazil,Android,mobile,Xiaomi,Redmi 12,False,applovin_int,10.71875,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,8.0,63.0,86.0,,,,,,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [8]:
train.tail(3)

Unnamed: 0,ID,first_open_date,first_open_timestamp,local_first_open_timestamp,country,platform,device_category,device_brand,device_model,has_ios_att_permission,ad_network,first_prediction,RetentionD0,RetentionD1,RetentionD2,RetentionD3,RetentionD4,RetentionD5,RetentionD6,RetentionD7,RetentionD8,RetentionD9,RetentionD10,RetentionD11,RetentionD12,RetentionD13,RetentionD14,RetentionD15,LevelAdvancedCountD0,LevelAdvancedCountD1,LevelAdvancedCountD2,LevelAdvancedCountD3,LevelAdvancedCountD4,LevelAdvancedCountD5,LevelAdvancedCountD6,LevelAdvancedCountD7,LevelAdvancedCountD8,LevelAdvancedCountD9,LevelAdvancedCountD10,LevelAdvancedCountD11,LevelAdvancedCountD12,LevelAdvancedCountD13,LevelAdvancedCountD14,LevelAdvancedCountD15,Level_1_Duration,Level_2_Duration,Level_3_Duration,Level_4_Duration,Level_5_Duration,Level_6_Duration,Level_7_Duration,Level_8_Duration,Level_9_Duration,Level_10_Duration,AdRevenueD0,AdRevenueD1,AdRevenueD2,AdRevenueD3,AdRevenueD4,AdRevenueD5,AdRevenueD6,AdRevenueD7,AdRevenueD8,AdRevenueD9,AdRevenueD10,AdRevenueD11,AdRevenueD12,AdRevenueD13,AdRevenueD14,AdRevenueD15,IAPRevenueD0,IAPRevenueD1,IAPRevenueD2,IAPRevenueD3,IAPRevenueD4,IAPRevenueD5,IAPRevenueD6,IAPRevenueD7,IAPRevenueD8,IAPRevenueD9,IAPRevenueD10,IAPRevenueD11,IAPRevenueD12,IAPRevenueD13,IAPRevenueD14,IAPRevenueD15,TARGET
878591,878591,2024-03-13,1710283704357000,1710290904357000,Finland,Android,mobile,Motorola,moto g51 5G,False,applovin_int,11.138877,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,19,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,15.0,27.0,45.0,130.0,164.0,151.0,135.0,48.0,37.0,160.0,0.038446,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038446
878592,878592,2024-04-20,1713633879783000,1713641079783000,Italy,iOS,mobile,Apple,iPhone 14 Pro Max,False,,10.006476,True,True,False,True,True,False,False,True,True,False,False,True,True,True,False,True,3,6,0,1,21,0,0,3,22,0,0,1,6,20,0,2,10.0,142.0,215.0,222.0,70.0,150.0,120.0,72.0,327.0,56.0,0.0,0.024268,0.0,0.008702,0.102881,0.0,0.0,0.021853,0.038574,0.0,0.0,0.003894,0.024609,0.045459,0.0,0.007496,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.022793
878593,878593,2024-02-03,1706918564741000,1706896964741000,Mexico,Android,mobile,Honor,X8,False,applovin_int,4.807477,True,False,False,False,True,False,True,False,False,False,False,False,False,False,False,False,3,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,26.0,205.0,309.0,726.0,,,,,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [9]:
train.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
ID,878594.0,4.392965e+05,2.536284e+05,0.000000e+00,2.196482e+05,4.392965e+05,6.589448e+05,8.785930e+05
first_open_timestamp,878594.0,1.710306e+15,3.554136e+12,1.704056e+15,1.707209e+15,1.710289e+15,1.713503e+15,1.716066e+15
local_first_open_timestamp,878594.0,1.710299e+15,3.553598e+12,1.704028e+15,1.707209e+15,1.710280e+15,1.713488e+15,1.716107e+15
first_prediction,852859.0,3.761236e+01,8.779914e+01,1.000166e-04,3.932556e+00,1.141456e+01,3.587645e+01,4.944477e+03
LevelAdvancedCountD0,878594.0,1.112697e+01,1.365224e+01,0.000000e+00,3.000000e+00,7.000000e+00,1.400000e+01,7.810000e+02
...,...,...,...,...,...,...,...,...
IAPRevenueD12,878594.0,3.623630e-05,9.222650e-03,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,3.250000e+00
IAPRevenueD13,878594.0,4.660116e-05,1.021374e-02,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,3.250000e+00
IAPRevenueD14,878594.0,5.028546e-05,1.091650e-02,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,3.250000e+00
IAPRevenueD15,878594.0,6.584384e-05,1.419456e-02,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,5.200000e+00


In [10]:
train_null_cols = [col for col in train.columns if train[col].isnull().sum() > 0]

train[train_null_cols].isnull().sum()

country                  82
device_brand           5840
ad_network           310470
first_prediction      25735
Level_1_Duration       6100
Level_2_Duration      50406
Level_3_Duration      87940
Level_4_Duration     149897
Level_5_Duration     191762
Level_6_Duration     229227
Level_7_Duration     268830
Level_8_Duration     301236
Level_9_Duration     318661
Level_10_Duration    355989
dtype: int64
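Raw null counts are easier to judge next to the share of rows they represent; for instance, the 310470 missing `ad_network` values above are roughly 35% of the 878594 training rows. The helper below is a small sketch of such a summary on a toy frame; `missing_summary` and its output column names are hypothetical.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return null count and null percentage for every column that has nulls."""
    counts = df.isnull().sum()
    counts = counts[counts > 0]
    summary = pd.DataFrame({
        "n_missing": counts,
        "pct_missing": (counts / len(df) * 100).round(2),
    })
    return summary.sort_values("n_missing", ascending=False)


# Toy frame with the same kind of gaps as the real data.
toy = pd.DataFrame({
    "ad_network": ["applovin_int", None, None, "unityads_int"],
    "Level_1_Duration": [26.0, 13.0, np.nan, 8.0],
    "platform": ["Android", "Android", "iOS", "Android"],
})
report = missing_summary(toy)
print(report)
```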

In [11]:
train.duplicated().sum()

0
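Since every row carries its own `ID`, a full-row duplicate check cannot flag anything as long as the IDs are unique. A sketch of two complementary checks, on a toy frame, is shown below. Rows that match once `ID` is ignored are not necessarily errors, because different users can have identical sparse profiles, so the last count is informational only.

```python
import pandas as pd

toy = pd.DataFrame({
    "ID": [0, 1, 2],
    "country": ["Mexico", "Mexico", "Peru"],
    "TARGET": [0.0, 0.0, 0.018892],
})

# Full-row check, as in the cell above.
full_row_dupes = int(toy.duplicated().sum())

# Check that the key itself never repeats.
id_is_unique = toy["ID"].is_unique

# Rows that are identical once the ID column is ignored.
dupes_ignoring_id = int(toy.drop(columns="ID").duplicated().sum())

print(full_row_dupes, id_is_unique, dupes_ignoring_id)
```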

#### For Test dataset

In [12]:
print(f"Test dataset shape: {test.shape}")

Test dataset shape: (585730, 86)


In [13]:
test.head(3)

Unnamed: 0,ID,first_open_date,first_open_timestamp,local_first_open_timestamp,country,platform,device_category,device_brand,device_model,has_ios_att_permission,ad_network,first_prediction,RetentionD0,RetentionD1,RetentionD2,RetentionD3,RetentionD4,RetentionD5,RetentionD6,RetentionD7,RetentionD8,RetentionD9,RetentionD10,RetentionD11,RetentionD12,RetentionD13,RetentionD14,RetentionD15,LevelAdvancedCountD0,LevelAdvancedCountD1,LevelAdvancedCountD2,LevelAdvancedCountD3,LevelAdvancedCountD4,LevelAdvancedCountD5,LevelAdvancedCountD6,LevelAdvancedCountD7,LevelAdvancedCountD8,LevelAdvancedCountD9,LevelAdvancedCountD10,LevelAdvancedCountD11,LevelAdvancedCountD12,LevelAdvancedCountD13,LevelAdvancedCountD14,LevelAdvancedCountD15,Level_1_Duration,Level_2_Duration,Level_3_Duration,Level_4_Duration,Level_5_Duration,Level_6_Duration,Level_7_Duration,Level_8_Duration,Level_9_Duration,Level_10_Duration,AdRevenueD0,AdRevenueD1,AdRevenueD2,AdRevenueD3,AdRevenueD4,AdRevenueD5,AdRevenueD6,AdRevenueD7,AdRevenueD8,AdRevenueD9,AdRevenueD10,AdRevenueD11,AdRevenueD12,AdRevenueD13,AdRevenueD14,AdRevenueD15,IAPRevenueD0,IAPRevenueD1,IAPRevenueD2,IAPRevenueD3,IAPRevenueD4,IAPRevenueD5,IAPRevenueD6,IAPRevenueD7,IAPRevenueD8,IAPRevenueD9,IAPRevenueD10,IAPRevenueD11,IAPRevenueD12,IAPRevenueD13,IAPRevenueD14,IAPRevenueD15
0,878594,2024-05-12,1715478163668000,1715467363668000,Argentina,Android,mobile,Motorola,Moto G32,False,applovin_int,1.444805,True,True,True,True,True,False,False,False,False,False,False,False,False,False,False,False,10,5,1,0,0,0,0,0,0,0,0,0,0,0,0,0,16.0,147.0,81.0,68.0,265.0,184.0,142.0,133.0,59.0,157.0,0.001595,0.009382,0.00174,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,878595,2024-01-26,1706254855890000,1706233255890000,Mexico,Android,mobile,OnePlus,Nord N20 SE,False,applovin_int,9.147972,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,17.0,209.0,84.0,,,,,,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,878596,2024-02-13,1707778260263000,1707781860263000,France,Android,mobile,Motorola,moto g13,False,applovin_int,40.731158,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,19.0,73.0,130.0,,,,,,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [14]:
test.tail(3)

Unnamed: 0,ID,first_open_date,first_open_timestamp,local_first_open_timestamp,country,platform,device_category,device_brand,device_model,has_ios_att_permission,ad_network,first_prediction,RetentionD0,RetentionD1,RetentionD2,RetentionD3,RetentionD4,RetentionD5,RetentionD6,RetentionD7,RetentionD8,RetentionD9,RetentionD10,RetentionD11,RetentionD12,RetentionD13,RetentionD14,RetentionD15,LevelAdvancedCountD0,LevelAdvancedCountD1,LevelAdvancedCountD2,LevelAdvancedCountD3,LevelAdvancedCountD4,LevelAdvancedCountD5,LevelAdvancedCountD6,LevelAdvancedCountD7,LevelAdvancedCountD8,LevelAdvancedCountD9,LevelAdvancedCountD10,LevelAdvancedCountD11,LevelAdvancedCountD12,LevelAdvancedCountD13,LevelAdvancedCountD14,LevelAdvancedCountD15,Level_1_Duration,Level_2_Duration,Level_3_Duration,Level_4_Duration,Level_5_Duration,Level_6_Duration,Level_7_Duration,Level_8_Duration,Level_9_Duration,Level_10_Duration,AdRevenueD0,AdRevenueD1,AdRevenueD2,AdRevenueD3,AdRevenueD4,AdRevenueD5,AdRevenueD6,AdRevenueD7,AdRevenueD8,AdRevenueD9,AdRevenueD10,AdRevenueD11,AdRevenueD12,AdRevenueD13,AdRevenueD14,AdRevenueD15,IAPRevenueD0,IAPRevenueD1,IAPRevenueD2,IAPRevenueD3,IAPRevenueD4,IAPRevenueD5,IAPRevenueD6,IAPRevenueD7,IAPRevenueD8,IAPRevenueD9,IAPRevenueD10,IAPRevenueD11,IAPRevenueD12,IAPRevenueD13,IAPRevenueD14,IAPRevenueD15
585727,1464321,2024-02-14,1707930024996000,1707912024996000,United States,iOS,mobile,Apple,iPhone 11,False,applovin_int,101.114631,True,True,True,False,False,False,False,False,False,False,False,False,False,False,False,False,14,8,6,0,0,0,0,0,0,0,0,0,9,0,0,0,6.0,9.0,24.0,27.0,17.0,15.0,23.0,12.0,31.0,56.0,0.422472,0.407156,0.21502,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.315011,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
585728,1464322,2024-05-07,1715043787322000,1715032987322000,Argentina,Android,mobile,Samsung,Galaxy A22 4G,False,applovin_int,1.341221,True,True,False,True,False,False,False,False,False,False,False,False,False,False,False,False,10,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,22.0,33.0,24.0,38.0,90.0,51.0,124.0,80.0,50.0,92.0,0.002737,0.0,0.0,0.004852,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
585729,1464323,2024-01-01,1704078581140000,1704082181140000,France,Android,mobile,Samsung,Galaxy S10,False,applovin_int,39.502274,True,True,False,True,False,True,False,True,True,False,False,False,False,False,False,False,46,3,0,0,0,1,0,5,4,0,0,0,0,0,0,0,18.0,47.0,58.0,235.0,51.0,156.0,40.0,56.0,36.0,124.0,0.555541,0.06531,0.0,0.025157,0.0,0.021121,0.0,0.161279,0.109985,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [15]:
test.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
ID,585730.0,1.171458e+06,1.690858e+05,8.785940e+05,1.025026e+06,1.171458e+06,1.317891e+06,1.464323e+06
first_open_timestamp,585730.0,1.710307e+15,3.554277e+12,1.704056e+15,1.707210e+15,1.710289e+15,1.713505e+15,1.716066e+15
local_first_open_timestamp,585730.0,1.710301e+15,3.553715e+12,1.704028e+15,1.707211e+15,1.710280e+15,1.713494e+15,1.716089e+15
first_prediction,568452.0,3.754019e+01,8.764884e+01,9.999999e-05,3.904612e+00,1.131115e+01,3.561212e+01,4.884568e+03
LevelAdvancedCountD0,585730.0,1.114865e+01,1.366017e+01,0.000000e+00,3.000000e+00,7.000000e+00,1.400000e+01,7.950000e+02
...,...,...,...,...,...,...,...,...
IAPRevenueD11,585730.0,7.432947e-05,2.036342e-02,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,1.300000e+01
IAPRevenueD12,585730.0,8.096563e-05,1.678863e-02,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,6.500000e+00
IAPRevenueD13,585730.0,4.436686e-05,1.363813e-02,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,6.500000e+00
IAPRevenueD14,585730.0,3.218206e-05,8.450442e-03,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,3.250000e+00


In [16]:
test_null_cols = [col for col in test.columns if test[col].isnull().sum() > 0]

test[test_null_cols].isnull().sum()

country                  56
device_brand           3927
ad_network           206591
first_prediction      17278
Level_1_Duration       4223
Level_2_Duration      33821
Level_3_Duration      58749
Level_4_Duration     100119
Level_5_Duration     127762
Level_6_Duration     153108
Level_7_Duration     179261
Level_8_Duration     200449
Level_9_Duration     212093
Level_10_Duration    237160
dtype: int64

### 2.4) Deleting unnecessary columns, filling in missing values, and simplifying columns before exploratory data analysis

#### 2.4.1) Deleting unnecessary columns

In [17]:
train.drop(["ID", "first_open_date", "first_open_timestamp", "local_first_open_timestamp","has_ios_att_permission"], axis=1, inplace=True)

test.drop(["ID", "first_open_date", "first_open_timestamp", "local_first_open_timestamp", "has_ios_att_permission"], axis=1, inplace=True)

In [18]:
train.head()

Unnamed: 0,country,platform,device_category,device_brand,device_model,ad_network,first_prediction,RetentionD0,RetentionD1,RetentionD2,RetentionD3,RetentionD4,RetentionD5,RetentionD6,RetentionD7,RetentionD8,RetentionD9,RetentionD10,RetentionD11,RetentionD12,RetentionD13,RetentionD14,RetentionD15,LevelAdvancedCountD0,LevelAdvancedCountD1,LevelAdvancedCountD2,LevelAdvancedCountD3,LevelAdvancedCountD4,LevelAdvancedCountD5,LevelAdvancedCountD6,LevelAdvancedCountD7,LevelAdvancedCountD8,LevelAdvancedCountD9,LevelAdvancedCountD10,LevelAdvancedCountD11,LevelAdvancedCountD12,LevelAdvancedCountD13,LevelAdvancedCountD14,LevelAdvancedCountD15,Level_1_Duration,Level_2_Duration,Level_3_Duration,Level_4_Duration,Level_5_Duration,Level_6_Duration,Level_7_Duration,Level_8_Duration,Level_9_Duration,Level_10_Duration,AdRevenueD0,AdRevenueD1,AdRevenueD2,AdRevenueD3,AdRevenueD4,AdRevenueD5,AdRevenueD6,AdRevenueD7,AdRevenueD8,AdRevenueD9,AdRevenueD10,AdRevenueD11,AdRevenueD12,AdRevenueD13,AdRevenueD14,AdRevenueD15,IAPRevenueD0,IAPRevenueD1,IAPRevenueD2,IAPRevenueD3,IAPRevenueD4,IAPRevenueD5,IAPRevenueD6,IAPRevenueD7,IAPRevenueD8,IAPRevenueD9,IAPRevenueD10,IAPRevenueD11,IAPRevenueD12,IAPRevenueD13,IAPRevenueD14,IAPRevenueD15,TARGET
0,Mexico,Android,mobile,Xiaomi,Redmi A2,unityads_int,3.314099,True,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,26.0,69.0,36.0,,,,,,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Peru,Android,mobile,Samsung,Galaxy A13,applovin_int,1.681524,True,False,False,False,False,False,True,True,False,True,False,False,False,False,False,False,5,0,0,0,0,0,3,5,0,2,0,0,0,0,0,0,13.0,91.0,39.0,79.0,180.0,89.0,124.0,118.0,35.0,117.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.008674,0.0,0.010218,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018892
2,Brazil,Android,mobile,Xiaomi,Redmi 12,applovin_int,10.71875,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,8.0,63.0,86.0,,,,,,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Dominican Republic,iOS,mobile,Apple,iPhone 11 Pro Max,,5.1,True,True,True,False,False,False,False,False,False,False,False,False,False,False,False,False,19,8,9,0,0,0,0,0,0,0,0,0,0,0,0,0,23.0,141.0,131.0,118.0,77.0,107.0,77.0,182.0,42.0,156.0,0.00215,0.019159,0.025341,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04665
4,Ecuador,Android,mobile,Motorola,Moto E22,applovin_int,2.091409,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,15,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,8.0,52.0,30.0,84.0,139.0,96.0,268.0,97.0,44.0,122.0,0.01468,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01468


#### 2.4.2) Filling Missing Values

##### For Train Dataset

In [19]:
train_null_cols = [col for col in train.columns if train[col].isnull().sum() > 0]

train[train_null_cols].isnull().sum()

country                  82
device_brand           5840
ad_network           310470
first_prediction      25735
Level_1_Duration       6100
Level_2_Duration      50406
Level_3_Duration      87940
Level_4_Duration     149897
Level_5_Duration     191762
Level_6_Duration     229227
Level_7_Duration     268830
Level_8_Duration     301236
Level_9_Duration     318661
Level_10_Duration    355989
dtype: int64

In [20]:
# A null ad_network marks an organic user
train["ad_network"] = train["ad_network"].fillna("organic_user")

In [21]:
# Level_{i}_Duration: time the user took to complete level i (null if the user has not completed level i).
# Fill the nulls with 0 in all ten level-duration columns.
duration_cols = [f"Level_{i}_Duration" for i in range(1, 11)]
train[duration_cols] = train[duration_cols].fillna(0)

In [22]:
# first_prediction: initial predicted value of the user (in USD).
# Users with a null first_prediction are treated as users who spend no money in the game, so the nulls in this column are also filled with zero.
train["first_prediction"] = train["first_prediction"].fillna(0)
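After a zero fill, a model can no longer tell "no prediction was available" apart from a real numeric value unless that information is kept elsewhere. One optional variant, which is not part of the original notebook, records a flag before filling. The sketch below uses a toy frame, and the column name `first_prediction_missing` is hypothetical.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"first_prediction": [3.314099, np.nan, 10.71875]})

# Hypothetical flag column: 1 where the prediction was originally missing.
toy["first_prediction_missing"] = toy["first_prediction"].isnull().astype(int)

# Fill the nulls with zero, assigning the result back to the column.
toy["first_prediction"] = toy["first_prediction"].fillna(0)

print(toy)
```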

In [23]:
train["country"] = train["country"].fillna("unknown")

In [24]:
train["device_brand"] = train["device_brand"].fillna("unknown")

In [25]:
train.isnull().sum().sum()

0

##### For Test Dataset

In [26]:
test_null_cols = [col for col in test.columns if test[col].isnull().sum() > 0]

test[test_null_cols].isnull().sum()

country                  56
device_brand           3927
ad_network           206591
first_prediction      17278
Level_1_Duration       4223
Level_2_Duration      33821
Level_3_Duration      58749
Level_4_Duration     100119
Level_5_Duration     127762
Level_6_Duration     153108
Level_7_Duration     179261
Level_8_Duration     200449
Level_9_Duration     212093
Level_10_Duration    237160
dtype: int64

In [27]:
# A null ad_network marks an organic user
test["ad_network"] = test["ad_network"].fillna("organic_user")

In [28]:
# Level_{i}_Duration: time the user took to complete level i (null if the user has not completed level i).
# Fill the nulls with 0 in all ten level-duration columns.
duration_cols = [f"Level_{i}_Duration" for i in range(1, 11)]
test[duration_cols] = test[duration_cols].fillna(0)

In [29]:
# first_prediction: initial predicted value of the user (in USD).
# Users with a null first_prediction are treated as users who spend no money in the game, so the nulls in this column are also filled with zero.
test["first_prediction"] = test["first_prediction"].fillna(0)

In [30]:
test["country"] = test["country"].fillna("unknown")

In [31]:
test["device_brand"] = test["device_brand"].fillna("unknown")

In [32]:
test.isnull().sum().sum()

0
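The train and test sets receive the same fill rules one column at a time. A minimal sketch of how those rules could be collected into one helper is shown below; the function name `fill_missing` and the toy frame are hypothetical and not part of the original notebook.

```python
import pandas as pd

def fill_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the null-handling rules of section 2.4.1 applied."""
    out = df.copy()
    fills = {
        "ad_network": "organic_user",   # null ad_network means an organic user
        "first_prediction": 0,
        "country": "unknown",
        "device_brand": "unknown",
    }
    # Level_{i}_Duration is null when level i was never completed -> 0
    for col in out.columns:
        if col.startswith("Level_") and col.endswith("_Duration"):
            fills[col] = 0
    # Only pass fill values for columns that actually exist in this frame
    fills = {col: value for col, value in fills.items() if col in out.columns}
    return out.fillna(value=fills)

# Toy frame (hypothetical values) to show the behaviour
demo = pd.DataFrame({
    "country": ["Peru", None],
    "ad_network": [None, "applovin_int"],
    "first_prediction": [1.5, None],
    "Level_1_Duration": [13.0, None],
})
filled = fill_missing(demo)
print(filled.isnull().sum().sum())
```

Calling the same helper on both `train` and `test` would keep the two preprocessing paths from drifting apart.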

#### 2.4.2) Simplifying columns

##### For Train Dataset

In [33]:
retention_cols = [col for col in train.columns if "Retention" in col]
level_advanced_cols = [col for col in train.columns if "LevelAdvancedCount" in col]
level_duration_cols = [col for col in train.columns if str(col).startswith("Level") and str(col).endswith("Duration")]
adrevenue_cols = [col for col in train.columns if "AdRevenue" in col]
iaprevenue_cols = [col for col in train.columns if "IAPRevenue" in col]

In [34]:
print(retention_cols)
print(level_advanced_cols)
print(level_duration_cols)
print(adrevenue_cols)
print(iaprevenue_cols)

['RetentionD0', 'RetentionD1', 'RetentionD2', 'RetentionD3', 'RetentionD4', 'RetentionD5', 'RetentionD6', 'RetentionD7', 'RetentionD8', 'RetentionD9', 'RetentionD10', 'RetentionD11', 'RetentionD12', 'RetentionD13', 'RetentionD14', 'RetentionD15']
['LevelAdvancedCountD0', 'LevelAdvancedCountD1', 'LevelAdvancedCountD2', 'LevelAdvancedCountD3', 'LevelAdvancedCountD4', 'LevelAdvancedCountD5', 'LevelAdvancedCountD6', 'LevelAdvancedCountD7', 'LevelAdvancedCountD8', 'LevelAdvancedCountD9', 'LevelAdvancedCountD10', 'LevelAdvancedCountD11', 'LevelAdvancedCountD12', 'LevelAdvancedCountD13', 'LevelAdvancedCountD14', 'LevelAdvancedCountD15']
['Level_1_Duration', 'Level_2_Duration', 'Level_3_Duration', 'Level_4_Duration', 'Level_5_Duration', 'Level_6_Duration', 'Level_7_Duration', 'Level_8_Duration', 'Level_9_Duration', 'Level_10_Duration']
['AdRevenueD0', 'AdRevenueD1', 'AdRevenueD2', 'AdRevenueD3', 'AdRevenueD4', 'AdRevenueD5', 'AdRevenueD6', 'AdRevenueD7', 'AdRevenueD8', 'AdRevenueD9', 'AdRevenu

In [35]:
# Convert the Retention columns from boolean to integer
for col in retention_cols:
    train[col] = train[col].astype(int)
    test[col] = test[col].astype(int)

In [36]:
# Total retention: number of days on which the user opened the game
train["total_retention"] = train[retention_cols].sum(axis=1)

test["total_retention"] = test[retention_cols].sum(axis=1)

In [37]:
# Total number of levels completed
train["total_level_advanced"] = train[level_advanced_cols].sum(axis=1)

test["total_level_advanced"] = test[level_advanced_cols].sum(axis=1)

In [38]:
# Total time spent completing the first 10 levels
train["total_level_duration"] = train[level_duration_cols].sum(axis=1)

test["total_level_duration"] = test[level_duration_cols].sum(axis=1)

In [39]:
# Total revenue earned from ads
train["total_ad_revenue"] = train[adrevenue_cols].sum(axis=1)

test["total_ad_revenue"] = test[adrevenue_cols].sum(axis=1)

In [40]:
# Total revenue earned from in-app purchases
train["total_iap_revenue"] = train[iaprevenue_cols].sum(axis=1)

test["total_iap_revenue"] = test[iaprevenue_cols].sum(axis=1)

In [41]:
train.head(3)

Unnamed: 0,country,platform,device_category,device_brand,device_model,ad_network,first_prediction,RetentionD0,RetentionD1,RetentionD2,RetentionD3,RetentionD4,RetentionD5,RetentionD6,RetentionD7,RetentionD8,RetentionD9,RetentionD10,RetentionD11,RetentionD12,RetentionD13,RetentionD14,RetentionD15,LevelAdvancedCountD0,LevelAdvancedCountD1,LevelAdvancedCountD2,LevelAdvancedCountD3,LevelAdvancedCountD4,LevelAdvancedCountD5,LevelAdvancedCountD6,LevelAdvancedCountD7,LevelAdvancedCountD8,LevelAdvancedCountD9,LevelAdvancedCountD10,LevelAdvancedCountD11,LevelAdvancedCountD12,LevelAdvancedCountD13,LevelAdvancedCountD14,LevelAdvancedCountD15,Level_1_Duration,Level_2_Duration,Level_3_Duration,Level_4_Duration,Level_5_Duration,Level_6_Duration,Level_7_Duration,Level_8_Duration,Level_9_Duration,Level_10_Duration,AdRevenueD0,AdRevenueD1,AdRevenueD2,AdRevenueD3,AdRevenueD4,AdRevenueD5,AdRevenueD6,AdRevenueD7,AdRevenueD8,AdRevenueD9,AdRevenueD10,AdRevenueD11,AdRevenueD12,AdRevenueD13,AdRevenueD14,AdRevenueD15,IAPRevenueD0,IAPRevenueD1,IAPRevenueD2,IAPRevenueD3,IAPRevenueD4,IAPRevenueD5,IAPRevenueD6,IAPRevenueD7,IAPRevenueD8,IAPRevenueD9,IAPRevenueD10,IAPRevenueD11,IAPRevenueD12,IAPRevenueD13,IAPRevenueD14,IAPRevenueD15,TARGET,total_retention,total_level_advanced,total_level_duration,total_ad_revenue,total_iap_revenue
0,Mexico,Android,mobile,Xiaomi,Redmi A2,unityads_int,3.314099,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,26.0,69.0,36.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2,3,131.0,0.0,0.0
1,Peru,Android,mobile,Samsung,Galaxy A13,applovin_int,1.681524,1,0,0,0,0,0,1,1,0,1,0,0,0,0,0,0,5,0,0,0,0,0,3,5,0,2,0,0,0,0,0,0,13.0,91.0,39.0,79.0,180.0,89.0,124.0,118.0,35.0,117.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.008674,0.0,0.010218,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018892,4,15,885.0,0.018892,0.0
2,Brazil,Android,mobile,Xiaomi,Redmi 12,applovin_int,10.71875,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,8.0,63.0,86.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,3,157.0,0.0,0.0


In [42]:
test.head(3)

Unnamed: 0,country,platform,device_category,device_brand,device_model,ad_network,first_prediction,RetentionD0,RetentionD1,RetentionD2,RetentionD3,RetentionD4,RetentionD5,RetentionD6,RetentionD7,RetentionD8,RetentionD9,RetentionD10,RetentionD11,RetentionD12,RetentionD13,RetentionD14,RetentionD15,LevelAdvancedCountD0,LevelAdvancedCountD1,LevelAdvancedCountD2,LevelAdvancedCountD3,LevelAdvancedCountD4,LevelAdvancedCountD5,LevelAdvancedCountD6,LevelAdvancedCountD7,LevelAdvancedCountD8,LevelAdvancedCountD9,LevelAdvancedCountD10,LevelAdvancedCountD11,LevelAdvancedCountD12,LevelAdvancedCountD13,LevelAdvancedCountD14,LevelAdvancedCountD15,Level_1_Duration,Level_2_Duration,Level_3_Duration,Level_4_Duration,Level_5_Duration,Level_6_Duration,Level_7_Duration,Level_8_Duration,Level_9_Duration,Level_10_Duration,AdRevenueD0,AdRevenueD1,AdRevenueD2,AdRevenueD3,AdRevenueD4,AdRevenueD5,AdRevenueD6,AdRevenueD7,AdRevenueD8,AdRevenueD9,AdRevenueD10,AdRevenueD11,AdRevenueD12,AdRevenueD13,AdRevenueD14,AdRevenueD15,IAPRevenueD0,IAPRevenueD1,IAPRevenueD2,IAPRevenueD3,IAPRevenueD4,IAPRevenueD5,IAPRevenueD6,IAPRevenueD7,IAPRevenueD8,IAPRevenueD9,IAPRevenueD10,IAPRevenueD11,IAPRevenueD12,IAPRevenueD13,IAPRevenueD14,IAPRevenueD15,total_retention,total_level_advanced,total_level_duration,total_ad_revenue,total_iap_revenue
0,Argentina,Android,mobile,Motorola,Moto G32,applovin_int,1.444805,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,10,5,1,0,0,0,0,0,0,0,0,0,0,0,0,0,16.0,147.0,81.0,68.0,265.0,184.0,142.0,133.0,59.0,157.0,0.001595,0.009382,0.00174,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5,16,1252.0,0.012716,0.0
1,Mexico,Android,mobile,OnePlus,Nord N20 SE,applovin_int,9.147972,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,17.0,209.0,84.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,3,310.0,0.0,0.0
2,France,Android,mobile,Motorola,moto g13,applovin_int,40.731158,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,19.0,73.0,130.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,3,222.0,0.0,0.0


## 3) Feature Engineering

### 3.1) Feature Extraction

In [43]:
# Total revenue (ads + in-app purchases)
train["total_revenue"] = train["total_ad_revenue"] + train["total_iap_revenue"]

test["total_revenue"] = test["total_ad_revenue"] + test["total_iap_revenue"]

In [44]:
# Average revenue the company earned per day the user opened the game (+1 added to numerator and denominator)
train['daily_revenue'] = (train["total_revenue"]+1) / (train["total_retention"]+1)
test['daily_revenue'] = (test["total_revenue"]+1) / (test["total_retention"]+1)

# Average number of levels completed per day the user opened the game
train['daily_level_advanced'] = (train["total_level_advanced"]+1) / (train["total_retention"]+1)
test['daily_level_advanced'] = (test["total_level_advanced"]+1) / (test["total_retention"]+1)


# Ratio of total levels completed to total revenue
train['level_to_revenue_ratio'] = (train['total_level_advanced']+1) / (train['total_revenue']+1)
test['level_to_revenue_ratio'] = (test['total_level_advanced']+1) / (test['total_revenue']+1)
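The ratio features above add 1 to both the numerator and the denominator. The short worked example below, using the totals of the first two rows of `train.head(3)`, illustrates the effect: the ratio stays defined, but because per-user revenue is on the order of cents, the result is driven mostly by `1 / (total_retention + 1)`. The list names are hypothetical.

```python
# total_revenue and total_retention of the first two training rows shown earlier
total_revenue = [0.0, 0.018892]
total_retention = [2, 4]

smoothed = []    # (revenue + 1) / (retention + 1), as computed in the notebook
unsmoothed = []  # revenue / retention, for comparison only

for revenue, retention in zip(total_revenue, total_retention):
    smoothed.append((revenue + 1) / (retention + 1))
    unsmoothed.append(revenue / retention)

for s, u in zip(smoothed, unsmoothed):
    print(f"smoothed={s:.6f}  unsmoothed={u:.6f}")
```

Whether this smoothing is the right choice depends on the model; the example only makes the scale of the effect visible.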

In [45]:
# Revenue earned per level
train['revenue_per_level'] = (train['total_revenue']+1) / (train['total_level_advanced']+1)

test['revenue_per_level'] = (test['total_revenue']+1) / (test['total_level_advanced']+1)

In [46]:
# Difference between day-1 (D1) revenue and last-day (D15) revenue; the install day D0 is not used here
train['revenue_first_vs_last'] = train['AdRevenueD1'] + train['IAPRevenueD1'] - (train['AdRevenueD15'] + train['IAPRevenueD15'])

test['revenue_first_vs_last'] = test['AdRevenueD1'] + test['IAPRevenueD1'] - (test['AdRevenueD15'] + test['IAPRevenueD15'])

In [47]:
# Day-1 (D1) revenue as a share of total revenue (+1 added to numerator and denominator)

train['first_day_revenue_ratio'] = (train['AdRevenueD1'] + train['IAPRevenueD1']+1) / (train['total_revenue']+1)

test['first_day_revenue_ratio'] = (test['AdRevenueD1'] + test['IAPRevenueD1']+1) / (test['total_revenue']+1)

In [48]:
train['total_revenue_first_day'] = train['AdRevenueD1'] + train['IAPRevenueD1']

test['total_revenue_first_day'] = test['AdRevenueD1'] + test['IAPRevenueD1']

In [49]:
# Total revenue divided by 15 days
train['avg_daily_revenue'] = train['total_revenue'] / 15

test['avg_daily_revenue'] = test['total_revenue'] / 15

In [50]:
# Revenue earned per level passed (repeats the revenue_per_level computation from In [45])
train['revenue_per_level'] = (train['total_revenue']+1) / (train['total_level_advanced']+1)

test['revenue_per_level'] = (test['total_revenue']+1) / (test['total_level_advanced']+1)

In [51]:
# Building the RFM metrics

In [52]:
# Building the Recency metric
# Recency: how recently the user last opened the game

# Find the last day on which the user opened the game
train['last_active_day'] = train[retention_cols].apply(lambda row: row[::-1].idxmax(), axis=1)

# Extract the day number i from the column name 'RetentionD{i}'
train['last_active_day'] = train['last_active_day'].apply(lambda x: int(x.split('D')[-1]))

# Recency is the last observed day (15) minus the user's last active day
train['recency'] = 15 - train['last_active_day']


test['last_active_day'] = test[retention_cols].apply(lambda row: row[::-1].idxmax(), axis=1)

test['last_active_day'] = test['last_active_day'].apply(lambda x: int(x.split('D')[-1]))

test['recency'] = 15 - test['last_active_day']
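The row-wise `apply` above calls a Python function once per user, which can be slow on large frames. As a hedged alternative, the sketch below computes the same last active day with NumPy, assuming the retention columns are ordered `RetentionD0 ... RetentionDn` so that a column's position equals its day number. The function name `last_active_day` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd

def last_active_day(retention: pd.DataFrame) -> pd.Series:
    """Position of the last column holding a 1, assuming columns are ordered D0..Dn."""
    values = retention.to_numpy()
    n_days = values.shape[1]
    # argmax on the reversed rows gives the distance from the last column to the last 1
    distance_from_end = values[:, ::-1].argmax(axis=1)
    return pd.Series(n_days - 1 - distance_from_end, index=retention.index)

# Toy retention matrix with 4 days (hypothetical values)
demo = pd.DataFrame(
    [[1, 0, 0, 0], [1, 0, 1, 0], [1, 1, 1, 1]],
    columns=[f"RetentionD{i}" for i in range(4)],
)
last = last_active_day(demo)
recency = (demo.shape[1] - 1) - last
print(last.tolist(), recency.tolist())
```

For a row with no active day at all, both this sketch and the `idxmax` version return the last day, so the two approaches agree on that edge case as well.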

In [53]:
# Building the Frequency metric
# Number of days on which the user made an in-app purchase

# Count the columns in iaprevenue_cols with a positive value, then add 1
train['frequency'] = (train[iaprevenue_cols].gt(0).sum(axis=1)+1)
test['frequency'] = (test[iaprevenue_cols].gt(0).sum(axis=1)+1)

In [54]:
# Building the Monetary metric
# Measures how much revenue the user has generated for the company

train['monetary'] = train[adrevenue_cols + iaprevenue_cols].sum(axis=1)
test['monetary'] = test[adrevenue_cols + iaprevenue_cols].sum(axis=1)

In [55]:
train.head()

Unnamed: 0,country,platform,device_category,device_brand,device_model,ad_network,first_prediction,RetentionD0,RetentionD1,RetentionD2,RetentionD3,RetentionD4,RetentionD5,RetentionD6,RetentionD7,RetentionD8,RetentionD9,RetentionD10,RetentionD11,RetentionD12,RetentionD13,RetentionD14,RetentionD15,LevelAdvancedCountD0,LevelAdvancedCountD1,LevelAdvancedCountD2,LevelAdvancedCountD3,LevelAdvancedCountD4,LevelAdvancedCountD5,LevelAdvancedCountD6,LevelAdvancedCountD7,LevelAdvancedCountD8,LevelAdvancedCountD9,LevelAdvancedCountD10,LevelAdvancedCountD11,LevelAdvancedCountD12,LevelAdvancedCountD13,LevelAdvancedCountD14,LevelAdvancedCountD15,Level_1_Duration,Level_2_Duration,Level_3_Duration,Level_4_Duration,Level_5_Duration,Level_6_Duration,Level_7_Duration,Level_8_Duration,Level_9_Duration,Level_10_Duration,AdRevenueD0,AdRevenueD1,AdRevenueD2,AdRevenueD3,AdRevenueD4,AdRevenueD5,AdRevenueD6,AdRevenueD7,AdRevenueD8,AdRevenueD9,AdRevenueD10,AdRevenueD11,AdRevenueD12,AdRevenueD13,AdRevenueD14,AdRevenueD15,IAPRevenueD0,IAPRevenueD1,IAPRevenueD2,IAPRevenueD3,IAPRevenueD4,IAPRevenueD5,IAPRevenueD6,IAPRevenueD7,IAPRevenueD8,IAPRevenueD9,IAPRevenueD10,IAPRevenueD11,IAPRevenueD12,IAPRevenueD13,IAPRevenueD14,IAPRevenueD15,TARGET,total_retention,total_level_advanced,total_level_duration,total_ad_revenue,total_iap_revenue,total_revenue,daily_revenue,daily_level_advanced,level_to_revenue_ratio,revenue_per_level,revenue_first_vs_last,first_day_revenue_ratio,total_revenue_first_day,avg_daily_revenue,last_active_day,recency,frequency,monetary
0,Mexico,Android,mobile,Xiaomi,Redmi A2,unityads_int,3.314099,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,26.0,69.0,36.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2,3,131.0,0.0,0.0,0.0,0.333333,1.333333,4.0,0.25,0.0,1.0,0.0,0.0,12,3,1,0.0
1,Peru,Android,mobile,Samsung,Galaxy A13,applovin_int,1.681524,1,0,0,0,0,0,1,1,0,1,0,0,0,0,0,0,5,0,0,0,0,0,3,5,0,2,0,0,0,0,0,0,13.0,91.0,39.0,79.0,180.0,89.0,124.0,118.0,35.0,117.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.008674,0.0,0.010218,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018892,4,15,885.0,0.018892,0.0,0.018892,0.203778,3.2,15.703339,0.063681,0.0,0.981459,0.0,0.001259,9,6,1,0.018892
2,Brazil,Android,mobile,Xiaomi,Redmi 12,applovin_int,10.71875,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,8.0,63.0,86.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,3,157.0,0.0,0.0,0.0,0.5,2.0,4.0,0.25,0.0,1.0,0.0,0.0,0,15,1,0.0
3,Dominican Republic,iOS,mobile,Apple,iPhone 11 Pro Max,organic_user,5.1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,19,8,9,0,0,0,0,0,0,0,0,0,0,0,0,0,23.0,141.0,131.0,118.0,77.0,107.0,77.0,182.0,42.0,156.0,0.00215,0.019159,0.025341,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04665,3,36,1054.0,0.04665,0.0,0.04665,0.261663,9.25,35.350871,0.028288,0.019159,0.973734,0.019159,0.00311,2,13,1,0.04665
4,Ecuador,Android,mobile,Motorola,Moto E22,applovin_int,2.091409,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,15,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,8.0,52.0,30.0,84.0,139.0,96.0,268.0,97.0,44.0,122.0,0.01468,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01468,1,15,940.0,0.01468,0.0,0.01468,0.50734,8.0,15.768523,0.063417,0.0,0.985533,0.0,0.000979,0,15,1,0.01468


In [56]:
# Building the RFM scores

# The lower the recency metric, the better: it expresses how recently the user was active.
train["recency_score"] = pd.qcut(x = train["recency"].rank(method = "first"), q = 5, labels = [5, 4, 3, 2, 1])
test["recency_score"] = pd.qcut(x = test["recency"].rank(method = "first"), q = 5, labels = [5, 4, 3, 2, 1])

# The frequency metric here counts the days on which the user made an in-app purchase. The higher it is, the better.
train["frequency_score"] = pd.qcut(x = train["frequency"].rank(method = "first"), q = 5, labels = [1, 2, 3, 4, 5])
test["frequency_score"] = pd.qcut(x = test["frequency"].rank(method = "first"), q = 5, labels = [1, 2, 3, 4, 5])

# The monetary metric expresses the revenue the user has brought to the company. The higher it is, the better.
train["monetary_score"] = pd.qcut(x = train["monetary"].rank(method = "first"), q = 5, labels = [1, 2, 3, 4, 5])
test["monetary_score"] = pd.qcut(x = test["monetary"].rank(method = "first"), q = 5, labels = [1, 2, 3, 4, 5])
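Ranking with `method="first"` before `pd.qcut` guarantees five equally sized bins, but it breaks ties by row order. When a metric has many identical values, users with the same value can therefore receive different scores. The toy series below (hypothetical values) illustrates this.

```python
import pandas as pd

# Eight users with the same frequency, plus two higher values (hypothetical data)
freq = pd.Series([1] * 8 + [2, 3])

# rank(method="first") assigns 1..10, breaking ties by position
scores = pd.qcut(freq.rank(method="first"), q=5, labels=[1, 2, 3, 4, 5])

print(scores.tolist())
# The eight identical values are spread over scores 1 to 4
print(scores[freq == 1].nunique())
```

Without the ranking step, `pd.qcut` raises an error on duplicate bin edges by default, which is presumably why the ranking is used here. The trade-off is that the score is partly determined by row order rather than by the metric alone.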

In [57]:
train.isnull().sum().sum()

0

In [58]:
test.isnull().sum().sum()

0

In [59]:
# Drop the per-day columns that are now summarized by total columns
remove_cols = [retention_cols, level_advanced_cols, level_duration_cols, iaprevenue_cols]

# Each element of remove_cols is itself a list of column names
for cols in remove_cols:
    train.drop(cols, axis=1, inplace=True)
    test.drop(cols, axis=1, inplace=True)
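The loop above drops one group of columns per iteration. An equivalent approach, sketched below on a toy frame with hypothetical values, flattens the groups into a single list and drops them in one call.

```python
from itertools import chain

import pandas as pd

# Toy frame standing in for train/test (hypothetical values)
df = pd.DataFrame({
    "RetentionD0": [1],
    "RetentionD1": [0],
    "IAPRevenueD0": [0.0],
    "total_retention": [1],
})

# Groups of column names, analogous to remove_cols in the notebook
groups = [["RetentionD0", "RetentionD1"], ["IAPRevenueD0"]]

# Flatten the list of lists and drop everything at once
cols_to_drop = list(chain.from_iterable(groups))
df = df.drop(columns=cols_to_drop)

print(df.columns.tolist())
```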

In [60]:
rfm_metrics_drop = ["recency", "frequency", "monetary"]

train.drop(rfm_metrics_drop, axis=1, inplace=True)

test.drop(rfm_metrics_drop, axis=1, inplace=True)

In [61]:
train.head(3)

Unnamed: 0,country,platform,device_category,device_brand,device_model,ad_network,first_prediction,AdRevenueD0,AdRevenueD1,AdRevenueD2,AdRevenueD3,AdRevenueD4,AdRevenueD5,AdRevenueD6,AdRevenueD7,AdRevenueD8,AdRevenueD9,AdRevenueD10,AdRevenueD11,AdRevenueD12,AdRevenueD13,AdRevenueD14,AdRevenueD15,TARGET,total_retention,total_level_advanced,total_level_duration,total_ad_revenue,total_iap_revenue,total_revenue,daily_revenue,daily_level_advanced,level_to_revenue_ratio,revenue_per_level,revenue_first_vs_last,first_day_revenue_ratio,total_revenue_first_day,avg_daily_revenue,last_active_day,recency_score,frequency_score,monetary_score
0,Mexico,Android,mobile,Xiaomi,Redmi A2,unityads_int,3.314099,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2,3,131.0,0.0,0.0,0.0,0.333333,1.333333,4.0,0.25,0.0,1.0,0.0,0.0,12,5,1,1
1,Peru,Android,mobile,Samsung,Galaxy A13,applovin_int,1.681524,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.008674,0.0,0.010218,0.0,0.0,0.0,0.0,0.0,0.0,0.018892,4,15,885.0,0.018892,0.0,0.018892,0.203778,3.2,15.703339,0.063681,0.0,0.981459,0.0,0.001259,9,4,1,3
2,Brazil,Android,mobile,Xiaomi,Redmi 12,applovin_int,10.71875,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,3,157.0,0.0,0.0,0.0,0.5,2.0,4.0,0.25,0.0,1.0,0.0,0.0,0,2,1,1


In [62]:
test.head(3)

Unnamed: 0,country,platform,device_category,device_brand,device_model,ad_network,first_prediction,AdRevenueD0,AdRevenueD1,AdRevenueD2,AdRevenueD3,AdRevenueD4,AdRevenueD5,AdRevenueD6,AdRevenueD7,AdRevenueD8,AdRevenueD9,AdRevenueD10,AdRevenueD11,AdRevenueD12,AdRevenueD13,AdRevenueD14,AdRevenueD15,total_retention,total_level_advanced,total_level_duration,total_ad_revenue,total_iap_revenue,total_revenue,daily_revenue,daily_level_advanced,level_to_revenue_ratio,revenue_per_level,revenue_first_vs_last,first_day_revenue_ratio,total_revenue_first_day,avg_daily_revenue,last_active_day,recency_score,frequency_score,monetary_score
0,Argentina,Android,mobile,Motorola,Moto G32,applovin_int,1.444805,0.001595,0.009382,0.00174,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5,16,1252.0,0.012716,0.0,0.012716,0.168786,2.833333,16.786544,0.059572,0.009382,0.996708,0.009382,0.000848,4,3,1,3
1,Mexico,Android,mobile,OnePlus,Nord N20 SE,applovin_int,9.147972,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,3,310.0,0.0,0.0,0.0,0.5,2.0,4.0,0.25,0.0,1.0,0.0,0.0,0,2,1,1
2,France,Android,mobile,Motorola,moto g13,applovin_int,40.731158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,3,222.0,0.0,0.0,0.0,0.5,2.0,4.0,0.25,0.0,1.0,0.0,0.0,0,2,1,1


### 3.2) Encoding

In [63]:
ordinal_cols = ["ad_network", "recency_score", "frequency_score", "monetary_score"]

for col in ordinal_cols:
    ordinal_encoder = OrdinalEncoder()
    
    train[[col]] = ordinal_encoder.fit_transform(train[[col]])
    test[[col]] = ordinal_encoder.transform(test[[col]])

In [64]:
frequency_cols = ["country", "device_brand", "device_model"]

for col in frequency_cols:
    # Compute the relative frequency of each category in the train set
    frequency_encoding = train[col].value_counts(normalize=True)
    
    # Replace the column's values with their frequencies
    train[col] = train[col].map(frequency_encoding)

In [65]:
frequency_cols = ["country", "device_brand", "device_model"]

for col in frequency_cols:
    # Compute the relative frequency of each category in the test set
    frequency_encoding = test[col].value_counts(normalize=True)
    
    # Replace the column's values with their frequencies
    test[col] = test[col].map(frequency_encoding)
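In the two cells above, the test set is encoded with frequencies computed from the test set itself, so the same category can map to different numbers in train and test. A common alternative, sketched here under the assumption that categories unseen in train should be encoded as 0, reuses the train frequencies for both sets. The function name `frequency_encode` and the toy data are hypothetical.

```python
import pandas as pd

def frequency_encode(train_col: pd.Series, test_col: pd.Series):
    """Encode both columns with relative frequencies learned from the train column only."""
    freq = train_col.value_counts(normalize=True)
    train_encoded = train_col.map(freq)
    # Categories that never occur in train have no frequency; assume 0 for them
    test_encoded = test_col.map(freq).fillna(0.0)
    return train_encoded, test_encoded

# Toy columns (hypothetical values)
train_country = pd.Series(["Peru", "Peru", "Brazil", "Mexico"])
test_country = pd.Series(["Peru", "France"])

train_enc, test_enc = frequency_encode(train_country, test_country)
print(train_enc.tolist(), test_enc.tolist())
```

This mirrors the fit-on-train, transform-on-test pattern that the notebook already follows for `OrdinalEncoder` and `OneHotEncoder`.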

In [66]:
train.head()

Unnamed: 0,country,platform,device_category,device_brand,device_model,ad_network,first_prediction,AdRevenueD0,AdRevenueD1,AdRevenueD2,AdRevenueD3,AdRevenueD4,AdRevenueD5,AdRevenueD6,AdRevenueD7,AdRevenueD8,AdRevenueD9,AdRevenueD10,AdRevenueD11,AdRevenueD12,AdRevenueD13,AdRevenueD14,AdRevenueD15,TARGET,total_retention,total_level_advanced,total_level_duration,total_ad_revenue,total_iap_revenue,total_revenue,daily_revenue,daily_level_advanced,level_to_revenue_ratio,revenue_per_level,revenue_first_vs_last,first_day_revenue_ratio,total_revenue_first_day,avg_daily_revenue,last_active_day,recency_score,frequency_score,monetary_score
0,0.112846,Android,mobile,0.093511,0.000744,7.0,3.314099,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2,3,131.0,0.0,0.0,0.0,0.333333,1.333333,4.0,0.25,0.0,1.0,0.0,0.0,12,4.0,0.0,0.0
1,0.019452,Android,mobile,0.257037,0.010128,2.0,1.681524,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.008674,0.0,0.010218,0.0,0.0,0.0,0.0,0.0,0.0,0.018892,4,15,885.0,0.018892,0.0,0.018892,0.203778,3.2,15.703339,0.063681,0.0,0.981459,0.0,0.001259,9,3.0,0.0,2.0
2,0.127643,Android,mobile,0.093511,0.003181,2.0,10.71875,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,3,157.0,0.0,0.0,0.0,0.5,2.0,4.0,0.25,0.0,1.0,0.0,0.0,0,1.0,0.0,0.0
3,0.0038,iOS,mobile,0.413869,0.007599,5.0,5.1,0.00215,0.019159,0.025341,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04665,3,36,1054.0,0.04665,0.0,0.04665,0.261663,9.25,35.350871,0.028288,0.019159,0.973734,0.019159,0.00311,2,2.0,0.0,2.0
4,0.024117,Android,mobile,0.086512,0.003083,2.0,2.091409,0.01468,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01468,1,15,940.0,0.01468,0.0,0.01468,0.50734,8.0,15.768523,0.063417,0.0,0.985533,0.0,0.000979,0,1.0,0.0,2.0


In [67]:
# Columns to one-hot encode
ohe_cols = ["platform", "device_category"]

# Initialize the OneHotEncoder (sparse_output=False makes it return a dense matrix)
ohe = OneHotEncoder(sparse_output=False)

# Fit and transform on the train set
train_ohe = ohe.fit_transform(train[ohe_cols])

# Only transform the test set
test_ohe = ohe.transform(test[ohe_cols])

# Get the new column names
ohe_columns = ohe.get_feature_names_out(ohe_cols)

# Convert the one-hot encoded arrays to DataFrames
train_encoded = pd.DataFrame(train_ohe, columns=ohe_columns, index=train.index)
test_encoded = pd.DataFrame(test_ohe, columns=ohe_columns, index=test.index)

# Concatenate with the original data and drop the original columns
train = pd.concat([train.drop(ohe_cols, axis=1), train_encoded], axis=1)
test = pd.concat([test.drop(ohe_cols, axis=1), test_encoded], axis=1)

In [68]:
train.head()

Unnamed: 0,country,device_brand,device_model,ad_network,first_prediction,AdRevenueD0,AdRevenueD1,AdRevenueD2,AdRevenueD3,AdRevenueD4,AdRevenueD5,AdRevenueD6,AdRevenueD7,AdRevenueD8,AdRevenueD9,AdRevenueD10,AdRevenueD11,AdRevenueD12,AdRevenueD13,AdRevenueD14,AdRevenueD15,TARGET,total_retention,total_level_advanced,total_level_duration,total_ad_revenue,total_iap_revenue,total_revenue,daily_revenue,daily_level_advanced,level_to_revenue_ratio,revenue_per_level,revenue_first_vs_last,first_day_revenue_ratio,total_revenue_first_day,avg_daily_revenue,last_active_day,recency_score,frequency_score,monetary_score,platform_Android,platform_iOS,device_category_mobile,device_category_tablet
0,0.112846,0.093511,0.000744,7.0,3.314099,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2,3,131.0,0.0,0.0,0.0,0.333333,1.333333,4.0,0.25,0.0,1.0,0.0,0.0,12,4.0,0.0,0.0,1.0,0.0,1.0,0.0
1,0.019452,0.257037,0.010128,2.0,1.681524,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.008674,0.0,0.010218,0.0,0.0,0.0,0.0,0.0,0.0,0.018892,4,15,885.0,0.018892,0.0,0.018892,0.203778,3.2,15.703339,0.063681,0.0,0.981459,0.0,0.001259,9,3.0,0.0,2.0,1.0,0.0,1.0,0.0
2,0.127643,0.093511,0.003181,2.0,10.71875,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,3,157.0,0.0,0.0,0.0,0.5,2.0,4.0,0.25,0.0,1.0,0.0,0.0,0,1.0,0.0,0.0,1.0,0.0,1.0,0.0
3,0.0038,0.413869,0.007599,5.0,5.1,0.00215,0.019159,0.025341,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04665,3,36,1054.0,0.04665,0.0,0.04665,0.261663,9.25,35.350871,0.028288,0.019159,0.973734,0.019159,0.00311,2,2.0,0.0,2.0,0.0,1.0,1.0,0.0
4,0.024117,0.086512,0.003083,2.0,2.091409,0.01468,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01468,1,15,940.0,0.01468,0.0,0.01468,0.50734,8.0,15.768523,0.063417,0.0,0.985533,0.0,0.000979,0,1.0,0.0,2.0,1.0,0.0,1.0,0.0


## 4) Modelling

In [69]:
X = train.drop(["TARGET"], axis=1)
y = train["TARGET"]

In [70]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=13)

In [71]:
lr = LinearRegression()
lasso = Lasso(random_state=13)
ridge = Ridge(random_state=13)

In [72]:
# Define the list of models
model_list = [lr, lasso, ridge]

# Lists for storing the results
model_name_list = []
rmse_list = []

# K-Fold cross-validation
kf = KFold(n_splits=5, random_state=42, shuffle=True)

# Evaluate the models
for model in model_list:
    
    model_cv = cross_val_score(model,
                               X_train,
                               y_train,
                               cv=kf,
                               scoring="neg_mean_squared_error", 
                               n_jobs=-1)
    
    # Scores are negative MSE: flip the sign of the fold mean, then take the square root
    rmse = np.sqrt(-model_cv.mean())
    
    model_name_list.append(model.__class__.__name__)
    rmse_list.append(rmse)
    
    print(f"{model.__class__.__name__} cross validation RMSE score: {rmse}")
    print("-" * 50)

LinearRegression cross validation RMSE score: 1.4260910449944393
--------------------------------------------------
Lasso cross validation RMSE score: 1.7147453814802376
--------------------------------------------------
Ridge cross validation RMSE score: 1.4257246650991315
--------------------------------------------------
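The scores above are the square root of the mean fold MSE. This is not the same quantity as the mean of the per-fold RMSE values, which is what averaging the `neg_root_mean_squared_error` scorer of scikit-learn would give. The sketch below uses hypothetical fold MSE values, not results from this notebook, to show the difference.

```python
import numpy as np

# Hypothetical per-fold MSE values (not taken from the notebook's results)
fold_mse = np.array([1.0, 4.0, 9.0])

# Square root of the mean MSE, as computed in the evaluation loop above
rmse_of_mean = np.sqrt(fold_mse.mean())

# Mean of the per-fold RMSE values
mean_of_rmse = np.sqrt(fold_mse).mean()

print(f"sqrt(mean MSE) = {rmse_of_mean:.4f}")
print(f"mean(RMSE)     = {mean_of_rmse:.4f}")
```

The first quantity is never smaller than the second, and the gap grows when the fold errors differ a lot. Either convention is usable as long as the same one is applied to every model being compared.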
