# <u> **Player Subscription Final Report**


By Xuan Tung Duong, Sara Garcia Rubiera, Daniel Samari and Demelza Awogu


### <u> **Introduction**

The Pacific Laboratory for Artificial Intelligence (PLAI) is a research group at UBC that collects data about how people play video games. In this project, a Minecraft server was set up and linked to an external site where players' actions are recorded as they engage with the game [_**INSERT REFERENCE**_]. The challenge for the group is making sure it has enough resources to give players an optimal experience. The broad research question we want to answer is:

_Question 1: What player characteristics and behaviours are most predictive of subscribing to a game-related newsletter, and how do these features differ between various player types?_

In this project, we narrow the question above to a more specific one:

**Can we predict the likelihood of a player subscribing to the game newsletter based on their age and their experience level?**



### <u> **Data Description**

In the table below, there are 9 columns and 196 rows.
The variables describe the following: 

1. `experience` is a categorical variable giving the player's experience level, where `Veteran` is the highest, followed by `Pro`, `Regular` and `Amateur`.

2. `subscribe` is a Boolean variable indicating whether the player has subscribed to the game's newsletter.

3. `hashedEmail` is an ID variable that identifies the player by their hashed email address.

4. `played_hours` is a quantitative variable giving the number of hours the player has spent in the game.

5. `name` is the name of the player.

6. `gender` is the player's gender, and `age` is their age.

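7. `individualId` and `organizationName` contain no values in the rows shown below and are dropped before the analysis.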

In [4]:
# Import libraries
import pandas as pd
import numpy as np
import altair as alt

In [5]:
# Load the players data from its URL
players_url = "https://drive.google.com/uc?id=1Mw9vW0hjTJwRWx0bDXiSpYsO3gKogaPz"
players = pd.read_csv(players_url)
players

Unnamed: 0,experience,subscribe,hashedEmail,played_hours,name,gender,age,individualId,organizationName
0,Pro,True,f6daba428a5e19a3d47574858c13550499be23603422e6...,30.3,Morgan,Male,9,,
1,Veteran,True,f3c813577c458ba0dfef80996f8f32c93b6e8af1fa9397...,3.8,Christian,Male,17,,
2,Veteran,False,b674dd7ee0d24096d1c019615ce4d12b20fcbff12d79d3...,0.0,Blake,Male,17,,
3,Amateur,True,23fe711e0e3b77f1da7aa221ab1192afe21648d47d2b4f...,0.7,Flora,Female,21,,
4,Regular,True,7dc01f10bf20671ecfccdac23812b1b415acd42c2147cb...,0.1,Kylie,Male,21,,
...,...,...,...,...,...,...,...,...,...
191,Amateur,True,b6e9e593b9ec51c5e335457341c324c34a2239531e1890...,0.0,Bailey,Female,17,,
192,Veteran,False,71453e425f07d10da4fa2b349c83e73ccdf0fb3312f778...,0.3,Pascal,Male,22,,
193,Amateur,False,d572f391d452b76ea2d7e5e53a3d38bfd7499c7399db29...,0.0,Dylan,Prefer not to say,17,,
194,Amateur,False,f19e136ddde68f365afc860c725ccff54307dedd13968e...,2.3,Harlow,Male,17,,


### <u> **Methods and Results**

In [1]:
# load libraries
import numpy as np
import pandas as pd
import altair as alt
from sklearn import set_config
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.model_selection import GridSearchCV, cross_validate
from sklearn.compose import make_column_transformer
from sklearn.pipeline import make_pipeline
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

set_config(transform_output="pandas")

In [2]:
# load the players and sessions data frames
players_url = 'https://drive.google.com/uc?id=1Mw9vW0hjTJwRWx0bDXiSpYsO3gKogaPz&export=download'
players = pd.read_csv(players_url)
players.drop(columns=['individualId','organizationName'],inplace = True)

sessions_url = 'https://drive.google.com/uc?id=14O91N5OlVkvdGxXNJUj5jIsV5RexhzbB&export=download'
sessions = pd.read_csv(sessions_url)

players

Unnamed: 0,experience,subscribe,hashedEmail,played_hours,name,gender,age
0,Pro,True,f6daba428a5e19a3d47574858c13550499be23603422e6...,30.3,Morgan,Male,9
1,Veteran,True,f3c813577c458ba0dfef80996f8f32c93b6e8af1fa9397...,3.8,Christian,Male,17
2,Veteran,False,b674dd7ee0d24096d1c019615ce4d12b20fcbff12d79d3...,0.0,Blake,Male,17
3,Amateur,True,23fe711e0e3b77f1da7aa221ab1192afe21648d47d2b4f...,0.7,Flora,Female,21
4,Regular,True,7dc01f10bf20671ecfccdac23812b1b415acd42c2147cb...,0.1,Kylie,Male,21
...,...,...,...,...,...,...,...
191,Amateur,True,b6e9e593b9ec51c5e335457341c324c34a2239531e1890...,0.0,Bailey,Female,17
192,Veteran,False,71453e425f07d10da4fa2b349c83e73ccdf0fb3312f778...,0.3,Pascal,Male,22
193,Amateur,False,d572f391d452b76ea2d7e5e53a3d38bfd7499c7399db29...,0.0,Dylan,Prefer not to say,17
194,Amateur,False,f19e136ddde68f365afc860c725ccff54307dedd13968e...,2.3,Harlow,Male,17


In [3]:
# give experience numeric values so it can be scaled alongside the numeric predictors
# (note that this mapping codes Pro above Veteran)
experience_value = {'Beginner':0,'Amateur':1,'Regular':2,'Veteran':3,'Pro':4}

# Series.map looks up each level in the dictionary; a level missing from it would become NaN
players = players.assign(experience_value=players['experience'].map(experience_value))
players

Unnamed: 0,experience,subscribe,hashedEmail,played_hours,name,gender,age,experience_value
0,Pro,True,f6daba428a5e19a3d47574858c13550499be23603422e6...,30.3,Morgan,Male,9,4
1,Veteran,True,f3c813577c458ba0dfef80996f8f32c93b6e8af1fa9397...,3.8,Christian,Male,17,3
2,Veteran,False,b674dd7ee0d24096d1c019615ce4d12b20fcbff12d79d3...,0.0,Blake,Male,17,3
3,Amateur,True,23fe711e0e3b77f1da7aa221ab1192afe21648d47d2b4f...,0.7,Flora,Female,21,1
4,Regular,True,7dc01f10bf20671ecfccdac23812b1b415acd42c2147cb...,0.1,Kylie,Male,21,2
...,...,...,...,...,...,...,...,...
191,Amateur,True,b6e9e593b9ec51c5e335457341c324c34a2239531e1890...,0.0,Bailey,Female,17,1
192,Veteran,False,71453e425f07d10da4fa2b349c83e73ccdf0fb3312f778...,0.3,Pascal,Male,22,3
193,Amateur,False,d572f391d452b76ea2d7e5e53a3d38bfd7499c7399db29...,0.0,Dylan,Prefer not to say,17,1
194,Amateur,False,f19e136ddde68f365afc860c725ccff54307dedd13968e...,2.3,Harlow,Male,17,1
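
The encoding step above turns each experience label into an integer code. A minimal sketch of the same idea in plain Python, with an explicit check for labels that are missing from the mapping, might look like the following. The helper name `encode_experience` and the `sample` list are illustrative assumptions rather than part of the original notebook; the integer codes simply mirror the dictionary used in the cell above.

```python
# Hypothetical helper (not in the original notebook): map experience labels
# to integer codes and fail loudly on any label missing from the mapping.
EXPERIENCE_VALUE = {"Beginner": 0, "Amateur": 1, "Regular": 2, "Veteran": 3, "Pro": 4}


def encode_experience(labels, mapping=EXPERIENCE_VALUE):
    unknown = sorted(set(labels) - set(mapping))
    if unknown:
        raise ValueError(f"Unmapped experience levels: {unknown}")
    return [mapping[label] for label in labels]


# Experience labels of the first five rows displayed above.
sample = ["Pro", "Veteran", "Veteran", "Amateur", "Regular"]
codes = encode_experience(sample)
print(codes)
```

Raising an error on an unexpected label is a design choice: it surfaces a typo or a new category immediately instead of letting an unencoded value reach the scaler.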


In [4]:
# hold out 20% of the players for testing, stratified so both sets keep a similar subscription rate
players_train, players_test = train_test_split(
    players, train_size=0.80, random_state=2025, stratify=players['subscribe']
)

# standardize the three predictors so that no single one dominates the distance calculation
players_preprocessor = make_column_transformer(
    (StandardScaler(), ['played_hours', 'age','experience_value']),
    remainder='passthrough',
    verbose_feature_names_out=False
)

players_pipe = make_pipeline(players_preprocessor, KNeighborsClassifier())

# candidate values of k: 1 through 30
param_grid = { 'kneighborsclassifier__n_neighbors': range(1,31,1) }

# 5-fold cross-validation on the training set for each candidate k
players_search = GridSearchCV(
    estimator=players_pipe,
    param_grid=param_grid,
    cv=5,
    return_train_score=True,
    n_jobs=-1
)

players_search.fit(
    players_train[['played_hours', 'age','experience_value']],
    players_train['subscribe']
)

# show the five values of k with the best mean cross-validation accuracy
cv_results = pd.DataFrame(players_search.cv_results_)
cv_results.sort_values(by='rank_test_score').head(5).reset_index()

Unnamed: 0,index,mean_fit_time,std_fit_time,mean_score_time,std_score_time,param_kneighborsclassifier__n_neighbors,params,split0_test_score,split1_test_score,split2_test_score,...,mean_test_score,std_test_score,rank_test_score,split0_train_score,split1_train_score,split2_train_score,split3_train_score,split4_train_score,mean_train_score,std_train_score
0,6,0.004343,6.3e-05,0.004609,1.3e-05,7,{'kneighborsclassifier__n_neighbors': 7},0.65625,0.774194,0.709677,...,0.750605,0.062375,1,0.790323,0.76,0.776,0.776,0.744,0.769265,0.015864
1,10,0.004501,0.000301,0.004664,4e-05,11,{'kneighborsclassifier__n_neighbors': 11},0.6875,0.774194,0.741935,...,0.750403,0.039516,2,0.774194,0.76,0.752,0.768,0.736,0.758039,0.013313
2,13,0.006487,0.003973,0.004684,3.3e-05,14,{'kneighborsclassifier__n_neighbors': 14},0.6875,0.741935,0.741935,...,0.750403,0.048928,2,0.774194,0.76,0.76,0.768,0.744,0.761239,0.010137
3,19,0.004393,7.5e-05,0.004701,7.7e-05,20,{'kneighborsclassifier__n_neighbors': 20},0.6875,0.741935,0.774194,...,0.750403,0.039516,2,0.75,0.752,0.752,0.76,0.744,0.7516,0.005122
4,21,0.004465,0.00021,0.004664,2.4e-05,22,{'kneighborsclassifier__n_neighbors': 22},0.6875,0.741935,0.774194,...,0.750403,0.039516,2,0.774194,0.744,0.744,0.76,0.744,0.753239,0.012173
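
The pipeline above standardizes the predictors and then classifies each player by a majority vote among the k nearest training players. The sketch below illustrates those two steps from scratch in plain Python. It is not the scikit-learn implementation, the function names (`fit_scaler`, `apply_scaling`, `knn_predict`) are hypothetical, and the toy rows are made up for illustration rather than taken from the data set.

```python
import math
from collections import Counter


def fit_scaler(rows):
    """Return per-column means and (population) standard deviations."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for a constant column to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def apply_scaling(row, means, stds):
    """Standardize one row using previously fitted means and deviations."""
    return [(v - m) / s for v, m, s in zip(row, means, stds)]


def knn_predict(train_X, train_y, query, k):
    """Majority label among the k training points nearest to query (Euclidean)."""
    nearest = sorted(zip((math.dist(x, query) for x in train_X), train_y))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Made-up toy rows: (played_hours, age, experience_value). Not from the data set.
toy_X = [[30.0, 10, 4], [5.0, 17, 3], [0.0, 17, 3],
         [1.0, 21, 1], [0.0, 22, 1], [12.0, 19, 2]]
toy_y = [True, True, False, True, False, True]

means, stds = fit_scaler(toy_X)
scaled_X = [apply_scaling(row, means, stds) for row in toy_X]

# A new player is scaled with the training means and deviations before voting.
new_player = apply_scaling([10.0, 18, 3], means, stds)
prediction = knn_predict(scaled_X, toy_y, new_player, k=3)
print(prediction)
```

Scaling matters here because `played_hours` and `age` are on very different ranges; without it, the predictor with the largest spread would largely decide which neighbours count as nearest.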


In [5]:
# highlight the k with the best mean cross-validation accuracy
k_point = alt.Chart(cv_results.sort_values(by='rank_test_score').head(1).reset_index()).mark_point(filled = True).encode(
    x = alt.X('param_kneighborsclassifier__n_neighbors'),
    y = alt.Y('mean_test_score'),
    color = alt.Color('param_kneighborsclassifier__n_neighbors:N',title='Best K-Value').scale(scheme="set1")
)

# mean cross-validation accuracy for every candidate k
cross_val_plot = alt.Chart(cv_results).mark_line(point=True).encode(
    x=alt.X("param_kneighborsclassifier__n_neighbors")
        .title('Number of Neighbors (k)'),
    y=alt.Y('mean_test_score')
        .title('Accuracy')
        .scale(zero=False)
)

cross_val_plot + k_point

In [6]:
# refit the pipeline on the full training set using the best k from cross-validation
best_k = cv_results.sort_values(by='rank_test_score').reset_index().loc[0,'param_kneighborsclassifier__n_neighbors']
players_spec = make_pipeline(players_preprocessor,KNeighborsClassifier(n_neighbors = best_k))

players_fit = players_spec.fit(players_train[['played_hours', 'age','experience_value']],players_train['subscribe'])
# predict on the held-out test set and tabulate actual against predicted labels
players_pred = players_fit.predict(players_test[['played_hours', 'age','experience_value']])
players_eval = players_test.assign(actual=players_test['subscribe'],predicted=players_pred)
players_conf_mat = pd.crosstab(players_eval['actual'], players_eval['predicted'])
players_conf_mat

actual,predicted False,predicted True
False,0,11
True,1,28


In [7]:
print(classification_report(players_test['subscribe'], players_pred))

              precision    recall  f1-score   support

       False       0.00      0.00      0.00        11
        True       0.72      0.97      0.82        29

    accuracy                           0.70        40
   macro avg       0.36      0.48      0.41        40
weighted avg       0.52      0.70      0.60        40
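
The figures in the classification report can be recomputed by hand from the four counts in the confusion matrix above. The sketch below does that arithmetic and, as an added point of comparison that is not part of the original analysis, computes the accuracy of a baseline that always predicts the majority class. The variable names are illustrative.

```python
# Counts copied from the confusion matrix above (rows = actual, columns = predicted).
tn, fp = 0, 11   # actual False: predicted False, predicted True
fn, tp = 1, 28   # actual True:  predicted False, predicted True

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
precision_true = tp / (tp + fp)
recall_true = tp / (tp + fn)
recall_false = tn / (tn + fp)

# Accuracy of a baseline that always predicts the more common actual class.
majority_baseline = max(tp + fn, tn + fp) / total

print(f"accuracy={accuracy:.3f}")
print(f"precision(True)={precision_true:.3f}, recall(True)={recall_true:.3f}")
print(f"recall(False)={recall_false:.3f}")
print(f"majority baseline={majority_baseline:.3f}")
```

With these counts, the test accuracy of 0.70 sits slightly below the 0.725 that always predicting `True` would reach on the same 40 test players, and none of the 11 non-subscribers are identified, which is worth keeping in mind when interpreting the accuracy figure.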



### <u> **Discussion**

### <u> **References**

Pacific Laboratory for Artificial Intelligence. (n.d.). PLAI group website. Retrieved November 30, 2025, from https://plai.cs.ubc.ca/