Analyzing Player Behaviour to Predict Newsletter Subscriptions

By: Gurman Gill

In [3]:
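library(tidyverse)          # core packages: dplyr, readr, ggplot2, tidyr, ...
library(tidymodels)         # modelling framework used for the KNN classifier
library(janitor)            # data-cleaning helpers, e.g. clean_names()
library(knitr)              # table formatting, e.g. kable()
library(rmarkdown)
theme_set(theme_minimal())  # default ggplot2 theme for every plot
set.seed(2025)              # make random splits and tuning reproducible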

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
── Attaching packages ────────────────────────────────────── tidymodels 1.1.1 ──

✔ broom       1.0.6     ✔ rsample

In [4]:
# Introduction

Background: UBC's Minecraft Research Server Project aims to understand player behavior so that it can optimize server resource allocation and engagement strategies. Whether a player subscribes to the game's newsletter is a key engagement indicator: a subscription reflects the player's interest and likelihood of sustained engagement, which makes it a useful target for predictive modeling.

Question: Can K-nearest neighbors (KNN) classification predict whether a player will subscribe to the newsletter from their age, experience level, gender, total play time, and session behavior?
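KNN classifies a new player by finding the K most similar players in the training data (by Euclidean distance on standardized predictors) and taking a majority vote of their `subscribe` labels. The sketch below implements that idea in base R to make the mechanics concrete. It runs on a small, entirely hypothetical toy dataset; the helper name `knn_predict`, the toy values, and K = 3 are illustrative assumptions, not the project's data or tuned settings. Categorical predictors such as `experience` and `gender` would first need numeric encoding (for example, dummy variables).

```r
# Minimal KNN classifier sketch in base R.
# Hypothetical toy data: NOT taken from players.csv.
train <- data.frame(
  Age          = c(17, 21, 25, 30, 19, 22),
  played_hours = c(30.3, 0.1, 5.0, 0.0, 12.0, 0.5),
  subscribe    = factor(c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE))
)

# Hypothetical helper: predict a label for each row of `new_x`
knn_predict <- function(train_x, train_y, new_x, k = 3) {
  # Standardize with the TRAINING means/SDs so no predictor dominates the distance
  centers <- colMeans(train_x)
  scales  <- apply(train_x, 2, sd)
  train_s <- scale(train_x, center = centers, scale = scales)
  new_s   <- scale(new_x,   center = centers, scale = scales)

  apply(new_s, 1, function(row) {
    # Euclidean distance from this new point to every training point
    d <- sqrt(colSums((t(train_s) - row)^2))
    # Labels of the k closest training points
    nearest <- train_y[order(d)[seq_len(k)]]
    # Majority vote among those labels
    names(which.max(table(nearest)))
  })
}

new_players <- data.frame(Age = c(18, 28), played_hours = c(20, 0.2))
preds <- knn_predict(train[, c("Age", "played_hours")], train$subscribe,
                     new_players, k = 3)
print(preds)
```

The tidymodels packages loaded above provide `nearest_neighbor()` (with the `kknn` engine) for the same task; the hand-written version only exposes what such a model computes.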

Data Description: 

We have two datasets provided by the Minecraft Research Server Project team.

players.csv:
- Rows: 196 players
- Columns: 7 variables

sessions.csv:
- Rows: 1535 sessions
- Columns: 5 variables

The table below describes the relevant variables:

| Variable              | Type        | Description                                                    |
|----------------------|-------------|----------------------------------------------------------------|
| `experience`          | Categorical | Self-reported experience level (Amateur, Regular, Pro, etc.)   |
| `subscribe`           | Boolean     | Whether player subscribed to the newsletter                    |
| `hashedEmail`         | Identifier  | Anonymized unique player ID                                    |
| `played_hours`        | Numeric     | Total hours recorded by Minecraft for each player              |
| `name`                | String      | Player's name (not used in modeling)                           |
| `gender`              | Categorical | Player's reported gender                                       |
| `Age`                 | Numeric     | Age of the player (2 values missing in the raw data)           |
| `start_time`          | Numeric     | Start time of sessions                                         |
| `end_time`            | Numeric     | End time of sessions                                           |
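Because players.csv has one row per player while sessions.csv has one row per session, session-level information must be summarized per player and joined back before it can serve as a predictor. A minimal dplyr sketch follows. It assumes (the table above does not confirm this) that each session row carries the player's `hashedEmail`, and that `start_time` and `end_time` are numeric timestamps in the same unit. It uses small hypothetical tables rather than the real files, and the feature names `n_sessions` and `mean_duration` are illustrative.

```r
library(dplyr)

# Hypothetical toy tables; ASSUMES sessions carry the player's hashedEmail
players <- tibble(
  hashedEmail  = c("a1", "b2", "c3"),
  played_hours = c(30.3, 3.8, 0.0)
)
sessions <- tibble(
  hashedEmail = c("a1", "a1", "b2"),
  start_time  = c(100, 500, 200),
  end_time    = c(160, 530, 290)
)

# One row per player: number of sessions and mean session length
session_summary <- sessions %>%
  mutate(duration = end_time - start_time) %>%
  group_by(hashedEmail) %>%
  summarise(n_sessions = n(), mean_duration = mean(duration), .groups = "drop")

# Keep every player; players without sessions get a count of 0 and NA duration
player_features <- players %>%
  left_join(session_summary, by = "hashedEmail") %>%
  mutate(n_sessions = coalesce(n_sessions, 0L))

print(player_features)
```

`left_join()` keeps players who never logged a session. Their session count becomes 0, but `mean_duration` stays `NA` and would need to be imputed or dropped before KNN, which cannot compute distances with missing values.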

Considerations and limitations of the data:
- Self-reported fields such as `Age`, `gender`, and `experience` may be inaccurate.
- Only aggregate features were used; individual session behavior (e.g. time of day, session frequency) was not explored.

In [6]:
# Load Data

players_data <- read_csv("players.csv")
head(players_data)

session_data <- read_csv("

Rows: 196 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): experience, hashedEmail, name, gender
dbl (2): played_hours, Age
lgl (1): subscribe

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


| experience | subscribe | hashedEmail | played_hours | name | gender | Age |
|------------|-----------|-------------|--------------|------|--------|-----|
| `<chr>` | `<lgl>` | `<chr>` | `<dbl>` | `<chr>` | `<chr>` | `<dbl>` |
| Pro | True | f6daba428a5e19a3d47574858c13550499be23603422e6a0ee9728f8b53e192d | 30.3 | Morgan | Male | 9 |
| Veteran | True | f3c813577c458ba0dfef80996f8f32c93b6e8af1fa939732842f2312358a88e9 | 3.8 | Christian | Male | 17 |
| Veteran | False | b674dd7ee0d24096d1c019615ce4d12b20fcbff12d79d3c5a9d2118eb7ccbb28 | 0.0 | Blake | Male | 17 |
| Amateur | True | 23fe711e0e3b77f1da7aa221ab1192afe21648d47d2b4fa7a5a659ff443a0eb5 | 0.7 | Flora | Female | 21 |
| Regular | True | 7dc01f10bf20671ecfccdac23812b1b415acd42c2147cb0af4d48fcce2420f3e | 0.1 | Kylie | Male | 21 |
| Amateur | True | f58aad5996a435f16b0284a3b267f973f9af99e7a89bee0430055a44fa92f977 | 0.0 | Adrian | Female | 17 |
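The readr message above suggests declaring column types explicitly. A minimal sketch of doing that, and then preparing the outcome for classification, is below. Instead of players.csv it reads a literal CSV string wrapped in `I()`: the first three preview rows (with `hashedEmail` shortened) plus one hypothetical row with a missing `Age`. Converting `subscribe` to a factor and dropping rows with missing `Age` are assumptions about how the data could be prepared for KNN, not steps confirmed by this notebook.

```r
library(readr)
library(dplyr)

# Literal CSV standing in for players.csv: three preview rows with shortened
# hashes, plus one HYPOTHETICAL row whose Age field is empty (read as NA)
csv_text <- "experience,subscribe,hashedEmail,played_hours,name,gender,Age
Pro,TRUE,f6daba42,30.3,Morgan,Male,9
Veteran,TRUE,f3c81357,3.8,Christian,Male,17
Veteran,FALSE,b674dd7e,0.0,Blake,Male,17
Regular,FALSE,hypothetical,1.0,Example,Female,"

# Declaring col_types documents the schema and silences the column-spec message
players_typed <- read_csv(
  I(csv_text),
  col_types = cols(
    experience   = col_factor(),
    subscribe    = col_logical(),
    hashedEmail  = col_character(),
    played_hours = col_double(),
    name         = col_character(),
    gender       = col_factor(),
    Age          = col_double()
  )
)

players_model <- players_typed %>%
  # KNN needs complete numeric predictors, so drop rows with a missing Age
  filter(!is.na(Age)) %>%
  # tidymodels classification models expect a factor outcome
  mutate(subscribe = factor(subscribe, levels = c(FALSE, TRUE)))

# Class balance of the outcome, worth checking before fitting a classifier
count(players_model, subscribe)
```

With 196 players, dropping the two rows with missing `Age` costs little; imputation (for example, `step_impute_median()` from recipes) is an alternative if every player must be kept.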
