# **Predicting Newsletter Subscription from Player Characteristics**

**DSCI 100 Minecraft Project - Daniel Kwok (46358065)**

## Introduction

The rise of online gaming platforms has produced large volumes of behavioral data, offering unique opportunities to study player engagement at scale. In this project, we collaborate with a research group at the University of British Columbia (UBC), led by Dr. Frank Wood, which operates a Minecraft server that tracks player activity. As the platform grows, understanding who engages with the game, and how, is essential for resource planning and targeted outreach. A key interest is identifying which kinds of players are most likely to subscribe to a game-related newsletter. Newsletter subscriptions often signal deeper engagement and interest, making this an important outcome to predict.

The question for this project is therefore: **Is a player likely to subscribe to the newsletter, based on their age and hours played?** To answer it, we use a K-nearest neighbors (KNN) classifier.

By analyzing players' age and hours played, we aim to uncover patterns that differentiate subscribers from non-subscribers. These insights can support the development of more personalized and efficient player communication strategies.
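As a rough illustration of the KNN approach named in the question, the sketch below fits a classifier on `Age` and `played_hours` with tidymodels. The toy data, the choice of 3 neighbors, and the names `toy_players`, `knn_recipe`, `knn_spec`, `knn_fit`, and `new_player` are assumptions made for illustration only; they are not the project's data or a tuned model. It also assumes the `kknn` package is installed.

```r
library(tidymodels)

# Toy data, invented for illustration only (NOT taken from players.csv)
toy_players <- tibble(
  Age = c(17, 21, 19, 25, 30, 16, 22, 18, 27, 20),
  played_hours = c(30.3, 0.1, 3.8, 0.0, 1.2, 12.5, 0.4, 7.7, 0.2, 5.0),
  subscribe = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE)
) %>%
  # Classification in tidymodels needs a factor outcome
  mutate(subscribe = factor(subscribe))

# Standardize both predictors so neither dominates the distance calculation
knn_recipe <- recipe(subscribe ~ Age + played_hours, data = toy_players) %>%
  step_scale(all_predictors()) %>%
  step_center(all_predictors())

# KNN model specification; neighbors = 3 is an arbitrary choice for this sketch
knn_spec <- nearest_neighbor(weight_func = "rectangular", neighbors = 3) %>%
  set_engine("kknn") %>%
  set_mode("classification")

# Bundle the recipe and model, then fit on the toy data
knn_fit <- workflow() %>%
  add_recipe(knn_recipe) %>%
  add_model(knn_spec) %>%
  fit(data = toy_players)

# Predict the class for one hypothetical new player
new_player <- tibble(Age = 18, played_hours = 10)
prediction <- predict(knn_fit, new_player)
prediction
```

In a full analysis the number of neighbors would normally be chosen by cross-validation on a training split rather than fixed in advance.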

### Data Description
We use two data sets provided for DSCI 100 to predict whether a player on this server will subscribe to the newsletter:
- `players.csv`: Information about each unique player (e.g., age, gender, experience level, played hours, and subscription status).
- `sessions.csv`: Detailed records of gameplay sessions, including start and end times.

Description of columns in players.csv:
- **experience**: A categorical variable (read in as a character column) indicating the player’s experience level (e.g., Pro, Veteran, Amateur, Regular)
- **subscribe**: A logical value (TRUE/FALSE) indicating whether the player subscribed to the newsletter
- **hashedEmail**: A unique, anonymized identifier for each player used to join datasets
- **played_hours**: The total number of hours the player has reported or accumulated playing the game
- **name**: The first name of the player (likely pseudonymized or fictional for privacy)
- **gender**: The gender of the player (e.g., Male, Female)
- **Age**: The age of the player, in years (numeric)

Description of columns in sessions.csv:
- **hashedEmail**: A unique anonymized player ID, used to join with player data
- **start_time**: The timestamp when a session began (character, in `DD/MM/YYYY HH:MM` format, to be parsed)
- **end_time**: The timestamp when the session ended (character, same format)
- **original_start_time**: The same start time as a Unix timestamp in milliseconds (numeric)
- **original_end_time**: The same end time as a Unix timestamp in milliseconds (numeric)
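Since the session timestamps are stored as character strings, they need to be parsed before durations can be computed. The sketch below shows one way to do this with lubridate, using two start/end pairs that appear in the session data in this report. It assumes the strings are day-first and treats them as UTC; the variable names are illustrative only.

```r
library(lubridate)

# Two start/end pairs as they appear in sessions.csv
start_chr <- c("08/08/2024 00:21", "10/09/2024 15:07")
end_chr   <- c("08/08/2024 01:35", "10/09/2024 15:29")

# dmy_hm() parses "day/month/year hour:minute" strings (UTC by default)
start_time <- dmy_hm(start_chr)
end_time   <- dmy_hm(end_chr)

# Session length in minutes
session_minutes <- as.numeric(difftime(end_time, start_time, units = "mins"))
session_minutes  # 74 22

# The original_* columns are in milliseconds, so divide by 1000 before
# converting. The values displayed in this report look rounded, so the
# result is not expected to match start_time exactly.
original_start <- as_datetime(1723080000000 / 1000)
```

Parsing the character columns is likely the safer route for computing session lengths here, because the displayed millisecond values do not distinguish a session's start from its end.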


## Methods and Results


Steps: 

1) Loading essential libraries, reading the data, and combining the data

## Step 1
Load the essential libraries, read the data, and join the two data sets.

# Load essential libraries for data manipulation, datetime processing, and modeling
library(tidyverse)
library(lubridate)
library(tidymodels)

# Read in the player dataset and standardize column names by replacing spaces with underscores
players <- read_csv("data/players.csv") %>%
  rename_with(~ str_replace_all(., " ", "_"))

# Read in the session dataset and standardize column names similarly
sessions <- read_csv("data/sessions.csv") %>%
  rename_with(~ str_replace_all(., " ", "_"))


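# Join session records onto players by the shared ID; a player with several
# sessions appears once per session in the result
combined_data <- left_join(players, sessions, by = "hashedEmail")
head(combined_data)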

Rows: 196 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): experience, hashedEmail, name, gender
dbl (2): played_hours, Age
lgl (1): subscribe

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 1535 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): hashedEmail, start_time, end_time
dbl (2): original_start_time, original_end_time

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


| experience | subscribe | hashedEmail | played_hours | name | gender | Age | start_time | end_time | original_start_time | original_end_time |
|---|---|---|---|---|---|---|---|---|---|---|
| `<chr>` | `<lgl>` | `<chr>` | `<dbl>` | `<chr>` | `<chr>` | `<dbl>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` |
| Pro | TRUE | f6daba428a5e19a3d47574858c13550499be23603422e6a0ee9728f8b53e192d | 30.3 | Morgan | Male | 9 | 08/08/2024 00:21 | 08/08/2024 01:35 | 1723080000000.0 | 1723080000000.0 |
| Pro | TRUE | f6daba428a5e19a3d47574858c13550499be23603422e6a0ee9728f8b53e192d | 30.3 | Morgan | Male | 9 | 09/09/2024 22:30 | 09/09/2024 22:37 | 1725920000000.0 | 1725920000000.0 |
| Pro | TRUE | f6daba428a5e19a3d47574858c13550499be23603422e6a0ee9728f8b53e192d | 30.3 | Morgan | Male | 9 | 08/08/2024 02:41 | 08/08/2024 03:25 | 1723080000000.0 | 1723090000000.0 |
| Pro | TRUE | f6daba428a5e19a3d47574858c13550499be23603422e6a0ee9728f8b53e192d | 30.3 | Morgan | Male | 9 | 10/09/2024 15:07 | 10/09/2024 15:29 | 1725980000000.0 | 1725980000000.0 |
| Pro | TRUE | f6daba428a5e19a3d47574858c13550499be23603422e6a0ee9728f8b53e192d | 30.3 | Morgan | Male | 9 | 05/05/2024 22:21 | 05/05/2024 23:17 | 1714950000000.0 | 1714950000000.0 |
| Pro | TRUE | f6daba428a5e19a3d47574858c13550499be23603422e6a0ee9728f8b53e192d | 30.3 | Morgan | Male | 9 | 06/04/2024 22:24 | 06/04/2024 23:33 | 1712440000000.0 | 1712450000000.0 |
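Because the left join repeats a player's row once per session (the same player fills all six rows of the output above), a classifier trained directly on `combined_data` would count frequent players many times. One possible way to keep a single row per player is to summarize the sessions first and then join the summary. The sketch below uses small made-up tables; `players_toy`, `sessions_toy`, `session_counts`, and `n_sessions` are hypothetical names, and the values are not from the real data.

```r
library(dplyr)
library(tibble)

# Made-up stand-ins for players.csv and sessions.csv (illustration only)
players_toy <- tibble(
  hashedEmail = c("a1", "b2", "c3"),
  Age = c(9, 17, 21),
  played_hours = c(30.3, 3.8, 0),
  subscribe = c(TRUE, TRUE, FALSE)
)
sessions_toy <- tibble(
  hashedEmail = c("a1", "a1", "a1", "b2"),
  start_time = c("08/08/2024 00:21", "09/09/2024 22:30",
                 "08/08/2024 02:41", "10/09/2024 15:07")
)

# A direct left join yields one row per session (plus one row with NA
# session fields for players who have no sessions)
combined_toy <- left_join(players_toy, sessions_toy, by = "hashedEmail")
nrow(combined_toy)  # 5

# Summarize sessions to one row per player before joining
session_counts <- sessions_toy %>%
  count(hashedEmail, name = "n_sessions")

players_with_counts <- players_toy %>%
  left_join(session_counts, by = "hashedEmail") %>%
  # Players without any recorded session get a count of 0 instead of NA
  mutate(n_sessions = coalesce(n_sessions, 0L))
nrow(players_with_counts)  # 3
```

Since the question only uses `Age` and `played_hours`, which both live in `players.csv`, modeling on the player-level table alone would also avoid the duplication.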


## Discussion

## References