# Can Played Hours, Age, and Gender Predict Newsletter Subscription?
## A Data-Science Analysis of the UBC Minecraft Research-Server Logs
*May Wei · DSCI 100 · UBC, 2025-06-17*
## Link to GitHub repository

## 1. Introduction
### 1.1 Background
A research group at the University of British Columbia has launched a Minecraft server that records how players behave in virtual environments.  The server collects rich in-game activity data, which can be used to study user engagement and support research in human-computer interaction and AI.

To maintain engagement and allocate server resources effectively, the research team uses a game-related newsletter.  Predicting which players are likely to subscribe can help with targeted recruitment and infrastructure planning.

In the commercial gaming industry, predictive marketing is widely used to retain players by sending customized offers to those at risk of leaving (Ghantasala, 2024).  Similarly, understanding which players are more inclined to subscribe to game newsletters can improve outreach and user management.

This project investigates whether a player’s demographic information (e.g., age, gender, experience) and gameplay patterns (e.g., session frequency, average session length) can predict newsletter subscription status.

### Research Question
Can played hours, age, and gender predict newsletter subscription in players?

The response variable is the binary flag **`subscribed`**, and the explanatory variables are  
1. **`hours_played`** – cumulative play-time (h),  
2. **`age`** – self-reported age (years),  
3. **`gender`** – self-reported gender identity.

### 1.2 Data Description

In [None]:
library(tidyverse)

In [None]:
player <- read_csv("players.csv")
head(player)

Because our explanatory variables are played hours, age, and gender, we drop the identifier columns we do not need (`hashedEmail` and `name`).

In [None]:
clean_player <- select(player, -hashedEmail, -name)
head(clean_player)

In [None]:
gender <- clean_player |>
  group_by(gender) |>
  summarize(count = n())

gender

The data set has 196 rows and several gender categories. We one-hot encode gender and then combine three of the categories (Agender, Other, and Two-Spirited) into a single `gender_others` indicator.

In [None]:
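# Install mltools only if it is missing, then load the encoding helpers
if (!requireNamespace("mltools", quietly = TRUE)) install.packages("mltools")
library(mltools)
library(data.table)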

In [None]:
# one_hot() only encodes factor columns, so convert gender to a factor first
clean_player$gender <- as.factor(clean_player$gender)
player_1h <- one_hot(as.data.table(clean_player))
head(player_1h)

In [None]:
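# Sum the three indicator columns into one "others" indicator, drop the originals,
# and keep the result so later cells can use it
player_1h_grouped <- player_1h |>
  mutate(gender_others = gender_Agender + gender_Other + `gender_Two-Spirited`) |>
  select(-gender_Agender, -gender_Other, -`gender_Two-Spirited`)
head(player_1h_grouped)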
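
The same grouping could also be done on the factor itself, before any encoding. The following is only a minimal sketch of that alternative, not the approach used in this analysis. It assumes a small made-up tibble called `toy` (hypothetical values, not the real `players.csv`) and uses `fct_collapse()` from the forcats package, which is installed with the tidyverse.

```r
library(tibble)
library(dplyr)
library(forcats)

# Hypothetical toy data standing in for clean_player; all values are made up.
toy <- tibble(
  gender = factor(c("Male", "Female", "Agender", "Other", "Two-Spirited", "Male")),
  age    = c(19, 21, 17, 25, 30, 22)
)

# Merge the three levels into a single level called "others".
toy_grouped <- toy |>
  mutate(gender = fct_collapse(gender, others = c("Agender", "Other", "Two-Spirited")))

# Count players per collapsed gender level.
gender_counts <- toy_grouped |>
  count(gender)
gender_counts
```

Collapsing the factor first keeps gender as one column, which can be convenient for plotting and summaries; the one-hot route above produces numeric indicator columns instead.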

In [None]:
# Repeat the encoding for experience, leaving the gender column out
clean_player$experience <- as.factor(clean_player$experience)
player_exp <- clean_player |> select(-gender)
player_1h_exp <- one_hot(as.data.table(player_exp))
head(player_1h_exp)