# PREDICTING NEWSLETTER SUBSCRIPTION USING PLAYTIME AND AGE

## Introduction

### Background
Video games are more than just entertainment: they provide communities, hobbies, and even research tools for many users. To keep players connected and engaged, game developers often send out newsletters with game updates, special events, and new content, but not everyone signs up. We therefore want to find out which kinds of players are most likely to subscribe to a game newsletter.

One game that remains a staple for many players is Minecraft, a popular sandbox video game that allows players to explore, build, and interact in a block-based virtual world.

In this project, we use real data from a Minecraft server run by researchers at UBC. The dataset includes basic information about players, such as their total playtime, their age, and whether they subscribed to the newsletter. By exploring this data, we hope to find patterns that can help predict who is likely to subscribe, which could be useful for game developers trying to better reach their audience.

### Question
Can a player's total playtime and age predict whether they subscribe to the newsletter in the Minecraft dataset?

### Data Description
This project uses the `players.csv` dataset, which lists every unique player along with data about each one.
The dataset contains 196 observations and 7 variables.

The variables are structured as follows:

- `experience`: Character: How experienced the player is.
- `subscribe`: Logical: Whether the player is subscribed to the game newsletter.
- `hashedEmail`: Character: Hashed email address of the player.
- `played_hours`: Double: Total number of hours the player has spent playing the game.
- `name`: Character: Name of the player.
- `gender`: Character: Gender of the player.
- `Age`: Double: Age of the player.
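
One way to check that a loaded table matches the description above is to list the storage type of each column. The sketch below does not read `players.csv`; it builds a small stand-in tibble, `toy_players`, whose values are invented for illustration and only mirror the seven column names and types listed here.

```r
library(tibble)

# Hypothetical three-row stand-in for players.csv; all values are invented.
toy_players <- tibble(
  experience   = c("Pro", "Amateur", "Veteran"),
  subscribe    = c(TRUE, FALSE, TRUE),
  hashedEmail  = c("a1b2", "c3d4", "e5f6"),
  played_hours = c(30.3, 0.0, 1.5),
  name         = c("Alex", "Sam", "Kit"),
  gender       = c("Male", "Female", "Non-binary"),
  Age          = c(9, 21, NA)
)

# One row per column, showing how R stores it
column_types <- tibble(
  variable = names(toy_players),
  type     = unname(vapply(toy_players, typeof, character(1)))
)
column_types
```

Running the same `vapply()` call on the real `players` tibble would show whether `subscribe` was parsed as logical and `Age` as double, as the list above states.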

## Methods and Results

## Step 1: Loading Libraries and Datasets
We first load the necessary libraries and set the width and height of the plots. We then load the `players.csv` dataset, which we use for the rest of the analysis.

In [None]:
library(tidyverse)
library(repr)
library(tidymodels)
options(repr.plot.width = 10, repr.plot.height = 6)

In [None]:
players <- read_csv("data/players.csv")
players

## Step 2: Wrangling and Cleaning Data

In preparation for the analysis, we wrangled and cleaned the data so that it contains only the variables relevant to our question.

In [None]:
wrangled_players <- players |>
    # Select only the variables needed for the analysis
    select(played_hours, Age, subscribe) |>
    # Sort rows from highest to lowest playtime
    arrange(desc(played_hours))

clean_players <- wrangled_players |>
    # Drop rows where played_hours or Age is missing (NA)
    filter(!is.na(played_hours), !is.na(Age)) |>
    # Convert subscribe from logical to a factor
    mutate(subscribe = as.factor(subscribe))
clean_players
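
After a cleaning step like this, it can be useful to check how many rows the `NA` filter removed and how balanced the two `subscribe` classes are. The following is a minimal sketch on an invented tibble, `toy_selected` (not the real data); the names `toy_clean`, `rows_dropped`, and `class_balance` are placeholders introduced only for this example.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for the three selected columns; values are invented.
toy_selected <- tibble(
  played_hours = c(30.3, 0.0, 1.5, 0.2, NA),
  Age          = c(9, 21, NA, 17, 22),
  subscribe    = c(TRUE, FALSE, TRUE, TRUE, FALSE)
)

# Same cleaning steps as in the cell above
toy_clean <- toy_selected |>
  filter(!is.na(played_hours), !is.na(Age)) |>
  mutate(subscribe = as.factor(subscribe))

# Number of rows removed because of missing values
rows_dropped <- nrow(toy_selected) - nrow(toy_clean)

# Count and proportion of players in each subscription class
class_balance <- toy_clean |>
  count(subscribe) |>
  mutate(proportion = n / sum(n))

rows_dropped
class_balance
```

Applied to `players` and `clean_players`, the same two checks would show how much data the filter discards and whether one class dominates, which matters when judging a classifier's accuracy later on.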

## Step 3: Summarizing the Data


In [None]:
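# Summarize the two predictors in the cleaned data
players_summary <- clean_players |>
  summarize(
    mean_played_hours = mean(played_hours),
    median_played_hours = median(played_hours),
    sd_played_hours = sd(played_hours),
    mean_age = mean(Age),
    median_age = median(Age),
    sd_age = sd(Age)
  )
players_summary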
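
Because the question is about predicting `subscribe`, a summary split by subscription status may be more informative than a single overall one. The sketch below shows one possible way to do this with `group_by()`; it runs on an invented tibble, `toy_clean`, and is an illustration under that assumption rather than part of the original analysis.

```r
library(dplyr)
library(tibble)

# Hypothetical cleaned data; values are invented for illustration.
toy_clean <- tibble(
  played_hours = c(30, 0, 2, 4),
  Age          = c(10, 20, 18, 22),
  subscribe    = factor(c(TRUE, FALSE, TRUE, FALSE))
)

# Group size and mean of each predictor within each subscription class
summary_by_class <- toy_clean |>
  group_by(subscribe) |>
  summarize(
    n = n(),
    mean_played_hours = mean(played_hours),
    mean_age = mean(Age)
  )
summary_by_class
```

Replacing `toy_clean` with `clean_players` would give the per-class means for the real data; a large gap between the two rows would suggest that the predictor carries some signal about subscription.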