# Data Science Individual Planning Stage
**By: Ricky Shi**

---

## Data Description

- This project uses two datasets, `players.csv` and `sessions.csv`, collected from a Minecraft research server managed by UBC’s Computer Science department. Together they cover player demographics and in-game activity.
- To answer my project question, I will rely primarily on `players.csv`.

**Dataset Overview**
| Dataset | Rows (Observations) | Columns (Variables) | Description |
|----------|------|----------|--------------|
| `players.csv` | 196 | 7 | Contains hashed emails, which act as unique IDs for each player, plus demographics such as age and gender, total playtime in hours, and game newsletter subscription status |
| `sessions.csv` | 1535 | 5 | Contains session-level data, including hashed emails, the start and end of each gaming session, and timestamp representations of the start and stop times |

**Potential Issues:**
- Some players are missing age values, which reduces the amount of usable data for analysis
- Outliers in play duration (for example, idle sessions) could distort measured playtime
- Class imbalance in newsletter subscription (few “false” cases) could make predictions for the minority class less reliable
- Predictors are on different scales, so they will require standardization
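
The issues above can be checked with a few lines of R. The following is a minimal sketch only: `players_toy` is a made-up stand-in for `players.csv`, and the column names `age`, `played_hours`, and `subscribe` are assumptions for illustration, not the dataset's confirmed names. It counts missing ages, tabulates the subscription classes, and standardizes the numeric predictors with a `recipes` preprocessing step.

```r
library(dplyr)
library(tidyr)
library(recipes)

# Toy stand-in for players.csv; values and column names are assumptions
players_toy <- tibble(
  age          = c(17, 21, NA, 25, 19, 30),
  played_hours = c(0.5, 12.0, 3.2, 150.0, 0.0, 1.1),
  subscribe    = factor(c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE))
)

# 1. Missing data: how many players have no recorded age?
n_missing_age <- sum(is.na(players_toy$age))

# 2. Class balance: number of players in each subscription class
class_counts <- players_toy %>% count(subscribe)

# 3. Scaling: drop rows with a missing age, then center and scale
#    every numeric predictor to mean 0 and standard deviation 1
players_complete <- players_toy %>% drop_na(age)

players_scaled <- recipe(subscribe ~ age + played_hours, data = players_complete) %>%
  step_normalize(all_numeric_predictors()) %>%
  prep() %>%
  bake(new_data = NULL)

print(n_missing_age)
print(class_counts)
print(players_scaled)
```

Dropping rows with missing ages is only one option; whether to drop or impute them depends on how many observations would be lost in the real data.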

In [1]:
library(tidyverse)
library(tidymodels)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
── Attaching packages ────────────────────────────────────── tidymodels 1.1.1 ──

[32m✔[39m [34mbroom       [39m 1.0.6     [32m✔[39m [34mrsample     [39