# Project Final Report

## Introduction

**Background**  
Online games produce large behavioral datasets that offer insight into user engagement and retention. “Minecraft,” an open‐world game, generates detailed logs of player activity, which makes it a useful case for studying how simple player attributes relate to play behavior. By analyzing total play time alongside age, we aim to uncover patterns that could inform game design and personalized recommendations.

**Research Question**  
Can a player’s age and total play time predict their experience level on the game server?

- **Response variable**: `experience` (factor) – player’s self‐reported expertise level (“Pro”, “Veteran”, “Regular”, “Amateur”, “Beginner”)  
- **Predictor variables**:  
  - `age` (numeric; years; stored as `Age` in the raw file)  
  - `played_hours` (numeric; total play duration in hours)  

**Data Description**  
We use the `players.csv` dataset exported from the research server. As loaded, it contains **196** observations and **7** variables:

| variable           | type       | description                                               |
|--------------------|------------|-----------------------------------------------------------|
| experience         | character  | player’s experience level (converted to a factor below)   |
| subscribe          | logical    | whether the player holds a subscription                   |
| hashedEmail        | character  | anonymized user identifier                                |
| played_hours       | double     | total play duration per player (hours)                    |
| name               | character  | player’s display name                                     |
| gender             | character  | player’s self‐reported gender                             |
| Age                | double     | player’s age in years                                     |

For this analysis, we will:

1. Convert `experience` from character to a factor.
2. Rename `Age` to `age` for consistent lowercase naming.
3. Keep only the three variables needed for prediction: `experience`, `age`, and `played_hours`.
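The preparation steps listed above can be sketched as follows. This is a minimal illustration rather than the report's actual pipeline: `players_raw` is a hypothetical toy tibble that only mimics some columns of `players.csv`, and `players_tidy` is an assumed name for the result. The raw column is capitalized as `Age` in the file, so it is renamed inside `select()`.

```r
library(dplyr)

# Hypothetical toy rows mirroring some columns of players.csv (not the real data).
players_raw <- tibble(
  experience   = c("Pro", "Veteran", "Amateur", "Regular"),
  subscribe    = c(TRUE, TRUE, FALSE, TRUE),
  played_hours = c(30.3, 3.8, 0.7, 0.1),
  gender       = c("Male", "Male", "Female", "Male"),
  Age          = c(9, 17, 21, NA)
)

players_tidy <- players_raw |>
  # Convert the response from character to factor.
  mutate(experience = as.factor(experience)) |>
  # Keep only the modelling variables; `age = Age` renames while selecting,
  # and every column not listed here is dropped.
  select(experience, age = Age, played_hours)

players_tidy
```

Selecting the three columns by name drops every other column in one step, so no separate removal step is needed in this sketch.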




In [1]:
library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors


In [2]:
players <- read_csv("players.csv")
head(players)

Rows: 196 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): experience, hashedEmail, name, gender
dbl (2): played_hours, Age
lgl (1): subscribe

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


| experience | subscribe | hashedEmail | played_hours | name | gender | Age |
|------------|-----------|-------------|--------------|------|--------|-----|
| `<chr>` | `<lgl>` | `<chr>` | `<dbl>` | `<chr>` | `<chr>` | `<dbl>` |
| Pro | TRUE | f6daba428a5e19a3d47574858c13550499be23603422e6a0ee9728f8b53e192d | 30.3 | Morgan | Male | 9 |
| Veteran | TRUE | f3c813577c458ba0dfef80996f8f32c93b6e8af1fa939732842f2312358a88e9 | 3.8 | Christian | Male | 17 |
| Veteran | FALSE | b674dd7ee0d24096d1c019615ce4d12b20fcbff12d79d3c5a9d2118eb7ccbb28 | 0.0 | Blake | Male | 17 |
| Amateur | TRUE | 23fe711e0e3b77f1da7aa221ab1192afe21648d47d2b4fa7a5a659ff443a0eb5 | 0.7 | Flora | Female | 21 |
| Regular | TRUE | 7dc01f10bf20671ecfccdac23812b1b415acd42c2147cb0af4d48fcce2420f3e | 0.1 | Kylie | Male | 21 |
| Amateur | TRUE | f58aad5996a435f16b0284a3b267f973f9af99e7a89bee0430055a44fa92f977 | 0.0 | Adrian | Female | 17 |
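
The readr message above suggests specifying column types explicitly. One possible way to do that, sketched here under assumptions, is to pass a `cols()` specification so that `experience` is parsed as a factor at read time and the column-specification message is not printed. The two-row `csv_text` is a hypothetical stand-in for `players.csv`, supplied as literal text via `I()` so the example runs without the file, and `players_typed` is an illustrative name.

```r
library(readr)

# Hypothetical two-row stand-in for players.csv (not the real data).
# Wrapping the string in I() tells readr to treat it as literal CSV text.
csv_text <- I("experience,subscribe,played_hours,Age\nPro,TRUE,30.3,9\nAmateur,FALSE,0.7,21\n")

players_typed <- read_csv(
  csv_text,
  col_types = cols(
    experience   = col_factor(),   # levels are taken from the data
    subscribe    = col_logical(),
    played_hours = col_double(),
    Age          = col_double()
  )
)

players_typed
```

For the real file, the same `col_types` argument could be passed to `read_csv("players.csv", ...)`, with entries added for the remaining character columns.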
