In [1]:
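# 1. Importing Data
# library(tidyverse) already attaches ggplot2, dplyr, and readr,
# so separate library() calls for them are not needed.
library(tidyverse)

# Read both files from the working directory.
players <- read_csv("players.csv")
sessions <- read_csv("sessions.csv")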


── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Rows: 196 Columns: 7
── Column specification ────────────────────────────────────────────────────────


## 1. Importing Data

Both datasets were imported with `read_csv()` from the readr package. The `players` data contains 196 rows and 7 columns, and the `sessions` data contains 1,535 rows and 5 columns.
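
A quick way to confirm dimensions like these after an import is to inspect the object directly. The sketch below uses a small made-up tibble (`players_toy`, a hypothetical stand-in with invented values) so that it runs without the CSV files; on the real data the same calls would be applied to `players` and `sessions`.

```r
library(tibble)

# Toy stand-in for `players`; all values are invented for illustration only.
players_toy <- tibble(
  hashedEmail  = c("a1", "b2", "c3"),
  subscribe    = c(TRUE, FALSE, TRUE),
  played_hours = c(0.5, 12.0, 3.2)
)

# dim() returns the number of rows, then the number of columns.
players_dims <- dim(players_toy)
print(players_dims)

# names() lists the column names, which is handy for spotting import problems.
print(names(players_toy))
```

Running `dim(players)` and `dim(sessions)` on the imported data should reproduce the row and column counts reported above.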


## 2. Data Description and Definition

For this project, I’m working with two related datasets called `players.csv` and `sessions.csv`.  
They come from a Minecraft research server managed by a UBC research group that studies how people play games.  
Together, they describe how **196 unique players** interacted with the game and when they played, across a total of **1,535 recorded play sessions**.

The **players** file contains demographic and behavioral information about each player, such as their age, experience level, and total playtime.  
The **sessions** file records detailed logs of individual play sessions, including when each one started and ended.  
Below is a summary of what each column represents.

### `players.csv`

| Variable | Type | Description |
|-----------|------|-------------|
| experience | Character | The player’s self-reported experience level (e.g., Pro, Veteran, Amateur). |
| subscribe | Logical | TRUE if the player subscribed to the game newsletter, FALSE otherwise. |
| hashedEmail | Character | A unique anonymous ID used to link the player and session data. |
| played_hours | Numeric | Total hours the player spent on the server. |
| name | Character | The player’s in-game name. |
| gender | Character | The player’s gender (Male or Female). |
| Age | Numeric | The player’s age in years. |

### `sessions.csv`

| Variable | Type | Description |
|-----------|------|-------------|
| hashedEmail | Character | Identifier used to connect each session to a player. |
| start_time | Character | The time when a player started a session. |
| end_time | Character | The time when the session ended. |
| original_start_time | Numeric | Session start time stored as a numeric timestamp. |
| original_end_time | Numeric | Session end time stored as a numeric timestamp. |
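
Because `start_time` and `end_time` are stored as character strings, they need to be parsed before any duration can be computed. The following is a minimal sketch on made-up rows. The `"dd/mm/yyyy HH:MM"` layout, the UTC time zone, and the names `sessions_toy`, `start_dt`, `end_dt`, and `duration_min` are all assumptions for illustration; the real format should be checked against the file first.

```r
library(dplyr)
library(tibble)

# Toy rows with invented values. The date layout is an ASSUMPTION,
# not something confirmed by the dataset description.
sessions_toy <- tibble(
  hashedEmail = c("a1", "a1", "b2"),
  start_time  = c("30/06/2024 18:12", "17/06/2024 23:33", "25/07/2024 17:34"),
  end_time    = c("30/06/2024 18:24", "17/06/2024 23:46", "25/07/2024 17:57")
)

# Assumed layout: day/month/year hour:minute.
time_format <- "%d/%m/%Y %H:%M"

sessions_parsed <- sessions_toy %>%
  mutate(
    # Convert the character columns to date-times (time zone assumed to be UTC).
    start_dt = as.POSIXct(start_time, format = time_format, tz = "UTC"),
    end_dt   = as.POSIXct(end_time, format = time_format, tz = "UTC"),
    # Session length in minutes.
    duration_min = as.numeric(difftime(end_dt, start_dt, units = "mins"))
  )

print(sessions_parsed)
```

If the real strings use a different layout, `as.POSIXct()` returns `NA` for the rows it cannot parse, so checking for `NA` values after the conversion is a cheap sanity test.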

In total, `players.csv` has **196 rows and 7 columns**, and `sessions.csv` has **1,535 rows and 5 columns**.  
Each player can appear more than once in the sessions dataset since one player can have multiple play sessions.
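
This one-to-many relationship can be made concrete by counting sessions per `hashedEmail` and joining the counts back onto the player table. The sketch below uses made-up tibbles (`players_toy`, `sessions_toy`) and a hypothetical column name `n_sessions`; it illustrates the idea and is not part of the original analysis.

```r
library(dplyr)
library(tibble)

# Toy data with invented IDs; player "c3" has no recorded sessions.
players_toy <- tibble(
  hashedEmail = c("a1", "b2", "c3"),
  subscribe   = c(TRUE, FALSE, TRUE)
)
sessions_toy <- tibble(
  hashedEmail = c("a1", "a1", "b2")
)

# One row per player that appears in the sessions data, with a session count.
session_counts <- sessions_toy %>%
  count(hashedEmail, name = "n_sessions")

# Keep every player; players without sessions get a count of 0 instead of NA.
players_with_counts <- players_toy %>%
  left_join(session_counts, by = "hashedEmail") %>%
  mutate(n_sessions = coalesce(n_sessions, 0L))

print(players_with_counts)
```

A `left_join()` is used here so that players who never logged a session stay in the result rather than being dropped.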

I also noticed that there are more `FALSE` than `TRUE` values in the `subscribe` column, meaning fewer players chose to subscribe to the newsletter than chose not to.  
Some players have much higher total play hours than others, so `played_hours` may contain **outliers** that could skew summaries such as the mean.  
Since this dataset comes from a voluntary research server, it may not represent all Minecraft players.  
However, it still provides useful information to explore what kinds of players are more likely to subscribe and how their gameplay behavior may relate to that decision.
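
Observations like the ones above can be checked with a short tally and a simple outlier rule. The sketch below runs on made-up values (`players_toy`) that do not reflect the real distribution. The 1.5 × IQR upper fence is one common convention chosen here for illustration, not a method taken from the dataset's documentation, and `high_outlier` is a hypothetical column name.

```r
library(dplyr)
library(tibble)

# Toy data with invented values, including one very large played_hours entry.
players_toy <- tibble(
  subscribe    = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE),
  played_hours = c(0.0, 0.1, 0.5, 1.2, 2.0, 150.0)
)

# Count and proportion of subscribers versus non-subscribers.
subscribe_counts <- players_toy %>%
  count(subscribe) %>%
  mutate(prop = n / sum(n))

# Upper fence from the 1.5 * IQR rule (an assumed convention, see lead-in).
quartiles   <- quantile(players_toy$played_hours, c(0.25, 0.75), na.rm = TRUE)
upper_fence <- quartiles[[2]] + 1.5 * (quartiles[[2]] - quartiles[[1]])

# Flag players whose total hours exceed the fence.
players_flagged <- players_toy %>%
  mutate(high_outlier = played_hours > upper_fence)

print(subscribe_counts)
print(players_flagged)
```

Applied to the real `players` table, `count(subscribe)` gives the exact split between subscribers and non-subscribers, and the flagged rows show which players drive the long right tail in `played_hours`.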
