### Libraries

In [5]:
library(tidyverse)

# 1. Data Description

## Players Dataset

The players dataset contains **196 observations** and **7 variables**. The data were likely collected in two ways: a system that logged each player's playtime, and a survey in which players self-reported details such as age and gender.

### players.csv Variables

| Name        | Type | Meaning                                 | Example                |
|-------------|------|-----------------------------------------|------------------------|
| experience  | chr (character) | Player's experience level out of: <br> - Amateur <br> - Beginner <br> - Regular <br> - Pro <br> - Veteran | Regular|
| subscribe | lgl (logical) | Whether the player is a subscriber of PLAICraft | TRUE |
| hashedEmail | chr (character)  | Unique identifier for each player which<br>links to the sessions.csv dataset | bfce39c89d6549f2bb94d8   <br>064d3ce69dc3d7e72b38f4   <br>31d8aa0c4bf95ccee6bf |
| played_hours | dbl (double) | The total number of hours played by a player, <br> rounded to the nearest 0.1 of an hour | 30.3 |
| name | chr (character) | The player's first name | Morgan |
| gender | chr (character) | The player's gender | Female |
| Age | dbl (double) | The player's age in years | 21 |
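
`read_csv()` guesses column types from the file and reads whole numbers as double by default, so `Age` is loaded as `dbl` unless told otherwise. If an integer column is preferred, one option is an explicit `col_types` specification. The sketch below uses a made-up inline CSV: `toy_csv` and its values are hypothetical and not taken from `players.csv`, and wrapping the text in `I()` to mark it as literal data assumes readr 2.0 or later.

```r
library(tidyverse)

# Hypothetical inline CSV standing in for data/players.csv (values are made up)
toy_csv <- I("played_hours,Age,subscribe\n30.3,21,TRUE\n0.0,NA,FALSE\n")

toy <- read_csv(
  toy_csv,
  col_types = cols(
    played_hours = col_double(),
    Age          = col_integer(),  # force integer instead of the default double
    subscribe    = col_logical()
  )
)

toy
```

The same `col_types` argument could be passed when reading the real file; columns not named in `cols()` are still guessed.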

### players.csv Summary

In [12]:
players <- read_csv("data/players.csv")

Rows: 196 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): experience, hashedEmail, name, gender
dbl (2): played_hours, Age
lgl (1): subscribe

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


In [15]:
summarised_players <- players |>
                      summarize(
                          num_players = n(),
                          mean_hours_played =  mean(played_hours, na.rm = TRUE),
                          mean_age = mean(Age, na.rm = TRUE)
                      )

summarised_players

| num_players | mean_hours_played | mean_age |
|-------------|-------------------|----------|
| `<int>` | `<dbl>` | `<dbl>` |
| 196 | 5.845918 | 21.13918 |


### players.csv Issues

* Two observations are missing a value for `Age`.
* Because age, gender, and name are likely self-reported, players (particularly children) may give a false age to bypass parental consent, or may otherwise misreport their identity.
* Because the dataset records gender rather than sex, the variable can take more than two categories, which makes it harder to encode for quantitative analysis.
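
The missing-value and gender issues above can be checked directly. The following is a minimal sketch on a hypothetical toy tibble (`toy_players` and its rows are invented for illustration, not taken from `players.csv`); it counts missing values per column and tallies the gender categories.

```r
library(tidyverse)

# Hypothetical toy rows that mimic two players.csv columns (not real data)
toy_players <- tibble(
  gender = c("Female", "Male", "Non-binary", "Male"),
  Age    = c(21, NA, 17, 30)
)

# Count missing values in every column
missing_counts <- toy_players |>
  summarize(across(everything(), ~ sum(is.na(.x))))

# Count observations in each gender category, largest first
gender_counts <- toy_players |>
  count(gender, sort = TRUE)

missing_counts
gender_counts
```

Replacing `toy_players` with `players` would run the same checks on the real data.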