# (1) Data Description

`players.csv` is a list of all unique players, including data about each player.

`experience`: player's experience level.

`subscribe`: whether the player subscribed to the newsletter.

`hashedEmail`: unique identifier linking to `session.csv`.

`played_hours`: total hours played.

`name`: player's name.

`gender`: player's gender.

`Age`: player's age.

This dataset gives demographic and profile-level data.

Potential issues: Some players have 0 played hours.
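To gauge how common this is, the zero-hour players can be counted directly. The sketch below runs on a small made-up tibble so that it is self-contained: `toy_players` and its values are hypothetical, not taken from `players.csv`. With the real data, the same pipeline would be applied to the data frame read from `players.csv`.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for players.csv; all values are invented for illustration.
toy_players <- tibble(
  experience   = c("Pro", "Amateur", "Veteran", "Amateur"),
  played_hours = c(30.3, 0, 0, 1.5)
)

# Count players with zero recorded hours and their share of all players.
zero_summary <- toy_players |>
  summarize(
    n_players = n(),
    n_zero    = sum(played_hours == 0),
    prop_zero = mean(played_hours == 0)
  )

zero_summary
```

A large share of zeros would suggest a strongly right-skewed response, which is worth keeping in mind when choosing a model.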

`session.csv` is a list of individual play sessions by each player, including data about the session.

`hashedEmail`: unique identifier linking to `players.csv`.

`start_time`: session start time.

`end_time`: session end time.

`original_start_time`: Unix timestamp of start time.

`original_end_time`: Unix timestamp of end time.

This dataset tracks session-level data.

Potential issues: `start_time` and `end_time` are currently stored as `chr`.
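One possible way to handle the `chr` columns is to parse them into date-times with `lubridate`. This is only a sketch: `toy_sessions` is a hypothetical tibble, and the `dd/mm/yyyy HH:MM` layout is an assumption that should be checked against the actual file before using `dmy_hm()`.

```r
library(dplyr)
library(tibble)
library(lubridate)

# Hypothetical rows; the "dd/mm/yyyy HH:MM" layout is an assumption, not confirmed.
toy_sessions <- tibble(
  start_time = c("30/06/2024 18:12", "17/06/2024 23:33"),
  end_time   = c("30/06/2024 18:24", "17/06/2024 23:46")
)

# Parse the character columns into POSIXct date-times,
# then derive each session's length in minutes.
parsed_sessions <- toy_sessions |>
  mutate(
    start_time  = dmy_hm(start_time),
    end_time    = dmy_hm(end_time),
    session_min = as.numeric(difftime(end_time, start_time, units = "mins"))
  )

parsed_sessions
```

If the real timestamps use a different layout, another `lubridate` parser (for example `ymd_hms()`) would be needed instead.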

# (2) Questions

The broad question our group chose: Question 2

The specific question: Can a player's experience level, gender, and age predict their total play time (`played_hours`)?

The goal of this analysis is to understand how player characteristics influence total played hours. Here, total play time serves as a proxy for the amount of data contributed by each player. This is an appropriate response variable because players who spend more time on the server produce more in-game logs, interactions, and behavioral data, making them more valuable for the research project.

The explanatory variables are:
1. `experience`: captures the player's skill or familiarity with the game.
2. `gender`: allows us to explore potential behavioral differences between male and female players.
3. `Age`: represents player maturity and available time to play. Older players may play less due to other commitments, while younger players may play more.
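Since `experience` and `gender` are categorical while `Age` is numeric, one way to prepare these variables for a predictive model is to convert the categorical ones to factors. The sketch below is illustrative only: `toy_players` and its values are hypothetical, and the modelling method itself is left open.

```r
library(dplyr)
library(tibble)

# Hypothetical rows standing in for players.csv; values are invented.
toy_players <- tibble(
  experience   = c("Pro", "Amateur", "Veteran"),
  gender       = c("Male", "Female", "Male"),
  Age          = c(9, 21, 17),
  played_hours = c(30.3, 0.7, 3.8)
)

# Keep the response and the three explanatory variables,
# treating the categorical ones as factors.
model_data <- toy_players |>
  select(played_hours, experience, gender, Age) |>
  mutate(
    experience = factor(experience),
    gender     = factor(gender)
  )

model_data
```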

# (3) Exploratory Data Analysis and Visualization

In [None]:
library(tidyverse)

**3.1 Load data**

Demonstrate that the dataset can be loaded into R.

In [None]:
players <- read_csv("players.csv")
head(players, 6)

**3.2 Wrangle the data into a tidy format**

Do the minimum necessary wrangling to turn the data into a tidy format. 

In [None]:
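# Keep the explanatory variables and the response, then total the hours
# and count the players within each experience/gender/Age combination.
new_players <- players |>
    select(experience, gender, Age, played_hours) |>
    group_by(experience, gender, Age) |>
    summarize(played_hours = sum(played_hours),
              count = n(),
              .groups = "drop")  # return an ungrouped tibble and silence the grouping message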

head(new_players, 6)

**3.3 Mean value of quantitative variables**

Compute the mean value of each quantitative variable in the `players.csv` data set, ignoring missing values. Report the mean values in a table format.

In [None]:
players_mean <- players |>
    summarize(across(where(is.numeric), \(x) mean(x, na.rm = TRUE)))

players_mean
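If a tall table with one row per variable is easier to read, the one-row summary can be reshaped with `pivot_longer()`. This is a minimal sketch on a hypothetical tibble (`toy_players`, with invented values); the column names `variable` and `mean_value` are also just illustrative choices.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical numeric columns; values are invented, including one missing Age.
toy_players <- tibble(
  played_hours = c(30.3, 0, 1.5),
  Age          = c(9, 17, NA)
)

# Compute each mean (ignoring NAs), then reshape to one row per variable.
means_long <- toy_players |>
  summarize(across(where(is.numeric), \(x) mean(x, na.rm = TRUE))) |>
  pivot_longer(everything(), names_to = "variable", values_to = "mean_value")

means_long
```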