## **Data Description**

The dataset comprises two files: **players.csv** and **sessions.csv**, which contain information on PLAICRAFT players and their game sessions. This data helps analyze player activity, experience levels, and engagement trends. However, for this particular project, I will be using **players.csv** exclusively.  

#### **Players Dataset (`players.csv`)**  
This dataset provides player-specific details, including experience level, subscription status, and demographic attributes.  

- **`experience` (String)**: Categorized into **Pro, Veteran, Amateur, or Regular**, reflecting a player's skill level.  
- **`subscribe` (Boolean)**: TRUE/FALSE value indicating whether the player is a PLAICRAFT subscriber.  
- **`hashedEmail` (String)**: Unique hashed identifier linking players to session data.  
- **`played_hours` (Float)**: Number representing total hours played.  
- **`name` (String)**: The player’s first name.  
- **`gender` (String)**: Player's gender categorized as **Male, Female, and Non-binary**.  
- **`Age` (Integer)**: Player’s age in years. Two values are missing.

#### **Data Summary and Issues**  
The dataset consists of **196 player records**. The **`Age`** column contains 2 missing values, which would affect calculations such as means. Rows with a missing `Age` therefore need to be removed, or the missing values otherwise handled, before performing the analysis.
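
One way to check where values are missing is to count the `NA`s in each column. The sketch below is only illustrative: `toy_players` is a small made-up tibble (not the real data) that borrows three of the column names, so the snippet runs without the CSV file. On the real data, the same `summarise()` call could be applied to the data frame read from **players.csv**.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for players.csv; all values are invented for illustration
toy_players <- tibble(
  experience   = c("Regular", "Pro", "Regular", "Amateur"),
  played_hours = c(0.5, 12.0, 3.2, 0.0),
  Age          = c(17L, NA, 21L, NA)
)

# Count the missing values in every column
na_counts <- toy_players |>
  summarise(across(everything(), ~ sum(is.na(.x))))

na_counts
```

Any column with a non-zero count is a candidate for `drop_na()` or for `na.rm = TRUE` in later summaries.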

## **Questions**

In this project, I will address the following broad question:  
**We would like to know which "kinds" of players are most likely to contribute a large amount of data so that we can target those players in our recruiting efforts.** This will help understand which characteristics contribute most to player experience levels, allowing for better matchmaking and personalized game experiences.  

#### **Specific Question**  
Can a Regular player's `Age` be used to predict their total played hours (`played_hours`)?  
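
The modelling method is not fixed at this stage. As one possible illustration, the sketch below fits a simple linear regression of `played_hours` on `Age` using base R's `lm()`. The data frame `toy_regular` and every value in it are invented for demonstration and are not taken from **players.csv**; other approaches, such as K-nearest neighbours regression, would be equally reasonable candidates.

```r
# Hypothetical data: ages and hours are invented, not taken from players.csv
toy_regular <- data.frame(
  Age          = c(16, 18, 20, 22, 25, 30),
  played_hours = c(5.0, 4.2, 3.9, 3.1, 2.0, 1.5)
)

# Fit played_hours as a linear function of Age
fit <- lm(played_hours ~ Age, data = toy_regular)

# Inspect the estimated intercept and slope
coef(fit)

# Predict played hours for a new (hypothetical) 19-year-old player
predict(fit, newdata = data.frame(Age = 19))
```

With the real data, the sign and size of the `Age` coefficient would give a first indication of whether age carries any predictive signal for played hours.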

#### **Hypothesis**  
- Players with higher **played hours** tend to have a higher experience level.  
- Older players may approach the game differently, leading to variations in skill progression compared to younger ones.  
- Subscription status and gender may reflect different levels of engagement or playstyle, which could be linked to skill development.  
- The dataset provides key player attributes, such as `played_hours`, `Age`, `gender`, and `subscribe`, which allow for exploring potential trends and relationships between player characteristics and experience levels.  

#### **Plan on Data Wrangling** 
- Load and read **players.csv**.  
- Remove rows with missing values to ensure data completeness.  
- Select relevant variables: `experience`, `Age`, and `played_hours`.  
- Filter the data to include only players with `experience` classified as **Regular**.

## **Exploratory Data Analysis and Visualization**

In [None]:
library(repr)
library(tidyverse)
source("cleanup.R")

In [None]:
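# Read the raw player data
players <- read_csv("data/players.csv")

# Keep the relevant columns, restrict to Regular players,
# and drop rows with a missing Age
players_tidy <- players |>
                    select(experience, played_hours, Age) |>
                    filter(experience == "Regular") |>
                    drop_na(Age)

players_tidy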

# Mean age and mean played hours across all players (before filtering),
# ignoring missing values
quantitative_summary <- players |>
                    select(played_hours, Age) |>
                    summarise(
                        mean_age = mean(Age, na.rm = TRUE),
                        mean_played_hours = mean(played_hours, na.rm = TRUE)
                    )

quantitative_summary