In [None]:
library(tidyverse)
# Load the players dataset
players_data <- read_csv("Data/players.csv")
players_data

In [None]:
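# Keep only the columns needed for the question (hashedEmail and gender are dropped),
# rename Age to lowercase to match the other column names, and store experience as a factor
players_data_tidy <- players_data |>
  select(name, Age, played_hours, experience, subscribe) |>
  rename(age = Age) |>
  mutate(experience = as_factor(experience)) |>
  arrange(age)
players_data_tidy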

**Data Description**

| Variable | Type | Meaning |
|------------|------|---------|
| experience | factor | Playing experience (pro, |

**Broad Question**
- What player characteristics and behaviours are most predictive of subscribing to a game-related newsletter, and how do these features differ between various player types?

**Specific Question**
- To what extent do player experience, hours played, and age influence a Minecraft player’s likelihood of subscribing to the game’s newsletter?

**How the dataset will assist in answering this question**
- The `subscribe` column records whether each player is subscribed to the newsletter, which is the outcome we want to explain.
- A player's experience level reflects their engagement with Minecraft and can be analyzed in relation to newsletter subscription. To look for a trend, we plot experience against subscription status.
- Player age may show how interested different age groups are in Minecraft.
- Hours played shows how much time a player has spent in Minecraft; it can be analyzed the same way as the other two variables, by plotting it against subscription status.
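Before plotting, one simple way to check the experience/subscription relationship is to compute the proportion of subscribers within each experience level. The sketch below uses a small made-up data frame (`toy_players`, hypothetical values and level names, not the real `players.csv`) that reuses the real column names `experience` and `subscribe`; on the real data the same call could be applied to `players_data_tidy`.

```r
# Hypothetical toy data reusing the real column names (values are made up)
toy_players <- data.frame(
  experience = c("Pro", "Pro", "Veteran", "Veteran",
                 "Amateur", "Amateur", "Amateur", "Beginner"),
  subscribe  = c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)
)

# mean() of a logical vector is the fraction of TRUE values, so the
# result's `subscribe` column is the subscription rate per experience level
subscription_rate <- aggregate(subscribe ~ experience, data = toy_players, FUN = mean)
subscription_rate
```

A level whose rate sits far from the others would be a first hint that experience matters, though small groups can produce extreme rates by chance.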

**How to wrangle the dataset**
- Select variables specific to the question being answered
- Tidy the dataset by naming variables consistently (e.g., renaming `Age` to `age` to match the other lowercase column names)
- Split the data into training and testing datasets
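The split step can be sketched as follows. This is a minimal base-R version, assuming a 75/25 split stratified by `subscribe` so that both sets keep roughly the same share of subscribers; the proportion, the seed and the `toy_players` values are all hypothetical choices. In a tidymodels workflow the same idea is usually expressed with `rsample::initial_split()` and its `prop` and `strata` arguments.

```r
# Hypothetical toy data standing in for players_data_tidy (values are made up)
toy_players <- data.frame(
  age          = c(17, 21, 24, 19, 33, 22, 28, 45, 16, 20),
  played_hours = c(30.3, 0.0, 1.2, 0.1, 0.0, 5.6, 0.0, 12.5, 2.3, 0.4),
  subscribe    = c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE)
)

set.seed(2024)  # make the random split reproducible

# Stratified split: within each subscribe group, randomly pick about 75% of
# the row indices for training, so class proportions are roughly preserved
train_idx <- unlist(lapply(
  split(seq_len(nrow(toy_players)), toy_players$subscribe),
  function(rows) rows[sample.int(length(rows), size = round(0.75 * length(rows)))]
))

players_train <- toy_players[sort(train_idx), ]
players_test  <- toy_players[-train_idx, ]
```

Stratifying matters here because subscribers and non-subscribers may be unevenly represented; an unstratified split of a small dataset could leave the test set with very few players of one class.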

In [None]:
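# Mean age and mean hours played across all players (missing values ignored)
summary_mean <- players_data |>
  summarize(mean_age = mean(Age, na.rm = TRUE),
            mean_played_hours = mean(played_hours, na.rm = TRUE))
# Reshape to one row per variable for a compact table
summary_table <- summary_mean |>
  pivot_longer(cols = everything(),
               names_to = "variable",
               values_to = "mean")
summary_table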
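The overall means above do not yet separate subscribers from non-subscribers, which is what the question is about. A minimal sketch of the same means computed per subscription group is shown below, using hypothetical toy values with the real column names `Age`, `played_hours` and `subscribe`. Note one behavioural difference: unlike `summarize(..., na.rm = TRUE)`, the formula interface of `aggregate()` drops a whole row when any listed variable is `NA`.

```r
# Hypothetical toy data reusing the real column names (values are made up)
toy_players <- data.frame(
  Age          = c(17, 21, 24, NA, 33, 22),
  played_hours = c(30.3, 0.0, 1.2, 0.1, 0.0, 5.6),
  subscribe    = c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)
)

# Mean age and mean hours played within each subscribe group;
# the row with a missing Age is removed entirely (na.omit is the default)
group_means <- aggregate(cbind(Age, played_hours) ~ subscribe,
                         data = toy_players, FUN = mean)
group_means
```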

In [None]:
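options(repr.plot.width = 10, repr.plot.height = 15)
# Scatter plot of hours played against age, coloured by subscription status
hours_played_plot <- players_data |>
  ggplot(aes(x = Age, y = played_hours, color = subscribe)) +
  geom_point(size = 5, alpha = 0.8) +
  labs(x = "Age", y = "Time played (hrs)", color = "Subscribed", title = "Play time vs. Age") +
  # Zoom to 0-50 hours without removing points above 50 from the data (ylim() drops them and warns)
  coord_cartesian(ylim = c(0, 50)) +
  scale_color_brewer(palette = "Set3")
hours_played_plot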

In [None]:
# dplyr and ggplot2 are already loaded by library(tidyverse)
options(repr.plot.width = 10, repr.plot.height = 10)
# Mean age of players within each experience level
age_mean <- players_data |>
  group_by(experience) |>
  summarize(mean_age = mean(Age, na.rm = TRUE))

# Bar chart of mean age by experience level
experience_plot <- ggplot(age_mean, aes(x = experience, y = mean_age, fill = experience)) +
  geom_col() +
  labs(x = "Player Experience", y = "Mean Age", title = "Mean Age by Player Experience")
experience_plot

**Methods and Plan**
