**Individual Planning Report**  
Aisyah Sudarmaji  
**Problem**: Predicting Usage of a Video Game Research Server


**DATA DESCRIPTION**  
The data set "players.csv" contains information about players who participated in UBC's Minecraft Research Project. Each row represents an individual player.

**The broad question we want to answer:**  
Which "kinds" of players are most likely to contribute a large amount of data so that we can target those players in our recruiting efforts?

**Our more specific question would be:** 
Can a player's experience level, age, and gender predict the number of hours they will play on the server?

**Column Description**  
1. experience: The player's experience level in Minecraft 
2. subscribe: Indicates whether or not the player subscribed to the game
3. hashedEmail: The player's hashed email address, which serves as an anonymous identifier
4. played_hours: The number of hours the player spent playing on the server
5. name: The player's name
6. gender: The player's gender
7. Age: The player's age

The data set contains 196 players and 7 variables.   
We won't be using the columns "hashedEmail" and "name" as they don't provide useful information for the prediction.
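
Dropping the two identifier columns could look roughly like the sketch below. It assumes dplyr's `select()` is an acceptable way to do this. The tiny tibble holds made-up values standing in for "players.csv", and the names `players_small` and `players_model` are hypothetical, chosen only for illustration.

```r
library(dplyr)

# Made-up rows standing in for players.csv (illustrative values only)
players_small <- tibble(
    experience = c("Pro", "Amateur"),
    subscribe = c(TRUE, FALSE),
    hashedEmail = c("abc123", "def456"),
    played_hours = c(1.5, 0.0),
    name = c("PlayerA", "PlayerB"),
    gender = c("Female", "Male"),
    Age = c(20, 31)
)

# Drop the identifier columns that will not be used as predictors
players_model <- players_small |>
    select(-hashedEmail, -name)

colnames(players_model)
```

The same `select()` call should work on the full `players` data frame once it has been loaded.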

In [None]:
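library(tidyverse)

# Load the player data and confirm its dimensions
players <- read_csv("players.csv")
players
nrow(players)   # number of players (rows)
ncol(players)   # number of variables (columns)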

In [None]:
players_summary <- players |>
    summarize(mean_played_hours = mean(played_hours, na.rm = TRUE))
players_summary

players_summary_based_on_age <- players |>
    group_by(Age) |>
    summarize(mean_played_hours = mean(played_hours, na.rm = TRUE))
players_summary_based_on_age

players_summary_based_on_gender <- players |>
    group_by(gender) |>
    summarize(mean_played_hours = mean(played_hours, na.rm = TRUE))
players_summary_based_on_gender

players_summary_based_on_experience <- players |>
    group_by(experience) |>
    summarize(mean_played_hours = mean(played_hours, na.rm = TRUE))
players_summary_based_on_experience

In [None]:
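# Plot the per-experience means computed above; geom_col() draws bars at the given y values
experience_plot <- ggplot(players_summary_based_on_experience,
                          aes(x = experience, y = mean_played_hours, fill = experience)) +
    geom_col() +
    labs(title = "Average Played Hours by Experience Level",
         x = "Experience Level",
         y = "Average Played Hours")
experience_plot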
               