**Individual Planning Report**  
Aisyah Sudarmaji  
**Problem**: Predicting Usage of a Video Game Research Server


**DATA DESCRIPTION**  
The data set "players.csv" contains information about players who participated in UBC's Minecraft Research Project. Each row represents an individual player.

**The broad question we want to answer:**  
Which "kinds" of players are most likely to contribute a large amount of data so that we can target those players in our recruiting efforts?

**Our more specific question would be:** 
Can a player's experience level, age, and gender predict the number of hours they play on the server?

**Column Description**  
1. experience: The player's experience level in Minecraft 
2. subscribe: Indicates whether or not the player subscribed to the game
3. hashedEmail: The player's hashed email address, which serves as an anonymous identifier
4. played_hours: The number of hours the player spent playing on the server
5. name: The player's name
6. gender: The player's gender
7. Age: The player's age

The data set contains 196 players and 7 variables.   
We won't be using the columns "hashedEmail" and "name" as they don't provide useful information for the prediction.
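
As a minimal sketch of dropping the two identifier columns, the snippet below uses a small made-up tibble (`toy_players`, a hypothetical stand-in, not the real data) with the same column names as "players.csv". On the actual data, the same `select()` call would presumably be applied to `players`.

```r
library(tidyverse)

# Hypothetical stand-in for players.csv: all values are invented for illustration.
toy_players <- tibble(
    experience   = c("Pro", "Amateur", "Regular"),
    subscribe    = c(TRUE, FALSE, TRUE),
    hashedEmail  = c("a1", "b2", "c3"),
    played_hours = c(1.5, 0.0, 12.3),
    name         = c("P1", "P2", "P3"),
    gender       = c("Male", "Female", "Non-binary"),
    Age          = c(19, 21, 17)
)

# Drop the identifier columns that will not be used as predictors.
toy_selected <- toy_players |>
    select(-hashedEmail, -name)
toy_selected
```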

In [None]:
# Load the tidyverse and read the player data
library(tidyverse)
players <- read_csv("players.csv")
players

# Number of observations (players) and variables
nrow(players)
ncol(players)

In [None]:
# Overall mean of played hours across all players
players_summary <- players |>
    summarize(mean_played_hours = mean(played_hours, na.rm = TRUE))
players_summary

# Mean played hours for each age
players_summary_based_on_age <- players |>
    group_by(Age) |>
    summarize(mean_played_hours = mean(played_hours, na.rm = TRUE))
players_summary_based_on_age

# Mean played hours for each gender
players_summary_based_on_gender <- players |>
    group_by(gender) |>
    summarize(mean_played_hours = mean(played_hours, na.rm = TRUE))
players_summary_based_on_gender

# Mean played hours for each experience level
players_summary_based_on_experience <- players |>
    group_by(experience) |>
    summarize(mean_played_hours = mean(played_hours, na.rm = TRUE))
players_summary_based_on_experience

In [None]:
age_plot <- ggplot(players_summary_based_on_age, aes(x = Age, y = mean_played_hours)) +
    geom_point() +
    labs(title = "Average Played Hours by Age",
         x = "Player's Age",
         y = "Average Played Hours")
age_plot

The graph shows no clear relationship between a player's age and the average number of hours played on the server: the points follow neither an increasing nor a decreasing trend.
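
Because averaging by age hides how many players sit behind each point, one way to check this impression is to compute the correlation on the per-player values. The sketch below uses an invented tibble (`toy_players`, hypothetical) rather than the real data, so the number it produces says nothing about the actual players. Passing `use = "complete.obs"` to `cor()` drops any rows with a missing value.

```r
library(tidyverse)

# Hypothetical per-player values, invented for illustration only.
toy_players <- tibble(
    Age          = c(17, 19, 21, 25, NA, 30),
    played_hours = c(2.0, 0.5, 10.0, 0.0, 1.0, 3.5)
)

# Pearson correlation between age and played hours, ignoring incomplete rows.
age_hours_cor <- toy_players |>
    summarize(correlation = cor(Age, played_hours, use = "complete.obs"))
age_hours_cor
```

A value close to 0 would be consistent with the absence of a linear trend, although it would not rule out a non-linear pattern.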

In [None]:
gender_plot <- ggplot(players_summary_based_on_gender, aes(x = gender, y = mean_played_hours, fill = gender)) +
    geom_bar(stat = "identity") +
    labs(title = "Average Played Hours by Gender",
         x = "Gender",
         y = "Average Played Hours")
gender_plot

The graph shows that players who identify as non-binary have the highest average number of hours played on the server, followed by female players. Two-Spirited players have the lowest average.
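
A group average can be driven by one or two heavy players when the group is small, so it may help to report the group size and the median next to each mean. The following is a sketch on invented values (`toy_players` is hypothetical, not the real data):

```r
library(tidyverse)

# Hypothetical per-player values, invented for illustration only.
toy_players <- tibble(
    gender       = c("Male", "Male", "Male", "Female", "Female", "Non-binary"),
    played_hours = c(0.5, 1.0, 3.0, 2.0, 6.0, 20.0)
)

# Group size, mean, and median of played hours for each gender.
gender_summary_with_counts <- toy_players |>
    group_by(gender) |>
    summarize(n_players = n(),
              mean_played_hours = mean(played_hours, na.rm = TRUE),
              median_played_hours = median(played_hours, na.rm = TRUE))
gender_summary_with_counts
```

In this toy example the highest mean belongs to a group with a single player, which is the kind of situation the `n_players` column is meant to expose.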

In [None]:
experience_plot <- ggplot(players_summary_based_on_experience, aes(x = experience, y = mean_played_hours, fill = experience)) +
    geom_bar(stat = "identity") +
    labs(title = "Average Played Hours by Experience Level",
         x = "Experience Level",
         y = "Average Played Hours")
experience_plot

The graph shows that players with a "Regular" experience level have the highest average number of hours played on the server. This suggests that players at the Regular level are more active on the server than Amateur, Beginner, Pro, and Veteran players.
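
By default the bars are ordered alphabetically, which makes it harder to see where "Regular" sits relative to the other levels. A possible sketch for ordering the bars is shown below. The ordering in `experience_levels` (Beginner to Veteran) is an assumption about how the levels rank, and the values in `toy_summary` are invented for illustration.

```r
library(tidyverse)

# Assumed ranking from least to most experienced; not confirmed by the data source.
experience_levels <- c("Beginner", "Amateur", "Regular", "Pro", "Veteran")

# Hypothetical summary table, invented for illustration only.
toy_summary <- tibble(
    experience        = c("Amateur", "Beginner", "Pro", "Regular", "Veteran"),
    mean_played_hours = c(6, 1, 3, 18, 1)
)

# Convert experience to a factor so ggplot draws the bars in the assumed order.
toy_ordered <- toy_summary |>
    mutate(experience = factor(experience, levels = experience_levels))

ordered_plot <- ggplot(toy_ordered, aes(x = experience, y = mean_played_hours)) +
    geom_bar(stat = "identity") +
    labs(title = "Average Played Hours by Experience Level",
         x = "Experience Level",
         y = "Average Played Hours")
ordered_plot
```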