## Predicting Experience of Players on a Video Game Research Server
### DSCI Final Project Report 
Project 009-35


### Introduction 
Understanding how different players engage with a video game research server helps the people running it manage resources and plan effective recruitment strategies.

A research group in Computer Science at UBC, led by Frank Wood, gathers detailed gameplay information from a custom Minecraft server. Because maintaining this project demands substantial resources, the team must identify which players are most likely to produce large amounts of data. Doing so helps them focus their recruitment efforts on players who will provide the most useful and impactful contributions.

In this project, we focus on determining which "kinds" of players are most likely to contribute a large amount of data so that recruitment efforts can be better targeted. Specifically, we explore **whether a player’s experience level in Minecraft can be predicted from their age and total played hours**. We chose this question because recruiters who aim to maximize gameplay data, such as total hours played, may benefit from understanding how these characteristics relate to experience. Our findings could help identify which types of players to prioritize, whether the goal is to target particular age groups or players with the potential to generate more gameplay data. Overall, we aim to determine whether measurable player characteristics are associated with being more experienced, which may in turn indicate a higher likelihood of contributing a large amount of data.

Our data source is the `players.csv` data set.

- Number of observations (rows): `196`
- Number of variables (columns): `7`

Columns in the dataset:
- `experience` - The player's self-reported Minecraft experience level (Beginner, Amateur, Pro, Regular, or Veteran)
- `subscribe` - Whether the player subscribed to a game-related newsletter or other communications (True or False)
- `hashedEmail` - An anonymized player ID (a hash of the player's email address)
- `played_hours` - Total number of hours the player has spent playing the game
- `name` - The player's display name
- `gender` - The player's reported gender
- `Age` - The player's age in years

### Methods & Results

In [10]:
# load the tidyverse and read the players data from GitHub
library(tidyverse)
url <- "https://raw.githubusercontent.com/audryleine-isidro/dsci-final-project/refs/heads/main/players.csv"
players <- read_csv(url)

# inspect the structure of the data
glimpse(players)
head(players)

Rows: 196 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): experience, hashedEmail, name, gender
dbl (2): played_hours, Age
lgl (1): subscribe

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


Rows: 196
Columns: 7
$ experience   <chr> "Pro", "Veteran", "Veteran", "Amateur", "Regular", "Amate…
$ subscribe    <lgl> TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, T…
$ hashedEmail  <chr> "f6daba428a5e19a3d47574858c13550499be23603422e6a0ee9728f8…
$ played_hours <dbl> 30.3, 3.8, 0.0, 0.7, 0.1, 0.0, 0.0, 0.0, 0.1, 0.0, 1.6, 0…
$ name         <chr> "Morgan", "Christian", "Blake", "Flora", "Kylie", "Adrian…
$ gender       <chr> "Male", "Male", "Male", "Female", "Male", "Female", "Fema…
$ Age          <dbl> 9, 17, 17, 21, 21, 17, 19, 21, 47, 22, 23, 17, 25, 22, 17…


| experience | subscribe | hashedEmail | played_hours | name | gender | Age |
|---|---|---|---|---|---|---|
| `<chr>` | `<lgl>` | `<chr>` | `<dbl>` | `<chr>` | `<chr>` | `<dbl>` |
| Pro | True | f6daba428a5e19a3d47574858c13550499be23603422e6a0ee9728f8b53e192d | 30.3 | Morgan | Male | 9 |
| Veteran | True | f3c813577c458ba0dfef80996f8f32c93b6e8af1fa939732842f2312358a88e9 | 3.8 | Christian | Male | 17 |
| Veteran | False | b674dd7ee0d24096d1c019615ce4d12b20fcbff12d79d3c5a9d2118eb7ccbb28 | 0.0 | Blake | Male | 17 |
| Amateur | True | 23fe711e0e3b77f1da7aa221ab1192afe21648d47d2b4fa7a5a659ff443a0eb5 | 0.7 | Flora | Female | 21 |
| Regular | True | 7dc01f10bf20671ecfccdac23812b1b415acd42c2147cb0af4d48fcce2420f3e | 0.1 | Kylie | Male | 21 |
| Amateur | True | f58aad5996a435f16b0284a3b267f973f9af99e7a89bee0430055a44fa92f977 | 0.0 | Adrian | Female | 17 |


In [11]:
# wrangle and clean data

# select the relevant variables, convert experience to a factor, and remove rows containing missing values
players <- players |>
    select(experience, played_hours, Age) |>
    mutate(experience = as_factor(experience)) |>
    drop_na()

players

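| experience | played_hours | Age |
|---|---|---|
| `<fct>` | `<dbl>` | `<dbl>` |
| Pro | 30.3 | 9 |
| Veteran | 3.8 | 17 |
| Veteran | 0.0 | 17 |
| Amateur | 0.7 | 21 |
| Regular | 0.1 | 21 |
| Amateur | 0.0 | 17 |
| Regular | 0.0 | 19 |
| Amateur | 0.0 | 21 |
| Amateur | 0.1 | 47 |
| Veteran | 0.0 | 22 |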


In [3]:
# summary of data set
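
One possible shape for this summary step is a per-level count together with the mean of each predictor. The sketch below is only illustrative: `players_sample` and `experience_summary` are hypothetical names, and the data frame holds just the ten wrangled rows displayed above rather than the full data set, so the numbers it produces should not be read as results.

```r
library(dplyr)

# Only the ten rows displayed in the wrangled output above (an assumption made
# to keep the sketch self-contained); the real analysis would use `players`.
players_sample <- tibble(
  experience = factor(c("Pro", "Veteran", "Veteran", "Amateur", "Regular",
                        "Amateur", "Regular", "Amateur", "Amateur", "Veteran")),
  played_hours = c(30.3, 3.8, 0.0, 0.7, 0.1, 0.0, 0.0, 0.0, 0.1, 0.0),
  Age = c(9, 17, 17, 21, 21, 17, 19, 21, 47, 22)
)

# Number of players and mean of each predictor within every experience level
experience_summary <- players_sample |>
  group_by(experience) |>
  summarise(
    n_players = n(),
    mean_played_hours = mean(played_hours),
    mean_age = mean(Age)
  )

experience_summary
```

Reporting the group sizes alongside the means makes class imbalance visible early, which matters when `experience` is later used as the label to predict.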

In [4]:
# visualization
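
An exploratory plot for this question could place the two predictors on the axes and colour the points by experience level. The following is a minimal sketch under stated assumptions: `players_sample` and `experience_plot` are hypothetical names, the data are only the ten wrangled rows displayed above, and the figure number in the title is a placeholder.

```r
library(ggplot2)

# Only the ten rows displayed in the wrangled output above (an assumption made
# to keep the sketch self-contained); the real plot would use `players`.
players_sample <- data.frame(
  experience = factor(c("Pro", "Veteran", "Veteran", "Amateur", "Regular",
                        "Amateur", "Regular", "Amateur", "Amateur", "Veteran")),
  played_hours = c(30.3, 3.8, 0.0, 0.7, 0.1, 0.0, 0.0, 0.0, 0.1, 0.0),
  Age = c(9, 17, 17, 21, 21, 17, 19, 21, 47, 22)
)

# Scatter plot of the two predictors, with the colour legend showing experience
experience_plot <- ggplot(players_sample,
                          aes(x = Age, y = played_hours, colour = experience)) +
  geom_point(size = 3, alpha = 0.7) +
  labs(
    title = "Figure 1: Total played hours versus age, by experience level",
    x = "Age (years)",
    y = "Total played hours",
    colour = "Experience level"
  ) +
  theme(text = element_text(size = 14))

experience_plot
```

Because most players in the displayed rows have close to zero played hours, a log-scaled y-axis may be worth trying on the full data, with care taken over the zero values.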

In [5]:
# perform data analysis
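
The report does not yet say which classifier will be used for this step. As one hedged possibility, the sketch below assumes K-nearest neighbours through `knn()` from the `class` package; the value `k = 3`, the hypothetical `new_player`, and the name `players_sample` are all illustrative choices, and the data are only the ten wrangled rows displayed above.

```r
library(class)  # provides knn(); a recommended package shipped with most R installs

set.seed(2024)  # knn() breaks voting ties at random

# Only the ten rows displayed in the wrangled output above (an assumption made
# to keep the sketch self-contained); the real analysis would use `players`.
players_sample <- data.frame(
  experience = factor(c("Pro", "Veteran", "Veteran", "Amateur", "Regular",
                        "Amateur", "Regular", "Amateur", "Amateur", "Veteran")),
  played_hours = c(30.3, 3.8, 0.0, 0.7, 0.1, 0.0, 0.0, 0.0, 0.1, 0.0),
  Age = c(9, 17, 17, 21, 21, 17, 19, 21, 47, 22)
)

# Standardize both predictors so neither dominates the distance calculation
predictors <- scale(players_sample[, c("played_hours", "Age")])

# Hypothetical new player (illustrative values, not taken from the data set),
# scaled with the centre and spread of the training predictors
new_player <- data.frame(played_hours = 1.0, Age = 20)
new_player_scaled <- scale(
  new_player,
  center = attr(predictors, "scaled:center"),
  scale = attr(predictors, "scaled:scale")
)

# Predict the experience level from the three nearest training players
predicted_experience <- knn(
  train = predictors,
  test = new_player_scaled,
  cl = players_sample$experience,
  k = 3
)

predicted_experience
```

In a full analysis the data would first be split into training and testing sets, and `k` would be chosen by cross-validation rather than fixed in advance.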

In [6]:
# visualization of data analysis
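
If the analysis turns out to be a K-nearest neighbours classifier (an assumption, since the report does not name a method), one natural figure is estimated accuracy against the number of neighbours. The sketch below uses leave-one-out cross-validation via `knn.cv()` from the `class` package on only the ten wrangled rows displayed above; the names `players_sample`, `accuracy_by_k`, and `accuracy_plot`, the range of `k`, and the figure number are placeholders.

```r
library(class)    # provides knn.cv() for leave-one-out cross-validation
library(ggplot2)

set.seed(2024)  # knn.cv() breaks voting ties at random

# Only the ten rows displayed in the wrangled output above (an assumption made
# to keep the sketch self-contained); the real analysis would use `players`.
players_sample <- data.frame(
  experience = factor(c("Pro", "Veteran", "Veteran", "Amateur", "Regular",
                        "Amateur", "Regular", "Amateur", "Amateur", "Veteran")),
  played_hours = c(30.3, 3.8, 0.0, 0.7, 0.1, 0.0, 0.0, 0.0, 0.1, 0.0),
  Age = c(9, 17, 17, 21, 21, 17, 19, 21, 47, 22)
)

# Standardize both predictors before computing distances
predictors <- scale(players_sample[, c("played_hours", "Age")])

# Leave-one-out accuracy for each candidate number of neighbours
k_values <- 1:5
loo_accuracy <- sapply(k_values, function(k) {
  predicted <- knn.cv(train = predictors, cl = players_sample$experience, k = k)
  mean(as.character(predicted) == as.character(players_sample$experience))
})
accuracy_by_k <- data.frame(k = k_values, accuracy = loo_accuracy)

# Line plot of estimated accuracy against the number of neighbours
accuracy_plot <- ggplot(accuracy_by_k, aes(x = k, y = accuracy)) +
  geom_line() +
  geom_point(size = 3) +
  labs(
    title = "Figure 2: Leave-one-out accuracy by number of neighbours",
    x = "Number of neighbours (k)",
    y = "Estimated accuracy"
  ) +
  theme(text = element_text(size = 14))

accuracy_plot
```

With so few rows the curve is only a demonstration of the mechanics; on the full data set the same plot would indicate which `k` to carry forward to the final model.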

### Discussion
