# insert title

## Introduction

A research group in the computer science faculty at UBC is working to advance AI technology using data on how people play video games. The group set up a Minecraft server and is recording and studying player gameplay, with the goal of collecting over 10,000 hours of data. The objective of the research is to use these gameplay patterns to train AGI-like (artificial general intelligence) agents. Part of the project involves recruiting players, and an efficient strategy is needed to recruit those who will contribute many hours of gameplay. This project therefore focuses on the question: can we predict which players are likely to contribute the most hours of play from their age and their experience with the game?

There are two datasets, ```players.csv``` and ```sessions.csv```, that we will use to answer the question.

```players.csv``` lists every unique player. It contains the answers to the survey each player fills out at the start, their total playtime on the server, and other identifying information. We will use it to determine which kinds of players contribute the most hours. Its nine variables are:
- ```experience```: the player's self-reported Minecraft experience level (beginner, amateur, regular, pro, or veteran)
- ```subscribe```: whether or not the player wants updates on the project
- ```hashedEmail```: the email address the player provided to identify them (hashed for privacy)
- ```played_hours```: number of hours the player has played in total on the server
- ```name```: the name the player selected to play with
- ```gender```: the gender of the player (self-reported)
- ```age```: the age of the player in years (self-reported)
- ```individualId``` and ```organizationName```: unused columns


```sessions.csv``` has data about each play session on the server: who played, when the session took place (date and time), and how long it lasted. Its five variables are:
- ```hashedEmail```: hashed email of the player, the same variable as in ```players.csv```
- ```start_time```: start date (dd/mm/yyyy) and time (24-hour clock) of the session
- ```end_time```: end date (dd/mm/yyyy) and time (24-hour clock) of the session
- ```original_start_time```: start of the session as a Unix timestamp (the values, around 1.7e12, are consistent with milliseconds since the epoch rather than seconds)
- ```original_end_time```: end of the session as a Unix timestamp, in the same units as ```original_start_time```
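The two time representations described above can be reconciled in a few lines of base R. The sketch below parses one start/end pair in the dd/mm/yyyy format, computes the session length, and converts a Unix timestamp to a date-time. It rests on two assumptions that the data description does not confirm: UTC is used purely for illustration, and the `original_*` values are treated as milliseconds because of their magnitude. The variable names are illustrative.

```r
# Date and time strings in the format used by sessions.csv (dd/mm/yyyy, 24-hour clock)
start_string <- "23/05/2024 00:22"
end_string   <- "23/05/2024 01:07"

# Assumption: UTC is used only for illustration; the server's time zone is not stated
start_time <- as.POSIXct(start_string, format = "%d/%m/%Y %H:%M", tz = "UTC")
end_time   <- as.POSIXct(end_string,   format = "%d/%m/%Y %H:%M", tz = "UTC")

# Session length in minutes
session_minutes <- as.numeric(difftime(end_time, start_time, units = "mins"))

# Assumption: original_start_time is in milliseconds, so divide by 1000 to get seconds
original_start_time <- 1.71642e+12
start_from_unix <- as.POSIXct(original_start_time / 1000,
                              origin = "1970-01-01", tz = "UTC")

print(session_minutes)
print(start_from_unix)
```

Because the printed `original_*` values are rounded to six significant digits, a timestamp recovered this way is only approximate; the string columns are the better source for session lengths.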


## Methods & Results

We will use k-nearest neighbors (KNN) regression to answer our question. Because KNN relies on distances between numeric predictors, we encode the ```experience``` variable as follows: 1 - Beginner, 2 - Amateur, 3 - Regular, 4 - Pro, and 5 - Veteran.
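To make the encoding and the idea behind KNN regression concrete, here is a minimal base-R sketch. It is not the analysis pipeline itself: the toy data frame is hypothetical (not taken from ```players.csv```), the helper `knn_predict()` is written here for illustration, and the choice of k = 3 is arbitrary. A modelling package would normally handle the scaling and the tuning of k.

```r
# Ordered experience levels, matching the 1-5 encoding described above
experience_levels <- c("Beginner", "Amateur", "Regular", "Pro", "Veteran")

# Hypothetical toy data, for illustration only
toy_players <- data.frame(
  experience   = c("Beginner", "Amateur", "Regular", "Pro", "Veteran", "Regular"),
  age          = c(17, 21, 20, 25, 30, 19),
  played_hours = c(0.5, 1.0, 1.5, 0.4, 2.0, 5.6)
)

# Encode each category as its position in experience_levels
toy_players$experience_num <- as.integer(
  factor(toy_players$experience, levels = experience_levels)
)

# Standardize the predictors so that age does not dominate the distance
predictors <- scale(toy_players[, c("age", "experience_num")])

# Illustrative helper: predict the response for one new observation as the
# mean response of its k nearest neighbours (Euclidean distance)
knn_predict <- function(train_x, train_y, new_x, k) {
  distances <- sqrt(rowSums(sweep(train_x, 2, new_x)^2))
  nearest <- order(distances)[seq_len(k)]
  mean(train_y[nearest])
}

# New player: age 20, Regular (3), scaled with the training centre and spread
new_player <- (c(age = 20, experience_num = 3) -
                 attr(predictors, "scaled:center")) /
  attr(predictors, "scaled:scale")

prediction <- knn_predict(predictors, toy_players$played_hours, new_player, k = 3)
print(prediction)
```

Treating the 1-5 codes as numbers assumes the experience levels are evenly spaced, which is a modelling choice rather than a property of the data.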

```r
# load libraries
library(tidyverse) # loading for analysis
library(RColorBrewer) # loading for colorblind-friendly graphs
options(repr.matrix.max.rows = 5) # limit the number of rows shown in notebook output
```


```
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
```


```r
# download the datasets
players_url <- 'https://drive.google.com/uc?export=download&id=1Mw9vW0hjTJwRWx0bDXiSpYsO3gKogaPz'
sessions_url <- 'https://drive.google.com/uc?export=download&id=14O91N5OlVkvdGxXNJUj5jIsV5RexhzbB'
download.file(players_url, 'players.csv')
download.file(sessions_url, 'sessions.csv')

players <- read_csv('players.csv') # one row per player
sessions <- read_csv('sessions.csv') |> # one row per play session
    separate(start_time, into = c('start_date', 'start_time'), sep = ' ') |> # split date and time into separate columns
    separate(end_time, into = c('end_date', 'end_time'), sep = ' ')

# combine the datasets: one row per session, with that player's data attached
playdata <- merge(players, sessions, by = 'hashedEmail')

set.seed(100) # fix the seed so later random steps are reproducible

head(playdata)
```

```
Rows: 196 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): experience, hashedEmail, name, gender
dbl (2): played_hours, age
lgl (3): subscribe, individualId, organizationName

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 1535 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): hashedEmail, start_time, end_time
dbl (2): original_start_time, original_end_time

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
```

| | hashedEmail | experience | subscribe | played_hours | name | gender | age | individualId | organizationName | start_date | start_time | end_date | end_time | original_start_time | original_end_time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | &lt;chr&gt; | &lt;chr&gt; | &lt;lgl&gt; | &lt;dbl&gt; | &lt;chr&gt; | &lt;chr&gt; | &lt;dbl&gt; | &lt;lgl&gt; | &lt;lgl&gt; | &lt;chr&gt; | &lt;chr&gt; | &lt;chr&gt; | &lt;chr&gt; | &lt;dbl&gt; | &lt;dbl&gt; |
| 1 | 0088b5e134c3f0498a18c7ea6b8d77b4b0ff1636fc93355ccc95b45423367832 | Regular | TRUE | 1.5 | Isaac | Male | 20 | NA | NA | 23/05/2024 | 00:22 | 23/05/2024 | 01:07 | 1.71642e+12 | 1.71643e+12 |
| 2 | 0088b5e134c3f0498a18c7ea6b8d77b4b0ff1636fc93355ccc95b45423367832 | Regular | TRUE | 1.5 | Isaac | Male | 20 | NA | NA | 22/05/2024 | 23:12 | 23/05/2024 | 00:13 | 1.71642e+12 | 1.71642e+12 |
| 3 | 060aca80f8cfbf1c91553a72f4d5ec8034764b05ab59fe8e1cf0eee9a7b67967 | Pro | FALSE | 0.4 | Lyra | Male | 21 | NA | NA | 28/06/2024 | 04:28 | 28/06/2024 | 04:58 | 1.71955e+12 | 1.71955e+12 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 5 | 0d4d71be33e2bc7266ee4983002bd930f69d304288a8663529c875f40f1750f3 | Regular | TRUE | 5.6 | Winslow | Male | 17 | NA | NA | 30/08/2024 | 03:40 | 30/08/2024 | 04:04 | 1.72499e+12 | 1.72499e+12 |
| 6 | 0d4d71be33e2bc7266ee4983002bd930f69d304288a8663529c875f40f1750f3 | Regular | TRUE | 5.6 | Winslow | Male | 17 | NA | NA | 27/08/2024 | 19:18 | 27/08/2024 | 19:52 | 1.72479e+12 | 1.72479e+12 |
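One property of the `merge()` call above is worth keeping in mind: with its default arguments it performs an inner join, so a player who has no recorded sessions does not appear in the combined data, and each remaining player is repeated once per session. The sketch below illustrates this on hypothetical toy data (the short IDs and values are made up for the example) and shows how `all.x = TRUE` would keep every player instead.

```r
# Hypothetical toy data: three players, one of whom ("b2") has no sessions
toy_players <- data.frame(
  hashedEmail = c("a1", "b2", "c3"),
  age         = c(20, 21, 17)
)
toy_sessions <- data.frame(
  hashedEmail = c("a1", "a1", "c3"),
  start_date  = c("23/05/2024", "22/05/2024", "30/08/2024")
)

# Default merge(): inner join, so player "b2" is dropped
inner_joined <- merge(toy_players, toy_sessions, by = "hashedEmail")

# all.x = TRUE: left join, so "b2" is kept with NA in the session columns
left_joined <- merge(toy_players, toy_sessions, by = "hashedEmail", all.x = TRUE)

print(nrow(inner_joined))
print(nrow(left_joined))
```

Which variant is appropriate depends on whether players without sessions should count toward the analysis; since ```played_hours``` already lives in ```players.csv```, a per-player question may not need the joined table at all.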


## Discussion

stuff

## References

if needed