DSCI 100 Project Final Report

Introduction:

This project aims to predict usage of a video game research server by identifying which types of players are most likely to contribute large amounts of data. Specifically, it asks: "Can experience level predict the total number of hours a player is likely to play in a month?" The research is set in the context of a Minecraft server developed by the Pacific Laboratory for Artificial Intelligence (PLAI), a research group within the Department of Computer Science at the University of British Columbia. The team's research focuses on developing a model of embodied AI that can realistically interact with players in real time. To support this, the team has set up a Minecraft server with the goal of collecting 10,000 hours of multiplayer gameplay. The data will be used to train AGI-like agents capable of responding appropriately in video and audio perceptual environments. For the project to succeed, the team needs to target its recruitment efforts effectively, which is where this project's analysis comes in.

In [1]:
library(tidyverse)
library(repr)
library(tidymodels)
options(repr.matrix.max.rows = 9)
library(RColorBrewer)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
── Attaching packages ────────────────────────────────────── tidymodels 1.3.0 ──

✔ broom        1.0.9     ✔ rsample

In [2]:
url_players <- "https://drive.google.com/uc?export=download&id=1Mw9vW0hjTJwRWx0bDXiSpYsO3gKogaPz"
url_sessions <- "https://drive.google.com/uc?export=download&id=14O91N5OlVkvdGxXNJUj5jIsV5RexhzbB"

players_data <- read_csv(url_players)
players_data

sessions_data <- read_csv(url_sessions)
sessions_data

Rows: 196 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): experience, hashedEmail, name, gender
dbl (2): played_hours, age
lgl (3): subscribe, individualId, organizationName

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


experience,subscribe,hashedEmail,played_hours,name,gender,age,individualId,organizationName
<chr>,<lgl>,<chr>,<dbl>,<chr>,<chr>,<dbl>,<lgl>,<lgl>
Pro,TRUE,f6daba428a5e19a3d47574858c13550499be23603422e6a0ee9728f8b53e192d,30.3,Morgan,Male,9,,
Veteran,TRUE,f3c813577c458ba0dfef80996f8f32c93b6e8af1fa939732842f2312358a88e9,3.8,Christian,Male,17,,
Veteran,FALSE,b674dd7ee0d24096d1c019615ce4d12b20fcbff12d79d3c5a9d2118eb7ccbb28,0.0,Blake,Male,17,,
Amateur,TRUE,23fe711e0e3b77f1da7aa221ab1192afe21648d47d2b4fa7a5a659ff443a0eb5,0.7,Flora,Female,21,,
Regular,TRUE,7dc01f10bf20671ecfccdac23812b1b415acd42c2147cb0af4d48fcce2420f3e,0.1,Kylie,Male,21,,
⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮
Veteran,FALSE,71453e425f07d10da4fa2b349c83e73ccdf0fb3312f778b35c5802c3292c87bd,0.3,Pascal,Male,22,,
Amateur,FALSE,d572f391d452b76ea2d7e5e53a3d38bfd7499c7399db299bd4fedb06a46ad5bb,0.0,Dylan,Prefer not to say,17,,
Amateur,FALSE,f19e136ddde68f365afc860c725ccff54307dedd13968e896a9f890c40aea436,2.3,Harlow,Male,17,,
Pro,TRUE,d9473710057f7d42f36570f0be83817a4eea614029ff90cf50d8889cdd729d11,0.2,Ahmed,Other,91,,


Rows: 1535 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): hashedEmail, start_time, end_time
dbl (2): original_start_time, original_end_time

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


hashedEmail,start_time,end_time,original_start_time,original_end_time
<chr>,<chr>,<chr>,<dbl>,<dbl>
bfce39c89d6549f2bb94d8064d3ce69dc3d7e72b38f431d8aa0c4bf95ccee6bf,30/06/2024 18:12,30/06/2024 18:24,1.71977e+12,1.71977e+12
36d9cbb4c6bc0c1a6911436d2da0d09ec625e43e6552f575d4acc9cf487c4686,17/06/2024 23:33,17/06/2024 23:46,1.71867e+12,1.71867e+12
f8f5477f5a2e53616ae37421b1c660b971192bd8ff77e3398304c7ae42581fdc,25/07/2024 17:34,25/07/2024 17:57,1.72193e+12,1.72193e+12
bfce39c89d6549f2bb94d8064d3ce69dc3d7e72b38f431d8aa0c4bf95ccee6bf,25/07/2024 03:22,25/07/2024 03:58,1.72188e+12,1.72188e+12
36d9cbb4c6bc0c1a6911436d2da0d09ec625e43e6552f575d4acc9cf487c4686,25/05/2024 16:01,25/05/2024 16:12,1.71665e+12,1.71665e+12
⋮,⋮,⋮,⋮,⋮
7a4686586d290c67179275c7c3dfb4ea02f4d317d9ee0e2cee98baa27877a875,01/07/2024 04:08,01/07/2024 04:19,1.71981e+12,1.71981e+12
fd6563a4e0f6f4273580e5fedbd8dda64990447aea5a33cbb5e894a3867ca44d,28/07/2024 15:36,28/07/2024 15:57,1.72218e+12,1.72218e+12
fd6563a4e0f6f4273580e5fedbd8dda64990447aea5a33cbb5e894a3867ca44d,25/07/2024 06:15,25/07/2024 06:22,1.72189e+12,1.72189e+12
36d9cbb4c6bc0c1a6911436d2da0d09ec625e43e6552f575d4acc9cf487c4686,20/05/2024 02:26,20/05/2024 02:45,1.71617e+12,1.71617e+12


This project involves two datasets: one containing personal information about each player and another containing data about those players' individual play sessions. To address the research question, only the first (players) dataset will be used. As shown in the players table above, it contains 196 observations across 9 variables.
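As a rough illustration of how the two variables behind the research question, experience and played_hours, might be isolated and summarised, consider the sketch below. It is not the report's actual analysis: players_sample is a hypothetical five-row tibble copied from the first rows printed above, and hours_by_experience is an illustrative name. With the full data, players_data would take the place of the sample.

```r
library(dplyr)
library(tibble)

# Hypothetical sample: the first five rows of the players table shown above,
# keeping only the two columns relevant to the research question.
players_sample <- tribble(
  ~experience, ~played_hours,
  "Pro",       30.3,
  "Veteran",    3.8,
  "Veteran",    0.0,
  "Amateur",    0.7,
  "Regular",    0.1
)

# Treat experience as a categorical variable, then count players and
# average their played hours within each experience level.
hours_by_experience <- players_sample |>
  mutate(experience = factor(experience)) |>
  group_by(experience) |>
  summarise(
    n_players  = n(),
    mean_hours = mean(played_hours),
    .groups    = "drop"
  )

hours_by_experience
```

A five-row sample is far too small to support any conclusion; the point is only to show the shape of the summary.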

In [3]:
player_col_names <- colnames(players_data)
player_variables <- tibble(Variable_Name = player_col_names)
player_variables_types <- c("character", "logical", "character", "double", "character", "character", "double", "logical", "logical") 
player_variables_interp <- c("Gaming level of experience",
                             "Player's subscription status (TRUE = Subscribed, FALSE = Not Subscribed)",
                             "Hashed version of individual's email address",
                             "The number of hours the player has played",
                             "The name of the player",
                             "The gender of the player",
                             "The age of the player",
                             "A unique identifier for each player",
                             "Name of player's gaming organization")
player_variables_example <- c("Pro, Veteran, Amateur, Regular, Beginner",
                              "TRUE, FALSE",
                              "f6daba428a5e19a3d47574858c13550499be23603422e6a0ee9728f8b53e192d",
                              "30.3, 0, 1.6",
                              "Morgan, Christian, Blake, Flora",
                              "Male, Female, Non-binary, Prefer not to say, Two Spirited",
                              "9, 17, 21, 25",
                              "empty",
                              "empty")

player_variables <- player_variables |>
                        mutate(Player_Data_Type = player_variables_types) |>
                        mutate(Variable_Interpretation = player_variables_interp) |>
                        mutate(Variable_Examples = player_variables_example)
player_variables



Variable_Name,Player_Data_Type,Variable_Interpretation,Variable_Examples
<chr>,<chr>,<chr>,<chr>
experience,character,Gaming level of experience,"Pro, Veteran, Amateur, Regular, Beginner"
subscribe,logical,"Player's subscription status (TRUE = Subscribed, FALSE = Not Subscribed)","TRUE, FALSE"
hashedEmail,character,Hashed version of individual's email address,f6daba428a5e19a3d47574858c13550499be23603422e6a0ee9728f8b53e192d
played_hours,double,The number of hours the player has played,"30.3, 0, 1.6"
name,character,The name of the player,"Morgan, Christian, Blake, Flora"
gender,character,The gender of the player,"Male, Female, Non-binary, Prefer not to say, Two Spirited"
age,double,The age of the player,"9, 17, 21, 25"
individualId,logical,A unique identifier for each player,empty
organizationName,logical,Name of player's gaming organization,empty
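The data types in the table above were typed in by hand. As a possible alternative, they could be read off the data frame itself so that the table cannot drift out of sync with the data. The sketch below assumes a small hypothetical tibble, players_sample, standing in for players_data; variable_types is likewise an illustrative name.

```r
library(tibble)
library(purrr)

# Hypothetical stand-in for players_data, with a few of its columns.
players_sample <- tibble(
  experience   = c("Pro", "Veteran"),
  subscribe    = c(TRUE, TRUE),
  played_hours = c(30.3, 3.8),
  age          = c(9, 17)
)

# typeof() reports the storage type of each column ("character",
# "logical", "double"); map_chr() applies it to every column in turn.
variable_types <- tibble(
  Variable_Name    = colnames(players_sample),
  Player_Data_Type = unname(map_chr(players_sample, typeof))
)

variable_types
```

The interpretation and example columns would still have to be written by hand, since they describe meaning rather than structure.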
