In [1]:
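# Install formatR only if it is not already available, then load all packages quietly
if (!requireNamespace("formatR", quietly = TRUE)) {
    install.packages("formatR")
}

suppressWarnings(suppressPackageStartupMessages({
    library(formatR)
    library(tidyverse)
    library(repr)
    library(tidymodels)
    library(GGally)
}))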






In [2]:
options(repr.plot.width = 6, repr.plot.height = 4,
        repr.matrix.max.rows = 7, readr.show_col_types = FALSE)

player_data <- read_csv("data/players.csv")
session_data <- read_csv("data/sessions.csv")



# **Data Science Project: Planning Stage (Individual)**

### **(1) Data Description:**
Our sample was collected through volunteer sign-ups on the plaicraft.ai website, where participants submitted their email address and phone number to join the research project. Once registered, participants are granted access to the research server, and further data is collected by recording activity on the server itself.
 

In [3]:
# str(player_data)
# head(player_data, 1)
# options(repr.matrix.max.rows = 1000)
# distinct(player_data, Age)
# player_data




In [4]:
# Overall summary of the player variables, rounded to 2 decimal places
ply_summarised_num <- player_data |>
    summarise(
        subscribed_percent_decimal = mean(subscribe),  # logical: proportion of TRUE
        played_hours_avg = mean(played_hours),
        played_hours_median = median(played_hours),
        age_avg = mean(Age, na.rm = TRUE),
        age_median = median(Age, na.rm = TRUE)
    ) |>
    mutate(across(everything(), ~ round(.x, 2)))

# Number and proportion of players in each gender category
ply_summarised_gender <- player_data |>
    count(gender) |>
    mutate(percent_decimal = round(n / sum(n), 2), count = n)

# Number and proportion of players at each experience level
ply_summarised_experience <- player_data |>
    count(experience) |>
    mutate(percent_decimal = round(n / sum(n), 2), count = n)


#### **Player Data:**
Player information from the sign-up survey, along with each player's total play time on the server.

Observations: 196

Variables: 7

**Variables** in the player dataset:
> - `experience` <chr> player's experience level, one of `Beginner`, `Amateur`, `Regular`, `Veteran` or `Pro`.
> - `subscribe` <lgl> whether the player subscribed to the newsletter (`TRUE` or `FALSE`).
> - `hashedEmail` <chr> a unique hash of the player's email, used as the player ID.
> - `played_hours` <dbl> total hours the player spent on the research project server.
> - `name` <chr> player's first name.
> - `gender` <chr> player's gender, one of `Male`, `Female`, `Non-binary`, `Prefer not to say`, `Agender`, `Two-Spirited` or `Other`.
> - `Age` <dbl> player's age in years.





#### Summary of numeric variables:

In [5]:
ply_summarised_num

| subscribed_percent_decimal `<dbl>` | played_hours_avg `<dbl>` | played_hours_median `<dbl>` | age_avg `<dbl>` | age_median `<dbl>` |
|---|---|---|---|---|
| 0.73 | 5.85 | 0.1 | 21.14 | 19 |
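Two behaviours drive this summary: `mean()` of a logical column returns the proportion of `TRUE` values (so 0.73 means 73% of players subscribed), and a few very large values can pull the mean far above the median, which is consistent with `played_hours` having a mean of 5.85 but a median of only 0.1. The sketch below illustrates both on a small hypothetical tibble (`toy_players`), not on the project data.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data: four players, three of whom subscribed,
# and one player with far more hours than the rest
toy_players <- tibble(
  subscribe    = c(TRUE, TRUE, FALSE, TRUE),
  played_hours = c(0, 0.1, 0.2, 30)
)

toy_summary <- toy_players |>
  summarise(
    subscribed_percent_decimal = mean(subscribe),   # TRUE counts as 1, FALSE as 0
    played_hours_avg           = mean(played_hours),
    played_hours_median        = median(played_hours)
  )

toy_summary
```

Here the single 30-hour player raises the mean well above the median, which is why the median may be the more representative measure of typical play time.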


#### Summary of participants' gender:

In [6]:
ply_summarised_gender

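| gender `<chr>` | n `<int>` | percent_decimal `<dbl>` | count `<int>` |
|---|---|---|---|
| Agender | 2 | 0.01 | 2 |
| Female | 37 | 0.19 | 37 |
| Male | 124 | 0.63 | 124 |
| Non-binary | 15 | 0.08 | 15 |
| Other | 1 | 0.01 | 1 |
| Prefer not to say | 11 | 0.06 | 11 |
| Two-Spirited | 6 | 0.03 | 6 |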


#### Summary of participants' experience:

In [7]:
ply_summarised_experience

| experience `<chr>` | n `<int>` | percent_decimal `<dbl>` | count `<int>` |
|---|---|---|---|
| Amateur | 63 | 0.32 | 63 |
| Beginner | 35 | 0.18 | 35 |
| Pro | 14 | 0.07 | 14 |
| Regular | 36 | 0.18 | 36 |
| Veteran | 48 | 0.24 | 48 |
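`count()` sorts a character column alphabetically, so the experience table runs Amateur, Beginner, Pro, Regular, Veteran. A minimal sketch of one way to get a more natural order is to convert `experience` to a factor with explicit levels before counting. It assumes the levels form a ranking in the order given in the variable description (Beginner, Amateur, Regular, Veteran, Pro); the data does not confirm that ranking, and `toy_players` and `experience_levels` are hypothetical names.

```r
library(dplyr)
library(tibble)

# Assumed ranking, taken from the order in the variable description
experience_levels <- c("Beginner", "Amateur", "Regular", "Veteran", "Pro")

# Hypothetical toy data standing in for player_data
toy_players <- tibble(
  experience = c("Pro", "Amateur", "Beginner", "Amateur", "Veteran", "Regular")
)

experience_summary <- toy_players |>
  mutate(experience = factor(experience, levels = experience_levels)) |>
  count(experience) |>                       # rows now follow the factor level order
  mutate(percent_decimal = round(n / sum(n), 2))

experience_summary
```

A factor also keeps this order in later plots (for example, bar charts made with ggplot2), which an alphabetical character column would not.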


#### **Session Data:**
Per-session records for each player, with the start and end date and time of every session.

Observations: 1535

Variables: 5


Variables in the session dataset:
> - `hashedEmail` <chr> is a unique hash used as a data ID for the player.
> - `start_time` <chr> session start date and time in `DD/MM/YYYY HH:MM` format (24-hour clock).
> - `end_time` <chr> session end date and time in `DD/MM/YYYY HH:MM` format (24-hour clock).
> - `original_start_time` <chr> precise session start time, to the millisecond.
> - `original_end_time` <chr> precise session end time, to the millisecond.
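Because `start_time` and `end_time` are stored as text, they need to be parsed before session lengths can be computed. The sketch below assumes the `DD/MM/YYYY HH:MM` format described above and uses lubridate's `dmy_hm()`; the rows in `toy_sessions` are hypothetical, and `session_lengths` and `per_player` are illustrative names, not part of the project code.

```r
library(dplyr)
library(tibble)
library(lubridate)

# Hypothetical toy rows in the DD/MM/YYYY HH:MM format
toy_sessions <- tibble(
  hashedEmail = c("a1", "a1", "b2"),
  start_time  = c("30/06/2024 18:12", "01/07/2024 23:50", "25/05/2024 09:00"),
  end_time    = c("30/06/2024 18:24", "02/07/2024 00:20", "25/05/2024 10:30")
)

session_lengths <- toy_sessions |>
  mutate(
    start_dt     = dmy_hm(start_time),   # parse day/month/year hour:minute (UTC)
    end_dt       = dmy_hm(end_time),
    duration_min = as.numeric(difftime(end_dt, start_dt, units = "mins"))
  )

# Number of sessions and total minutes per player
per_player <- session_lengths |>
  group_by(hashedEmail) |>
  summarise(n_sessions = n(), total_min = sum(duration_min))

per_player
```

Since the date is part of each timestamp, a session that crosses midnight (the second row) still gets a positive 30-minute duration.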


> Summary statistics:
> - mean ...
>
> Potential data issues:
> - some ...


### **(2) Questions:**

### **(3) Exploratory Data Analysis and Visualization:**

### **(4) Methods and Plan:**