In [30]:
#Load Packages
library(tidyverse)
library(repr)
library(tidymodels)
library(cowplot)

In [31]:
#Read in Players and Sessions after files were uploaded to GitHub and pulled into local repo
players <- read_csv("players.csv")
sessions <- read_csv("sessions.csv")

Rows: 196 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): experience, hashedEmail, name, gender
dbl (2): played_hours, Age
lgl (1): subscribe

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 1535 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): hashedEmail, start_time, end_time
dbl (2): original_start_time, original_end_time

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


In [32]:
# Tidy players: convert experience and gender to factors for categorical analysis
players_tidy <- players |>
    mutate(experience = as_factor(experience), gender = as_factor(gender))
# Tidy sessions: parse start_time and end_time as date-times and add session length in minutes
# (units are set explicitly because `end_time - start_time` picks its own units)
sessions_tidy <- sessions |>
    mutate(start_time = dmy_hm(start_time), end_time = dmy_hm(end_time)) |>
    mutate(session_length_mins = as.numeric(difftime(end_time, start_time, units = "mins")))
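
Subtracting two date-times directly returns a `difftime` whose units R chooses automatically (seconds, minutes, hours, or days), so calling `as.numeric()` on that result is not guaranteed to give minutes. The sketch below shows the difference on two made-up timestamps; they are not rows from `sessions.csv`, and the only assumption carried over from the data is a day-month-year layout that `dmy_hm()` can parse.

```r
library(lubridate)

# Hypothetical timestamps (invented for illustration)
start_time <- dmy_hm("30/06/2024 18:12")
end_time   <- dmy_hm("30/06/2024 20:42")

# Automatic units: a gap of a few hours is reported in hours, not minutes
auto_length <- as.numeric(end_time - start_time)

# Explicit units: always minutes, whatever the size of the gap
session_length_mins <- as.numeric(difftime(end_time, start_time, units = "mins"))

auto_length
session_length_mins
```

On a whole column the automatic unit depends on the smallest difference present, so a single very short session could switch every value to seconds.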

In [33]:
#Combine data sets
combined_tidy <- inner_join(players_tidy, sessions_tidy)

Joining with `by = join_by(hashedEmail)`

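`inner_join()` keeps only players who have at least one row in `sessions`, so any player with no recorded sessions drops out of `combined_tidy`. A small sketch of how that could be checked with `anti_join()` follows. The tibbles are toy stand-ins with invented values; only the `hashedEmail` key name comes from the data.

```r
library(dplyr)

# Toy stand-ins for players_tidy and sessions_tidy (values are made up)
players_toy  <- tibble(hashedEmail = c("a1", "b2", "c3"),
                       experience  = c("Pro", "Amateur", "Regular"))
sessions_toy <- tibble(hashedEmail = c("a1", "a1", "c3"),
                       session_length_mins = c(30, 45, 10))

# Naming the key explicitly avoids the "Joining with" message
combined_toy <- inner_join(players_toy, sessions_toy, by = "hashedEmail")

# Players with no matching session rows
no_sessions <- anti_join(players_toy, sessions_toy, by = "hashedEmail")

nrow(combined_toy)
no_sessions$hashedEmail
```

Because each player appears once per session after the join, per-player columns such as `played_hours` and `Age` are weighted by session count in any summary of the combined data.
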

In [35]:
#Look at player characteristics with summary function
player_summary <- summary(players_tidy)
player_summary

    experience subscribe       hashedEmail         played_hours    
 Pro     :14   Mode :logical   Length:196         Min.   :  0.000  
 Veteran :48   FALSE:52        Class :character   1st Qu.:  0.000  
 Amateur :63   TRUE :144       Mode  :character   Median :  0.100  
 Regular :36                                      Mean   :  5.846  
 Beginner:35                                      3rd Qu.:  0.600  
                                                  Max.   :223.100  
                                                                   
     name                         gender         Age       
 Length:196         Male             :124   Min.   : 9.00  
 Class :character   Female           : 37   1st Qu.:17.00  
 Mode  :character   Non-binary       : 15   Median :19.00  
                    Prefer not to say: 11   Mean   :21.14  
                    Agender          :  2   3rd Qu.:22.75  
                    Two-Spirited     :  6   Max.   :58.00  
                    Other            :  1  

### Player Characteristics
|Total Players|Experience Level|Players|Gender|Players|Subscribed|Players|Age Statistic|Years|Played Hours Statistic|Hours|
|-------------|----------------|-------|------|-------|----------|-------|-------------|-----|----------------------|-----|
|196|Pro|14|Male|124|Yes|144|Minimum|9|Minimum|0.00|
| |Veteran|48|Female|37|No|52|Median|19|Median|0.10|
| |Amateur|63|Non-binary|15| | |Mean|21.14|Mean|5.85|
| |Regular|36|Prefer not to say|11| | |Maximum|58|Maximum|223.10|
| |Beginner|35|Agender|2| | | | | | |
| | | |Two-Spirited|6| | | | | | |
| | | |Other|1| | | | | | |
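
The counts in this table were copied by hand from the `summary()` output. A minimal sketch of producing the same kind of per-level counts directly with `count()` is shown below on a toy tibble with invented rows. The `experience` column name is from the data; the `players` and `share` column names are my own choices for illustration.

```r
library(dplyr)

# Toy stand-in for players_tidy; the rows are made up for illustration
players_toy <- tibble(
    experience = factor(c("Pro", "Veteran", "Amateur", "Amateur", "Veteran", "Amateur"))
)

# One row per experience level, with the number of players and their share
experience_counts <- players_toy |>
    count(experience, name = "players") |>
    mutate(share = round(players / sum(players), 2))

experience_counts
```

The same pattern should work for `gender` and `subscribe`, which would avoid transcription slips in the hand-typed table.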

In [36]:
combined_summary <- summary(combined_tidy)
combined_summary

# Create a markdown table, using the pipe character (|) to separate columns
# Put variable descriptions in the table so they do not count toward the word limit

    experience  subscribe       hashedEmail         played_hours   
 Pro     : 39   Mode :logical   Length:1535        Min.   :  0.00  
 Veteran : 51   FALSE:103       Class :character   1st Qu.: 32.00  
 Amateur :820   TRUE :1432      Mode  :character   Median : 56.10  
 Regular :519                                      Mean   : 98.57  
 Beginner:106                                      3rd Qu.:178.20  
                                                   Max.   :223.10  
                                                                   
     name                         gender          Age       
 Length:1535        Male             :1015   Min.   : 9.00  
 Class :character   Female           : 382   1st Qu.:17.00  
 Mode  :character   Non-binary       : 104   Median :18.00  
                    Prefer not to say:  19   Mean   :19.43  
                    Agender          :  10   3rd Qu.:23.00  
                    Two-Spirited     :   4   Max.   :58.00  
                    Other            :   1  

### Combined Data - Player Analysis by Session
|Total Sessions|Experience Level|Sessions Played|Gender|Sessions Played|
|--------------|----------------|---------------|------|---------------|
|1535|Pro|39|Male|1015|
| |Veteran|51|Female|382|
| |Amateur|820|Non-binary|104|
| |Regular|519|Prefer not to say|19|
| |Beginner|106|Agender|10|
| | | |Two-Spirited|4|
| | | |Other|1|




1. Descriptive summary
    - Number of observations
    - Summary statistics to 2 decimals (e.g. with `round()`)
    - Number of variables, their types, and what they mean
    - Any issues you see with the data or how it was collected
2. Question
    - Pick a broad question and make a specific question from it
    - Use one response variable and one or more predictor variables in the form of a question
    - Describe how you plan to wrangle the data to answer that question
3. Exploratory Analysis
    - Load in the data
    - Do the minimum wrangling necessary for tidy data
    - Report the mean value of each quantitative variable in players.csv in a table format
    - Make some exploratory visualizations that may be relevant to your question
4. Method and plan
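
The "mean value of each quantitative variable" step in item 3, together with the 2-decimal rounding from item 1, could be sketched as follows. This is one possible approach, not necessarily the intended one. The tibble is a toy stand-in: the column names `played_hours` and `Age` match players.csv, but the values are made up, and `na.rm = TRUE` is included only in case the real columns contain missing values.

```r
library(dplyr)

# Toy stand-in for players.csv (invented values, real column names)
players_toy <- tibble(played_hours = c(0, 0.1, 0.6, 30.3),
                      Age          = c(17, 19, 22, NA))

# Mean of every numeric column, rounded to 2 decimals, as a one-row table
mean_table <- players_toy |>
    summarise(across(where(is.numeric), \(x) round(mean(x, na.rm = TRUE), 2)))

mean_table
```

Using `across(where(is.numeric), ...)` means the table picks up any numeric column automatically, so the call would not need editing if more quantitative variables were added.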