In [1]:
library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors


Players: Data Description

Players dataset:
- This dataset is a .csv file that describes the attributes of individual players who have logged into the Plaicraft server. There are 7 total variables:
| Variable       | Type      | Description                                                                                                                | Potential Issues / Notes  |
|----------------|-----------|----------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------|
| **Experience** | Character | Categorizes player comfort level with the Minecraft system: “Pro”, “Veteran”, “Regular”, “Amateur”, or “Beginner”.         | Self-reported, so subjective and prone to overreporting.  |
| **Subscribe**  | Logical   | Indicates whether the player subscribed to the game-related newsletter (`TRUE` = subscribed, `FALSE` = not subscribed).    | Binary variable. |
| **Hashed Email** | Character | Hashed (one-way encoded) version of the player’s email address, used for anonymity and privacy.                          | Used as a unique identifier only; not meaningful for analysis.                           |
| **Player Hours** | Double  | Total number of hours the player has spent on the server, as tracked by the server logs. Values below 0.1 hr (6 minutes) are recorded as 0.0. | Playtime under 6 minutes is recorded as 0.0, which undercounts short play and may bias the playtime distribution toward zero. |
| **Name**       | Character | Player’s self-reported name.                                                                                               | Non-unique; may contain duplicates, pseudonyms, or missing data.                         |
| **Gender**     | Character | Self-reported gender: “Male”, “Female”, “Agender”, “Two-Spirited”, “Non-Binary”, or “Prefer not to say.”                   | Voluntary reporting may lead to missing or inconsistent entries.                         |
| **Age**        | Double    | Self-reported player age (in years).                                                                                       | May include outliers      |
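Some of the variable types in the table above need tidying before analysis. Below is a minimal base-R sketch that rests on a few assumptions. The CSV column names `experience` and `played_hours` are assumed, so check them with `names()` after loading. The level order Beginner < Amateur < Regular < Veteran < Pro is an assumed ranking of comfort level and is not documented in the dataset. `tidy_players()` is a hypothetical helper name.

```r
# Hypothetical helper: tidy the players table described above.
# Assumed column names: experience (character), played_hours (double).
tidy_players <- function(players) {
  # Assumed ordering of experience levels, least to most comfortable.
  # Listing Beginner first makes it the baseline level in later models;
  # any label not in this list becomes NA, exposing spelling inconsistencies.
  experience_levels <- c("Beginner", "Amateur", "Regular", "Veteran", "Pro")
  players$experience <- factor(players$experience, levels = experience_levels)

  # Flag players whose playtime was recorded as exactly 0.0 hours
  # (playtime under 6 minutes is recorded as 0.0).
  players$zero_hours <- players$played_hours == 0
  players
}

# Toy example (illustrative values, not the real Plaicraft data)
toy <- data.frame(
  experience   = c("Pro", "Beginner", "Regular", "Amateur"),
  played_hours = c(30.3, 0.0, 0.1, 0.0)
)
tidy_toy <- tidy_players(toy)
table(tidy_toy$zero_hours)
```

On the real data this might be called as `players <- tidy_players(read.csv("players.csv"))`, assuming the file sits in the working directory.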



There are 196 observations in this dataset, one per player, each recording the experience, newsletter subscription, hashed email, player hours, name, gender, and age of a player. The mean playtime is 5.85 hours but the median is only 0.1 hours, so playtime is strongly right-skewed. The mean player age is 21.13 years, with a median of 19 years.
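The summary figures above can be recomputed with a short helper. This is a sketch that assumes columns named `played_hours` and `Age`. `summarise_players()` is a hypothetical name. The toy data frame is made up purely to show how a single heavy player drags the mean far above the median.

```r
# Hypothetical helper: compute the summary statistics quoted above.
# Assumed columns: played_hours and Age. Missing values are dropped,
# so the age statistics describe only players who reported an age.
summarise_players <- function(players) {
  c(
    n_players    = nrow(players),
    mean_hours   = mean(players$played_hours, na.rm = TRUE),
    median_hours = median(players$played_hours, na.rm = TRUE),
    mean_age     = mean(players$Age, na.rm = TRUE),
    median_age   = median(players$Age, na.rm = TRUE)
  )
}

# Toy data (illustrative only): one 50-hour player pulls the mean
# far above the median, mirroring the skew seen in the real data.
toy <- data.frame(
  played_hours = c(0.0, 0.1, 0.1, 0.2, 50.0),
  Age          = c(17, 19, 19, 22, NA)
)
stats <- summarise_players(toy)
round(stats, 2)
```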


Sessions: Data Description 

- This dataset is a .csv file generated from the server logs. It records the start and end time of every play session by all registered players. There are 5 variables.

| Variable              | Type       | Description                                                                                                                   | Potential Issues / Notes                                                                 |
|------------------------|------------|-------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------|
| **Hashed Email**       | Character  | Hashed player email, used as a unique identifier to link each play session to a player in `players.csv`.                     | Serves only as an identifier; not interpretable for analysis.                            |
| **Start Time**         | Date-time (stored as text) | The date and time when the player began a play session (e.g., "30/06/2024 18:12").                            | Stored as a day/month/year string; must be parsed into a standard POSIX date-time before analysis. |
| **End Time**           | Date-time (stored as text) | The date and time when the player ended a play session (e.g., "30/06/2024 18:24").                            | Same format as Start Time. May contain sessions that are cut off, overlapping, or incorrectly logged. |
| **Original Start Time**| Numeric    | Unix timestamp (milliseconds since 1970-01-01 00:00 UTC) giving the original session start time before formatting (e.g., 1.71977E+12). | Requires conversion to a standard date-time; mainly for technical tracking. The values are rounded (1.71977E+12 ms is only precise to about 10^7 ms, roughly 2.8 hours), which hides differences between sessions, so this column is not used here. |
| **Original End Time**  | Numeric    | Unix timestamp (milliseconds since 1970-01-01 00:00 UTC) giving the original session end time.                               | Same as Original Start Time: requires conversion, is rounded, and is not used here.      |
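Both time formats in the table above can be converted in base R. The sketch below assumes UTC because the logs do not state a time zone. `parse_session_time()` and `from_epoch_ms()` are hypothetical helper names. With lubridate (loaded as part of tidyverse), `dmy_hm()` performs the same day/month/year parse.

```r
# Parse "dd/mm/yyyy HH:MM" strings (Start Time / End Time) into POSIXct.
# Assumption: times are in UTC; the source does not state a time zone.
parse_session_time <- function(x) {
  as.POSIXct(x, format = "%d/%m/%Y %H:%M", tz = "UTC")
}

# Convert Unix epoch milliseconds (Original Start/End Time) to POSIXct.
from_epoch_ms <- function(ms) {
  as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC")
}

start <- parse_session_time("30/06/2024 18:12")
end   <- parse_session_time("30/06/2024 18:24")

# Session length in minutes
duration_min <- as.numeric(difftime(end, start, units = "mins"))
duration_min

# A rounded value such as 1.71977e12 ms still converts,
# but is only precise to the nearest 1e7 ms (about 2.8 hours).
from_epoch_ms(1.71977e12)
```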


The broad question I will address is Question 2: We would like to know which "kinds" of players are most likely to contribute a large amount of data so that we can target those players in our recruiting efforts.

Specifically, I will explore whether age and experience (predictor variables) can predict the number of hours a player spends (response variable) on the Minecraft server. To do this, I will primarily use the players.csv file, focusing on the variables "Experience", "Player Hours", and "Age". Because Experience is categorical, it will enter the models as a categorical predictor rather than as a number. I will fit a separate single-predictor regression for each predictor, plus a multiple regression that uses both predictors together.
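Below is a minimal sketch of the three planned models. It assumes ordinary least-squares linear regression via base R's `lm()`. The plan above does not fix the regression method, so another model, such as k-nearest-neighbours regression, could be swapped in. It also assumes the column names `played_hours`, `Age`, and `experience`, and `fit_playtime_models()` is a hypothetical helper. Because Experience is categorical, `lm()` encodes it as indicator (dummy) variables relative to a baseline level.

```r
# Hypothetical helper: fit the three planned regressions.
# Assumed columns: played_hours (response), Age, experience (categorical).
fit_playtime_models <- function(players) {
  # factor() orders levels alphabetically, so "Amateur" becomes the
  # baseline unless the levels are set explicitly beforehand.
  players$experience <- factor(players$experience)
  # Keep only complete rows so all three models use the same players.
  keep <- complete.cases(players[, c("played_hours", "Age", "experience")])
  players <- players[keep, ]
  list(
    age        = lm(played_hours ~ Age, data = players),
    experience = lm(played_hours ~ experience, data = players),
    both       = lm(played_hours ~ Age + experience, data = players)
  )
}

# Toy data (randomly generated, for illustration only)
set.seed(1)
toy <- data.frame(
  Age          = sample(12:40, 60, replace = TRUE),
  experience   = rep(c("Beginner", "Amateur", "Regular", "Veteran", "Pro"), each = 12),
  played_hours = rexp(60, rate = 0.2)
)
models <- fit_playtime_models(toy)

# Adjusted R^2 for each model, as one rough basis for comparison
sapply(models, function(m) summary(m)$adj.r.squared)
```

Adjusted R² across the three fits, or better, prediction error on held-out data, gives one rough indication of whether combining Age and Experience improves on either predictor alone. Because playtime has a mean of 5.85 hours but a median of only 0.1 hours, a handful of heavy players may strongly influence least-squares fits.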

This question addresses the broad question by telling the researchers whether age, experience with Minecraft, or both combined can predict in-game playtime. If they can, the researchers could focus their recruiting efforts on the groups of players likely to contribute the most data, making data collection more efficient.

