# Project Final Report 

In [None]:
library(tidyverse)
library(repr)
library(tidymodels)
options(repr.matrix.rows = 6)

In [None]:
players_data <- read_csv("https://raw.githubusercontent.com/amberer60s/DSCI-100---Group-Project/refs/heads/main/players%20(1).csv?token=GHSAT0AAAAAADALMD353KYDXAOM4WUQ2ZNCZ7MSY7Q")
print(players_data)

## Introduction ##

In the world of gaming, developers and companies want to keep existing players engaged and attract new ones. One way to do this is to identify which players are most likely to play a lot, since this shows where to focus marketing and recruitment efforts. The more time a player spends in the game, the better the developers can understand how to improve it.

A big question for game developers is whether certain types of players are more likely to play for longer periods. 

For our project, we tried to answer the question:

**"Can a player’s experience and age predict how much time they will spend playing the game?"**

Our aim is to see whether a player's experience level and age are related to how much they play. This could help game developers understand which experience levels and age groups are more likely to be active players.

#### **players.csv**
This dataset contains 196 player records, each described by seven variables covering the player's characteristics and behavior.

| Column Name | Data Type | Description |
|--------------|----------|-------------|
| `experience` | character (chr) | Player's experience level (`Beginner`, `Amateur`, `Regular`, `Pro`, or `Veteran`). |
| `subscribe` | logical (lgl) | Indicates whether the player is a subscriber to the server (`TRUE` or `FALSE`). |
| `hashedEmail` | character (chr) | Hashed representation of the player's email. |
| `played_hours` | double (dbl) | Total hours the player has played. |
| `name` | character (chr) | Player's name. |
| `gender` | character (chr) | Player's gender (e.g., Male, Female, Non-binary, etc.). |
| `Age` | double (dbl) | Player's age (years). |

For our exploration in this project, we focus mainly on the columns **experience**, **Age**, and **played_hours**. We define their roles as follows:
- **Response Variable**: What we want to predict. In this case, the response variable is **played_hours**, which represents the total time a player spends playing the game.
- **Explanatory Variables**: What we use to predict the response variable. For our project, the explanatory variables are **experience** and **Age**, as we want to see whether a player's experience level and age can help predict how many hours they will play.
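
As a minimal sketch of how these roles could be written down in R, the snippet below keeps only the three columns of interest and expresses the response/explanatory split as a model formula. The `toy_players` tibble and its values are invented for illustration and are not rows from `players.csv`; the formula is only one possible way to state the relationship, not a fitted model.

```r
library(tidyverse)

# Hypothetical toy rows standing in for players.csv (values are made up)
toy_players <- tibble(
  experience   = c("Pro", "Veteran", "Regular", "Amateur"),
  subscribe    = c(TRUE, TRUE, FALSE, TRUE),
  played_hours = c(30.3, 3.8, 0.0, 0.7),
  Age          = c(9, 17, 21, 22)
)

# Keep only the response and the two explanatory variables
toy_selected <- toy_players |>
  select(experience, Age, played_hours)

# Response on the left of ~, explanatory variables on the right
players_formula <- played_hours ~ experience + Age

# List the variables named in the formula, response first
all.vars(players_formula)
```

Writing the formula out early makes it explicit which column is being predicted and which columns do the predicting.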

## Exploratory Analysis and Visualization

In [None]:
summary(players_data)

In [None]:
# Count the missing values (NA) in each column
summary_players <- players_data |>
  summarize(across(everything(), ~ sum(is.na(.))))
summary_players

In [None]:
num_obs <- nrow(players_data)
players_data |>
  group_by(experience) |>
  summarize(
    count = n(),
    percentage = n() / num_obs * 100
  )

# Report the number of non-missing observations in each column
num_observations <- players_data |>
  summarise(across(everything(), ~sum(!is.na(.))))
num_observations

In [None]:
# Encode experience as a number and drop rows that contain any missing value
players_data <- players_data |>
  mutate(experience_numeric = recode(experience,
                                     "Beginner" = 1,
                                     "Regular" = 2,
                                     "Amateur" = 3,
                                     "Pro" = 4,
                                     "Veteran" = 5)) |>
  drop_na()
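
The numeric encoding above can also be sketched with an ordered factor, which keeps the level order in one place. This is an illustrative alternative, not the method used in the report: `toy_players` and its values are hypothetical, and `experience_levels` simply copies the ordering implied by the codes in the `recode()` call, which is itself an assumption about how the levels rank.

```r
library(tidyverse)

# Hypothetical toy rows; the notebook itself works on players_data
toy_players <- tibble(
  experience = c("Pro", "Beginner", "Veteran", "Amateur", "Regular"),
  Age        = c(21, 17, NA, 19, 25)
)

# Assumed level order, copied from the numeric codes in the recode() call
experience_levels <- c("Beginner", "Regular", "Amateur", "Pro", "Veteran")

toy_encoded <- toy_players |>
  mutate(
    # An ordered factor stores each level's position in experience_levels
    experience = factor(experience, levels = experience_levels, ordered = TRUE),
    # as.integer() on a factor returns that position (1 to 5)
    experience_numeric = as.integer(experience)
  ) |>
  drop_na()  # removes every row with a missing value in any column

toy_encoded
```

Note that `drop_na()` with no arguments removes a row if any column is missing, so the `Veteran` row with an `NA` age is dropped here.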

In [None]:
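# Scatter plot of hours played against age, colored by experience level
players_graph <- players_data |>
  ggplot(aes(x = Age, y = played_hours, color = experience)) +
  geom_point(alpha = 0.7) +
  labs(x = "Age (years)", y = "Hours played", color = "Experience level")

players_graph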
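
If a few players turn out to have far more hours than the rest, the points in a plot like this can bunch up near zero. One possible variant, sketched below on invented data, plots `log1p(played_hours)` instead of the raw hours. The `toy_players` values and the `log_hours` column name are hypothetical, and whether the transformation helps depends on the actual distribution in `players_data`.

```r
library(tidyverse)

# Hypothetical toy rows (values are made up for illustration)
toy_players <- tibble(
  experience   = c("Pro", "Veteran", "Regular", "Amateur", "Beginner", "Amateur"),
  played_hours = c(30.3, 3.8, 0, 0.7, 150, 0.1),
  Age          = c(9, 17, 21, 22, 17, 23)
)

toy_graph <- toy_players |>
  # log1p(x) = log(1 + x), so players with 0 hours stay at 0
  mutate(log_hours = log1p(played_hours)) |>
  ggplot(aes(x = Age, y = log_hours, color = experience)) +
  geom_point(alpha = 0.7) +
  labs(x = "Age (years)", y = "log(1 + hours played)", color = "Experience level")

toy_graph
```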