## Predicting Players' Subscription Status Using KNN Classification Modelling

**Introduction**

A research group in Computer Science at UBC has collected data from a Minecraft server, with the goal of predicting usage of a video game research server. The broad question explored in this study is: what player characteristics and behaviors are most predictive of subscribing to a game-related newsletter?

The specific question formulated is: Can age, gender, experience level, and hours played predict whether a player subscribes to the newsletter? The response variable is whether the player subscribes to the newsletter, represented by the subscribe column (a binary categorical variable). The explanatory variables are age, gender, experience level, and hours played.



To analyze this, we will be looking at a dataset that includes the following columns:

- `experience`: the player's experience level or rank (categorical)
- `played_hours`: number of hours spent playing the game (numerical)
- `gender`: the player's gender identity (categorical)
- `Age`: the player's age in years (numerical)
- `subscribe`: whether the player subscribed to a game-related newsletter (binary categorical)
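
As a rough illustration of how the variable types listed above might be represented in R, the sketch below builds a tiny data frame by hand. Every value in it, including the experience and gender labels, is invented for illustration and is not taken from the actual dataset; `toy_players` is a hypothetical name.

```r
# Hypothetical rows: all values are invented for illustration only.
toy_players <- data.frame(
  experience   = factor(c("Amateur", "Pro", "Veteran")),    # categorical
  played_hours = c(1.5, 30.2, 0.0),                         # numerical
  gender       = factor(c("Male", "Female", "Non-binary")), # categorical
  Age          = c(17, 21, 34),                             # numerical
  subscribe    = factor(c(TRUE, TRUE, FALSE))               # binary categorical
)

# Show the class of each column: factors for categorical, numeric otherwise
sapply(toy_players, class)
```

Storing `subscribe` as a factor matters because classification models in tidymodels expect a factor response.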


In [None]:
library(tidyverse)
library(repr)
library(tidymodels)
options(repr.matrix.max.rows = 6)
source("cleanup.R")

In [None]:
player_data <- read_csv("https://raw.githubusercontent.com/Cna-51/minecraft_indiv/refs/heads/main/players%20(1).csv") |>
    select(-hashedEmail, -name, -experience) |>
    filter(played_hours > 0) |>
    mutate(subscribe = as.factor(subscribe)) |>
    drop_na()
player_data

In [None]:
player_plot <- player_data |>
    ggplot(aes(x = Age, y = played_hours, colour = subscribe)) +
    geom_point() +
    labs(x = "Player's Age (yrs)", y = "Played hours (hrs)", colour = "Subscribed", title = "Player's Age vs Played Hours")
player_plot

In [None]:
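set.seed(1234)  # set the seed before the random split so it is reproducible
# 70% training / 30% testing (the original `0.7-0.3` evaluates to 0.4)
player_split <- initial_split(player_data, prop = 0.7, strata = subscribe)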
player_training <- training(player_split)
player_testing <- testing(player_split)
player_training
player_testing

In [None]:
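set.seed(1234)
# Standardize both predictors so neither dominates the distance calculation
player_recipe <- recipe(subscribe ~ played_hours + Age, data = player_training) |>
    step_scale(all_predictors()) |>
    step_center(all_predictors())
# K-nearest neighbours classifier with K = 3 and unweighted (rectangular) votes
player_spec <- nearest_neighbor(weight_func = "rectangular", neighbors = 3) |>
    set_engine("kknn") |>
    set_mode("classification")
# Fit the preprocessing and the model together on the training set
player_fit <- workflow() |>
    add_recipe(player_recipe) |>
    add_model(player_spec) |>
    fit(data = player_training)
# Predict on the held-out test set and attach the true labels
player_predictions <- predict(player_fit, player_testing) |>
    bind_cols(player_testing)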
prediction_accuracy <- player_predictions |>
    metrics(truth = subscribe, estimate = .pred_class)
prediction_accuracy
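
The workflow above standardizes the two predictors and then classifies each test player by an unweighted vote among its 3 nearest training neighbours. The base-R sketch below shows that idea on a handful of made-up points. It is only an illustration of the general technique, not the `kknn` implementation, and `toy_train`, `toy_new`, and `knn_predict` are hypothetical names introduced here.

```r
# Hypothetical toy data: values are made up for illustration only.
toy_train <- data.frame(
  Age          = c(17, 19, 21, 25, 30, 45),
  played_hours = c(30.0, 12.5, 8.0, 0.5, 1.0, 0.2),
  subscribe    = factor(c("TRUE", "TRUE", "TRUE", "FALSE", "FALSE", "FALSE"))
)
toy_new <- data.frame(Age = 20, played_hours = 10.0)

knn_predict <- function(train, new_obs, predictors, label, k = 3) {
  # Standardize with the training means and standard deviations only,
  # so the new observation is placed on the same scale as the training data
  mu <- sapply(train[predictors], mean)
  sigma <- sapply(train[predictors], sd)
  train_z <- scale(train[predictors], center = mu, scale = sigma)
  new_z <- scale(new_obs[predictors], center = mu, scale = sigma)

  # Euclidean distance from the new observation to every training row
  dists <- sqrt(rowSums(sweep(train_z, 2, as.numeric(new_z[1, ]))^2))

  # Take the k closest rows and return the most common label among them
  nearest <- order(dists)[seq_len(k)]
  votes <- table(train[[label]][nearest])
  names(votes)[which.max(votes)]
}

toy_prediction <- knn_predict(toy_train, toy_new,
                              predictors = c("Age", "played_hours"),
                              label = "subscribe", k = 3)
toy_prediction
```

Without the standardization step, the predictor with the larger numeric range would dominate the distance, which is why the recipe scales and centers both `Age` and `played_hours` before fitting.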