# Data Science 100 Project

## Introduction
### Background
Video games are a popular way for people to play and connect with others. Game makers and researchers often use newsletters to share updates, events, or news with players. But not every player signs up for these newsletters. If we can find out which players are more likely to subscribe, we can better understand what kinds of players are more interested and involved.

In this project, we look at real data from a Minecraft research server. The data includes player information and how they behave in the game. We want to find out which player features and behaviors are most useful in predicting whether someone will subscribe to the newsletter. This can help game teams and researchers plan better ways to reach the right players.
### Link to GitHub
https://github.com/90419359/data-science-project
### Question 1 (Selected Project Question)
Can a player’s gender predict whether they will subscribe to a game-related newsletter, and does this pattern differ across experience levels?

This project explores whether a player's gender is associated with their likelihood of subscribing. The response variable is subscribe (TRUE or FALSE), and the explanatory variable is gender. Using data from players.csv, we compare subscription rates across gender groups, overall and within each experience level. The goal is to build a simple model to test whether gender helps predict subscription behavior.
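One possible simple model is a logistic regression fitted with base R's glm(), sketched below under assumptions: the data frame toy_players and its values are invented for illustration, and the project may settle on a different classifier. With a single categorical predictor, the fitted probabilities equal the observed subscription rate in each gender group, so this model mainly formalizes the rate comparison.

```r
# Hypothetical toy data; the real rows come from players.csv
toy_players <- data.frame(
  gender_simple = c("Male", "Male", "Male", "Male",
                    "Female", "Female", "Female",
                    "Other", "Other", "Other"),
  subscribe = c(TRUE, TRUE, TRUE, FALSE,
                TRUE, FALSE, FALSE,
                TRUE, TRUE, FALSE)
)

# Logistic regression: the logical response is treated as 0/1
model <- glm(subscribe ~ gender_simple, data = toy_players, family = binomial)

# Predicted probability of subscribing for each gender group
new_players <- data.frame(gender_simple = c("Female", "Male", "Other"))
predicted <- predict(model, newdata = new_players, type = "response")
predicted
```

If gender carried no information, these three probabilities would be close to one another; summary(model) reports whether the group differences are distinguishable from noise.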
### Data Description
#### 1. players.csv
Each row in this dataset represents an individual player. The columns include:

- experience: Self-reported gaming experience, categorized as Beginner, Amateur, Regular, Veteran, or Pro.
- subscribe: Whether the player subscribed to the server's content or notifications (TRUE or FALSE).
- hashedEmail: A pseudonymized identifier for each player.
- played_hours: Total number of hours the player has played on the server.
- name: The player's first name.
- gender: Gender identity (e.g., Male, Female, Non-binary).
- age: The player's self-reported age (integer).
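To answer the second half of Question 1, experience can be stored as an ordered factor so that tables and plots follow skill order rather than alphabetical order. This sketch assumes the levels run from least to most experienced in the order listed above; toy_players is a made-up stand-in for Player_data.

```r
library(dplyr)

# Hypothetical stand-in for Player_data; values are made up
toy_players <- tibble(
  experience = c("Pro", "Beginner", "Veteran", "Beginner", "Regular", "Amateur"),
  gender = c("Male", "Female", "Male", "Non-binary", "Female", "Male"),
  subscribe = c(FALSE, TRUE, TRUE, TRUE, FALSE, TRUE)
)

# Assumed order, least to most experienced, as listed in the data description
experience_levels <- c("Beginner", "Amateur", "Regular", "Veteran", "Pro")

toy_players <- toy_players |>
  mutate(experience = factor(experience, levels = experience_levels, ordered = TRUE))

# Subscription rate (share of TRUE) within each experience level and gender
rates_by_experience <- toy_players |>
  group_by(experience, gender) |>
  summarize(n_players = n(), subscribe_rate = mean(subscribe), .groups = "drop")
rates_by_experience
```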

#### 2. sessions.csv
Each row represents one gameplay session and includes:

- hashedEmail: The pseudonymized player identifier; not used in this project.
- start_time: The human-readable start time of the session.
- end_time: The human-readable end time of the session.
- original_start_time: Start time in Unix timestamp format.
- original_end_time: End time in Unix timestamp format.

These fields allow for the analysis of session length, activity patterns, and player engagement over time.
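As a hedged sketch of computing session length, the snippet below converts the two Unix timestamp columns to date-times and takes their difference. It assumes the timestamps count seconds since 1970-01-01 UTC; if the stored values are in milliseconds, divide them by 1000 first. The table toy_sessions is hypothetical.

```r
library(dplyr)

# Hypothetical sessions; timestamps assumed to be Unix seconds
toy_sessions <- tibble(
  original_start_time = c(1700000000, 1700003600),
  original_end_time   = c(1700001800, 1700010800)
)

toy_sessions <- toy_sessions |>
  mutate(
    start = as.POSIXct(original_start_time, origin = "1970-01-01", tz = "UTC"),
    end   = as.POSIXct(original_end_time, origin = "1970-01-01", tz = "UTC"),
    # Session length in minutes
    duration_min = as.numeric(difftime(end, start, units = "mins"))
  )
toy_sessions$duration_min
```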

In [None]:
library(tidyverse)

In [None]:
# Load the data

# URLs of the raw CSV files in the project repository
player_url <- "https://raw.githubusercontent.com/90419359/data-science-project/refs/heads/main/players.csv"
session_url <- "https://raw.githubusercontent.com/90419359/data-science-project/refs/heads/main/sessions.csv"
# Download the files into the working directory
download.file(player_url, destfile = "players.csv")
download.file(session_url, destfile = "sessions.csv")
# Read the files into tibbles
Player_data <- read_csv("players.csv")
Sessions_data <- read_csv("sessions.csv")

In [None]:
Player_data

In [None]:
Sessions_data

In [None]:
# Clean the data and compute summaries

In [None]:
# Collapse gender into three groups: Male, Female, and Other (all remaining values)
Player_data <- Player_data |>
  mutate(gender_simple = ifelse(
    gender == "Male", "Male",
    ifelse(gender == "Female", "Female", "Other")
  ))
Player_data
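A quick sanity check on the recoding is to cross-tabulate the original and collapsed columns, so every original value can be seen next to the group it landed in. The toy data below are made up. Note that ifelse() returns NA (not "Other") when gender is missing, so any NA rows stay NA in gender_simple.

```r
library(dplyr)

# Hypothetical stand-in for Player_data
toy_players <- tibble(
  gender = c("Male", "Female", "Non-binary", "Male", NA)
)

toy_players <- toy_players |>
  mutate(gender_simple = ifelse(
    gender == "Male", "Male",
    ifelse(gender == "Female", "Female", "Other")
  ))

# One row per (original value, collapsed group) pair, with its count n
recode_check <- toy_players |> count(gender, gender_simple)
recode_check
```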

In [None]:
# Count players in each combination of gender group and subscription status
gender_subscribe <- Player_data |>
  group_by(gender_simple, subscribe) |>
  summarize(count = n(), .groups = "drop")
gender_subscribe
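Because the gender groups differ in size, raw counts are hard to compare; the subscription rate within each group is more informative. A minimal sketch, using made-up counts in a hypothetical toy_counts table shaped like gender_subscribe:

```r
library(dplyr)

# Hypothetical counts with the same columns as gender_subscribe
toy_counts <- tibble(
  gender_simple = c("Female", "Female", "Male", "Male", "Other", "Other"),
  subscribe = c(FALSE, TRUE, FALSE, TRUE, FALSE, TRUE),
  count = c(10, 30, 40, 60, 5, 15)
)

# Within each gender group, divide each count by the group total,
# then keep only the subscribed rows (the subscription rate)
toy_rates <- toy_counts |>
  group_by(gender_simple) |>
  mutate(proportion = count / sum(count)) |>
  ungroup() |>
  filter(subscribe)
toy_rates
```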

In [None]:
# Visualize subscription counts for each gender group, starting with female players
gender_subscribe_female_bar <- gender_subscribe |>
  filter(gender_simple == "Female") |>
  ggplot(aes(x = subscribe, y = count)) +
  geom_col() +
  labs(x = "Subscription Status", y = "Number of Female Players", title = "Female Player Subscription Overview")
gender_subscribe_female_bar

In [None]:
gender_subscribe_male_bar <- gender_subscribe |>
  filter(gender_simple == "Male") |>
  ggplot(aes(x = subscribe, y = count)) +
  geom_col() +
  labs(x = "Subscription Status", y = "Number of Male Players", title = "Male Player Subscription Overview")
gender_subscribe_male_bar

In [None]:
gender_subscribe_other_bar <- gender_subscribe |>
  filter(gender_simple == "Other") |>
  ggplot(aes(x = subscribe, y = count)) +
  geom_col() +
  labs(x = "Subscription Status", y = "Number of Players (Other Genders)", title = "Other-Gender Player Subscription Overview")
gender_subscribe_other_bar
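The three separate charts could also be combined into one, with each bar rescaled to show the share of subscribers in its gender group. This is a sketch using ggplot2's position = "fill"; toy_counts is hypothetical data shaped like gender_subscribe, and the labels are illustrative.

```r
library(dplyr)
library(ggplot2)

# Hypothetical counts with the same columns as gender_subscribe
toy_counts <- tibble(
  gender_simple = c("Female", "Female", "Male", "Male", "Other", "Other"),
  subscribe = c(FALSE, TRUE, FALSE, TRUE, FALSE, TRUE),
  count = c(10, 30, 40, 60, 5, 15)
)

# position = "fill" stacks the bars and rescales each one to height 1,
# so the y axis shows proportions instead of raw counts
subscribe_by_gender_plot <- toy_counts |>
  ggplot(aes(x = gender_simple, y = count, fill = subscribe)) +
  geom_col(position = "fill") +
  labs(x = "Gender Group", y = "Proportion of Players", fill = "Subscribed",
       title = "Subscription Rate by Gender Group")
subscribe_by_gender_plot
```

Putting all groups on one proportional axis makes differences in subscription rate visible even when one group is much larger than the others.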