## Introduction

In [None]:
library(tidyverse)
library(broom)




### Preliminary Results
In this section, we will demonstrate that we can read the data from the web and wrangle it into a tidy format. We will also address our primary question with plots and tables.
#### Reading the Data

In [None]:
steam_games <- read_csv('https://raw.githubusercontent.com/DanielZCode/STAT-301-Project/main/games-features-edit.csv') # read the CSV directly from the URL
head(steam_games) # preview the data
nrow(steam_games) # 12624 data points (games)
colnames(steam_games) # variable names
ncol(steam_games) # number of variables: 19


### Wrangling 
To use the `ReleaseDate` variable in our analysis, we need to convert it from its current character format to a numerical value. Although it would be convenient to convert this variable to the year in which the game was released, the result would not be a truly continuous variable, since years are discrete (especially in the case of Steam games, where the available data is fairly recent and spans few distinct years). Hence, we will convert `ReleaseDate` to a continuous variable representing the number of days elapsed since January 1st, 1970. This is a common representation of time in computing, and it is how R stores `Date` objects internally.
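As a minimal sketch of this representation, the snippet below converts a few dates to day counts using base R only. The example dates are illustrative and are not taken from the data set; they use the ISO `YYYY-MM-DD` format so that no `format` argument is needed.

```r
# Illustrative dates (not from the Steam data set), in ISO format.
example_dates <- as.Date(c("1970-01-01", "1970-01-02", "2000-11-01"))

# Date objects are stored as the number of days since 1970-01-01,
# so as.numeric() exposes that count directly.
days_since_1970 <- as.numeric(example_dates)
days_since_1970
```

The first two values are 0 and 1, which confirms that the count starts at January 1st, 1970 and increases by one per day.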

In addition, we will remove rows with missing values and, in particular, keep only games whose `Metacritic` score is greater than 0, because a score of 0 indicates a lack of reviews and an unplayed game. We will also keep only games that were released after January 1st, 1970.
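The effect of these filters can be sketched on a small toy table. The column names mirror the data set, but `toy_games` and its values are invented for illustration. Note that `drop_na()` called without arguments removes a row if any of its columns is missing.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: values are invented, not taken from the Steam data set.
toy_games <- tibble(
  Metacritic  = c(80, 0, 65, NA),
  ReleaseDate = c(15000, 16000, -10, 14000)  # days since 1970-01-01
)

toy_clean <- toy_games %>%
  drop_na() %>%                            # drops row 4 (missing score)
  filter(Metacritic > 0, ReleaseDate > 0)  # drops row 2 (score of 0) and row 3 (before 1970)

toy_clean
```

Only the first row survives all three conditions, so it is worth comparing `nrow()` before and after filtering on the real data to see how many games are removed.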


In [None]:
steam_games$ReleaseDate <- steam_games$ReleaseDate %>% as.Date(format="%b%d%Y") %>%  as.numeric() #convert dates to days since 1970 (default internal representation) 


steam_games <- steam_games %>% drop_na() %>% filter(Metacritic > 0, ReleaseDate > 0) # filter for more than 0 days passed since 1970, metacritic score > 0


head(steam_games)
nrow(steam_games)



### Exploratory Analysis
In this section, we will compute some summary estimates of our variables. In addition, we will plot visualizations of our response variable, `Metacritic`, and its relationship to other variables.

First, let us see the distribution of `Metacritic` scores.

In [None]:
options(repr.plot.width = 7, repr.plot.height = 5)

steam_games_metacritic_dist <- steam_games %>% ggplot() +
    geom_histogram(aes(x = Metacritic), bins = 35, color = '#FF00FF') +
    ggtitle('Distribution of Metacritic Scores') +
    labs(x = 'Metacritic Score', y = 'Count')

steam_games_metacritic_dist
    

Next, let us plot the relationship between Metacritic score and the other continuous variables.

In [None]:
steam_games_metacritic_releasedate_plot <- steam_games %>% ggplot() +
    geom_point(aes(x = ReleaseDate, y = Metacritic), size = 0.8, alpha = 0.7) +
    ggtitle('Relationship Between Metacritic Score and Release Date of a Game') +
    labs(y = 'Metacritic Score', x = 'Release Date (Number of Days passed since 1970)')
steam_games_metacritic_releasedate_plot

**Figure 1. Metacritic Score vs. Release Date**

In [None]:
steam_games_metacritic_price_plot <- steam_games %>% ggplot() +
    geom_point(aes(x = PriceInitial, y = Metacritic),  size = 0.8, alpha = 0.7) +
    ggtitle('Relationship Between Price of a Game and Metacritic Score') +
    labs(y = 'Metacritic Score', x = 'Price of a Game ($USD)')
steam_games_metacritic_price_plot

**Figure 2. Metacritic Score vs. Price**

Below, we compute the means of several variables in the filtered data.


In [None]:
steam_games_summarized <- steam_games %>% summarize(ReleaseDate_mean = mean(ReleaseDate), 
                                                    ReleaseDate_mean_Date = format(as.Date(ReleaseDate_mean, origin ='1970-01-01'), '%B %d, %Y'),
                                                    Metacritic_mean = mean(Metacritic), 
                                                    RecommendationCount_mean = mean(RecommendationCount),
                                                    PriceInitial_mean = mean(PriceInitial))
steam_games_summarized

**Figure 3. Means of release date, Metacritic score, recommendation count, and initial price**

As we can see, the average Metacritic score of a Steam game in the filtered data is approximately **72**, and the average release date falls in **November 2012**. In addition, these games have received **4637** recommendations on average and have a mean initial price of **$14.86**.
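The average release date above is obtained by averaging day counts and mapping the result back to a calendar date. A minimal sketch of that step, using hypothetical day counts rather than values from the data set:

```r
# Hypothetical release dates expressed as days since 1970-01-01.
release_days <- c(15000, 15500, 16000)

# Average the day counts, then map the mean back to a calendar date.
mean_days <- mean(release_days)
mean_date <- as.Date(mean_days, origin = "1970-01-01")

# A numeric format string avoids locale-dependent month names.
format(mean_date, "%Y-%m-%d")
```

When the mean is not a whole number, the `Date` object keeps the fractional part internally, but `format()` prints only the calendar day.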