# When Was the Golden Age of Video Games?

## 1. Overview
Video games are big business: the global gaming market is projected to be worth more than $300 billion by 2027 according to [Mordor Intelligence](https://www.mordorintelligence.com/industry-reports/global-gaming-market). With so much money at stake, the major game publishers are hugely incentivized to create the next big hit. But are games getting better, or has the golden age of video games already passed?

## 2. Objective

In this project, we'll explore the video games created between 1977 and 2020. We'll compare a dataset on game sales with critic and user reviews to determine whether or not video games have improved as the gaming market has grown.

## 3. Data Collection

This [data file](https://www.kaggle.com/datasets/holmjason2/videogamedata) contains over 13,000 games released between 1977 and mid-2020. Most of the data came directly from the [VGChartz](https://www.vgchartz.com/) database, but some was entered manually from other sources. For example, many of the critic and user scores were taken from Metacritic.

<span style="font-size: x-large;">game_sales_data</span>

|column			|data type	|description				 		 			|
|---------------|-----------|-----------------------------------------------|
|Rank			|int		|Ranking based on total sales		 			|
|Name 			|varchar	|Name of the video game							|
|Platform		|varchar	|Platform on which the video game was released	|
|Publisher  	|varchar	|Publisher of the video game		 			|
|Developer		|varchar	|Developer of the video game		 			|
|Critic_Score	|float		|Critic score given to the video game	 		|
|User_Score		|float		|User score given to the video game	 			|
|Total_Shipped	|float		|Total units sold (in millions)					|
|Year			|int		|Year the video game was released		 		|


### 3.1. Import libraries

In [1]:
# Data manipulation
import pandas as pd
# Data visualization
import matplotlib.pyplot as plt
import seaborn as sns
# Database connection
import os
from dotenv import load_dotenv
from sqlalchemy import create_engine
from urllib.parse import quote_plus

### 3.2. Database Connection

In [2]:
load_dotenv()

# MySQL database connection using SQLAlchemy
username = os.getenv('MYSQL_ROOT_USER')
password = os.getenv('MYSQL_ROOT_PASSWORD')
host = "localhost"
port = "3306"
databasename = "PROJECT"

# URL-encode the password
encoded_password = quote_plus(password)

# Construct the connection string with the encoded password
# General form: dialect+driver://username:password@host:port/database
db_uri = f"mysql+pymysql://{username}:{encoded_password}@{host}:{port}/{databasename}"
# echo=False (the default) keeps SQLAlchemy from echoing SQL statements to the log
engine = create_engine(db_uri,echo=False)
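
The URL-encoding step matters because characters such as `@`, `:` and `/` have a special meaning inside a connection URI. The following is a minimal sketch of that idea using only the standard library; the helper `build_mysql_uri` and the sample password are hypothetical and are not part of the notebook.

```python
from urllib.parse import quote_plus, unquote_plus


def build_mysql_uri(username, password, host="localhost", port="3306", database="PROJECT"):
    """Build a SQLAlchemy-style MySQL URI, URL-encoding the credentials."""
    return (
        f"mysql+pymysql://{quote_plus(username)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Hypothetical password containing characters that are reserved in URLs
sample_password = "p@ss:w/rd"
encoded = quote_plus(sample_password)
print(encoded)  # p%40ss%3Aw%2Frd

uri = build_mysql_uri("root", sample_password)
print(uri)
```

Without the encoding, the `@` inside the password would be read as the separator between credentials and host. Note also that `os.getenv` returns `None` when a variable is not set, and `quote_plus(None)` raises a `TypeError`, so it may be worth checking the environment variables before building the URI.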

In [3]:
%load_ext sql
%sql engine
%config SqlMagic.displaylimit = 20

### 3.3. Data loading
First, load the dataset into a pandas DataFrame.


In [4]:
# Load datasets:
data = pd.read_csv('data/game_sales_data.csv', sep=',', encoding='latin1')
print('data shape:',data.shape)
print(data.head())

data shape: (19600, 9)
   Rank                              Name Platform         Publisher  \
0     1                        Wii Sports      Wii          Nintendo   
1     2                 Super Mario Bros.      NES          Nintendo   
2     3  Counter-Strike: Global Offensive       PC             Valve   
3     4                    Mario Kart Wii      Wii          Nintendo   
4     5     PLAYERUNKNOWN'S BATTLEGROUNDS       PC  PUBG Corporation   

           Developer  Critic_Score  User_Score  Total_Shipped  Year  
0       Nintendo EAD           7.7         8.0          82.90  2006  
1       Nintendo EAD          10.0         8.2          40.24  1985  
2  Valve Corporation           8.0         7.5          40.00  2012  
3       Nintendo EAD           8.2         9.1          37.32  2008  
4   PUBG Corporation           8.6         4.7          36.60  2017  


**OBS:** 
* With almost 20k rows, the dataset is small enough to load into the database directly, without specifying a chunk size.
* We need some preprocessing steps before inserting the data into the MySQL database.
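
For a much larger file, inserting in batches keeps memory use and transaction size bounded (pandas exposes this through the `chunksize` parameter of `to_sql`). Below is a rough sketch of the same idea with the standard library only. It uses an in-memory SQLite database as a stand-in for MySQL, and the helper `insert_in_chunks`, the two-column table and the sample rows are all hypothetical.

```python
import sqlite3
from itertools import islice


def insert_in_chunks(conn, rows, chunk_size):
    """Insert rows into game_sales in batches; return the number of batches."""
    iterator = iter(rows)
    batches = 0
    while True:
        # Take the next chunk_size rows (fewer for the final batch)
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            break
        conn.executemany("INSERT INTO game_sales (game, year) VALUES (?, ?)", chunk)
        conn.commit()
        batches += 1
    return batches


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE game_sales (game TEXT, year INTEGER)")

# Hypothetical rows, only to exercise the batching logic
sample_rows = [(f"Game {i}", 2000 + i % 20) for i in range(25)]
num_batches = insert_in_chunks(conn, sample_rows, chunk_size=10)
total = conn.execute("SELECT COUNT(*) FROM game_sales").fetchone()[0]
print(num_batches, total)  # 3 25
```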

In [5]:
# rename columns using single words and lowercase letters:
new_column_names = {
    'Rank': 'rank',
    'Name': 'game',
    'Platform': 'platform',
    'Publisher': 'publisher',
    'Developer': 'developer',
    'Critic_Score': 'critic_score',
    'User_Score': 'user_score',
    'Total_Shipped': 'games_sold',
    'Year': 'year'
}
data.rename(columns=new_column_names, inplace=True)
print(data.head())

   rank                              game platform         publisher  \
0     1                        Wii Sports      Wii          Nintendo   
1     2                 Super Mario Bros.      NES          Nintendo   
2     3  Counter-Strike: Global Offensive       PC             Valve   
3     4                    Mario Kart Wii      Wii          Nintendo   
4     5     PLAYERUNKNOWN'S BATTLEGROUNDS       PC  PUBG Corporation   

           developer  critic_score  user_score  games_sold  year  
0       Nintendo EAD           7.7         8.0       82.90  2006  
1       Nintendo EAD          10.0         8.2       40.24  1985  
2  Valve Corporation           8.0         7.5       40.00  2012  
3       Nintendo EAD           8.2         9.1       37.32  2008  
4   PUBG Corporation           8.6         4.7       36.60  2017  


In [6]:
# remove 'rank' column and reset index:
data.drop(columns=['rank'], inplace=True)
data.reset_index(drop=True, inplace=True)

Second, check whether the `game_sales` table already exists in the database, drop it if it does, and create it again in MySQL:

In [7]:
%%sql
-- Check whether the 'game_sales' table exists:
SHOW TABLES LIKE 'game_sales';

Tables_in_PROJECT (game_sales)


In [8]:
%%sql
-- Drop the 'game_sales' table if it exists:
DROP TABLE IF EXISTS game_sales;

In [9]:
%%sql
-- Create 'game_sales' table:
CREATE TABLE game_sales
(
	id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    game VARCHAR(140) NOT NULL,
    platform VARCHAR(6),
    publisher VARCHAR(40),
    developer VARCHAR(70),
    critic_score NUMERIC(4, 2),
    user_score NUMERIC(4, 2),
    games_sold NUMERIC(5, 2),
    year INT
);
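
Because the `VARCHAR` widths in the `CREATE TABLE` statement above are fixed, it can be useful to confirm that no string in the data exceeds them before inserting. This is a small sketch of such a check; the function `find_oversized_values` and the sample rows are hypothetical, and only the column limits are taken from the table definition.

```python
# Column limits copied from the CREATE TABLE statement
VARCHAR_LIMITS = {"game": 140, "platform": 6, "publisher": 40, "developer": 70}


def find_oversized_values(rows, limits):
    """Return (column, value) pairs whose string length exceeds the column limit."""
    oversized = []
    for row in rows:
        for column, limit in limits.items():
            value = row.get(column)
            # NULL-like values are skipped; they always fit
            if value is not None and len(str(value)) > limit:
                oversized.append((column, value))
    return oversized


# Hypothetical sample rows; the second platform value is deliberately too long
sample_rows = [
    {"game": "Wii Sports", "platform": "Wii", "publisher": "Nintendo", "developer": "Nintendo EAD"},
    {"game": "Example Game", "platform": "TooLongPlatform", "publisher": None, "developer": "Example Studio"},
]
problems = find_oversized_values(sample_rows, VARCHAR_LIMITS)
print(problems)  # [('platform', 'TooLongPlatform')]
```

With the real DataFrame, the same check could be run on `data.to_dict("records")`.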

Finally, populate the `game_sales` table from the pandas DataFrame.

In [10]:
data.to_sql(name = "game_sales",
            con = engine,
            if_exists = 'append',
            index= False)
engine.dispose()

In [11]:
%%sql
-- Check 'game_sales' table:
SELECT * FROM game_sales LIMIT 5;

id,game,platform,publisher,developer,critic_score,user_score,games_sold,year
1,Wii Sports,Wii,Nintendo,Nintendo EAD,7.7,8.0,82.9,2006
2,Super Mario Bros.,NES,Nintendo,Nintendo EAD,10.0,8.2,40.24,1985
3,Counter-Strike: Global Offensive,PC,Valve,Valve Corporation,8.0,7.5,40.0,2012
4,Mario Kart Wii,Wii,Nintendo,Nintendo EAD,8.2,9.1,37.32,2008
5,PLAYERUNKNOWN'S BATTLEGROUNDS,PC,PUBG Corporation,PUBG Corporation,8.6,4.7,36.6,2017


## 4. Exploratory Data Analysis (EDA)
### 4.1. Data Dimensions

In [12]:
%%sql
-- Check the number of rows in 'game_sales' table:
SELECT COUNT(id) AS Total_rows_info FROM game_sales;

Total_rows_info
19600


**OBS:** There are 19600 rows.

### 4.2. Data Type

In [13]:
%%sql
-- Check the data type of each column in 'game_sales' table:
DESCRIBE game_sales;

Field,Type,Null,Key,Default,Extra
id,int,NO,PRI,,auto_increment
game,varchar(140),NO,,,
platform,varchar(6),YES,,,
publisher,varchar(40),YES,,,
developer,varchar(70),YES,,,
critic_score,"decimal(4,2)",YES,,,
user_score,"decimal(4,2)",YES,,,
games_sold,"decimal(5,2)",YES,,,
year,int,YES,,,


**OBS**: Data type is ok.

### 4.3. Missing values
Let's identify missing values to understand the limitations of our database. One big shortcoming is that some games have no review data at all (neither `critic_score` nor `user_score`).

In [14]:
%%sql

SELECT COUNT(*)
FROM game_sales
WHERE critic_score IS NULL
    AND user_score IS NULL;

COUNT(*)
9616


**OBS:** 
* There are 9616 rows in which both `critic_score` and `user_score` are missing. We must remove them before continuing our analysis.

In [15]:
%%sql
-- Remove rows where both critic_score and user_score are missing
DELETE FROM game_sales
WHERE critic_score IS NULL 
    AND user_score IS NULL;
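
Note that the `AND` in this filter removes only games with no reviews at all. Games that have one of the two scores are kept, so `NULL` values can still appear in either column afterwards. The toy example below illustrates the difference between `AND` and `OR`; it uses an in-memory SQLite table with made-up rows as a stand-in for the MySQL table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE game_sales (game TEXT, critic_score REAL, user_score REAL)")
conn.executemany(
    "INSERT INTO game_sales VALUES (?, ?, ?)",
    [
        ("Game A", 8.0, 7.5),    # both scores present
        ("Game B", 9.0, None),   # critic score only
        ("Game C", None, 6.0),   # user score only
        ("Game D", None, None),  # no reviews at all
    ],
)


def count_where(condition):
    """Count rows of the toy table matching a SQL condition."""
    query = f"SELECT COUNT(*) FROM game_sales WHERE {condition}"
    return conn.execute(query).fetchone()[0]


both_missing = count_where("critic_score IS NULL AND user_score IS NULL")
either_missing = count_where("critic_score IS NULL OR user_score IS NULL")
print(both_missing, either_missing)  # 1 3
```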

### 4.4. Duplicated rows
Let's identify duplicated rows.

In [16]:
%%sql
-- Check if there are any duplicated row
SELECT 
    id, game, platform, publisher, 
    developer, games_sold, year, 
    critic_score, user_score
FROM game_sales
WHERE 
    (game, platform, publisher, 
    developer, games_sold, year, 
    critic_score, user_score)
    IN (SELECT
            game, platform, publisher, 
            developer, games_sold, year, 
            critic_score, user_score
        FROM game_sales
        GROUP BY
            game, platform, publisher, 
            developer, games_sold, year, 
            critic_score, user_score
        HAVING COUNT(*) > 1);

id,game,platform,publisher,developer,games_sold,year,critic_score,user_score
12498,Nights of Azure,PS4,Tecmo Koei,Gust,0.09,2016,6.7,7.2
12812,Nights of Azure,PS4,Tecmo Koei,Gust,0.09,2016,6.7,7.2


**OBS:** We found two identical rows, so we must delete one of them.

In [17]:
%%sql
-- Delete duplicated row id =12812
DELETE FROM game_sales
WHERE id=12812;
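
Deleting by a hard-coded `id` works for a single duplicate. A more general pattern is to keep the lowest `id` in each group of identical rows and delete the rest. The sketch below shows that pattern on an in-memory SQLite table with made-up rows and a reduced set of columns. Two caveats are worth checking before reusing it: MySQL may reject a `DELETE` whose subquery reads the same table unless the subquery is wrapped in a derived table, and a row-value `IN` comparison like the one in the duplicate check does not match rows that contain `NULL`, whereas `GROUP BY` places `NULL` values in the same group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE game_sales (id INTEGER PRIMARY KEY, game TEXT, platform TEXT, year INTEGER)"
)
conn.executemany(
    "INSERT INTO game_sales (id, game, platform, year) VALUES (?, ?, ?, ?)",
    [
        (1, "Game A", "PS4", 2016),
        (2, "Game B", "PC", 2012),
        (3, "Game A", "PS4", 2016),  # exact duplicate of id 1
    ],
)

# Keep the lowest id in each group of identical rows and delete the rest
conn.execute(
    """
    DELETE FROM game_sales
    WHERE id NOT IN (
        SELECT MIN(id) FROM game_sales GROUP BY game, platform, year
    )
    """
)

remaining_ids = [row[0] for row in conn.execute("SELECT id FROM game_sales ORDER BY id")]
print(remaining_ids)  # [1, 2]
```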

## 5. Data Preprocessing


## 6. Data Analysis


### 6.1. The ten best-selling video games
Let's begin by looking at some of the top selling video games of all time.

In [18]:
%%sql
-- Select all information for the top ten best-selling games
-- Order the results from best-selling game down to tenth best-selling
SELECT * FROM game_sales
ORDER BY games_sold DESC
LIMIT 10;

id,game,platform,publisher,developer,critic_score,user_score,games_sold,year
1,Wii Sports,Wii,Nintendo,Nintendo EAD,7.7,8.0,82.9,2006
2,Super Mario Bros.,NES,Nintendo,Nintendo EAD,10.0,8.2,40.24,1985
3,Counter-Strike: Global Offensive,PC,Valve,Valve Corporation,8.0,7.5,40.0,2012
4,Mario Kart Wii,Wii,Nintendo,Nintendo EAD,8.2,9.1,37.32,2008
5,PLAYERUNKNOWN'S BATTLEGROUNDS,PC,PUBG Corporation,PUBG Corporation,8.6,4.7,36.6,2017
6,Minecraft,PC,Mojang,Mojang AB,10.0,7.8,33.15,2010
7,Wii Sports Resort,Wii,Nintendo,Nintendo EAD,8.0,8.8,33.13,2009
8,Pokemon Red / Green / Blue Version,GB,Nintendo,Game Freak,9.4,8.8,31.38,1998
9,New Super Mario Bros.,DS,Nintendo,Nintendo EAD,9.1,8.1,30.8,2006
10,New Super Mario Bros. Wii,Wii,Nintendo,Nintendo EAD,8.6,9.2,30.3,2009


**OBS:**  
* The ten best-selling video games were released between 1985 and 2017.

### 6.2. Years that video game critics loved
There are lots of ways to measure the best years for video games! Let's start with what the critics think.

In [19]:
%%sql
-- Select release year and average critic score for each year, rounded and aliased
-- Group by release year
-- Order the data from highest to lowest avg_critic_score and limit to 10 results
SELECT
    year, 
    ROUND(AVG(critic_score),2) AS avg_critic_score
FROM game_sales
GROUP BY year
ORDER BY avg_critic_score DESC
LIMIT 10;

year,avg_critic_score
1984,9.5
1992,9.13
1982,9.0
1994,8.72
1990,8.63
1991,8.49
2020,8.26
1993,8.03
2019,7.88
1989,7.66


**OBS:**
* The top ten years according to critic reviews range from 1982 to 2020. We are no closer to finding the golden age of video games!

### 6.3. Was 1984 really that great?
Some of those avg_critic_score values look like suspiciously round numbers for averages. Maybe there weren't many video games in our dataset that were released in certain years. Let's update our query and find out whether 1984 was really a good year for video games.

In [20]:
%%sql
-- Use the previous query, update it to add a count of games released in each year called num_games
-- Update the query so that it only returns years that have more than four reviewed games
SELECT 
    year, 
    ROUND(AVG(critic_score),2) AS avg_critic_score,
    COUNT(game) AS num_games
FROM game_sales
GROUP BY year
HAVING COUNT(game) > 4
ORDER BY avg_critic_score DESC
LIMIT 10;

year,avg_critic_score,num_games
1992,9.13,5
1994,8.72,6
1990,8.63,6
1991,8.49,8
2020,8.26,9
1993,8.03,11
2019,7.88,37
1989,7.66,5
1985,7.58,5
2014,7.54,265


**OBS:** The num_games column convinces us that our new list of the critics' top games reflects years that had quite a few well-reviewed games rather than just one or two hits.
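
One subtlety: `COUNT(game)` counts every row for the year, including games that have a user score but no critic score, while `AVG(critic_score)` ignores `NULL` values. If the goal is to count critic-reviewed games, `COUNT(critic_score)` is the stricter measure. The toy example below illustrates the difference on an in-memory SQLite table with made-up rows; the alias `num_critic_reviews` is introduced only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE game_sales (game TEXT, critic_score REAL, year INTEGER)")
conn.executemany(
    "INSERT INTO game_sales VALUES (?, ?, ?)",
    [
        ("Game A", 9.5, 1984),   # the only critic-reviewed game of its year
        ("Game B", None, 1984),  # no critic score
        ("Game C", 8.0, 2014),
        ("Game D", 7.0, 2014),
    ],
)

rows = conn.execute(
    """
    SELECT year,
           ROUND(AVG(critic_score), 2) AS avg_critic_score,
           COUNT(game) AS num_games,
           COUNT(critic_score) AS num_critic_reviews
    FROM game_sales
    GROUP BY year
    ORDER BY year
    """
).fetchall()
print(rows)  # [(1984, 9.5, 2, 1), (2014, 7.5, 2, 2)]
```

In the toy data, 1984 has two games but its average rests on a single review, which is exactly the situation a suspiciously round average hints at.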

### 6.4. Years that dropped off the critics' favorites list
Which years dropped off the list due to having four or fewer reviewed games? Let's identify them so that someday we can track down more game reviews for those years and determine whether they might rightfully be considered excellent years for video game releases!  
To get started, let's create the `top_critic_years` and `top_critic_years_more_than_four_games` tables with the results of our previous two queries:

<span style="font-size: larger;">top_critic_years</span>

|column           |data type |description                                                   |
|-----------------|----------|--------------------------------------------------------------|
|year             |int       |Year of video game release.                                   |
|avg_critic_score |float     |Average of all critic scores for games released in that year. |



<span style="font-size: larger;">top_critic_years_more_than_four_games</span>

|column           |data type |description                                                   |
|-----------------|----------|--------------------------------------------------------------|
|year             |int       |Year of video game release.                                   |
|num_games        |int       |Count of the number of video games released in that year.     |
|avg_critic_score |float     |Average of all critic scores for games released in that year. |

In [21]:
%%sql
-- Remove 'top_critic_years' TABLE, if exist:
DROP TABLE IF EXISTS top_critic_years;

-- Create 'top_critic_years' table:
CREATE TABLE top_critic_years AS
SELECT
    year, 
    ROUND(AVG(critic_score), 2) AS avg_critic_score
FROM game_sales
GROUP BY year
ORDER BY avg_critic_score DESC
LIMIT 10;

-- # Check 'top_critic_years' Table
SELECT * FROM top_critic_years LIMIT 5;

year,avg_critic_score
1984,9.5
1992,9.13
1982,9.0
1994,8.72
1990,8.63


In [22]:
%%sql
-- Remove 'top_critic_years_more_than_four_games' TABLE, if exist:
DROP TABLE IF EXISTS top_critic_years_more_than_four_games;

-- Create 'top_critic_years_more_than_four_games' table:
CREATE TABLE top_critic_years_more_than_four_games AS
SELECT 
    year, 
    ROUND(AVG(critic_score),2) AS avg_critic_score,
    COUNT(game) AS num_games
FROM game_sales
GROUP BY year
HAVING COUNT(game) > 4
ORDER BY avg_critic_score DESC
LIMIT 10;

-- # Check 'top_critic_years_more_than_four_games' Table
SELECT * FROM top_critic_years_more_than_four_games LIMIT 5;

year,avg_critic_score,num_games
1992,9.13,5
1994,8.72,6
1990,8.63,6
1991,8.49,8
2020,8.26,9


Let's identify the years that dropped off the critics' favorites list:

In [23]:
%%sql 

-- Select the year and avg_critic_score for those years that dropped off the list of critic favorites 
-- Order the results from highest to lowest avg_critic_score
SELECT year, avg_critic_score
FROM top_critic_years
EXCEPT
SELECT year, avg_critic_score
FROM top_critic_years_more_than_four_games
ORDER BY avg_critic_score DESC;

year,avg_critic_score
1984,9.5
1982,9.0


**OBS:** 
* 1982 and 1984 dropped off the list because each has four or fewer reviewed games. Their high averages might suggest a golden age based on `critic_score` alone, but there are too few reviews to support that.
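
The `EXCEPT` query above is a set difference on `(year, avg_critic_score)` pairs. As a quick cross-check, the same result can be reproduced in plain Python from the two query outputs shown earlier in this notebook; only the variable name `dropped` is introduced here.

```python
# Values copied from the outputs of the two top-ten queries
top_critic_years = {
    1984: 9.5, 1992: 9.13, 1982: 9.0, 1994: 8.72, 1990: 8.63,
    1991: 8.49, 2020: 8.26, 1993: 8.03, 2019: 7.88, 1989: 7.66,
}
top_critic_years_more_than_four_games = {
    1992: 9.13, 1994: 8.72, 1990: 8.63, 1991: 8.49, 2020: 8.26,
    1993: 8.03, 2019: 7.88, 1989: 7.66, 1985: 7.58, 2014: 7.54,
}

# Set difference on (year, score) pairs, ordered by score descending
dropped = sorted(
    set(top_critic_years.items()) - set(top_critic_years_more_than_four_games.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
print(dropped)  # [(1984, 9.5), (1982, 9.0)]
```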

Let's check the games between 1982 and 1984.

In [24]:
%%sql
-- Filter by year between 1982 and 1984
SELECT *
FROM game_sales
WHERE year BETWEEN 1982 AND 1984;

id,game,platform,publisher,developer,critic_score,user_score,games_sold,year
140,Pac-Man,2600,Atari,Atari,9.0,7.4,7.7,1982
1477,F1 Race,NES,Nintendo,Nintendo,9.5,7.4,1.52,1984


**OBS:** 
* Both games from these years, Pac-Man and F1 Race, received a high `critic_score`.
* Pac-Man is considered by many to be one of the most influential video games of all time. [Reference](https://web.archive.org/web/20151003105413/http://www.1up.com/features/essential-50-pac-man)
* We'd need to gather more games and reviews data to do further analysis.

### 6.5. Years video game players loved
Let's move on to the opinions of another important group of people: players! To begin, let's write a query very similar to the one we used in section 6.3, except this one will look at user_score averages by year rather than critic_score averages.

In [25]:
%%sql 

-- Select year, an average of user_score, and a count of games released in a given year.
-- Include only years with more than four reviewed games; group data by year
-- Order data by avg_user_score, and limit to ten results
SELECT
    year, 
    ROUND(AVG(user_score),2) AS avg_user_score,
    COUNT(game) AS num_games
FROM game_sales
GROUP BY year
HAVING COUNT(game) > 4
ORDER BY avg_user_score DESC
LIMIT 10;

year,avg_user_score,num_games
1993,9.5,11
1990,9.3,6
1997,9.25,139
1999,9.1,253
1991,8.8,8
1987,8.8,5
1998,8.8,176
1994,8.8,6
2010,8.78,529
2007,8.57,719


**OBS:** We now have a list of the top ten years according to user reviews.

### 6.6. Years that both players and critics loved
First, let's save the results of our top user years query from the previous section into a new table:

<span style="font-size: larger;">top_user_years_more_than_four_games</span>

|column           |data type |description                                                   |
|-----------------|----------|--------------------------------------------------------------|
|year             |int       |Year of video game release.                                   |
|num_games        |int       |Count of the number of video games released in that year.     |
|avg_user_score   |float     |Average of all user scores for games released in that year.   |

In [26]:
%%sql
-- Remove 'top_user_years_more_than_four_games' table, if exist:
DROP TABLE IF EXISTS top_user_years_more_than_four_games;

-- Create 'top_user_years_more_than_four_games' table
CREATE TABLE top_user_years_more_than_four_games AS
SELECT
    year, 
    ROUND(AVG(user_score),2) AS avg_user_score,
    COUNT(game) AS num_games
FROM game_sales
GROUP BY year
HAVING COUNT(game) > 4
ORDER BY avg_user_score DESC
LIMIT 10;

-- Check 'top_user_years_more_than_four_games' table
SELECT * FROM top_user_years_more_than_four_games LIMIT 5;

year,avg_user_score,num_games
1993,9.5,11
1990,9.3,6
1997,9.25,139
1999,9.1,253
1991,8.8,8


Now, let's find the years that both players and critics loved.

In [27]:
%%sql 

-- Select the years that appear in both tables
SELECT TC.year, TC.avg_critic_score, TU.avg_user_score, TU.num_games
FROM top_critic_years_more_than_four_games AS TC
INNER JOIN top_user_years_more_than_four_games AS TU
    ON TC.year = TU.year;

year,avg_critic_score,avg_user_score,num_games
1993,8.03,9.5,11
1990,8.63,9.3,6
1991,8.49,8.8,8
1994,8.72,8.8,6


**OBS:**
* Four years, 1990, 1991, 1993, and 1994, appear in the top ten for both users and critics. (1992 made the critics' list but not the users' top ten.)
* There are many other ways of measuring what the best years for video games are, but let's stick with these years for now.
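
The `INNER JOIN` on `year` keeps only the years present in both top-ten tables, which amounts to a set intersection. A short cross-check in plain Python, using the years from the two query outputs shown earlier (the variable names are introduced for this illustration):

```python
# Years copied from the outputs of the critic and user top-ten queries
critic_years = {1992, 1994, 1990, 1991, 2020, 1993, 2019, 1989, 1985, 2014}
user_years = {1993, 1990, 1997, 1999, 1991, 1987, 1998, 1994, 2010, 2007}

# Years loved by both groups
loved_by_both = sorted(critic_years & user_years)
print(loved_by_both)  # [1990, 1991, 1993, 1994]

# Years on the critics' list that the users' top ten does not include
critics_only = sorted(critic_years - user_years)
print(critics_only)  # [1985, 1989, 1992, 2014, 2019, 2020]
```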

### 6.7. Sales in the best video game years
We know that critics and players liked these years, but what about video game makers? Were sales good? Let's find out. We'll use the query from the previous section as a subquery.

In [28]:
%%sql 

-- Select year and sum of games_sold, aliased as total_games_sold; order results by total_games_sold descending
-- Filter game_sales based on whether each year is in the list returned by the previous query
SELECT
    year,
    SUM(games_sold) AS total_games_sold
FROM game_sales
WHERE year IN (
    SELECT TC.year 
    FROM top_critic_years_more_than_four_games AS TC
    INNER JOIN top_user_years_more_than_four_games AS TU
        ON TC.year = TU.year)
GROUP BY year
ORDER BY total_games_sold DESC;

year,total_games_sold
1991,45.3
1993,25.64
1990,23.63
1994,22.61


**OBS:** Sales in these four years were substantial too, ranging from 22.61 million units (1994) to 45.3 million units (1991) in this dataset.

### 6.8. Video games between 1990 and 1994
Finally, let's check the video games released between 1990 and 1994.

In [29]:
%%sql
-- Query the games between 1990 and 1994
SELECT *
FROM game_sales
WHERE year BETWEEN 1990 AND 1994
    AND critic_score IS NOT NULL 
    AND user_score IS NOT NULL;

id,game,platform,publisher,developer,critic_score,user_score,games_sold,year
22,Super Mario World,SNES,Nintendo,Nintendo EAD,8.5,9.3,20.61,1991
34,Super Mario Bros. 3,NES,Nintendo,Nintendo R&D2,9.8,9.3,17.28,1990
45,Sonic the Hedgehog,GEN,Sega,Sonic Team,8.6,8.3,15.0,1991
85,Super Mario Land 2: 6 Golden Coins,GB,Nintendo,Nintendo R&D1,9.0,8.5,11.18,1992
95,Super Mario All-Stars,SNES,Nintendo,Nintendo EAD,9.2,9.5,10.55,1993
114,Donkey Kong Country,SNES,Nintendo,Rare Ltd.,9.0,8.8,9.3,1994


**OBS:** Between 1990 and 1994, the `Super Mario` titles and `Donkey Kong Country` were favorites of both players and critics, mostly on Nintendo's SNES console, and their unit sales (roughly 9 to 21 million each) reflect that.

## 7. Conclusion

* Although our database lacks review information for many games, the reviews that are available were enough to identify candidate years for the golden age of video games.
* Both users and critics placed years between 1990 and 1994 in their top ten, so this period can be considered the golden age of video games.
* From a financial point of view, the period was strong as well: its top-reviewed games sold between roughly 9 and 21 million units each.

## 8. References
* https://www.datacamp.com/projects/1413
* https://www.kaggle.com/datasets/holmjason2/videogamedata
* https://github.com/NII-CODES/When-Was-the-Golden-Age-of-Video-Games/tree/main