# Video Game Sales with Metacritic Ratings and Comments
This dataset combines Gregory Smith's web scrape of VGChartz video game sales with accompanying variables from Rush Kirubi's web scrape of Metacritic. VGChartz is a video game sales tracking website, and Metacritic is a review aggregator for movies, TV shows, music albums, and video games.

Sources: https://www.kaggle.com/gregorut/videogamesales  
&emsp;&emsp;&emsp;&emsp;https://www.kaggle.com/rush4ratio/video-game-sales-with-ratings, accessed on July 26th, 2020. 
 
The Metacritic scraper is based on https://github.com/wtamu-cisresearch/scraper. The VGChartz scraper, which is built on BeautifulSoup, is available at https://github.com/GregorUT/vgchartzScrape.

Not all of the VGChartz observations have accompanying Metacritic data, because Metacritic does not cover every video game platform. Even games on covered platforms may have incomplete Metacritic data.
There are around 6,900 observations with complete VGChartz and Metacritic data.
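
A count of complete observations like this one can be checked by dropping every row that contains a missing value. The sketch below is a minimal illustration on a hand-made table, not the real file: the rows are invented, and every column name except `NA_Sales` (in particular `Title`, `Critic_Score`, and `User_Score`) is a placeholder assumption for whatever columns the dataset actually contains.

```python
import pandas as pd

# Invented rows; "Title", "Critic_Score" and "User_Score" are assumed names.
toy = pd.DataFrame({
    "Title": ["Game A", "Game B", "Game C", "Game D"],
    "NA_Sales": [1.2, 0.8, 0.3, 2.5],
    "Critic_Score": [85.0, None, 70.0, None],
    "User_Score": [8.1, None, None, 7.4],
})

# Missing values per column show which variables are incomplete.
missing_per_column = toy.isna().sum()

# Rows with no missing value in any column are the "complete" observations.
complete = toy.dropna()
n_complete = len(complete)

print(missing_per_column)
print(f"{n_complete} of {len(toy)} rows are complete")
```

Once the data are loaded into a DataFrame, the same `dropna()` call should leave roughly the 6,900 rows mentioned above, provided missing values are stored as NaN rather than as placeholder strings.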

The data are current as of December 22nd, 2016.

In [6]:
#Import necessary packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

try:
    import seaborn as sns
    print("Module 'seaborn' is installed")
except ModuleNotFoundError:
    %pip install seaborn
    import seaborn as sns

import matplotlib.ticker as tick

import datetime

Module 'seaborn' is installed


In [None]:
#Read in data
df = pd.read_csv("Datasets/Video_Games_Sales_as_at_22_Dec_2016.csv")

In [None]:
#Display the first 10 rows
df.head(10)

In [None]:
#Group by year and genre, sum the numeric columns, and filter to 2000-2011

#numeric_only = True keeps non-numeric columns out of the sums
dfCatYear = df.groupby(by = ["Year_of_Release", "Genre"]).sum(numeric_only = True)
dfCatYear = dfCatYear.reset_index()

#Keep release years 2000 through 2011 (inclusive on both ends)
dfCatYear2000s = dfCatYear[dfCatYear["Year_of_Release"].between(2000, 2011)]
dfCatYear2000s.dtypes

In [None]:
#Create arrays for heatmaps by region, and create heatmaps
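#Keyword arguments are required by pivot in recent pandas versions
arrayJP = dfCatYear2000s.pivot(index = "Genre", columns = "Year_of_Release", values = "JP_Sales")
arrayNA = dfCatYear2000s.pivot(index = "Genre", columns = "Year_of_Release", values = "NA_Sales")
arrayEU = dfCatYear2000s.pivot(index = "Genre", columns = "Year_of_Release", values = "EU_Sales")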

#One row of three panels, sized 19.2 x 10.8 inches
fig, ax = plt.subplots(1, 3, figsize = (19.2, 10.8))
fig.tight_layout(pad = 10)

#Tick labels "2000" through "2011", one per year column
xLabels = [str(year) for year in range(2000, 2012)]

ax1 = sns.heatmap(arrayJP, ax=ax[0], square = True)
ax1.invert_yaxis()
ax1.set_title("Japanese Game Sales (in millions)")
ax1.set(xlabel="Year of Release")
ax1.set_xticklabels(xLabels)

ax2 = sns.heatmap(arrayNA, ax=ax[1], square = True)
ax2.invert_yaxis()
ax2.set_title("North American Game Sales (in millions)")
ax2.set(xlabel="Year of Release")
ax2.set_xticklabels(xLabels)

ax3 = sns.heatmap(arrayEU, ax=ax[2], square = True)
ax3.invert_yaxis()
ax3.set_title("European Game Sales (in millions)")
ax3.set(xlabel="Year of Release")
ax3.set_xticklabels(xLabels)
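
The aggregation and pivot steps that feed the heatmaps can be sanity-checked on a small hand-made table before trusting the plots. The following is a minimal sketch, not part of the original analysis: the rows are invented, and only the column names `Year_of_Release`, `Genre`, and `JP_Sales` come from the notebook.

```python
import pandas as pd

# Invented rows; the missing year makes the column float-valued.
toy = pd.DataFrame({
    "Year_of_Release": [1999.0, 2000.0, 2000.0, 2000.0, 2001.0, None],
    "Genre": ["Action", "Action", "Action", "Sports", "Sports", "Action"],
    "JP_Sales": [9.0, 1.0, 2.0, 0.5, 4.0, 7.0],
})

# Sum sales per (year, genre); groupby drops the row with a missing year.
totals = (
    toy.groupby(["Year_of_Release", "Genre"])
    .sum(numeric_only=True)
    .reset_index()
)

# Keep 2000-2011, then reshape to one row per genre and one column per year.
totals = totals[totals["Year_of_Release"].between(2000, 2011)]
grid = totals.pivot(index="Genre", columns="Year_of_Release", values="JP_Sales")

print(grid)
```

Genre and year combinations with no rows come out as NaN in the pivoted table. seaborn's `heatmap` masks such cells automatically, so they appear blank rather than as zero sales.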