# Comparing Sales vs. User and Critic Ratings for Video Games released between 2015 and 2019

dataset: https://www.kaggle.com/datasets/ashaheedq/video-games-sales-2019?resource=download

webscraper available at: https://github.com/ashaheedq/vgchartzScrape

### Purpose

The purpose of this project is to examine the relationship between sales and perceived quality. Does higher quality translate into better sales? Companies may ask this question when deciding how much time to spend on a project before considering it complete. The relationship may be especially noticeable in the gaming world, where finished products can be rife with bugs, low-quality textures, or a lack of innovation. Many gamers are convinced that strong sales give developers no incentive to innovate or improve their games, regardless of whether those games are well received.

### Findings

A link to the Tableau dashboard: https://public.tableau.com/app/profile/ng1808/viz/VGSalesvs_Scores/SalesvScores

Based on the scatterplot, games that receive better critic scores tend to sell more copies, but only slightly. This does not hold for user scores, where there is no clear relationship between score and sales. Games that users rate poorly, such as those published by Electronic Arts, EA Sports, and Activision, still sell more than higher-rated titles. For example, The Legend of Zelda: Breath of the Wild (considered one of the best games on the Nintendo Switch) is highly rated by both critics and users, yet it only barely outsold FIFA 18 (27.79M vs. 26.4M copies), the third-lowest-rated game by users in the dataset. FIFA 18 in turn outsold Super Mario Odyssey (24.4M copies), another highly rated Nintendo Switch bestseller.

There appears to be an outlier in the dataset. PlayerUnknown's Battlegrounds (PUBG) has sold roughly 27M more copies than the next-highest-selling title (Mario Kart 8 Deluxe). Despite selling so many copies, it only received scores of 81 and 43 from critics and users, respectively. Why did it sell so much despite lower ratings? PUBG was the first "battle royale" style game to be released as a standalone title rather than as a mod for an existing game, and battle royale games became increasingly popular toward the end of the 2010s. Many of the popular games in this genre are not included in this dataset because they were free-to-play, whereas PUBG did not become free-to-play until 2022. If this data point is excluded, the trendlines show a positive relationship between score and sales.
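
The effect of excluding PUBG can be sketched on the ten rows shown in the table excerpt at the end of this document. This is only an illustration: it assumes a straight least-squares trendline (the dashboard's trendline model is not stated here), it uses 10 of the 50 games, and `trend_slope` is a hypothetical helper, so the numbers need not match the Tableau dashboard.

```python
import numpy as np

# (game, sales in millions, user score, critic score), copied from the
# ten-row table excerpt in this document; the other 40 games are not included.
rows = [
    ("PlayerUnknown's Battlegrounds", 75.0, 43, 81),
    ("Mario Kart 8 Deluxe", 48.41, 86, 92),
    ("Red Dead Redemption 2", 46.0, 86, 97),
    ("The Witcher 3: Wild Hunt", 40.0, 92, 92),
    ("Super Smash Bros. Ultimate", 29.53, 86, 93),
    ("The Legend of Zelda: Breath of the Wild", 27.79, 87, 97),
    ("Call of Duty: Black Ops 3", 26.72, 52, 81),
    ("FIFA 18", 26.4, 36, 84),
    ("Splatoon", 26.15, 87, 81),
    ("Super Mario Odyssey", 24.4, 89, 97),
]


def trend_slope(data, score_index):
    """Slope of a least-squares line predicting sales from one score column."""
    scores = np.array([row[score_index] for row in data], dtype=float)
    sales = np.array([row[1] for row in data], dtype=float)
    slope, _intercept = np.polyfit(scores, sales, deg=1)
    return slope


# Same rows with the suspected outlier removed
without_pubg = [row for row in rows if row[0] != "PlayerUnknown's Battlegrounds"]

slopes = {
    "user score, all rows": trend_slope(rows, 2),
    "user score, PUBG excluded": trend_slope(without_pubg, 2),
    "critic score, all rows": trend_slope(rows, 3),
    "critic score, PUBG excluded": trend_slope(without_pubg, 3),
}

for label, value in slopes.items():
    print(f"{label}: {value:+.3f} million copies per score point")
```

On this small excerpt, both slopes are negative with PUBG included and positive once it is removed, which shows how strongly a single high-sales, low-score point can pull a fitted line.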

What is the conclusion of this visualization? Does it pay off for big publishers to spend more time on individual video games? It's hard to say from this data alone. For example, FIFA games are produced annually and likely don't require much development time, because the overall gameplay changes little between iterations and the game engine can be reused. FIFA 16 through FIFA 19 sold a combined 79M copies over 4 years. On the other hand, Nintendo's flagship titles on the Switch, The Legend of Zelda: Breath of the Wild and Super Mario Odyssey, were in development for 5 years and 3 years, respectively, and sold a combined 52M copies.

But what about Mario Kart 8 Deluxe, which sold over 48M copies? Its sales are confounded by a few variables. First, the game was originally released on the Wii U in 2014 and re-released on the Switch (as a new game that included content not in the original release), so it required essentially no new development time. Second, the game was bundled with Nintendo Switch consoles, which also added to its sales.

PUBG was in development for only one year before release and sold 75M copies, while Red Dead Redemption 2 was in development for 7 years and sold 46M copies. In these examples, well-received games do not consistently outsell poorly rated ones, and the effort developers put into a game (measured by time in development) does not appear to lead to more sales.
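
The combined totals quoted above can be reproduced from the sales figures in the Data Cleaning section below. The sketch also divides sales by the development times stated in the text; "sales per development year" is a crude illustrative metric introduced here, not something the original analysis computes.

```python
# Sales in millions of copies, taken from the Top50_sales figures in this document
# (per-platform figures are summed, as in the original dictionary).
fifa_sales = {
    "FIFA 16": 8.22 + 3.25 + 2.55 + 1.59 + 0.21,
    "FIFA 17": 10.94 + 3.71 + 1.36 + 0.82 + 0.19,
    "FIFA 18": 26.4,
    "FIFA 19": 20,
}
fifa_total = sum(fifa_sales.values())

# Breath of the Wild + Super Mario Odyssey
nintendo_total = 27.79 + 24.4

# (sales in millions, development time in years) as stated in the text above
development = {
    "PlayerUnknown's Battlegrounds": (75, 1),
    "Red Dead Redemption 2": (46, 7),
    "The Legend of Zelda: Breath of the Wild": (27.79, 5),
    "Super Mario Odyssey": (24.4, 3),
}
# Hypothetical metric: millions of copies sold per year spent in development
sales_per_dev_year = {
    game: sales / years for game, (sales, years) in development.items()
}

print(f"FIFA 16-19 combined: {fifa_total:.2f}M")
print(f"BotW + Odyssey combined: {nintendo_total:.2f}M")
for game, value in sales_per_dev_year.items():
    print(f"{game}: {value:.2f}M per development year")
```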

### Data Cleaning

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

In [2]:
data = pd.read_csv('./Data/vgsales-12-4-2019-short.csv')
data.drop(['Rank', 'ESRB_Rating', 'Genre', 'Developer', 'Platform', 'Total_Shipped', 'Global_Sales', 
           'NA_Sales', 'PAL_Sales', 'JP_Sales', 'Other_Sales'], axis=1, inplace=True)

# remove rows where name, publisher, and year are same
data.drop_duplicates(subset=['Name', 'Publisher', 'Year'], inplace=True)

# remove GTAV because only PC version released in time period
data.drop(data[data['Name'] == 'Grand Theft Auto V'].index, inplace=True)

# remove Let's Go, Pikachu because Pikachu/Eevee already used
data.drop(data[data['Name'] == "Pokemon: Let's Go, Pikachu!"].index, inplace=True)

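# keep the first 50 remaining rows released in the listed years (the filter also admits 2020);
# .copy() avoids the SettingWithCopyWarning that pandas can raise on the later column assignment
Top50 = data.loc[data['Year'].isin([2015, 2016, 2017, 2018, 2019, 2020])].head(50).copy()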
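
As a small illustration of the deduplication step above, the toy frame below shows that `drop_duplicates` with a `subset` keeps only the first row for each (Name, Publisher, Year) combination. The game, publisher, and platform values are made up, and the `Platform` column is kept only to show which row survives; this is presumably why per-platform sales are summed by hand in the next cell.

```python
import pandas as pd

# Hypothetical rows: the same game listed once per platform
toy = pd.DataFrame({
    "Name": ["Game A", "Game A", "Game B"],
    "Publisher": ["Pub X", "Pub X", "Pub Y"],
    "Year": [2016, 2016, 2017],
    "Platform": ["PS4", "XOne", "NS"],
})

# keep="first" is the default: only the first row of each duplicate group remains
deduped = toy.drop_duplicates(subset=["Name", "Publisher", "Year"])

print(deduped)
```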

In [3]:
# sales data taken from https://www.vgchartz.com/

Top50_sales = {"PlayerUnknown's Battlegrounds": 75,
               'Pokemon Sun/Moon': 16.29,
               'Call of Duty: Black Ops 3': 15.09+7.37+2.01+1.95+0.3,
               'Mario Kart 8 Deluxe': 48.41,
               'Red Dead Redemption 2': 46,
               'Super Mario Odyssey': 24.4,
               'Call of Duty: WWII': 13.4+6.23+0.19,
               'Super Smash Bros. Ultimate': 29.53,
               'FIFA 18': 26.4,
               'The Legend of Zelda: Breath of the Wild': 27.79,
               'FIFA 17': 10.94+3.71+1.36+0.82+0.19,
               'Horizon: Zero Dawn': 20,
               "Pokemon: Let's Go, Pikachu/Eevee": 14.81,
               'Call of Duty: Black Ops IIII': 9.32+4.85+0.13,
               'FIFA 19': 20,
               'Rust': 12.48,
               "Marvel's Spider-Man": 20,
               "Uncharted 4: A Thief's End": 16.25,
               'Call of Duty: Infinite Warfare': 8.48+4.79+0.33,
               'Fallout 4': 20,
               'Pokemon: Ultra Sun and Ultra Moon': 9.12,
               'Splatoon 2': 13.3,
               'FIFA 16': 8.22+3.25+2.55+1.59+0.21,
               'Star Wars Battlefront (2015)': 14,
               'Battlefield 1': 7.26+5.13+0.77,
               'Yokai Watch 2: Bony Spirits / Fleshy Souls / Psychic Specters': 7.3,
               'God of War (2018)': 23,
               'Uncharted: The Nathan Drake Collection': 5.7,
               'The Witcher 3: Wild Hunt': 40,
               'Super Mario Party': 18.35,
               'Final Fantasy XV': 10,
               'Cities: Skylines': 6,
               'Halo 5: Guardians': 5,
               'Splatoon': 26.15,
               'Stardew Valley': 20,
               'Crash Bandicoot N. Sane Trilogy': 10,
               'Monster Hunter: World': 20,
               'Overwatch': 4.54+2.72+0.92,
               'Star Wars Battlefront II (2017)': 9,
               'ARK: Survival Evolved': 7,
               "Tom Clancy's The Division": 10,
               "Tom Clancy's Rainbow Six: Siege": 10,
               'Monster Hunter Generations': 8.8,
               'Monster Hunter 4 Ultimate': 4.2,
               'Destiny 2': 4.14+2.35+0.16,
               'Batman: Arkham Knight': 5,
               "Assassin's Creed Origins": 10,
               'Super Mario Maker': 4.02,
               'NBA 2K16': 8,
               'Far Cry 5': 10}

Top50['Total_Sales (millions)'] = Top50['Name'].map(Top50_sales)
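
One thing worth checking after a dictionary-based `map` like the one above: any name that is missing from the dictionary (or spelled differently) silently becomes `NaN`. The sketch below shows one way to list such names. The two-entry lookup and the third game name are placeholders for illustration, not part of the original notebook.

```python
import pandas as pd

# Hypothetical two-entry lookup standing in for the full Top50_sales dictionary
sales_lookup = {"FIFA 18": 26.4, "Splatoon 2": 13.3}

# Hypothetical frame: the last name has no entry in the lookup
frame = pd.DataFrame({"Name": ["FIFA 18", "Splatoon 2", "Some Unlisted Game"]})
frame["Total_Sales (millions)"] = frame["Name"].map(sales_lookup)

# Names whose sales could not be looked up end up as NaN
unmapped = frame.loc[frame["Total_Sales (millions)"].isna(), "Name"].tolist()

print(unmapped)
```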

In [4]:
# exported data to CSV to more easily add user and critic scores

# import data back, drop old index, sort data, rename columns, reorganize columns
Top50_scored = pd.read_csv('./vgsales_2019.csv')
Top50_scored.drop('Unnamed: 0', axis=1, inplace=True)
Top50_scored.sort_values(['Total_Sales (millions)', 'User_Score', 'Critic_Score'], ascending=False, inplace=True)
Top50_scored.rename({'Name': 'Game',
                     'Total_Sales (millions)': 'Total_Sales_Millions'},
                    axis=1, inplace=True)
Top50_scored = Top50_scored[['Game', 'Publisher', 'Total_Sales_Millions', 'User_Score',
                            'Critic_Score', 'Year']]

# For Tableau
# Top50_scored.to_csv('Top 50 Video Games Sales and Reviews 2015-2019.csv', index=False)

In [5]:
Top50_scored

|  | Game | Publisher | Total_Sales_Millions | User_Score | Critic_Score | Year |
|---|---|---|---|---|---|---|
| 0 | PlayerUnknown's Battlegrounds | PUBG Corporation | 75.0 | 43.0 | 81.0 | 2017 |
| 3 | Mario Kart 8 Deluxe | Nintendo | 48.41 | 86.0 | 92.0 | 2017 |
| 4 | Red Dead Redemption 2 | Rockstar Games | 46.0 | 86.0 | 97.0 | 2018 |
| 28 | The Witcher 3: Wild Hunt | Warner Bros. Interactive Entertainment | 40.0 | 92.0 | 92.0 | 2015 |
| 7 | Super Smash Bros. Ultimate | Nintendo | 29.53 | 86.0 | 93.0 | 2018 |
| 9 | The Legend of Zelda: Breath of the Wild | Nintendo | 27.79 | 87.0 | 97.0 | 2017 |
| 2 | Call of Duty: Black Ops 3 | Activision | 26.72 | 52.0 | 81.0 | 2015 |
| 8 | FIFA 18 | EA Sports | 26.4 | 36.0 | 84.0 | 2017 |
| 33 | Splatoon | Nintendo | 26.15 | 87.0 | 81.0 | 2015 |
| 5 | Super Mario Odyssey | Nintendo | 24.4 | 89.0 | 97.0 | 2017 |
