## A quick look at the platforms and genres in this data set.

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns; sns.set()

import matplotlib.pyplot as plt
%matplotlib inline 

from matplotlib_venn import venn3 

# Ignore some annoying warnings from the seaborn plots
import warnings
warnings.filterwarnings('ignore')

In [None]:
# Import the data
gamesDB=pd.read_csv('../input/ign.csv')


# Print an overview of the data set
print("number of rows: ",gamesDB.shape[0])
gamesDB.head()

## A look at the genres

There are a whole lot of genres with few titles in this data set. To avoid drowning in details, I only keep genres with at least 500 titles in total.

In [None]:
# Keep only genres with at least 500 titles.
genres_count=gamesDB.groupby('genre')['genre'].count()
# sort a new Series instead of sorting a filtered slice in place
large_genres=genres_count[genres_count>=500].sort_values(ascending=False)
print("number of genres with at least 500 titles: ",large_genres.shape[0] )
print(large_genres)
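The count-and-filter step above is repeated for the platforms further down. A minimal sketch of a reusable helper, assuming the same "at least 500" threshold; the function name `keep_large_categories` and the toy frame are hypothetical, not part of the original notebook. It relies on `Series.value_counts`, which already returns the counts sorted from largest to smallest, so no separate sort is needed.

```python
import pandas as pd

def keep_large_categories(df, column, min_count=500):
    """Hypothetical helper: return the rows of df whose value in `column`
    occurs at least min_count times, plus the counts that passed."""
    counts = df[column].value_counts()      # sorted, largest first
    large = counts[counts >= min_count]
    return df[df[column].isin(large.index)], large

# Toy data, for illustration only
toy = pd.DataFrame({"genre": ["Action"] * 3 + ["Puzzle"] + ["Sports"] * 2})
filtered, large = keep_large_categories(toy, "genre", min_count=2)
print(large)          # Action 3, Sports 2
print(len(filtered))  # 5
```

On the real data, `keep_large_categories(gamesDB, 'genre')` and `keep_large_categories(gamesDB, 'platform')` would play the role of the two filtering cells.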


### Let's plot the genres by the year of the review

In [None]:
genre_list=large_genres.index.values

#Extract only the genres with at least 500 titles.
gamesDB_large_genre=gamesDB[gamesDB['genre'].isin(genre_list)]

#Use a pandas pivot table to count the reviews per release year and genre
table_genre_by_year = pd.pivot_table(gamesDB_large_genre,values=['score'],index=['release_year'],columns=['genre'],aggfunc='count',margins=False)
table_genre_by_year['score'].plot(kind='bar', stacked=True,figsize=(12,7),colormap='Accent')
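The pivot table above counts the non-missing `score` values in each (release_year, genre) cell. A sketch of an equivalent formulation with `pd.crosstab`, on a small made-up frame (the values are illustrative only). `crosstab` counts rows rather than non-null scores, so the two agree only when `score` has no missing values, and it fills empty combinations with 0 instead of NaN.

```python
import pandas as pd

# Made-up reviews, for illustration only
toy = pd.DataFrame({
    "release_year": [2000, 2000, 2001, 2001, 2001],
    "genre": ["Action", "Sports", "Action", "Action", "Sports"],
    "score": [7.0, 8.5, 6.0, 9.0, 5.5],
})

# Number of reviews per (year, genre); missing combinations become 0
counts = pd.crosstab(toy["release_year"], toy["genre"])
print(counts)
```

The resulting frame has plain (non-MultiIndex) columns, so it can be passed to `.plot(kind='bar', stacked=True)` directly, without the `['score']` selection.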

To play with other colormaps, this link is useful:

http://scipy.github.io/old-wiki/pages/Cookbook/Matplotlib/Show_colormaps

## A look at the platforms

As with the genres, there are a whole lot of platforms with few titles in this data set. To avoid drowning in details, I only keep platforms with at least 500 titles in total.

In [None]:
# Keep only platforms with at least 500 titles.
platforms_count=gamesDB.groupby('platform')['platform'].count()
# sort a new Series instead of sorting a filtered slice in place
large_platforms=platforms_count[platforms_count>=500].sort_values(ascending=False)
print("number of platforms with at least 500 titles: ",large_platforms.shape[0] )
print(large_platforms)

### Let's plot the platforms by the year of the review

In [None]:
platform_list=large_platforms.index.values

#Extract only the platforms with at least 500 titles.
gamesDB_large_platform=gamesDB[gamesDB['platform'].isin(platform_list)]

#Use a pandas pivot table to count the reviews per release year and platform
table_platform_by_year = pd.pivot_table(gamesDB_large_platform,values=['score'],index=['release_year'],columns=['platform'],aggfunc='count',margins=False)
table_platform_by_year['score'].plot(kind='bar', stacked=True,figsize=(12,7),colormap='Accent')


## Let's look at platform and genre in combination.
According to ["How to look smart in meetings"](http://thecooperreview.com/10-tricks-appear-smart-meetings/), one should always draw a Venn diagram... so here is mine :-)

This diagram shows how much of the data we lose by keeping only reviews that are both for a large platform and in a large genre (i.e. at least 500 reviews each).

In [None]:
# Venn diagram of selected rows
# ('Unnamed: 0' is the header-less row-id column from the CSV file)
all_titles_set=set(gamesDB['Unnamed: 0'])
platform_set=set(gamesDB_large_platform['Unnamed: 0'])
genre_set=set(gamesDB_large_genre['Unnamed: 0'])

venn3([all_titles_set, platform_set, genre_set], ('All titles', 'large platforms', 'large genres'))

plt.show()

From this diagram we can see that:

 - 2,938 titles are on a large platform but not in a large genre.
 - 2,449 titles are in a large genre but not on a large platform.
 - 12,806 titles are both on a large platform and in a large genre. This is about 69% of all the data, and this subset will be used in the combined analysis.
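The three numbers above are the sizes of two set differences and one intersection of the row-id sets. A minimal sketch with made-up ids (not the real data); with `all_titles_set`, `platform_set` and `genre_set` from the Venn cell, the same expressions give the region sizes drawn in the diagram and the share of rows kept.

```python
# Toy stand-ins for all_titles_set, platform_set and genre_set
all_titles = set(range(10))
on_large_platform = {0, 1, 2, 3, 4, 5, 6}
in_large_genre = {3, 4, 5, 6, 7, 8}

platform_only = on_large_platform - in_large_genre   # large platform, small genre
genre_only = in_large_genre - on_large_platform      # large genre, small platform
both = on_large_platform & in_large_genre            # the subset kept for analysis
neither = all_titles - (on_large_platform | in_large_genre)

share_kept = len(both) / len(all_titles)
print(len(platform_only), len(genre_only), len(both), len(neither))  # 3 2 4 1
print(f"{share_kept:.0%} of the rows are kept")  # 40% of the rows are kept
```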


In [None]:
#screen for both large genre and platform
gamesDB_large=gamesDB[gamesDB['genre'].isin(genre_list)]
gamesDB_large=gamesDB_large[gamesDB_large['platform'].isin(platform_list)]

#create pivot table of the number of reviews for each platform and genre combination
table_count = pd.pivot_table(gamesDB_large,values=['score'],index=['platform'],columns=['genre'],aggfunc='count',margins=False)
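The two successive filters above can also be written as one boolean mask. A sketch on a toy frame; `toy_genres` and `toy_platforms` are hypothetical stand-ins for `genre_list` and `platform_list`. The rows kept are exactly the "both" region of the Venn diagram.

```python
import pandas as pd

# Made-up rows, for illustration only
toy = pd.DataFrame({
    "platform": ["PC", "PC", "Xbox", "Lynx"],
    "genre": ["Strategy", "Puzzle", "Action", "Action"],
})
toy_genres = ["Strategy", "Action"]   # stand-in for genre_list
toy_platforms = ["PC", "Xbox"]        # stand-in for platform_list

# Combine both conditions element-wise with &; each side is a boolean Series
mask = toy["genre"].isin(toy_genres) & toy["platform"].isin(toy_platforms)
toy_large = toy[mask]
print(toy_large)
```

A single mask avoids building an intermediate frame and makes the "large genre AND large platform" condition explicit.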



## Number of reviews by platform and genre

In [None]:
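sns.set_context("talk") # scale fonts and lines up a bit from the default

# use table_count['score'], since the pivot_table returns MultiIndex columns
# that do not look nice when drawn in the heat map.
# Platform/genre combinations without reviews are NaN (a float), which breaks
# the integer format "d", so fill them with 0 and cast back to int.
ax=sns.heatmap(table_count['score'].fillna(0).astype(int), linewidths=.5,annot=True, fmt="d")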

Action and sports games seem to be the most prevalent across all platforms, except for PC, where strategy games are by far the largest category. This table confirms my suspicion that strategy games are by far the most reviewed genre on PC, which I take as a sign that they are the most popular genre among PC gamers.
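Instead of reading the heat map cell by cell, the largest genre per platform can be pulled out with `DataFrame.idxmax(axis=1)`. A sketch on made-up counts shaped like `table_count['score']` (platforms as rows, genres as columns); applied to the real table it would return one genre per platform. Keep in mind that it measures the number of IGN reviews, which is only a proxy for popularity among players.

```python
import pandas as pd

# Made-up counts, for illustration only (rows: platforms, columns: genres)
counts = pd.DataFrame(
    {"Action": [120, 300], "Sports": [90, 250], "Strategy": [400, 20]},
    index=pd.Index(["PC", "Xbox"], name="platform"),
)

# Column label of the largest value in each row (NaN cells are skipped)
top_genre = counts.idxmax(axis=1)
print(top_genre)  # PC -> Strategy, Xbox -> Action
```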

## Average review score by platform and genre

In [None]:
#create pivot table of the average review score for each platform and genre combination
#(margins=True adds an "All" row and column with the averages over all reviews)
table_avg_score = pd.pivot_table(gamesDB_large,values=['score'],index=['platform'],columns=['genre'],aggfunc='mean',margins=True)


In [None]:
# red-to-blue diverging colormap with a light neutral band in the middle
cmap=sns.diverging_palette(10, 220, sep=80, n=7, as_cmap=True)
sns.set_context("talk") # scale fonts and lines up a bit from the default

ax=sns.heatmap(table_avg_score['score'], linewidths=.5,annot=True,cmap=cmap)

RPG is the genre with the highest average score (7.6), and it has been especially successful on the Xbox and Wireless platforms.

iPhone and Xbox are the platforms with the highest average score, 7.3.
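The averages quoted above come from the "All" margins. With `margins=True`, `pivot_table` computes the "All" row and column over all underlying reviews, not as the mean of the cell averages, so a genre's "All" value weights each platform by its number of reviews. A sketch on made-up scores that shows the difference and picks the highest-scoring genre programmatically (the data is illustrative only).

```python
import pandas as pd

# Made-up reviews, for illustration only
toy = pd.DataFrame({
    "platform": ["PC", "PC", "PC", "PC", "Xbox"],
    "genre": ["RPG", "RPG", "RPG", "Action", "RPG"],
    "score": [9.0, 9.0, 9.0, 5.0, 6.0],
})

avg = pd.pivot_table(toy, values="score", index="platform", columns="genre",
                     aggfunc="mean", margins=True)

# "All" uses every review: (9 + 9 + 9 + 6) / 4, not the mean of 9.0 and 6.0
print(avg.loc["All", "RPG"])   # 8.25
print(avg.loc["PC", "All"])    # 8.0
# Highest-scoring genre, ignoring the grand-total "All" column
best_genre = avg.loc["All"].drop("All").idxmax()
print(best_genre)              # RPG
```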


### Thanks for uploading this interesting dataset, and I hope this notebook has been interesting and useful.