## 20 Years of Games -- IGN Dataset

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import warnings
%matplotlib inline
warnings.filterwarnings("ignore")



ign_df = pd.read_csv("../input/ign.csv")  # load the dataset into a DataFrame


ign_df = ign_df.drop("url", axis=1)  # the url column is not needed for the analysis

colnames = list(ign_df.columns.values)
print(ign_df.shape[0])

ign_df.head()

### Console wars with data: What's the best platform? 

In [None]:

g_per_platform = []
for platform,data in ign_df.groupby('platform'):
    if data.shape[0] >= 70:  # skip platforms with few games
        g_per_platform.append((platform,data.score.mean(),data.shape[0]))
    
g_per_platform = pd.DataFrame.from_records(g_per_platform,
                                    columns=('name','avg_score','n_games'))
# DataFrame.sort was removed from pandas; sort_values is its replacement
g_per_platform = g_per_platform.sort_values('avg_score')

g_per_platform

It's sad to see Nintendo taking the first four places for worst average score. I expected the Nintendo Wii to head the list, but the GBC is a surprise. It also confirms what we already knew: the Macintosh is the ultimate gaming platform (sarcasm!).
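
The loop above can also be written as a single grouped aggregation. This is only a minimal sketch, assuming a pandas version that supports named aggregation (0.25 or later). The `toy_df` frame and the `min_games` threshold of 2 are invented for illustration; they stand in for `ign_df` and the 70-game cutoff.

```python
import pandas as pd

# Hypothetical toy data standing in for ign_df (values are invented).
toy_df = pd.DataFrame({
    "platform": ["PC", "PC", "PC", "Wii", "Wii", "Macintosh"],
    "score": [8.0, 9.0, 7.0, 5.0, 6.0, 9.5],
})

min_games = 2  # illustrative threshold; the notebook uses 70

# One row per platform: mean score and number of (non-missing) scores.
per_platform = (
    toy_df.groupby("platform")["score"]
    .agg(avg_score="mean", n_games="count")
    .reset_index()
    .rename(columns={"platform": "name"})
)

# Drop platforms with too few games, then sort from worst to best average.
per_platform = per_platform[per_platform["n_games"] >= min_games]
per_platform = per_platform.sort_values("avg_score").reset_index(drop=True)
print(per_platform)
```

Note that `"count"` ignores rows with a missing score, whereas `data.shape[0]` in the loop counts every row; the two agree only when no scores are missing.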

In [None]:
# remove platforms with a small number of entries
df= ign_df[ign_df['platform'].isin(g_per_platform.name)]

hist = df['score'].hist(alpha=0.5,bins=40)
hist.set_xlabel('Scores')
hist.set_ylabel('Number of games')

The entire dataset doesn't seem to follow a well-known distribution. The mode lies around 8.0, which suggests that a game picked at random is more likely to be good than bad.
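
One way to put numbers on that impression is to compute the mode and the share of games at or above some "good" cutoff. The sketch below uses invented scores in place of `df['score']`, and the cutoff of 7.0 is purely an assumption, not something defined by the dataset.

```python
import pandas as pd

# Hypothetical scores standing in for df['score'] (values are invented).
scores = pd.Series([8.0, 8.0, 7.5, 9.0, 4.0, 6.5, 8.0, 3.0])

good_threshold = 7.0  # assumed cutoff for a "good" game

most_common = scores.mode().iloc[0]              # most frequent score
share_good = (scores >= good_threshold).mean()   # fraction at or above the cutoff

print(most_common, share_good)
```

On the real data, replacing `scores` with `df['score']` should give the corresponding figures; the conclusion depends on where the cutoff is placed.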

### The platform with the most "Editor's Choice" awards

In [None]:
awards = []
award_rates = []
for plat,data in df.groupby('platform'):
    n_awards = data[data['editors_choice'] == 'Y'].shape[0]
    
    awards.append((plat,n_awards,data.shape[0],n_awards/data.shape[0]))
awards = pd.DataFrame.from_records(awards,
                                    columns=('platform','n_awards','n_games','award_rate'))



awards

In [None]:
# pass column names rather than a Series so the x labels are the platforms
bar = awards.plot.bar(x='platform', y='award_rate',
                      legend=False)
bar.set_ylabel('award_rate')

And again, the Macintosh takes the prize! For every two Mac games reviewed on the site, one gets the Editor's Choice. The number of Mac games is small, though (81 samples in the database), so we can't draw firm conclusions.
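
To get a feel for how much uncertainty 81 samples leave, one option is a Wilson score interval for the award rate. This is a technique brought in for illustration, not part of the original analysis. The count of 40 awards is an assumed placeholder consistent with "one in two"; the real value would come from the `awards` table.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# 81 games comes from the text; 40 awards is an assumed, illustrative count.
low, high = wilson_interval(40, 81)
print(round(low, 2), round(high, 2))
```

With these placeholder numbers the interval spans roughly 0.39 to 0.60, so the true award rate could plausibly sit well below or above one half.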

### Game releases by month
It's common knowledge that a large share of games come out at the end of the year. Let's count releases by month to see if it's true.

In [None]:
month_list = ['January','February','March','April','May','June',
                 'July','August','September','October',
                     'November','December']

by_month = []
for month,data in ign_df.groupby('release_month'):
    by_month.append((month_list[month-1],data.score.mean(),data.shape[0]))
                    
by_month = pd.DataFrame.from_records(by_month,
                                    columns=('month','avg_score','n_games'))

by_month

In [None]:
by_month = by_month[['month','n_games']]
# label slices from the data so labels match the months actually present
pie = by_month.plot.pie(y='n_games', labels=by_month['month'].tolist(),
                       legend=False)
pie.set_ylabel('Releases per month')

That confirms what we already knew: almost 30% of all games are published in October and November. Also notice that the average score in December is lower than the overall average. Maybe the December games are released in a hurry!
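
Both observations can be checked directly rather than read off the table. The following is a minimal sketch on invented data; `toy_df` stands in for `ign_df` and only borrows its `release_month` and `score` column names.

```python
import pandas as pd

# Hypothetical toy data standing in for ign_df (values are invented).
toy_df = pd.DataFrame({
    "release_month": [10, 10, 11, 11, 12, 12, 3],
    "score": [8.0, 7.0, 9.0, 8.0, 5.0, 6.0, 7.0],
})

overall_avg = toy_df["score"].mean()
monthly_avg = toy_df.groupby("release_month")["score"].mean()

# Fraction of all releases that fall in each month.
release_share = toy_df["release_month"].value_counts(normalize=True)

# Negative values mean the month scores below the overall average.
diff_from_overall = monthly_avg - overall_avg
print(diff_from_overall.round(2))
print(release_share.loc[[10, 11]].sum())  # combined October + November share
```

Comparing against the overall mean of all games, rather than the mean of the twelve monthly averages, keeps busy months from being underweighted.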

### Growth of the industry

Let's look at the growth of the game industry in terms of releases per year.

In [None]:
by_year = []
for year,data in ign_df.groupby('release_year'):
    by_year.append((year,data.score.mean(),data.shape[0]))
                    
by_year = pd.DataFrame.from_records(by_year,
                                    columns=('year','avg_score','n_games'))

bar = by_year.plot.bar(x='year', y='n_games',
                       legend=False)

by_year

I didn't expect that: there seems to be a slowdown in the game industry. But before drawing any conclusions, let's filter for unique titles and do it again.

(The 1970 game is an 'Easter egg' introduced by IGN.)

In [None]:
by_year = []
no_duplicates = ign_df[['title','release_year']].drop_duplicates()
for year,data in no_duplicates.groupby('release_year'):
    by_year.append((year,data.shape[0]))
                    
by_year = pd.DataFrame.from_records(by_year,
                                    columns=('year','n_games'))

bar = by_year.plot.bar(x='year', y='n_games',
                       legend=False)

by_year

The shape is roughly the same. Maybe IGN is publishing fewer reviews? Or maybe the game industry really is in decline?