## Game of Thrones Analysis

The first step of this exercise is to import the pandas, NumPy and datetime libraries:

In [1]:
import pandas as pd
import numpy as np
import datetime as dt

The dataset is stored in Dropbox. We import it using the following code:

In [2]:
GoT_df = pd.read_csv('https://www.dropbox.com/s/qx1r7c872qp776y/Game_of_Thrones.csv?dl=1')

In the previous cell we named our dataframe 'GoT_df'. We use the .head() method to preview its first five rows:

In [3]:
GoT_df.head()

Unnamed: 0,Season,No. of Episode (Season),No. of Episode (Overall),Title of the Episode,Running Time (Minutes),Directed by,Written by,Original Air Date,U.S. Viewers (Millions),Music by,Cinematography by,Editing by,IMDb Rating,Rotten Tomatoes Rating (Percentage),Metacritic Ratings,Ordered,Filming Duration,Novel(s) Adapted,Synopsis
0,1,1,1,Winter Is Coming,61,Tim Van Patten,"David Benioff, D. B. Weiss",4/17/2011,2.22,Ramin Djawadi,Alik Sakharov,Oral Norrie Ottey,8.9,100,9.1,3/2/2010,Second half of 2010,A Game of Thrones,"North of the Seven Kingdoms of Westeros, Night..."
1,1,2,2,The Kingsroad,55,Tim Van Patten,"David Benioff, D. B. Weiss",4/24/2011,2.2,Ramin Djawadi,Alik Sakharov,Oral Norrie Ottey,8.6,100,8.9,3/2/2010,Second half of 2010,A Game of Thrones,"Ned, the new Hand of the King, travels to King..."
2,1,3,3,Lord Snow,57,Brian Kirk,"David Benioff, D. B. Weiss",5/1/2011,2.44,Ramin Djawadi,Marco Pontecorvo,Frances Parker,8.5,81,8.7,3/2/2010,Second half of 2010,A Game of Thrones,Ned attends the King's Small Council and learn...
3,1,4,4,"Cripples, Bastards, and Broken Things",55,Brian Kirk,Bryan Cogman,5/8/2011,2.45,Ramin Djawadi,Marco Pontecorvo,Frances Parker,8.6,100,9.1,3/2/2010,Second half of 2010,A Game of Thrones,"While returning to King's Landing, Tyrion stop..."
4,1,5,5,The Wolf and the Lion,54,Brian Kirk,"David Benioff, D. B. Weiss",5/15/2011,2.58,Ramin Djawadi,Marco Pontecorvo,Frances Parker,9.0,95,9.0,3/2/2010,Second half of 2010,A Game of Thrones,"King Robert's eunuch spy, Varys, has uncovered..."


Next, we check the data types to see whether any columns need to be converted:

In [4]:
GoT_df.dtypes

Season                                   int64
No. of Episode (Season)                  int64
No. of Episode (Overall)                 int64
Title of the Episode                    object
Running Time (Minutes)                   int64
Directed by                             object
Written by                              object
Original Air Date                       object
U.S. Viewers (Millions)                float64
Music by                                object
Cinematography by                       object
Editing by                              object
IMDb Rating                            float64
Rotten Tomatoes Rating (Percentage)      int64
Metacritic Ratings                     float64
Ordered                                 object
Filming Duration                        object
Novel(s) Adapted                        object
Synopsis                                object
dtype: object

We can see that the Original Air Date and Ordered columns have the 'object' data type, meaning the dates are stored as plain strings. We need to convert both to a datetime format:

In [5]:
GoT_df['Original Air Date'] = pd.to_datetime(GoT_df['Original Air Date'],format = "%m/%d/%Y")
GoT_df['Ordered'] = pd.to_datetime(GoT_df['Ordered'],format = "%m/%d/%Y")

We then check .dtypes again to verify that both columns now have the correct data type:

In [6]:
GoT_df.dtypes

Season                                          int64
No. of Episode (Season)                         int64
No. of Episode (Overall)                        int64
Title of the Episode                           object
Running Time (Minutes)                          int64
Directed by                                    object
Written by                                     object
Original Air Date                      datetime64[ns]
U.S. Viewers (Millions)                       float64
Music by                                       object
Cinematography by                              object
Editing by                                     object
IMDb Rating                                   float64
Rotten Tomatoes Rating (Percentage)             int64
Metacritic Ratings                            float64
Ordered                                datetime64[ns]
Filming Duration                               object
Novel(s) Adapted                               object
Synopsis                                       object
dtype: object
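As a small, self-contained illustration of the conversion above, the sketch below applies the same format string to three air dates taken from the preview. The `sample` frame is a hypothetical stand-in for `GoT_df`, used only so the snippet runs without downloading the dataset.

```python
import pandas as pd

# Hypothetical three-row stand-in for GoT_df, using air dates from the preview.
sample = pd.DataFrame({"Original Air Date": ["4/17/2011", "4/24/2011", "5/1/2011"]})

# "%m/%d/%Y" reads month/day/four-digit year; unpadded values such as "4" or "1" parse fine.
sample["Original Air Date"] = pd.to_datetime(sample["Original Air Date"], format="%m/%d/%Y")

# After conversion the column exposes the .dt accessor for date parts.
years = sample["Original Air Date"].dt.year.tolist()
months = sample["Original Air Date"].dt.month.tolist()
print(sample.dtypes)
print(years, months)
```

Passing an explicit `format` makes the parsing unambiguous: a date like 5/1/2011 is read as May 1 rather than January 5.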

With the columns in their correct data types, we can start working on our analysis. The first question we need to answer is: what is the average viewership per season? We group the episodes by season and average the U.S. viewer counts:

In [7]:
GoT_df_Season_Info = GoT_df.groupby('Season')

GoT_df_Season_Info_Views = GoT_df_Season_Info['U.S. Viewers (Millions)'].agg(np.average)

print (GoT_df_Season_Info_Views)

Season
1     2.515000
2     3.795000
3     4.966000
4     6.846000
5     6.880000
6     7.688000
7    10.261429
8    11.993333
Name: U.S. Viewers (Millions), dtype: float64
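To make the group-then-average step easier to follow, here is a minimal sketch on a toy frame. The values are made up for illustration and are not taken from the dataset; the built-in `.mean()` is used here and should give the same result as `.agg(np.average)` for an unweighted average.

```python
import pandas as pd

# Toy data (made-up values, NOT the real viewership figures).
toy = pd.DataFrame(
    {
        "Season": [1, 1, 2, 2],
        "U.S. Viewers (Millions)": [2.0, 3.0, 4.0, 6.0],
    }
)

# Split the rows by season, then average the viewers within each group.
season_means = toy.groupby("Season")["U.S. Viewers (Millions)"].mean()
print(season_means)
```

The result is a Series indexed by season, which is why the output above lists one average per season number.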


The second question we need to answer is: which seasons are the longest and shortest in terms of running time? We sum the episode running times for each season:

In [8]:
GoT_df_Season_Info_Time = GoT_df_Season_Info['Running Time (Minutes)'].agg(np.sum)

print (GoT_df_Season_Info_Time)

Season
1    557
2    540
3    551
4    542
5    555
6    555
7    434
8    423
Name: Running Time (Minutes), dtype: int64
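Rather than reading the longest and shortest seasons off the printed totals by eye, they can be picked out with `idxmax` and `idxmin`. The sketch below rebuilds the totals from the output above as a standalone Series; the name `season_minutes` is an assumption introduced for this example.

```python
import pandas as pd

# Season totals copied from the printed output above.
season_minutes = pd.Series(
    [557, 540, 551, 542, 555, 555, 434, 423],
    index=pd.Index(range(1, 9), name="Season"),
    name="Running Time (Minutes)",
)

# idxmax / idxmin return the index label (the season number), not the value.
longest = season_minutes.idxmax()
shortest = season_minutes.idxmin()

print(f"Longest: Season {longest} ({season_minutes[longest]} minutes)")
print(f"Shortest: Season {shortest} ({season_minutes[shortest]} minutes)")
```

With these totals, Season 1 is the longest at 557 minutes and Season 8 the shortest at 423 minutes. On the real data the same two calls could be applied directly to `GoT_df_Season_Info_Time`.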


The third question we need to answer is: which seasons are the best and worst rated on IMDb, Rotten Tomatoes and Metacritic? We average each of the three ratings per season:

In [9]:
GoT_df_Season_Info_Ratings = (GoT_df_Season_Info['IMDb Rating'].agg(np.average), 
GoT_df_Season_Info['Rotten Tomatoes Rating (Percentage)'].agg(np.average), 
GoT_df_Season_Info['Metacritic Ratings'].agg(np.average))

print (GoT_df_Season_Info_Ratings)

(Season
1    8.970000
2    8.810000
3    8.930000
4    9.230000
5    8.710000
6    8.990000
7    9.028571
8    6.416667
Name: IMDb Rating, dtype: float64, Season
1    97.100000
2    97.000000
3    93.700000
4    96.700000
5    89.500000
6    92.400000
7    91.857143
8    67.833333
Name: Rotten Tomatoes Rating (Percentage), dtype: float64, Season
1    9.120000
2    8.700000
3    8.750000
4    8.950000
5    8.300000
6    6.700000
7    5.857143
8    4.083333
Name: Metacritic Ratings, dtype: float64)


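As we can see, Season 1 is the top-rated season on Rotten Tomatoes (97.1) and Metacritic (9.12), while Season 4 is the top-rated season on IMDb (9.23). Season 8 is the lowest-rated season on all three rating systems.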
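The best and worst season per rating system can also be checked programmatically. The sketch below collects the season averages printed above into one DataFrame and applies `idxmax` and `idxmin` column by column; `season_ratings` is a name introduced here for illustration.

```python
import pandas as pd

# Season averages copied from the printed output above (rounded as shown there).
season_ratings = pd.DataFrame(
    {
        "IMDb Rating": [8.97, 8.81, 8.93, 9.23, 8.71, 8.99, 9.028571, 6.416667],
        "Rotten Tomatoes Rating (Percentage)": [97.1, 97.0, 93.7, 96.7, 89.5, 92.4, 91.857143, 67.833333],
        "Metacritic Ratings": [9.12, 8.70, 8.75, 8.95, 8.30, 6.70, 5.857143, 4.083333],
    },
    index=pd.Index(range(1, 9), name="Season"),
)

# For each rating column, find the season with the highest and lowest average.
best = season_ratings.idxmax()
worst = season_ratings.idxmin()

print(pd.DataFrame({"Best season": best, "Worst season": worst}))
```

On the full dataset, a table like `season_ratings` could be produced in one step by selecting the three rating columns on the grouped object and calling `.mean()`, which avoids printing a tuple of three separate Series.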