<h1 align="center">Exploratory Analysis : Game of Thrones</h1> 
![Game of Thrones](https://upload.wikimedia.org/wikipedia/en/d/d8/Game_of_Thrones_title_card.jpg)

One of the most popular television series of all time, Game of Thrones is a fantasy drama set on the fictional continents of Westeros and Essos, with multiple plots and a huge cast of characters all battling for the Iron Throne! It is an adaptation of the _A Song of Ice and Fire_ novel series by **George R. R. Martin**.

Being a popular series, it has caught the attention of many, and data scientists are no exception. This notebook presents an **Exploratory Data Analysis (EDA)** of the _Kaggle_ dataset enhanced by _Myles O'Neill_ (more details: [click here](https://www.kaggle.com/mylesoneill/game-of-thrones)). The dataset combines several datasets collected and contributed by multiple people. We use ```battles.csv``` in this notebook. The original battles data was presented by _Chris Albon_; more details are on [GitHub](https://github.com/chrisalbon/war_of_the_five_kings_dataset).

---
The image was taken from Game of Thrones, or from websites created and owned by HBO, the copyright of which is held by HBO. All trademarks and registered trademarks present in the image are proprietary to HBO, the inclusion of which implies no affiliation with the Game of Thrones. The use of such images is believed to fall under the fair dealing clause of copyright law.

## Import required packages

In [1]:
import cufflinks as cf

import pandas as pd
from collections import Counter

# pandas display data frames as tables
from IPython.display import display, HTML



### Set Configurations

In [2]:
cf.set_config_file(theme='white')

In [3]:
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)

## Load Dataset

In this step we load ```battles.csv``` for analysis.

In [4]:
# load the dataset with pandas; importing cufflinks adds .iplot() to pandas objects for the plot.ly plots below
battles_df = pd.read_csv('battles.csv')

In [5]:
# Display sample rows
display(battles_df.head())

| | name | year | battle_number | attacker_king | defender_king | attacker_1 | attacker_2 | attacker_3 | attacker_4 | defender_1 | ... | major_death | major_capture | attacker_size | defender_size | attacker_commander | defender_commander | summer | location | region | note |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Battle of the Golden Tooth | 298 | 1 | Joffrey/Tommen Baratheon | Robb Stark | Lannister | | | | Tully | ... | 1.0 | 0.0 | 15000.0 | 4000.0 | Jaime Lannister | Clement Piper, Vance | 1.0 | Golden Tooth | The Westerlands | |
| 1 | Battle at the Mummer's Ford | 298 | 2 | Joffrey/Tommen Baratheon | Robb Stark | Lannister | | | | Baratheon | ... | 1.0 | 0.0 | | 120.0 | Gregor Clegane | Beric Dondarrion | 1.0 | Mummer's Ford | The Riverlands | |
| 2 | Battle of Riverrun | 298 | 3 | Joffrey/Tommen Baratheon | Robb Stark | Lannister | | | | Tully | ... | 0.0 | 1.0 | 15000.0 | 10000.0 | Jaime Lannister, Andros Brax | Edmure Tully, Tytos Blackwood | 1.0 | Riverrun | The Riverlands | |
| 3 | Battle of the Green Fork | 298 | 4 | Robb Stark | Joffrey/Tommen Baratheon | Stark | | | | Lannister | ... | 1.0 | 1.0 | 18000.0 | 20000.0 | Roose Bolton, Wylis Manderly, Medger Cerwyn, H... | Tywin Lannister, Gregor Clegane, Kevan Lannist... | 1.0 | Green Fork | The Riverlands | |
| 4 | Battle of the Whispering Wood | 298 | 5 | Robb Stark | Joffrey/Tommen Baratheon | Stark | Tully | | | Lannister | ... | 1.0 | 1.0 | 1875.0 | 6000.0 | Robb Stark, Brynden Tully | Jaime Lannister | 1.0 | Whispering Wood | The Riverlands | |


## Explore raw properties

In [6]:
print("Number of attributes available in the dataset = {}".format(battles_df.shape[1]))

Number of attributes available in the dataset = 25


In [7]:
# View available columns and their data types
battles_df.dtypes

name                   object
year                    int64
battle_number           int64
attacker_king          object
defender_king          object
attacker_1             object
attacker_2             object
attacker_3             object
attacker_4             object
defender_1             object
defender_2             object
defender_3            float64
defender_4            float64
attacker_outcome       object
battle_type            object
major_death           float64
major_capture         float64
attacker_size         float64
defender_size         float64
attacker_commander     object
defender_commander     object
summer                float64
location               object
region                 object
note                   object
dtype: object

<h3 align="center">Battles for the Iron Throne</h3> 
![Throne](https://res.cloudinary.com/beamly/image/upload/s--FJg3Gevq--/c_fill,g_face,q_70,w_479/f_jpg/v1/tvbuzz/sites/7/2015/02/GameofThronesIronThrone.jpg)

In [8]:
# Analyze properties of numerical columns
battles_df.describe()

| | year | battle_number | defender_3 | defender_4 | major_death | major_capture | attacker_size | defender_size | summer |
|---|---|---|---|---|---|---|---|---|---|
| count | 38.0 | 38.0 | 0.0 | 0.0 | 37.0 | 37.0 | 24.0 | 19.0 | 37.0 |
| mean | 299.105263 | 19.5 | | | 0.351351 | 0.297297 | 9942.541667 | 6428.157895 | 0.702703 |
| std | 0.68928 | 11.113055 | | | 0.483978 | 0.463373 | 20283.092065 | 6225.182106 | 0.463373 |
| min | 298.0 | 1.0 | | | 0.0 | 0.0 | 20.0 | 100.0 | 0.0 |
| 25% | 299.0 | 10.25 | | | 0.0 | 0.0 | 1375.0 | 1070.0 | 0.0 |
| 50% | 299.0 | 19.5 | | | 0.0 | 0.0 | 4000.0 | 6000.0 | 1.0 |
| 75% | 300.0 | 28.75 | | | 1.0 | 1.0 | 8250.0 | 10000.0 | 1.0 |
| max | 300.0 | 38.0 | | | 1.0 | 1.0 | 100000.0 | 20000.0 | 1.0 |


---

## Number of Battles Fought
This data covers events up to **season 5** only.

In [9]:
print("Number of battles fought={}".format(battles_df.shape[0]))

Number of battles fought=38


## Battle Distribution Across Years
The plot below shows that the year 299 saw the most fighting, with a total of 20 battles fought!

In [10]:
battles_df.year.value_counts().iplot(kind='barh',
                                    xTitle='Number of Battles',
                                    yTitle='Year',
                                    title='Battle Distribution over Years',
                                    showline=True)
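
`value_counts()` orders its result by frequency, so the bars above are ranked by count rather than laid out chronologically. A minimal sketch, using a small made-up `years` series rather than the real data, of restoring year order before plotting:

```python
import pandas as pd

# Toy stand-in for battles_df.year; the values are illustrative only
years = pd.Series([298, 299, 299, 300, 299, 300], name="year")

# value_counts() sorts by frequency; sort_index() restores chronological order
battles_per_year = years.value_counts().sort_index()
print(battles_per_year)

# idxmax() returns the index label (the year) with the highest count
busiest_year = battles_per_year.idxmax()
print("Busiest year:", busiest_year)
```

On the real data the same chain would be applied to `battles_df.year` before calling `.iplot(...)`.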

## Which Regions Saw the Most Battles?
<img src="https://racefortheironthrone.files.wordpress.com/2016/11/riverlands-political-map.jpg?w=580&h=781" alt="RiverLands" style="width: 200px;" align="left"/> **The Riverlands** seem to be the favorite battleground, followed by **The North**. Interestingly, up to season 5 there was only 1 battle beyond the Wall. Spoiler alert: Winter is coming!

In [11]:
battles_df.region.value_counts().iplot(kind='bar',
                                    xTitle='Regions',
                                    yTitle='Number of Battles',
                                    title='Battles by Regions',
                                    showline=True)

### Death or Capture of Main Characters by Region

No prizes for guessing that the Riverlands have seen several main characters killed or captured. Though _The Reach_ has seen 2 battles, none of the major characters seem to have fallen there.

In [12]:
battles_df.groupby('region').agg({'major_death':'sum',
                                  'major_capture':'sum'}).iplot(kind='bar')

## Who Attacked the most?
The Baratheon boys love attacking: they lead the pack with 38% of the battles, while Robb Stark comes a close second as the attacker in 27.8%.

<img src="http://vignette3.wikia.nocookie.net/gameofthrones/images/4/4c/JoffreyBaratheon-Profile.PNG/revision/latest?cb=20160626094917" alt="joffrey" style="width: 200px;" align="left"/>  <img src="https://meninblazers.com/.image/t_share/MTMwMDE5NTU4NTI5NDk1MDEw/tumblr_mkzsdafejy1r2xls3o1_400.png" alt="robb" style="width: 200px; height: 200px" align="right"/>

In [13]:
# rename_axis + reset_index(name=...) yields 'king' and 'battle_count' columns regardless of pandas version
king_attacked = battles_df.attacker_king.value_counts().rename_axis('king').reset_index(name='battle_count')
king_attacked.iplot(kind='pie',labels='king',values='battle_count')
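
The percentage shares quoted for each king can also be computed directly instead of being read off the pie chart. A minimal sketch on a made-up `attacker_king` series (the king names and counts here are placeholders, not the real data), showing that `value_counts(normalize=True)` ignores missing values when computing shares:

```python
import pandas as pd

# Toy stand-in for battles_df.attacker_king; names and counts are illustrative only
attacker_king = pd.Series(
    ["King A", "King A", "King B", None, "King A", "King B", "King C", "King A"]
)

# normalize=True returns each value's fraction of the non-null rows
# (missing values are dropped by default), converted here to percentages
share = attacker_king.value_counts(normalize=True).mul(100).round(1)
print(share)
```

Because rows with no recorded king are excluded, the shares are relative to battles with a known attacking king, not to all battles.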

## Who Defended the most?
Robb Stark and the Baratheon boys are again at the top of the pack. It looks like they have been on either side of the war many times.

In [14]:
# rename_axis + reset_index(name=...) yields 'king' and 'battle_count' columns regardless of pandas version
king_defended = battles_df.defender_king.value_counts().rename_axis('king').reset_index(name='battle_count')
king_defended.iplot(kind='pie',labels='king',values='battle_count')

## Battle Style Distribution
Plenty of battles all across, yet the men of Westeros and Essos are men of honor. 
This is visible in the distribution which shows **pitched battle** as the most common style of battle.

In [15]:
battles_df.battle_type.value_counts().iplot(kind='barh')

## Attack or Defend?
Defending your place in Westeros isn't easy: attackers won 32 of the 37 battles that have a recorded outcome.

In [16]:
battles_df.attacker_outcome.value_counts().iplot(kind='barh')

## Winners
Who remembers losers? (except if you love the Starks)
The following plot helps us understand who won how many battles and how, by attacking or defending.

In [17]:
# battles won as the attacker (attacker_outcome == 'win')
attack_winners = (battles_df[battles_df.attacker_outcome=='win']['attacker_king']
                  .value_counts().rename_axis('king').reset_index(name='attack_wins'))

# battles won as the defender (attacker_outcome == 'loss')
defend_winners = (battles_df[battles_df.attacker_outcome=='loss']['defender_king']
                  .value_counts().rename_axis('king').reset_index(name='defend_wins'))

# outer merge keeps kings who won only by attacking or only by defending
winner_df = pd.merge(attack_winners,defend_winners,how='outer',on='king')
winner_df.fillna(0,inplace=True)
winner_df['total_wins'] = winner_df['attack_wins'] + winner_df['defend_wins']
winner_df[['king','attack_wins','defend_wins']].set_index('king').iplot(kind='bar',barmode='stack',
                                                                        xTitle='King',
                                                                        yTitle='Number of Wins',
                                                                        title='Wins per King',
                                                                        showline=True)
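
As an alternative to merging two separate counts, the same kind of tally can be sketched with `pd.crosstab`. The toy frame below mirrors the three columns used above, but its rows are invented for illustration. The helper columns `winner` and `won_by` are hypothetical names, and rows with a missing outcome are assumed to have been dropped beforehand:

```python
import pandas as pd

# Toy rows mirroring the three columns used above; values are illustrative only
toy = pd.DataFrame({
    "attacker_king": ["King A", "King A", "King B", "King B"],
    "defender_king": ["King B", "King B", "King A", "King A"],
    "attacker_outcome": ["win", "loss", "win", "win"],
})

attacker_won = toy["attacker_outcome"] == "win"

# The winner is the attacker on a 'win' and the defender otherwise
# (assumes no missing outcomes remain in the frame)
toy["winner"] = toy["attacker_king"].where(attacker_won, toy["defender_king"])
toy["won_by"] = attacker_won.map({True: "attack_wins", False: "defend_wins"})

# One row per winning king, one column per way of winning
wins = pd.crosstab(toy["winner"], toy["won_by"])
print(wins)
```

The resulting frame is already indexed by king, so it could be passed to a stacked bar plot without the merge and `fillna` steps.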

## Battle Commanders
A battle requires as much brains as muscle power. 
The following is a distribution of the number of commanders involved on attacking and defending sides.

In [18]:
# commander names are comma-separated, so split on ',' (a bare split() would count words, not commanders)
battles_df['attack_commander_count'] = battles_df['attacker_commander'].str.split(',').str.len()
battles_df['defend_commander_count'] = battles_df['defender_commander'].str.split(',').str.len()

In [19]:
battles_df[['attack_commander_count',
            'defend_commander_count']].iplot(kind='box',boxpoints='suspectedoutliers')
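
The commander columns hold comma-separated names, as the sample rows near the top show, so counting commanders means splitting on commas rather than on whitespace. A minimal sketch of the difference, using two commander strings copied from those sample rows plus one missing value:

```python
import pandas as pd

# Two commander strings from the sample rows above, plus a missing value
commanders = pd.Series(["Jaime Lannister, Andros Brax", "Jaime Lannister", None])

# Splitting on whitespace counts words; splitting on ',' counts names
by_whitespace = commanders.str.split().str.len()
by_comma = commanders.str.split(",").str.len()

print(pd.DataFrame({"words": by_whitespace, "commanders": by_comma}))
```

Missing entries stay missing in both columns, so they do not distort a box plot of the counts.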

## How many houses fought in a battle?
Were the battles evenly balanced? The plots tell the whole story.

<img src="https://c1.staticflickr.com/4/3893/14834104277_54d309b4ca_b.jpg" style="height: 200px;"/>

In [20]:
# count the non-null house columns on each side
battles_df['attacker_house_count'] = battles_df[['attacker_1',
                                                 'attacker_2',
                                                 'attacker_3',
                                                 'attacker_4']].notnull().sum(axis=1)

battles_df['defender_house_count'] = battles_df[['defender_1',
                                                 'defender_2',
                                                 'defender_3',
                                                 'defender_4']].notnull().sum(axis=1)

battles_df['total_involved_count'] = battles_df['attacker_house_count'] + battles_df['defender_house_count']
battles_df['bubble_text'] = battles_df.apply(lambda row: '{} had {} house(s) attacking {} house(s) '.format(row['name'],
                                                                                                            row['attacker_house_count'],
                                                                                                            row['defender_house_count']),
                                             axis=1)

## Unbalanced Battles
Most battles so far have seen more houses forming alliances while attacking. 
There are only a few friends when you are under attack!

In [21]:
house_balance = battles_df[battles_df.attacker_house_count != battles_df.defender_house_count][['name',
                                                                                'attacker_house_count',
                                                                                'defender_house_count']].set_index('name')
house_balance.iplot(kind='bar',tickangle=-25)

## Battles and the Size of Armies
Attackers don't take any chances: they come in huge numbers, so keep your eyes open.

In [22]:
battles_df.dropna(subset=['total_involved_count',
                          'attacker_size',
                          'defender_size',
                         'bubble_text']).iplot(kind='bubble', 
                                                  x='defender_size',
                                                  y='attacker_size',
                                                  size='total_involved_count',
                                                  text='bubble_text',
                                                  #color='red',
                                                  xTitle='Defender Size', 
                                                  yTitle='Attacker Size')

## Archenemies?
The Stark-Baratheon friendship has taken a complete U-turn with a total of 19 battles and counting. Indeed there is no one to be trusted in this land.

In [23]:
temp_df = battles_df.dropna(subset = ["attacker_king", 
                                      "defender_king"])[[
                                                    "attacker_king", 
                                                    "defender_king"
                                                        ]]

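# sort each pair so (A, B) and (B, A) map to the same key; set iteration order is not guaranteed
archenemy_df = pd.DataFrame(list(Counter([tuple(sorted(king_pair))
                                          for king_pair in temp_df.values
                                          if king_pair[0] != king_pair[1]]).items()),
                              columns=['king_pair','battle_count'])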

archenemy_df['versus_text'] = archenemy_df.apply(lambda row:
                                                 '{} Vs {}'.format(
                                                     row['king_pair'][0], 
                                                     row['king_pair'][1]),
                                                 axis=1)
archenemy_df.sort_values('battle_count',
                         inplace=True,
                         ascending=False)

In [24]:
archenemy_df[['versus_text',
              'battle_count']].set_index('versus_text').iplot(
                                                            kind='bar')
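
Counting rivalries means treating each attacker/defender pair as unordered, so that A attacking B and B attacking A land in the same bucket. A minimal standard-library sketch of that idea, using invented king names rather than the real data:

```python
from collections import Counter

# Toy (attacker_king, defender_king) pairs; the names are illustrative only
battles = [
    ("King A", "King B"),
    ("King B", "King A"),  # same rivalry, roles reversed
    ("King A", "King C"),
    ("King C", "King C"),  # same king on both sides: skipped
]

# sorted() gives every unordered pair a single canonical tuple
pair_counts = Counter(
    tuple(sorted(pair)) for pair in battles if pair[0] != pair[1]
)
print(pair_counts.most_common())
```

Sorting is used here instead of `set` because a sorted tuple has a well-defined order, which keeps the pair labels stable from run to run.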

---
Note: A lot more exploration is possible with the remaining attributes and their different combinations. This is just the tip of the iceberg.