<a href="https://colab.research.google.com/github/ccal2/dataScienceProject/blob/master/project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction


This project uses two `.csv` files containing data about the books of the *Game of Thrones* series.

The files were downloaded directly from a [Kaggle](https://www.kaggle.com/mylesoneill/game-of-thrones) dataset.


The `battles.csv` file describes several battles that take place over the course of the story, and `character-deaths.csv` lists characters along with data about their deaths.

# Setup

**Remember to upload the `battles.csv` and `character-deaths.csv` files.**
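
A quick check along these lines could confirm that the upload worked before any data is read. It is only a sketch: the helper name `missing_files` is hypothetical, and it assumes the files sit in the notebook's working directory.

```python
from pathlib import Path

# File names taken from the reminder above.
REQUIRED_FILES = ["battles.csv", "character-deaths.csv"]


def missing_files(names, folder="."):
    """Return the names in `names` that are not present in `folder`."""
    return [name for name in names if not (Path(folder) / name).is_file()]


missing = missing_files(REQUIRED_FILES)
if missing:
    print("Upload these files before continuing:", ", ".join(missing))
else:
    print("All input files found.")
```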

In [1]:
import pandas as pd
import numpy as np

In [2]:
battles = pd.read_csv('battles.csv')
deaths = pd.read_csv('character-deaths.csv')

# Preprocessing

## Battles

In [3]:
battles.head()

Unnamed: 0,name,year,battle_number,attacker_king,defender_king,attacker_1,attacker_2,attacker_3,attacker_4,defender_1,defender_2,defender_3,defender_4,attacker_outcome,battle_type,major_death,major_capture,attacker_size,defender_size,attacker_commander,defender_commander,summer,location,region,note
0,Battle of the Golden Tooth,298,1,Joffrey/Tommen Baratheon,Robb Stark,Lannister,,,,Tully,,,,win,pitched battle,1.0,0.0,15000.0,4000.0,Jaime Lannister,"Clement Piper, Vance",1.0,Golden Tooth,The Westerlands,
1,Battle at the Mummer's Ford,298,2,Joffrey/Tommen Baratheon,Robb Stark,Lannister,,,,Baratheon,,,,win,ambush,1.0,0.0,,120.0,Gregor Clegane,Beric Dondarrion,1.0,Mummer's Ford,The Riverlands,
2,Battle of Riverrun,298,3,Joffrey/Tommen Baratheon,Robb Stark,Lannister,,,,Tully,,,,win,pitched battle,0.0,1.0,15000.0,10000.0,"Jaime Lannister, Andros Brax","Edmure Tully, Tytos Blackwood",1.0,Riverrun,The Riverlands,
3,Battle of the Green Fork,298,4,Robb Stark,Joffrey/Tommen Baratheon,Stark,,,,Lannister,,,,loss,pitched battle,1.0,1.0,18000.0,20000.0,"Roose Bolton, Wylis Manderly, Medger Cerwyn, H...","Tywin Lannister, Gregor Clegane, Kevan Lannist...",1.0,Green Fork,The Riverlands,
4,Battle of the Whispering Wood,298,5,Robb Stark,Joffrey/Tommen Baratheon,Stark,Tully,,,Lannister,,,,win,ambush,1.0,1.0,1875.0,6000.0,"Robb Stark, Brynden Tully",Jaime Lannister,1.0,Whispering Wood,The Riverlands,


In [4]:
# (rows, columns)
battles.shape

(38, 25)

### Resetting the *index*

The `battle_number` column is a unique ID assigned to each battle. We can confirm this by comparing the number of rows in the *data frame* (38) with the number of unique values in that column:

In [5]:
battles['battle_number'].nunique()

38
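
The same uniqueness check can be written as explicit boolean tests. The sketch below runs on a small made-up frame (`toy` is not the real data) and also checks for missing IDs, since `nunique()` ignores `NaN` by default.

```python
import pandas as pd

# Toy stand-in for the battles table (made-up rows, not the real data).
toy = pd.DataFrame({
    "battle_number": [1, 2, 3],
    "name": ["Battle A", "Battle B", "Battle C"],
})

ids = toy["battle_number"]

# Same comparison as in the text: unique values vs. number of rows.
same_count = ids.nunique() == len(toy)

# nunique() skips NaN by default, so also check for missing IDs explicitly.
no_missing = ids.notna().all()

# Series.is_unique answers the "no duplicates" question directly.
print(same_count, no_missing, ids.is_unique)
```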

Since we have a unique ID, we can use it to replace the automatic *index*:

In [6]:
battles.set_index('battle_number', inplace=True)
battles.head()

battle_number,name,year,attacker_king,defender_king,attacker_1,attacker_2,attacker_3,attacker_4,defender_1,defender_2,defender_3,defender_4,attacker_outcome,battle_type,major_death,major_capture,attacker_size,defender_size,attacker_commander,defender_commander,summer,location,region,note
1,Battle of the Golden Tooth,298,Joffrey/Tommen Baratheon,Robb Stark,Lannister,,,,Tully,,,,win,pitched battle,1.0,0.0,15000.0,4000.0,Jaime Lannister,"Clement Piper, Vance",1.0,Golden Tooth,The Westerlands,
2,Battle at the Mummer's Ford,298,Joffrey/Tommen Baratheon,Robb Stark,Lannister,,,,Baratheon,,,,win,ambush,1.0,0.0,,120.0,Gregor Clegane,Beric Dondarrion,1.0,Mummer's Ford,The Riverlands,
3,Battle of Riverrun,298,Joffrey/Tommen Baratheon,Robb Stark,Lannister,,,,Tully,,,,win,pitched battle,0.0,1.0,15000.0,10000.0,"Jaime Lannister, Andros Brax","Edmure Tully, Tytos Blackwood",1.0,Riverrun,The Riverlands,
4,Battle of the Green Fork,298,Robb Stark,Joffrey/Tommen Baratheon,Stark,,,,Lannister,,,,loss,pitched battle,1.0,1.0,18000.0,20000.0,"Roose Bolton, Wylis Manderly, Medger Cerwyn, H...","Tywin Lannister, Gregor Clegane, Kevan Lannist...",1.0,Green Fork,The Riverlands,
5,Battle of the Whispering Wood,298,Robb Stark,Joffrey/Tommen Baratheon,Stark,Tully,,,Lannister,,,,win,ambush,1.0,1.0,1875.0,6000.0,"Robb Stark, Brynden Tully",Jaime Lannister,1.0,Whispering Wood,The Riverlands,
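
If the uniqueness of the ID should be enforced rather than only inspected, `set_index` accepts `verify_integrity=True`, which raises on duplicate keys. A minimal sketch on a made-up frame with a deliberately repeated ID:

```python
import pandas as pd

# Toy frame with a duplicated ID (made-up rows, not the real data).
toy = pd.DataFrame({
    "battle_number": [1, 2, 2],
    "name": ["Battle A", "Battle B", "Battle C"],
})

try:
    # verify_integrity=True makes set_index raise on duplicate keys.
    toy.set_index("battle_number", verify_integrity=True)
    duplicates_detected = False
except ValueError as err:
    duplicates_detected = True
    print("Refused to set index:", err)
```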


### Expanding `attacker_commander` and `defender_commander`

The `attacker_commander` and `defender_commander` columns hold several names separated by commas. Let's expand them so that each name gets its own column:

In [7]:
# split the names in 'attacker_commander' and strip the space left after each comma
split_attacker_commanders = (
    battles["attacker_commander"]
    .str.split(",", expand=True)
    .apply(lambda col: col.str.strip())
)
split_attacker_commanders.head()

battle_number,0,1,2,3,4,5
1,Jaime Lannister,,,,,
2,Gregor Clegane,,,,,
3,Jaime Lannister,Andros Brax,,,,
4,Roose Bolton,Wylis Manderly,Medger Cerwyn,Harrion Karstark,Halys Hornwood,
5,Robb Stark,Brynden Tully,,,,


In [8]:
# split the names in 'defender_commander' and strip the space left after each comma
split_defender_commanders = (
    battles["defender_commander"]
    .str.split(",", expand=True)
    .apply(lambda col: col.str.strip())
)
split_defender_commanders.head()

battle_number,0,1,2,3,4,5,6
1,Clement Piper,Vance,,,,,
2,Beric Dondarrion,,,,,,
3,Edmure Tully,Tytos Blackwood,,,,,
4,Tywin Lannister,Gregor Clegane,Kevan Lannister,Addam Marbrand,,,
5,Jaime Lannister,,,,,,
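
One detail of `str.split` is easy to miss in the rendered tables: splitting on `","` alone keeps the space that followed each comma. The sketch below uses made-up names in the same "name, name" layout to show both the padding done by `expand=True` and the effect of stripping afterwards.

```python
import pandas as pd

# Made-up commander strings in the same "name, name" layout as the data.
toy = pd.Series(["Alice Stone, Bob Hill", "Carol Reed", None])

# expand=True returns a DataFrame with one column per piece;
# shorter rows are padded with missing values.
raw = toy.str.split(",", expand=True)

# Splitting on "," alone keeps the space that followed each comma,
# so strip every column afterwards.
clean = raw.apply(lambda col: col.str.strip())

print(repr(raw.loc[0, 1]))    # ' Bob Hill'
print(repr(clean.loc[0, 1]))  # 'Bob Hill'
```

Exact matches matter if these names are later compared with names from another table, where `' Bob Hill'` and `'Bob Hill'` would count as different values.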


In [9]:
# drop the original 'attacker_commander' and 'defender_commander' columns
battles = battles.drop(columns=['attacker_commander', 'defender_commander'])

# add the new 'attacker_commander_<n>' columns (numbered from 1)
for i in split_attacker_commanders.columns:
    battles[f'attacker_commander_{i + 1}'] = split_attacker_commanders[i]

# add the new 'defender_commander_<n>' columns (numbered from 1)
for i in split_defender_commanders.columns:
    battles[f'defender_commander_{i + 1}'] = split_defender_commanders[i]

battles.head()

battle_number,name,year,attacker_king,defender_king,attacker_1,attacker_2,attacker_3,attacker_4,defender_1,defender_2,defender_3,defender_4,attacker_outcome,battle_type,major_death,major_capture,attacker_size,defender_size,summer,location,region,note,attacker_commander_1,attacker_commander_2,attacker_commander_3,attacker_commander_4,attacker_commander_5,attacker_commander_6,defender_commander_1,defender_commander_2,defender_commander_3,defender_commander_4,defender_commander_5,defender_commander_6,defender_commander_7
1,Battle of the Golden Tooth,298,Joffrey/Tommen Baratheon,Robb Stark,Lannister,,,,Tully,,,,win,pitched battle,1.0,0.0,15000.0,4000.0,1.0,Golden Tooth,The Westerlands,,Jaime Lannister,,,,,,Clement Piper,Vance,,,,,
2,Battle at the Mummer's Ford,298,Joffrey/Tommen Baratheon,Robb Stark,Lannister,,,,Baratheon,,,,win,ambush,1.0,0.0,,120.0,1.0,Mummer's Ford,The Riverlands,,Gregor Clegane,,,,,,Beric Dondarrion,,,,,,
3,Battle of Riverrun,298,Joffrey/Tommen Baratheon,Robb Stark,Lannister,,,,Tully,,,,win,pitched battle,0.0,1.0,15000.0,10000.0,1.0,Riverrun,The Riverlands,,Jaime Lannister,Andros Brax,,,,,Edmure Tully,Tytos Blackwood,,,,,
4,Battle of the Green Fork,298,Robb Stark,Joffrey/Tommen Baratheon,Stark,,,,Lannister,,,,loss,pitched battle,1.0,1.0,18000.0,20000.0,1.0,Green Fork,The Riverlands,,Roose Bolton,Wylis Manderly,Medger Cerwyn,Harrion Karstark,Halys Hornwood,,Tywin Lannister,Gregor Clegane,Kevan Lannister,Addam Marbrand,,,
5,Battle of the Whispering Wood,298,Robb Stark,Joffrey/Tommen Baratheon,Stark,Tully,,,Lannister,,,,win,ambush,1.0,1.0,1875.0,6000.0,1.0,Whispering Wood,The Riverlands,,Robb Stark,Brynden Tully,,,,,Jaime Lannister,,,,,,
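
The numbered columns could also be attached in a single step by renaming the split columns and joining on the index. This is an alternative sketch on made-up data (`toy` and `split` are hypothetical stand-ins), not the approach the notebook itself takes.

```python
import pandas as pd

# Toy stand-ins (made-up values): a frame and its already-split commanders.
toy = pd.DataFrame({"name": ["Battle A", "Battle B"]}, index=[1, 2])
split = pd.DataFrame(
    {0: ["Alice Stone", "Carol Reed"], 1: ["Bob Hill", None]},
    index=[1, 2],
)

# Turn the positional labels 0, 1, ... into 'attacker_commander_1', ...
split = split.rename(columns=lambda i: f"attacker_commander_{i + 1}")

# join aligns on the index, so each name stays with its own battle.
toy = toy.join(split)

print(list(toy.columns))
```

Because the column names are derived from however many pieces the split produced, this form keeps working if the number of commanders per battle changes.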


### Setting types

In [10]:
battles.dtypes

name                     object
year                      int64
attacker_king            object
defender_king            object
attacker_1               object
attacker_2               object
attacker_3               object
attacker_4               object
defender_1               object
defender_2               object
defender_3              float64
defender_4              float64
attacker_outcome         object
battle_type              object
major_death             float64
major_capture           float64
attacker_size           float64
defender_size           float64
summer                  float64
location                 object
region                   object
note                     object
attacker_commander_1     object
attacker_commander_2     object
attacker_commander_3     object
attacker_commander_4     object
attacker_commander_5     object
attacker_commander_6     object
defender_commander_1     object
defender_commander_2     object
defender_commander_3     object
defender