#### Titanic: Machine Learning from Disaster
This Kaggle competition is about predicting which Titanic passengers survived, based on their socio-economic and demographic data. The team will analyze and understand the data before developing predictive models, then submit the results on Kaggle to compete with other Kagglers. The code will also be shared in GitHub repositories. <br/>

Access the competition at <https://www.kaggle.com/c/titanic/overview>

### Data Import
The Kaggle API is used to download and access the data. The same API will be used later to commit the notebook files to GitHub and Kaggle.<br />
See the Kaggle API documentation at https://github.com/Kaggle/kaggle-api

In [7]:
!kaggle competitions download -c titanic

Downloading titanic.zip to C:\Users\aaydogan.SELCO\Titanic Kaggle




  0%|          | 0.00/34.1k [00:00<?, ?B/s]
100%|##########| 34.1k/34.1k [00:00<00:00, 1.06MB/s]
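
The download above only works when API credentials are configured. As a minimal sketch, the check below looks for them in the places the Kaggle API documentation describes: the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables, or a `kaggle.json` token file in `~/.kaggle` (or in `KAGGLE_CONFIG_DIR` if that is set). The helper name `has_kaggle_credentials` and its parameters are hypothetical and are not part of the Kaggle package.

```python
import os
from pathlib import Path


def has_kaggle_credentials(env=None, home=None):
    """Return True if Kaggle API credentials appear to be configured.

    Hypothetical helper for illustration; `env` and `home` can be
    overridden to make the check testable.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    # Option 1: credentials supplied through environment variables.
    if env.get("KAGGLE_USERNAME") and env.get("KAGGLE_KEY"):
        return True

    # Option 2: a kaggle.json token file, in KAGGLE_CONFIG_DIR if set,
    # otherwise in the default ~/.kaggle directory.
    config_dir = Path(env.get("KAGGLE_CONFIG_DIR") or home / ".kaggle")
    return (config_dir / "kaggle.json").is_file()


if __name__ == "__main__":
    print("Kaggle credentials found:", has_kaggle_credentials())
```

Running this before the download cell gives a clearer message than a failed `kaggle` command when the token file is missing.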


  * The files are downloaded as a single .zip archive; the code below extracts it into the working directory. <br />
  After extraction, train.csv, test.csv and the sample submission file gender_submission.csv are ready to use.

In [14]:
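import os
from zipfile import ZipFile

# Extract every file in the archive into the current working directory
with ZipFile('titanic.zip', 'r') as zip_obj:
    zip_obj.extractall()

# List the directory contents to confirm the extraction
files = os.listdir()
print(files)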

['.ipynb_checkpoints', 'gender_submission.csv', 'Kaggle - Titanic Surviver Prediction.ipynb', 'test.csv', 'titanic.zip', 'train.csv']
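
The extraction step above calls `extractall()`, which rewrites the CSV files each time the cell is re-run. A possible variation, sketched here with a hypothetical helper named `extract_missing`, inspects the archive with `namelist()` and extracts only the members that are not already on disk.

```python
import os
from zipfile import ZipFile


def extract_missing(zip_path, target_dir="."):
    """Extract only the archive members not already present in target_dir.

    Returns the list of member names that were extracted.
    (Hypothetical helper, shown as an illustration only.)
    """
    extracted = []
    with ZipFile(zip_path, "r") as archive:
        for name in archive.namelist():
            if not os.path.exists(os.path.join(target_dir, name)):
                archive.extract(name, path=target_dir)
                extracted.append(name)
    return extracted


if __name__ == "__main__" and os.path.exists("titanic.zip"):
    print("Newly extracted:", extract_missing("titanic.zip"))
```

On a second run the function returns an empty list, because all members already exist in the target directory.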


#### Library Imports 

In [16]:
import pandas as pd
import plotly as plt  # note: the alias plt usually refers to matplotlib.pyplot; here it is plotly
import numpy as np
import scipy as sci

## 1. Checking Data

### Data Dictionary

| Variable | Definition | Key |
|---|---|---|
| survival | Survival | 0 = No, 1 = Yes |
| pclass | Ticket class | 1 = 1st, 2 = 2nd, 3 = 3rd |
| sex | Sex | - |
| Age | Age in years | - |
| sibsp | # of siblings / spouses aboard the Titanic | - |
| parch | # of parents / children aboard the Titanic | - |
| ticket | Ticket number | - |
| fare | Passenger fare | - |
| cabin | Cabin number | - |
| embarked | Port of Embarkation | C = Cherbourg, Q = Queenstown, S = Southampton |
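
The codes in the Key column can be turned into readable labels with `Series.map`. The sketch below is one way to do it, assuming the column names `Survived`, `Pclass` and `Embarked` used in train.csv; the `*Label` column names, the `add_labels` helper and the three sample rows are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Lookup tables copied from the "Key" column of the data dictionary.
SURVIVED_LABELS = {0: "No", 1: "Yes"}
PCLASS_LABELS = {1: "1st", 2: "2nd", 3: "3rd"}
EMBARKED_LABELS = {"C": "Cherbourg", "Q": "Queenstown", "S": "Southampton"}


def add_labels(frame):
    """Return a copy of frame with human-readable label columns added."""
    out = frame.copy()
    out["SurvivedLabel"] = out["Survived"].map(SURVIVED_LABELS)
    out["PclassLabel"] = out["Pclass"].map(PCLASS_LABELS)
    # Codes missing from the lookup (including NaN) become NaN.
    out["EmbarkedLabel"] = out["Embarked"].map(EMBARKED_LABELS)
    return out


# Illustrative rows only, not real passengers.
sample = pd.DataFrame(
    {"Survived": [0, 1, 1], "Pclass": [3, 1, 2], "Embarked": ["S", "C", None]}
)
labeled = add_labels(sample)
print(labeled)
```

Keeping the original coded columns and adding labels alongside them leaves the numeric values available for modelling.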

#### Variable Notes
<strong>pclass:</strong> A proxy for socio-economic status (SES)<br/>
1st = Upper<br/>
2nd = Middle<br/>
3rd = Lower<br/>

<strong>age</strong>: Age is fractional if less than 1. If the age is estimated, it is in the form of xx.5

<strong>sibsp</strong>: The dataset defines family relations in this way...<br/>
Sibling = brother, sister, stepbrother, stepsister<br/>
Spouse = husband, wife (mistresses and fiancés were ignored)<br/>

<strong>parch</strong>: The dataset defines family relations in this way...<br/>
Parent = mother, father<br/>
Child = daughter, son, stepdaughter, stepson<br/>
Some children travelled only with a nanny, therefore parch=0 for them.<br/>
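
The age note above suggests a simple rule for flagging estimated ages: a value of at least 1 whose fractional part is exactly .5. The function below is a minimal sketch of that rule; the name `is_estimated_age` is hypothetical, and treating values below 1 as infants rather than estimates is an assumption drawn from the wording of the note.

```python
import math


def is_estimated_age(age):
    """Return True if age follows the xx.5 pattern used for estimated ages.

    Assumption: ages below 1 are fractional because they describe infants,
    so they are never flagged as estimates. Missing ages return False.
    """
    if age is None or math.isnan(age):
        return False
    return age >= 1 and math.isclose(age % 1, 0.5)


for value in (28.5, 28.0, 0.42):
    print(value, is_estimated_age(value))
```

Applied to the Age column, for example with `df['Age'].apply(is_estimated_age)`, this would give a boolean flag that can be inspected before deciding how to treat estimated values.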

In [37]:
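df = pd.read_csv('train.csv')

# Select the rows whose Name contains the given text
# (regex=False matches the text literally, so '.' is not a wildcard)
johnson_rows = df[df.Name.str.contains('Johnson, Mrs. Oscar W', regex=False)]
johnson_rows.head(50)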

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
8,9,1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27.0,0,2,347742,11.1333,,S


In [21]:
df.describe(include='all')

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
count,891.0,891.0,891.0,891,891,714.0,891.0,891.0,891.0,891.0,204,889
unique,,,,891,2,,,,681.0,,147,3
top,,,,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",male,,,,1601.0,,G6,S
freq,,,,1,577,,,,7.0,,4,644
mean,446.0,0.383838,2.308642,,,29.699118,0.523008,0.381594,,32.204208,,
std,257.353842,0.486592,0.836071,,,14.526497,1.102743,0.806057,,49.693429,,
min,1.0,0.0,1.0,,,0.42,0.0,0.0,,0.0,,
25%,223.5,0.0,2.0,,,20.125,0.0,0.0,,7.9104,,
50%,446.0,0.0,3.0,,,28.0,0.0,0.0,,14.4542,,
75%,668.5,1.0,3.0,,,38.0,1.0,0.0,,31.0,,
