# Titanic

## The route of the Titanic
![](../images/route.png)

### The night of 14 April 1912
![](../images/iceberg_titanic.jpg)

## <a href="https://en.wikipedia.org/wiki/David_Blair_(mariner)">David Blair</a> (mariner)

Blair’s daughter eventually donated the key to the International Sailors Society, and it was [auctioned off](http://news.bbc.co.uk/2/hi/uk_news/northern_ireland/7008300.stm) in 2007, fetching £90,000 (about $130,000).

![](../images/key_to_binoculars.jpg)

## [Olympic](https://en.wikipedia.org/wiki/RMS_Olympic), [Titanic](https://en.wikipedia.org/wiki/RMS_Titanic), [Britannic](https://en.wikipedia.org/wiki/HMHS_Britannic) (Gigantic)
![](../images/olympic_titanic_gigantic.jpg)

## Hawke collision (20 September 1911)
![](../images/olympic_hawke.jpg)
![](../images/riddle_titanic.jpg)

## [JP Morgan](https://www.youtube.com/watch?v=Z-Wd-hJw8nk)
![](../images/jp_morgan.png)

[Did the Titanic Really Sink or was it Olympic?](https://social.shorthand.com/TitanicMystery/jCPyIbzzPVc/did-the-titanic-really-sink-or-was-it-olympic)

There were about **1,300 passengers** and **900 crew** aboard, around **2,200** people in total, and only [**20 lifeboats**](https://en.wikipedia.org/wiki/Lifeboats_of_the_RMS_Titanic) (one lifeboat for every 65 passengers).
That means that, by design, a large part of the people aboard the Titanic, close to half, could not be saved.
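
The arithmetic behind this claim can be sketched as follows. The headcounts are the rounded figures quoted above; the split of the 20 boats into three types and their rated seat counts are assumptions that do not appear in this notebook (they follow the commonly cited figures for the ship's lifeboats), so treat the result as a rough check rather than an exact number.

```python
# Rounded headcounts quoted in the text above.
passengers = 1300
crew = 900
people = passengers + crew  # 2200

# Assumed lifeboat fleet as (number of boats, rated seats per boat).
# These figures are NOT from the notebook: 14 standard boats,
# 2 emergency cutters and 4 collapsible boats.
fleet = [(14, 65), (2, 40), (4, 47)]

boats = sum(count for count, _ in fleet)
seats = sum(count * capacity for count, capacity in fleet)

# People left without a seat even if every boat were filled to its rating.
without_seat = people - seats
share_without_seat = without_seat / people

print("boats:", boats)
print("seats:", seats)
print("people without a seat:", without_seat)
print("share without a seat: {:.1%}".format(share_without_seat))
```

Under these assumptions the boats offer seats for just over half of the people aboard, so close to half had no place in a boat regardless of how the evacuation went.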

![](../images/life_boats.jpg)
![](../images/lifeboats.png)

![](../images/sinking_animation.gif)

## [Titanic Memorial](https://en.wikipedia.org/wiki/Titanic_Memorial_(Washington,_D.C.))
![](../images/man.jpg)
![](../images/man_memorial.png)

In [39]:
import pandas as pd

# Load the train and test splits; paths are relative to the notebook.
df_train = pd.read_csv('../input/train.csv')
df_test = pd.read_csv('../input/test.csv')

print("train", df_train.shape)
print("test", df_test.shape)

train (891, 12)
test (418, 11)


## Total passengers

In [37]:
df_train.shape[0] + df_test.shape[0]

1309

In [38]:
df_train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
PassengerId    891 non-null int64
Survived       891 non-null int64
Pclass         891 non-null int64
Name           891 non-null object
Sex            891 non-null object
Age            714 non-null float64
SibSp          891 non-null int64
Parch          891 non-null int64
Ticket         891 non-null object
Fare           891 non-null float64
Cabin          204 non-null object
Embarked       889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.6+ KB


In [30]:
df_train.head()

| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |


## Features

- **PassengerId** - A numerical id assigned to each passenger
- **Survived** - Whether the passenger survived (1), or didn't (0). **Target variable**.
- **Pclass** - Ticket class (1st = Upper, 2nd = Middle, 3rd = Lower)
- **Name** - The name of the passenger
- **Sex** - The gender of the passenger (male or female)
- **Age** - The age of the passenger
- **SibSp** - Number of siblings (brother, sister, stepbrother, stepsister) / spouses (husband, wife; mistresses and fiancés were ignored) aboard the Titanic
- **Parch** - Number of parents (mother, father) / children (daughter, son, stepdaughter, stepson) aboard the Titanic. *Note*: some children travelled only with a nanny, therefore Parch = 0 for them.
- **Ticket** - The ticket number of the passenger
- **Fare** - How much the passenger paid for the ticket
- **Cabin** - Cabin number
- **Embarked** - Port where the passenger boarded the Titanic (C = Cherbourg, Q = Queenstown, S = Southampton)
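
As a small illustration of the encodings in the list above, the sketch below decodes `Pclass` and `Embarked` into readable labels. It uses only the five rows from the `df_train.head()` output shown earlier, so it runs without the CSV files; the column names `PclassLabel` and `Port` are hypothetical and chosen only for this example.

```python
import pandas as pd

# Five rows copied from the df_train.head() output (subset of columns).
sample = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5],
    "Survived": [0, 1, 1, 1, 0],
    "Pclass": [3, 1, 3, 1, 3],
    "Embarked": ["S", "C", "S", "S", "S"],
})

# Code-to-label mappings taken from the feature list.
class_labels = {1: "Upper", 2: "Middle", 3: "Lower"}
port_labels = {"C": "Cherbourg", "Q": "Queenstown", "S": "Southampton"}

# PclassLabel and Port are hypothetical column names used for illustration.
decoded = sample.assign(
    PclassLabel=sample["Pclass"].map(class_labels),
    Port=sample["Embarked"].map(port_labels),
)

print(decoded)
```

On the full data, `Embarked` has two missing values (889 non-null of 891), which `map` would leave as NaN.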

## Pclass & Sex => Survived

In [32]:
print(df_train.groupby(["Pclass", "Sex"])["Survived"].value_counts(normalize=True))

Pclass  Sex     Survived
1       female  1           0.968085
                0           0.031915
        male    0           0.631148
                1           0.368852
2       female  1           0.921053
                0           0.078947
        male    0           0.842593
                1           0.157407
3       female  0           0.500000
                1           0.500000
        male    0           0.864553
                1           0.135447
Name: Survived, dtype: float64
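
Because `Survived` is coded as 0/1, the survival rate of each group can also be read directly as the group mean. The sketch below shows this on the five rows from the `df_train.head()` output only, so the resulting rates illustrate the technique and say nothing about the full data set.

```python
import pandas as pd

# Five rows copied from the df_train.head() output shown earlier.
sample = pd.DataFrame({
    "Pclass": [3, 1, 3, 1, 3],
    "Sex": ["male", "female", "female", "female", "male"],
    "Survived": [0, 1, 1, 1, 0],
})

grouped = sample.groupby(["Pclass", "Sex"])["Survived"]

# The mean of a 0/1 column within a group is that group's survival rate.
rate = grouped.mean()

# value_counts(normalize=True), as used above, lists the share of
# each outcome (0 and 1) separately for every group.
shares = grouped.value_counts(normalize=True)

print(rate)
print(shares)
```

The mean gives one number per group, which is easier to sort or plot, while `value_counts(normalize=True)` keeps both outcomes visible.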


### [Isidor](https://en.wikipedia.org/wiki/Isidor_Straus) and [Ida](https://en.wikipedia.org/wiki/Ida_Straus) Straus
They refused to board a lifeboat while younger people were still waiting to board.

![](../images/Rosalie_Ida_Blun_Straus.jpg)

## Train vs Test

In [28]:
describe_fields = ["Age", "Fare", "Pclass", "SibSp", "Parch"]

# Summarise the same numeric fields for each sex, train first and then test,
# so the two splits can be compared.
for sex in ["male", "female"]:
    for name, df in [("Train", df_train), ("Test", df_test)]:
        print("{}: {}s".format(name, sex))
        print(df[df.Sex == sex][describe_fields].describe())

Train: males
              Age        Fare      Pclass       SibSp       Parch
count  453.000000  577.000000  577.000000  577.000000  577.000000
mean    30.726645   25.523893    2.389948    0.429809    0.235702
std     14.678201   43.138263    0.813580    1.061811    0.612294
min      0.420000    0.000000    1.000000    0.000000    0.000000
25%     21.000000    7.895800    2.000000    0.000000    0.000000
50%     29.000000   10.500000    3.000000    0.000000    0.000000
75%     39.000000   26.550000    3.000000    0.000000    0.000000
max     80.000000  512.329200    3.000000    8.000000    5.000000
Test: males
              Age        Fare      Pclass       SibSp       Parch
count  205.000000  265.000000  266.000000  266.000000  266.000000
mean    30.272732   27.527877    2.334586    0.379699    0.274436
std     13.389528   41.079423    0.808497    0.843735    0.883745
min      0.330000    0.000000    1.000000    0.000000    0.000000
25%     22.000000    7.854200    2.000000    0.0000