## Some Background Information


**The sinking of the RMS Titanic in the early morning of 15 April 1912, four days into the ship's maiden voyage from Southampton to New York City, was one of the deadliest peacetime maritime disasters in history, killing more than 1,500 people. The largest passenger liner in service at the time, Titanic had an estimated 2,224 people on board when she struck an iceberg in the North Atlantic. The ship had received six warnings of sea ice but was travelling at near maximum speed when the lookouts sighted the iceberg. Unable to turn quickly enough, the ship suffered a glancing blow that buckled the starboard (right) side and opened five of sixteen compartments to the sea. The disaster caused widespread outrage over the lack of lifeboats, lax regulations, and the unequal treatment of the three passenger classes during the evacuation. Inquiries recommended sweeping changes to maritime regulations, leading to the International Convention for the Safety of Life at Sea (1914), which continues to govern maritime safety.**  
*from Wikipedia*

**Imports**

In [1]:
import pandas as pd

In [4]:
df = pd.read_csv("titanic_data.csv")

## Exploratory Data Analysis

In [5]:
df.head()

| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |


**Question**

**How many passengers are in the dataset? Are there any missing values in the dataset?**

In [7]:
# Row count, dtypes and non-null counts per column
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
PassengerId    891 non-null int64
Survived       891 non-null int64
Pclass         891 non-null int64
Name           891 non-null object
Sex            891 non-null object
Age            714 non-null float64
SibSp          891 non-null int64
Parch          891 non-null int64
Ticket         891 non-null object
Fare           891 non-null float64
Cabin          204 non-null object
Embarked       889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.6+ KB
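`df.info()` answers the missing-value question only indirectly, through non-null counts. A more direct check is `isna().sum()`. The sketch below is illustrative only: it uses a small hand-built frame (the first five rows shown by `df.head()`, reduced to three columns) so that it runs without `titanic_data.csv`. The names `sample`, `n_passengers` and `missing` are assumptions made for this example.

```python
import numpy as np
import pandas as pd

# Toy frame: the first five rows from df.head(), reduced to three columns.
sample = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0],
    "Cabin": [np.nan, "C85", np.nan, "C123", np.nan],
})

n_passengers = len(sample)        # number of rows in the frame
missing = sample.isna().sum()     # missing values per column

print(n_passengers)
print(missing[missing > 0])       # show only the columns that have gaps
```

For the full dataset, the non-null counts reported by `df.info()` give the same information by subtraction: 891 - 714 = 177 missing `Age` values, 891 - 204 = 687 missing `Cabin` values, and 891 - 889 = 2 missing `Embarked` values.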


**Question**

**Of all passengers in df, how many survived?**

In [12]:
# Survived is coded 0/1, so the sum is the number of survivors
p_survived = df.Survived.sum()

In [13]:
p_number = df.Survived.count()

In [14]:
print(f"Passengers that survived: {p_survived/p_number:.1%}")

Passengers that survived: 38.4%
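Because `Survived` is coded 0/1, the survival rate is simply the mean of the column, which avoids computing the sum and the count separately. The following is a minimal sketch, not the notebook's own cell: the series is rebuilt from the counts implied by the ratio above (342 survivors among 891 passengers), and the names `survived` and `survival_rate` are assumptions for illustration.

```python
import pandas as pd

# Rebuilt from the counts implied by the notebook output: 342 of 891 survived.
survived = pd.Series([1] * 342 + [0] * 549, name="Survived")

# The mean of a 0/1 column equals sum() / count().
survival_rate = survived.mean()
print(f"{survival_rate:.1%} of passengers survived")  # 38.4% of passengers survived
```

On the real data the equivalent call would be `df.Survived.mean()`.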


**Question**

**Sex: how likely are female and male passengers to survive?**

In [15]:
# Passenger counts for each (Survived, Sex) combination
df.groupby(['Survived','Sex'])['Survived'].count()

Survived  Sex   
0         female     81
          male      468
1         female    233
          male      109
Name: Survived, dtype: int64

In [17]:
f_survived = df[df.Sex == 'female'].Survived.sum()
m_survived = df[df.Sex == 'male'].Survived.sum()
f_number = df[df.Sex == 'female'].Survived.count()
m_number = df[df.Sex == 'male'].Survived.count()

In [18]:
print(f"Women that survived: {f_survived/f_number:.1%}")
print(f"Men that survived: {m_survived/m_number:.1%}")

Women that survived: 74.2%
Men that survived: 18.9%
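The four intermediate variables can be replaced by a single `groupby` followed by `mean()`. This sketch assumes nothing about the CSV file: it rebuilds one row per passenger from the (Survived, Sex) counts printed above, using hypothetical names `counts`, `rows` and `sex_rates`.

```python
import pandas as pd

# Counts copied from the groupby output above.
counts = pd.DataFrame({
    "Sex": ["female", "female", "male", "male"],
    "Survived": [0, 1, 0, 1],
    "n": [81, 233, 468, 109],
})

# Expand the counts into one row per passenger.
rows = (
    counts.loc[counts.index.repeat(counts["n"].to_numpy()), ["Sex", "Survived"]]
    .reset_index(drop=True)
)

# Mean of the 0/1 column within each group is the survival rate per sex.
sex_rates = rows.groupby("Sex")["Survived"].mean()
print(sex_rates.round(3))
```

With the real frame this would read `df.groupby("Sex")["Survived"].mean()`.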


**Question**

**Class: how likely are passengers to survive according to their class?**

In [21]:
# Counts by class and survival; margins=True adds the All totals
pd.crosstab(df.Pclass, df.Survived, margins=True)

| Pclass | Survived = 0 | Survived = 1 | All |
|---|---|---|---|
| 1 | 80 | 136 | 216 |
| 2 | 97 | 87 | 184 |
| 3 | 372 | 119 | 491 |
| All | 549 | 342 | 891 |
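The crosstab gives raw counts, while the question asks for likelihoods. One way to get rates directly is the `normalize="index"` argument of `pd.crosstab`, which divides each row by its total. The sketch below is an illustration under the assumption that the counts in the table above are correct; it rebuilds the passengers from those counts, and `counts`, `rows` and `class_rates` are names invented for the example.

```python
import pandas as pd

# Counts copied from the Pclass x Survived table above.
counts = pd.DataFrame({
    "Pclass": [1, 1, 2, 2, 3, 3],
    "Survived": [0, 1, 0, 1, 0, 1],
    "n": [80, 136, 97, 87, 372, 119],
})

# Expand to one row per passenger; reset the index so labels are unique.
rows = (
    counts.loc[counts.index.repeat(counts["n"].to_numpy()), ["Pclass", "Survived"]]
    .reset_index(drop=True)
)

# normalize="index" turns each row into proportions that sum to 1.
class_rates = pd.crosstab(rows["Pclass"], rows["Survived"], normalize="index")
print(class_rates.round(3))
```

The `1` column of `class_rates` is the survival rate per class: 136/216 for first class, 87/184 for second and 119/491 for third.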


**Question**

**How likely are passengers to survive based on their Class and Sex?**

In [23]:
# Rows: (Sex, Survived); columns: Pclass; margins=True adds the All totals
pd.crosstab([df.Sex, df.Survived], df.Pclass, margins=True)

| Sex | Survived | Pclass = 1 | Pclass = 2 | Pclass = 3 | All |
|---|---|---|---|---|---|
| female | 0 | 3 | 6 | 72 | 81 |
| female | 1 | 91 | 70 | 72 | 233 |
| male | 0 | 77 | 91 | 300 | 468 |
| male | 1 | 45 | 17 | 47 | 109 |
| All | | 216 | 184 | 491 | 891 |
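To read survival rates rather than counts for each combination of sex and class, one option is to group by both variables and take the mean of `Survived`, then pivot the class level into columns with `unstack`. This is a sketch built from the counts in the table above, not from the CSV; the names `counts`, `rows` and `sex_class_rates` are assumptions.

```python
import pandas as pd

# Counts copied from the (Sex, Survived) x Pclass table above.
counts = pd.DataFrame({
    "Sex": ["female"] * 6 + ["male"] * 6,
    "Survived": [0, 0, 0, 1, 1, 1] * 2,
    "Pclass": [1, 2, 3] * 4,
    "n": [3, 6, 72, 91, 70, 72, 77, 91, 300, 45, 17, 47],
})

# Expand to one row per passenger.
rows = (
    counts.loc[counts.index.repeat(counts["n"].to_numpy()),
               ["Sex", "Pclass", "Survived"]]
    .reset_index(drop=True)
)

# Survival rate per (Sex, Pclass), with classes laid out as columns.
sex_class_rates = rows.groupby(["Sex", "Pclass"])["Survived"].mean().unstack("Pclass")
print(sex_class_rates.round(3))
```

Read this way, the counts above imply that 91 of 94 first-class women survived, against 47 of 347 third-class men.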


**Question**

**How likely are passengers to survive based on the Embarked variable?**

In [25]:
# Counts by survival and port of embarkation; margins=True adds the All totals
pd.crosstab(df.Survived, df.Embarked, margins=True)

| Survived | Embarked = C | Embarked = Q | Embarked = S | All |
|---|---|---|---|---|
| 0 | 75 | 47 | 427 | 549 |
| 1 | 93 | 30 | 217 | 340 |
| All | 168 | 77 | 644 | 889 |
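The grand total here is 889, not 891, because `pd.crosstab` leaves out rows where a grouping variable is missing, and `df.info()` reported only 889 non-null `Embarked` values. Since the survivor total drops from 342 to 340, both excluded passengers appear to be survivors. The sketch below illustrates the effect on a hypothetical six-passenger frame (`toy`); the data in it are invented for the example and are not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one passenger has no embarkation port recorded.
toy = pd.DataFrame({
    "Survived": [1, 0, 1, 0, 1, 1],
    "Embarked": ["C", "S", "S", "Q", np.nan, "C"],
})

table = pd.crosstab(toy["Survived"], toy["Embarked"])
counted = int(table.to_numpy().sum())           # rows that reached the table
n_missing = int(toy["Embarked"].isna().sum())   # rows left out

print(table)
print(f"{counted} of {len(toy)} rows counted, {n_missing} with missing Embarked")
```

When totals disagree between two tables, comparing them against `isna().sum()` for the variables involved is a quick way to confirm that missing values are the cause.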


**Question**

**How likely are passengers to survive based on the variables Embarked and Sex?**

In [26]:
# Rows: (Sex, Survived); columns: Embarked; margins=True adds the All totals
pd.crosstab([df.Sex, df.Survived], df.Embarked, margins=True)

| Sex | Survived | Embarked = C | Embarked = Q | Embarked = S | All |
|---|---|---|---|---|---|
| female | 0 | 9 | 9 | 63 | 81 |
| female | 1 | 64 | 27 | 140 | 231 |
| male | 0 | 66 | 38 | 364 | 468 |
| male | 1 | 29 | 3 | 77 | 109 |
| All | | 168 | 77 | 644 | 889 |
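Another way to obtain rates instead of counts is to pass the `Survived` column as `values` to `pd.crosstab` with `aggfunc="mean"`, so that each cell holds the survival rate for that sex and port. The sketch below rebuilds the passengers from the counts in the table above (so the two passengers without an `Embarked` value are not included); `counts`, `rows` and `sex_port_rates` are names assumed for illustration.

```python
import pandas as pd

# Counts copied from the (Sex, Survived) x Embarked table above.
counts = pd.DataFrame({
    "Sex": ["female"] * 6 + ["male"] * 6,
    "Survived": [0, 0, 0, 1, 1, 1] * 2,
    "Embarked": ["C", "Q", "S"] * 4,
    "n": [9, 9, 63, 64, 27, 140, 66, 38, 364, 29, 3, 77],
})

# Expand to one row per passenger with a fresh, unique index.
rows = (
    counts.loc[counts.index.repeat(counts["n"].to_numpy()),
               ["Sex", "Embarked", "Survived"]]
    .reset_index(drop=True)
)

# Each cell is the mean of Survived, i.e. the survival rate for that group.
sex_port_rates = pd.crosstab(
    rows["Sex"], rows["Embarked"], values=rows["Survived"], aggfunc="mean"
)
print(sex_port_rates.round(3))
```

Some cells rest on few passengers (only 41 men embarked at Q in these counts), so the corresponding rates should be read with caution.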


**Question**

**How likely are passengers to survive based on the variables Embarked, Pclass and Sex?**

In [27]:
# Rows: (Sex, Survived, Embarked); columns: Pclass; margins=True adds the All totals
pd.crosstab([df.Sex, df.Survived, df.Embarked], df.Pclass, margins=True)

| Sex | Survived | Embarked | Pclass = 1 | Pclass = 2 | Pclass = 3 | All |
|---|---|---|---|---|---|---|
| female | 0 | C | 1 | 0 | 8 | 9 |
| female | 0 | Q | 0 | 0 | 9 | 9 |
| female | 0 | S | 2 | 6 | 55 | 63 |
| female | 1 | C | 42 | 7 | 15 | 64 |
| female | 1 | Q | 1 | 2 | 24 | 27 |
| female | 1 | S | 46 | 61 | 33 | 140 |
| male | 0 | C | 25 | 8 | 33 | 66 |
| male | 0 | Q | 1 | 1 | 36 | 38 |
| male | 0 | S | 51 | 82 | 231 | 364 |
| male | 1 | C | 17 | 2 | 10 | 29 |
