# Tragic Fate of MS Estonia

![Credit: Estonian Maritime Museum](https://live.staticflickr.com/1774/43124756545_7a47756351_b.jpg)

## What, How, When?

On September 27, 1994, the ferry Estonia set sail on a night voyage across the Baltic Sea from the port of Tallinn in Estonia to Stockholm. She departed at 19:00 carrying 989 passengers and crew, as well as vehicles, and was due to dock at 09:30 the following morning. Tragically, the Estonia never arrived.

An official inquiry found that failure of the locks on the bow visor, which broke away under the punishing waves, caused water to flood the car deck and quickly capsize the ship. The report also noted a lack of action, a delay in sounding the alarm, a lack of guidance from the bridge, and a failure to light distress flares.

## How did things go down that night?

![Credit: Hamburg University of Technology](https://cdn.prod.www.spiegel.de/images/d0eed47d-0001-0005-0000-000001064832.gif)

The strong maneuver that was supposed to counteract the situation instead made it even worse, leading to the eventual sinking of the ship.
Wind and waves also played their part.

## Documentary on that Night

[![Sinking of MS Estonia-Documentary](https://img.youtube.com/vi/eFDGL_ehpkI/0.jpg)](https://www.youtube.com/watch?v=eFDGL_ehpkI "Sinking of MS Estonia-Documentary")

(Clicking the image opens the video in a new tab.)

<strong><span style="color:red">If you like my work, please don't forget to upvote this notebook!</span></strong>

<strong><span style="color:blue">If you don't, at least leave a comment on what I should do to improve it!</span></strong>

In [None]:
# Install DABL (it's a secret tool that'll help us later 😉)
! pip install -q dabl
! pip install -q country_converter

In [None]:
# Import some basic libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import plotly
import plotly.subplots  # explicit import: make_subplots is used for the side-by-side pie charts
import plotly.express as px
import plotly.graph_objs as go
import plotly.figure_factory as ff

import country_converter as coco

import dabl

In [None]:
# Import the data
data = pd.read_csv("../input/passenger-list-for-the-estonia-ferry-disaster/estonia-passenger-list.csv")
data.head()

In [None]:
# Drop the PassengerId column, since it's not a useful feature
data = data.drop(['PassengerId'], axis=1)

In [None]:
# Check Null values in the dataset
data.isna().sum()

Cool! The data has no null values. That's good news: we don't need to drop any rows, which matters because the dataset is already small.

## Which Countries are these Passengers from?
Let's look at the passengers' countries.

In [None]:
data['Country'].value_counts()

In [None]:
# Pie Chart of the different countries
vals = list(data['Country'].unique())
values, labels = [], []
for val in vals:
    values.append(len(data[data['Country']==val]))
    labels.append(val)

fig = px.pie(
    names=labels,
    values=values,
    title="Passenger Country Distribution",
    color_discrete_sequence=px.colors.sequential.RdBu,
)
fig.show()

In [None]:
# Also a bar chart for the same
fig = px.bar(
    x=labels,
    y=values,
    title="Passengers by Country",
    labels={
        'x': 'Country',
        'y': 'Passenger Count'
    },
    color=values
)
fig.show()

To plot countries on the map, we need to convert their names to corresponding country codes and then make a new data frame with regrouped data.

In [None]:
# Convert country name to ISO3 country code
con_cod = [coco.convert(x, to='ISO3') for x in data['Country']]

# Append the country codes to the original dataframe
data['code'] = con_cod

# Get the country codes and the corresponding number of passengers
code_counts = data['code'].value_counts()
country_codes = code_counts.index.tolist()
country_pass = code_counts.tolist()

# Make a new dataframe with the number of passengers from each country
country_df = pd.DataFrame()
country_df['code'] = country_codes
country_df['passengers'] = country_pass

# View the new dataframe
country_df.head()
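
As a side note, the regrouping step can be written more compactly. The sketch below is only an illustration on a made-up toy frame (the `toy` rows are hypothetical, not taken from the dataset); it builds the same two-column layout (`code`, `passengers`) directly from `value_counts()`.

```python
import pandas as pd

# Toy stand-in for the notebook's `data`; these rows are made up for illustration.
toy = pd.DataFrame({"code": ["SWE", "EST", "SWE", "FIN", "EST", "SWE"]})

# value_counts() gives counts indexed by country code (most frequent first);
# rename_axis/reset_index turn that Series into a two-column dataframe.
country_df = (
    toy["code"]
    .value_counts()
    .rename_axis("code")
    .reset_index(name="passengers")
)
print(country_df)
```

The resulting frame can be passed to `px.choropleth` in the same way as the one built above.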

In [None]:
# Now let's plot it!
fig = px.choropleth(country_df, locations="code",
                    color="passengers",
                    hover_name="code",
                    color_continuous_scale=px.colors.sequential.Jet,
                    title="Passengers Country Distribution"
                   )
fig.show()

## What are the Top-10 Last Names?
Let's look at the 10 most common last names of the passengers.

In [None]:
data['Lastname'].value_counts()[:10]

In [None]:
# Bar Chart for top-10 last names
values = data['Lastname'].value_counts().tolist()[:10]
names = list(dict(data['Lastname'].value_counts()).keys())[:10]

fig = px.bar(
    x=names,
    y=values,
    title="10 Most Popular Last Names",
    color=values,  
)

fig.show()

## What is the Gender Distribution of the Data?

In [None]:
target = [data[data['Sex']=='M'].count().max(), data[data['Sex']=='F'].count().max()]
names = ['Male', 'Female']

fig = px.pie(
    names=names,
    values=target,
    hole=0.3,
    title="Gender Distribution among Passengers",
    color_discrete_sequence=['Blue', 'Magenta']
)
fig.show()

As we can see, the genders are distributed almost equally, which won't create any problems when modelling.
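
To read the split as numbers rather than off the pie chart, the shares can be computed directly. This is a minimal sketch on a made-up toy frame (the `toy` rows are hypothetical); on the real data the same call would be applied to `data['Sex']`.

```python
import pandas as pd

# Toy stand-in for the notebook's `data`; these rows are made up for illustration.
toy = pd.DataFrame({"Sex": ["M", "F", "F", "M", "F"]})

# normalize=True returns the fraction of rows per category instead of raw counts
gender_share = toy["Sex"].value_counts(normalize=True)
print(gender_share)
```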

## What about the Age Distribution of the Populace?
Let's look at the age distribution of the passengers.

In [None]:
# Get Male and Female Ages in a List
male_ages = data[data['Sex'] == 'M']['Age'].tolist()
female_ages = data[data['Sex'] == 'F']['Age'].tolist()

fig = ff.create_distplot(
    hist_data=[male_ages, female_ages],
    group_labels=['Male', 'Female'],
    colors=['#1500ff', '#ff00e1'],
    show_hist=False,
    show_rug=False,
)

fig.layout.update({'title':f'Age Distribution of both Genders<br>[Average Age: {np.mean(male_ages+female_ages):.2f} years]'})

fig.show()

### Density Plot between Age and Sex
Let's look at a density plot between Age and Sex.

Note: *We have to encode the Genders (**1 for Male, 0 for Female**) to make it suitable for plotting*
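
One possible way to do this encoding is with an explicit mapping, sketched below on a made-up toy frame (the `toy` rows are hypothetical). Unlike a lambda that sends everything other than `'M'` to 0, `Series.map` with a dict yields `NaN` for any value missing from the dict, which makes unexpected labels easier to spot.

```python
import pandas as pd

# Toy stand-in for the notebook's `data`; these rows are made up for illustration.
toy = pd.DataFrame({"Sex": ["M", "F", "F", "M"]})

# Explicit encoding: 1 for Male, 0 for Female
sex_encoded = toy["Sex"].map({"M": 1, "F": 0})
print(sex_encoded.tolist())
```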

In [None]:
fig = ff.create_2d_density(x=data['Age'], 
                           y=data['Sex'].apply(lambda x: 1 if x=='M' else 0),
                           title="Age-Sex Density Plot",
                           colorscale=['#7A4579', '#D56073', 'rgb(236,158,105)', (1, 1, 0.2), (0.98,0.98,0.98)])
fig.show()

## Passengers v/s Crew Members
Let's see how many of the people on board were passengers and how many were crew members.

In [None]:
target = [data[data['Category']=='P'].count().max(), data[data['Category']=='C'].count().max()]
names = ['Passengers', 'Crew Members']

fig = px.pie(
    names=names,
    values=target,
    hole=0.5,
    title="Crew members vs Passengers",
    color_discrete_sequence=['Red', 'Blue']
)
fig.show()

## What about Survival?
Let's look at the survival statistics and their correlation with other features.

In [None]:
target = [data[data['Survived']==0].count().max(), data[data['Survived']==1].count().max()]
names = ['Did Not Survive', 'Survived']

fig = px.pie(
    names=names,
    values=target,
    hole=0.5,
    title="How many Survived?",
    color_discrete_sequence=['Black', 'Green']
)
fig.show()

As we can see, unfortunately `86%` of those on board (crew and passengers combined) did not survive the sinking of MS Estonia.

## Did more Crew members survive than passengers?
Let's see whether the percentage of crew members who survived is greater than that of passengers.
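
The percentage can also be computed directly instead of being read from the chart. Because `Survived` is coded as 0/1, its mean is the survival rate. The sketch below uses a made-up toy column (hypothetical values, not the real data), so the printed figure differs from the `86%` above.

```python
import pandas as pd

# Toy stand-in for the notebook's `data`; these values are made up for illustration.
toy = pd.DataFrame({"Survived": [0, 0, 1, 0, 0, 0, 1, 0]})

# The mean of a 0/1 column is the fraction of ones, i.e. the survival rate
survival_rate = toy["Survived"].mean()
death_rate = 1 - survival_rate
print(f"Did not survive: {death_rate:.0%}")
```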

In [None]:
# See How many crew members survived
target_c = [data[(data['Survived'] == 0)&(data['Category']=='C')].count().max(), data[(data['Survived'] == 1)&(data['Category']=='C')].count().max()]
names_c = ['Did Not Survive - Crew', 'Survived - Crew']

target_p = [data[(data['Survived'] == 0)&(data['Category']=='P')].count().max(), data[(data['Survived'] == 1)&(data['Category']=='P')].count().max()]
names_p = ['Did Not Survive - Passenger', 'Survived - Passenger']

fig = plotly.subplots.make_subplots(rows=1, cols=2, specs=[[{'type':'domain'}, {'type':'domain'}]])

fig.add_trace(go.Pie(
    labels=names_c,
    values=target_c,
    hole=0.6,
    title="Crew Members",
), 1,1)

fig.add_trace(go.Pie(
    labels=names_p,
    values=target_p,
    hole=0.6,
    title="Passengers",
), 1,2)

fig.update_layout(title_text="Crew v/s Passenger Survival")

fig.show()

We can see that a smaller percentage of passengers survived than crew members.

## Which Countries do Passengers/Crew Members Belong to?
Let's see where our passengers and crew come from.
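
The same comparison can be expressed as two numbers with a group-by. This is a minimal sketch on a made-up toy frame (the `toy` rows are hypothetical); the column names `Category` and `Survived` match the dataset.

```python
import pandas as pd

# Toy stand-in for the notebook's `data`; these rows are made up for illustration.
toy = pd.DataFrame({
    "Category": ["C", "C", "C", "C", "P", "P", "P", "P", "P", "P"],
    "Survived": [1, 0, 1, 0, 0, 0, 1, 0, 0, 0],
})

# Mean of the 0/1 Survived column within each category = survival rate per group
rates = toy.groupby("Category")["Survived"].mean()
print(rates)
```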

In [None]:
fig = px.bar(
    data_frame=data,
    x='Country',
    y='Survived',
    color='Category',
    title="Populace: Country and survival [P: Passenger | C: Crew Member]",
    color_discrete_sequence=['Cyan', 'Blue']
)
fig.show()

It seems only 2 people from Russia were present on the ship, and both of them were crew members.
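
Note that the bar chart stacks the `Survived` values, so bar heights reflect survivors rather than everyone on board. A head-count per country and category can be cross-checked with a contingency table. The sketch below uses a made-up toy frame (the `toy` rows are hypothetical, not the real passenger list).

```python
import pandas as pd

# Toy stand-in for the notebook's `data`; these rows are made up for illustration.
toy = pd.DataFrame({
    "Country": ["Russia", "Russia", "Sweden", "Sweden", "Estonia"],
    "Category": ["C", "C", "P", "P", "C"],
    "Survived": [0, 1, 0, 1, 1],
})

# Rows: countries, columns: categories, cells: number of people on board
present = pd.crosstab(toy["Country"], toy["Category"])
print(present)
```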

## DABL Plot
Now we hand the data visualization over to DABL.

In [None]:
# First Drop the Country Code Column as it's redundant since we have country names
data = data.drop(['code'], axis=1)

# Plot!
dabl.plot(data, target_col='Survived')