# General instructions

- Install the package `pandas` in your course-specific virtual environment, if you have not already done so.
- Store the data file `titanic.csv` either in the same directory as the current Jupyter notebook, or in a subdirectory named `data`.
- Read the data set `titanic.csv` into a pandas DataFrame.
- Answer the questions below, and other questions that may arise in the process.
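
The two storage locations named above can be checked in order before reading the file. This is a minimal sketch, assuming the notebook is started from its own directory; the names `candidates` and `csv_path` are illustrative and not part of the assignment.

```python
from pathlib import Path

import pandas as pd

# Candidate locations named in the instructions: the notebook's own
# directory, or a subdirectory called `data`.
candidates = [Path("titanic.csv"), Path("data") / "titanic.csv"]

# Take the first candidate that exists; None if the file is in neither place.
csv_path = next((p for p in candidates if p.exists()), None)

if csv_path is None:
    print("titanic.csv was found neither in '.' nor in './data'")
else:
    df = pd.read_csv(csv_path)
    print(df.shape)
```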

The data set comes from [Encyclopedia Titanica](https://www.encyclopedia-titanica.org/), and several preprocessing steps have been applied to the raw data. A (less complete) variant of the data set is also available via the Python package `seaborn`.

The data may contain occasional problems or errors (implausible values, wrong values, ...). If you find an error or a strange anomaly, please let me know so that I can curate the data set further.


# 1. Traveler type


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [3]:
df = pd.read_csv('titanic.csv')

In [42]:
df.columns

Index(['first_name', 'family_name', 'type', 'ticket_number', 'pclass',
       'departure', 'price', 'survived', 'lifeboat', 'body_no', 'nationality',
       'gender', 'age', 'marital_status', 'number_relatives_onboard',
       'home_location', 'destination', 'age_at_death', 'occupation',
       'department', 'works_for'],
      dtype='object')

In [97]:
df.gender

434     Male
221     Male
1132    Male
2428    Male
1       Male
        ... 
143     Male
505     Male
659     Male
946     Male
60       NaN
Name: gender, Length: 2474, dtype: object

How many crew members and how many passengers are recorded in the dataset?


In [40]:
members = df.groupby('type').count()
members

type,first_name,family_name,ticket_number,pclass,departure,price,survived,lifeboat,body_no,nationality,gender,age,marital_status,number_relatives_onboard,home_location,destination,age_at_death,occupation,department,works_for
Crew,1122,1122,0,0,1122,0,1122,172,0,1112,1122,1120,929,1122,906,0,691,1122,974,1122
Passenger,1343,1351,1346,1351,1351,1324,1352,344,0,1333,1350,1325,920,1352,1216,909,1051,728,0,11
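
The table above reports non-null cells per column, which is why the passenger counts differ from column to column. To count rows per traveler type directly, `value_counts()` may be the simpler tool. The sketch below uses a small made-up frame (`toy`) that is not part of the Titanic data, only to show the difference.

```python
import pandas as pd

# Made-up toy data; one crew member has no recorded first name.
toy = pd.DataFrame({
    "type": ["Crew", "Crew", "Passenger", "Passenger", "Passenger"],
    "first_name": ["Ann", None, "Carl", "Dora", "Emil"],
})

# count() counts non-null cells per column, so the missing name lowers it.
per_column = toy.groupby("type").count()

# value_counts() counts rows per type, regardless of missing values elsewhere.
per_group = toy["type"].value_counts()

print(per_column)
print(per_group)
```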


In [43]:
df

Unnamed: 0,first_name,family_name,type,ticket_number,pclass,departure,price,survived,lifeboat,body_no,...,gender,age,marital_status,number_relatives_onboard,home_location,destination,age_at_death,occupation,department,works_for
434,Ernest Owen,Abbott,Crew,,,Southampton,,False,,,...,Male,21.0,Single,1,"Southampton, Hampshire, England",,21.0,Lounge Pantry Steward,Victualling Crew,White Star Line
221,William Thomas,Abrams,Crew,,,Southampton,,False,,,...,Male,34.0,Married,1,"Southampton, Hampshire, England",,,Fireman,Engineering Crew,White Star Line
1132,Robert John,Adams,Crew,,,Southampton,,False,,,...,Male,26.0,Single,0,"Southampton, Hampshire, England",,26.0,Fireman,Engineering Crew,White Star Line
2428,Percy Snowden,Ahier,Crew,,,Southampton,,False,,,...,Male,20.0,Single,0,"Southampton, Hampshire, England",,20.0,Saloon Steward,Victualling Crew,White Star Line
1,Joseph Francis,Akerman,Crew,,,Southampton,,False,,,...,Male,37.0,Married,1,"Southampton, Hampshire, England",,37.0,Assistant Pantryman Steward,Victualling Crew,White Star Line
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
143,Mubārik Sulaymān Abī Āsī,Ḥannā,Passenger,2663.0,3rd Class Passengers,Cherbourg,7.0,True,boat 15,,...,Male,27.0,Single,0,"Hardīn, Batroun, Syria","Wilkes Barre, Pennsylvania, United States",66.0,,,
505,Mansūr,Ḥannā Al-Hāj,Passenger,2693.0,3rd Class Passengers,Cherbourg,7.0,False,,,...,Male,35.0,Married,1,"Kafr Mishki, Syria","Ottawa, Ontario, Canada",,,,
659,Ṭannūs Ḥannā Mu'awwad,Ṭannūs,Passenger,2684.0,3rd Class Passengers,Cherbourg,7.0,False,,,...,Male,16.0,,0,,"Columbus, Ohio, United States",,Scholar,,
946,Ḥannā,Ṭannūs Mu'awwad,Passenger,2681.0,3rd Class Passengers,Cherbourg,6.0,False,,,...,Male,34.0,,4,,"Columbus, Ohio, United States",34.0,Dealer,,


Sort the dataset permanently by traveler type (crew vs. passenger) and last name.


In [17]:
df = df.sort_values(by=['type', 'family_name'])
df

Unnamed: 0,first_name,family_name,type,ticket_number,pclass,departure,price,survived,lifeboat,body_no,...,gender,age,marital_status,number_relatives_onboard,home_location,destination,age_at_death,occupation,department,works_for
434,Ernest Owen,Abbott,Crew,,,Southampton,,False,,,...,Male,21.0,Single,1,"Southampton, Hampshire, England",,21.0,Lounge Pantry Steward,Victualling Crew,White Star Line
221,William Thomas,Abrams,Crew,,,Southampton,,False,,,...,Male,34.0,Married,1,"Southampton, Hampshire, England",,,Fireman,Engineering Crew,White Star Line
1132,Robert John,Adams,Crew,,,Southampton,,False,,,...,Male,26.0,Single,0,"Southampton, Hampshire, England",,26.0,Fireman,Engineering Crew,White Star Line
2428,Percy Snowden,Ahier,Crew,,,Southampton,,False,,,...,Male,20.0,Single,0,"Southampton, Hampshire, England",,20.0,Saloon Steward,Victualling Crew,White Star Line
1,Joseph Francis,Akerman,Crew,,,Southampton,,False,,,...,Male,37.0,Married,1,"Southampton, Hampshire, England",,37.0,Assistant Pantryman Steward,Victualling Crew,White Star Line
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
143,Mubārik Sulaymān Abī Āsī,Ḥannā,Passenger,2663.0,3rd Class Passengers,Cherbourg,7.0,True,boat 15,,...,Male,27.0,Single,0,"Hardīn, Batroun, Syria","Wilkes Barre, Pennsylvania, United States",66.0,,,
505,Mansūr,Ḥannā Al-Hāj,Passenger,2693.0,3rd Class Passengers,Cherbourg,7.0,False,,,...,Male,35.0,Married,1,"Kafr Mishki, Syria","Ottawa, Ontario, Canada",,,,
659,Ṭannūs Ḥannā Mu'awwad,Ṭannūs,Passenger,2684.0,3rd Class Passengers,Cherbourg,7.0,False,,,...,Male,16.0,,0,,"Columbus, Ohio, United States",,Scholar,,
946,Ḥannā,Ṭannūs Mu'awwad,Passenger,2681.0,3rd Class Passengers,Cherbourg,6.0,False,,,...,Male,34.0,,4,,"Columbus, Ohio, United States",34.0,Dealer,,
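
"Permanently" here means that the sorted order has to be stored, because `sort_values` returns a new DataFrame and leaves the original untouched. A minimal sketch on made-up toy data (the names in `toy` are invented for illustration):

```python
import pandas as pd

# Made-up toy data, deliberately unsorted.
toy = pd.DataFrame({
    "type": ["Passenger", "Crew", "Crew"],
    "family_name": ["Brown", "Smith", "Adams"],
})

# Calling sort_values without storing the result does not change `toy`.
toy.sort_values(by=["type", "family_name"])
order_before = toy["family_name"].tolist()

# Re-assigning the result makes the new order stick.
toy = toy.sort_values(by=["type", "family_name"])
order_after = toy["family_name"].tolist()

print(order_before, order_after)
```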


How many crew members died?


In [41]:
# .iloc[0] takes the first column's count by position; plain [0] on a labelled Series is deprecated and triggers a FutureWarning
df[(df['type']=='Crew') & (df['survived']==False)].count().iloc[0]


np.int64(677)

# 2. Lifeboats


Which were the 5 boats that saved the largest number of people?


In [92]:

df[df['survived']==True].groupby('lifeboat')['survived'].count().sort_values(ascending=False)[:5]


lifeboat
boat 13    46
boat 11    44
boat 15    38
boat 14    35
boat 4     34
Name: survived, dtype: int64

How many lifeboats were there in total?


In [93]:
df['lifeboat'].nunique()

20

Which lifeboats saved the largest number of male passengers?


In [99]:

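# Note: this filter keeps every surviving male, crew included; add (df['type']=='Passenger') to restrict it to passengers.
df[(df['survived']==True) & (df['gender']=='Male')].groupby('lifeboat')['survived'].count().sort_values(ascending=False)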


lifeboat
boat 15    32
boat 13    30
boat B     23
boat 9     20
boat 5     18
boat 3     18
boat 7     16
boat 11    16
boat 14    15
boat 4     12
boat A     11
boat 1     10
boat D     10
boat 10     8
boat C      8
boat 2      7
boat 16     5
boat 6      4
boat 12     3
boat 8      3
Name: survived, dtype: int64

Which lifeboats saved the largest proportion of male adult passengers?
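
One possible reading of this question is: for each lifeboat, what share of the people it saved were male adult passengers? The sketch below follows that reading on made-up toy data. The age threshold `ADULT_AGE = 18` is an assumption, as the task does not define "adult".

```python
import pandas as pd

ADULT_AGE = 18  # assumed threshold; not defined in the task

# Made-up toy data using the column names of the Titanic data set.
toy = pd.DataFrame({
    "type": ["Passenger", "Passenger", "Crew", "Passenger", "Passenger", "Passenger"],
    "gender": ["Male", "Female", "Male", "Male", "Male", "Female"],
    "age": [40.0, 30.0, 25.0, 10.0, 50.0, 22.0],
    "survived": [True, True, True, True, True, False],
    "lifeboat": ["boat 1", "boat 1", "boat 1", "boat 2", "boat 2", None],
})

saved = toy[toy["survived"] == True]

# Boolean flag per saved person: male, adult, and travelling as a passenger.
is_male_adult_passenger = (
    (saved["type"] == "Passenger")
    & (saved["gender"] == "Male")
    & (saved["age"] >= ADULT_AGE)
)

# The mean of a boolean flag within each boat is the proportion of True values.
proportion = (
    is_male_adult_passenger.groupby(saved["lifeboat"])
    .mean()
    .sort_values(ascending=False)
)
print(proportion)
```

People with a missing age compare as `False` against the threshold, so they count as "not a male adult passenger" here; whether that is acceptable depends on how many ages are missing.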


What was the average number of people saved on lifeboats?
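
A possible approach is to count the saved people per boat first and then average those counts. The sketch uses made-up toy data; note that `groupby` drops rows without a recorded lifeboat.

```python
import pandas as pd

# Made-up toy data: three people saved in boat 1, one in boat 2, one person lost.
toy = pd.DataFrame({
    "survived": [True, True, True, True, False],
    "lifeboat": ["boat 1", "boat 1", "boat 1", "boat 2", None],
})

saved = toy[toy["survived"] == True]

# Number of saved people per boat, then the mean across boats.
people_per_boat = saved.groupby("lifeboat").size()
average_saved = people_per_boat.mean()

print(people_per_boat)
print(average_saved)
```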


Which boats saved mainly people from the 1st (2nd, 3rd) class?
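
A cross-tabulation of lifeboat against class, normalised per boat, is one way to approach this. The sketch uses made-up toy data. The label `3rd Class Passengers` appears in the data set, while `1st Class Passengers` is assumed to follow the same pattern.

```python
import pandas as pd

# Made-up toy data; the "1st Class Passengers" label is an assumption.
toy = pd.DataFrame({
    "survived": [True, True, True, True, True],
    "lifeboat": ["boat 1", "boat 1", "boat 1", "boat 2", "boat 2"],
    "pclass": [
        "1st Class Passengers", "1st Class Passengers", "3rd Class Passengers",
        "3rd Class Passengers", "3rd Class Passengers",
    ],
})

saved = toy[toy["survived"] == True]

# Each row (boat) sums to 1, so the cells are class shares within a boat.
class_share = pd.crosstab(saved["lifeboat"], saved["pclass"], normalize="index")

# The dominant class per boat is the column with the largest share.
top_class = class_share.idxmax(axis=1)

print(class_share)
print(top_class)
```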


What was the average age of rescued people, by lifeboat?
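
Grouping the survivors by lifeboat and averaging `age` is probably the most direct route. The sketch below uses made-up toy data and shows that `mean()` skips missing ages rather than propagating them.

```python
import pandas as pd

# Made-up toy data; one survivor in boat 2 has no recorded age.
toy = pd.DataFrame({
    "survived": [True, True, True, True, False],
    "lifeboat": ["boat 1", "boat 1", "boat 2", "boat 2", None],
    "age": [20.0, 40.0, 10.0, None, 50.0],
})

saved = toy[toy["survived"] == True]

# Mean age per boat; NaN ages are ignored by default.
mean_age = saved.groupby("lifeboat")["age"].mean()
print(mean_age)
```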


What were (1) the average age and (2) the number of people saved, by lifeboat? (Note: Calculate both in a single query.)
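
Both statistics can be computed in one `groupby(...).agg(...)` call using named aggregation. This is a sketch on made-up toy data; the output column names `mean_age` and `n_saved` are illustrative.

```python
import pandas as pd

# Made-up toy data; one survivor in boat 2 has no recorded age.
toy = pd.DataFrame({
    "survived": [True, True, True, True, False],
    "lifeboat": ["boat 1", "boat 1", "boat 2", "boat 2", None],
    "age": [20.0, 40.0, 10.0, None, 50.0],
})

saved = toy[toy["survived"] == True]

# One query, two statistics: mean of `age` and count of non-null `survived`.
summary = saved.groupby("lifeboat").agg(
    mean_age=("age", "mean"),
    n_saved=("survived", "count"),
)
print(summary)
```

Counting `survived` rather than `age` keeps people with a missing age in the head count.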


# 3. Investigation of ticket prices

The prices are given in British pounds. One British pound at that time corresponds to 161 US dollars today, and 1 US dollar currently corresponds to 0.90 euros. Convert the original fares to present-day euros and store the result in a new column.
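
The conversion is a chain of two multiplications: pounds to present-day dollars, then dollars to euros, so one pound corresponds to 161 × 0.90 = 144.90 euros. A minimal sketch on made-up fares; the column name `price_eur_today` is a hypothetical choice.

```python
import pandas as pd

GBP_THEN_TO_USD_TODAY = 161  # rate stated in the task
USD_TO_EUR = 0.90            # rate stated in the task

# Made-up fares in pounds; the missing value stands for a person without a price.
toy = pd.DataFrame({"price": [7.0, 26.0, None]})

# Missing prices stay missing (NaN) after the multiplication.
toy["price_eur_today"] = toy["price"] * GBP_THEN_TO_USD_TODAY * USD_TO_EUR
print(toy)
```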


Are the prices provided in the data the prices paid by person, or the prices paid by ticket (potentially covering multiple people)? Carry out a detailed data analysis to answer this question.
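
One possible first step is to group by `ticket_number` and check two things: whether everyone on a shared ticket is listed with the same price, and how the price per head compares across group sizes. The sketch below uses made-up toy data and illustrative names (`per_ticket`, `share_constant`, `price_per_head`). It shows the mechanics only and does not answer the question for the real data.

```python
import pandas as pd

# Made-up toy data: ticket 100 covers two people, 200 one, 300 three.
toy = pd.DataFrame({
    "ticket_number": [100, 100, 200, 300, 300, 300],
    "price": [20.0, 20.0, 10.0, 30.0, 30.0, 30.0],
})

per_ticket = toy.groupby("ticket_number").agg(
    n_people=("price", "size"),              # people listed on the ticket
    n_distinct_prices=("price", "nunique"),  # 1 means everyone shows the same price
    price=("price", "first"),
)

# Among shared tickets, how often is the listed price identical for all holders?
shared = per_ticket[per_ticket["n_people"] > 1]
share_constant = (shared["n_distinct_prices"] == 1).mean()

# If the listed price is per ticket, dividing by group size should give
# per-head values that look similar across group sizes.
per_ticket["price_per_head"] = per_ticket["price"] / per_ticket["n_people"]
print(per_ticket)
print(share_constant)
```

In this toy example the listed price grows with the number of people while the per-head value stays constant, which is the pattern one would expect if prices were recorded per ticket. Comparing within one class would make such a check more convincing on the real data.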


What are other interesting patterns related to the prices paid? Freely explore!


# 4. Freely explore
