# General instructions

- Install the package `Pandas` in your course-specific virtual environment, if you have not done so already.
- Store the data file `titanic.csv` either in the same directory as the current Jupyter notebook, or in a subdirectory named `data`.
- Read the data set `titanic.csv` into a Pandas DataFrame.
- Answer the questions below, and other questions that may arise in the process.
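
Since the file may sit either next to the notebook or in a `data` subdirectory, a small helper can check both places before loading. This is only a sketch; the function name `find_data_file` is made up for illustration and is not part of the course material.

```python
from pathlib import Path

import pandas as pd


def find_data_file(name="titanic.csv", base=Path(".")):
    """Return the first existing location of `name`: next to the notebook or in `data/`."""
    for candidate in (base / name, base / "data" / name):
        if candidate.exists():
            return candidate
    raise FileNotFoundError(f"{name} not found in {base} or {base / 'data'}")


# Usage in the notebook (assumes the file is present in one of the two places):
# df = pd.read_csv(find_data_file())
```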

The data set comes from [Encyclopedia Titanica](https://www.encyclopedia-titanica.org/), and several preprocessing steps have been carried out on the raw data. A (less complete) variant of the data set is also available via the Python package `seaborn`.

There may be occasional problems or errors in the data (implausible or wrong values, ...). If you find an error or a strange anomaly, please let me know so that I can further curate the dataset for the future.


# 1. Traveler type


In [13]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [14]:
df = pd.read_csv('titanic.csv')

In [15]:
df.columns

Index(['first_name', 'family_name', 'type', 'ticket_number', 'pclass',
       'departure', 'price', 'survived', 'lifeboat', 'body_no', 'nationality',
       'gender', 'age', 'marital_status', 'number_relatives_onboard',
       'home_location', 'destination', 'age_at_death', 'occupation',
       'department', 'works_for'],
      dtype='object')

In [16]:
df.gender

0         Male
1         Male
2       Female
3         Male
4         Male
         ...  
2469      Male
2470      Male
2471      Male
2472      Male
2473      Male
Name: gender, Length: 2474, dtype: object

How many crew members and how many passengers are recorded in the dataset?


In [17]:
members = df.groupby('type').count()
members

type,first_name,family_name,ticket_number,pclass,departure,price,survived,lifeboat,body_no,nationality,gender,age,marital_status,number_relatives_onboard,home_location,destination,age_at_death,occupation,department,works_for
Crew,1122,1122,0,0,1122,0,1122,172,0,1112,1122,1120,929,1122,906,0,691,1122,974,1122
Passenger,1343,1351,1346,1351,1351,1324,1352,344,0,1333,1350,1325,920,1352,1216,909,1051,728,0,11
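
`groupby('type').count()` reports the number of non-missing values per column, which is why the figures differ from column to column. To count rows per traveler type directly, `value_counts()` (or `groupby(...).size()`) is one option. A minimal sketch on a small made-up frame (the rows are illustrative, not taken from `titanic.csv`):

```python
import pandas as pd

# Illustrative rows only; the real notebook would use the DataFrame read from titanic.csv.
toy = pd.DataFrame({
    "type": ["Crew", "Passenger", "Passenger", "Crew", "Passenger"],
    "ticket_number": [None, 315094.0, 386525.0, None, None],
})

# Rows per traveler type, independent of missing values in other columns.
rows_per_type = toy["type"].value_counts()
print(rows_per_type)

# For comparison: count() only counts non-missing entries in each column.
print(toy.groupby("type").count())
```

On the full data set, the `survived` column in the table above has no missing values, so its counts (1122 crew, 1352 passengers) coincide with the row counts and add up to the 2474 rows reported earlier.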


In [18]:
df

Unnamed: 0,first_name,family_name,type,ticket_number,pclass,departure,price,survived,lifeboat,body_no,...,gender,age,marital_status,number_relatives_onboard,home_location,destination,age_at_death,occupation,department,works_for
0,Luka,Orešković,Passenger,315094.0,3rd Class Passengers,Southampton,8.0,False,,,...,Male,20.0,Married,3,"Konjsko Brdo, Croatia [Austria-Hungary]","Chicago, Illinois, United States",20.0,Farmer,,
1,Joseph Francis,Akerman,Crew,,,Southampton,,False,,,...,Male,37.0,Married,1,"Southampton, Hampshire, England",,37.0,Assistant Pantryman Steward,Victualling Crew,White Star Line
2,Mary Elizabeth,Davison,Passenger,386525.0,3rd Class Passengers,Southampton,16.0,True,boat 10,,...,Female,34.0,Married,2,"Liverpool, Lancashire, England","Bedford, Ohio, United States",61.0,,,
3,Ernest Edward Samuel,Freeman,Crew,,,Belfast,,False,,,...,Male,45.0,Married,1,"Southampton, Hampshire, England",,,Deck Steward (1st Class),Victualling Crew,White Star Line
4,George Alfred,Levett,Crew,,,Belfast,,False,,,...,Male,25.0,Married,2,"Southampton, Hampshire, England",,25.0,Assistant Pantryman Steward (1st Class),Victualling Crew,White Star Line
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2469,Francis,Ford,Crew,,,Southampton,,False,,,...,Male,44.0,Married,1,"Southampton, Hampshire, England",,44.0,Bedroom Steward (2nd class),Victualling Crew,White Star Line
2470,Philipp Edmund,Mock,Passenger,13236.0,1st Class Passengers,Cherbourg,57.0,True,boat 11,,...,Male,30.0,Single,0,"New York City, New York, United States",,69.0,,,
2471,Robert Douglas,Norman,Passenger,218629.0,2nd Class Passengers,Southampton,13.0,False,,,...,Male,27.0,Single,0,"Glasgow, Scotland","Vancouver, British Colombia, Canada",27.0,Electrical Engineer,,
2472,Wilhelm Johansson,Skoog,Passenger,347088.0,3rd Class Passengers,Southampton,27.0,False,,,...,Male,40.0,Married,8,"Hällekis, Västergötland, Sweden","Iron Mountain, Michigan, United States",40.0,General Labourer,,


Sort the dataset permanently by traveler type (crew vs. passenger) and last name.


In [19]:
df = df.sort_values(by=['type', 'family_name'])
df

Unnamed: 0,first_name,family_name,type,ticket_number,pclass,departure,price,survived,lifeboat,body_no,...,gender,age,marital_status,number_relatives_onboard,home_location,destination,age_at_death,occupation,department,works_for
434,Ernest Owen,Abbott,Crew,,,Southampton,,False,,,...,Male,21.0,Single,1,"Southampton, Hampshire, England",,21.0,Lounge Pantry Steward,Victualling Crew,White Star Line
221,William Thomas,Abrams,Crew,,,Southampton,,False,,,...,Male,34.0,Married,1,"Southampton, Hampshire, England",,,Fireman,Engineering Crew,White Star Line
1132,Robert John,Adams,Crew,,,Southampton,,False,,,...,Male,26.0,Single,0,"Southampton, Hampshire, England",,26.0,Fireman,Engineering Crew,White Star Line
2428,Percy Snowden,Ahier,Crew,,,Southampton,,False,,,...,Male,20.0,Single,0,"Southampton, Hampshire, England",,20.0,Saloon Steward,Victualling Crew,White Star Line
1,Joseph Francis,Akerman,Crew,,,Southampton,,False,,,...,Male,37.0,Married,1,"Southampton, Hampshire, England",,37.0,Assistant Pantryman Steward,Victualling Crew,White Star Line
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
143,Mubārik Sulaymān Abī Āsī,Ḥannā,Passenger,2663.0,3rd Class Passengers,Cherbourg,7.0,True,boat 15,,...,Male,27.0,Single,0,"Hardīn, Batroun, Syria","Wilkes Barre, Pennsylvania, United States",66.0,,,
505,Mansūr,Ḥannā Al-Hāj,Passenger,2693.0,3rd Class Passengers,Cherbourg,7.0,False,,,...,Male,35.0,Married,1,"Kafr Mishki, Syria","Ottawa, Ontario, Canada",,,,
659,Ṭannūs Ḥannā Mu'awwad,Ṭannūs,Passenger,2684.0,3rd Class Passengers,Cherbourg,7.0,False,,,...,Male,16.0,,0,,"Columbus, Ohio, United States",,Scholar,,
946,Ḥannā,Ṭannūs Mu'awwad,Passenger,2681.0,3rd Class Passengers,Cherbourg,6.0,False,,,...,Male,34.0,,4,,"Columbus, Ohio, United States",34.0,Dealer,,
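
`sort_values` returns a new DataFrame, so the result has to be assigned back to `df` to make the ordering stick. Two optional refinements, sketched here on made-up rows: `ignore_index=True` renumbers the rows after sorting, and a `key` function can make the comparison case-insensitive. Whether either is wanted for this exercise is a judgment call.

```python
import pandas as pd

toy = pd.DataFrame({
    "type": ["Passenger", "Crew", "Passenger", "Crew"],
    "family_name": ["norman", "Ford", "Davison", "abbott"],
})

# Sort by traveler type, then by last name; reassigning keeps the new order.
# key is applied to each sort column separately; str.casefold ignores letter case.
toy = toy.sort_values(
    by=["type", "family_name"],
    key=lambda col: col.str.casefold(),
    ignore_index=True,
)
print(toy)
```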


How many crew members died?


In [20]:
# Count the rows that are crew members and did not survive.
((df['type'] == 'Crew') & (df['survived'] == False)).sum()

np.int64(677)

# 2. Lifeboats


Which were the 5 boats that saved the largest number of people?


In [21]:

df[df['survived']==True].groupby('lifeboat')['survived'].count().sort_values(ascending=False)[:5]


lifeboat
boat 13    46
boat 11    44
boat 15    38
boat 14    35
boat 4     34
Name: survived, dtype: int64

How many lifeboats were there in total?


In [22]:
df['lifeboat'].nunique()

20

Which lifeboats saved the largest number of male passengers?


In [23]:

df[(df['survived']==True) & (df['gender']=='Male') ].groupby('lifeboat')['survived'].count().sort_values(ascending=False)


lifeboat
boat 15    32
boat 13    30
boat B     23
boat 9     20
boat 5     18
boat 3     18
boat 7     16
boat 11    16
boat 14    15
boat 4     12
boat A     11
boat 1     10
boat D     10
boat 10     8
boat C      8
boat 2      7
boat 16     5
boat 6      4
boat 12     3
boat 8      3
Name: survived, dtype: int64
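
The filter above selects all male survivors, crew included. If the question is read as referring to passengers only, an additional condition on `type` is needed. A minimal sketch on made-up rows:

```python
import pandas as pd

toy = pd.DataFrame({
    "type": ["Passenger", "Crew", "Passenger", "Passenger", "Passenger", "Crew"],
    "gender": ["Male", "Male", "Female", "Male", "Male", "Male"],
    "survived": [True, True, True, True, True, False],
    "lifeboat": ["boat 15", "boat 15", "boat 13", "boat 13", "boat 13", None],
})

male_passenger_survivors = toy[
    toy["survived"] & (toy["gender"] == "Male") & (toy["type"] == "Passenger")
]

# size() counts rows per lifeboat; rows without a lifeboat are dropped by groupby.
saved_per_boat = (
    male_passenger_survivors.groupby("lifeboat").size().sort_values(ascending=False)
)
print(saved_per_boat)
```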

Which lifeboats saved the largest proportion of male adult passengers?


In [31]:
df[(df['survived']==True) & (df['gender']=='Male') & (df['age']>18) ].groupby('lifeboat')['survived'].count().sort_values(ascending=False)[:1]


lifeboat
boat 15    31
Name: survived, dtype: int64
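
The query above ranks boats by the absolute number of adult men (crew included, and with `age > 18` excluding 18-year-olds). For a proportion, the number of adult male passengers in each boat has to be divided by the boat's total number of survivors. One way to sketch this, assuming adults are those aged 18 or over (the threshold is an assumption, not stated in the task):

```python
import pandas as pd

toy = pd.DataFrame({
    "type": ["Passenger"] * 7,
    "gender": ["Male", "Female", "Female", "Male", "Male", "Female", "Male"],
    "age": [30.0, 25.0, 22.0, 18.0, 40.0, 35.0, 10.0],
    "survived": [True] * 7,
    "lifeboat": ["boat A"] * 3 + ["boat B"] * 4,
})

survivors = toy[toy["survived"]]

# Boolean flag per survivor: adult male passenger (age >= 18 is an assumed cutoff).
is_adult_male_passenger = (
    (survivors["type"] == "Passenger")
    & (survivors["gender"] == "Male")
    & (survivors["age"] >= 18)
)

# The mean of a boolean flag within each boat is the proportion of flagged rows.
share_per_boat = (
    is_adult_male_passenger.groupby(survivors["lifeboat"]).mean()
    .sort_values(ascending=False)
)
print(share_per_boat)
```

Rows with a missing age compare as `False` here, so they count toward the boat's total but not toward the adult men; whether that is appropriate depends on how many ages are missing.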

What was the average number of people saved per lifeboat?


In [68]:
survivors_per_boat = df[df['survived']==True].groupby('lifeboat')['survived'].count()
survivors_per_boat.mean()

np.float64(25.8)

Which boats saved the most people from the 1st (2nd, 3rd) class?


In [80]:
# The class label must match exactly; stray whitespace such as a trailing tab would match no rows.
df[(df['survived']==True) & (df['pclass']=="3rd Class Passengers")].groupby('lifeboat')['pclass'].value_counts().sort_values(ascending=False)[:1]
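
To compare all three classes at once, a cross-tabulation of `lifeboat` against `pclass` is one option. The sketch below uses made-up rows; note that class labels are compared exactly, so a label with stray whitespace (for example a trailing tab) would match nothing.

```python
import pandas as pd

toy = pd.DataFrame({
    "survived": [True, True, True, True, False],
    "pclass": [
        "1st Class Passengers",
        "3rd Class Passengers",
        "3rd Class Passengers",
        "2nd Class Passengers",
        "3rd Class Passengers",
    ],
    "lifeboat": ["boat 4", "boat 15", "boat 15", "boat 13", None],
})

survivors = toy[toy["survived"]]

# Rows: lifeboats, columns: passenger classes, cells: number of survivors.
boat_by_class = pd.crosstab(survivors["lifeboat"], survivors["pclass"])
print(boat_by_class)

# For each class, the boat with the most survivors from that class.
print(boat_by_class.idxmax())
```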

What was the average age of rescued people, by lifeboat?


In [81]:
df[(df['survived']==True)].groupby('lifeboat')['age'].mean()


lifeboat
boat 1     34.416667
boat 10    28.433333
boat 11    31.048780
boat 12    28.250000
boat 13    26.311111
boat 14    27.971429
boat 15    30.263158
boat 16    27.800000
boat 2     30.111111
boat 3     36.516129
boat 4     34.852941
boat 5     36.058824
boat 6     34.090909
boat 7     31.040000
boat 8     37.200000
boat 9     31.454545
boat A     32.833333
boat B     26.869565
boat C     29.428571
boat D     33.692308
Name: age, dtype: float64

What were (1) the average age and (2) the number of people saved, by lifeboat? (Note: calculate this in a single query.)


In [83]:
df[(df['survived']==True)].groupby('lifeboat').agg(
    {
        'age':'mean',
        'survived':'count'
    }
)


lifeboat,age,survived
boat 1,34.416667,12
boat 10,28.433333,31
boat 11,31.04878,44
boat 12,28.25,16
boat 13,26.311111,46
boat 14,27.971429,35
boat 15,30.263158,38
boat 16,27.8,10
boat 2,30.111111,18
boat 3,36.516129,31
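
The same single-query result can also be written with named aggregation, which lets the output columns carry descriptive names. A minimal sketch on made-up rows (the column names `mean_age` and `n_saved` are chosen here for illustration):

```python
import pandas as pd

toy = pd.DataFrame({
    "survived": [True, True, True, False],
    "lifeboat": ["boat 1", "boat 1", "boat 2", None],
    "age": [30.0, 40.0, 20.0, 50.0],
})

summary = toy[toy["survived"]].groupby("lifeboat").agg(
    mean_age=("age", "mean"),       # average age of survivors per boat
    n_saved=("survived", "count"),  # number of survivors per boat
)
print(summary)
```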


# 3. Investigation of ticket prices

The prices are given in British pounds. One British pound at that time corresponds to 161 US dollars today, and one US dollar currently corresponds to 0.90 euros. Convert the original fares to present-day euros and store the result in a new column.


In [88]:
df['today_euro_price'] = df['price']*161*0.9


Unnamed: 0,price,today_euro_price
2054,8.0,1159.2
195,7.0,1014.3
734,7.0,1014.3
772,7.0,1014.3
2148,8.0,1159.2
143,7.0,1014.3
505,7.0,1014.3
659,7.0,1014.3
946,6.0,869.4
60,,
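
The conversion is a chain of two multiplications: historical pounds to present-day US dollars (factor 161), then US dollars to euros (factor 0.90). Naming the two factors makes the calculation easier to audit. A sketch on a few fares like those shown above (the constant names are illustrative):

```python
import pandas as pd

GBP_THEN_TO_USD_TODAY = 161  # from the task description
USD_TO_EUR = 0.90            # from the task description

fares = pd.DataFrame({"price": [8.0, 7.0, 6.0, None]})

# Missing fares (e.g. for crew, who have no recorded price) stay missing.
fares["today_euro_price"] = fares["price"] * GBP_THEN_TO_USD_TODAY * USD_TO_EUR
print(fares.round(1))
```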


Are the prices provided in the data the prices paid per person, or the prices paid per ticket (potentially covering multiple people)? Carry out a detailed data analysis to answer this question.
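
One possible line of attack, sketched on made-up rows: group passengers by `ticket_number`, then check (a) whether everyone on a shared ticket carries the same `price` and (b) whether the price grows with the number of people on the ticket. If shared tickets show one identical, comparatively high price for all holders, that points to a per-ticket fare; this is a heuristic, not proof.

```python
import pandas as pd

# Made-up rows for illustration; not taken from titanic.csv.
toy = pd.DataFrame({
    "ticket_number": [100, 100, 100, 200, 300, 300],
    "price": [24.0, 24.0, 24.0, 8.0, 16.0, 16.0],
})

per_ticket = toy.groupby("ticket_number").agg(
    group_size=("price", "size"),          # people travelling on the ticket
    distinct_prices=("price", "nunique"),  # 1 means everyone has the same price
    price=("price", "first"),
)

# If the recorded price is per ticket, dividing by group size estimates the fare per person.
per_ticket["price_per_person"] = per_ticket["price"] / per_ticket["group_size"]
print(per_ticket)

# Correlation between group size and recorded price (positive suggests per-ticket fares).
size_price_corr = per_ticket["group_size"].corr(per_ticket["price"])
print(size_price_corr)
```

On the real data it would be sensible to run this separately per `pclass`, since fares differ strongly between classes.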


What are other interesting patterns related to the prices paid? Freely explore!


# 4. Freely explore
