# Reading data
Read the train.csv file as a pandas dataframe.

In [10]:
import pandas as pd
titanic = pd.read_csv("data/train.csv")
titanic

,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.2500,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,,S
...,...,...,...,...,...,...,...,...,...,...,...,...
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0000,,S
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0000,B42,S
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.4500,,S
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0000,C148,C


# Indexing
1. Create a function that returns the name of a passenger given their PassengerId.
2. Create a function that returns the PassengerId of a passenger given their Name.
3. Print a message with the ID of passenger **Montvila, Rev. Juozas** with the following format: 'The ID of passenger Montvila, Rev. Juozas is ##'
4. Print a message with the name of the passenger with ID **42** with the following format: 'The passenger with ID 42 is X'

5. Print all information about the oldest passenger.

In [11]:
# Name given an Id

def id_to_name(df: pd.DataFrame, passenger_id: int) -> str:
    # Select the rows whose PassengerId matches and take the first name.
    return df[df.PassengerId == passenger_id].Name.iloc[0]

id_to_name(titanic, 34)

'Wheadon, Mr. Edward H'

In [12]:
# Id given a Name

def name_to_id(df: pd.DataFrame, name: str) -> int:
    # Cast to a plain int so the return value matches the annotation.
    return int(df[df.Name == name].PassengerId.iloc[0])

name_to_id(titanic, "Wheadon, Mr. Edward H")

34
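
Both lookups above raise an `IndexError` when nothing matches, because `.iloc[0]` is applied to an empty selection. A minimal sketch of a more defensive variant follows. The function name `find_name` and the three-row `sample` frame are hypothetical, introduced here only for illustration.

```python
from typing import Optional

import pandas as pd

# Hypothetical three-row sample using the same column names as train.csv.
sample = pd.DataFrame(
    {
        "PassengerId": [1, 3, 5],
        "Name": [
            "Braund, Mr. Owen Harris",
            "Heikkinen, Miss. Laina",
            "Allen, Mr. William Henry",
        ],
    }
)


def find_name(df: pd.DataFrame, passenger_id: int) -> Optional[str]:
    """Return the passenger's name, or None when the ID is not present."""
    matches = df.loc[df["PassengerId"] == passenger_id, "Name"]
    if matches.empty:
        return None
    return str(matches.iloc[0])


found = find_name(sample, 3)      # an ID that exists in the sample
missing = find_name(sample, 999)  # an ID that does not exist
```

Returning `None` is one design choice. Raising a `KeyError` with a clear message would work equally well if a missing passenger should be treated as an error.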

In [13]:
# Id of Montvila, Rev. Juozas

montvila_id = name_to_id(titanic, "Montvila, Rev. Juozas")
print(f"The ID of passenger Montvila, Rev. Juozas is {montvila_id}")

The ID of passenger Montvila, Rev. Juozas is 887


In [14]:
# Name of passenger 42

id42_name = id_to_name(titanic, 42)
print(f"The passenger with ID 42 is {id42_name}")

The passenger with ID 42 is Turpin, Mrs. William John Robert (Dorothy Ann Wonnacott)


In [15]:
# Oldest passenger info
 
oldest_passenger = titanic["Age"].idxmax()
titanic.loc[oldest_passenger]

PassengerId                                     631
Survived                                          1
Pclass                                            1
Name           Barkworth, Mr. Algernon Henry Wilson
Sex                                            male
Age                                            80.0
SibSp                                             0
Parch                                             0
Ticket                                        27042
Fare                                           30.0
Cabin                                           A23
Embarked                                          S
Name: 630, dtype: object

# Subsetting
We are asked to share data for analysis by a third party. Since our dataset contains personal details, we only want to share with them the following information: ticket classes, fares and port of embarkation. We are asked to deliver a sample of the first 100 rows of this dataset.

6. Create and save the new dataset in **data/port_fares.csv**.

In [7]:
# Index by PassengerId once; the guard makes the cell safe to re-run.
if titanic.index.name is None:
    titanic.set_index("PassengerId", inplace=True)
# Keep only the columns to share, then take the first 100 rows.
titanic_alt = titanic[["Ticket", "Fare", "Embarked"]]
first_100 = titanic_alt.iloc[:100]
first_100.to_csv("data/port_fares.csv")
first_100

PassengerId,Ticket,Fare,Embarked
1,A/5 21171,7.2500,S
2,PC 17599,71.2833,C
3,STON/O2. 3101282,7.9250,S
4,113803,53.1000,S
5,373450,8.0500,S
...,...,...,...
96,374910,8.0500,S
97,PC 17754,34.6542,C
98,PC 17759,63.3583,C
99,231919,23.0000,S
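
The task text mentions ticket classes, which in this dataset are stored in `Pclass`, while `Ticket` holds the ticket number. If the third party needs the class and not the number, the selection could be sketched as below. The `sample` frame and the name `share_columns` are assumptions made for illustration; the rows are the first five shown in the output at the top of the notebook.

```python
import pandas as pd

# Hypothetical stand-in for the full dataset: the first five rows shown above.
sample = pd.DataFrame(
    {
        "PassengerId": [1, 2, 3, 4, 5],
        "Pclass": [3, 1, 3, 1, 3],
        "Ticket": ["A/5 21171", "PC 17599", "STON/O2. 3101282", "113803", "373450"],
        "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
        "Embarked": ["S", "C", "S", "S", "S"],
    }
).set_index("PassengerId")

# Assumed reading of the task: ticket class, fare and port of embarkation.
share_columns = ["Pclass", "Fare", "Embarked"]

# head(100) returns at most 100 rows, so it is safe on shorter frames too.
subset = sample[share_columns].head(100)

# Calling to_csv() without a path returns the CSV text instead of writing a file.
csv_text = subset.to_csv()
```

On the real data, passing `"data/port_fares.csv"` to `to_csv` would write the file as in the cell above.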


# Counting
7. We want to know if there were any survivors over the age of 60. Print all of their information.
8. How many people over 60 survived?
9. What percentage of people over 60 survived?

In [8]:
# titanic is still indexed by PassengerId from the previous cell.

# Passengers aged 60 or older (the comparison is inclusive)
ages_over_60 = titanic[titanic["Age"] >= 60]
print("Passengers over 60 yrs old ->", ages_over_60.shape[0])
print("")
# Survivors among them
survivors_over_60 = ages_over_60[ages_over_60["Survived"] == 1]
print("Survivors over 60 yrs old ->", survivors_over_60.shape[0])
print("")

# Percentage of people over 60 that survived
percentage_survivors_over_60 = survivors_over_60.shape[0] / ages_over_60.shape[0] * 100
print("Percentage of people over 60 that survived ->", round(percentage_survivors_over_60, 2), "%")
print("")

survivors_over_60

Passengers over 60 yrs old -> 26

Survivors over 60 yrs old -> 7

Percentage of people over 60 that survived -> 26.92 %



PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
276,1,1,"Andrews, Miss. Kornelia Theodosia",female,63.0,1,0,13502,77.9583,D7,S
367,1,1,"Warren, Mrs. Frank Manley (Anna Sophia Atkinson)",female,60.0,1,0,110813,75.25,D37,C
484,1,3,"Turkula, Mrs. (Hedwig)",female,63.0,0,0,4134,9.5875,,S
571,1,2,"Harris, Mr. George",male,62.0,0,0,S.W./PP 752,10.5,,S
588,1,1,"Frolicher-Stehli, Mr. Maxmillian",male,60.0,1,1,13567,79.2,B41,C
631,1,1,"Barkworth, Mr. Algernon Henry Wilson",male,80.0,0,0,27042,30.0,A23,S
830,1,1,"Stone, Mrs. George Nelson (Martha Evelyn)",female,62.0,0,0,113572,80.0,B28,
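
The filter above uses `>= 60`, so passengers aged exactly 60 are counted as well. The sketch below shows how a strict and an inclusive threshold differ. It uses a hypothetical five-row sample built from ages and outcomes that appear in this notebook, so the counts it produces are those of the sample, not of the full dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: three survivors aged 60, 63 and 80, one younger
# passenger, and one passenger whose age is missing.
sample = pd.DataFrame(
    {
        "Age": [60.0, 63.0, 80.0, 22.0, np.nan],
        "Survived": [1, 1, 1, 0, 0],
    }
)

# Inclusive threshold: age 60 counts as "over 60".
inclusive = sample[sample["Age"] >= 60]

# Strict threshold: only ages above 60 are kept.
strict = sample[sample["Age"] > 60]

# Comparisons with NaN evaluate to False, so the row with a missing age
# is excluded from both selections.
n_inclusive = len(inclusive)
n_strict = len(strict)
```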


# Women and children first?
10. Find out if women and children were more likely to survive.

11. Write a function that returns the percentage of people that survived from a subset given as a boolean pandas Series.
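
One possible shape for the function in task 11 is sketched below. It assumes the DataFrame has a `Survived` column coded as 0/1, as in the outputs above. The name `survival_percentage`, the age cut-off of 18 for children, and the four-row `sample` are assumptions made for illustration only.

```python
import pandas as pd


def survival_percentage(df: pd.DataFrame, subset: pd.Series) -> float:
    """Percentage of survivors among the rows where `subset` is True."""
    selected = df.loc[subset, "Survived"]
    if selected.empty:
        # No rows selected: the percentage is undefined.
        return float("nan")
    # The mean of a 0/1 column is the fraction of ones.
    return float(selected.mean() * 100)


# Hypothetical four-row sample taken from rows shown at the top of the notebook.
sample = pd.DataFrame(
    {
        "Sex": ["male", "female", "female", "male"],
        "Age": [22.0, 38.0, 26.0, 35.0],
        "Survived": [0, 1, 1, 0],
    }
)

# Assumed definition of "women and children": female, or younger than 18.
women_or_children = (sample["Sex"] == "female") | (sample["Age"] < 18)

pct_women_children = survival_percentage(sample, women_or_children)
pct_others = survival_percentage(sample, ~women_or_children)
```

Comparing the two percentages on the full `titanic` frame would answer task 10. Passengers with a missing age fall into the second group unless they are female, because `Age < 18` is False for missing values.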

# Summarizing

12. What is the median age of the passengers?
13. How many passengers embarked from each port?

14. Generate two hypotheses about how the survival rate differs among groups of passengers. Write your code to explore both hypotheses.
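
The summaries in tasks 12 to 14 map onto a few pandas calls: `Series.median`, `Series.value_counts`, and `groupby` followed by `mean`. The sketch below runs them on a hypothetical five-row sample (the first five rows shown at the top of the notebook), so its numbers describe the sample, not the full dataset. Grouping by `Pclass` and by `Sex` are example hypotheses chosen for illustration, not the required ones.

```python
import pandas as pd

# Hypothetical sample: the first five rows shown at the top of the notebook.
sample = pd.DataFrame(
    {
        "Survived": [0, 1, 1, 1, 0],
        "Pclass": [3, 1, 3, 1, 3],
        "Sex": ["male", "female", "female", "female", "male"],
        "Age": [22.0, 38.0, 26.0, 35.0, 35.0],
        "Embarked": ["S", "C", "S", "S", "S"],
    }
)

# Task 12: median() skips missing ages by default.
median_age = sample["Age"].median()

# Task 13: number of passengers per port of embarkation.
per_port = sample["Embarked"].value_counts()

# Task 14, example hypothesis A: survival rate differs by ticket class.
rate_by_class = sample.groupby("Pclass")["Survived"].mean() * 100

# Task 14, example hypothesis B: survival rate differs by sex.
rate_by_sex = sample.groupby("Sex")["Survived"].mean() * 100
```

Each `rate_by_*` result is a Series indexed by group, so the rates can be compared directly or plotted with `Series.plot.bar()`.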