# Titanic Dataset

|Variable|Definition|Key|
|---|---|---|
|Survived|Survival|0 = No, 1 = Yes|
|Pclass|Ticket class|1 = 1st, 2 = 2nd, 3 = 3rd|
|Name|Name||
|Sex|Sex||
|Age|Age in years||
|SibSp|# of siblings / spouses aboard the Titanic||
|Parch|# of parents / children aboard the Titanic||
|Ticket|Ticket number||
|Fare|Passenger fare||
|Cabin|Cabin number||
|Embarked|Port of Embarkation|C = Cherbourg, Q = Queenstown, S = Southampton|

## Import Libraries and Data
Import Necessary Libraries

In [2]:
# Libraries to help with reading and manipulating data
import numpy as np
import pandas as pd

# Libraries to help with data visualization
import matplotlib.pyplot as plt
import seaborn as sns

# to restrict the float value to 3 decimal places
pd.set_option('display.float_format', lambda x: '%.3f' % x)



Import The Dataset

In [3]:
# Import the data as a pandas DataFrame.
# The path is relative to the working directory; adjust it if the .csv file
# lives elsewhere (for example, in a mounted Google Drive folder).
data = pd.read_csv('Titanic_Dataset.csv')

## Data Check
- View the first and last rows of the dataframe
- Determine the number of entries in the dataframe
- Check the data types for each column
- Check for missing values
- Check and remove duplicate values
- Create a statistical summary of the numerical data
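
The checks in this list map onto a handful of pandas calls. The following is a minimal sketch on a small made-up frame (the `toy` rows are hypothetical and not taken from the dataset); it also shows `drop_duplicates()` for the removal step, which only matters when `duplicated().sum()` is non-zero.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in rows; the notebook itself works on `data`
toy = pd.DataFrame({
    "Name": ["A", "B", "B", "C"],
    "Age": [22.0, 38.0, 38.0, np.nan],
    "Fare": [7.25, 71.28, 71.28, 8.05],
})

n_rows, n_cols = toy.shape               # number of entries and columns
dtypes = toy.dtypes                      # data type of each column
missing = toy.isnull().sum()             # missing values per column
n_dupes = int(toy.duplicated().sum())    # rows identical to an earlier row
deduped = toy.drop_duplicates().reset_index(drop=True)  # remove duplicates
summary = deduped.describe()             # statistics for numeric columns

print(n_rows, n_cols, n_dupes, len(deduped))
print(missing)
```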

In [5]:
data

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.000,1,0,A/5 21171,7.250,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.000,1,0,PC 17599,71.283,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.000,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.000,1,0,113803,53.100,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.000,0,0,373450,8.050,,S
...,...,...,...,...,...,...,...,...,...,...,...,...
886,887,0,2,"Montvila, Rev. Juozas",male,27.000,0,0,211536,13.000,,S
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.000,0,0,112053,30.000,B42,S
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.450,,S
889,890,1,1,"Behr, Mr. Karl Howell",male,26.000,0,0,111369,30.000,C148,C


In [7]:
data.shape

(891, 12)

In [9]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  891 non-null    int64  
 1   Survived     891 non-null    int64  
 2   Pclass       891 non-null    int64  
 3   Name         891 non-null    object 
 4   Sex          891 non-null    object 
 5   Age          714 non-null    float64
 6   SibSp        891 non-null    int64  
 7   Parch        891 non-null    int64  
 8   Ticket       891 non-null    object 
 9   Fare         891 non-null    float64
 10  Cabin        204 non-null    object 
 11  Embarked     889 non-null    object 
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB


In [10]:
data.isnull().sum()

PassengerId      0
Survived         0
Pclass           0
Name             0
Sex              0
Age            177
SibSp            0
Parch            0
Ticket           0
Fare             0
Cabin          687
Embarked         2
dtype: int64

In [14]:
data.duplicated().sum()


0

In [15]:
data.describe()

Unnamed: 0,PassengerId,Survived,Pclass,Age,SibSp,Parch,Fare
count,891.0,891.0,891.0,714.0,891.0,891.0,891.0
mean,446.0,0.384,2.309,29.699,0.523,0.382,32.204
std,257.354,0.487,0.836,14.526,1.103,0.806,49.693
min,1.0,0.0,1.0,0.42,0.0,0.0,0.0
25%,223.5,0.0,2.0,20.125,0.0,0.0,7.91
50%,446.0,0.0,3.0,28.0,0.0,0.0,14.454
75%,668.5,1.0,3.0,38.0,1.0,0.0,31.0
max,891.0,1.0,3.0,80.0,8.0,6.0,512.329


## Question 1 - Create a dataframe containing only passengers who embarked from Southampton

In [16]:
data[data["Embarked"] == "S"]

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.000,1,0,A/5 21171,7.250,,S
2,3,1,3,"Heikkinen, Miss. Laina",female,26.000,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.000,1,0,113803,53.100,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.000,0,0,373450,8.050,,S
6,7,0,1,"McCarthy, Mr. Timothy J",male,54.000,0,0,17463,51.862,E46,S
...,...,...,...,...,...,...,...,...,...,...,...,...
883,884,0,2,"Banfield, Mr. Frederick James",male,28.000,0,0,C.A./SOTON 34068,10.500,,S
884,885,0,3,"Sutehall, Mr. Henry Jr",male,25.000,0,0,SOTON/OQ 392076,7.050,,S
886,887,0,2,"Montvila, Rev. Juozas",male,27.000,0,0,211536,13.000,,S
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.000,0,0,112053,30.000,B42,S


## Question 2 - Create a dataframe containing the last 30 entries in the dataset and include only the names and cabin numbers for these passengers.

In [18]:
# tail(30) keeps the last 30 rows (index 861 to 890);
# range(860, 891) would have selected 31 rows.
data[["Name", "Cabin"]].tail(30)

Unnamed: 0,Name,Cabin
861,"Giles, Mr. Frederick Edward",
862,"Swift, Mrs. Frederick Joel (Margaret Welles Ba...",D17
863,"Sage, Miss. Dorothy Edith ""Dolly""",
864,"Gill, Mr. John William",
865,"Bystrom, Mrs. (Karolina)",
866,"Duran y More, Miss. Asuncion",
867,"Roebling, Mr. Washington Augustus II",A24
868,"van Melkebeke, Mr. Philemon",
869,"Johnson, Master. Harold Theodor",
870,"Balkic, Mr. Cerin",


## Question 3 - How many passengers travelled 1st class, 2nd class and 3rd class?

In [None]:
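# Count the passengers in each ticket class (1 = 1st, 2 = 2nd, 3 = 3rd)
data["Pclass"].value_counts().sort_index()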

## Question 4 - What was the mean age of survivors of the Titanic? What is the sum of all the ages for the survivors?
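
One possible approach is to filter on `Survived == 1` and aggregate the `Age` column. The sketch below uses a made-up frame (`toy` and its values are hypothetical); the same expressions should apply to `data`. Because `Age` has missing values, note that `mean()` and `sum()` skip `NaN` by default.

```python
import numpy as np
import pandas as pd

# Hypothetical rows standing in for the Titanic dataframe
toy = pd.DataFrame({
    "Survived": [1, 0, 1, 1, 0],
    "Age": [20.0, 40.0, 30.0, np.nan, 50.0],
})

# Ages of survivors only
survivor_ages = toy.loc[toy["Survived"] == 1, "Age"]

mean_age = survivor_ages.mean()   # missing ages are ignored
total_age = survivor_ages.sum()   # missing ages are ignored

print(mean_age, total_age)
```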

## Question 5 - What percentage of the passengers were women? (women/total passengers)
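
A boolean mask averages to a proportion, so one way to get this percentage is to take the mean of `Sex == "female"`. This is a sketch on a hypothetical `toy` frame, not the notebook's result.

```python
import pandas as pd

# Hypothetical rows standing in for the Titanic dataframe
toy = pd.DataFrame({"Sex": ["female", "male", "male", "male"]})

# True counts as 1 and False as 0, so the mean is women / total passengers
pct_women = (toy["Sex"] == "female").mean() * 100

print(pct_women)
```

`toy["Sex"].value_counts(normalize=True) * 100` gives the same figure for every category at once.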

## Question 6 - From which port did the largest number of 3rd class passengers embark?
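
A possible approach is to keep only the rows with `Pclass == 3`, count them per port, and take the label with the highest count. The frame below is hypothetical. `value_counts()` drops missing ports by default, which is relevant because `Embarked` has two missing values.

```python
import pandas as pd

# Hypothetical rows standing in for the Titanic dataframe
toy = pd.DataFrame({
    "Pclass": [3, 3, 3, 1, 3, 2],
    "Embarked": ["S", "S", "Q", "C", "C", "S"],
})

# Restrict to 3rd class, then count passengers per port
third_class = toy[toy["Pclass"] == 3]
port_counts = third_class["Embarked"].value_counts()

# idxmax returns the index label (the port code) with the largest count
top_port = port_counts.idxmax()

print(port_counts)
print(top_port)
```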

## Question 7 - What percentage of women survived? (female survivors / total women) And what percentage of men survived? (male survivors / total men)
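
Since `Survived` is coded 0/1, its mean within each group is that group's survival rate. The sketch below groups a hypothetical `toy` frame by `Sex`; the numbers are illustrative only.

```python
import pandas as pd

# Hypothetical rows standing in for the Titanic dataframe
toy = pd.DataFrame({
    "Sex": ["female", "female", "male", "male", "male", "male"],
    "Survived": [1, 0, 1, 0, 0, 0],
})

# Mean of a 0/1 column per group = survivors / total in that group
survival_pct = toy.groupby("Sex")["Survived"].mean() * 100

print(survival_pct)
```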

## Question 8 - Create a table displaying the survival rates (percentage) of female and male passengers travelling in 1st, 2nd and 3rd class. (number of survivors of a given sex in a class / total number of passengers of that sex in that class)
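
One way to build such a table is `pivot_table` with `Sex` as rows, `Pclass` as columns, and the mean of `Survived` as the cell value. This is a sketch under the assumption that a Sex-by-class layout is what the question intends; the `toy` frame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical rows standing in for the Titanic dataframe
toy = pd.DataFrame({
    "Sex": ["female"] * 5 + ["male"] * 5,
    "Pclass": [1, 1, 2, 3, 3, 1, 1, 2, 3, 3],
    "Survived": [1, 1, 1, 1, 0, 1, 0, 0, 0, 0],
})

# Each cell: survivors / passengers for that sex and class, as a percentage
rates = toy.pivot_table(
    index="Sex", columns="Pclass", values="Survived", aggfunc="mean"
) * 100

print(rates)
```

An equivalent route is `toy.groupby(["Sex", "Pclass"])["Survived"].mean().unstack() * 100`.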