# **Predicting Passenger Survival**
****

**Table of Contents**
* **About Data**
* **Import Libraries**
* **Load Datasets**
* **Explore Datasets**
* **Split Data**
* **Data Cleaning**
* **Data Visualization**
* **Choose Model**
* **Fit Models**
* **Evaluate Model**
* **Fine-Tune the Model**

## **About Data**
RMS Titanic was a British passenger liner, operated by the White Star Line, which sank in the North Atlantic Ocean on 15 April 1912 after striking an iceberg. For more information, see the <a href='https://en.wikipedia.org/wiki/Titanic' style='text-decoration:none'>Wikipedia article</a>.

<pre style="font-family: 'Brush Script MT', cursive, serif;">
<h3 style='font-size: 12'>Definition of Feature Columns</h3>
<b>Survived:</b> Whether the passenger survived
0 = No
1 = Yes
<b>Embarked:</b> Port of embarkation
C = Cherbourg
Q = Queenstown
S = Southampton
<b>Pclass:</b> Ticket class
A proxy for socio-economic status (SES)
1st = Upper
2nd = Middle
3rd = Lower
<b>Sex:</b> Passenger sex
<b>Age:</b> Age in years
Age is fractional if less than 1.
If the age is estimated, it is in the form xx.5
<b>SibSp:</b> Number of siblings / spouses aboard the Titanic
The dataset defines family relations in this way...
Sibling = brother, sister, stepbrother, stepsister
Spouse = husband, wife (mistresses and fiancés were ignored)
<b>Parch:</b> Number of parents / children aboard the Titanic
The dataset defines family relations in this way...
Parent = mother, father
Child = daughter, son, stepdaughter, stepson
Some children travelled only with a nanny, therefore Parch = 0 for them.
<b>Ticket:</b> Ticket number
<b>Fare:</b> Fare paid for a ticket
<b>Cabin:</b> Cabin number
</pre>
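
The coded columns defined above are easier to read once they carry their labels. As a minimal sketch, assuming base R's `factor()` is an acceptable way to do this, the snippet below maps the codes to the labels from the definitions. `toy_df` is a hypothetical three-row stand-in for the real data, not part of the notebook.

```r
# Toy rows standing in for the real data (hypothetical values, for illustration only)
toy_df <- data.frame(
  Survived = c(0, 1, 1),
  Pclass   = c(3, 1, 2),
  Embarked = c("S", "C", "Q"),
  stringsAsFactors = FALSE
)

# Map each code to the label given in the feature definitions
toy_df$Survived <- factor(toy_df$Survived, levels = c(0, 1),
                          labels = c("No", "Yes"))
toy_df$Pclass <- factor(toy_df$Pclass, levels = c(1, 2, 3),
                        labels = c("Upper", "Middle", "Lower"))
toy_df$Embarked <- factor(toy_df$Embarked, levels = c("C", "Q", "S"),
                          labels = c("Cherbourg", "Queenstown", "Southampton"))

# Inspect the resulting factor columns
str(toy_df)
```

Listing the `levels` explicitly keeps the mapping stable even if a code happens to be absent from a given sample.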

## **Import Libraries**

In [3]:
library(tidyverse)
library(tidymodels)
library(mice)
library(ggthemes)
library(corrplot)

## **Load Datasets**

In [4]:
load_data <- function(train_path, test_path) {
    # Instantiate the train dataframe
    train_df <- readr::read_csv(train_path)

    # Instantiate the test dataframe
    test_df <- readr::read_csv(test_path)

    return(list(train = train_df, test = test_df))
}

In [5]:
# Initialize the data path
train_path <- "../datasets/train.csv"
test_path <- "../datasets/test.csv"

# Load both datasets into a named list of data frames
datasets <- load_data(train_path, test_path)

# Extract the train and test data frames from the list
train_df <- datasets$train
test_df <- datasets$test

Parsed with column specification:
cols(
  PassengerId = col_double(),
  Survived = col_double(),
  Pclass = col_double(),
  Name = col_character(),
  Sex = col_character(),
  Age = col_double(),
  SibSp = col_double(),
  Parch = col_double(),
  Ticket = col_character(),
  Fare = col_double(),
  Cabin = col_character(),
  Embarked = col_character()
)
Parsed with column specification:
cols(
  PassengerId = col_double(),
  Pclass = col_double(),
  Name = col_character(),
  Sex = col_character(),
  Age = col_double(),
  SibSp = col_double(),
  Parch = col_double(),
  Ticket = col_character(),
  Fare = col_double(),
  Cabin = col_character(),
  Embarked = col_character()
)
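
The column specifications printed above are the types `read_csv()` guessed from the files. As a hedged sketch of an alternative, the types can be declared up front through the `col_types` argument, which also lets `Survived` be read as a factor. The example reads a small temporary file rather than the real dataset; `tmp_path` and `toy_df` are hypothetical names, and only a subset of the columns is shown.

```r
library(readr)

# Write a two-row sample to a temporary file
# (a subset of columns from the first two train rows)
tmp_path <- tempfile(fileext = ".csv")
writeLines(c(
  "PassengerId,Survived,Pclass,Sex,Age",
  "1,0,3,male,22",
  "2,1,1,female,38"
), tmp_path)

# Declare the column types instead of letting read_csv() guess them
toy_df <- read_csv(
  tmp_path,
  col_types = cols(
    PassengerId = col_integer(),
    Survived    = col_factor(levels = c("0", "1")),
    Pclass      = col_integer(),
    Sex         = col_character(),
    Age         = col_double()
  )
)

# Inspect the parsed types
str(toy_df)
```

Declaring the types this way makes a changed or malformed input file surface as a parsing problem instead of a silently different column type.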


## **Explore Datasets**

### **Train And Test Dataframe**

#### **Train**

In [6]:
# Show the first six rows of the train data frame
head(train_df)

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38.0,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S
6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q


#### **Test**

In [7]:
# Show the first six rows of the test data frame
head(test_df)

PassengerId,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
892,3,"Kelly, Mr. James",male,34.5,0,0,330911,7.8292,,Q
893,3,"Wilkes, Mrs. James (Ellen Needs)",female,47.0,1,0,363272,7.0,,S
894,2,"Myles, Mr. Thomas Francis",male,62.0,0,0,240276,9.6875,,Q
895,3,"Wirz, Mr. Albert",male,27.0,0,0,315154,8.6625,,S
896,3,"Hirvonen, Mrs. Alexander (Helga E Lindqvist)",female,22.0,1,1,3101298,12.2875,,S
897,3,"Svensson, Mr. Johan Cervin",male,14.0,0,0,7538,9.225,,S


### **Structure Of The Dataframe**

#### **Train**

In [8]:
# View train data information
str(train_df)

spec_tbl_df [891 x 12] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ PassengerId: num [1:891] 1 2 3 4 5 6 7 8 9 10 ...
 $ Survived   : num [1:891] 0 1 1 1 0 0 0 0 1 1 ...
 $ Pclass     : num [1:891] 3 1 3 1 3 3 1 3 3 2 ...
 $ Name       : chr [1:891] "Braund, Mr. Owen Harris" "Cumings, Mrs. John Bradley (Florence Briggs Thayer)" "Heikkinen, Miss. Laina" "Futrelle, Mrs. Jacques Heath (Lily May Peel)" ...
 $ Sex        : chr [1:891] "male" "female" "female" "female" ...
 $ Age        : num [1:891] 22 38 26 35 35 NA 54 2 27 14 ...
 $ SibSp      : num [1:891] 1 1 0 1 0 0 0 3 0 1 ...
 $ Parch      : num [1:891] 0 0 0 0 0 0 0 1 2 0 ...
 $ Ticket     : chr [1:891] "A/5 21171" "PC 17599" "STON/O2. 3101282" "113803" ...
 $ Fare       : num [1:891] 7.25 71.28 7.92 53.1 8.05 ...
 $ Cabin      : chr [1:891] NA "C85" NA "C123" ...
 $ Embarked   : chr [1:891] "S" "C" "S" "S" ...
 - attr(*, "spec")=
  .. cols(
  ..   PassengerId = col_double(),
  ..   Survived = col_double(),
  ..   Pclass = col_double(

#### **Test**

In [9]:
# View test data information
str(test_df)

spec_tbl_df [418 x 11] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ PassengerId: num [1:418] 892 893 894 895 896 897 898 899 900 901 ...
 $ Pclass     : num [1:418] 3 3 2 3 3 3 3 2 3 3 ...
 $ Name       : chr [1:418] "Kelly, Mr. James" "Wilkes, Mrs. James (Ellen Needs)" "Myles, Mr. Thomas Francis" "Wirz, Mr. Albert" ...
 $ Sex        : chr [1:418] "male" "female" "male" "male" ...
 $ Age        : num [1:418] 34.5 47 62 27 22 14 30 26 18 21 ...
 $ SibSp      : num [1:418] 0 1 0 0 1 0 0 1 0 2 ...
 $ Parch      : num [1:418] 0 0 0 0 1 0 0 1 0 0 ...
 $ Ticket     : chr [1:418] "330911" "363272" "240276" "315154" ...
 $ Fare       : num [1:418] 7.83 7 9.69 8.66 12.29 ...
 $ Cabin      : chr [1:418] NA NA NA NA ...
 $ Embarked   : chr [1:418] "Q" "S" "Q" "S" ...
 - attr(*, "spec")=
  .. cols(
  ..   PassengerId = col_double(),
  ..   Pclass = col_double(),
  ..   Name = col_character(),
  ..   Sex = col_character(),
  ..   Age = col_double(),
  ..   SibSp = col_double(),
  ..   Parch = col_dou

### **Data Description**

In [10]:
summary(train_df)

  PassengerId       Survived          Pclass          Name          
 Min.   :  1.0   Min.   :0.0000   Min.   :1.000   Length:891        
 1st Qu.:223.5   1st Qu.:0.0000   1st Qu.:2.000   Class :character  
 Median :446.0   Median :0.0000   Median :3.000   Mode  :character  
 Mean   :446.0   Mean   :0.3838   Mean   :2.309                     
 3rd Qu.:668.5   3rd Qu.:1.0000   3rd Qu.:3.000                     
 Max.   :891.0   Max.   :1.0000   Max.   :3.000                     
                                                                    
     Sex                 Age            SibSp           Parch       
 Length:891         Min.   : 0.42   Min.   :0.000   Min.   :0.0000  
 Class :character   1st Qu.:20.12   1st Qu.:0.000   1st Qu.:0.0000  
 Mode  :character   Median :28.00   Median :0.000   Median :0.0000  
                    Mean   :29.70   Mean   :0.523   Mean   :0.3816  
                    3rd Qu.:38.00   3rd Qu.:1.000   3rd Qu.:0.0000  
                    Max.   :80.00 

In [11]:
summary(test_df)

  PassengerId         Pclass          Name               Sex           
 Min.   : 892.0   Min.   :1.000   Length:418         Length:418        
 1st Qu.: 996.2   1st Qu.:1.000   Class :character   Class :character  
 Median :1100.5   Median :3.000   Mode  :character   Mode  :character  
 Mean   :1100.5   Mean   :2.266                                        
 3rd Qu.:1204.8   3rd Qu.:3.000                                        
 Max.   :1309.0   Max.   :3.000                                        
                                                                       
      Age            SibSp            Parch           Ticket         
 Min.   : 0.17   Min.   :0.0000   Min.   :0.0000   Length:418        
 1st Qu.:21.00   1st Qu.:0.0000   1st Qu.:0.0000   Class :character  
 Median :27.00   Median :0.0000   Median :0.0000   Mode  :character  
 Mean   :30.27   Mean   :0.4474   Mean   :0.3923                     
 3rd Qu.:39.00   3rd Qu.:1.0000   3rd Qu.:0.0000                     
 Max

## **Split Data**

*Split the labeled train dataset into a training set (80%) and a validation set (20%).*

In [14]:
# Name the split object data_split so it does not mask base::sample()
data_split <- initial_split(train_df, prop = 0.80)
training_df <- training(data_split)
testing_df <- testing(data_split)
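
The split above draws rows at random, so each run can select different passengers. As a minimal sketch, assuming reproducibility and balanced classes are wanted, the snippet below fixes the seed with `set.seed()` and uses the `strata` argument of `initial_split()` to sample within each outcome class. The data frame `toy_df`, the seed value, and the `toy_*` names are hypothetical and used only for illustration.

```r
library(rsample)  # part of tidymodels

# Toy data standing in for the real data (hypothetical values, for illustration only)
toy_df <- data.frame(
  Survived = factor(rep(c(0, 1), times = c(60, 40))),
  Fare     = seq_len(100)
)

# Fix the random seed so the same rows are selected on every run
set.seed(123)

# Keep 80% for training, sampling within each Survived class
toy_split <- initial_split(toy_df, prop = 0.80, strata = Survived)
toy_training <- training(toy_split)
toy_validation <- testing(toy_split)

# Class proportions in the training part should stay close to the full data
prop.table(table(toy_training$Survived))
```

Stratifying on the outcome keeps the share of survivors similar in both parts, which matters when the classes are imbalanced.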

In [41]:
train_shape <- dim(training_df)
cat("Training_df shape:\n")
cat("\tNrow:", as.character(train_shape[1]), "\n")
cat("\tNcolumns:", as.character(train_shape[2]))

Training_df shape:
	Nrow: 712 
	Ncolumns: 12

In [42]:
test_shape <- dim(testing_df)
cat("Testing_df shape:\n")
cat("\tNrow:", as.character(test_shape[1]), "\n")
cat("\tNcolumns:", as.character(test_shape[2]))

Testing_df shape:
	Nrow: 179 
	Ncolumns: 12