Mehrads/Titanic-Disaster

Titanic-Disaster

Titanic - Machine Learning from Disaster 🛳️

"Our framework"

1. Problem Definition

The competition is simple: use machine learning to create a model that predicts which passengers survived the Titanic shipwreck.

While there was some element of luck involved in surviving, it seems some groups of people were more likely to survive than others.

In this challenge, the task is to build a predictive model that answers the question: “What sorts of people were more likely to survive?” using passenger data (i.e. name, age, gender, socio-economic class, etc.).

2. Data

Our data is from the Kaggle website. The data has been split into two groups:

  • training set (train.csv)
  • test set (test.csv)

For the training set, Kaggle provides the outcome (also known as the “ground truth”) for each passenger. Our model will be based on “features” like passengers’ gender and class. We can also use feature engineering to create new features.

For the test set, the ground truth is not provided. It is our job to predict these outcomes: for each passenger in the test set, we use the model we trained to predict whether or not they survived the sinking of the Titanic.
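The train-then-predict workflow described above can be sketched with the standard library alone. This is a minimal illustration, not the model used in the notebook: it swaps in a deliberately simple baseline that learns the survival rate of each group in the training set and predicts survival when that rate is at least 0.5. The inline rows are made up to stand in for train.csv and test.csv, and the helper names (`read_rows`, `fit_group_rates`, `predict`) are hypothetical. The column names `PassengerId`, `Survived`, and `Sex` are assumed to match the headers in the Kaggle files.

```python
import csv
import io
from collections import defaultdict

# Made-up rows standing in for train.csv and test.csv (not real passenger data).
TRAIN_CSV = """PassengerId,Survived,Pclass,Sex
1,0,3,male
2,1,1,female
3,1,3,female
4,0,2,male
5,1,1,male
6,0,3,male
"""
TEST_CSV = """PassengerId,Pclass,Sex
7,3,male
8,2,female
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def fit_group_rates(train_rows, feature):
    """Learn the survival rate of each value of `feature` from labeled rows."""
    counts = defaultdict(lambda: [0, 0])  # value -> [survivors, total]
    for row in train_rows:
        counts[row[feature]][0] += int(row["Survived"])
        counts[row[feature]][1] += 1
    return {value: survived / total for value, (survived, total) in counts.items()}


def predict(rates, test_rows, feature, default=0):
    """Predict 1 when the passenger's group survived at a rate of at least 0.5."""
    predictions = []
    for row in test_rows:
        rate = rates.get(row[feature])
        # Groups never seen during training fall back to `default`.
        label = default if rate is None else int(rate >= 0.5)
        predictions.append({"PassengerId": row["PassengerId"], "Survived": label})
    return predictions


train_rows = read_rows(TRAIN_CSV)   # labeled: has the "Survived" column
test_rows = read_rows(TEST_CSV)     # unlabeled: outcomes must be predicted
rates = fit_group_rates(train_rows, "Sex")
predictions = predict(rates, test_rows, "Sex")
```

To run this on the real files, `read_rows` could be given the contents of train.csv and test.csv instead of the inline strings. A real model would combine several features rather than a single column.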

Data Dictionary

survival ► Survival: 0 = No, 1 = Yes

pclass ► Ticket class: 1 = 1st, 2 = 2nd, 3 = 3rd

sex ► Sex

age ► Age in years. Age is fractional if less than 1. If the age is estimated, it is in the form xx.5

sibsp ► Number of siblings / spouses aboard the Titanic. The dataset defines family relations in this way: Sibling = brother, sister, stepbrother, stepsister; Spouse = husband, wife (mistresses and fiancés were ignored)

parch ► Number of parents / children aboard the Titanic. Parent = mother, father; Child = daughter, son, stepdaughter, stepson. Some children travelled only with a nanny, therefore parch = 0 for them.

ticket ► Ticket number

fare ► Passenger fare

cabin ► Cabin number

embarked ► Port of embarkation: C = Cherbourg, Q = Queenstown, S = Southampton
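As one illustration of the feature engineering mentioned earlier, the two family-related columns in the data dictionary can be combined into derived features. The sketch below is an assumption about what such a step might look like, not necessarily what the notebook does: `add_family_features`, `family_size`, and `is_alone` are hypothetical names, and the passenger records are made up.

```python
def add_family_features(passenger):
    """Return a copy of the passenger record with two derived features."""
    enriched = dict(passenger)
    # sibsp counts siblings/spouses and parch counts parents/children,
    # so adding 1 for the passenger gives the size of the travelling family.
    enriched["family_size"] = int(passenger["sibsp"]) + int(passenger["parch"]) + 1
    # 1 if the passenger travelled without any recorded relatives, else 0.
    enriched["is_alone"] = int(enriched["family_size"] == 1)
    return enriched


# Made-up records using the data dictionary's field names (not real passengers).
passengers = [
    {"pclass": 3, "sex": "male", "sibsp": 1, "parch": 0},
    {"pclass": 1, "sex": "female", "sibsp": 0, "parch": 0},
    {"pclass": 2, "sex": "female", "sibsp": 1, "parch": 2},
]
enriched = [add_family_features(p) for p in passengers]
```

Because children who travelled only with a nanny have parch = 0, a flag like `is_alone` can mark such a child as travelling alone, so it is worth checking against age before relying on it.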

3. Evaluation

Metric

Our score is the percentage of passengers we correctly predict. This is known as accuracy. Accuracy ranges from 0 to 1, and a score of 1.0 means every prediction is correct, so the closer we get to 1.0, the higher our submission ranks.
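The metric can be written out in a few lines. This is a minimal sketch of accuracy as defined above (correct predictions divided by total predictions). The function name `accuracy` and the example labels are made up for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground truth (between 0 and 1)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty inputs")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels: 1 = survived, 0 = did not survive.
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
score = accuracy(y_true, y_pred)  # 4 of the 5 predictions match
```

Since the test set has no ground truth, a score like this can only be computed locally on labeled data, for example on a portion of the training set held out from fitting.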

The remaining phases of the framework are covered throughout the notebook.

I hope you enjoy reading my code, and I would be happy to hear your opinions and suggestions!