Did Titanic passengers survive by chance, or did the well-known "survival of the fittest" apply? The goal of this project is to reveal which features affect survival. We learn these patterns on training data, predict survival on test data, and evaluate the prediction accuracy of our model.
Titanic Prediction: This data set consists of anonymized records containing passengers' travel details and personal data.
Variable of interest:
- Survival - Survival (0 = No; 1 = Yes). Not included in the test.csv file.

Predictors:
- Pclass - Passenger Class (1 = 1st; 2 = 2nd; 3 = 3rd)
- Name - Name
- Sex - Sex
- Age - Age
- SibSp - Number of Siblings/Spouses Aboard
- Parch - Number of Parents/Children Aboard
- Ticket - Ticket Number
- Fare - Passenger Fare
- Cabin - Cabin
- Embarked - Port of Embarkation (C = Cherbourg; Q = Queenstown; S = Southampton)
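The predictors above mix numeric and categorical columns, and some values can be missing, so they usually need encoding before a model can use them. The following is a minimal sketch of one way to do that with pandas. The rows in `toy` are hypothetical values invented for illustration (not records from the dataset), `prepare_features` is a hypothetical helper name, and the choices of median and most-frequent imputation are assumptions rather than the project's confirmed method.

```python
import pandas as pd

# Hypothetical rows for illustration only; NOT taken from the real dataset.
toy = pd.DataFrame({
    "Pclass": [1, 3, 2, 3],
    "Sex": ["female", "male", "female", "male"],
    "Age": [29.0, None, 35.0, 22.0],
    "SibSp": [0, 1, 0, 0],
    "Parch": [0, 0, 1, 0],
    "Fare": [80.0, 7.25, 26.0, 8.05],
    "Embarked": ["C", "S", None, "S"],
})


def prepare_features(df):
    """Return a numeric feature table built from raw passenger columns.

    Assumed choices: median for missing ages, most frequent port for
    missing embarkation values.
    """
    out = df.copy()
    # Encode Sex as 0/1.
    out["Sex"] = out["Sex"].map({"male": 0, "female": 1})
    # Fill missing ages with the median of the known ages.
    out["Age"] = out["Age"].fillna(out["Age"].median())
    # Fill missing ports with the most frequent port, then one-hot encode.
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode()[0])
    out = pd.get_dummies(out, columns=["Embarked"])
    return out


features = prepare_features(toy)
print(features.columns.tolist())
```

Free-text columns such as Name, Ticket and Cabin are left out of this sketch; whether and how to use them is a separate feature-engineering decision.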
"Use the Machine Learning Workflow to process and transform the Titanic dataset to create a prediction model. This model must predict which passengers are likely to survive the sinking of the Titanic with 90% or greater accuracy."
- Exploring data distributions
- Preparing data for machine learning
- Feature Selection
- Training a logistic regression model, a decision tree and a random forest model
- Trying different subsets of features
- Measuring the accuracy of your model
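The training and evaluation steps above can be sketched as follows. This is only an illustration under stated assumptions: the data is synthetic (generated from a made-up rule with noise, not the Titanic records), the hyperparameters are arbitrary, and the hold-out split is one possible way to measure accuracy.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; NOT the Titanic dataset.
rng = np.random.RandomState(0)
n = 400
X = np.column_stack([
    rng.randint(1, 4, size=n),    # Pclass-like column (1, 2 or 3)
    rng.randint(0, 2, size=n),    # Sex-like column (0 or 1)
    rng.uniform(1, 80, size=n),   # Age-like column
])
# Made-up labelling rule with 20% label noise, so the models have a signal.
y = ((X[:, 1] == 1) ^ (rng.uniform(size=n) < 0.2)).astype(int)

# Hold out a quarter of the rows to measure accuracy on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Arbitrary hyperparameters, chosen only for this sketch.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in sorted(accuracies.items()):
    print("%s: %.3f" % (name, acc))
```

To try a different subset of features, the same loop can be rerun on a column slice such as `X_train[:, [0, 1]]` and `X_test[:, [0, 1]]`.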
Running the project requires the following to be installed on your system:
- Anaconda
- Python version: "^2.7"
- NumPy version: "^1.11.3"
- Matplotlib version: "^2.0.2"
- Pandas version: "^0.20.1"
- scikit-learn version: "^0.18.1"
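A quick way to compare an environment against the list above is to print the installed versions. The snippet below is a small sketch; `installed_versions` is a hypothetical helper name, and it relies on each package exposing a `__version__` attribute.

```python
import importlib


def installed_versions(module_names):
    """Map each module name to its version string, or None if unavailable."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", None)
    return versions


# scikit-learn is imported under the module name "sklearn".
report = installed_versions(["numpy", "matplotlib", "pandas", "sklearn"])
for name, version in sorted(report.items()):
    print("%s: %s" % (name, version or "not installed"))
```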