Titanic Example - Python

This repository contains files I created, using Python 2 and Jupyter Notebook, to predict survivors of the sinking of the Titanic (https://www.kaggle.com/c/titanic). Python is currently not my primary language for data analysis (that would be R, using the data.table package when possible). The datasets required to run the files are included. There are seven scripts, each of which appears in three formats:

  • .ipynb - Jupyter Notebook file, which includes some basic comments and markdown formatting
  • .html - HTML export of the notebook
  • .py - Raw Python script, without the markdown content

The seven scripts are as follows:

  • Titanic_data_exploration_MASTER - Inputs the training data, performs basic feature preprocessing shared by all of the algorithms, and outputs a cleaned dataset. It then uses the execfile() command to call and execute the other scripts (a sketch of this orchestration appears after this list)
  • Final_setup_Random_Forest - Fits a Random Forest, scanning across parameters in a forward-stepwise order to determine a high-quality fit (see the note and sketch below this list)
  • Final_setup_GBoost - Fits a Gradient Boosting model, scanning across parameters in a forward-stepwise order to determine a high-quality fit. (As an aside, few things have made me wish I used a Mac as much as trying to install XGBoost on a Windows system.)
  • Final_setup_SVM - Implements a Support Vector Machine after creating additional interaction variables from the cleaned dataset. Uses scikit-learn's Pipeline to streamline the code (see the sketch after this list)
  • Final_setup_Logit - Implements a logistic regression (Logit) classifier, in a manner similar to the SVM script
  • Final_setup_NB - Implements a Naive Bayes classifier
  • Final_ensemble - Takes the predictions of the five base learners on a holdout set (carved out of the original training data), fits a Gradient Boosting Classifier on these predictions, and uses the result to produce the final prediction for the test set submitted to Kaggle (see the stacking sketch after this list)
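
To illustrate how the MASTER script drives the others, here is a minimal sketch, assuming Python 2 and pandas; the preprocessing step is elided, and the cleaned-file name and exact script filenames are assumptions based on the list above:

```python
# A minimal sketch of the MASTER script's flow (Python 2).
import pandas as pd

train = pd.read_csv('train.csv')  # raw Kaggle training data
# ... basic feature preprocessing shared by all algorithms ...
train.to_csv('train_cleaned.csv', index=False)  # hypothetical output name

# execfile() runs each script in the current namespace (Python 2 only;
# Python 3 would need exec(open(path).read()) instead).
for script in ['Final_setup_Random_Forest.py',
               'Final_setup_GBoost.py',
               'Final_setup_SVM.py',
               'Final_setup_Logit.py',
               'Final_setup_NB.py',
               'Final_ensemble.py']:
    execfile(script)
```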
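The Pipeline approach in the SVM script looks roughly like the following sketch, assuming scikit-learn; the interaction step, SVM parameters, and the variable names X_train / y_train / X_holdout are illustrative assumptions, not the repository's exact settings:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import SVC

# interaction_only=True adds pairwise interaction variables; keeping the
# scaler inside the pipeline avoids leaking holdout statistics into training.
svm_pipeline = Pipeline([
    ('interactions', PolynomialFeatures(degree=2, interaction_only=True,
                                        include_bias=False)),
    ('scale', StandardScaler()),
    ('svm', SVC(kernel='rbf', C=1.0)),
])
svm_pipeline.fit(X_train, y_train)
svm_pred = svm_pipeline.predict(X_holdout)
```

Swapping SVC for sklearn.linear_model.LogisticRegression gives the analogous Logit setup.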
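The stacking step in Final_ensemble can be sketched as follows, again assuming scikit-learn and numpy; the prediction vectors from the base learners and y_holdout are hypothetical placeholder names:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# One column per base learner's predictions on the holdout split.
meta_X = np.column_stack([rf_pred, gb_pred, svm_pred, logit_pred, nb_pred])

meta_model = GradientBoostingClassifier(random_state=0)
meta_model.fit(meta_X, y_holdout)

# Repeat the base-learner predictions on the Kaggle test set, then combine.
meta_X_test = np.column_stack([rf_test, gb_test, svm_test, logit_test, nb_test])
final_pred = meta_model.predict(meta_X_test)
```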

Note: For the GBoost and Random Forest algorithms, a forward-stepwise parameter search was used (sketched below). This is not the optimal way to search the parameter space (in R, I would likely use the expand.grid / tuneGrid approach in the caret package, so I could compare both within and across parameters), but I felt it was sufficient as an example of Python programming.
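
For concreteness, the forward-stepwise idea can be sketched as follows for the Random Forest model, assuming scikit-learn; the search order, candidate grids, and the X / y variables are illustrative, not the repository's exact settings:

```python
from __future__ import print_function
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

search_order = [
    ('n_estimators', [100, 300, 500]),
    ('max_depth', [3, 5, 8, None]),
    ('min_samples_leaf', [1, 3, 5]),
]

# Tune one parameter at a time, freezing each best value before moving on;
# unlike a full grid search, earlier choices are never revisited.
best_params = {}
for name, candidates in search_order:
    scores = [cross_val_score(
                  RandomForestClassifier(random_state=0,
                                         **dict(best_params, **{name: value})),
                  X, y, cv=5).mean()
              for value in candidates]
    best_params[name] = candidates[int(np.argmax(scores))]
    print(name, '->', best_params[name])
```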

Additionally, the Naive Bayes classifier performs so poorly that including it actually lowers the predictive performance of the ensemble; again, I included it here as an example of a type of algorithm I could implement in Python and incorporate into a larger ensemble model.
