Kaggle Titanic Comp
data
.gitignore
README.md
__init__.py
genderclassmodel.py
gendermodel.py
learningcurve.py
loaddata.py
myfirstforest.py
naivebayes.py
naivebayes_gaussian.py
randomforest2.py
roc_auc.py
scorereport.py
sgdclassifier.py
svc.py


kaggle-titanic

This is the Python/scikit-learn code I wrote during my stab at the Kaggle Titanic competition. There is code for several different algorithms, but the primary and highest-performing one is the random forest implemented in randomforest2.py.

Requirements:

Usage:
> python randomforest2.py

Key files:

  • loaddata.py: Contains all the feature engineering, including options for generating different variable types and for performing PCA, clustering, and class balancing
  • randomforest2.py: The code that executes the pipeline
  • scorereport.py: Inspects and reports on the results of hyperparameter search
  • learningcurve.py: Includes code to generate a learning curve
  • roc_auc.py: Includes code to generate a ROC curve
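
For orientation, here is a minimal sketch of how the pieces listed above could fit together using plain scikit-learn. It is not the code from randomforest2.py: the synthetic data stands in for the engineered Titanic features, and the step names, parameter grid, and split sizes are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, learning_curve, train_test_split
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the engineered features (assumption: not the Titanic data).
X, y = make_classification(n_samples=400, n_features=12, n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# PCA followed by a random forest; the step names are hypothetical.
pipeline = Pipeline([
    ("pca", PCA(n_components=8)),
    ("forest", RandomForestClassifier(n_estimators=50, random_state=0)),
])

# Small illustrative grid for the hyperparameter search.
param_grid = {"forest__max_depth": [3, None], "forest__min_samples_leaf": [1, 5]}
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)

# Report every candidate, best rank first (the kind of summary a score report gives).
results = search.cv_results_
for i in np.argsort(results["rank_test_score"]):
    print(f"{results['mean_test_score'][i]:.3f}  {results['params'][i]}")

# Learning curve: cross-validated scores at increasing training-set sizes.
train_sizes, train_scores, valid_scores = learning_curve(
    search.best_estimator_, X_train, y_train, cv=3, train_sizes=[0.3, 0.6, 1.0]
)

# ROC AUC on the held-out split, from the predicted probability of the positive class.
test_auc = roc_auc_score(y_test, search.best_estimator_.predict_proba(X_test)[:, 1])
print(f"held-out ROC AUC: {test_auc:.3f}")
```

Wrapping PCA and the classifier in a single Pipeline means the projection is refit inside each cross-validation fold, so the search scores are not inflated by information from the validation folds.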

Other files contain other algorithms that were used during experimentation and are in various stages of completeness. Only randomforest2.py is fully up to date.