Kaggle's Allstate Purchase Prediction Challenge

Allstate Purchase Prediction Challenge

Requirements

Python 2.7.5 with scikit-learn 0.14a1, NumPy 1.8, pandas 0.12
Windows 8, Intel i5-3230M @ 2.60 GHz, 16 GB RAM
Developed on an HP Envy 17 j100tx laptop

How to generate the solution

Run "python majorityvote_modelselection.py" from a command shell, or simply double-click the script on Windows. Watch the memory usage, even though the default settings "should" keep it from exceeding 8 GB.

Comments

Using the default settings, the script fits the models and creates a submission that scores 0.53705 on the private LB. This is the setting which, combined with Breakfast Pirate's ABCDEF combination, scored 0.53715 on the private LB and 0.54535 on the public LB. On the system configuration above this takes approximately 3 hours. If you're impatient, set N=10 and NS=7 and it will score 0.53710 in just 30 minutes! If you think that is still slow, try setting N=8, NS=6, params=[(30,5,23)]: it is even faster and scores 0.53705, the same as my best submission, but lower on the public LB. If it is still slow, get a better computer!!!
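
As a rough illustration, the fastest configuration mentioned above could be written as plain Python assignments. This is only a sketch: it assumes N, NS and params are module-level variables in majorityvote_modelselection.py, which should be checked against the script itself, and the meaning of the values inside the params tuple is not documented in this README.

```python
# Hypothetical placement: settings assumed to live near the top of
# majorityvote_modelselection.py (check the script for the real location).
N = 8                    # number of models fitted and used in the "all models" vote
NS = 6                   # number of selected models used in the final vote
params = [(30, 5, 23)]   # values quoted from the README; tuple meaning not documented here
```

Since the NS selected models are picked from the N fitted ones, NS should not be larger than N.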

The script will perform the following steps:

  1. Prepare the data (load the files, transformation, clean and create the engineered features)
  2. Fit the Random Forests
  3. Make the prediction of the product G
  4. Select the best Random Forest(s) given the training set accuracy
  5. Do a majority vote using all N models and print the score on the cross-validation set
  6. Do a majority vote using the NS selected models and print the score on the cross-validation set

Then, if submit is set to False:
a. Record the performance of the k-fold and loop
b. Exit the loop, make the prediction on the test set, do a majority vote using the selected models, fix the product according to the state rule, and create the submission file
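
Steps 4 to 6 (model selection and majority vote) can be sketched in plain standard-library Python. This is a minimal illustration and not the code from majorityvote_modelselection.py: the function names, the toy predictions and the accuracy values are all made up for the example, and the real script works on Random Forest predictions of product G.

```python
from collections import Counter


def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the actual one."""
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return matches / float(len(actual))


def select_best(model_predictions, train_accuracies, ns):
    """Keep the predictions of the ns models with the highest training accuracy."""
    ranked = sorted(range(len(model_predictions)),
                    key=lambda i: train_accuracies[i], reverse=True)
    return [model_predictions[i] for i in ranked[:ns]]


def majority_vote(model_predictions):
    """For each sample, return the label predicted by the most models."""
    voted = []
    for votes in zip(*model_predictions):  # one tuple of votes per sample
        voted.append(Counter(votes).most_common(1)[0][0])
    return voted


# Toy data (hypothetical): 3 models, 4 samples each.
preds = [
    [1, 2, 3, 1],  # model 0
    [1, 2, 2, 4],  # model 1
    [2, 2, 3, 4],  # model 2
]
train_acc = [0.90, 0.95, 0.80]  # made-up training accuracies
actual = [1, 2, 3, 4]           # made-up cross-validation labels

vote_all = majority_vote(preds)                                 # vote over all N models
vote_best = majority_vote(select_best(preds, train_acc, ns=2))  # vote over NS models
print(accuracy(vote_all, actual))
print(accuracy(vote_best, actual))
```

With an even number of voters, ties are possible; here they are resolved by whatever `Counter.most_common` returns first, which is a simplification chosen for the sketch.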

License

Please refer to the LICENSE.txt file.