
CauseEffectPairsChallenge

Name: Diogo Moitinho de Almeida
Kaggle ID: Dee5
Email: diogo149@gmail.com
Team: ProtoML

Software Used:
- Arch Linux (for feature creation): python 2.7.5, numpy, scipy, scikit-learn, pandas, ipython
- Ubuntu 12.04 (for hyperparameter optimization): python 2.7.3, numpy, scipy, scikit-learn, pandas, ipython

Package Versions:
- numpy 1.7.1
- scipy 0.12.0
- pandas 0.11.0
- scikit-learn 0.13.1
- ipython 0.13.2
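Since results may depend on these exact versions, a quick check of the local environment can help before a multi-day run. The following is a minimal sketch, not part of the repository: the helper names `installed_version`, `compare_versions`, and `print_report` are hypothetical, and it assumes each package exposes a `__version__` attribute under the import names shown.

```python
from __future__ import print_function
import importlib

# Versions listed above, keyed by import name
# ("sklearn" is the import name of scikit-learn, "IPython" of ipython).
EXPECTED = {
    "numpy": "1.7.1",
    "scipy": "0.12.0",
    "pandas": "0.11.0",
    "sklearn": "0.13.1",
    "IPython": "0.13.2",
}


def installed_version(module_name):
    """Return the module's __version__, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def compare_versions(expected, lookup=installed_version):
    """Map each module name to a tuple (expected, found, matches)."""
    report = {}
    for name, wanted in expected.items():
        found = lookup(name)
        report[name] = (wanted, found, found == wanted)
    return report


def print_report(expected=EXPECTED):
    """Print one line per package, flagging any version mismatch."""
    for name, (wanted, found, ok) in sorted(compare_versions(expected).items()):
        status = "OK" if ok else "MISMATCH"
        print("%-8s expected %-7s found %-10s %s" % (name, wanted, found, status))
```

Calling `print_report()` in the ipython terminal lists each package with the expected and installed version.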

Hardware needed:
- Feature creation will probably take more than 5GB.
- Running on the entire dataset took several days on an 8-core machine.
- About 4GB of RAM per core was needed.

To run with training:
- Open an ipython terminal.
- Run: >>> %time %run fc_train.py

To run with testing only:
- Open an ipython terminal.
- Run: >>> %time %run test_only.py

Notes:
- The relevant settings can be changed in SETTINGS.py.

For my 3 submissions, I used the following settings:

Getting leaderboard score: 0.81367
    FC_TRAIN.USE_ALL_FEAT = False
    FC_TRAIN.USE_NON_GA_FEAT = False
    FC_TRAIN.CLF = GradientBoostingRegressor(loss='huber', n_estimators=5000, random_state=1, min_samples_split=2, min_samples_leaf=1, subsample=1.0, max_features=686, alpha=0.995355212043, max_depth=10, learning_rate=np.exp(-4.09679792914))

Getting leaderboard score: 0.81279
    FC_TRAIN.USE_ALL_FEAT = True
    FC_TRAIN.USE_NON_GA_FEAT = False
    FC_TRAIN.CLF = GradientBoostingRegressor(loss='huber', n_estimators=5000, random_state=1, min_samples_split=2, min_samples_leaf=1, subsample=1.0, max_features=500, alpha=0.95, max_depth=10, learning_rate=np.exp(-3.28469694591))

Getting leaderboard score: 0.81238
    FC_TRAIN.USE_ALL_FEAT = True
    FC_TRAIN.USE_NON_GA_FEAT = False
    FC_TRAIN.CLF = GradientBoostingRegressor(loss='huber', n_estimators=5000, random_state=1, min_samples_split=2, min_samples_leaf=1, subsample=1.0, max_features=686, alpha=0.99517924408, max_depth=10, learning_rate=np.exp(-4.10031144415))
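The learning rates above are written as `np.exp(...)` of a negative exponent, which makes the actual step sizes hard to compare at a glance. The sketch below evaluates them with the standard library's `math.exp`, which gives the same value as `np.exp` for a scalar. The dictionary and function names are illustrative only and do not appear in SETTINGS.py.

```python
import math

# Exponents copied from the three CLF settings above,
# keyed by the leaderboard score of each submission.
LOG_LEARNING_RATES = {
    "0.81367": -4.09679792914,
    "0.81279": -3.28469694591,
    "0.81238": -4.10031144415,
}


def learning_rate(log_rate):
    """Return exp(log_rate), the value passed as learning_rate."""
    return math.exp(log_rate)


# Effective learning rate for each submission.
RATES = dict((score, learning_rate(v)) for score, v in LOG_LEARNING_RATES.items())

for score in sorted(RATES, reverse=True):
    print("score %s -> learning_rate %.5f" % (score, RATES[score]))
```

The first and third submissions use nearly the same rate (a little under 0.017), while the second uses a rate a bit more than twice as large (about 0.037).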