# 1st Place Solution for Search Results Relevance Competition on Kaggle

The best single model we obtained during the competition was an XGBoost model with a linear booster, scoring 0.69322 on the Public LB and 0.70768 on the Private LB. Our final winning submission was a median ensemble of our 35 best Public LB submissions; it scored 0.70807 on the Public LB and 0.72189 on the Private LB.
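A median ensemble combines several models' predictions by taking the per-sample median, which is robust to a few outlier models. The following is a minimal sketch of the idea only; the toy prediction values below are made up and this is not the repository's actual ensembling code:

```python
import numpy as np

def median_ensemble(prediction_sets):
    """Combine several models' predictions by taking the
    per-sample median across models."""
    stacked = np.vstack(prediction_sets)  # shape: (n_models, n_samples)
    return np.median(stacked, axis=0)

# Three hypothetical submissions predicting relevance scores in 1..4
preds = [
    [4, 3, 2, 1],
    [4, 2, 2, 1],
    [3, 3, 1, 2],
]
print(median_ensemble(preds))  # -> [4. 3. 2. 1.]
```

Compared with averaging, the median ignores a single wildly wrong model per sample, which makes it a safe way to blend many submissions of similar quality.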

## What's New

See `./Doc/Kaggle_CrowdFlower_ChenglongChen.pdf` for documentation.


## Instructions

* Download the data from the competition website and put all of it into the folder `./Data`.
* Run `python ./Code/Feat/run_all.py` to generate features. This will take a few hours.
* Run `python ./Code/Model/generate_best_single_model.py` to generate the best single model submission. In our experience, it takes only a few trials to produce a model with the best or similar performance. See the training log in `./Output/Log/[Pre@solution]_[Feat@svd100_and_bow_Jun27]_[Model@reg_xgb_linear]_hyperopt.log` for an example.
* Run `python ./Code/Model/generate_model_library.py` to generate the model library. This is quite time consuming, but you don't have to wait for the script to finish: you can move on to the next step once some models have been trained.
* Run `python ./Code/Model/generate_ensemble_submission.py` to generate a submission via ensemble selection.
* If you don't want to run the code, just submit the file in `./Output/Subm`.
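Ensemble selection, the technique named in the last script above, greedily builds a blend from a model library: at each step it adds (with replacement) the model whose inclusion most improves a validation score. The sketch below illustrates that general scheme only; it is not the repository's implementation, and the toy predictions, target, and negative-MSE score are invented for illustration:

```python
import numpy as np

def greedy_ensemble_selection(preds, score_fn, n_iters=10):
    """Greedily build an ensemble: at each step, add (with replacement)
    the model whose inclusion most improves score_fn (higher is better)."""
    ensemble_sum = np.zeros_like(preds[0], dtype=float)
    picked = []
    for _ in range(n_iters):
        best_score, best_i = None, None
        for i, p in enumerate(preds):
            candidate = (ensemble_sum + p) / (len(picked) + 1)
            s = score_fn(candidate)
            if best_score is None or s > best_score:
                best_score, best_i = s, i
        ensemble_sum += preds[best_i]
        picked.append(best_i)
    return ensemble_sum / len(picked), picked

# Toy example: score candidates by negative MSE against a "validation" target
target = np.array([1.0, 2.0, 3.0])
preds = [np.array([1.0, 2.0, 4.0]),
         np.array([1.0, 1.0, 3.0]),
         np.array([2.0, 2.0, 3.0])]
score = lambda p: -np.mean((p - target) ** 2)
blend, chosen = greedy_ensemble_selection(preds, score, n_iters=5)
```

Selecting with replacement lets a strong model enter the blend multiple times, which acts as an implicit weighting and keeps the procedure simple.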