Advanced Regression Techniques

Practicing regression in a Kaggle competition

A "knowledge" competition hosted by Kaggle to practice advanced regression techniques. The aim of participating in this competition was to practice tackling a "typical" regression problem. The notebook includes EDA, data cleaning, and building/interpreting the model I found to perform the best. The feature engineering I did for this particular dataset was inspired by this notebook. I tried to stick with Linear models (Lasso, OLS, GLMs etc) and avoided producing multi-model ensembles to boost my LB score to ensure model simplicity and interpretability (see final section of the notebook).

My final model was a simple OLS with sequential feature selection performed using mlxtend (0.12090 RMSLE, 1027/4942 on the LB as of 24/08/2019).
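
A minimal sketch of that setup, assuming mlxtend's SequentialFeatureSelector wrapped around scikit-learn's LinearRegression. The synthetic data, k_features value, and scoring choice below are illustrative, not the notebook's exact configuration:

```python
# Sketch: OLS with forward sequential feature selection via mlxtend.
# Synthetic data stands in for the engineered House Prices features.
import numpy as np
from sklearn.linear_model import LinearRegression
from mlxtend.feature_selection import SequentialFeatureSelector as SFS

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))  # stand-in for engineered features
y = 3 * X[:, 0] + X[:, 1] - X[:, 2] + rng.normal(size=200)

# Forward selection scored with negative MSE under 5-fold CV; on
# log-transformed sale prices this aligns with the LB's RMSLE metric.
sfs = SFS(
    LinearRegression(),
    k_features=5,  # illustrative; worth tuning in practice
    forward=True,
    floating=False,
    scoring="neg_mean_squared_error",
    cv=5,
)
sfs = sfs.fit(X, y)
print("Selected feature indices:", sfs.k_feature_idx_)
```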

Things to try in the future to improve my LB score without resorting to ensembling:

  • Yeo-Johnson transforms for skewed features (see the sketch after this list)
  • Experiment with different categorical encoding methods and discretisers
  • Create indicator features that flag categorical feature levels that spike the sale price (see step 3 of this)
  • MICE to impute missing-at-random (MAR) values (also sketched below)
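
Two of these ideas map directly onto scikit-learn tools: PowerTransformer for the Yeo-Johnson transform and IterativeImputer as a MICE-style imputer. A hedged sketch under those assumptions; the feature names come from the Ames housing data used in the competition, but the values and missingness pattern are made up:

```python
# Sketch: MICE-style imputation followed by a Yeo-Johnson transform,
# using scikit-learn stand-ins (IterativeImputer, PowerTransformer).
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.preprocessing import PowerTransformer

df = pd.DataFrame({
    "LotArea": [8450.0, 9600.0, 11250.0, np.nan, 14260.0],  # skewed feature
    "GrLivArea": [1710.0, 1262.0, np.nan, 1717.0, 2198.0],
})

# Impute (assumed) missing-at-random values from the other columns.
imputed = IterativeImputer(random_state=0).fit_transform(df)

# Yeo-Johnson reduces skew and, unlike Box-Cox, handles zeros/negatives.
unskewed = PowerTransformer(method="yeo-johnson").fit_transform(imputed)
print(unskewed[:2])
```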

Installation

Simply clone the repo and install the dependencies listed in requirements.txt into an environment of your choice.

Usage

All results and plots can be reproduced by running the advanced-regression-techniques.ipynb notebook. The scikit-learn style data transformation pipeline can be found in the art_pipeline.ipynb notebook, which uses slightly different feature engineering from the original notebook but achieves a similar LB score.
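
For a sense of its shape, here is a hedged sketch of a scikit-learn style transformation pipeline; the column names and transformer choices are assumptions for illustration, and the real steps live in art_pipeline.ipynb:

```python
# Sketch: a ColumnTransformer-based preprocessing pipeline feeding OLS.
# Columns and transformers are illustrative, not the notebook's actual steps.
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric = ["LotArea", "GrLivArea"]          # hypothetical numeric columns
categorical = ["Neighborhood", "MSZoning"]  # hypothetical categorical columns

preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical),
])

model = Pipeline([("preprocess", preprocess), ("ols", LinearRegression())])
# model.fit(train_df[numeric + categorical], np.log1p(train_df["SalePrice"]))
```

Keeping the preprocessing inside a single Pipeline means the same fitted transformations are applied to the test set at prediction time, avoiding train/test leakage.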
