
TalkingData Kaggle competition (link)

This repository contains the data preprocessing, feature engineering, and machine learning techniques that produced a top-13% result on the Kaggle private leaderboard:



  • Random Forest: hyperparameter tuning, evaluation of feature importance, and visualization of redundant features with a dendrogram.
  • XGBoost: hyperparameter tuning (tree depth left untuned) and feature-importance visualization. This notebook produces the final submission file.
  • Deep Neural Network with categorical embeddings: built with PyTorch and a wrapper library; uses a cyclical learning rate to speed up training.
  • Blending: simple average blending, plus a short tutorial on using a NumPy memory map to save memory while processing the data.
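The dendrogram of redundant features mentioned above can be sketched with SciPy: cluster features hierarchically on a Spearman rank-correlation distance, so near-duplicate columns merge at a distance close to zero. The feature names and data below are stand-ins, not the competition's actual columns.

```python
import numpy as np
import pandas as pd
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform
from scipy.stats import spearmanr
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt

# Hypothetical feature matrix standing in for the competition data.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 4)),
                 columns=["app", "device", "os", "channel"])
X["os_copy"] = X["os"] + rng.normal(scale=0.01, size=200)  # near-duplicate feature

# Spearman rank correlation -> distance matrix -> hierarchical linkage.
corr, _ = spearmanr(X)
corr = (corr + corr.T) / 2          # enforce exact symmetry
np.fill_diagonal(corr, 1.0)
dist = 1 - np.abs(corr)             # redundant features sit at distance ~0
link = hierarchy.linkage(squareform(dist, checks=False), method="average")

fig, ax = plt.subplots(figsize=(8, 4))
hierarchy.dendrogram(link, labels=list(X.columns), ax=ax, leaf_rotation=90)
fig.savefig("dendrogram.png", bbox_inches="tight")
```

In the resulting plot, `os` and `os_copy` join at the very bottom of the tree, flagging one of them as a candidate for removal.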
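Hyperparameter tuning of a gradient-boosted model typically looks like a randomized search over learning rate, tree count, and subsampling. As a sketch (using scikit-learn's `GradientBoostingClassifier` as a stand-in; xgboost's `XGBClassifier` exposes the same scikit-learn estimator interface, and all parameter ranges here are illustrative, not the repository's actual search space):

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

# Toy binary-classification data in place of the competition's click data.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Illustrative search space: learning rate, number of trees, row subsampling.
param_dist = {
    "learning_rate": uniform(0.01, 0.3),
    "n_estimators": randint(50, 300),
    "subsample": uniform(0.6, 0.4),
}
search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_distributions=param_dist,
    n_iter=5,               # small for the sketch; use far more in practice
    cv=3,
    scoring="roc_auc",      # the competition metric is AUC
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

Swapping in `xgboost.XGBClassifier` keeps the same `RandomizedSearchCV` wiring while leaving `max_depth` at its default, mirroring the "without tree depth tuning" note above.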
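The embedding network with a cyclical learning rate can be sketched directly in PyTorch using `torch.optim.lr_scheduler.CyclicLR`: each categorical column gets its own embedding table, the embeddings are concatenated into a small MLP, and the learning rate oscillates between a base and a maximum value. The cardinalities and layer sizes below are invented for the sketch.

```python
import torch
import torch.nn as nn

class EmbeddingNet(nn.Module):
    """One embedding per categorical column, concatenated into a small MLP."""
    def __init__(self, cardinalities, emb_dim=4, hidden=16):
        super().__init__()
        self.embs = nn.ModuleList(nn.Embedding(c, emb_dim) for c in cardinalities)
        self.mlp = nn.Sequential(
            nn.Linear(emb_dim * len(cardinalities), hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):  # x: (batch, n_cat_features) of integer category indices
        e = torch.cat([emb(x[:, i]) for i, emb in enumerate(self.embs)], dim=1)
        return self.mlp(e).squeeze(1)

# Hypothetical vocabulary sizes standing in for app/device/os/channel.
model = EmbeddingNet([10, 7, 5, 12])
opt = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
# Learning rate climbs from base_lr to max_lr over 20 steps, then back down.
sched = torch.optim.lr_scheduler.CyclicLR(
    opt, base_lr=1e-3, max_lr=1e-1, step_size_up=20)

loss_fn = nn.BCEWithLogitsLoss()
x = torch.randint(0, 5, (32, 4))   # indices valid for every embedding table
y = torch.rand(32).round()         # fake binary click labels
for _ in range(40):                # one full LR cycle: 20 steps up, 20 down
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
    sched.step()
```

The cyclical schedule periodically raises the learning rate, which lets training traverse flat regions of the loss faster than a fixed small rate.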
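The memory-map trick for blending can be sketched with `np.load(..., mmap_mode="r")`: each model's prediction file is mapped read-only so it is paged in chunk by chunk instead of loaded whole, and the simple average is accumulated in slices. The file names and tiny arrays are fabricated for the example.

```python
import os
import tempfile
import numpy as np

# Hypothetical per-model prediction files; here we fabricate two small arrays.
tmp = tempfile.mkdtemp()
paths = [os.path.join(tmp, f"pred_{i}.npy") for i in range(2)]
np.save(paths[0], np.array([0.2, 0.8, 0.6], dtype=np.float32))
np.save(paths[1], np.array([0.4, 0.6, 1.0], dtype=np.float32))

# Memory-map each file so large prediction arrays never sit fully in RAM,
# then accumulate the simple average chunk by chunk.
n = 3
chunk = 2  # tiny chunk size for the demo; use millions of rows in practice
blend = np.zeros(n, dtype=np.float32)
for p in paths:
    preds = np.load(p, mmap_mode="r")        # read-only memory map
    for start in range(0, n, chunk):
        blend[start:start + chunk] += preds[start:start + chunk]
blend /= len(paths)
print(blend)  # → [0.3 0.7 0.8]
```

Only `blend` and one chunk of one file are resident at a time, which is what keeps the memory footprint flat regardless of how many models are blended.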