Top 7% (Bronze medal) solution for Kaggle Otto RecSys Competition
The solution is a two-stage model: (1) candidate retrieval and (2) ranking.
# retrieve candidates and make features based on training set
./prepare_training.sh
# retrieve candidates and make features based on validation set
./prepare_validation.sh
# train & measure CV score
./run_training.sh
# retrieve candidates and make features based on test set
./prepare_scoring.sh
# score the test set and make submission
./run_scoring.sh
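The steps above could be chained in a small wrapper. This is a minimal sketch, not part of the repository: `STEPS` and the `DRY_RUN` switch are hypothetical names introduced here, and it assumes the five scripts sit in the current directory. By default it only prints the plan. The scoring steps may need the model environment variables to be set first, so running everything in one go may not fit every workflow.

```shell
# Pipeline steps in the order listed above.
STEPS="prepare_training.sh prepare_validation.sh run_training.sh prepare_scoring.sh run_scoring.sh"

# DRY_RUN is a hypothetical switch for this sketch:
# 1 (default) only prints the plan, 0 actually executes each script.
DRY_RUN="${DRY_RUN:-1}"

planned=0
for step in $STEPS; do
    planned=$((planned + 1))
    echo "[$planned] ./$step"
    if [ "$DRY_RUN" = "0" ]; then
        # Stop at the first failing step so later stages never run on stale inputs.
        "./$step" || { echo "step failed: $step" >&2; exit 1; }
    fi
done
```

Setting `DRY_RUN=0` before running the snippet executes the scripts for real.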
In the bash script, define the following environment variables:
- CLICK_MODEL
- CART_MODEL
- ORDER_MODEL
Each value should be the name of a model artifact produced by the training pipeline.
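As a rough illustration, the variables might be set and checked like this. The artifact names below are placeholders invented for this sketch, not real outputs; substitute the names your own training run produced.

```shell
# Placeholder artifact names (hypothetical); replace with the names
# written out by your training run.
export CLICK_MODEL="click_model_placeholder"
export CART_MODEL="cart_model_placeholder"
export ORDER_MODEL="order_model_placeholder"

# Check that all three variables are set and non-empty before scoring.
missing=0
for var in CLICK_MODEL CART_MODEL ORDER_MODEL; do
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
        echo "missing environment variable: $var" >&2
        missing=1
    fi
done
```

A non-zero `missing` after the loop means at least one variable still needs a value.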
# create virtual env
conda create --name kaggle-otto python=3.10
# activate env
conda activate kaggle-otto
# install requirements
pip install -r requirements.txt