Top 2% solution (36/2060) based on a baseline model developed by Abhishek Thakur.

Requirements

Architecture

  • The solution is an ensemble of 5 transformer models, each trained on 5 folds: 2x deberta-large, 2x deberta-v3-large and 1x longformer-large. The two variants of each deberta model were trained with dropout=0.1 and dropout=0.15 respectively, both with max_len=1024, while the longformer-large model was trained with dropout=0.1 and max_len=1536 (a rough configuration sketch is shown below).
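
The exact wiring of these hyperparameters lives in the training code; the following is only a sketch, assuming the standard transformers configuration fields (hidden_dropout_prob, attention_probs_dropout_prob) are the dropout knobs and that max_len is applied at tokenization time:

```python
# Hedged sketch of configuring one ensemble member; not the repository's
# actual training code.
from transformers import AutoConfig, AutoModel, AutoTokenizer

MODEL_NAME = "microsoft/deberta-v3-large"  # or microsoft/deberta-large / longformer-large
DROPOUT = 0.15                             # 0.1 for the other deberta variant and longformer
MAX_LEN = 1024                             # 1536 for longformer-large

# Override the backbone's dropout through its config.
config = AutoConfig.from_pretrained(
    MODEL_NAME,
    hidden_dropout_prob=DROPOUT,
    attention_probs_dropout_prob=DROPOUT,
)
backbone = AutoModel.from_pretrained(MODEL_NAME, config=config)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

# max_len is enforced at tokenization time.
batch = tokenizer("example text", max_length=MAX_LEN, truncation=True, return_tensors="pt")
outputs = backbone(**batch)  # the backbone outputs then feed the task-specific head
```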

Preparation

  • Download the tez PyTorch trainer library and put it at the root level.
  • Put the ./data directory at the root level and unzip the files downloaded from Kaggle there.
  • In order to use deberta v2 or v3, you need to patch the transformers library to create a new fast tokenizer, using the data and instructions from this kaggle dataset.
  • Download microsoft/deberta-large, microsoft/deberta-v3-large and allenai/longformer-large-4096 (or any other transformer models) using nbs/download_model.ipynb and save them in the ./model folder (a hedged sketch of this step is shown after this list).
  • Create 5 training folds using nbs/creating_folds.ipynb (see the fold-creation sketch after this list).
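
The notebook nbs/download_model.ipynb is the authoritative version of the download step; as a rough illustration only, assuming the standard transformers API and a ./model/<name> layout:

```python
# Hedged sketch of downloading the backbones into ./model; the actual
# notebook may differ in naming and folder layout.
import os
from transformers import AutoModel, AutoTokenizer

MODEL_NAMES = [
    "microsoft/deberta-large",
    "microsoft/deberta-v3-large",   # needs the patched fast tokenizer mentioned above
    "allenai/longformer-large-4096",
]

for name in MODEL_NAMES:
    target = os.path.join("model", name.split("/")[-1])
    os.makedirs(target, exist_ok=True)
    AutoModel.from_pretrained(name).save_pretrained(target)
    AutoTokenizer.from_pretrained(name).save_pretrained(target)
```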
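Likewise, nbs/creating_folds.ipynb is the authoritative version of fold creation; the sketch below only assumes a data/train.csv with an id column identifying each document and writes a kfold column — the real notebook may stratify differently:

```python
# Hedged sketch of 5-fold creation at the document level.
import pandas as pd
from sklearn.model_selection import KFold

df = pd.read_csv("data/train.csv")        # assumed location and "id" column
doc_ids = df["id"].unique()

kf = KFold(n_splits=5, shuffle=True, random_state=42)
df["kfold"] = -1
for fold, (_, valid_idx) in enumerate(kf.split(doc_ids)):
    valid_ids = set(doc_ids[valid_idx])
    df.loc[df["id"].isin(valid_ids), "kfold"] = fold

df.to_csv("data/train_folds.csv", index=False)
```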

Training

Please make sure you run the script from the parent directory of ./bin.

$ sh ./bin/train.sh

To train different models on different folds (0...4), make changes inside the train.sh file.

The training of each fold should fit into 15 GB of GPU memory.

Inference