Introduction

Exploratory notebooks and utils related to the Feedback Prize - Evaluating Student Writing Kaggle competition.

As team "Wagon Bar", we scored a nice top 44% on the private leaderboard :)

Some of our work (notebooks and datasets) is still private on Kaggle, but we hope to add it here soon.

Our work

During this three-month competition, we:

Implemented a baseline using Naive Bayes
Did a lot of EDA
Used mostly Longformers (BERT-style transformers optimized for long inputs) for advanced modeling
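
To illustrate the kind of Naive Bayes baseline listed above, here is a minimal multinomial Naive Bayes sketch in pure Python. It assumes the baseline classifies text segments into discourse types from word counts; the features, labels, and library actually used in the notebooks may differ. The helper names and the toy training data are invented for this example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(texts, labels):
    """Count words per class and documents per class (hypothetical helper)."""
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    vocab = set()
    for text, label in zip(texts, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, class_counts, vocab


def predict_naive_bayes(model, text):
    """Return the class with the highest log-posterior, using add-one smoothing."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore words never seen during training
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data, invented for illustration only
train_texts = [
    "i think schools should ban phones",
    "in my opinion students need more sleep",
    "studies show that sleep improves grades",
    "research shows phones distract students",
]
train_labels = ["Claim", "Claim", "Evidence", "Evidence"]

model = train_naive_bayes(train_texts, train_labels)
prediction = predict_naive_bayes(model, "research studies show this")
```

A bag-of-words baseline like this ignores word order and segment boundaries, which is one reason to move on to transformer models for the actual predictions.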

Exploratory Data Analysis

You can find some of our findings in the notebooks arthur/findings and valentin/First exploration.

Modeling phase

Before using the models from the competition's best notebooks, we tried several approaches, for example stacking an LSTM head on top of a Longformer. Details of the model architectures are in the notebooks arthur/training_v2 and v3.

It is tough to compete with teams that have the GPUs to train and fine-tune massive models, so reusing their work is an interesting way to learn while staying on top of the leaderboard. For our final inference notebook, we started from this high-score public notebook. The idea is to combine 2 Longformers. Each Longformer is trained on 5 different folds, so we end up with 10 models, and we average their predictions.
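
The averaging step described above can be sketched as follows. This is a minimal pure-Python illustration, assuming every model outputs one probability vector per token over the same list of classes. The function names and toy numbers are hypothetical and are not taken from the public notebook.

```python
def average_predictions(model_probs):
    """Average per-token class probabilities across models.

    model_probs[m][t][c] is the probability that model m assigns
    class c to token t. All models must share tokens and classes.
    """
    n_models = len(model_probs)
    n_tokens = len(model_probs[0])
    n_classes = len(model_probs[0][0])
    averaged = []
    for t in range(n_tokens):
        averaged.append([
            sum(model_probs[m][t][c] for m in range(n_models)) / n_models
            for c in range(n_classes)
        ])
    return averaged


def argmax_labels(token_probs):
    """Pick the most likely class index for each token."""
    return [max(range(len(p)), key=p.__getitem__) for p in token_probs]


# Toy example: 2 models, 2 tokens, 3 classes (the setup above would have 10 models)
probs_a = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
probs_b = [[0.2, 0.7, 0.1], [0.1, 0.3, 0.6]]
avg = average_predictions([probs_a, probs_b])
labels = argmax_labels(avg)
```

Averaging probabilities rather than hard labels lets a confident model outweigh an uncertain one on a given token.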

The author of the notebook published the weights of these 10 trained models. We took advantage of this to cross-validate many post-processing ideas locally, without having to train any models ourselves. Luckily, we found one idea that did improve our CV score: the function clean_rebuttals at the end of our final inference notebook.
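
The local workflow above (scoring cached predictions with and without a post-processing step) can be sketched as below. Everything in this block is an assumption made for illustration: the span format, the simplified overlap-based F1, and the drop_short_spans rule are hypothetical. They are neither the logic of clean_rebuttals nor the exact competition metric.

```python
def overlap_match(pred, truth):
    """True if two word-index sets overlap by at least 50% in both directions."""
    if not pred or not truth:
        return False
    inter = len(pred & truth)
    return inter / len(truth) >= 0.5 and inter / len(pred) >= 0.5


def span_f1(pred_spans, true_spans):
    """Simplified F1: each true span can be matched by at most one prediction."""
    unmatched = list(true_spans)
    tp = 0
    for label, words in pred_spans:
        for i, (t_label, t_words) in enumerate(unmatched):
            if label == t_label and overlap_match(words, t_words):
                tp += 1
                del unmatched[i]
                break
    fp = len(pred_spans) - tp
    fn = len(unmatched)
    return 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 0.0


def drop_short_spans(spans, min_words=3):
    """Hypothetical post-processing rule: discard very short predicted spans."""
    return [(label, words) for label, words in spans if len(words) >= min_words]


def compare(post_process, cached_preds, truths):
    """Return the mean F1 without and with post-processing over all essays."""
    base = [span_f1(p, t) for p, t in zip(cached_preds, truths)]
    post = [span_f1(post_process(p), t) for p, t in zip(cached_preds, truths)]
    return sum(base) / len(base), sum(post) / len(post)


# Toy cached predictions for one essay: (label, set of word indices)
truths = [[("Claim", set(range(0, 10))), ("Evidence", set(range(10, 30)))]]
cached_preds = [[
    ("Claim", set(range(0, 9))),
    ("Rebuttal", {30}),  # spurious one-word span
    ("Evidence", set(range(12, 30))),
]]
base_score, post_score = compare(drop_short_spans, cached_preds, truths)
```

Because the model predictions are computed once and cached, each new post-processing idea only costs one call to compare, with no GPU needed.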

Please note that this notebook will not run locally, as it needs access to the competition data and the model weights. However, you can run it on Kaggle here!

Set up

To run our code locally, you can clone this repository:

git clone git@github.com:Valentin-Laurent/evalstudent.git

We do not provide a requirements.txt file, so you may need to install additional Python libraries to make it work.

Some notebooks use functions defined in utils.py and metrics.py. For these to run properly, you can install the package (in development mode) with:

cd evalstudent
pip install -e .
