atrautsch/nlbse2022_replication_kit

Predicting Issue Types with seBERT

This replication kit shows how to fine-tune and evaluate the pre-trained seBERT model for the task of issue type classification. Be aware that fine-tuning may not run on GPUs with less VRAM than an NVIDIA RTX5000.

If you want to live test the final model you can do so here.

Create venv and install dependencies

python3.8 -m venv .
source bin/activate
pip install -r requirements.txt

Load provided data

cd data
wget https://tickettagger.blob.core.windows.net/datasets/github-labels-top3-803k-test.tar.gz
wget https://tickettagger.blob.core.windows.net/datasets/github-labels-top3-803k-train.tar.gz
gunzip github-labels-top3-803k-test.tar.gz
gunzip github-labels-top3-803k-train.tar.gz
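
Note that `gunzip` only strips the gzip layer, so the commands above leave two `.tar` files in `data`. As a minimal sketch, assuming these are ordinary tar archives, the Python standard library can list their contents without unpacking them. The helper name `list_archive_members` is hypothetical and not part of this kit, and the script is meant to be run from the repository root.

```python
import tarfile
from pathlib import Path


def list_archive_members(archive_path):
    """Return (name, size in bytes) for every regular file in a tar archive."""
    # "r:*" lets tarfile detect the compression, so .tar and .tar.gz both work.
    with tarfile.open(archive_path, mode="r:*") as archive:
        return [(m.name, m.size) for m in archive.getmembers() if m.isfile()]


if __name__ == "__main__":
    # File names as left in data/ by the gunzip commands above.
    for name in (
        "github-labels-top3-803k-train.tar",
        "github-labels-top3-803k-test.tar",
    ):
        path = Path("data") / name
        if not path.exists():
            print(f"{name}: not found, run the download step first")
            continue
        for member, size in list_archive_members(path):
            print(f"{name}: {member} ({size} bytes)")
```

If the notebooks expect the extracted files rather than the archives, `tar -xf` on each `.tar` file would unpack them; whether that is needed depends on how the notebooks load the data.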

Loading the pre-trained model

cd models
wget https://smartshark2.informatik.uni-goettingen.de/sebert/seBERT_pre_trained.tar.gz
tar -xzf seBERT_pre_trained.tar.gz

Loading the fine-tuned model

We provide the fine-tuned version of the model that we used here.

cd models
wget https://smartshark2.informatik.uni-goettingen.de/sebert/nlbse.tar.gz
tar -xzf nlbse.tar.gz
mv model nlbse

Running the Jupyter notebooks

source bin/activate
cd notebooks
jupyter lab

Fine-tuning the pre-trained model

Fine-tuning uses the complete training data that is provided. The Jupyter Notebook notebooks/FineTuneModel.ipynb demonstrates this. However, this is a very resource-intensive task, which we ran on the HPC system of the GWDG on RTX5000 GPUs. It may not run on GPUs with less VRAM without modification.
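
One common modification for GPUs with less VRAM is to lower the per-device batch size and compensate with gradient accumulation. The sketch below only does the arithmetic. It assumes the notebook trains in batches, for example with the Hugging Face `Trainer`, whose `TrainingArguments` accept `per_device_train_batch_size` and `gradient_accumulation_steps`. The function name and the batch sizes are illustrative and are not the values used in the notebook.

```python
import math


def accumulation_steps(target_batch_size, per_device_batch_size, num_devices=1):
    """Accumulation steps needed to reach at least the target effective batch size."""
    if min(target_batch_size, per_device_batch_size, num_devices) < 1:
        raise ValueError("all arguments must be positive integers")
    # Samples processed in one forward/backward pass across all devices.
    samples_per_pass = per_device_batch_size * num_devices
    return math.ceil(target_batch_size / samples_per_pass)


if __name__ == "__main__":
    # Hypothetical target: keep an effective batch size of 32 while
    # shrinking the per-device batch so that it fits into less VRAM.
    target = 32
    for per_device in (32, 8, 4):
        steps = accumulation_steps(target, per_device)
        print(
            f"per_device_train_batch_size={per_device}, "
            f"gradient_accumulation_steps={steps}"
        )
```

Accumulation trades memory for time: each optimizer step then needs several forward and backward passes, so training takes longer.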

Evaluating the fine-tuned model

The evaluation task uses the fine-tuned model to classify the test data that is provided. The Jupyter Notebook notebooks/EvaluateModel.ipynb demonstrates this. Like fine-tuning, this may take a long time on the full test data.
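
Classification quality on a test set like this is typically summarized with per-class precision, recall, and F1. The following is a minimal pure-Python sketch of that computation, not the code from the notebook. The label names `bug`, `enhancement`, and `question` are assumed from the top-3 label dataset, and the function name `per_class_scores` is hypothetical.

```python
from collections import Counter


def per_class_scores(y_true, y_pred, labels):
    """Compute precision, recall, and F1 per label from two parallel label lists."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label was wrong
            fn[truth] += 1  # true label was missed
    scores = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


if __name__ == "__main__":
    # Tiny made-up example, not taken from the test data.
    truth = ["bug", "bug", "enhancement", "question"]
    predicted = ["bug", "enhancement", "enhancement", "question"]
    labels = ("bug", "enhancement", "question")
    for label, values in per_class_scores(truth, predicted, labels).items():
        print(label, {k: round(v, 3) for k, v in values.items()})
```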

Test the model

The Jupyter Notebook notebooks/LiveTest.ipynb loads the fine-tuned version and can be used to play with different inputs.
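
Conceptually, a sequence classifier like this produces one score (logit) per issue type, and the predicted type is the one with the highest softmax probability. The sketch below illustrates only that last step in plain Python. The label order in `LABELS` is an assumption and should be checked against the fine-tuned model's configuration, and `predict_label` is a hypothetical helper, not a function from the notebook.

```python
import math

# Assumed label order; verify against the fine-tuned model's configuration.
LABELS = ("bug", "enhancement", "question")


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits, labels=LABELS):
    """Return the most probable label and its probability."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


if __name__ == "__main__":
    # Made-up logits for illustration only.
    label, probability = predict_label([2.0, 0.5, -1.0])
    print(f"{label} ({probability:.2f})")
```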

About

Replication kit for the NLBSE2022 Tool Competition
