This repository contains the fourth assignment for the Applied Natural Language Processing course.
The assignment focuses on fine-tuning pre-trained language models (RoBERTa, BART) for the ComVE shared task from SemEval-2020.
The work is divided into three subtasks, each implemented in a separate Jupyter Notebook. Graduate students are required to complete all three subtasks.
The main goals of this assignment are to:
- Combine and pre-process input texts for text matching and multiple-choice problems.
- Fine-tune pre-trained language models for classification and sequence-to-sequence tasks.
- Evaluate model performance with accuracy, BLEU, and ROUGE metrics.
- Gain hands-on experience with Hugging Face Transformers, Datasets, and the Trainer API.
Graduate students additionally complete SubTask C, implementing an end-to-end sequence-to-sequence solution.
```
.
├── Pretrained_LM_Subtask_A.ipynb        # SubTask A: Text Matching (nonsensical statement detection)
├── Pretrained_LM_Subtask_B.ipynb        # SubTask B: Multiple Choice (reason classification)
├── Pretrained_LM_Subtask_C.ipynb        # SubTask C: Seq2Seq (reason generation) – graduate task
├── Assessment 4_ requirements.txt       # Dependencies for Linux/Windows
├── Assignment 4_ requirements-macos.txt # Dependencies for macOS
├── .gitignore
└── README.md
```
```shell
python3 -m venv venv
source venv/bin/activate   # Linux/macOS
venv\Scripts\activate      # Windows
```
For Linux/Windows:

```shell
pip install -r "Assessment 4_ requirements.txt"
```

For macOS:

```shell
pip install -r "Assignment 4_ requirements-macos.txt"
```
The assignment uses the SemEval-2020 Task 4 Commonsense Validation and Explanation (ComVE) dataset.
1. Download and unzip the dataset:

   ```shell
   unzip ALL\ data.zip -d SemEval2020-Task4-Data
   ```

   This creates a `SemEval2020-Task4-Data/` folder containing:
   - Training Data (`subtaskA_data_all.csv`, `subtaskB_data_all.csv`, `subtaskC_data_all.csv`, etc.)
   - Development Data
   - Test Data
   - Gold answers
2. Verify the structure:

   ```
   SemEval2020-Task4-Data/
   ├── ALL data/
   │   ├── Training Data/
   │   ├── Dev Data/
   │   └── Test Data/
   ```
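As a quick sanity check on the data files, the sketch below parses a SubTask A row with the standard library. The column names (`id`, `sent0`, `sent1`) are an assumption about the CSV layout, and the sample row is invented for illustration:

```python
import csv
import io

# Invented sample mirroring the assumed SubTask A columns:
# an id plus two near-identical candidate statements per row.
sample = io.StringIO(
    "id,sent0,sent1\n"
    "0,He poured orange juice on his cereal.,He poured milk on his cereal.\n"
)

for row in csv.DictReader(sample):
    print(row["id"], "|", row["sent0"], "|", row["sent1"])
```

To read the real file, replace the `StringIO` object with an `open(...)` call on `subtaskA_data_all.csv`.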
Each notebook can be run independently:
- SubTask A (`Pretrained_LM_Subtask_A.ipynb`): Text matching using RoBERTa via `AutoModelForSequenceClassification`. Task: Given two similar statements, identify the nonsensical one. Evaluation: Accuracy.
- SubTask B (`Pretrained_LM_Subtask_B.ipynb`): Multiple-choice classification using RoBERTa via `AutoModelForMultipleChoice`. Task: Given a nonsensical statement and three candidate reasons, classify which reason explains the statement. Evaluation: Accuracy.
- SubTask C (`Pretrained_LM_Subtask_C.ipynb`): Sequence-to-sequence explanation generation using BART via `AutoModelForSeq2SeqLM`. Task: Given a nonsensical statement, generate a valid explanation. Evaluation: BLEU and ROUGE scores.
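For the multiple-choice setup in SubTask B, each example is typically expanded into one (statement, reason) pair per candidate so the model can score every option. A minimal sketch of that input construction (the function name and example texts are invented for illustration):

```python
def build_choice_pairs(statement, reasons):
    """Pair the nonsensical statement with each candidate reason.
    A multiple-choice model scores every encoded pair, and the
    highest-scoring index is the predicted reason."""
    return [(statement, reason) for reason in reasons]

pairs = build_choice_pairs(
    "He put an elephant into the fridge.",
    [
        "An elephant is much bigger than a fridge.",
        "Elephants are usually gray.",
        "A fridge is a cold place.",
    ],
)
print(len(pairs))  # one pair per candidate reason
```

In the notebook, pairs like these would then go through the RoBERTa tokenizer before being passed to `AutoModelForMultipleChoice`.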
For testing and grading:

```shell
pytest test.py        # Undergraduate tasks (A and B)
pytest test_grads.py  # Graduate task (C)
```
- SubTask A: Accuracy (expected ~0.49 on reduced dataset, ~0.93 on full training).
- SubTask B: Accuracy (expected ~0.51 reduced, ~0.93 full).
- SubTask C: BLEU and ROUGE metrics (expected BLEU ~0.22, ROUGE ~0.46).
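As a rough intuition for what BLEU measures on SubTask C outputs, the toy function below computes a clipped unigram precision. Real BLEU also uses higher-order n-grams and a brevity penalty; this simplified version and its example sentences are for illustration only:

```python
from collections import Counter

def unigram_precision(candidate, reference):
    """Fraction of candidate tokens also found in the reference,
    with per-token counts clipped to the reference counts.
    (Real BLEU extends this to 4-grams and adds a brevity penalty.)"""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum(min(count, ref[tok]) for tok, count in cand.items())
    return overlap / max(sum(cand.values()), 1)

print(unigram_precision(
    "elephants are too big for a fridge",
    "an elephant is too big to fit in a fridge",
))
```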
- For local training, set:

  ```python
  shrink_dataset = True
  base_model = True
  colab = False
  ```

- For full-scale experiments (recommended), use Google Colab with GPU/TPU:

  ```python
  shrink_dataset = False
  base_model = False
  colab = True
  ```
- Set `shrink_dataset = True` for quick debugging, `False` for full training.
- GPU/TPU (e.g., Google Colab) is recommended for full fine-tuning runs.
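A sketch of how the `shrink_dataset` flag might gate a subsample for local debugging. The slice size and the list stand-in are assumptions, not the notebooks' actual logic; a Hugging Face `Dataset` would use `dataset.select(range(1000))` instead of list slicing:

```python
shrink_dataset = True             # flip to False for full training
train_rows = list(range(10_000))  # stand-in for the full training split

if shrink_dataset:
    # Keep only a small slice so a CPU-only debugging run finishes quickly.
    train_rows = train_rows[:1000]

print(len(train_rows))
```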
- `ALL data.zip` is excluded from version control for space and licensing reasons; see Dataset Setup above.