This repository contains the implementation of ULMFiT (Universal Language Model Fine-tuning) for classifying Non-Functional Requirements (NFR) sentences. The project includes a Jupyter notebook implementing the ULMFiT model using the FastAI library, as well as the dataset used for training, results, and checkpoint files.
Universal Language Model Fine-tuning (ULMFiT) is a transfer learning technique introduced by Jeremy Howard and Sebastian Ruder in their paper "Universal Language Model Fine-tuning for Text Classification". It enables effective transfer learning for NLP tasks even when little annotated data is available.
In this project, we apply ULMFiT to classify Non-Functional Requirements (NFR) sentences. NFRs represent constraints or criteria that specify how a system should behave, as opposed to what the system should do (Functional Requirements).
The dataset used in this project is located in the data directory. It includes a CSV file (NFR_Dataset.csv) containing NFR sentences. The dataset is split into training and validation sets using the Data Block API provided by the FastAI library.
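A split of this kind could be expressed with the fastai v1 Data Block API roughly as follows. This is a minimal sketch, not the notebook's code: it assumes fastai v1 (the release line that provides `TextList`), and the column names `text` and `label`, the 20% validation share, the seed, and the batch size are all hypothetical placeholders.

```python
from pathlib import Path


def build_classifier_data(path="data", csv_name="NFR_Dataset.csv",
                          text_col="text", label_col="label",
                          valid_pct=0.2, seed=42, bs=32):
    """Build a fastai v1 text classification DataBunch from the NFR CSV file.

    text_col, label_col, valid_pct, seed and bs are illustrative assumptions;
    adjust them to the actual CSV header and available hardware.
    """
    # Imported lazily so this module can be loaded without fastai installed.
    from fastai.text import TextList

    return (TextList.from_csv(Path(path), csv_name, cols=text_col)
            .split_by_rand_pct(valid_pct, seed=seed)  # random train/validation split
            .label_from_df(cols=label_col)            # class labels from a CSV column
            .databunch(bs=bs))
```

Fixing the seed keeps the validation set stable between runs, which makes results from different training sessions easier to compare.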
The ULMFiT model is trained using the FastAI library within a Jupyter notebook (ULMFiT_NFR_Classification.ipynb). The notebook covers the following steps:
- Data preprocessing and loading using the FastAI TextList API.
- Language model training to fine-tune a pre-trained model on the NFR dataset.
- Encoder prediction example.
- Classifier training using the fine-tuned language model.
- Model evaluation and interpretation.
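Two ideas from the ULMFiT paper usually shape the classifier training step: discriminative learning rates (earlier layer groups train more slowly) and gradual unfreezing (layer groups are made trainable one stage at a time, starting from the last). The sketch below illustrates both in plain Python. Whether the notebook uses exactly these settings is an assumption, and the function names are hypothetical.

```python
from typing import List


def discriminative_lrs(top_lr: float, n_groups: int, factor: float = 2.6) -> List[float]:
    """Return one learning rate per layer group, earliest group first.

    The last group trains at top_lr; each earlier group is divided by
    `factor` once more, following the eta^(l-1) = eta^l / 2.6 rule
    suggested in the ULMFiT paper.
    """
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    return [top_lr / factor ** (n_groups - 1 - i) for i in range(n_groups)]


def gradual_unfreezing_plan(n_groups: int) -> List[int]:
    """Number of trainable layer groups per stage: the last group only,
    then one additional group at each following stage."""
    return list(range(1, n_groups + 1))


if __name__ == "__main__":
    for stage, trainable in enumerate(gradual_unfreezing_plan(4), start=1):
        print(f"stage {stage}: train the last {trainable} group(s)")
    print(discriminative_lrs(1e-2, 4))
```

In fastai v1 the same behaviour is typically reached with `learn.freeze_to(...)` and by passing a `slice` of learning rates to `fit_one_cycle`.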
The trained model achieves competitive performance on the NFR classification task. Detailed results, including accuracy metrics, loss curves, and examples of misclassified sentences, are provided in the notebook.
The trained model checkpoints (fit_head.pth, fine_tuned.pth, first.pth, second.pth, third.pth, fourth.pth, fine_tuned_enc.pth) are saved in the repository for reproducibility and further experimentation.
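Before loading any of these checkpoints, it can help to confirm that all of them are present. The helper below is a small sketch using only the standard library. The directory passed in is an assumption, since the exact location of the files within the repository is not stated here.

```python
from pathlib import Path
from typing import List

# Checkpoint names as listed above, without the .pth extension.
CHECKPOINTS = ["fit_head", "fine_tuned", "first", "second",
               "third", "fourth", "fine_tuned_enc"]


def missing_checkpoints(models_dir) -> List[str]:
    """Return the names of expected checkpoints not found in models_dir."""
    models_dir = Path(models_dir)
    return [name for name in CHECKPOINTS
            if not (models_dir / f"{name}.pth").is_file()]


if __name__ == "__main__":
    # "models" is a hypothetical location; fastai v1 uses it as the default
    # sub-directory when saving with learn.save(...).
    print(missing_checkpoints("models"))
```

With fastai v1, full-model checkpoints are normally restored with `learn.load(name)` and the encoder with `learn.load_encoder(name)`, in both cases passing the name without the `.pth` extension.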
- Python >= 3.6
- FastAI
- PyTorch
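A quick way to check these requirements is sketched below. It only tests whether the packages can be found under their usual import names (`fastai` and `torch`) and does not verify specific versions. The function name is hypothetical.

```python
import sys
from importlib.util import find_spec


def check_environment():
    """Report whether the listed requirements appear to be satisfied."""
    return {
        "python>=3.6": sys.version_info >= (3, 6),
        "fastai": find_spec("fastai") is not None,
        "torch": find_spec("torch") is not None,  # PyTorch's import name
    }


if __name__ == "__main__":
    for requirement, ok in check_environment().items():
        print(f"{requirement}: {'ok' if ok else 'missing'}")
```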
To replicate the experiment or apply the trained model to new data, follow these steps:
- Clone this repository:
git clone https://github.com/your-username/ULMFiT_NFR_Classification.git
- Install the required dependencies:
pip install -r requirements.txt
- Run the Jupyter notebook ULMFiT_NFR_Classification.ipynb in your local environment.
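Once a classifier has been trained or restored in the notebook, applying it to new sentences might look like the sketch below. It assumes a fastai v1 style learner whose `predict(text)` returns a `(category, class_index, probabilities)` tuple. The helper name and the example sentence are hypothetical.

```python
from typing import Iterable, List, Tuple


def classify_sentences(learner, sentences: Iterable[str]) -> List[Tuple[str, str, float]]:
    """Return (sentence, predicted label, probability) for each sentence.

    `learner` is any object whose predict(text) returns
    (category, class_index, probabilities), as fastai v1 learners do.
    """
    results = []
    for sentence in sentences:
        category, class_idx, probs = learner.predict(sentence)
        # Keep only the probability assigned to the predicted class.
        results.append((sentence, str(category), float(probs[int(class_idx)])))
    return results
```

A call such as `classify_sentences(learn, ["The system shall respond within two seconds."])` would then return one tuple per input sentence, where `learn` is the classifier learner built in the notebook.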
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
- Thanks to FastAI and PyTorch for providing powerful libraries for deep learning.
- The datasets used in this project were sourced from the 2017 Data Challenge of the International Requirements Engineering Conference and the PROMISE (Predictor Models in Software Engineering) NFR dataset.
Cody Baker - https://github.com/cbgithub7