# Project Report: Fine-Tuning BERT for Multilabel Text Classification

## 1. Introduction

The objective of this project is to fine-tune a BERT model (`bert-base-uncased`, approximately 110M parameters) for multilabel text classification, where each input text can be assigned several labels at once.



## 2. Methodology



### 2.1 Dataset Details

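We used the SemEval 2018 Task 1 dataset, specifically Subtask 5 (emotion classification) for English-language text. The dataset consists of tweets, each annotated with 11 binary emotion labels: anger, anticipation, disgust, fear, joy, love, optimism, pessimism, sadness, surprise, and trust. Because the labels are not mutually exclusive, a single tweet can carry several of them, as in the example below.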
##### Example record from the dataset
```python
{'ID': '2017-En-21441',
 'Tweet': "“Worry is a down payment on a problem you may never have'. \xa0Joyce Meyer.  #motivation #leadership #worry",
 'anger': False,
 'anticipation': True,
 'disgust': False,
 'fear': False,
 'joy': False,
 'love': False,
 'optimism': True,
 'pessimism': False,
 'sadness': False,
 'surprise': False,
 'trust': True}
```

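Since every column other than `ID` and `Tweet` is a boolean label, the label set can be read directly off a record. The sketch below is a minimal illustration, assuming records are plain Python dicts shaped like the example above; it recovers the 11 label names and the labels that are active for the example tweet.

```python
from typing import Dict, List

# Columns that are not labels; every other column is assumed to be a boolean label.
NON_LABEL_COLUMNS = {"ID", "Tweet"}


def label_names(record: Dict) -> List[str]:
    """Return the label columns of a record, in their original order."""
    return [key for key in record if key not in NON_LABEL_COLUMNS]


def active_labels(record: Dict) -> List[str]:
    """Return the names of the labels that are True for this record."""
    return [name for name in label_names(record) if record[name]]


example = {
    "ID": "2017-En-21441",
    "Tweet": "“Worry is a down payment on a problem you may never have'. \xa0Joyce Meyer.  #motivation #leadership #worry",
    "anger": False, "anticipation": True, "disgust": False, "fear": False,
    "joy": False, "love": False, "optimism": True, "pessimism": False,
    "sadness": False, "surprise": False, "trust": True,
}

print(len(label_names(example)))  # 11
print(active_labels(example))     # ['anticipation', 'optimism', 'trust']
```

Relying on dict insertion order keeps the label order identical to the dataset's column order, which matters later when label indices are mapped back to names.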

### 2.2 Model Architecture

We used the pre-trained `bert-base-uncased` model as the backbone for our multilabel text classification task. The model was configured for multilabel classification, with one output per category in the dataset (11 labels).
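In the Hugging Face library, this kind of setup is typically expressed by loading the checkpoint with `AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=11, problem_type="multi_label_classification")`; we assume the project used something equivalent. With that problem type, each label gets its own logit that is scored independently with a sigmoid (rather than a softmax across labels), and the training loss is binary cross-entropy (`BCEWithLogitsLoss`). The standard-library sketch below reproduces that loss for a single example to show why several labels can be active at once.

```python
import math
from typing import Sequence


def bce_with_logits(logits: Sequence[float], targets: Sequence[float]) -> float:
    """Mean binary cross-entropy over labels, computed directly from logits.

    Uses the numerically stable form max(x, 0) - x*y + log(1 + exp(-|x|)),
    which equals -[y*log(sigmoid(x)) + (1 - y)*log(1 - sigmoid(x))].
    """
    if len(logits) != len(targets):
        raise ValueError("logits and targets must have the same length")
    total = 0.0
    for x, y in zip(logits, targets):
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


# Each label contributes its own independent term, so two positive targets
# do not compete for probability mass the way classes do under a softmax.
print(bce_with_logits([0.0, 0.0], [1.0, 0.0]))              # log(2), about 0.6931
print(bce_with_logits([8.0, 8.0, -8.0], [1.0, 1.0, 0.0]))   # small: all labels confidently correct
print(bce_with_logits([-8.0], [1.0]))                       # large: confidently wrong
```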

### 2.3 Preprocessing Data

Each tweet was tokenized and encoded with the BERT (`bert-base-uncased`) tokenizer and padded to a maximum length of 128 tokens. The labels were converted into a binary matrix with one row per example and one column per label, where 1 marks that the label is present and 0 that it is absent.
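Both preprocessing steps can be sketched in plain Python. With the Hugging Face tokenizer, the text side would correspond to a call such as `tokenizer(text, padding="max_length", truncation=True, max_length=128)`; truncation is our assumption, since the report only mentions padding. The sketch takes already-tokenized IDs (the `[CLS]` = 101, `[SEP]` = 102 and `[PAD]` = 0 IDs are those of the `bert-base-uncased` vocabulary; the IDs in between are illustrative placeholders), right-pads them to 128 with an attention mask, and turns boolean label columns into the binary label matrix. Labels are stored as floats because a BCE-based loss expects float targets.

```python
from typing import Dict, List, Sequence, Tuple

MAX_LENGTH = 128
PAD_ID = 0  # [PAD] token id in the bert-base-uncased vocabulary

LABELS = ["anger", "anticipation", "disgust", "fear", "joy", "love",
          "optimism", "pessimism", "sadness", "surprise", "trust"]


def pad_ids(ids: Sequence[int], max_length: int = MAX_LENGTH) -> Tuple[List[int], List[int]]:
    """Right-pad token ids to max_length and build the matching attention mask."""
    if len(ids) > max_length:
        raise ValueError("sequence longer than max_length; truncate it in the tokenizer first")
    padding = max_length - len(ids)
    input_ids = list(ids) + [PAD_ID] * padding
    attention_mask = [1] * len(ids) + [0] * padding  # 1 = real token, 0 = padding
    return input_ids, attention_mask


def label_matrix(records: Sequence[Dict], labels: Sequence[str] = LABELS) -> List[List[float]]:
    """One row per example, one column per label: 1.0 if present, else 0.0."""
    return [[1.0 if record[name] else 0.0 for name in labels] for record in records]


ids, mask = pad_ids([101, 2023, 2003, 102])  # [CLS] <illustrative ids> [SEP]
print(len(ids), sum(mask))  # 128 4

# Two toy records with boolean label columns, shaped like the dataset rows.
toy_records = [
    {name: name in {"anticipation", "optimism", "trust"} for name in LABELS},
    {name: name in {"anger", "disgust"} for name in LABELS},
]
matrix = label_matrix(toy_records)
print(matrix[0])  # [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]
```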

### 2.4 Training Process

The model was trained with the Hugging Face `Trainer` class using a batch size of 8, a learning rate of 2e-5, and 5 training epochs. Model performance was evaluated with F1 score, ROC AUC, and accuracy.
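The report does not state a learning-rate schedule, so the sketch below assumes the `Trainer` defaults: `lr_scheduler_type="linear"` with no warmup, a single device (so `per_device_train_batch_size=8` is also the effective batch size), and no gradient accumulation. Under those assumptions the learning rate starts at 2e-5 and decays linearly to zero over all optimization steps. The dataset size used here is a hypothetical round number, not the size of the actual training split.

```python
import math

# Hyperparameters from Section 2.4, keyed by the corresponding
# transformers.TrainingArguments argument names.
HPARAMS = {
    "per_device_train_batch_size": 8,
    "learning_rate": 2e-5,
    "num_train_epochs": 5,
}


def total_steps(num_examples: int, batch_size: int, epochs: int) -> int:
    """Optimizer steps, assuming one device, no accumulation, and a kept final partial batch."""
    return math.ceil(num_examples / batch_size) * epochs


def linear_lr(step: int, num_training_steps: int, base_lr: float, num_warmup_steps: int = 0) -> float:
    """Linear warmup, then linear decay to zero (the shape of transformers' linear schedule)."""
    if step < num_warmup_steps:
        return base_lr * step / max(1, num_warmup_steps)
    remaining = (num_training_steps - step) / max(1, num_training_steps - num_warmup_steps)
    return base_lr * max(0.0, remaining)


n_examples = 1000  # hypothetical training-set size, for illustration only
steps = total_steps(n_examples, HPARAMS["per_device_train_batch_size"], HPARAMS["num_train_epochs"])
print(steps)  # 625
for s in (0, 125, steps):
    print(s, linear_lr(s, steps, HPARAMS["learning_rate"]))
```

With these settings each epoch is 125 steps, and by the end of the first epoch the learning rate has decayed to 80% of its initial value.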

### 2.5 Evaluation Metrics

We used a custom evaluation function that computes F1 score, ROC AUC, and accuracy for multilabel predictions. The function was used during training to monitor and assess model performance.
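A minimal, dependency-free sketch of such a metric function follows. The report does not say how the metrics were averaged, so this sketch assumes micro-averaged F1 and ROC AUC (all label decisions pooled together) and subset accuracy (an example counts as correct only if every label matches), which is what scikit-learn's `f1_score(..., average="micro")`, `roc_auc_score(..., average="micro")` and `accuracy_score` compute on binary label matrices. Logits are turned into probabilities with a sigmoid and thresholded at 0.5. To plug it into the `Trainer`, a wrapper such as `lambda p: compute_metrics(p.predictions.tolist(), p.label_ids.tolist())` could adapt the `EvalPrediction` object it receives.

```python
import math
from typing import Dict, List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def micro_f1(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """F1 from true/false positives pooled over every (example, label) cell."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_roc_auc(y_true: List[List[int]], scores: List[List[float]]) -> float:
    """Chance that a random positive cell outscores a random negative one (ties count 1/2)."""
    cells = [(t, s) for t_row, s_row in zip(y_true, scores) for t, s in zip(t_row, s_row)]
    pos = [s for t, s in cells if t]
    neg = [s for t, s in cells if not t]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative cell")
    wins = 0.0
    for p in pos:  # quadratic pairwise count; fine for a sketch
        for n in neg:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(pos) * len(neg))


def subset_accuracy(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Fraction of examples whose full label vector is predicted exactly."""
    return sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p)) / len(y_true)


def compute_metrics(logits: Sequence[Sequence[float]], labels: Sequence[Sequence[float]],
                    threshold: float = 0.5) -> Dict[str, float]:
    probs = [[sigmoid(x) for x in row] for row in logits]
    preds = [[1 if p >= threshold else 0 for p in row] for row in probs]
    y_true = [[int(y) for y in row] for row in labels]
    return {
        "f1": micro_f1(y_true, preds),
        "roc_auc": micro_roc_auc(y_true, probs),
        "accuracy": subset_accuracy(y_true, preds),
    }


logits = [[2.0, -1.0, 0.5], [-3.0, 1.0, -0.2]]  # made-up values, 3 labels for brevity
labels = [[1, 1, 1], [0, 1, 0]]
print(compute_metrics(logits, labels))  # f1 about 0.857, roc_auc 0.875, accuracy 0.5
```

Subset accuracy is much stricter than F1 for 11 labels, which is why it is usually the lowest of the three numbers.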

## 3. Results

### 3.1 Training Results

The model achieved high F1, ROC AUC, and accuracy scores on the validation set. Detailed per-epoch metrics are recorded in the training logs.

### 3.2 Inference Results

Inference on sample text demonstrated the model's ability to make accurate multilabel predictions. Each output logit was converted to a probability, and every label whose probability exceeded a threshold of 0.5 was assigned to the text.
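The decoding step can be sketched as below. In a Hugging Face setup the logits would typically come from something like `model(**tokenizer(text, return_tensors="pt")).logits`; here a made-up logit vector stands in for a real model output, and the label order is assumed to match the dataset's column order.

```python
import math
from typing import List, Sequence

LABELS = ["anger", "anticipation", "disgust", "fear", "joy", "love",
          "optimism", "pessimism", "sadness", "surprise", "trust"]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def predict_labels(logits: Sequence[float], labels: Sequence[str] = LABELS,
                   threshold: float = 0.5) -> List[str]:
    """Return every label whose sigmoid probability reaches the threshold."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    return [name for name, x in zip(labels, logits) if sigmoid(x) >= threshold]


# Hypothetical logits, not produced by the trained model.
fake_logits = [-2.1, 1.3, -3.0, -2.5, -0.4, -1.9, 0.8, -2.2, -1.7, -2.8, 0.2]
print(predict_labels(fake_logits))  # ['anticipation', 'optimism', 'trust']
```

Because the sigmoid is monotonic and `sigmoid(0) = 0.5`, a 0.5 probability threshold is equivalent to keeping every label with a non-negative logit; the threshold can also be tuned per label on the validation set if precision and recall need rebalancing.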

## 4. Challenges Faced

### 4.1 GPU Resource Limitations

One major challenge was the limited availability of high-compute GPUs. To work around this, I started with the smaller `bert-base-uncased` model (approximately 110M parameters) and explored quantized models to reduce resource requirements.
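A back-of-the-envelope calculation shows why model size and quantization matter on a small GPU. The sketch counts only parameter-related memory for the roughly 110M parameters of `bert-base-uncased`; activations, which grow with batch size and sequence length, come on top. The 16 bytes per parameter for training assumes fp32 weights, fp32 gradients and two fp32 AdamW moment buffers without mixed precision. These figures are estimates under stated assumptions, not measurements from this project.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

NUM_PARAMS = 110e6  # approximate parameter count of bert-base-uncased


def weights_mb(num_params: float, dtype: str) -> float:
    """Memory for the model weights alone, in megabytes (1 MB = 1e6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


def adamw_training_gb(num_params: float) -> float:
    """fp32 weights + fp32 gradients + two fp32 AdamW moments = 16 bytes per parameter."""
    return num_params * (4 + 4 + 4 + 4) / 1e9


for dtype in ("fp32", "fp16", "int8"):
    print(dtype, weights_mb(NUM_PARAMS, dtype))  # 440.0, 220.0, 110.0 MB
print(adamw_training_gb(NUM_PARAMS))  # about 1.76 GB before activations
```

In practice, int8 schemes often keep some layers (for example embeddings) in higher precision, so the real saving is smaller than this upper bound.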

### 4.2 Fine-Tuning Process

Fine-tuning BERT for multilabel classification required careful consideration of label processing and model configuration. The custom evaluation metric function was crucial for assessing model performance.

## 5. Conclusions

The fine-tuned BERT model demonstrated strong performance on multilabel text classification. The Hugging Face `Trainer` API was used to fine-tune the model on the dataset. The challenges related to GPU resources were mitigated by starting with a smaller model and exploring quantized models.

## 6. Future Work

Future work may involve experimenting with larger BERT models and other LLMs, exploring other datasets, and conducting a more extensive evaluation against existing multilabel classification models.