# How to run

AdaptIRC:


* This notebook implements the adapter approach (AdaptIRC) on the NLBSE 2024 Issue Report Classification task.

* To run the notebook in Colab, just change the environment to GPU through: Runtime >> Change runtime type >> Hardware Accelerator >> GPU.

* You may require WANDB token if using newer versions of transformers lib

# Import Libraires

Here, we are importing libraries that would be used throughout the notebook. (Pandas, Json, OS, Sklearn, numpy, collections, transformers, adapters, random, torch, re [regular expression] ).

In [1]:
from collections import defaultdict

from transformers import TrainingArguments, EvalPrediction, TrainerCallback, DataCollatorWithPadding

from sklearn.metrics import classification_report, recall_score, f1_score, precision_score

  from .autonotebook import tqdm as notebook_tqdm


# Setting Seed

These lines set the seed for reproducability for several libraries ( torch, random, numpy, transformers)

In [2]:
import torch
import random
from transformers import set_seed
import numpy as np

RANDOM_SEED = 42

set_seed(RANDOM_SEED)
torch.manual_seed(RANDOM_SEED)
random.seed(RANDOM_SEED)
np.random.seed(RANDOM_SEED)

# Dataset

Reading the dataset cloned from NLBSE Github repository:

In [3]:
import pandas as pd

train_set = pd.read_csv("https://raw.githubusercontent.com/nlbse2024/issue-report-classification/main/data/issues_train.csv")
test_set = pd.read_csv("https://raw.githubusercontent.com/nlbse2024/issue-report-classification/main/data/issues_test.csv")

# Dataset Processing

In [4]:
# There were some Nan values that causes some issues, so, they are replaced with a single space
train_set=train_set.fillna(' ')
test_set=test_set.fillna(' ')

This function is used to pre-process the issues with various steps (
  
  * removing strings between triple Quotes
  * Remove new lines
  * Remove Links
  * Remove digits
  * Remove special characters except the question mark
  * Remove multiple spaces


In [5]:
import re

def preprocess(issues):
    processed_issues = []

    for issue in issues:

        # Remove strings between triple quotes
        issue = re.sub(r'```.*?```', ' ', issue, flags=re.DOTALL)

        # Remove new lines
        issue = re.sub(r'\n', ' ', issue)

        # Remove links
        issue = re.sub(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\\(\\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', ' ', issue)

        # Remove digits
        issue = re.sub(r'\d+', ' ', issue)

        # Remove special characters except the question marks
        issue = re.sub(r'[^a-zA-Z0-9?\s]', ' ', issue)
        issue = re.sub(r'\s+', ' ', issue)

        processed_issues.append(issue)

    return processed_issues

In [6]:
# Apply the pre-process function for both train and testing sets on both the title and body.
train_set['title'] = preprocess(train_set['title'])
train_set['body'] = preprocess(train_set['body'])

test_set['title'] = preprocess(train_set['title'])
test_set['body'] = preprocess(train_set['body'])

In [7]:
# This code is taken from NLBSE
# creating the dataset with grouping it via repositry (repo)

from datasets import Dataset

repos = list(set(train_set["repo"].unique()))

train_set.groupby(["repo", "label"]).size().unstack(fill_value=0)

# Combining the title and body for a new field called text.
def process_dataset(dataset):
    dataset['text'] = dataset['title'] + " " + str(dataset['body'])
    dataset = dataset[['text', 'label', 'repo']]
    return dataset

train_set = process_dataset(train_set)
test_set = process_dataset(test_set)

group_by_repo = lambda dataset: {
    repo: Dataset.from_pandas(dataset[dataset["repo"] == repo]).class_encode_column("label")
    for repo in dataset["repo"].unique()
}

train_sets = group_by_repo(train_set)
test_sets = group_by_repo(test_set)

datasets = {
    repo: {'train': train_sets[repo], 'test': test_sets[repo]} for repo in train_sets.keys()
}

Casting to class labels: 100%|██████████| 300/300 [00:00<00:00, 108782.85 examples/s]
Casting to class labels: 100%|██████████| 300/300 [00:00<00:00, 120629.97 examples/s]


Casting to class labels: 100%|██████████| 300/300 [00:00<00:00, 126283.74 examples/s]
Casting to class labels: 100%|██████████| 300/300 [00:00<00:00, 114047.97 examples/s]
Casting to class labels: 100%|██████████| 300/300 [00:00<00:00, 121750.48 examples/s]
Casting to class labels: 100%|██████████| 300/300 [00:00<00:00, 136488.90 examples/s]
Casting to class labels: 100%|██████████| 300/300 [00:00<00:00, 124104.07 examples/s]
Casting to class labels: 100%|██████████| 300/300 [00:00<00:00, 120780.50 examples/s]
Casting to class labels: 100%|██████████| 300/300 [00:00<00:00, 125041.36 examples/s]
Casting to class labels: 100%|██████████| 300/300 [00:00<00:00, 91846.07 examples/s]


# Model Configuration

Here is the new important code: Setting the configurations of the adapters and transformer model.

In [8]:
from transformers import RobertaTokenizer, RobertaConfig, TextClassificationPipeline
from adapters import RobertaAdapterModel

def create_model(model_name="roberta-base", max_length=256, truncation=True, padding="max_length", device="cuda"):
  # The tokenizer is based on Roberta. The configurations are: Max_length = 256, truncation = true, padding = max_length.
  tokenizer = RobertaTokenizer.from_pretrained(model_name, device=device, max_length=max_length, truncation=truncation, padding=padding)

  # Configuration: We have 3 labels: Bug, Enhancment, Question.
  config = RobertaConfig.from_pretrained(model_name, device=device, num_labels=3)

  # Configuration of the Adapter model.
  model = RobertaAdapterModel.from_pretrained(model_name, config=config)
  
  # This part is for inferencing
  classifier = TextClassificationPipeline(model=model, tokenizer=tokenizer, device=device, max_length=max_length, padding=padding, truncation=truncation)

  return tokenizer, model, classifier

# Creating Training and Infering Adapters

The training occurs at every repository:
* The train set is divided into: train and validate with the percentage of 30%.
* A classfication head is attached to the model defining the number of labels to be 3 and defining the labels.
* Initilaising the training of the Adapter
* Using Adapter Droput Trainer as the Callback.
* Configuring the adapter configuarion.
* Configure the trainer
* Adding the callback.
* Start training the adapter
* Evalauting the adapter

In [9]:
from adapters import AdapterTrainer
import torch

references = {}
predictions = {}

learning_rate=1e-4
epochs=200
batch_size=32

for repo in datasets.keys():

  dataset = datasets[repo]
  tokenizer, model, classifier = create_model()

  # The function used to tokenize the issues.
  def encode_batch(batch):
    return tokenizer(batch["text"])

  # Extracting the training and testing sets from the dataset per repo
  train_set = dataset['train'].shuffle(seed=RANDOM_SEED)
  
  id2label = {x: train_set.features["label"].int2str(x) for x in range(train_set.features["label"].num_classes)}

  # Tokenizing the training set
  train_set = train_set.map(encode_batch, batched=True)
  train_set.set_format(type="torch", columns=["input_ids", "attention_mask", "label"])

  # Adapter Name and Saving Directory
  adapter_name = f"irc-{repo.replace('/','-')}"

  # Add an Adapter
  model.add_adapter(adapter_name, overwrite_ok=True)
  # Add a matching classification head

  # Add a Classification Head
  model.add_classification_head(
    adapter_name,
    num_labels=3,
    id2label=id2label,
    overwrite_ok=True
  )

  # Initilaize the adapter training
  model.train_adapter(adapter_name)

  # Create an Adapter Callback
  class AdapterDropTrainerCallback(TrainerCallback):
      def on_step_begin(self, args, state, control, **kwargs):
        skip_layers = list(range(np.random.randint(0, 11)))
        kwargs['model'].set_active_adapters(adapter_name, skip_layers=skip_layers)

      def on_evaluate(self, args, state, control, **kwargs):
        kwargs['model'].set_active_adapters(adapter_name, skip_layers=None)


  # Metrics used for evaluation (accuracy, precision, recall and F1)
  def compute_metrics(p: EvalPrediction):
    labels = p.label_ids
    preds = np.argmax(p.predictions, axis=1)
    recall = recall_score(y_true=labels, y_pred=preds,average="weighted")
    precision = precision_score(y_true=labels, y_pred=preds,average="weighted")
    f1 = f1_score(y_true=labels, y_pred=preds,average="weighted")
    return {"precision": precision, "recall": recall, "f1": f1}

  # Configure Training arguements
  training_args = TrainingArguments(
    learning_rate=learning_rate,
    num_train_epochs=epochs,
    per_device_train_batch_size=batch_size,
    logging_steps=100,
    output_dir=f"training_output/{adapter_name}",
    overwrite_output_dir=True,
    remove_unused_columns=False,
    save_strategy="no",
    seed=RANDOM_SEED
  )

  # Having a data Collator
  data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

  # Configure the Adapter Trainer
  trainer = AdapterTrainer(
    model=model,
    args=training_args,
    train_dataset=train_set,
    compute_metrics=compute_metrics,
    data_collator=data_collator
  )

  # Add the callback to the trainer
  trainer.add_callback(AdapterDropTrainerCallback())

  # Start training the adapter
  trainer.train()

  # Save the adapter
  model.save_adapter(f"training_output/{adapter_name}", adapter_name)

  # Merging the Repo
  model.merge_adapter(adapter_name)

  test_set = dataset['test']

  # Calcualting and Adding the metrics
  references[repo] = [model.config.id2label[id] for id in test_set['label']]
  predictions[repo] = [prediction['label'] for prediction in classifier(test_set['text'])]


Some weights of RobertaAdapterModel were not initialized from the model checkpoint at roberta-base and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias', 'heads.default.3.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
The model 'RobertaAdapterModel' is not supported for . Supported models are ['AlbertForSequenceClassification', 'BartForSequenceClassification', 'BertForSequenceClassification', 'BigBirdForSequenceClassification', 'BigBirdPegasusForSequenceClassification', 'BioGptForSequenceClassification', 'BloomForSequenceClassification', 'CamembertForSequenceClassification', 'CanineForSequenceClassification', 'LlamaForSequenceClassification', 'ConvBertForSequenceClassification', 'CTRLForSequenceClassification', 'Data2VecTextForSequenceClassification', 'DebertaForSequenceClassification', 'DebertaV2ForSequenceClassification', 'DistilBertForSequenceClassification', 'ElectraForSequenceCl

Step,Training Loss
100,1.0514
200,0.6054
300,0.4332
400,0.3224
500,0.2273
600,0.1745
700,0.1368
800,0.0934
900,0.0949
1000,0.0715


Some weights of RobertaAdapterModel were not initialized from the model checkpoint at roberta-base and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias', 'heads.default.3.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
The model 'RobertaAdapterModel' is not supported for . Supported models are ['AlbertForSequenceClassification', 'BartForSequenceClassification', 'BertForSequenceClassification', 'BigBirdForSequenceClassification', 'BigBirdPegasusForSequenceClassification', 'BioGptForSequenceClassification', 'BloomForSequenceClassification', 'CamembertForSequenceClassification', 'CanineForSequenceClassification', 'LlamaForSequenceClassification', 'ConvBertForSequenceClassification', 'CTRLForSequenceClassification', 'Data2VecTextForSequenceClassification', 'DebertaForSequenceClassification', 'DebertaV2ForSequenceClassification', 'DistilBertForSequenceClassification', 'ElectraForSequenceCl

Step,Training Loss
100,1.1022
200,1.0515
300,0.8516
400,0.6796
500,0.526
600,0.4262
700,0.3558
800,0.2667
900,0.256
1000,0.1958


Some weights of RobertaAdapterModel were not initialized from the model checkpoint at roberta-base and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias', 'heads.default.3.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
The model 'RobertaAdapterModel' is not supported for . Supported models are ['AlbertForSequenceClassification', 'BartForSequenceClassification', 'BertForSequenceClassification', 'BigBirdForSequenceClassification', 'BigBirdPegasusForSequenceClassification', 'BioGptForSequenceClassification', 'BloomForSequenceClassification', 'CamembertForSequenceClassification', 'CanineForSequenceClassification', 'LlamaForSequenceClassification', 'ConvBertForSequenceClassification', 'CTRLForSequenceClassification', 'Data2VecTextForSequenceClassification', 'DebertaForSequenceClassification', 'DebertaV2ForSequenceClassification', 'DistilBertForSequenceClassification', 'ElectraForSequenceCl

Step,Training Loss
100,1.1019
200,1.0969
300,0.8216
400,0.5194
500,0.3072
600,0.2244
700,0.1613
800,0.1126
900,0.1315
1000,0.0897


Some weights of RobertaAdapterModel were not initialized from the model checkpoint at roberta-base and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias', 'heads.default.3.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
The model 'RobertaAdapterModel' is not supported for . Supported models are ['AlbertForSequenceClassification', 'BartForSequenceClassification', 'BertForSequenceClassification', 'BigBirdForSequenceClassification', 'BigBirdPegasusForSequenceClassification', 'BioGptForSequenceClassification', 'BloomForSequenceClassification', 'CamembertForSequenceClassification', 'CanineForSequenceClassification', 'LlamaForSequenceClassification', 'ConvBertForSequenceClassification', 'CTRLForSequenceClassification', 'Data2VecTextForSequenceClassification', 'DebertaForSequenceClassification', 'DebertaV2ForSequenceClassification', 'DistilBertForSequenceClassification', 'ElectraForSequenceCl

Step,Training Loss
100,1.1017
200,1.0942
300,0.8846
400,0.6464
500,0.4428
600,0.3302
700,0.2461
800,0.1749
900,0.1818
1000,0.1401


Some weights of RobertaAdapterModel were not initialized from the model checkpoint at roberta-base and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias', 'heads.default.3.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
The model 'RobertaAdapterModel' is not supported for . Supported models are ['AlbertForSequenceClassification', 'BartForSequenceClassification', 'BertForSequenceClassification', 'BigBirdForSequenceClassification', 'BigBirdPegasusForSequenceClassification', 'BioGptForSequenceClassification', 'BloomForSequenceClassification', 'CamembertForSequenceClassification', 'CanineForSequenceClassification', 'LlamaForSequenceClassification', 'ConvBertForSequenceClassification', 'CTRLForSequenceClassification', 'Data2VecTextForSequenceClassification', 'DebertaForSequenceClassification', 'DebertaV2ForSequenceClassification', 'DistilBertForSequenceClassification', 'ElectraForSequenceCl

Step,Training Loss
100,1.1013
200,1.0922
300,0.7675
400,0.5309
500,0.3495
600,0.2527
700,0.1917
800,0.1311
900,0.1188
1000,0.0981


# Metrics

This has been taken from the NLBSE repo, so, they are not commented.

In [10]:
results = defaultdict(dict)
metrics = ['precision', 'recall', 'f1-score']
labels = ['bug', 'feature', 'question']

for repo in repos:
  results[repo] = classification_report(references[repo], predictions[repo], output_dict=True)
  results[repo]['average'] = results[repo]['weighted avg']
  results[repo] = {label: {metric: results[repo][label][metric] for metric in metrics} for label in labels + ['average']}

results['overall'] = {label: {metric: np.mean([results[repo][label][metric] for repo in repos]) for metric in metrics} for label in labels + ['average']}


In [11]:
import json

#The output json file would be created containing the results.
output_file_name = 'results.json'
with open(output_file_name, 'w') as fp:
  json.dump(results, fp, indent=2)

print(f"Repository{' '*15}Label     Precision  Recall     F1")
for repo in repos + ['overall']:
  print("-"*63)
  for label in labels + ['average']:
    out = f"{repo:<25}{label:<10}"
    for metric in metrics:
      out += f"{results[repo][label][metric]:<10.4f} "
    print(out)

Repository               Label     Precision  Recall     F1
---------------------------------------------------------------
facebook/react           bug       1.0000     1.0000     1.0000     
facebook/react           feature   0.9901     1.0000     0.9950     
facebook/react           question  1.0000     0.9900     0.9950     
facebook/react           average   0.9967     0.9967     0.9967     
---------------------------------------------------------------
bitcoin/bitcoin          bug       1.0000     1.0000     1.0000     
bitcoin/bitcoin          feature   1.0000     1.0000     1.0000     
bitcoin/bitcoin          question  1.0000     1.0000     1.0000     
bitcoin/bitcoin          average   1.0000     1.0000     1.0000     
---------------------------------------------------------------
tensorflow/tensorflow    bug       0.9901     1.0000     0.9950     
tensorflow/tensorflow    feature   1.0000     1.0000     1.0000     
tensorflow/tensorflow    question  1.0000     0.9900     0

# References & Ack

This notebook uses codes from:
* https://github.com/adapter-hub/adapters/blob/main/notebooks/01_Adapter_Training.ipynb
* https://github.com/adapter-hub/adapters/blob/main/notebooks/05_Adapter_Drop_Training.ipynb
* https://huggingface.co/docs/transformers/tasks/sequence_classification
* https://github.com/nlbse2024/issue-report-classification/blob/main/2-Template-SetFit.ipynb
