# How to run

* This notebook implements the adapter approach (AdaptIRC), training a single adapter for all repositories.

* To run the notebook in Colab, just change the environment to GPU through: Runtime >> Change runtime type >> Hardware Accelerator >> GPU.

* You may need a WANDB (Weights & Biases) API token when using newer versions of the transformers library.
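
As a minimal sketch of handling the token without the interactive prompt, the snippet below sets the environment variables that the `wandb` client reads (`WANDB_API_KEY` and `WANDB_MODE`). The helper name `configure_wandb` is hypothetical and not part of this notebook; it assumes the token, if any, is already stored in the environment.

```python
import os

def configure_wandb(api_key=None):
    """Hypothetical helper: prepare the W&B environment variables before training."""
    if api_key:
        # wandb picks the key up from WANDB_API_KEY, so no prompt is shown.
        os.environ["WANDB_API_KEY"] = api_key
        os.environ["WANDB_MODE"] = "online"
        return "online"
    # Without a key, keep the logs local; they can be synced later.
    os.environ["WANDB_MODE"] = "offline"
    return "offline"

mode = configure_wandb(os.environ.get("WANDB_API_KEY"))
print(f"W&B mode: {mode}")
```

Running this before the training cell avoids the login prompt when no token is available.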

# Install Dependencies

In [None]:
!pip install -Uq adapters
!pip install -q datasets
!pip install -Uq accelerate
!pip install scikit-learn wandb pynvml

# Import Libraries

Here we import the libraries used throughout the notebook: pandas, json, os, scikit-learn, NumPy, collections, transformers, adapters, random, torch, and re (regular expressions).

In [None]:
from collections import defaultdict

from transformers import TrainingArguments, EvalPrediction, TrainerCallback, DataCollatorWithPadding

from sklearn.metrics import classification_report, recall_score, f1_score, precision_score

# Setting Seed

These lines set the seed for reproducibility across several libraries (torch, random, numpy, transformers).

In [None]:
import torch
import random
from transformers import set_seed
import numpy as np
from pynvml import *

# Selecting a random seed
RANDOM_SEED = 42

# Setting the seed for reproducibility
set_seed(RANDOM_SEED)
torch.manual_seed(RANDOM_SEED)
random.seed(RANDOM_SEED)
np.random.seed(RANDOM_SEED)
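
To illustrate why the seed matters, here is a small sketch using only the standard-library `random` module (the notebook itself also seeds torch, NumPy, and transformers). The helper `sample_with_seed` is a hypothetical name used only for this example.

```python
import random

def sample_with_seed(seed, n=5):
    """Return n pseudo-random integers drawn right after seeding the generator."""
    random.seed(seed)
    return [random.randint(0, 100) for _ in range(n)]

first = sample_with_seed(42)
second = sample_with_seed(42)

# The same seed reproduces the same sequence of draws.
print(first == second)
```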

# Dataset

Read the dataset from the NLBSE GitHub repository:

In [None]:
import pandas as pd

# Reading the CSV files from the NLBSE GitHub repository
train_set = pd.read_csv("https://raw.githubusercontent.com/nlbse2024/issue-report-classification/main/data/issues_train.csv")
test_set = pd.read_csv("https://raw.githubusercontent.com/nlbse2024/issue-report-classification/main/data/issues_test.csv")

# Dataset Processing

In [None]:
# Some NaN values caused problems, so they are replaced with a single space
train_set=train_set.fillna(' ')
test_set=test_set.fillna(' ')

This function pre-processes the issues in several steps:

  * Remove text between triple backticks (fenced code blocks)
  * Remove new lines
  * Remove links
  * Remove digits
  * Remove special characters except the question mark
  * Collapse multiple spaces into one


In [None]:
import re

def preprocess(issues):
    processed_issues = []

    for issue in issues:

        # Remove text between triple backticks (fenced code blocks)
        issue = re.sub(r'```.*?```', ' ', issue, flags=re.DOTALL)

        # Remove new lines
        issue = re.sub(r'\n', ' ', issue)

        # Remove links
        issue = re.sub(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\\(\\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', ' ', issue)

        # Remove digits
        issue = re.sub(r'\d+', ' ', issue)

        # Remove special characters except question marks
        issue = re.sub(r'[^a-zA-Z0-9?\s]', ' ', issue)

        # Collapse multiple whitespace characters into a single space
        issue = re.sub(r'\s+', ' ', issue)

        processed_issues.append(issue)

    return processed_issues
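
The steps above can be traced on a single made-up issue with the sketch below. It is a simplified illustration, not the notebook's function: the link pattern is shortened to `https?://\S+`, a final `strip()` is added, and `clean_issue` and the sample text are assumptions introduced for this example.

```python
import re

FENCE = "`" * 3  # triple backtick, built programmatically to keep this example readable

def clean_issue(text):
    """Single-string version of the cleaning steps described above."""
    text = re.sub(FENCE + r".*?" + FENCE, " ", text, flags=re.DOTALL)  # fenced code blocks
    text = re.sub(r"\n", " ", text)             # new lines
    text = re.sub(r"https?://\S+", " ", text)   # links (simplified pattern)
    text = re.sub(r"\d+", " ", text)            # digits
    text = re.sub(r"[^a-zA-Z?\s]", " ", text)   # special characters except '?'
    text = re.sub(r"\s+", " ", text)            # repeated whitespace
    return text.strip()

sample = (
    "Crash on start?\n" + FENCE + "\nprint(1)\n" + FENCE
    + "\nSee https://example.com/issue/42 for v2.0 details!"
)
print(clean_issue(sample))  # Crash on start? See for v details
```

Note that removing digits and punctuation also breaks up version strings such as `v2.0`, which is a side effect of this cleaning strategy.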

In [None]:
# Apply the pre-process function to the title and body of both the training and test sets.
train_set['title'] = preprocess(train_set['title'])
train_set['body'] = preprocess(train_set['body'])

# The test set must be pre-processed from its own columns, not from the training columns.
test_set['title'] = preprocess(test_set['title'])
test_set['body'] = preprocess(test_set['body'])

In [None]:
# This code is adapted from the NLBSE template
# Build the dataset, keeping the repository (repo) column

from datasets import Dataset

# Combining the title and body for a new field called text.
def process_dataset(dataset):
    # Concatenate row by row; str(dataset['body']) would stringify the whole column at once
    dataset['text'] = dataset['title'] + " " + dataset['body'].astype(str)
    dataset = dataset[['text', 'label', 'repo']]
    return dataset

train_set = process_dataset(train_set)
test_set = process_dataset(test_set)

datasets = {
    'train': train_set, 'test': test_set}

In [None]:
ds_train = Dataset.from_pandas(datasets['train'])
ds_train = ds_train.class_encode_column('label')

In [None]:
ds_test = Dataset.from_pandas(datasets['test'])
ds_test = ds_test.class_encode_column('label')
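
As a rough picture of what encoding the `label` column does, the sketch below maps string labels to integer ids in plain Python. It assumes the class names are ordered alphabetically; `encode_labels` is a hypothetical helper, not a function from the `datasets` library.

```python
def encode_labels(labels):
    """Map string labels to integer ids, using the sorted unique names as classes."""
    names = sorted(set(labels))
    str2int = {name: idx for idx, name in enumerate(names)}
    return [str2int[label] for label in labels], names

ids, names = encode_labels(["question", "bug", "feature", "bug"])
print(ids)    # [2, 0, 1, 0]
print(names)  # ['bug', 'feature', 'question']
```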

# Model Configuration

This is the key part of the notebook: setting the configurations of the adapter and the transformer model.

In [None]:
from transformers import RobertaTokenizer, RobertaConfig, TextClassificationPipeline
from adapters import RobertaAdapterModel

def create_model(model_name="roberta-base", max_length=256, truncation=True, padding="max_length", device="cuda"):
  # The tokenizer is based on RoBERTa. The configurations are: max_length = 256, truncation = True, padding = max_length.
  tokenizer = RobertaTokenizer.from_pretrained(model_name, device=device, max_length=max_length, truncation=truncation, padding=padding)

  # Configuration: we have 3 labels (bug, feature, question).
  config = RobertaConfig.from_pretrained(model_name, device=device, num_labels=3)

  # Load the adapter-enabled RoBERTa model.
  model = RobertaAdapterModel.from_pretrained(model_name, config=config)

  # Pipeline used for inference
  classifier = TextClassificationPipeline(model=model, tokenizer=tokenizer, device=device, max_length=max_length, padding=padding, truncation=truncation)

  return tokenizer, model, classifier

# Creating, Training, and Inferring Adapters

Training covers all repositories at once:
* Attach a classification head to the model, with 3 labels and the label mapping.
* Initialise the training of the adapter.
* Use an AdapterDrop trainer callback.
* Configure the training arguments.
* Configure the trainer.
* Add the callback.
* Train the adapter.
* Evaluate the adapter.
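
The AdapterDrop callback mentioned in the list skips the adapters in a random number of the lowest transformer layers at every training step. The sketch below mimics only that sampling step with the standard library; it assumes a 12-layer model such as `roberta-base` and up to 10 skipped layers, and `sample_skip_layers` is a hypothetical name.

```python
import random

NUM_LAYERS = 12  # roberta-base has 12 transformer layers

def sample_skip_layers(rng, max_skipped=10):
    """Pick how many of the lowest layers to skip, then list their indices."""
    n_skipped = rng.randint(0, max_skipped)  # inclusive on both ends
    return list(range(n_skipped))

rng = random.Random(42)
for step in range(3):
    skipped = sample_skip_layers(rng)
    active = [layer for layer in range(NUM_LAYERS) if layer not in skipped]
    print(f"step {step}: skip {skipped}, adapters active in layers {active}")
```

Because the skipped layers always form a prefix starting at layer 0, the upper layers keep their adapters at every step.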

In [None]:
def show_gpu():
    nvmlInit()
    device_count = nvmlDeviceGetCount()
    total_memory = 0
    total_used = 0

    for i in range(device_count):
        handle = nvmlDeviceGetHandleByIndex(i)
        info = nvmlDeviceGetMemoryInfo(handle)

        total_memory += (info.total // 1048576)
        total_used += (info.used // 1048576)

    gpu_benchmark = {
    "name": nvmlDeviceGetName(nvmlDeviceGetHandleByIndex(0)),
    "num:": device_count ,
    "total_memory": total_memory,
    "total_used": total_used}

    nvmlShutdown()

    return gpu_benchmark

In [None]:
import wandb
from adapters import AdapterTrainer

references = {}
predictions = {}

# Parameters used for training
learning_rate = 1e-4
epochs = 200
batch_size = 32

wandb.init(
    project="IRC RoBERTa Adapters",
    group="Single Adapter",
    name="All Repos",
)

tokenizer, model, classifier = create_model()

# Setting the training data
train_set = ds_train

id2label = {x: train_set.features["label"].int2str(x) for x in range(train_set.features["label"].num_classes)}

# Shuffle and tokenize the training set, truncating to the 256-token limit used in create_model
train_set = train_set.shuffle(seed=RANDOM_SEED)
train_set = train_set.map(lambda batch: tokenizer(batch["text"], truncation=True, max_length=256), batched=True)
train_set.set_format(type="torch", columns=["input_ids", "attention_mask", "label"])

# Shuffle and tokenize the test set in the same way
test_set = ds_test
test_set = test_set.shuffle(seed=RANDOM_SEED)
test_set = test_set.map(lambda batch: tokenizer(batch["text"], truncation=True, max_length=256), batched=True)
test_set.set_format(type="torch", columns=["input_ids", "attention_mask", "label"])

# Adapter name (also used for the output directory)
adapter_name = "irc-single-adapter"

# Add a new adapter
model.add_adapter(adapter_name, overwrite_ok=True)

# Add a matching classification head
model.add_classification_head(
    adapter_name,
    num_labels=3,
    id2label=id2label,
    overwrite_ok=True
)

# Initialize the adapter training
model.train_adapter(adapter_name)

# Metrics used for evaluation (weighted precision, recall, and F1)
def compute_metrics(p: EvalPrediction):
    labels = p.label_ids
    preds = np.argmax(p.predictions, axis=1)
    recall = recall_score(y_true=labels, y_pred=preds,average="weighted")
    precision = precision_score(y_true=labels, y_pred=preds,average="weighted")
    f1 = f1_score(y_true=labels, y_pred=preds,average="weighted")
    return {"precision": precision, "recall": recall, "f1": f1}

# Configure the training arguments
training_args = TrainingArguments(
    learning_rate=learning_rate,
    num_train_epochs=epochs,
    per_device_train_batch_size=batch_size,
    logging_steps=100,
    output_dir=f"training_output/{adapter_name}",
    overwrite_output_dir=True,
    remove_unused_columns=False,
    save_strategy="no",
    seed=RANDOM_SEED
)

# Data collator that pads each batch
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

# Configure the adapter trainer
trainer = AdapterTrainer(
    model=model,
    args=training_args,
    eval_dataset=test_set,
    train_dataset=train_set,
    compute_metrics=compute_metrics,
    data_collator=data_collator
)

# AdapterDrop callback: skip the adapters in a random number of lower layers at each training step
class AdapterDropTrainerCallback(TrainerCallback):
    def on_step_begin(self, args, state, control, **kwargs):
        skip_layers = list(range(np.random.randint(0, 11)))
        kwargs['model'].set_active_adapters(adapter_name, skip_layers=skip_layers)

    def on_evaluate(self, args, state, control, **kwargs):
        # Re-enable the adapters in all layers
        kwargs['model'].set_active_adapters(adapter_name, skip_layers=None)


# Add the callback to the trainer
trainer.add_callback(AdapterDropTrainerCallback())

# Start training the adapter
train_output = trainer.train()
print(train_output)

# Report GPU memory usage after training
gpu_benchmark = show_gpu()
print(gpu_benchmark)


evaluation = trainer.evaluate()
display(evaluation)


# Save the adapter
model.save_adapter(f"training_output/{adapter_name}", adapter_name)

# Merge the adapter weights into the model
model.merge_adapter(adapter_name)

wandb.finish()

Step,Training Loss
100,1.101
200,1.0972
300,0.9442
400,0.8159
500,0.7689
600,0.7354
700,0.7118
800,0.6644
900,0.6389
1000,0.6421


TrainOutput(global_step=9400, training_loss=0.27522293273438797, metrics={'train_runtime': 1214.5554, 'train_samples_per_second': 247.004, 'train_steps_per_second': 7.739, 'total_flos': 4.554818224231939e+16, 'train_loss': 0.27522293273438797, 'epoch': 200.0})
{'name': 'NVIDIA GeForce RTX 4090', 'num:': 1, 'total_memory': 24564, 'total_used': 14833}


{'eval_loss': 0.0012785957660526037,
 'eval_precision': 0.9993346640053227,
 'eval_recall': 0.9993333333333333,
 'eval_f1': 0.999333332666666,
 'eval_runtime': 3.8683,
 'eval_samples_per_second': 387.77,
 'eval_steps_per_second': 48.601,
 'epoch': 200.0}
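
The precision, recall, and F1 values reported by `compute_metrics` use `average="weighted"`, meaning each class score is weighted by its share of the true labels. The sketch below recomputes that average in plain Python on a tiny made-up example; `weighted_scores` is a hypothetical helper written for illustration, not the scikit-learn implementation.

```python
from collections import Counter

def weighted_scores(y_true, y_pred):
    """Support-weighted precision, recall, and F1 over the classes in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    precision = recall = f1 = 0.0
    for cls, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        predicted = sum(1 for p in y_pred if p == cls)
        p_cls = tp / predicted if predicted else 0.0
        r_cls = tp / n
        f_cls = 2 * p_cls * r_cls / (p_cls + r_cls) if (p_cls + r_cls) else 0.0
        weight = n / total  # share of true labels belonging to this class
        precision += weight * p_cls
        recall += weight * r_cls
        f1 += weight * f_cls
    return precision, recall, f1

# Made-up labels for illustration only
y_true = ["bug", "bug", "feature", "question"]
y_pred = ["bug", "feature", "feature", "question"]
print(weighted_scores(y_true, y_pred))
```

With this weighting, the weighted recall equals plain accuracy, since each class recall is multiplied by the fraction of samples in that class.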

In [None]:
# Collecting the reference labels and the predicted labels used for the metrics
references = [model.config.id2label[label_id.item()] for label_id in test_set['label']]
predictions = [prediction['label'] for prediction in classifier(test_set['text'])]



In [None]:
# model.push_adapter_to_hub(
#    adapter_name,
#    adapter_name,
#    adapterhub_tag="IRC_Adapters",
#    datasets_tag="IRC")

# Metrics

In [None]:
# Setting the metrics and labels
metrics = ['precision', 'recall', 'f1-score']
labels = ['bug', 'feature', 'question']

In [None]:
# A function to get the metric results
def get_results():
  results = classification_report(references, predictions, output_dict=True)
  results['average'] = results['weighted avg']
  results = {label: {metric: results[label][metric] for metric in metrics} for label in labels + ['average']}
  # With a single adapter there is only one run, so 'overall' repeats the per-label values
  results['overall'] = {label: {metric: np.mean([results[label][metric]]) for metric in metrics} for label in labels + ['average']}

  return results
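
To make the reshaping in `get_results` easier to follow, the sketch below applies the same filtering to a hand-written dictionary that stands in for the output of `classification_report(..., output_dict=True)`. All numbers are made up for illustration.

```python
metrics = ["precision", "recall", "f1-score"]
labels = ["bug", "feature", "question"]

# Illustrative stand-in for a classification report; the values are invented.
report = {
    "bug": {"precision": 0.9, "recall": 0.8, "f1-score": 0.85, "support": 10},
    "feature": {"precision": 0.7, "recall": 0.9, "f1-score": 0.79, "support": 10},
    "question": {"precision": 0.8, "recall": 0.7, "f1-score": 0.75, "support": 10},
    "accuracy": 0.8,
    "macro avg": {"precision": 0.8, "recall": 0.8, "f1-score": 0.8, "support": 30},
    "weighted avg": {"precision": 0.8, "recall": 0.8, "f1-score": 0.8, "support": 30},
}

# Expose the weighted average under the shorter key 'average'.
report["average"] = report["weighted avg"]

# Keep only the three metrics for the three labels and the average.
results = {
    label: {metric: report[label][metric] for metric in metrics}
    for label in labels + ["average"]
}
print(sorted(results))  # ['average', 'bug', 'feature', 'question']
```

The `support`, `accuracy`, and `macro avg` entries are dropped by this step.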

In [None]:
# A function to write to a json file
import json

def write_json_file (results):
  # Create the output JSON file containing the results.
  output_file_name = 'results.json'
  with open(output_file_name, 'w') as fp:
    json.dump(results, fp, indent=2)
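
As a quick check that such a file can be read back, the sketch below writes a small dictionary with `json.dump` and reloads it with `json.load`. It uses a temporary directory instead of the notebook's working directory, and the values are made up.

```python
import json
import os
import tempfile

# Made-up values for illustration only
example_results = {"bug": {"precision": 0.9, "recall": 0.8, "f1-score": 0.85}}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "results.json")
    with open(path, "w") as fp:
        json.dump(example_results, fp, indent=2)
    with open(path) as fp:
        loaded = json.load(fp)

print(loaded["bug"]["f1-score"])  # 0.85
```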

In [None]:
# A function to print the results

def print_results(results):
  print("Label     Precision  Recall     F1")
  for label in labels + ['average']:
    out = f"{label:<10}"
    for metric in metrics:
      out += f"{results[label][metric]:<10.4f} "
    # Print each row once, after all of its metrics have been added
    print(out)

In [None]:
# Call the function to get the results
results = get_results()

# Write the results to the JSON file
write_json_file(results)

# Print the results
print_results(results)

Label     Precision  Recall     F1
bug       0.9980     1.0000     0.9990     
feature   1.0000     0.9980     0.9990     
question  1.0000     1.0000     1.0000     
average   0.9993     0.9993     0.9993     


# References & Acknowledgements

This notebook uses code from:
* https://github.com/adapter-hub/adapters/blob/main/notebooks/01_Adapter_Training.ipynb
* https://github.com/adapter-hub/adapters/blob/main/notebooks/05_Adapter_Drop_Training.ipynb
* https://huggingface.co/docs/transformers/tasks/sequence_classification
* https://github.com/nlbse2024/issue-report-classification/blob/main/2-Template-SetFit.ipynb
