# Stance Detection Using ARBERT

`Arabic Bidirectional Encoder Representations from Transformers`  
`AraStance Dataset`  
`Stance Detection` `Arabic Language` `Transformer Architecture`


---

In this notebook, we fine-tune ARBERT, a BERT model pretrained on Arabic text, to classify the stance of each article toward its claim in the AraStance dataset. The dataset was introduced in the paper:
```
AraStance: A Multi-Country and Multi-Domain Dataset of Arabic Stance Detection for Fact Checking.
```
The model was introduced in the paper:
```
ARBERT & MARBERT: Deep Bidirectional Transformers for Arabic.
```

## Packages

In [None]:
!pip install transformers

In [None]:
import torch
import numpy as np
from utils import *  # helper module that provides AraStanceData and stance_to_int
from tqdm.auto import tqdm
from torch.optim import AdamW
from sklearn.metrics import f1_score
from torch.utils.data import Dataset, DataLoader
from transformers import AutoTokenizer, AutoModelForSequenceClassification

## Raw data

- Download the raw data:

In [None]:
!wget https://github.com/Tariq60/arastance/archive/refs/heads/main.zip
!unzip /content/main.zip

- Read the data:

In [None]:
raw_train = AraStanceData("/content/arastance-main/data/train.jsonl")
raw_val = AraStanceData("/content/arastance-main/data/dev.jsonl")
raw_test = AraStanceData("/content/arastance-main/data/test.jsonl")

print(f'# training instances: {len(raw_train.stances)}')
print(f'# validation instances: {len(raw_val.stances)}')
print(f'# testing instances: {len(raw_test.stances)}')

# training instances: 2848
# validation instances: 569
# testing instances: 646


- Print an instance from the data:

In [None]:
instance_no = 40
print(f"Claim text: {raw_train.claims[raw_train.article_claim[instance_no]]}")
print(f"Article text: {raw_train.articles[instance_no]}")
print(f"Stance: {raw_train.stances[instance_no]}")

Claim text: بمناسبة العام الجديد  شركة ليكزس توزع 200 سيارة مجانا
Article text: كثيرا ما تداولت صحف ومواقع إخبارية تقارير عن الهدايا التي منحتها الملكة إليزابيث الثانية (  ) ملكة بريطانيا، للعاملين لديها بمناسبة أعياد الميلاد، إلا أن صور هذه الهدايا وطبيعتها لم تكشف بشكل كامل إلا مؤخرا، وتحديدا بعد أن كشف عنها أحد جامعي التذكارات الملكية، ويدعى إيان شابيرو ( )، ويمتلك إيان مجموعة من الهدايا الملكية التي قدمتها الملكة للعاملين لديها، وتتضمن وعاء أنيق من الكريستال وطاقم عبوات الملح والفلفل الخاصة بالمائدة وإطار صور أنيق يحمل صورة رسمية للملكة التقطت بمناسبة عيد ميلادها الثمانين. مجموعة هدايا الملكة للعاملين لديها منذ عام 2002 طبقا لما نشره موقع   فإن مجموعة الهدايا الملكية بمناسبة أعياد الميلاد والتي قام باقتنائها أيان شابيرو، تتضمن مجموعة من الهدايا الشخصية التي قدمتها الملكة لعدد من العاملين لديها في قصر باكنغهام وقلعة وندسور بمناسبة أعياد الميلاد خلال الفترة ما بين عامي 2002-2015، إلى جانب عدد من الهدايا التي اعتادت ملكة بريطانيا تقديمها لجميع العاملين لديها في كل عام مثل بودنج أعياد 

- Each instance is therefore a triplet: claim, article, and stance.
- The claims and articles are written in Arabic.
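The lookup `raw_train.claims[raw_train.article_claim[instance_no]]` implies that each claim is stored once and every article carries the index of its claim. A minimal sketch of that layout follows; the toy lists and label strings are made-up placeholders for illustration, not the actual contents of `AraStanceData`.

```python
# Toy stand-ins for the attributes used in this notebook (hypothetical values).
claims = ["claim A", "claim B"]
articles = ["article 0", "article 1", "article 2"]
stances = ["Agree", "Unrelated", "Disagree"]
# article_claim[i] is the index (into claims) of the claim paired with articles[i].
article_claim = [0, 0, 1]

# Expand the claims so that every article gets its own claim string,
# mirroring list(map(claims.__getitem__, article_claim)).
paired_claims = [claims[c] for c in article_claim]
triplets = list(zip(paired_claims, articles, stances))

for claim, article, stance in triplets:
    print(f"{claim} | {article} | {stance}")
```

Several articles can share one claim, so the number of instances equals the number of articles, not the number of claims.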

## Dataset

In [None]:
batch_size = 32
sequence_length = 512
checkpoint = 'UBC-NLP/ARBERT'

- Download the tokenizer:

In [None]:
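tokenizer = AutoTokenizer.from_pretrained(checkpoint)
def tokenize(instance):
  # instance is a (claims, articles) pair of equal-length lists; each claim is
  # encoded together with its article, then truncated/padded to sequence_length
  return tokenizer(instance[0], instance[1], truncation=True, padding='max_length', max_length=sequence_length)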
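Passing two lists to the tokenizer encodes each claim/article pair as a single sequence. For BERT-style tokenizers the pair is laid out as `[CLS] claim [SEP] article [SEP]`, `token_type_ids` mark the second segment with 1, and `truncation=True` trims the longer segment first. The pure-Python sketch below mimics that layout on already-split tokens. `pack_pair` and the token strings are hypothetical, and the real tokenizer operates on subword IDs, so treat this as an illustration only.

```python
def pack_pair(claim_tokens, article_tokens, max_length):
    """Mimic BERT-style pair encoding: [CLS] claim [SEP] article [SEP] + padding."""
    claim, article = list(claim_tokens), list(article_tokens)
    budget = max_length - 3  # room left after [CLS] and the two [SEP] tokens
    # "Longest first" truncation: drop tokens from the end of the longer segment.
    while len(claim) + len(article) > budget:
        if len(claim) > len(article):
            claim.pop()
        else:
            article.pop()
    tokens = ["[CLS]"] + claim + ["[SEP]"] + article + ["[SEP]"]
    token_type_ids = [0] * (len(claim) + 2) + [1] * (len(article) + 1)
    attention_mask = [1] * len(tokens)
    # Pad every field up to max_length, as padding='max_length' does.
    pad = max_length - len(tokens)
    tokens += ["[PAD]"] * pad
    token_type_ids += [0] * pad
    attention_mask += [0] * pad
    return tokens, token_type_ids, attention_mask

# A long article is truncated to fit the length budget.
long_pair = pack_pair(["c1", "c2"], ["a1", "a2", "a3", "a4", "a5", "a6"], max_length=8)
# A short pair is padded up to the same length.
short_pair = pack_pair(["c1"], ["a1"], max_length=8)

for tokens, token_type_ids, attention_mask in (long_pair, short_pair):
    print(tokens, token_type_ids, attention_mask)
```

With `sequence_length = 512`, most of the budget goes to the article, since claims are short and the longer segment is the one that gets trimmed.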

In [None]:
class CustomDataset(Dataset):
  def __init__(self, encodings, labels):
    self.encodings = encodings
    self.labels = labels

  def __len__(self):
    return len(self.labels)

  def __getitem__(self, idx):
    item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
    item['labels'] = torch.tensor(self.labels[idx])
    return item

- Train dataloader:

In [None]:
train_labels = [stance_to_int[stance] for stance in raw_train.stances]
train_claims = list(map(raw_train.claims.__getitem__, raw_train.article_claim))
train_encodings = tokenize((train_claims, raw_train.articles))
train_dataset = CustomDataset(train_encodings, train_labels)
train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

- Validation dataloader:

In [None]:
val_labels = [stance_to_int[stance] for stance in raw_val.stances]
val_claims = list(map(raw_val.claims.__getitem__, raw_val.article_claim))
val_encodings = tokenize((val_claims, raw_val.articles))
val_dataset = CustomDataset(val_encodings, val_labels)
val_dataloader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

- Test dataloader:

In [None]:
test_labels = [stance_to_int[stance] for stance in raw_test.stances]
test_claims = list(map(raw_test.claims.__getitem__, raw_test.article_claim))
test_encodings = tokenize((test_claims, raw_test.articles))
test_dataset = CustomDataset(test_encodings, test_labels)
test_dataloader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

- Check a batch from the data:

In [None]:
# Take the first batch only
batch = next(iter(test_dataloader))
print({k: v.shape for k,v in batch.items()})

{'input_ids': torch.Size([32, 512]), 'token_type_ids': torch.Size([32, 512]), 'attention_mask': torch.Size([32, 512]), 'labels': torch.Size([32])}


## Model

- Download the model:

In [None]:
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=4, torch_dtype="auto")

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at UBC-NLP/ARBERT and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


- Put the model on GPU:

In [None]:
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
model.to(device)
print(device)

cuda


## Training

In [None]:
def train_loop(dataloader, model, optimizer):
  running_loss, running_corrects = 0.0, 0

  progress_bar = tqdm(range(len(dataloader)))
  model.train()
  for batch in dataloader:
    # Move the batch to the same device as the model
    batch = {k: v.to(device) for k, v in batch.items()}
    # The batch includes 'labels', so the model also returns the loss
    outputs = model(**batch)
    loss = outputs.loss
    loss.backward()

    optimizer.step()
    optimizer.zero_grad()
    progress_bar.update(1)

    logits = outputs.logits
    predictions = torch.argmax(logits, dim=-1)

    # outputs.loss is a batch mean, so weight it by the batch size
    running_loss += loss.item() * len(batch["labels"])
    running_corrects += torch.sum(predictions == batch["labels"]).item()

  final_loss = running_loss / len(dataloader.dataset)
  accuracy = running_corrects / len(dataloader.dataset)
  return final_loss, accuracy
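`outputs.loss` is the mean loss over a batch, so `train_loop` multiplies it by the batch size before summing and divides by the dataset size at the end. The sketch below uses made-up per-example losses to show why this matters when the last batch is smaller (for example, 569 validation instances with a batch size of 32 leave a final batch of 25).

```python
# Made-up per-example losses for a dataset of 5 examples, batch size 2.
losses = [1.0, 3.0, 2.0, 4.0, 10.0]
batches = [losses[i:i + 2] for i in range(0, len(losses), 2)]

batch_means = [sum(b) / len(b) for b in batches]

# Naive: average the batch means (over-weights the short last batch).
naive = sum(batch_means) / len(batch_means)

# As in train_loop: weight each batch mean by its batch size.
weighted = sum(m * len(b) for m, b in zip(batch_means, batches)) / len(losses)

print(f"naive: {naive}, weighted: {weighted}, true mean: {sum(losses) / len(losses)}")
```

Only the weighted version recovers the true per-example mean.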

In [None]:
def test_loop(dataloader, model):
  running_loss, running_corrects = 0.0, 0
  all_preds = []

  model.eval()
  for batch in dataloader:
    batch = {k: v.to(device) for k, v in batch.items()}
    # No gradients are needed for evaluation
    with torch.no_grad():
      outputs = model(**batch)
      loss = outputs.loss

    logits = outputs.logits
    predictions = torch.argmax(logits, dim=-1)

    # outputs.loss is a batch mean, so weight it by the batch size
    running_loss += loss.item() * len(batch["labels"])
    running_corrects += torch.sum(predictions == batch["labels"]).item()

    # Keep predictions as Python ints so they match the integer labels
    all_preds.extend(predictions.cpu().tolist())

  final_loss = running_loss / len(dataloader.dataset)
  accuracy = running_corrects / len(dataloader.dataset)
  # The dataloader must not shuffle, so predictions line up with dataset.labels.
  # average=None gives one F1 score per class; 'macro' is their unweighted mean.
  f1score = f1_score(dataloader.dataset.labels, all_preds, average=None)
  mf1score = f1_score(dataloader.dataset.labels, all_preds, average='macro')

  return final_loss, accuracy, f1score, mf1score
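`f1_score(..., average=None)` returns one F1 score per class, and `average='macro'` returns their unweighted mean, so every class counts equally regardless of how many instances it has. A minimal pure-Python sketch of the same computation on made-up labels follows; the helper `per_class_f1` is hypothetical and is not part of scikit-learn.

```python
def per_class_f1(y_true, y_pred, num_classes):
    """Return the F1 score of each class, treating it as one-vs-rest."""
    scores = []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return scores

# Made-up labels for four classes (0..3).
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 0, 2, 3, 3, 3]

f1_per_class = per_class_f1(y_true, y_pred, num_classes=4)
macro_f1 = sum(f1_per_class) / len(f1_per_class)
print([round(s, 3) for s in f1_per_class], round(macro_f1, 3))
```

Because the macro average ignores class frequency, one weak class lowers it more than it lowers the accuracy.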

In [None]:
class EarlyStopping:
  def __init__(self, patience=1, min_delta=0):
    self.patience = patience
    self.min_delta = min_delta
    self.counter = 0
    self.min_validation_loss = float('inf')

  def __call__(self, validation_loss):
    if validation_loss < self.min_validation_loss:
      self.min_validation_loss = validation_loss
      self.counter = 0
    elif validation_loss > (self.min_validation_loss + self.min_delta):
      self.counter += 1
      if self.counter >= self.patience:
        return True
    return False

In [None]:
epochs = 25
optimizer = AdamW(model.parameters(), lr=2e-5)
early_stopping = EarlyStopping(patience=3)

for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loss, train_accuracy = train_loop(train_dataloader, model, optimizer)
    val_loss, val_accuracy, _, _ = test_loop(val_dataloader, model)

    print(f"Train_loss: {train_loss:.3f}, Train_acc: {train_accuracy:.3f}",
          f"Val_loss: {val_loss:.3f}, Val_accuracy: {val_accuracy:.3f}",
          "\n-------------------------------")

    if early_stopping(val_loss):
      print("Early stopping!")
      break

Epoch 1
-------------------------------
Train_loss: 0.773, Train_acc: 0.717 Val_loss: 0.680, Val_accuracy: 0.757 
-------------------------------
Epoch 2
-------------------------------
Train_loss: 0.417, Train_acc: 0.858 Val_loss: 0.573, Val_accuracy: 0.805 
-------------------------------
Epoch 3
-------------------------------
Train_loss: 0.265, Train_acc: 0.916 Val_loss: 0.644, Val_accuracy: 0.801 
-------------------------------
Epoch 4
-------------------------------
Train_loss: 0.178, Train_acc: 0.940 Val_loss: 0.593, Val_accuracy: 0.826 
-------------------------------
Epoch 5
-------------------------------
Train_loss: 0.138, Train_acc: 0.952 Val_loss: 0.582, Val_accuracy: 0.849 
-------------------------------
Early stopping!
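In the log above, the lowest validation loss (0.573) occurs at epoch 2, while training stops after epoch 5, so the model evaluated afterwards is the epoch-5 model. A common variant also records the best epoch so that a snapshot of the weights can be kept at that point, for instance with `copy.deepcopy(model.state_dict())`. The sketch below replays the logged validation losses through such a variant. `BestTracker` is a hypothetical name, not part of this notebook, and unlike `EarlyStopping` it counts an equal loss as no improvement.

```python
class BestTracker:
    """Early stopping that also remembers the epoch with the lowest loss."""

    def __init__(self, patience=3):
        self.patience = patience
        self.counter = 0
        self.best_loss = float("inf")
        self.best_epoch = None

    def step(self, epoch, validation_loss):
        """Return True when training should stop."""
        if validation_loss < self.best_loss:
            self.best_loss = validation_loss
            self.best_epoch = epoch
            self.counter = 0  # a real loop would snapshot the weights here
            return False
        self.counter += 1
        return self.counter >= self.patience

val_losses = [0.680, 0.573, 0.644, 0.593, 0.582]  # taken from the training log
tracker = BestTracker(patience=3)
for epoch, loss in enumerate(val_losses, start=1):
    if tracker.step(epoch, loss):
        break

print(f"stopped after epoch {epoch}, best epoch {tracker.best_epoch}")
```

Whether restoring the epoch-2 weights would improve the reported scores is not something this notebook measures; validation accuracy kept rising through epoch 5 even though the loss did not.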


## Evaluation

In [None]:
_, val_accuracy, val_f1score, val_mf1score = test_loop(val_dataloader, model)

print("Validation Results:")
print("=====================")
print(f"Accuracy: {val_accuracy:.3f}")
# Per-class scores come back in label-index order; this unpacking assumes
# stance_to_int maps agree, disagree, discuss, unrelated to 0..3 in that order.
agree, disagree, discuss, unrelated = val_f1score
print("Per Class F1 scores:")
print(f"Agree   : {agree:.3f}")
print(f"Disagree: {disagree:.3f}")
print(f"Discuss : {discuss:.3f}")
print(f"Unrelated: {unrelated:.3f}")
print(f"Macro F1 score: {val_mf1score:.3f}")

Validation Results:
Accuracy: 0.849
Per Class F1 scores:
Agree   : 0.837
Disagree: 0.806
Discuss : 0.596
Unrelated: 0.924
Macro F1 score: 0.791


In [None]:
_, test_accuracy, test_f1score, test_mf1score = test_loop(test_dataloader, model)

print("Testing Results:")
print("=====================")
print(f"Accuracy: {test_accuracy:.3f}")
# Per-class scores come back in label-index order; this unpacking assumes
# stance_to_int maps agree, disagree, discuss, unrelated to 0..3 in that order.
agree, disagree, discuss, unrelated = test_f1score
print("Per Class F1 scores:")
print(f"Agree   : {agree:.3f}")
print(f"Disagree: {disagree:.3f}")
print(f"Discuss : {discuss:.3f}")
print(f"Unrelated: {unrelated:.3f}")
print(f"Macro F1 score: {test_mf1score:.3f}")

Testing Results:
Accuracy: 0.873
Per Class F1 scores:
Agree   : 0.875
Disagree: 0.764
Discuss : 0.548
Unrelated: 0.946
Macro F1 score: 0.783
