# Stance Detection Using MBERT + BiLSTM

`Multi-lingual Bidirectional Encoder Representations from Transformers` `Bidirectional Long Short-Term Memory`  
`AraStance Dataset`  
`Stance Detection` `Arabic Language` `Transformer Architecture`


---

In this notebook, we classify the stance of each article in the AraStance dataset toward its claim. Features extracted from the last four hidden layers of a frozen multilingual BERT (mBERT) are fed to a BiLSTM classifier. The dataset was introduced in the paper:
```
AraStance: A Multi-Country and Multi-Domain Dataset of Arabic Stance Detection for Fact Checking.
```
The model was introduced in the paper:
```
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
```

## Packages

In [None]:
!pip install transformers

In [None]:
import torch
import numpy as np
from utils import *  # project helpers; AraStanceData and stance_to_int are expected to come from here
from tqdm.auto import tqdm
from torch.optim import AdamW
from sklearn.metrics import f1_score
from torch.utils.data import Dataset, DataLoader
from transformers import AutoTokenizer, AutoModel

## Raw data

- Download the raw data:

In [None]:
!wget https://github.com/Tariq60/arastance/archive/refs/heads/main.zip
!unzip /content/main.zip

- Read the data:

In [None]:
raw_train = AraStanceData("/content/arastance-main/data/train.jsonl")
raw_val = AraStanceData("/content/arastance-main/data/dev.jsonl")
raw_test = AraStanceData("/content/arastance-main/data/test.jsonl")

print(f'# training instances: {len(raw_train.stances)}')
print(f'# validation instances: {len(raw_val.stances)}')
print(f'# testing instances: {len(raw_test.stances)}')

# training instances: 2848
# validation instances: 569
# testing instances: 646


- Print an instance from the data:

In [None]:
instance_no = 40
print(f"Claim text: {raw_train.claims[raw_train.article_claim[instance_no]]}")
print(f"Article text: {raw_train.articles[instance_no]}")
print(f"Stance: {raw_train.stances[instance_no]}")

Claim text: بمناسبة العام الجديد  شركة ليكزس توزع 200 سيارة مجانا
Article text: كثيرا ما تداولت صحف ومواقع إخبارية تقارير عن الهدايا التي منحتها الملكة إليزابيث الثانية (  ) ملكة بريطانيا، للعاملين لديها بمناسبة أعياد الميلاد، إلا أن صور هذه الهدايا وطبيعتها لم تكشف بشكل كامل إلا مؤخرا، وتحديدا بعد أن كشف عنها أحد جامعي التذكارات الملكية، ويدعى إيان شابيرو ( )، ويمتلك إيان مجموعة من الهدايا الملكية التي قدمتها الملكة للعاملين لديها، وتتضمن وعاء أنيق من الكريستال وطاقم عبوات الملح والفلفل الخاصة بالمائدة وإطار صور أنيق يحمل صورة رسمية للملكة التقطت بمناسبة عيد ميلادها الثمانين. مجموعة هدايا الملكة للعاملين لديها منذ عام 2002 طبقا لما نشره موقع   فإن مجموعة الهدايا الملكية بمناسبة أعياد الميلاد والتي قام باقتنائها أيان شابيرو، تتضمن مجموعة من الهدايا الشخصية التي قدمتها الملكة لعدد من العاملين لديها في قصر باكنغهام وقلعة وندسور بمناسبة أعياد الميلاد خلال الفترة ما بين عامي 2002-2015، إلى جانب عدد من الهدايا التي اعتادت ملكة بريطانيا تقديمها لجميع العاملين لديها في كل عام مثل بودنج أعياد 

- Each instance is therefore a (claim, article, stance) triplet.
- Note that the original language of the data is Arabic.
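
The lookup used above, `raw_train.claims[raw_train.article_claim[instance_no]]`, suggests that each article stores the index of its claim. A minimal sketch of that relationship follows. It uses made-up toy data rather than the real `AraStanceData` class, whose implementation is not shown in this notebook, and the stance strings are assumed:

```python
# Toy stand-ins for the fields read from AraStanceData (all values are hypothetical).
claims = ["claim A", "claim B"]
articles = ["article 0", "article 1", "article 2"]
article_claim = [0, 0, 1]   # article i is paired with claims[article_claim[i]]
stances = ["Agree", "Unrelated", "Disagree"]   # label strings are assumed


def build_triplets(claims, articles, article_claim, stances):
    """Return one (claim, article, stance) triplet per article."""
    return [(claims[c], a, s) for c, a, s in zip(article_claim, articles, stances)]


triplets = build_triplets(claims, articles, article_claim, stances)
for claim, article, stance in triplets:
    print(claim, "|", article, "|", stance)
```

Several articles can point to the same claim, which is why the number of claims can be smaller than the number of instances.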

## Dataset

In [None]:
batch_size = 64
sequence_length = 256
checkpoint = 'bert-base-multilingual-cased'

- Download the tokenizer:

In [None]:
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
def tokenize(instance):
  return tokenizer(instance[0], instance[1], truncation=True, padding='max_length', max_length=sequence_length)
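
Because the claim and the article are passed as a sentence pair, they share the `max_length` budget of 256 tokens, three of which go to `[CLS]` and the two `[SEP]` tokens. With `truncation=True` the tokenizer by default trims the longer sequence first, so long articles are cut while short claims usually survive intact. The pure-Python sketch below imitates that policy on plain token lists. `truncate_pair` is a hypothetical helper written for illustration and is not part of `transformers`:

```python
def truncate_pair(first, second, max_length, num_special=3):
    """Drop tokens from the end of the longer list until the pair fits.

    num_special accounts for [CLS] and the two [SEP] tokens of a BERT sentence pair.
    """
    first, second = list(first), list(second)
    budget = max_length - num_special
    while len(first) + len(second) > budget:
        if len(first) > len(second):
            first.pop()
        else:
            second.pop()
    return first, second


claim_tokens = ["c"] * 12       # a short claim (toy tokens)
article_tokens = ["a"] * 400    # a long article (toy tokens)
claim_kept, article_kept = truncate_pair(claim_tokens, article_tokens, max_length=256)
print(len(claim_kept), len(article_kept))
```

In this toy case the claim keeps all 12 tokens and the article is cut to 241, so that 12 + 241 + 3 special tokens fill the 256 positions.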

In [None]:
class CustomDataset(Dataset):
  def __init__(self, encodings, labels):
    self.encodings = encodings
    self.labels = labels

  def __len__(self):
    return len(self.labels)

  def __getitem__(self, idx):
    item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
    item['labels'] = torch.tensor(self.labels[idx])
    return item

- Train dataloader:

In [None]:
train_labels = [stance_to_int[stance] for stance in raw_train.stances]
train_claims = list(map(raw_train.claims.__getitem__, raw_train.article_claim))
train_encodings = tokenize((train_claims, raw_train.articles))
train_dataset = CustomDataset(train_encodings, train_labels)
train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
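
`stance_to_int` is not defined in this notebook. The evaluation cells unpack the per-class F1 scores in the order agree, disagree, discuss, unrelated, so the mapping presumably assigns the integers 0 to 3 in that order. A sketch under that assumption follows; the key strings are a guess, and the name `assumed_stance_to_int` is used to avoid shadowing the real mapping:

```python
# Hypothetical reconstruction of the label mapping; the real key strings may differ.
assumed_stance_to_int = {"Agree": 0, "Disagree": 1, "Discuss": 2, "Unrelated": 3}
assumed_int_to_stance = {i: s for s, i in assumed_stance_to_int.items()}

raw_stances = ["Unrelated", "Agree", "Discuss"]   # toy labels
labels = [assumed_stance_to_int[s] for s in raw_stances]
print(labels)
print([assumed_int_to_stance[i] for i in labels])
```

The inverse mapping is handy for turning predicted class indices back into readable stance names.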

- Validation dataloader:

In [None]:
val_labels = [stance_to_int[stance] for stance in raw_val.stances]
val_claims = list(map(raw_val.claims.__getitem__, raw_val.article_claim))
val_encodings = tokenize((val_claims, raw_val.articles))
val_dataset = CustomDataset(val_encodings, val_labels)
val_dataloader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

- Test dataloader:

In [None]:
test_labels = [stance_to_int[stance] for stance in raw_test.stances]
test_claims = list(map(raw_test.claims.__getitem__, raw_test.article_claim))
test_encodings = tokenize((test_claims, raw_test.articles))
test_dataset = CustomDataset(test_encodings, test_labels)
test_dataloader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

- Check a batch from the data:

In [None]:
for batch in test_dataloader:
  break
print({k: v.shape for k,v in batch.items()})

{'input_ids': torch.Size([64, 256]), 'token_type_ids': torch.Size([64, 256]), 'attention_mask': torch.Size([64, 256]), 'labels': torch.Size([64])}
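
As a quick sanity check on the dataloaders, the number of batches per split follows from the instance counts printed earlier and `batch_size = 64`. The sketch assumes the `DataLoader` default of keeping the last partial batch (`drop_last=False`), which is what the cells above use:

```python
import math

batch_size = 64
split_sizes = {"train": 2848, "val": 569, "test": 646}

# With drop_last=False the last partial batch is kept,
# so the number of batches is the ceiling of size / batch_size.
batches = {name: math.ceil(n / batch_size) for name, n in split_sizes.items()}
last_batch = {name: n - (batches[name] - 1) * batch_size for name, n in split_sizes.items()}
print(batches)
print(last_batch)
```

This gives 45 training steps per epoch, and the final batch of each split is smaller than 64, which matters when averaging per-batch losses.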


## Model

- Download the model:

In [None]:
pretrained_model = AutoModel.from_pretrained(checkpoint, output_hidden_states=True, torch_dtype="auto")


In [None]:
class Model(torch.nn.Module):
  def __init__(self, pretrained_model):
    super(Model, self).__init__()
    self.pretrained_model = pretrained_model
    for param in self.pretrained_model.parameters():
      param.requires_grad = False
    self.bilstm1 = torch.nn.LSTM(input_size=768, hidden_size=32, num_layers=1,
                               batch_first=True, bidirectional=True)
    self.bilstm2 = torch.nn.LSTM(input_size=64, hidden_size=32, num_layers=1,
                               batch_first=True, bidirectional=True)
    self.classifier = torch.nn.Linear(in_features=64, out_features=4)

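  def forward(self, inputs):
    # hidden_states is a tuple of 13 tensors (embeddings + 12 encoder layers),
    # each of shape (batch, seq_len, 768).
    encoder_outputs = self.pretrained_model(**inputs)['hidden_states']
    l12_outputs = encoder_outputs[12]
    l11_outputs = encoder_outputs[11]
    l10_outputs = encoder_outputs[10]
    l9_outputs = encoder_outputs[9]
    # dim=1 stacks the four layers along the sequence axis, giving
    # (batch, 4 * seq_len, 768), which matches input_size=768 of bilstm1.
    encoder_outputs = torch.cat((l9_outputs, l10_outputs, l11_outputs, l12_outputs), dim=1)
    sequence, (_, _) = self.bilstm1(encoder_outputs)
    # h holds the final hidden state of each direction: (2, batch, 32).
    _, (h, _) = self.bilstm2(sequence)
    h = h.permute(1,0,2)  # (batch, 2, 32)
    h = h.reshape(h.shape[0], -1)  # (batch, 64)
    logits = self.classifier(h)
    return logits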
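
To make the tensor shapes in `forward` easier to follow, the sketch below traces them with plain tuples instead of real tensors. `cat_shape` is a hypothetical helper that only mimics the shape rule of `torch.cat`. The batch size and sequence length are the values set earlier in the notebook:

```python
def cat_shape(shapes, dim):
    """Shape produced by concatenating tensors of the given shapes along dim."""
    out = list(shapes[0])
    out[dim] = sum(s[dim] for s in shapes)
    return tuple(out)


batch, seq_len, hidden = 64, 256, 768
layer_shape = (batch, seq_len, hidden)            # one mBERT hidden state

stacked = cat_shape([layer_shape] * 4, dim=1)     # as in forward(): sequence axis
lstm1_out = (stacked[0], stacked[1], 2 * 32)      # bidirectional, hidden_size=32
h_n = (2, batch, 32)                              # final hidden states of bilstm2
h_flat = (h_n[1], h_n[0] * h_n[2])                # after permute + reshape
logits = (h_flat[0], 4)                           # four stance classes

print(stacked, lstm1_out, h_flat, logits)

# A different design, not used in this notebook: feature-wise concatenation.
print(cat_shape([layer_shape] * 4, dim=-1))
```

With `dim=1` the BiLSTM reads a sequence of 1024 vectors of size 768. Concatenating along the feature axis instead would keep 256 positions but require `input_size=3072` in `bilstm1`.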

- Create a model instance and move it to the GPU if one is available:

In [None]:
model = Model(pretrained_model)
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
model.to(device)
print(device)

cuda


- Print the shapes of the model's trainable parameters:

In [None]:
for param in model.parameters():
  if param.requires_grad:
    print(param.size())

torch.Size([128, 768])
torch.Size([128, 32])
torch.Size([128])
torch.Size([128])
torch.Size([128, 768])
torch.Size([128, 32])
torch.Size([128])
torch.Size([128])
torch.Size([128, 64])
torch.Size([128, 32])
torch.Size([128])
torch.Size([128])
torch.Size([128, 64])
torch.Size([128, 32])
torch.Size([128])
torch.Size([128])
torch.Size([4, 64])
torch.Size([4])
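
The shapes printed above can be turned into a parameter count. The leading 128 in each LSTM tensor is 4 × `hidden_size`, since PyTorch stacks the weights of the four LSTM gates. A small sketch that copies the shapes by hand rather than reading them from the model:

```python
from math import prod

# Shapes copied from the printed output: each LSTM direction has
# weight_ih, weight_hh, bias_ih, bias_hh; the classifier has weight and bias.
lstm1_dir = [(128, 768), (128, 32), (128,), (128,)]
lstm2_dir = [(128, 64), (128, 32), (128,), (128,)]
classifier = [(4, 64), (4,)]
all_shapes = lstm1_dir * 2 + lstm2_dir * 2 + classifier


def count(shapes):
    """Total number of scalar parameters across the given tensor shapes."""
    return sum(prod(s) for s in shapes)


print(count(lstm1_dir * 2), count(lstm2_dir * 2), count(classifier), count(all_shapes))
```

Only about 230 thousand parameters are trained, most of them in the first BiLSTM; the mBERT weights stay frozen.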


## Training

In [None]:
def train_loop(dataloader, model, loss_fn, optimizer):
  running_loss, running_corrects = 0, 0

  progress_bar = tqdm(range(len(dataloader)))
  model.train()
  for batch in dataloader:
    input_ids, token_type_ids, attention_mask, labels = batch.values()
    labels = labels.to(device)
    inputs = {'input_ids': input_ids.to(device),
              'token_type_ids': token_type_ids.to(device),
              'attention_mask': attention_mask.to(device)}
    logits = model(inputs)
    loss = loss_fn(logits, labels)

    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    progress_bar.update(1)

    predictions = torch.argmax(logits, dim=-1)

    running_loss += loss.item() * len(labels)
    running_corrects += torch.sum(predictions == labels)

  final_loss = running_loss / len(dataloader.dataset)
  accuracy = running_corrects / len(dataloader.dataset)
  return final_loss, accuracy
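
`train_loop` multiplies each batch loss by the batch size before summing. `CrossEntropyLoss` returns the batch mean by default, so this weighting recovers the exact per-sample mean even though the last batch is smaller. A toy illustration with made-up loss values:

```python
def dataset_mean(batch_means, batch_sizes):
    """Exact per-sample mean from per-batch means, mirroring train_loop."""
    total = sum(m * n for m, n in zip(batch_means, batch_sizes))
    return total / sum(batch_sizes)


batch_means = [1.0, 1.0, 4.0]   # toy per-batch mean losses
batch_sizes = [64, 64, 32]      # the last batch is smaller

weighted = dataset_mean(batch_means, batch_sizes)
naive = sum(batch_means) / len(batch_means)   # would over-weight the small batch
print(weighted, naive)
```

Here the weighted mean is 1.6, while a plain average of the batch means would report 2.0.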

In [None]:
def test_loop(dataloader, model, loss_fn):
  running_loss, running_corrects = 0, 0
  all_preds = []

  model.eval()
  for batch in dataloader:
    input_ids, token_type_ids, attention_mask, labels = batch.values()
    labels = labels.to(device)
    inputs = {'input_ids': input_ids.to(device),
              'token_type_ids': token_type_ids.to(device),
              'attention_mask': attention_mask.to(device)}
    with torch.no_grad():
      logits = model(inputs)
      loss = loss_fn(logits, labels)

    predictions = torch.argmax(logits, dim=-1)

    running_loss += loss.item() * len(labels)
    running_corrects += torch.sum(predictions == labels)

    all_preds = np.concatenate((all_preds, predictions.cpu().numpy()))

  final_loss = running_loss / len(dataloader.dataset)
  accuracy = running_corrects / len(dataloader.dataset)
  f1score = f1_score(dataloader.dataset.labels, all_preds, average=None)
  mf1score = f1_score(dataloader.dataset.labels, all_preds, average='macro')

  return final_loss, accuracy, f1score, mf1score
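
`test_loop` relies on `f1_score` from scikit-learn for the per-class (`average=None`) and macro-averaged scores. The pure-Python sketch below shows what those two quantities mean on a toy example; it is an illustration of the definition, not a replacement for the library call:

```python
def per_class_f1(y_true, y_pred, num_classes):
    """F1 = 2*TP / (2*TP + FP + FN) for each class, 0.0 when undefined."""
    scores = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


y_true = [0, 0, 1, 1, 2, 3, 3, 3]   # toy gold labels
y_pred = [0, 1, 1, 1, 3, 3, 3, 0]   # toy predictions
f1 = per_class_f1(y_true, y_pred, num_classes=4)
macro_f1 = sum(f1) / len(f1)        # unweighted mean over classes
print([round(s, 3) for s in f1], round(macro_f1, 3))
```

Because the macro average weights every class equally, a class that is rarely predicted correctly pulls the score down regardless of how few instances it has.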

In [None]:
epochs = 25
lr = 1e-2
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = AdamW(model.parameters(), lr=lr)

for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loss, train_accuracy = train_loop(train_dataloader, model, loss_fn, optimizer)
    val_loss, val_accuracy, _, _ = test_loop(val_dataloader, model, loss_fn)

    print(f"Train_loss: {train_loss:.3f}, Train_acc: {train_accuracy:.3f}",
          f"Val_loss: {val_loss:.3f}, Val_accuracy: {val_accuracy:.3f}",
          "\n-------------------------------")

Epoch 1
-------------------------------
Train_loss: 0.978, Train_acc: 0.620 Val_loss: 0.899, Val_accuracy: 0.659
-------------------------------
Epoch 2
-------------------------------
Train_loss: 0.761, Train_acc: 0.737 Val_loss: 0.868, Val_accuracy: 0.684
-------------------------------
Epoch 3
-------------------------------
Train_loss: 0.642, Train_acc: 0.778 Val_loss: 0.828, Val_accuracy: 0.749
-------------------------------
Epoch 4
-------------------------------
Train_loss: 0.570, Train_acc: 0.802 Val_loss: 0.805, Val_accuracy: 0.736
-------------------------------
Epoch 5
-------------------------------
Train_loss: 0.525, Train_acc: 0.810 Val_loss: 0.783, Val_accuracy: 0.712
-------------------------------
Epoch 6
-------------------------------
Train_loss: 0.484, Train_acc: 0.819 Val_loss: 0.698, Val_accuracy: 0.749
-------------------------------
Epoch 7
-------------------------------
Train_loss: 0.458, Train_acc: 0.835 Val_loss: 0.759, Val_accuracy: 0.724
-------------------------------
Epoch 8
-------------------------------
Train_loss: 0.425, Train_acc: 0.846 Val_loss: 0.723, Val_accuracy: 0.764
-------------------------------
Epoch 9
-------------------------------
Train_loss: 0.380, Train_acc: 0.854 Val_loss: 0.805, Val_accuracy: 0.731
-------------------------------
Epoch 10
-------------------------------
Train_loss: 0.361, Train_acc: 0.867 Val_loss: 0.778, Val_accuracy: 0.743
-------------------------------
Epoch 11
-------------------------------
Train_loss: 0.348, Train_acc: 0.875 Val_loss: 0.832, Val_accuracy: 0.772
-------------------------------
Epoch 12
-------------------------------
Train_loss: 0.320, Train_acc: 0.886 Val_loss: 0.813, Val_accuracy: 0.766
-------------------------------
Epoch 13
-------------------------------
Train_loss: 0.278, Train_acc: 0.899 Val_loss: 0.899, Val_accuracy: 0.731
-------------------------------
Epoch 14
-------------------------------
Train_loss: 0.276, Train_acc: 0.895 Val_loss: 0.809, Val_accuracy: 0.752
-------------------------------
Epoch 15
-------------------------------
Train_loss: 0.271, Train_acc: 0.899 Val_loss: 0.846, Val_accuracy: 0.756
-------------------------------
Epoch 16
-------------------------------
Train_loss: 0.238, Train_acc: 0.912 Val_loss: 0.942, Val_accuracy: 0.736
-------------------------------
Epoch 17
-------------------------------
Train_loss: 0.212, Train_acc: 0.920 Val_loss: 0.918, Val_accuracy: 0.770
-------------------------------
Epoch 18
-------------------------------
Train_loss: 0.226, Train_acc: 0.916 Val_loss: 0.896, Val_accuracy: 0.768
-------------------------------
Epoch 19
-------------------------------
Train_loss: 0.201, Train_acc: 0.922 Val_loss: 0.985, Val_accuracy: 0.775
-------------------------------
Epoch 20
-------------------------------
Train_loss: 0.208, Train_acc: 0.925 Val_loss: 0.905, Val_accuracy: 0.775
-------------------------------
Epoch 21
-------------------------------
Train_loss: 0.196, Train_acc: 0.928 Val_loss: 0.931, Val_accuracy: 0.754
-------------------------------
Epoch 22
-------------------------------
Train_loss: 0.183, Train_acc: 0.938 Val_loss: 0.935, Val_accuracy: 0.763
-------------------------------
Epoch 23
-------------------------------
Train_loss: 0.186, Train_acc: 0.931 Val_loss: 1.055, Val_accuracy: 0.736
-------------------------------
Epoch 24
-------------------------------
Train_loss: 0.184, Train_acc: 0.934 Val_loss: 0.998, Val_accuracy: 0.782
-------------------------------
Epoch 25
-------------------------------
Train_loss: 0.188, Train_acc: 0.931 Val_loss: 0.876, Val_accuracy: 0.793
-------------------------------
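
In the log above, the training loss keeps falling while the validation loss fluctuates. The sketch below copies the validation metrics by hand and locates the best epoch under each one. The notebook itself evaluates the weights from the final epoch; selecting a checkpoint by validation loss would be a different design choice and is shown only as a possibility:

```python
# Validation metrics copied from the training log, epochs 1 to 25.
val_loss = [0.899, 0.868, 0.828, 0.805, 0.783, 0.698, 0.759, 0.723, 0.805,
            0.778, 0.832, 0.813, 0.899, 0.809, 0.846, 0.942, 0.918, 0.896,
            0.985, 0.905, 0.931, 0.935, 1.055, 0.998, 0.876]
val_acc = [0.659, 0.684, 0.749, 0.736, 0.712, 0.749, 0.724, 0.764, 0.731,
           0.743, 0.772, 0.766, 0.731, 0.752, 0.756, 0.736, 0.770, 0.768,
           0.775, 0.775, 0.754, 0.763, 0.736, 0.782, 0.793]

# Epochs are 1-based in the log, list indices are 0-based.
best_loss_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1
best_acc_epoch = max(range(len(val_acc)), key=val_acc.__getitem__) + 1
print(f"lowest val loss: epoch {best_loss_epoch} ({val_loss[best_loss_epoch - 1]})")
print(f"highest val accuracy: epoch {best_acc_epoch} ({val_acc[best_acc_epoch - 1]})")
```

The two criteria disagree here: validation loss is lowest at epoch 6, while validation accuracy peaks at epoch 25.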


## Evaluation

In [None]:
_, val_accuracy, val_f1score, val_mf1score = test_loop(val_dataloader, model, loss_fn)

print("Validation Results:")
print("=====================")
print(f"Accuracy: {val_accuracy:.3f}")
agree, disagree, discuss, unrelated = val_f1score
print("Per Class F1 scores:")
print(f"Agree   : {agree:.3f}")
print(f"Disagree: {disagree:.3f}")
print(f"Discuss : {discuss:.3f}")
print(f"Unrelated: {unrelated:.3f}")
print(f"Macro F1 score: {val_mf1score:.3f}")

Validation Results:
Accuracy: 0.793
Per Class F1 scores:
Agree   : 0.789
Disagree: 0.736
Discuss : 0.432
Unrelated: 0.875
Macro F1 score: 0.708


In [None]:
_, test_accuracy, test_f1score, test_mf1score = test_loop(test_dataloader, model, loss_fn)

print("Testing Results:")
print("=====================")
print(f"Accuracy: {test_accuracy:.3f}")
agree, disagree, discuss, unrelated = test_f1score
print("Per Class F1 scores:")
print(f"Agree   : {agree:.3f}")
print(f"Disagree: {disagree:.3f}")
print(f"Discuss : {discuss:.3f}")
print(f"Unrelated: {unrelated:.3f}")
print(f"Macro F1 score: {test_mf1score:.3f}")

Testing Results:
Accuracy: 0.808
Per Class F1 scores:
Agree   : 0.783
Disagree: 0.714
Discuss : 0.392
Unrelated: 0.894
Macro F1 score: 0.696
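
The macro F1 is the unweighted mean of the four per-class F1 scores, so the reported values can be cross-checked by hand. The sketch below uses the rounded numbers printed above, so agreement is only expected up to rounding:

```python
# Per-class F1 scores copied from the printed results (rounded to 3 decimals).
val_f1 = {"agree": 0.789, "disagree": 0.736, "discuss": 0.432, "unrelated": 0.875}
test_f1 = {"agree": 0.783, "disagree": 0.714, "discuss": 0.392, "unrelated": 0.894}


def macro(scores):
    """Unweighted mean of the per-class scores."""
    return sum(scores.values()) / len(scores)


print(round(macro(val_f1), 3), round(macro(test_f1), 3))
```

On both splits the "discuss" class has by far the lowest F1, which is the main reason the macro F1 sits well below the accuracy.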
