# Stance Detection Using MBERT + CNN_BiLSTM

`Multi-lingual Bidirectional Encoder Representations from Transformers` `Convolutional Neural Networks` `Bidirectional Long Short-Term Memory`  
`AraStance Dataset`  
`Stance Detection` `Arabic Language` `Transformer Architecture`


---

In this notebook, we take the features from the last hidden layer of multilingual BERT (mBERT), pass them through a CNN + BiLSTM head, and classify the stance of each article toward its claim in the AraStance dataset. The dataset was introduced in the paper:
```
AraStance: A Multi-Country and Multi-Domain Dataset of Arabic Stance Detection for Fact Checking.
```
The underlying BERT model was introduced in the paper:
```
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
```

## Packages

```python
!pip install transformers
```

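```python
import torch
import numpy as np
# Local helper module (not shown in this notebook); it must provide the
# `AraStanceData` reader and the `stance_to_int` label mapping used below.
from utils import *
from tqdm.auto import tqdm
from torch.optim import AdamW
from sklearn.metrics import f1_score
from torch.utils.data import Dataset, DataLoader
from transformers import AutoTokenizer, AutoModel
```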

## Raw data

- Download the raw data:

```python
# Download and unpack the AraStance repository (the paths assume Colab's /content directory).
!wget https://github.com/Tariq60/arastance/archive/refs/heads/main.zip
!unzip /content/main.zip
```

- Read the data:

```python
raw_train = AraStanceData("/content/arastance-main/data/train.jsonl")
raw_val = AraStanceData("/content/arastance-main/data/dev.jsonl")
raw_test = AraStanceData("/content/arastance-main/data/test.jsonl")

print(f'# training instances: {len(raw_train.stances)}')
print(f'# validation instances: {len(raw_val.stances)}')
print(f'# testing instances: {len(raw_test.stances)}')
```

```
# training instances: 2848
# validation instances: 569
# testing instances: 646
```


- Print one instance from the training data:

```python
instance_no = 40
print(f"Claim text: {raw_train.claims[raw_train.article_claim[instance_no]]}")
print(f"Article text: {raw_train.articles[instance_no]}")
print(f"Stance: {raw_train.stances[instance_no]}")
```

Claim text: بمناسبة العام الجديد  شركة ليكزس توزع 200 سيارة مجانا
Article text: كثيرا ما تداولت صحف ومواقع إخبارية تقارير عن الهدايا التي منحتها الملكة إليزابيث الثانية (  ) ملكة بريطانيا، للعاملين لديها بمناسبة أعياد الميلاد، إلا أن صور هذه الهدايا وطبيعتها لم تكشف بشكل كامل إلا مؤخرا، وتحديدا بعد أن كشف عنها أحد جامعي التذكارات الملكية، ويدعى إيان شابيرو ( )، ويمتلك إيان مجموعة من الهدايا الملكية التي قدمتها الملكة للعاملين لديها، وتتضمن وعاء أنيق من الكريستال وطاقم عبوات الملح والفلفل الخاصة بالمائدة وإطار صور أنيق يحمل صورة رسمية للملكة التقطت بمناسبة عيد ميلادها الثمانين. مجموعة هدايا الملكة للعاملين لديها منذ عام 2002 طبقا لما نشره موقع   فإن مجموعة الهدايا الملكية بمناسبة أعياد الميلاد والتي قام باقتنائها أيان شابيرو، تتضمن مجموعة من الهدايا الشخصية التي قدمتها الملكة لعدد من العاملين لديها في قصر باكنغهام وقلعة وندسور بمناسبة أعياد الميلاد خلال الفترة ما بين عامي 2002-2015، إلى جانب عدد من الهدايا التي اعتادت ملكة بريطانيا تقديمها لجميع العاملين لديها في كل عام مثل بودنج أعياد 

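- Each instance is therefore a (claim, article, stance) triplet.
- The data is in Arabic. In English, the claim above reads roughly "For the New Year, Lexus is giving away 200 cars for free", while the article is about the Christmas gifts Queen Elizabeth II gave to her staff, as revealed by a collector of royal memorabilia.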
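The indexing in the cell above (`raw_train.claims[raw_train.article_claim[instance_no]]`) implies that each claim is stored once and every article points to its claim by index, so several articles can share one claim. The toy sketch below mimics that layout with made-up English strings; the data and the `stance_to_int` mapping are hypothetical stand-ins for what `utils` provides. It also shows how the dataloader cells build aligned claim/article/label lists.

```python
# Toy stand-in for an AraStanceData object (hypothetical data, English for readability).
claims = ["Claim A", "Claim B"]                   # each claim stored once
articles = ["Article 1", "Article 2", "Article 3"]
article_claim = [0, 0, 1]                         # article i refers to claims[article_claim[i]]
stances = ["Agree", "Unrelated", "Disagree"]

# Hypothetical label mapping; the real one comes from utils.stance_to_int.
stance_to_int = {"Agree": 0, "Disagree": 1, "Discuss": 2, "Unrelated": 3}

# Same construction as the dataloader cells: one claim per article, in article order.
paired_claims = list(map(claims.__getitem__, article_claim))
labels = [stance_to_int[s] for s in stances]

triplets = list(zip(paired_claims, articles, labels))
for claim, article, label in triplets:
    print(f"{claim!r} / {article!r} -> {label}")
```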

## Dataset

```python
batch_size = 64
sequence_length = 512  # maximum input length supported by BERT
checkpoint = 'bert-base-multilingual-cased'  # multilingual BERT (mBERT), cased
```

- Download the tokenizer:

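```python
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

def tokenize(instance):
  # `instance` is a (claims, articles) tuple of equal-length lists; each pair is
  # encoded as [CLS] claim [SEP] article [SEP]. truncation=True ('longest_first')
  # trims tokens from the longer sequence until the pair fits in max_length,
  # and shorter pairs are padded up to max_length.
  return tokenizer(instance[0], instance[1], truncation=True, padding='max_length', max_length=sequence_length)
```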
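For a pair of inputs, BERT's tokenizer lays them out as `[CLS] claim [SEP] article [SEP]`, sets `token_type_ids` to 0 for the first segment and 1 for the second, and with `truncation=True` (the `'longest_first'` strategy) removes tokens from whichever sequence is currently longer until the pair fits. Since claims are short and articles long, it is normally the article that gets cut. The pure-Python sketch below imitates this on lists of toy token strings. It is a simplified illustration (the function `encode_pair` is hypothetical; the real tokenizer works on subword IDs and may break ties differently), not the Hugging Face implementation.

```python
def encode_pair(claim_tokens, article_tokens, max_length):
    """Simplified sketch of pair encoding with longest-first truncation and max-length padding."""
    a, b = list(claim_tokens), list(article_tokens)
    budget = max_length - 3  # room left after [CLS], [SEP], [SEP]
    # Longest-first: drop one token at a time from the currently longer sequence.
    while len(a) + len(b) > budget:
        if len(a) > len(b):
            a.pop()
        else:
            b.pop()
    tokens = ["[CLS]"] + a + ["[SEP]"] + b + ["[SEP]"]
    # Segment 0 covers [CLS] claim [SEP]; segment 1 covers article [SEP].
    token_type_ids = [0] * (len(a) + 2) + [1] * (len(b) + 1)
    attention_mask = [1] * len(tokens)
    # padding='max_length': pad every field up to max_length (padding is masked out).
    pad = max_length - len(tokens)
    tokens += ["[PAD]"] * pad
    token_type_ids += [0] * pad
    attention_mask += [0] * pad
    return tokens, token_type_ids, attention_mask


claim = [f"c{i}" for i in range(5)]
article = [f"w{i}" for i in range(1000)]
tokens, types, mask = encode_pair(claim, article, max_length=512)
print(len(tokens), tokens[1:6] == claim, types.count(1))
```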

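```python
class CustomDataset(Dataset):
  """Wraps the tokenizer output and integer labels so a DataLoader can batch them."""
  def __init__(self, encodings, labels):
    self.encodings = encodings  # dict of lists: input_ids, token_type_ids, attention_mask
    self.labels = labels

  def __len__(self):
    return len(self.labels)

  def __getitem__(self, idx):
    # One instance as a dict of tensors; the DataLoader stacks these into batches.
    item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
    item['labels'] = torch.tensor(self.labels[idx])
    return item
```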

- Train dataloader:

```python
# Integer labels and, for each article, the text of the claim it refers to.
train_labels = [stance_to_int[stance] for stance in raw_train.stances]
train_claims = list(map(raw_train.claims.__getitem__, raw_train.article_claim))
train_encodings = tokenize((train_claims, raw_train.articles))
train_dataset = CustomDataset(train_encodings, train_labels)
train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
```

- Validation dataloader:

```python
val_labels = [stance_to_int[stance] for stance in raw_val.stances]
val_claims = list(map(raw_val.claims.__getitem__, raw_val.article_claim))
val_encodings = tokenize((val_claims, raw_val.articles))
val_dataset = CustomDataset(val_encodings, val_labels)
val_dataloader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)
```

- Test dataloader:

```python
test_labels = [stance_to_int[stance] for stance in raw_test.stances]
test_claims = list(map(raw_test.claims.__getitem__, raw_test.article_claim))
test_encodings = tokenize((test_claims, raw_test.articles))
test_dataset = CustomDataset(test_encodings, test_labels)
test_dataloader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)
```

- Check a batch from the data:

```python
batch = next(iter(test_dataloader))
print({k: v.shape for k, v in batch.items()})
```

```
{'input_ids': torch.Size([64, 512]), 'token_type_ids': torch.Size([64, 512]), 'attention_mask': torch.Size([64, 512]), 'labels': torch.Size([64])}
```
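With `batch_size = 64` and the split sizes printed earlier, each pass over a split takes ceil(size / 64) batches, and since `DataLoader` keeps the incomplete last batch by default (`drop_last=False`), that batch is smaller unless the size divides evenly. For training this means 45 optimizer steps per epoch. A quick check in plain Python:

```python
import math

batch_size = 64
split_sizes = {"train": 2848, "val": 569, "test": 646}  # counts printed by the "Read the data" cell

batches = {}
for name, n in split_sizes.items():
    n_batches = math.ceil(n / batch_size)            # incomplete last batch is kept
    last_batch = n - (n_batches - 1) * batch_size    # size of that last batch
    batches[name] = (n_batches, last_batch)
    print(f"{name}: {n_batches} batches, last batch has {last_batch} instances")
```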


## Model

- Download the model:

```python
# Load mBERT and have it return the hidden states of every layer.
pretrained_model = AutoModel.from_pretrained(checkpoint, output_hidden_states=True, torch_dtype="auto")
```

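```python
class Model(torch.nn.Module):
  def __init__(self, pretrained_model):
    super().__init__()
    self.pretrained_model = pretrained_model
    # Freeze mBERT: only the CNN, BiLSTM, and classifier layers are trained.
    for param in self.pretrained_model.parameters():
      param.requires_grad = False
    # Three parallel 1-D convolutions over the token axis (kernel sizes 2, 3, 4),
    # each mapping the 768-d BERT features to 100 channels.
    self.conv1 = torch.nn.Conv1d(in_channels=768, out_channels=100, kernel_size=2)
    self.conv2 = torch.nn.Conv1d(in_channels=768, out_channels=100, kernel_size=3)
    self.conv3 = torch.nn.Conv1d(in_channels=768, out_channels=100, kernel_size=4)
    # Conv output lengths are 511, 510, and 509; concatenated: 3 * 512 - 6 = 1530.
    self.bilstm = torch.nn.LSTM(input_size=sequence_length*3-6, hidden_size=32, num_layers=1,
                               batch_first=True, bidirectional=True)
    self.dropout = torch.nn.Dropout(p=0.2)
    # 2 directions x 32 hidden units -> 4 stance classes
    self.classifier = torch.nn.Linear(in_features=64, out_features=4)

  def forward(self, inputs):
    # Last hidden layer (index 12 of the 13 hidden states): (batch, 512, 768)
    encoder_outputs = self.pretrained_model(**inputs)['hidden_states'][12]
    # Conv1d expects (batch, channels, length): (batch, 768, 512)
    encoder_outputs = encoder_outputs.permute(0, 2, 1)
    conv1_outputs = self.conv1(encoder_outputs)  # (batch, 100, 511)
    conv2_outputs = self.conv2(encoder_outputs)  # (batch, 100, 510)
    conv3_outputs = self.conv3(encoder_outputs)  # (batch, 100, 509)
    # Concatenate along the length axis: (batch, 100, 1530)
    concat_outputs = torch.cat((conv1_outputs, conv2_outputs, conv3_outputs), dim=2)
    # With batch_first=True the BiLSTM treats the 100 filters as time steps,
    # each carrying a 1530-d feature vector. h: (2 directions, batch, 32)
    _, (h, _) = self.bilstm(concat_outputs)
    # Concatenate the final forward and backward states: (batch, 64)
    h = h.permute(1, 0, 2)
    h = h.reshape(h.shape[0], -1)
    h = self.dropout(h)
    logits = self.classifier(h)
    return logits
```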
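The BiLSTM's `input_size`, `sequence_length*3-6`, follows from the convolution output lengths: a `Conv1d` with no padding, stride 1, and dilation 1 (the defaults used here) turns a sequence of length L into one of length L - k + 1. The helper below (a hypothetical name, plain Python, no PyTorch required) traces the tensor shapes through `forward` for one batch under those settings.

```python
def trace_shapes(batch=64, seq_len=512, hidden=768, filters=100,
                 kernel_sizes=(2, 3, 4), lstm_hidden=32, num_classes=4):
    """Trace tensor shapes through the MBERT + CNN_BiLSTM forward pass (illustrative only)."""
    shapes = {"encoder": (batch, seq_len, hidden)}
    shapes["permuted"] = (batch, hidden, seq_len)
    # Conv1d with padding=0, stride=1, dilation=1: L_out = L_in - k + 1
    conv_lengths = [seq_len - k + 1 for k in kernel_sizes]
    shapes["convs"] = [(batch, filters, n) for n in conv_lengths]
    # torch.cat(..., dim=2) concatenates along the length axis
    shapes["concat"] = (batch, filters, sum(conv_lengths))
    # BiLSTM final hidden state: (num_directions, batch, hidden_size)
    shapes["h_n"] = (2, batch, lstm_hidden)
    # Forward and backward states concatenated per instance
    shapes["features"] = (batch, 2 * lstm_hidden)
    shapes["logits"] = (batch, num_classes)
    return shapes


for name, shape in trace_shapes().items():
    print(f"{name:>9}: {shape}")
```

Because the concatenation runs along the length axis, the BiLSTM's time axis is the 100 convolution filters rather than the token positions; the token positions are folded into each step's 1530-dimensional feature vector.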

- Create a model instance and move it to the GPU (falling back to the CPU if none is available):

```python
model = Model(pretrained_model)
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
model.to(device)
print(device)
```

```
cuda
```


- Print the shapes of the model's trainable parameters:

```python
for param in model.parameters():
  if param.requires_grad:
    print(param.size())
```

```
torch.Size([100, 768, 2])
torch.Size([100])
torch.Size([100, 768, 3])
torch.Size([100])
torch.Size([100, 768, 4])
torch.Size([100])
torch.Size([128, 1530])
torch.Size([128, 32])
torch.Size([128])
torch.Size([128])
torch.Size([128, 1530])
torch.Size([128, 32])
torch.Size([128])
torch.Size([128])
torch.Size([4, 64])
torch.Size([4])
```
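Summing the products of the shapes printed above gives the size of the trainable head; everything in mBERT is frozen. The BiLSTM entries come in groups of four per direction (input-to-hidden weights, hidden-to-hidden weights, and two bias vectors), each with 128 = 4 gates × 32 hidden units rows. The shapes below are copied from that output.

```python
from math import prod

# Trainable parameter shapes, as printed by the cell above.
trainable_shapes = [
    (100, 768, 2), (100,),                   # conv1 (kernel size 2)
    (100, 768, 3), (100,),                   # conv2 (kernel size 3)
    (100, 768, 4), (100,),                   # conv3 (kernel size 4)
    (128, 1530), (128, 32), (128,), (128,),  # BiLSTM forward direction
    (128, 1530), (128, 32), (128,), (128,),  # BiLSTM backward direction
    (4, 64), (4,),                           # classifier
]

total = sum(prod(shape) for shape in trainable_shapes)
print(f"Trainable parameters: {total:,}")
```

In the notebook itself, the same count can be obtained directly with `sum(p.numel() for p in model.parameters() if p.requires_grad)`.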


## Training

```python
def train_loop(dataloader, model, loss_fn, optimizer):
  running_loss, running_corrects = 0.0, 0

  progress_bar = tqdm(range(len(dataloader)))
  # Note: train() also enables dropout inside the frozen mBERT encoder.
  model.train()
  for batch in dataloader:
    # Look fields up by name rather than relying on the dict's key order.
    labels = batch['labels'].to(device)
    inputs = {'input_ids': batch['input_ids'].to(device),
              'token_type_ids': batch['token_type_ids'].to(device),
              'attention_mask': batch['attention_mask'].to(device)}
    logits = model(inputs)
    loss = loss_fn(logits, labels)

    # Backpropagate, update the trainable head, and reset the gradients.
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    progress_bar.update(1)

    predictions = torch.argmax(logits, dim=-1)

    # Weight the batch-mean loss by the batch size to get a per-instance mean later.
    running_loss += loss.item() * len(labels)
    running_corrects += (predictions == labels).sum().item()

  progress_bar.close()
  final_loss = running_loss / len(dataloader.dataset)
  accuracy = running_corrects / len(dataloader.dataset)
  return final_loss, accuracy
```
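`train_loop` multiplies each batch's mean loss by the batch size before dividing by the dataset size. This makes the epoch loss a true per-instance mean even though the last training batch holds only 32 of the 2848 instances; a plain average of the batch means would give that small batch the same weight as a full one. A toy illustration with made-up numbers:

```python
# Hypothetical per-batch mean losses and batch sizes (the last batch is smaller).
batch_mean_losses = [1.0, 0.5]
batch_sizes = [64, 32]

# What train_loop computes: sum of (batch mean loss * batch size) / number of instances.
weighted = sum(l * n for l, n in zip(batch_mean_losses, batch_sizes)) / sum(batch_sizes)
# Naive alternative: average of the batch means, ignoring batch sizes.
naive = sum(batch_mean_losses) / len(batch_mean_losses)

print(f"per-instance mean: {weighted:.4f}, mean of batch means: {naive:.4f}")
```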

```python
def test_loop(dataloader, model, loss_fn):
  running_loss, running_corrects = 0.0, 0
  all_preds = []

  model.eval()
  for batch in dataloader:
    labels = batch['labels'].to(device)
    inputs = {'input_ids': batch['input_ids'].to(device),
              'token_type_ids': batch['token_type_ids'].to(device),
              'attention_mask': batch['attention_mask'].to(device)}
    # No gradients are needed for evaluation.
    with torch.no_grad():
      logits = model(inputs)
      loss = loss_fn(logits, labels)

    predictions = torch.argmax(logits, dim=-1)

    running_loss += loss.item() * len(labels)
    running_corrects += (predictions == labels).sum().item()
    all_preds.append(predictions.cpu().numpy())

  all_preds = np.concatenate(all_preds)
  final_loss = running_loss / len(dataloader.dataset)
  accuracy = running_corrects / len(dataloader.dataset)
  # Predictions line up with dataset.labels only because this loader does not shuffle.
  # Per-class F1 (in sorted label order) and their unweighted mean (macro F1):
  f1score = f1_score(dataloader.dataset.labels, all_preds, average=None)
  mf1score = f1_score(dataloader.dataset.labels, all_preds, average='macro')

  return final_loss, accuracy, f1score, mf1score
```
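`test_loop` reports F1 in two ways: `average=None` returns one F1 score per class (in sorted label order), and `average='macro'` returns their unweighted mean, so every stance counts equally regardless of how many instances it has. The from-scratch sketch below works through that arithmetic on hypothetical labels using F1 = 2TP / (2TP + FP + FN). It is meant to mirror what scikit-learn computes when no class has a zero denominator, not to replace it; `per_class_f1` is a hypothetical helper.

```python
def per_class_f1(y_true, y_pred, num_classes):
    """F1 for each class as 2TP / (2TP + FP + FN); 0.0 if the class never occurs."""
    scores = []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


# Hypothetical gold labels and predictions for the 4 stance classes (0-3).
y_true = [0, 0, 1, 1, 2, 3, 3, 3]
y_pred = [0, 1, 1, 1, 2, 3, 3, 0]

f1s = per_class_f1(y_true, y_pred, num_classes=4)
macro_f1 = sum(f1s) / len(f1s)  # unweighted mean over classes
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print([round(f, 3) for f in f1s], round(macro_f1, 3), accuracy)
```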

```python
epochs = 12
lr = 3e-4
loss_fn = torch.nn.CrossEntropyLoss()
# Pass only the trainable head parameters; the frozen mBERT weights never receive gradients.
optimizer = AdamW((p for p in model.parameters() if p.requires_grad), lr=lr)

for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loss, train_accuracy = train_loop(train_dataloader, model, loss_fn, optimizer)
    val_loss, val_accuracy, _, _ = test_loop(val_dataloader, model, loss_fn)

    print(f"Train_loss: {train_loss:.3f}, Train_acc: {train_accuracy:.3f}",
          f"Val_loss: {val_loss:.3f}, Val_accuracy: {val_accuracy:.3f}",
          "\n-------------------------------")
```

```
Epoch 1
-------------------------------
Train_loss: 1.143, Train_acc: 0.541 Val_loss: 1.185, Val_accuracy: 0.517
-------------------------------
Epoch 2
-------------------------------
Train_loss: 1.072, Train_acc: 0.567 Val_loss: 1.144, Val_accuracy: 0.527
-------------------------------
Epoch 3
-------------------------------
Train_loss: 0.960, Train_acc: 0.623 Val_loss: 1.056, Val_accuracy: 0.582
-------------------------------
Epoch 4
-------------------------------
Train_loss: 0.803, Train_acc: 0.713 Val_loss: 0.944, Val_accuracy: 0.634
-------------------------------
Epoch 5
-------------------------------
Train_loss: 0.662, Train_acc: 0.771 Val_loss: 0.845, Val_accuracy: 0.694
-------------------------------
Epoch 6
-------------------------------
Train_loss: 0.552, Train_acc: 0.823 Val_loss: 0.788, Val_accuracy: 0.714
-------------------------------
Epoch 7
-------------------------------
Train_loss: 0.452, Train_acc: 0.862 Val_loss: 0.756, Val_accuracy: 0.722
-------------------------------
Epoch 8
-------------------------------
Train_loss: 0.374, Train_acc: 0.889 Val_loss: 0.692, Val_accuracy: 0.747
-------------------------------
Epoch 9
-------------------------------
Train_loss: 0.314, Train_acc: 0.911 Val_loss: 0.671, Val_accuracy: 0.772
-------------------------------
Epoch 10
-------------------------------
Train_loss: 0.248, Train_acc: 0.936 Val_loss: 0.701, Val_accuracy: 0.752
-------------------------------
Epoch 11
-------------------------------
Train_loss: 0.210, Train_acc: 0.953 Val_loss: 0.687, Val_accuracy: 0.779
-------------------------------
Epoch 12
-------------------------------
Train_loss: 0.173, Train_acc: 0.961 Val_loss: 0.687, Val_accuracy: 0.773
-------------------------------
```
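The evaluation cells below use the weights from the final (12th) epoch. Judging from the log, validation loss bottomed out at epoch 9 and validation accuracy peaked at epoch 11, while training accuracy kept climbing, which suggests the head starts to overfit toward the end. A common alternative, not used in this notebook, is to keep the checkpoint with the best validation score. The sketch below only picks that epoch from the logged values; inside the training loop one would additionally save a copy of `model.state_dict()` whenever the chosen metric improves, and still touch the test set only once, after that choice.

```python
# Validation metrics copied from the training log above (epochs 1-12).
val_loss = [1.185, 1.144, 1.056, 0.944, 0.845, 0.788, 0.756, 0.692, 0.671, 0.701, 0.687, 0.687]
val_acc = [0.517, 0.527, 0.582, 0.634, 0.694, 0.714, 0.722, 0.747, 0.772, 0.752, 0.779, 0.773]

# Epochs are 1-indexed in the log.
best_loss_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1
best_acc_epoch = max(range(len(val_acc)), key=val_acc.__getitem__) + 1

print(f"Lowest val loss at epoch {best_loss_epoch} ({val_loss[best_loss_epoch - 1]})")
print(f"Highest val accuracy at epoch {best_acc_epoch} ({val_acc[best_acc_epoch - 1]})")
```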


## Evaluation

```python
_, val_accuracy, val_f1score, val_mf1score = test_loop(val_dataloader, model, loss_fn)

print("Validation Results:")
print("=====================")
print(f"Accuracy: {val_accuracy:.3f}")
# f1_score(average=None) returns scores in sorted label order; this unpacking assumes
# stance_to_int numbers the stances agree=0, disagree=1, discuss=2, unrelated=3.
agree, disagree, discuss, unrelated = val_f1score
print("Per-class F1 scores:")
print(f"Agree   : {agree:.3f}")
print(f"Disagree: {disagree:.3f}")
print(f"Discuss : {discuss:.3f}")
print(f"Unrelated: {unrelated:.3f}")
print(f"Macro F1 score: {val_mf1score:.3f}")
```

```
Validation Results:
=====================
Accuracy: 0.773
Per-class F1 scores:
Agree   : 0.754
Disagree: 0.667
Discuss : 0.500
Unrelated: 0.849
Macro F1 score: 0.693
```


```python
_, test_accuracy, test_f1score, test_mf1score = test_loop(test_dataloader, model, loss_fn)

print("Testing Results:")
print("=====================")
print(f"Accuracy: {test_accuracy:.3f}")
# Same label-order assumption as in the validation cell.
agree, disagree, discuss, unrelated = test_f1score
print("Per-class F1 scores:")
print(f"Agree   : {agree:.3f}")
print(f"Disagree: {disagree:.3f}")
print(f"Discuss : {discuss:.3f}")
print(f"Unrelated: {unrelated:.3f}")
print(f"Macro F1 score: {test_mf1score:.3f}")
```

```
Testing Results:
=====================
Accuracy: 0.833
Per-class F1 scores:
Agree   : 0.828
Disagree: 0.702
Discuss : 0.396
Unrelated: 0.911
Macro F1 score: 0.709
```
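As a sanity check on the reported numbers, macro F1 should equal the unweighted mean of the four per-class F1 scores. Recomputing it from the printed (rounded) per-class values reproduces both reported macro F1 scores to within rounding:

```python
# Per-class F1 (Agree, Disagree, Discuss, Unrelated) and macro F1, as printed above.
reported = {
    "validation": ([0.754, 0.667, 0.500, 0.849], 0.693),
    "test": ([0.828, 0.702, 0.396, 0.911], 0.709),
}

recomputed = {}
for split, (per_class, macro) in reported.items():
    # Unweighted mean over the four classes; inputs are rounded to 3 decimals.
    recomputed[split] = sum(per_class) / len(per_class)
    print(f"{split}: recomputed {recomputed[split]:.4f} vs reported {macro}")
```

This unweighted averaging is one reason the test macro F1 (0.709) sits well below the test accuracy (0.833): the weak Discuss class (0.396) counts as much as Unrelated (0.911), whereas accuracy weights each class by its number of instances.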
