---
---

Artificial Intelligence Master's Degree (2022 - 2023)

Natural Language Processing

# **Stance Classification for Human Value Premises**

---
---
## Abstract

This notebook fine-tunes the `distilroberta-base` model to classify the stance of an argument's premise toward its conclusion, either `in favor of` or `against`. For example:
```
Input:
Premise: affirmative action helps with employment equity.
Conclusion: We should end affirmative action

Prediction: against
Truth: against
```



---
---
## Table of Contents

>Stance Classification for Human Value Premises

>>Abstract

>>Table of Contents

>>Background

>>Implementation

>>>Setup

>>>Imports

>>>Dataset: Loading

>>>Dataset: Preprocessing

>>>Dataset: Tokenization

>>>Model: Creation

>>>Model: Fine-Tuning

>>>Inference



---
---
## Implementation



### Setup


In [1]:
!pip install -q transformers[torch] datasets evaluate

### Imports

In [2]:
# Dataset
import numpy as np
import pandas as pd

In [3]:
# Machine Learning
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader, RandomSampler, SequentialSampler

from sklearn import metrics
from sklearn.metrics import classification_report, f1_score

In [21]:
# Hugging Face
import evaluate
from datasets import Dataset, DatasetDict, load_dataset, concatenate_datasets
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments
from transformers import DataCollatorWithPadding, pipeline

In [7]:
# Hugging Face
from datasets import Dataset, DatasetDict, load_dataset, concatenate_datasets
from transformers import AutoConfig, AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments, EvalPrediction

### Dataset: Loading

In [8]:
# Load the dataset from HF
dataset = load_dataset("webis/Touche23-ValueEval")

In [69]:
dataset

DatasetDict({
    train: Dataset({
        features: ['Argument ID', 'Conclusion', 'Stance', 'Premise', 'Labels'],
        num_rows: 5393
    })
    validation: Dataset({
        features: ['Argument ID', 'Conclusion', 'Stance', 'Premise', 'Labels'],
        num_rows: 1896
    })
    test: Dataset({
        features: ['Argument ID', 'Conclusion', 'Stance', 'Premise', 'Labels'],
        num_rows: 1576
    })
})

### Dataset: Preprocessing

In [9]:
def preprocess_dataset(dataset_dict):
    """
        1. Concatenate the "Premise" and "Conclusion" fields into a single "text" field.
        2. Binarize the "Stance" field into the "label" field (1 = "in favor of", 0 = anything else).
    """
    def encode_stance(stance):
        return 1 if stance == "in favor of" else 0

    def preprocess_stance(example):
        premise = example["Premise"]
        conclusion = example["Conclusion"]
        example["text"] = f"Premise: {premise}\nConclusion: {conclusion}"
        example["label"] = encode_stance(example["Stance"])

        return example

    # Use the .map function to preprocess every example and drop the original columns
    modified_dataset_dict = dataset_dict.map(preprocess_stance, remove_columns=["Argument ID", "Conclusion", "Stance", "Labels", "Premise"])

    return modified_dataset_dict

ds = preprocess_dataset(dataset)

In [10]:
ds

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 5393
    })
    validation: Dataset({
        features: ['text', 'label'],
        num_rows: 1896
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 1576
    })
})
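
To make the transformation concrete, here is a minimal sketch that applies the same two steps to a single plain dictionary, without the `datasets` library. The record is hypothetical (its `Argument ID` and `Labels` values are made up for illustration), and `to_text_and_label` is a stand-in name rather than a function from the notebook.

```python
# Hypothetical record shaped like one row of the raw dataset (values are illustrative).
record = {
    "Argument ID": "X00001",
    "Conclusion": "We should end affirmative action",
    "Stance": "against",
    "Premise": "affirmative action helps with employment equity.",
    "Labels": [],
}

def to_text_and_label(example):
    """Mirror preprocess_stance, returning only the two new fields."""
    text = f"Premise: {example['Premise']}\nConclusion: {example['Conclusion']}"
    label = 1 if example["Stance"] == "in favor of" else 0
    return {"text": text, "label": label}

processed = to_text_and_label(record)
print(processed["text"])
print(processed["label"])  # 0, because the stance is "against"
```

Note that any stance string other than exactly `"in favor of"` is mapped to `0`, so a misspelled or unexpected value would silently count as `against`.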

### Dataset: Tokenization

In [11]:
# Define the model name and tokenizer
model_name = "distilroberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)

In [12]:
def tokenize_batch(batch):
    inputs = tokenizer(batch["text"], truncation=True)
    inputs["label"] = batch["label"]

    return inputs

# Use the .map function to tokenize the dataset
ds_tokenized = ds.map(tokenize_batch, batched=True, remove_columns=ds["train"].column_names)

In [13]:
ds_tokenized

DatasetDict({
    train: Dataset({
        features: ['label', 'input_ids', 'attention_mask'],
        num_rows: 5393
    })
    validation: Dataset({
        features: ['label', 'input_ids', 'attention_mask'],
        num_rows: 1896
    })
    test: Dataset({
        features: ['label', 'input_ids', 'attention_mask'],
        num_rows: 1576
    })
})
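
The tokenizer is called with `truncation=True` but without padding, so every sequence in `ds_tokenized` keeps its own length; padding is deferred to batch time by the `DataCollatorWithPadding` used in the fine-tuning section. The sketch below illustrates that idea in plain Python. The helper `pad_batch` is hypothetical, the token ids are made up, and the pad id of `1` is an assumption (in the notebook it would come from `tokenizer.pad_token_id`).

```python
def pad_batch(sequences, pad_id=1):
    """Right-pad token-id lists to the longest one and build attention masks."""
    longest = max(len(seq) for seq in sequences)
    input_ids = [seq + [pad_id] * (longest - len(seq)) for seq in sequences]
    attention_mask = [[1] * len(seq) + [0] * (longest - len(seq)) for seq in sequences]
    return {"input_ids": input_ids, "attention_mask": attention_mask}

# Made-up token ids of different lengths (not real tokenizer output).
batch = pad_batch([[0, 713, 16, 2], [0, 713, 2]])
print(batch["input_ids"])       # [[0, 713, 16, 2], [0, 713, 2, 1]]
print(batch["attention_mask"])  # [[1, 1, 1, 1], [1, 1, 1, 0]]
```

Padding each batch only to its own longest sequence avoids spending compute on padding tokens that a fixed global length would introduce.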

### Model: Creation

In [14]:
# Define the id2label and label2id mappings (dictionaries)
id2label = {0: "against", 1: "in favor of"}
label2id = {"against": 0, "in favor of": 1}

In [15]:
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=2,
    id2label=id2label,
    label2id=label2id
    )

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at distilroberta-base and are newly initialized: ['classifier.dense.weight', 'classifier.out_proj.weight', 'classifier.dense.bias', 'classifier.out_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


### Model: Fine-Tuning

In [16]:
# Define the accuracy metric
accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return accuracy.compute(predictions=predictions, references=labels)
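
As a rough illustration of what `compute_metrics` does, the following sketch reproduces the two steps (argmax over the class axis, then the fraction of correct predictions) on toy values. The logits and labels are invented for the example, and the accuracy is computed by hand with NumPy instead of through `evaluate`.

```python
import numpy as np

# Toy logits for four examples and two classes (illustrative values only).
logits = np.array([[2.0, -1.0], [0.1, 0.3], [-0.5, 1.5], [1.2, 0.8]])
labels = np.array([0, 1, 0, 0])

# Pick the class with the highest logit for each example.
predictions = np.argmax(logits, axis=1)

# Accuracy is the share of predictions that match the reference labels.
accuracy_value = float((predictions == labels).mean())
print(predictions.tolist(), accuracy_value)  # [0, 1, 1, 0] 0.75
```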

In [19]:
batch_size = 32

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

training_args = TrainingArguments(
    output_dir="stance_classifier",
    learning_rate=2e-5,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    num_train_epochs=3,
    weight_decay=0.01,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=concatenate_datasets([ds_tokenized["train"], ds_tokenized["validation"]]),
    eval_dataset=ds_tokenized["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

trainer.train()

You're using a RobertaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1     | No log        | 0.378478        | 0.838198 |
| 2     | No log        | 0.334427        | 0.862944 |
| 3     | 0.393500      | 0.361465        | 0.860406 |


TrainOutput(global_step=684, training_loss=0.34359357510393823, metrics={'train_runtime': 214.6473, 'train_samples_per_second': 101.874, 'train_steps_per_second': 3.187, 'total_flos': 547451019640152.0, 'train_loss': 0.34359357510393823, 'epoch': 3.0})
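
The reported `global_step=684` can be checked against the split sizes and hyperparameters shown earlier. The sketch below assumes a single device and no gradient accumulation. It also assumes the `Trainer` default logging interval of 500 steps, which would explain why the training loss reads "No log" until the third epoch.

```python
import math

train_rows, validation_rows = 5393, 1896   # split sizes printed earlier
batch_size, epochs = 32, 3                 # values from TrainingArguments

examples = train_rows + validation_rows              # 7289 training examples
steps_per_epoch = math.ceil(examples / batch_size)   # 228, the last batch is partial
total_steps = steps_per_epoch * epochs               # 684

# Assumed default logging interval; the first loss log lands in this epoch.
logging_steps = 500
first_logged_epoch = math.ceil(logging_steps / steps_per_epoch)  # 3

print(examples, steps_per_epoch, total_steps, first_logged_epoch)
```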

### Inference

In [56]:
def classify_text(texts):

    label_list = ["against", "in favor of"]
    device = model.device  # follow the model's device instead of hard-coding "cuda"

    # Tokenize input texts as a batch
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt", max_length=128, is_split_into_words=False)

    # Move inputs to the same device as the model
    inputs = {key: val.to(device) for key, val in inputs.items()}

    # Perform inference in evaluation mode (disables dropout) and without gradients
    model.eval()
    with torch.no_grad():
        outputs = model(**inputs)

    # Get predicted probabilities for each class
    logits = outputs.logits
    probabilities = torch.softmax(logits, dim=1).tolist()

    # Create a list of dictionaries mapping class labels to their corresponding probabilities for each input
    results = [{label: prob for label, prob in zip(label_list, probs)} for probs in probabilities]

    # Get the predicted labels with the highest probabilities for each input
    predicted_labels = [max(result, key=result.get) for result in results]

    return predicted_labels
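
The post-processing inside `classify_text` (softmax, mapping probabilities to label names, picking the most probable label) can be illustrated without a model. This is a minimal sketch in plain Python with made-up logits; the `softmax` helper here is a hand-written stand-in for `torch.softmax`.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

label_list = ["against", "in favor of"]
logits = [1.0, -1.0]  # made-up logits for one input

probs = softmax(logits)
result = dict(zip(label_list, probs))    # label -> probability
predicted = max(result, key=result.get)  # label with the highest probability
print(result, predicted)
```

Because softmax preserves the ordering of the logits, taking the argmax of the raw logits would select the same label; the probabilities are only needed if they are reported.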

In [71]:
y_true = dataset["test"]["Stance"]
y_pred = classify_text(ds["test"]["text"])

In [72]:
example = ds["test"]["text"][0]
print(f"Input:\n{example}\n")

print(f"Prediction: {y_pred[0]}")
print(f"Truth: {y_true[0]}")

Input:
Premise: affirmative action helps with employment equity.
Conclusion: We should end affirmative action

Prediction: against
Truth: against
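
Since `y_true` and `y_pred` both hold label strings, they can be compared directly to score the whole test set. The sketch below shows one way to do that with accuracy and macro-averaged F1 in plain Python. The toy lists are invented for illustration and are not results from this notebook, and `f1_for` is a hypothetical helper; `sklearn.metrics.f1_score`, which the notebook imports, could be used instead.

```python
def f1_for(label, y_true, y_pred):
    """F1 score treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Toy labels (illustrative only, not taken from the notebook's predictions).
y_true_toy = ["against", "in favor of", "in favor of", "against", "in favor of"]
y_pred_toy = ["against", "in favor of", "against", "against", "in favor of"]

accuracy_toy = sum(t == p for t, p in zip(y_true_toy, y_pred_toy)) / len(y_true_toy)
f1_against = f1_for("against", y_true_toy, y_pred_toy)
f1_favor = f1_for("in favor of", y_true_toy, y_pred_toy)
macro_f1 = (f1_against + f1_favor) / 2

print(accuracy_toy, f1_against, f1_favor, macro_f1)
```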
