# Project 3: Pragmatic Analysis Pipeline

**Course:** Natural Language Processing  
**Institution:** Addis Ababa University  
**Project:** Pragmatic Analysis Pipeline  

## Problem Statement
This project implements a two-stage pragmatic analyzer that:
1. Identifies the speech act of an utterance.
2. If the utterance is a statement (assertion), verifies its truth using
   Natural Language Inference (NLI) against a knowledge base.
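
The control flow of the two stages can be sketched as follows. This is a minimal illustration under stated assumptions, not the notebook's implementation: `analyze`, `toy_classifier`, and `toy_verifier` are hypothetical names, and the two toy functions stand in for the fine-tuned speech act classifier and the NLI model so the routing logic can be run on its own.

```python
from typing import Callable, Optional


def analyze(utterance: str,
            classify_act: Callable[[str], str],
            verify: Callable[[str], str]) -> dict:
    """Stage 1: label the speech act. Stage 2: verify only statements."""
    act = classify_act(utterance)
    verdict: Optional[str] = verify(utterance) if act == "statement" else None
    return {"utterance": utterance, "speech_act": act, "verdict": verdict}


# Toy stand-ins (assumptions for illustration only, not real models)
def toy_classifier(utterance: str) -> str:
    return "question" if utterance.strip().endswith("?") else "statement"


def toy_verifier(statement: str) -> str:
    return "ENTAILMENT" if statement == "Dogs are mammals." else "NEUTRAL"


result_statement = analyze("Dogs are mammals.", toy_classifier, toy_verifier)
result_question = analyze("Are dogs mammals?", toy_classifier, toy_verifier)
print(result_statement)
print(result_question)
```

Questions and directives skip the second stage, so their `verdict` stays `None`.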


## Part 1: Environment Setup


In [4]:
!pip install transformers "datasets<4.0" torch scikit-learn matplotlib




## Part 2: Import Required Libraries


In [5]:
import torch
import numpy as np
from datasets import load_dataset
from transformers import (
    DistilBertTokenizerFast,
    DistilBertForSequenceClassification,
    Trainer,
    TrainingArguments,
    pipeline
)
from sklearn.metrics import accuracy_score, classification_report


## Part A: Speech Act Classification

### Dataset
We use the **Switchboard Dialogue Act Corpus (SWDA)** via Hugging Face.
A subset of **500 utterances** is used as required by the assignment.

### Classes
- statement
- question
- directive
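
Taking the first 500 matching utterances can leave the three classes unevenly represented. One possible alternative is to sample the same number of examples per class. The sketch below assumes the data is available as `(text, label)` pairs; `balanced_subset` and the toy data are hypothetical and only illustrate the idea.

```python
import random
from collections import Counter, defaultdict


def balanced_subset(pairs, total, seed=0):
    """Pick up to total // n_classes examples per class from (text, label) pairs."""
    by_label = defaultdict(list)
    for text, label in pairs:
        by_label[label].append((text, label))
    if not by_label:
        return []
    per_class = total // len(by_label)
    rng = random.Random(seed)
    subset = []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        subset.extend(items[:per_class])  # rare classes contribute what they have
    rng.shuffle(subset)
    return subset


# Toy data with a skewed class distribution (illustration only)
toy_pairs = ([(f"s{i}", "statement") for i in range(50)]
             + [(f"q{i}", "question") for i in range(20)]
             + [(f"d{i}", "directive") for i in range(5)])
subset = balanced_subset(toy_pairs, total=30)
counts = Counter(label for _, label in subset)
print(counts)
```

A class with fewer examples than the per-class quota is simply included in full, so the subset can end up smaller than `total`.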


In [12]:
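# datasets 4.0 and later reject script-based datasets such as SWDA with
# "RuntimeError: Dataset scripts are no longer supported, but found swda.py".
# This call assumes an older release (e.g. pip install "datasets<4.0").
dataset = load_dataset("swda", trust_remote_code=True)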

In [None]:
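# Filter Required Classes
def map_label(label):
    if label in ["sd", "sv"]:      # statement (non-opinion / opinion)
        return "statement"
    elif label in ["qy", "qw"]:    # yes-no question / wh-question
        return "question"
    elif label == "ad":            # action-directive ("sv" already maps to statement)
        return "directive"
    else:
        return None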

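tag_feature = dataset["train"].features["act_tag"]

filtered = []
for item in dataset["train"]:
    tag = item["act_tag"]
    # Assumption: if the loader stores tags as integer class ids, map them back to tag strings
    if isinstance(tag, int):
        tag = tag_feature.int2str(tag)
    mapped = map_label(tag)
    if mapped:
        filtered.append((item["text"], mapped))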

filtered = filtered[:500]
len(filtered)


## Label Encoding


In [None]:
label2id = {"statement": 0, "question": 1, "directive": 2}
id2label = {v: k for k, v in label2id.items()}

texts = [x[0] for x in filtered]
labels = [label2id[x[1]] for x in filtered]


## Tokenization


In [None]:
tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")

encodings = tokenizer(texts, truncation=True, padding=True)


## Train / Test Split


In [None]:
class SpeechActDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

    def __len__(self):
        return len(self.labels)

train_size = int(0.8 * len(labels))
train_dataset = SpeechActDataset(
    {k: v[:train_size] for k, v in encodings.items()},
    labels[:train_size]
)
test_dataset = SpeechActDataset(
    {k: v[train_size:] for k, v in encodings.items()},
    labels[train_size:]
)
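
The split above keeps the corpus order, so the test set is simply the last 20% of the filtered utterances. A seeded shuffle of the indices is one way to avoid order effects. The sketch below is an illustration only; `shuffled_split` is a hypothetical helper, not part of the notebook.

```python
import random


def shuffled_split(n_items, train_fraction=0.8, seed=42):
    """Return (train_indices, test_indices) after a seeded shuffle."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    cut = int(train_fraction * n_items)
    return indices[:cut], indices[cut:]


train_idx, test_idx = shuffled_split(500)
print(len(train_idx), len(test_idx))
```

The resulting index lists could then be used to select rows, for example `[v[i] for i in train_idx]` for each field of the encodings and `[labels[i] for i in train_idx]` for the labels.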


## Fine-Tuning DistilBERT (Speech Act Classification)


In [None]:
model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=3,
    id2label=id2label,
    label2id=label2id
)

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    eval_strategy="epoch",  # called evaluation_strategy in transformers releases before 4.41
    logging_dir="./logs"
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset
)

trainer.train()


## Speech Act Classification Evaluation


In [None]:
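preds = trainer.predict(test_dataset)
y_pred = np.argmax(preds.predictions, axis=1)
y_true = labels[train_size:]

print("Accuracy:", accuracy_score(y_true, y_pred))
print(classification_report(y_true, y_pred, labels=[0, 1, 2], target_names=list(label2id)))

# Keep the misclassified utterances for the failure analysis later on
speech_act_failures = [
    {"sentence": t, "gold": id2label[g], "pred": id2label[int(p)]}
    for t, g, p in zip(texts[train_size:], y_true, y_pred) if g != p
]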


## Part B: Natural Language Inference (NLI)

### Load NLI Model

In [None]:
nli_model = pipeline("text-classification", model="roberta-large-mnli")


In [None]:
knowledge_base = [
    "Dolphins live in water.",
    "Dogs are mammals.",
    "Paris is the capital of France.",
    "Water freezes at 0 degrees Celsius.",
    "The Earth revolves around the Sun."
]


In [None]:
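def nli_check(statement, fact):
    # Encode the two sentences as a pair (same order as before);
    # a literal " [SEP] " string is not a separator token for RoBERTa.
    result = nli_model({"text": statement, "text_pair": fact})
    # A single input may come back as a dict or a one-item list
    return result[0] if isinstance(result, list) else result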
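
A pairwise check such as `nli_check` compares a statement with one fact at a time, while verification against the knowledge base needs a decision over all facts. One possible aggregation rule, shown here as a sketch under stated assumptions, is to keep the most confident non-neutral result and fall back to `NEUTRAL`. The names `verify_against_kb` and `toy_check` are hypothetical; `toy_check` stands in for the real NLI call and returns the same `{"label", "score"}` shape.

```python
def verify_against_kb(statement, knowledge_base, check):
    """Run check(statement, fact) for every fact and keep the most confident
    non-neutral result; fall back to NEUTRAL if no fact is decisive."""
    best = {"label": "NEUTRAL", "score": 0.0, "fact": None}
    for fact in knowledge_base:
        result = check(statement, fact)
        if result["label"] != "NEUTRAL" and result["score"] > best["score"]:
            best = {"label": result["label"], "score": result["score"], "fact": fact}
    return best


# Toy stand-in for the NLI model (assumption for illustration only)
def toy_check(statement, fact):
    if statement == fact:
        return {"label": "ENTAILMENT", "score": 0.99}
    return {"label": "NEUTRAL", "score": 0.80}


toy_kb = ["Dogs are mammals.", "Paris is the capital of France."]
supported = verify_against_kb("Dogs are mammals.", toy_kb, toy_check)
unknown = verify_against_kb("Cats can fly.", toy_kb, toy_check)
print(supported)
print(unknown)
```

Other rules are possible, for instance letting any contradiction override an entailment; which one fits depends on how conservative the verifier should be.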


## NLI Evaluation (20 Statement–KB Pairs)


In [None]:
nli_pairs = [
    ("Dolphins are mammals.", "Dolphins live in water.", "ENTAILMENT"),
    ("Paris is in Germany.", "Paris is the capital of France.", "CONTRADICTION"),
    ("Cats can fly.", "Cats are animals.", "NEUTRAL"),
    ("Water freezes at 0 degrees Celsius.", "Water freezes at 0 degrees Celsius.", "ENTAILMENT"),
    ("The Earth is flat.", "The Earth revolves around the Sun.", "CONTRADICTION"),
    ("Dogs are mammals.", "Dogs are mammals.", "ENTAILMENT"),
    ("Birds can swim.", "Birds can fly.", "NEUTRAL"),
    ("The Sun revolves around Earth.", "The Earth revolves around the Sun.", "CONTRADICTION"),
    ("Paris is a city.", "Paris is the capital of France.", "NEUTRAL"),
    ("Fish live in water.", "Dolphins live in water.", "NEUTRAL"),
] * 2


In [None]:
y_true, y_pred = [], []

for s, f, gold in nli_pairs:
    pred = nli_check(s, f)["label"]
    y_true.append(gold)
    y_pred.append(pred)

print(classification_report(y_true, y_pred))
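
Alongside the per-class report, counting `(gold, predicted)` pairs shows which labels get confused with which. The sketch below uses only the standard library; `confusion_counts` and the toy label lists are hypothetical and do not reflect results from this notebook.

```python
from collections import Counter


def confusion_counts(gold, predicted):
    """Count how often each (gold, predicted) label pair occurs."""
    return Counter(zip(gold, predicted))


# Toy labels for illustration only
toy_gold = ["ENTAILMENT", "NEUTRAL", "CONTRADICTION", "NEUTRAL"]
toy_pred = ["ENTAILMENT", "ENTAILMENT", "CONTRADICTION", "NEUTRAL"]
pair_counts = confusion_counts(toy_gold, toy_pred)
for (gold, pred), n in sorted(pair_counts.items()):
    print(f"{gold:>13} -> {pred:<13} {n}")
```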


## Failure Case Analysis

We analyze misclassifications for both:
- Speech Act Classification
- Natural Language Inference


In [None]:
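# Collect the NLI misclassifications from the evaluation loop above
nli_failures = []

for (statement, fact, gold), pred in zip(nli_pairs, y_pred):
    if gold != pred:
        nli_failures.append(
            {"statement": statement, "fact": fact, "gold": gold, "pred": pred}
        )

nli_failures[:5]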


### Observed Failure Patterns

Speech Act:
- Indirect directives
- Politeness masking intent

NLI:
- Lexical overlap bias
- Commonsense reasoning gaps


## Failure Case Analysis with Visualization

This section quantitatively analyzes failure cases and visualizes
common error patterns for both:
1. Speech Act Classification
2. Natural Language Inference (NLI)


In [None]:
# Categorize speech act failures with simple keyword heuristics
speech_act_error_types = {
    "Indirect Directive": 0,
    "Politeness Masking": 0,
    "Question vs Directive": 0,
    "Other": 0
}

for f in speech_act_failures:
    sentence = f["sentence"].lower()

    if "wondering if" in sentence or "could you" in sentence:
        speech_act_error_types["Indirect Directive"] += 1
    elif "please" in sentence:
        speech_act_error_types["Politeness Masking"] += 1
    elif sentence.endswith("?"):
        speech_act_error_types["Question vs Directive"] += 1
    else:
        speech_act_error_types["Other"] += 1

speech_act_error_types


In [None]:
import matplotlib.pyplot as plt

# Separate names so the training `labels` list is not overwritten
error_labels = list(speech_act_error_types.keys())
error_counts = list(speech_act_error_types.values())

plt.figure()
plt.bar(error_labels, error_counts)
plt.title("Speech Act Classification Failure Types")
plt.xlabel("Error Type")
plt.ylabel("Number of Failures")
plt.xticks(rotation=30)
plt.show()


### Speech Act Failure Interpretation

The visualization shows that most errors arise from:
- Indirect directives phrased as questions
- Politeness strategies masking true intent

This is consistent with the view that surface syntax alone is
insufficient for pragmatic intent detection.


In [None]:
nli_error_types = {
    "Lexical Overlap Bias": 0,
    "Commonsense Gap": 0,
    "Granularity Mismatch": 0,
    "Other": 0
}

for f in nli_failures:
    statement = f["statement"].lower()
    fact = f["fact"].lower()

    shared_words = set(statement.split()).intersection(set(fact.split()))

    if len(shared_words) > 2:
        nli_error_types["Lexical Overlap Bias"] += 1
    elif "flat" in statement or "fly" in statement:
        nli_error_types["Commonsense Gap"] += 1
    elif "capital" in fact or "degrees" in fact:
        nli_error_types["Granularity Mismatch"] += 1
    else:
        nli_error_types["Other"] += 1

nli_error_types


In [None]:
# Separate names so the training `labels` list is not overwritten
error_labels = list(nli_error_types.keys())
error_counts = list(nli_error_types.values())

plt.figure()
plt.bar(error_labels, error_counts)
plt.title("NLI Failure Types")
plt.xlabel("Error Type")
plt.ylabel("Number of Failures")
plt.xticks(rotation=30)
plt.show()


### NLI Failure Interpretation

The dominant error source is lexical overlap bias, where shared
words lead to incorrect entailment predictions.
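
To make the overlap effect concrete, word overlap between a statement and a fact can be measured directly. The sketch below computes a Jaccard overlap on lowercase word tokens with punctuation removed; `token_overlap` is a hypothetical helper, and the tokenization rule is an assumption chosen for simplicity.

```python
import re


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word tokens, ignoring punctuation."""
    tokens_a = set(re.findall(r"[a-z0-9]+", a.lower()))
    tokens_b = set(re.findall(r"[a-z0-9]+", b.lower()))
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


high = token_overlap("The Sun revolves around Earth.",
                     "The Earth revolves around the Sun.")
low = token_overlap("Cats can fly.", "Dogs are mammals.")
print(high, low)
```

The first pair shares every word yet states opposite things, which is the kind of input where a model relying on overlap would be expected to struggle.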

Commonsense reasoning gaps further limit performance, highlighting
the need for external knowledge integration.
