# AI for Sustainable Development – Coursework 2
## HEARTS Text Stereotype Detection – Reproduction & Adaptation

## Project Information
- **Project:** HEARTS (Holistic Explainable and Robust Text Stereotype Detection)
- **Dataset:** EMGSD (holistic-ai/EMGSD)
- **Goal:** Reproduce original methodology + adapt to Chinese context

# Part A – Technical Implementation (70%)

1. Reproduce baseline HEARTS AI methodology  
2. Select a Chinese-context SDG problem  
3. Build/curate a new dataset  
4. Adapt model to new context  
5. Evaluate & compare results  

## 1. Reproducing the Baseline AI Methodology

The HEARTS project trains stereotype-detection models on the EMGSD dataset using
fine-tuned ALBERT-v2, DistilBERT, and BERT.  
Key components:
- EMGSD dataset (HuggingFace)
- Fine-tuned ALBERT-v2 (baseline)
- SHAP + LIME explainability pipeline
- Token-level explanation similarity metrics:
  - Cosine similarity
  - Pearson correlation
  - Jensen–Shannon divergence
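
As a rough illustration of the three similarity metrics listed above, the sketch below compares two token-level attribution vectors in plain Python. The example scores are made up, and the normalisation used before the Jensen–Shannon divergence (absolute values rescaled to sum to 1) is our assumption, not necessarily what the HEARTS pipeline does.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length score vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # undefined for zero vectors; 0.0 is a convention chosen here
    return dot / (norm_a * norm_b)

def pearson_correlation(a, b):
    """Pearson r, computed as the cosine similarity of mean-centred vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a], [y - mean_b for y in b])

def to_distribution(scores):
    """ASSUMPTION: turn attributions into a distribution via absolute values."""
    magnitudes = [abs(s) for s in scores]
    total = sum(magnitudes)
    if total == 0:
        raise ValueError("all attribution scores are zero")
    return [m / total for m in magnitudes]

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint support)."""
    m = [(x + y) / 2 for x, y in zip(p, q)]

    def kl(u, v):
        return sum(x * math.log2(x / y) for x, y in zip(u, v) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical per-token attributions for one sentence (not real model output)
shap_scores = [0.10, 0.60, -0.05, 0.25]
lime_scores = [0.15, 0.50, 0.00, 0.35]

print("cosine :", cosine_similarity(shap_scores, lime_scores))
print("pearson:", pearson_correlation(shap_scores, lime_scores))
print("js div :", js_divergence(to_distribution(shap_scores), to_distribution(lime_scores)))
```

Higher cosine and Pearson values and a lower divergence indicate that the two explainers agree on which tokens matter.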

### 1.1 Clone the original repository

The original HEARTS repository is located at:

https://github.com/holistic-ai/HEARTS-Text-Stereotype-Detection

In your local environment:

```bash
git clone https://github.com/holistic-ai/HEARTS-Text-Stereotype-Detection.git
cd HEARTS-Text-Stereotype-Detection
pip install -r requirements.txt
```

### 1.2 Load the EMGSD Dataset from HuggingFace
We directly load the dataset in Jupyter using the `datasets` library.

In [6]:
from datasets import load_dataset
dataset = load_dataset("holistic-ai/EMGSD")

In [7]:
dataset["train"][0]

{'stereotype_type': 'profession',
 'text': 'The illustrator had tower hands.',
 'text_with_marker': 'The illustrator had ===tower=== hands.',
 'category': 'unrelated',
 'data_source': 'stereoset_intrasentence',
 'label': 'unrelated'}

#### Data Structure Summary

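- `text`: input sentence
- `text_with_marker`: the same sentence with a marked span wrapped in `===`
- `label`: string label of the form `stereotype_<group>`, `neutral_<group>`, or `unrelated` (binarised later: stereotype = 1, everything else = 0)
- `stereotype_type`: group dimension of the example (e.g. `profession`)
- `data_source`: originating dataset (e.g. `stereoset_intrasentence`)
- Splits available: train / test (the loaded dataset has no validation split)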

### 1.3 Data Preprocessing Overview

Before fine-tuning the baseline model, the EMGSD dataset must be processed
into a format suitable for a transformer classifier. HEARTS uses ALBERT-v2 as
its baseline; this reproduction uses BERT-base-uncased instead (see Section
1.4), so the BERT tokenizer is applied here.  
The preprocessing stage includes:

- Loading raw text and original labels  
- Applying the model's tokenizer to convert text into token IDs  
- Padding/truncating each sample to a fixed length of 128 tokens  
- Mapping the original string labels to binary labels:
  - `stereotype_*` (and `related`, if present) → 1  
  - `neutral_*` / `unrelated` → 0  

These steps prepare the dataset for fine-tuning the baseline model.
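
To make the padding/truncation step concrete, here is a minimal sketch of what a fixed-length encoding looks like. The helper `pad_or_truncate` and the token IDs are hypothetical, and a pad ID of 0 is assumed. A real HuggingFace tokenizer also keeps the closing special token when it truncates, which this naive slice does not.

```python
def pad_or_truncate(token_ids, max_length=128, pad_id=0):
    """Return fixed-length input_ids plus the matching attention_mask."""
    ids = list(token_ids)[:max_length]      # truncate sequences that are too long
    n_pad = max_length - len(ids)           # number of pad positions to append
    return {
        "input_ids": ids + [pad_id] * n_pad,
        "attention_mask": [1] * len(ids) + [0] * n_pad,  # 1 = real token, 0 = padding
    }

# Hypothetical token IDs for a four-token sentence, padded to length 8 for display
example = pad_or_truncate([101, 1996, 4937, 102], max_length=8)
print(example["input_ids"])       # [101, 1996, 4937, 102, 0, 0, 0, 0]
print(example["attention_mask"])  # [1, 1, 1, 1, 0, 0, 0, 0]
```

The attention mask tells the model to ignore the padded positions, so padding changes the tensor shape but not the prediction.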

In [8]:
from collections import Counter

# Print the number of samples in each split
for split in dataset.keys():
    print(f"{split} size:", len(dataset[split]))

print("-" * 40)

# Print the label distribution of each split
for split in dataset.keys():
    labels = [ex["label"] for ex in dataset[split]]
    print(f"{split} label distribution:", Counter(labels))

train size: 45760
test size: 11441
----------------------------------------
train label distribution: Counter({'unrelated': 14992, 'neutral_nationality': 6942, 'stereotype_nationality': 6795, 'stereotype_profession': 5232, 'neutral_profession': 5186, 'stereotype_gender': 1709, 'neutral_gender': 1690, 'stereotype_lgbtq+': 885, 'neutral_lgbtq+': 842, 'stereotype_religion': 589, 'neutral_religion': 481, 'stereotype_race': 387, 'neutral_race': 30})
test label distribution: Counter({'unrelated': 3781, 'stereotype_nationality': 1756, 'neutral_nationality': 1609, 'neutral_profession': 1284, 'stereotype_profession': 1238, 'stereotype_gender': 469, 'neutral_gender': 432, 'neutral_lgbtq+': 246, 'stereotype_lgbtq+': 203, 'neutral_religion': 170, 'stereotype_religion': 154, 'stereotype_race': 86, 'neutral_race': 13})
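
The counts above can be collapsed into the binary scheme to check class balance before training. The sketch below copies the printed counts by hand and applies the prefix rule (labels starting with `stereotype` are positive). The helper names are ours.

```python
train_counts = {
    "unrelated": 14992, "neutral_nationality": 6942, "stereotype_nationality": 6795,
    "stereotype_profession": 5232, "neutral_profession": 5186, "stereotype_gender": 1709,
    "neutral_gender": 1690, "stereotype_lgbtq+": 885, "neutral_lgbtq+": 842,
    "stereotype_religion": 589, "neutral_religion": 481, "stereotype_race": 387,
    "neutral_race": 30,
}
test_counts = {
    "unrelated": 3781, "stereotype_nationality": 1756, "neutral_nationality": 1609,
    "neutral_profession": 1284, "stereotype_profession": 1238, "stereotype_gender": 469,
    "neutral_gender": 432, "neutral_lgbtq+": 246, "stereotype_lgbtq+": 203,
    "neutral_religion": 170, "stereotype_religion": 154, "stereotype_race": 86,
    "neutral_race": 13,
}

def binarise(label):
    """1 for any stereotype_* label, 0 for neutral_* and unrelated."""
    return 1 if label.startswith("stereotype") else 0

def positive_rate(counts):
    """Return (positives, total, positives / total) under the binary scheme."""
    positives = sum(n for label, n in counts.items() if binarise(label) == 1)
    total = sum(counts.values())
    return positives, total, positives / total

for name, counts in [("train", train_counts), ("test", test_counts)]:
    pos, total, rate = positive_rate(counts)
    print(f"{name}: {pos} / {total} = {rate:.3f}")
# train: 15597 / 45760 = 0.341
# test: 3906 / 11441 = 0.341
```

Roughly one third of the examples are positive in both splits, so accuracy alone is a weak indicator and precision, recall, and F1 should be reported alongside it.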


#### 1.3.1 Tokenisation and Label Encoding (BERT-base-uncased)

In this step, the EMGSD dataset is converted into the format required by the
BERT-base-uncased baseline model.

Steps:
- Load the **BERT-base-uncased tokenizer** from HuggingFace.
- Tokenise each sentence with truncation and padding (`max_length = 128`).
- Add `input_ids` and `attention_mask` fields to the dataset.
- Map the original string labels to binary numerical labels:
  - `stereotype_*` (and `related`, if present) → 1
  - `neutral_*` / `unrelated` / missing → 0

The resulting `tokenized_dataset` will be used for fine-tuning the baseline
classifier.

In [9]:
from transformers import BertTokenizer
# To use DistilBERT later, swap in:
# from transformers import DistilBertTokenizer
# tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")

# BERT is used here as the example:
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def preprocess_batch(batch):
    # 1) Text -> token IDs and attention mask
    encodings = tokenizer(
        batch["text"],
        truncation=True,
        padding="max_length",
        max_length=128,
    )

    # 2) String labels -> binary 0/1
    labels = []
    for l in batch["label"]:
        if isinstance(l, str) and (l.startswith("stereotype") or l == "related"):
            # Any label starting with "stereotype" (or the literal "related") is positive
            labels.append(1)
        else:
            # neutral_* / unrelated / anything else maps to 0; the rare missing
            # values (None) and unexpected types are also treated as non-stereotype
            labels.append(0)

    encodings["labels"] = labels
    return encodings

# 3) Apply the preprocessing to every split to build tokenized_dataset
tokenized_dataset = dataset.map(preprocess_batch, batched=True)

# 4) Keep only the columns needed for training and switch to torch format
keep_cols = ["input_ids", "attention_mask", "labels"]
remove_cols = [c for c in tokenized_dataset["train"].column_names if c not in keep_cols]
tokenized_dataset = tokenized_dataset.remove_columns(remove_cols)
tokenized_dataset.set_format("torch")

tokenized_dataset



DatasetDict({
    train: Dataset({
        features: ['input_ids', 'attention_mask', 'labels'],
        num_rows: 45760
    })
    test: Dataset({
        features: ['input_ids', 'attention_mask', 'labels'],
        num_rows: 11441
    })
})

### 1.4 Baseline Fine-tuning (BERT-base-uncased)

Since the local environment does not reliably support SentencePiece (which the
ALBERT tokenizer depends on), the baseline model is reproduced using
**BERT-base-uncased**, which is also one of the transformer models evaluated in
the HEARTS methodology.

Training setup:
- Model: BERT-base-uncased
- Learning rate: 2e-5  
- Batch size: 16  
- Epochs: 3  
- Weight decay: 0.01  
- Max sequence length: 128  
- Evaluation metrics: accuracy, precision, recall, F1  

The HuggingFace `Trainer` API is used to fine-tune the model on the
tokenised EMGSD dataset and to evaluate performance on the test split.
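
As a quick sanity check on the setup above, the number of optimiser steps follows from the training-set size (45,760 examples) and the batch size. The sketch assumes a single device and no gradient accumulation. The helper `training_steps` is ours.

```python
import math

def training_steps(num_examples, batch_size, epochs):
    """Steps per epoch and in total, assuming one device and no gradient accumulation."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs

per_epoch, total = training_steps(num_examples=45760, batch_size=16, epochs=3)
print(per_epoch, total)  # 2860 8580
```

If the progress bar reports a different total, the run is probably using several devices or a different batch size than intended.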

In [None]:
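from transformers import BertForSequenceClassification, TrainingArguments, Trainer
import numpy as np
# NOTE: `load_metric` is deprecated in recent `datasets` releases;
# `evaluate.load(...)` from the `evaluate` package is its replacement.
from datasets import load_metric

# 1. Load the BERT classification model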
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2
)

# 2. Define the evaluation metrics
accuracy_metric = load_metric("accuracy")
precision_metric = load_metric("precision")
recall_metric = load_metric("recall")
f1_metric = load_metric("f1")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=1)

    return {
        "accuracy": accuracy_metric.compute(predictions=preds, references=labels)["accuracy"],
        "precision": precision_metric.compute(predictions=preds, references=labels)["precision"],
        "recall": recall_metric.compute(predictions=preds, references=labels)["recall"],
        "f1": f1_metric.compute(predictions=preds, references=labels)["f1"],
    }

# 3. Training arguments
training_args = TrainingArguments(
    output_dir="./bert-baseline",
    evaluation_strategy="epoch",  # renamed to `eval_strategy` in newer transformers releases
    save_strategy="no",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
    logging_steps=50,
    report_to="none"   # disable all loggers (wandb / tensorboard / comet, etc.)
)

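# 4. Create the Trainer
# The loaded dataset has no validation split, so the fallback below evaluates on
# the first 500 training examples; per-epoch validation metrics are optimistic.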
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"] if "validation" in tokenized_dataset else tokenized_dataset["train"].select(range(500)),
    compute_metrics=compute_metrics,
)

# 5. Train
trainer.train()

# 6. Evaluate on the test split
test_results = trainer.evaluate(tokenized_dataset["test"])
test_results


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.

Epoch,Training Loss,Validation Loss


### Baseline Results

| Metric | Paper Result | Our Result |
|--------|--------------|-------------|
| Accuracy | | |
| Precision | | |
| Recall | | |
| F1 | | |
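
For reference when filling in the table, the four metrics can be computed from test-set predictions as sketched below. This assumes positive-class (binary) precision, recall, and F1, which is what `compute_metrics` reports if the loaded metrics use their default settings. The averaging convention used in the paper should be checked before comparing numbers. The labels and predictions here are toy values, not results.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy plus precision/recall/F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy example: 3 true positives, 1 false negative, 1 false positive, 3 true negatives
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
print(binary_metrics(y_true, y_pred))
# {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75}
```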

## 2. Contextualising the AI Method (China)

We choose a Chinese stereotype detection problem aligned with **SDG 10: Reduced Inequalities**.

Possible target groups:
- Occupations (e.g. 外卖员 delivery riders, 工程师 engineers, 教师 teachers)
- Regional stereotypes (地域黑, region-based disparagement)
- Gender stereotypes
- Social groups

Ethical considerations:
- Remove personal identifiers  
- Avoid sensitive political content  
- Ensure dataset transparency  

## 3. New Dataset for Chinese Context

### Data Sources
- Public Weibo (微博) comments  
- Bilibili comments  
- News headlines  
- Short post excerpts from Xiaohongshu (小红书)  

### Preprocessing
- Remove usernames, links  
- Chinese segmentation (jieba)  
- Manual labeling of stereotype vs non-stereotype  
- Train/val/test = 8/1/1  
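
The cleaning and splitting steps above could look roughly like the sketch below. The regular expressions and helper names are our assumptions. In particular, the patterns rely on a delimiter (space or punctuation) after each link or `@mention`, so text glued directly onto a link would be removed with it. Segmentation and manual labelling are not covered here.

```python
import random
import re

URL_RE = re.compile(r"https?://\S+")     # http(s) links up to the next whitespace
MENTION_RE = re.compile(r"@[\w\-]+")     # @username; \w also matches Chinese characters

def clean_text(text):
    """Strip links and @mentions, then collapse repeated whitespace."""
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    return re.sub(r"\s+", " ", text).strip()

def split_8_1_1(items, seed=42):
    """Shuffle with a fixed seed and split into train/val/test at 8/1/1."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 8 // 10
    n_val = len(items) // 10
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]

print(clean_text("@user_1 look https://t.cn/abc great"))  # look great

train, val, test = split_8_1_1(range(100))
print(len(train), len(val), len(test))  # 80 10 10
```

Fixing the seed keeps the split reproducible, which matters when several Chinese models are compared on the same test set.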

## 4. Model Adaptation (Chinese)

We replace the English encoder (ALBERT-v2 in HEARTS, BERT-base-uncased in our reproduction) with Chinese pre-trained models:
- bert-base-chinese
- hfl/chinese-roberta-wwm-ext

Train using:
- lr = 2e-5  
- batch size = 16  
- epochs = 3–5  
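
One way to keep the planned hyperparameters consistent across the two candidate checkpoints is sketched below. The helper names and the output-directory scheme are hypothetical. `transformers` is imported lazily inside the loader, so the configuration helper can be used on its own.

```python
CANDIDATE_MODELS = ["bert-base-chinese", "hfl/chinese-roberta-wwm-ext"]

def training_kwargs(model_name, epochs=3):
    """Keyword arguments for TrainingArguments, matching the settings listed above."""
    if not 3 <= epochs <= 5:
        raise ValueError("the plan above uses 3 to 5 epochs")
    return {
        "output_dir": "./" + model_name.replace("/", "_"),  # hypothetical naming scheme
        "learning_rate": 2e-5,
        "per_device_train_batch_size": 16,
        "per_device_eval_batch_size": 16,
        "num_train_epochs": epochs,
    }

def load_model_and_tokenizer(model_name):
    """Load a binary classifier and its tokenizer (downloads weights on first use)."""
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
    return model, tokenizer

for name in CANDIDATE_MODELS:
    print(name, training_kwargs(name))
```

The returned dictionary can be unpacked as `TrainingArguments(**training_kwargs(name))`, so both models are trained under identical settings and only the checkpoint differs.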