# CBDC Stance Classification

This model fine-tunes [`bilalzafar/CentralBank-BERT`](https://huggingface.co/bilalzafar/CentralBank-BERT), a BERT model domain-adapted to central banking text, to classify **sentences about central bank digital currencies (CBDCs)** into one of three stance categories:

* **Anti-CBDC**
* **Pro-CBDC**
* **Wait-and-See**

---

## Training Details

* **Base checkpoint:** `bilalzafar/CentralBank-BERT`
* **Max sequence length:** 320
* **Architecture:** BERT-base + 3-way classification head

**Hyperparameters**

* Epochs: up to 8 (early stopping ended training after epoch 6; the best checkpoint is from epoch 4)
* Batch size: 16 (train & eval)
* Learning rate: 2e-5
* Weight decay: 0.01
* Warmup ratio: 0.06
* Optimizer: AdamW
* Loss: **Focal Loss** (γ=1.0, soft focal)
* Sampler: **WeightedRandomSampler** (√(inverse frequency))
* FP16: enabled
* Early stopping: patience = 2 (monitoring macro-F1)
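
The effect of the soft focal loss (γ=1.0, no class weights) can be illustrated with a minimal sketch in plain Python. The probability vectors below are hypothetical values chosen for illustration, not model outputs; the formula mirrors the `FocalLoss` module in the training code, which scales per-example cross-entropy by `(1 - p_t) ** gamma`.

```python
import math

def cross_entropy(probs, target):
    """Standard cross-entropy for one example: -log(p_t)."""
    return -math.log(probs[target])

def focal_loss(probs, target, gamma=1.0):
    """Unweighted focal loss for one example: (1 - p_t)^gamma * -log(p_t)."""
    p_t = probs[target]
    return (1.0 - p_t) ** gamma * -math.log(p_t)

# Hypothetical class probabilities in the order
# (Anti-CBDC, Pro-CBDC, Wait-and-See); the true class is Pro-CBDC (index 1).
easy = [0.05, 0.90, 0.05]  # confident, correct prediction
hard = [0.50, 0.30, 0.20]  # true class receives low probability

for name, probs in [("easy", easy), ("hard", hard)]:
    ce = cross_entropy(probs, 1)
    fl = focal_loss(probs, 1, gamma=1.0)
    # The ratio fl / ce equals (1 - p_t)^gamma.
    print(f"{name}: CE={ce:.4f}  focal={fl:.4f}  ratio={fl / ce:.2f}")
```

With γ=1.0 the easy example keeps about 10% of its cross-entropy and the hard example about 70%, so gradient signal shifts toward sentences the model still gets wrong. Setting `gamma=0` recovers plain cross-entropy.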

**Data split (stratified):**

* Train: 80%
* Validation: 10%
* Test: 10%

**Label distribution (full set):**

* Pro-CBDC: 742
* Wait-and-See: 694
* Anti-CBDC: 211
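
To see what the √(inverse frequency) sampler does to this distribution, the sketch below computes the expected share of each class among sampled training examples. It uses the full-set counts above as an approximation; the actual sampler weights are derived from the training split only, which is stratified and so has nearly the same proportions.

```python
import math

# Full-set label counts from the list above (an approximation of the train split).
counts = {"Pro-CBDC": 742, "Wait-and-See": 694, "Anti-CBDC": 211}
total = sum(counts.values())

# Per-sample weight: sqrt(1 / class count).
per_sample = {label: math.sqrt(1.0 / n) for label, n in counts.items()}

# A class's expected share of draws is proportional to
# count * per-sample weight, which simplifies to sqrt(count).
mass = {label: counts[label] * per_sample[label] for label in counts}
mass_total = sum(mass.values())

raw_share = {label: n / total for label, n in counts.items()}
expected_share = {label: m / mass_total for label, m in mass.items()}

for label in counts:
    print(f"{label:13s} raw={raw_share[label]:.3f}  sampled={expected_share[label]:.3f}")
```

The Anti-CBDC share rises from roughly 13% of the data to roughly 21% of sampled examples. That is a gentler correction than full inverse-frequency weighting, which would push every class to one third.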

---

## Performance & Metrics

### Validation (best checkpoint, epoch 4)

* Accuracy: **0.8303**
* Macro-F1: **0.7936**
* Weighted-F1: **0.8338**
* Loss: 0.3883

### Test

* Accuracy: **0.8485**
* Macro-F1: **0.8519**
* Weighted-F1: **0.8484**
* Loss: 0.4223

### Per-class (test)

| Class        | Precision | Recall | F1     | Support |
| ------------ | --------- | ------ | ------ | ------- |
| Anti-CBDC    | 0.8261    | 0.9048 | 0.8636 | 21      |
| Pro-CBDC     | 0.8421    | 0.8533 | 0.8477 | 75      |
| Wait-and-See | 0.8636    | 0.8261 | 0.8444 | 69      |
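
The aggregate test scores follow directly from this table. As a quick consistency check, the sketch below recomputes per-class F1 from the rounded precision and recall values, then derives macro-F1 (unweighted mean) and weighted-F1 (support-weighted mean). Because the inputs are rounded to four decimals, the results can differ from the reported figures in the last digit.

```python
# (precision, recall, support) per class, copied from the table above.
rows = {
    "Anti-CBDC":    (0.8261, 0.9048, 21),
    "Pro-CBDC":     (0.8421, 0.8533, 75),
    "Wait-and-See": (0.8636, 0.8261, 69),
}

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

per_class_f1 = {label: f1(p, r) for label, (p, r, _) in rows.items()}
total_support = sum(s for _, _, s in rows.values())

# Macro-F1: every class counts equally, regardless of size.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

# Weighted-F1: each class contributes in proportion to its support.
weighted_f1 = sum(per_class_f1[label] * s for label, (_, _, s) in rows.items()) / total_support

print({label: round(v, 4) for label, v in per_class_f1.items()})
print(f"macro-F1={macro_f1:.4f}  weighted-F1={weighted_f1:.4f}")
```

Macro-F1 is the metric used for early stopping and checkpoint selection, which gives the 21 Anti-CBDC test sentences the same influence on the score as the 75 Pro-CBDC ones.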


---

## Results

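* Per-class test F1 falls between 0.84 and 0.86, so performance is balanced across the **three CBDC stance classes**.
* **Minority class (Anti-CBDC)** is handled well despite limited examples (211 of 1,647 sentences; test F1 ≈ 0.86 on a support of 21).
* **Macro-F1 ≈ 0.85** on the test set, above the validation macro-F1 of ≈ 0.79; with only 165 sentences in each held-out split, a gap of this size may reflect sampling variation.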

In [None]:
# =========================
# 0) Setup
# =========================
# !pip -q install -U transformers datasets accelerate scikit-learn

import os, json, numpy as np, pandas as pd, torch, torch.nn as nn
from google.colab import drive
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, classification_report
from sklearn.utils.class_weight import compute_class_weight
from torch.utils.data import DataLoader, WeightedRandomSampler
from datasets import Dataset, DatasetDict
from transformers import (
    AutoTokenizer, AutoModelForSequenceClassification,
    DataCollatorWithPadding, TrainingArguments, Trainer, EarlyStoppingCallback
)

# -------------------------
# 1) Paths
# -------------------------
MOUNT              = "/content/drive"
DRIVE_PROJECT_DIR  = f"{MOUNT}/MyDrive/cbdc-bert-stance"
DATA_CSV           = f"{MOUNT}/MyDrive/cbdc-bert-stance/stance_sentences.csv"

MODEL_NAME         = "bilalzafar/cb-bert-mlm" # name at training time; since renamed to bilalzafar/CentralBank-BERT
OUTPUT_DIR         = os.path.join(DRIVE_PROJECT_DIR, "checkpoints")
FINAL_MODEL_DIR    = os.path.join(DRIVE_PROJECT_DIR, "stance_model_cbbert")
os.makedirs(OUTPUT_DIR, exist_ok=True)
os.makedirs(FINAL_MODEL_DIR, exist_ok=True)


# -------------------------
# 2) Mount Drive & load data
# -------------------------
try:
    drive.mount(MOUNT)
except Exception:
    pass

df = pd.read_csv(DATA_CSV)  # expects columns: text, label
df = df[["text", "label"]].dropna().reset_index(drop=True)
df["text"] = df["text"].astype(str)

# -------------------------
# 3) Three-class labels
# -------------------------
label_list = ["Anti-CBDC", "Pro-CBDC", "Wait-and-See"]
df = df[df["label"].isin(label_list)].copy()

# map -> ids
label2id = {l: i for i, l in enumerate(label_list)}
id2label = {i: l for l, i in label2id.items()}
df["label_id"] = df["label"].map(label2id)

print("Label counts (3-class):\n", df["label"].value_counts())


# -------------------------
# 4) Train/Val/Test split (stratified on merged labels)
# -------------------------
train_df, temp_df = train_test_split(
    df, test_size=0.2, stratify=df["label_id"], random_state=42
)
val_df, test_df = train_test_split(
    temp_df, test_size=0.5, stratify=temp_df["label_id"], random_state=42
)

def to_hf(ds):
    return Dataset.from_pandas(
        ds[["text","label_id"]].rename(columns={"label_id":"labels"}),
        preserve_index=False
    )

dataset = DatasetDict({
    "train": to_hf(train_df),
    "validation": to_hf(val_df),
    "test": to_hf(test_df),
})

# -------------------------
# 5) Tokenizer & collator
# -------------------------
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=320)

# remove 'text' so the collator doesn't try to tensorize strings
dataset = dataset.map(
    tokenize,
    batched=True,
    remove_columns=["text"],
    load_from_cache_file=False
)

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

# -------------------------
# 6) Model
# -------------------------
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    num_labels=len(label_list),   # 3 classes
    id2label=id2label,
    label2id=label2id
)

# -------------------------
# 7) Imbalance handling: weights + (soft) sampler
# -------------------------
y_train = train_df["label_id"].values
weights = compute_class_weight(
    class_weight="balanced", classes=np.arange(len(label_list)), y=y_train
)
class_weights = torch.tensor(weights, dtype=torch.float)
print("Class weights (3-class merged):", {id2label[i]: float(w) for i,w in enumerate(class_weights)})

# gentler sampler: sqrt(inverse frequency)
class_counts = train_df["label_id"].value_counts().to_dict()
inv_class_counts = {c: 1.0 / count for c, count in class_counts.items()}
sample_weights = train_df["label_id"].map(inv_class_counts).astype("float32").values
sample_weights = np.sqrt(sample_weights).astype("float32")
sample_weights_t = torch.tensor(sample_weights)

# -------------------------
# 8) Focal Loss (soft: gamma=1.0)
# -------------------------
class FocalLoss(nn.Module):
    def __init__(self, gamma: float = 1.0, weight: torch.Tensor | None = None, reduction: str = "mean"):
        super().__init__()
        self.gamma = gamma
        self.weight = weight
        self.reduction = reduction

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        log_probs = torch.nn.functional.log_softmax(logits, dim=-1)
        probs = log_probs.exp()
        ce = torch.nn.functional.nll_loss(
            log_probs, targets.long(), weight=self.weight, reduction="none"
        )
        pt = probs.gather(dim=-1, index=targets.view(-1,1)).squeeze(1)
        loss = (1 - pt).pow(self.gamma) * ce
        if self.reduction == "mean":
            return loss.mean()
        if self.reduction == "sum":
            return loss.sum()
        return loss

# -------------------------
# 9) Custom Trainer: Focal + WeightedRandomSampler
# -------------------------
class FocalSamplerTrainer(Trainer):
    def __init__(self, class_weights=None, focal_gamma: float = 1.0, train_sample_weights=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        device = self.args.device
        self.class_weights = class_weights.to(device) if class_weights is not None else None
        self.criterion = FocalLoss(gamma=focal_gamma, weight=self.class_weights)
        self.train_sample_weights = train_sample_weights

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        logits = outputs.logits
        loss = self.criterion(logits, labels)
        return (loss, outputs) if return_outputs else loss

    def get_train_dataloader(self) -> DataLoader:
        if self.train_dataset is None:
            raise ValueError("Trainer: training requires a train_dataset.")
        if self.train_sample_weights is None:
            return super().get_train_dataloader()
        sampler = WeightedRandomSampler(
            weights=self.train_sample_weights,
            num_samples=len(self.train_sample_weights),
            replacement=True
        )
        return DataLoader(
            self.train_dataset,
            batch_size=self.args.train_batch_size,
            sampler=sampler,
            collate_fn=self.data_collator,
            drop_last=self.args.dataloader_drop_last,
            num_workers=self.args.dataloader_num_workers,
            pin_memory=self.args.dataloader_pin_memory,
            persistent_workers=self.args.dataloader_persistent_workers,
        )

# -------------------------
# 10) Metrics
# -------------------------
def compute_metrics(eval_pred):
    labels = eval_pred.label_ids
    preds = eval_pred.predictions.argmax(-1)
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1_macro": f1_score(labels, preds, average="macro"),
        "f1_weighted": f1_score(labels, preds, average="weighted"),
    }

# -------------------------
# 11) Training args
# -------------------------
training_args = TrainingArguments(
    output_dir=OUTPUT_DIR,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="f1_macro",
    greater_is_better=True,
    num_train_epochs=8,                # EarlyStopping will cut if it plateaus
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    weight_decay=0.01,
    warmup_ratio=0.06,
    fp16=True,
    save_total_limit=2,
    logging_steps=50,
    report_to="none",
    seed=42,
    dataloader_num_workers=0,          # Colab-safe
)

trainer = FocalSamplerTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,               # HF warns it's deprecated; fine here
    compute_metrics=compute_metrics,
    class_weights=None,                # soft focal: no extra class weights (sampler handles imbalance)
    focal_gamma=1.0,
    train_sample_weights=sample_weights_t,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)]
)

# -------------------------
# 12) Train
# -------------------------
print("🚀 Training…")
trainer.train()
print("✅ Training complete.")

# -------------------------
# 13) Evaluate on val & test
# -------------------------
print("📊 Validation:", trainer.evaluate(dataset["validation"]))
test_metrics = trainer.evaluate(dataset["test"])
print("📊 Test:", test_metrics)

# Detailed report on test set
preds = trainer.predict(dataset["test"])
y_true = preds.label_ids
y_pred = preds.predictions.argmax(-1)
print("\nClassification report (test):\n",
      classification_report(y_true, y_pred, target_names=label_list, digits=4))

# -------------------------
# 14) Save model + tokenizer + label map
# -------------------------
trainer.save_model(FINAL_MODEL_DIR)
tokenizer.save_pretrained(FINAL_MODEL_DIR)
with open(os.path.join(FINAL_MODEL_DIR, "label_mapping.json"), "w") as f:
    json.dump({"label2id": label2id, "id2label": id2label}, f, indent=2)

print(f"\n💾 Saved best model to: {FINAL_MODEL_DIR}")
print(f"   Checkpoints in: {OUTPUT_DIR}")


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Label counts (3-class):
 label
Pro-CBDC        742
Wait-and-See    694
Anti-CBDC       211
Name: count, dtype: int64


Map:   0%|          | 0/1317 [00:00<?, ? examples/s]

Map:   0%|          | 0/165 [00:00<?, ? examples/s]

Map:   0%|          | 0/165 [00:00<?, ? examples/s]

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bilalzafar/cb-bert-mlm and are newly initialized: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Class weights (3-class merged): {'Anti-CBDC': 2.597633123397827, 'Pro-CBDC': 0.7403035163879395, 'Wait-and-See': 0.7909910082817078}
🚀 Training…


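| Epoch | Training Loss | Validation Loss | Accuracy | F1 Macro | F1 Weighted |
| ----- | ------------- | --------------- | -------- | -------- | ----------- |
| 1     | 0.6694        | 0.313964        | 0.8      | 0.77093  | 0.803784    |
| 2     | 0.19          | 0.322927        | 0.812121 | 0.78364  | 0.815999    |
| 3     | 0.1308        | 0.305862        | 0.830303 | 0.791326 | 0.830749    |
| 4     | 0.0593        | 0.388343        | 0.830303 | 0.793579 | 0.833764    |
| 5     | 0.0436        | 0.410729        | 0.836364 | 0.793038 | 0.837832    |
| 6     | 0.0213        | 0.4351          | 0.824242 | 0.784401 | 0.826425    |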


✅ Training complete.


📊 Validation: {'eval_loss': 0.3883433938026428, 'eval_accuracy': 0.8303030303030303, 'eval_f1_macro': 0.7935788796599267, 'eval_f1_weighted': 0.8337642724755738, 'eval_runtime': 0.3228, 'eval_samples_per_second': 511.127, 'eval_steps_per_second': 34.075, 'epoch': 6.0}
📊 Test: {'eval_loss': 0.42226442694664, 'eval_accuracy': 0.8484848484848485, 'eval_f1_macro': 0.8519209757620354, 'eval_f1_weighted': 0.8483587226874403, 'eval_runtime': 0.2355, 'eval_samples_per_second': 700.522, 'eval_steps_per_second': 46.701, 'epoch': 6.0}

Classification report (test):
               precision    recall  f1-score   support

   Anti-CBDC     0.8261    0.9048    0.8636        21
    Pro-CBDC     0.8421    0.8533    0.8477        75
Wait-and-See     0.8636    0.8261    0.8444        69

    accuracy                         0.8485       165
   macro avg     0.8439    0.8614    0.8519       165
weighted avg     0.8491    0.8485    0.8484       165


💾 Saved best model to: /content/drive/MyDrive/cbdc-bert-

## Inference

In [20]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

MODEL_PATH = "/content/drive/MyDrive/cbdc-bert-stance/stance_model_cbbert"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_PATH)

classifier = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
    truncation=True,
    padding=True,
    top_k=1  # return only the highest-scoring label (top_k expects an int or None, not a bool)
)

# Single sentence
print(classifier("CBDCs will reduce costs and improve payments."))

Device set to use cuda:0


[[{'label': 'Pro-CBDC', 'score': 0.9788165092468262}]]
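
The pipeline returns one list of `{'label', 'score'}` dicts per input sentence, as the nested output above shows. A small helper can flatten that into `(label, score)` pairs for downstream use. The function name `top_predictions` is hypothetical, and the second sample below is made up to show the multi-score case; only the first sample is copied from the actual output.

```python
def top_predictions(pipeline_output):
    """Return one (label, score) pair per input sentence.

    Accepts either a list of dicts (one best label per input) or a
    list of lists of dicts (several scored labels per input).
    """
    pairs = []
    for item in pipeline_output:
        candidates = [item] if isinstance(item, dict) else item
        best = max(candidates, key=lambda c: c["score"])
        pairs.append((best["label"], best["score"]))
    return pairs

# Output copied from the cell above.
sample_output = [[{'label': 'Pro-CBDC', 'score': 0.9788165092468262}]]
print(top_predictions(sample_output))

# Hypothetical output shape when every label is scored for one sentence.
all_scores = [[
    {"label": "Wait-and-See", "score": 0.62},
    {"label": "Pro-CBDC", "score": 0.30},
    {"label": "Anti-CBDC", "score": 0.08},
]]
print(top_predictions(all_scores))
```

Calling the classifier with `top_k=None` should return a score for every label rather than only the best one, which is useful when borderline sentences need a confidence threshold.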
