# CBDC Sentiment Classification Pipeline

This pipeline fine-tunes a domain-adapted BERT model (`bilalzafar/CentralBank-BERT`) to classify **CBDC-related sentences** into three sentiment categories: **Negative, Neutral, Positive**.
It combines a **class-weighted loss, focal loss, and weighted sampling** to counter the class imbalance (Negative is the minority class).

---

## 0. Setup

```bash
pip install -U transformers datasets accelerate scikit-learn
```

```python
import os, json, numpy as np, pandas as pd, torch, torch.nn as nn
from google.colab import drive
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, classification_report
from sklearn.utils.class_weight import compute_class_weight
from torch.utils.data import DataLoader, WeightedRandomSampler
from datasets import Dataset, DatasetDict
from transformers import (
    AutoTokenizer, AutoModelForSequenceClassification,
    DataCollatorWithPadding, TrainingArguments, Trainer, EarlyStoppingCallback
)
```

---

## 1. Data

* **Source:** `cbdc_sentiment_training.csv`
* **Labels:** `negative`, `neutral`, `positive`
* **Split:** 80% train / 10% validation / 10% test (stratified, `random_state=42`): 1,924 / 240 / 241 sentences
* **Class distribution (2,405 sentences):**

  * Neutral: 1068
  * Positive: 1026
  * Negative: 311

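The split sizes follow from how scikit-learn's `train_test_split` converts a fractional `test_size` into a count: it rounds the test share up (`ceil`) and gives the remainder to the other side. The plain-Python sketch below reproduces only that arithmetic for the two calls in the notebook (`test_size=0.2`, then `test_size=0.5` on the held-out part); `split_sizes` is a hypothetical helper, and the stratified shuffling itself is left to scikit-learn.

```python
import math

def split_sizes(n_samples: int, test_size: float) -> tuple[int, int]:
    """Hypothetical helper: (n_train, n_test) for a float test_size, rounding the test share up."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test

total = 311 + 1068 + 1026                   # negative + neutral + positive = 2405
n_train, n_temp = split_sizes(total, 0.2)   # first call: 80% train / 20% held out
n_val, n_test = split_sizes(n_temp, 0.5)    # second call: held-out part split into val / test

print(n_train, n_val, n_test)  # 1924 240 241
```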
---

## 2. Model

* **Base model:** [`bilalzafar/CentralBank-BERT`](https://huggingface.co/bilalzafar/CentralBank-BERT) (domain-adapted BERT on central bank text).
* **Classifier head:** 3-way softmax.
* **Tokenizer:** WordPiece; inputs are truncated to 320 tokens (`max_length=320`) and padded per batch with `DataCollatorWithPadding`.

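At inference, the classifier head's three logits are turned into probabilities with a softmax, and the arg-max index is mapped back to a label through `id2label`; the `score` returned by the text-classification pipeline is that top probability. A minimal plain-Python sketch of this step; the logit values are made up for illustration.

```python
import math

label_list = ["negative", "neutral", "positive"]
id2label = {i: label for i, label in enumerate(label_list)}

def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the largest logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [-1.2, 0.3, 2.9]  # hypothetical head output for one sentence
probs = softmax(logits)
pred_id = max(range(len(probs)), key=probs.__getitem__)  # arg-max over the 3 classes

print(id2label[pred_id], round(probs[pred_id], 4))
```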
---

## 3. Handling Imbalance

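* **Class weights** from `compute_class_weight("balanced")` on the training split, applied inside the loss (Negative ≈ 2.58, Neutral ≈ 0.75, Positive ≈ 0.78).
* **WeightedRandomSampler** draws training examples with replacement, each weighted by 1/√(class count); this square-root inverse frequency upsamples minority examples more gently than full inverse frequency.
* **Focal loss** (γ = 1.0, a mild setting) down-weights well-classified (easy) examples so that training concentrates on harder ones.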

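All three mechanisms can be checked by hand. `compute_class_weight("balanced")` gives each class the weight n / (K · n_c), where n is the number of training sentences, K = 3, and n_c is the class count. The training-split counts below (249 / 854 / 821) are not printed by the notebook; they are an assumption, namely the counts implied by the logged weights with n = 1,924, and they are consistent with the overall class counts and the 31 / 107 / 103 test supports. The sketch also shows the class mix the 1/√(n_c) sampler aims for and how the focal factor (1 − p_t)^γ scales the class-weighted cross-entropy, mirroring the `FocalLoss` module in the training cell.

```python
import math

# ASSUMPTION: training-split counts implied by the logged class weights (see text).
train_counts = {"negative": 249, "neutral": 854, "positive": 821}
n, k = sum(train_counts.values()), len(train_counts)

# 1) "balanced" class weights, n / (k * n_c), as computed by scikit-learn's compute_class_weight.
class_weights = {c: n / (k * n_c) for c, n_c in train_counts.items()}

# 2) Sampler with per-sample weight 1/sqrt(n_c): a class's total weight is
#    n_c / sqrt(n_c) = sqrt(n_c), so its expected share of draws is sqrt(n_c) / sum(sqrt(n_j)).
sqrt_total = sum(math.sqrt(n_c) for n_c in train_counts.values())
natural_share = {c: n_c / n for c, n_c in train_counts.items()}
sampled_share = {c: math.sqrt(n_c) / sqrt_total for c, n_c in train_counts.items()}

# 3) Per-example weighted focal loss when the true class gets probability p_t:
#    (1 - p_t) ** gamma * w_y * (-log p_t), as in FocalLoss before batch reduction.
def focal_term(p_t: float, w_y: float, gamma: float = 1.0) -> float:
    return (1.0 - p_t) ** gamma * w_y * -math.log(p_t)

for c in train_counts:
    print(f"{c:9s} weight={class_weights[c]:.4f} "
          f"natural={natural_share[c]:.3f} sampled={sampled_share[c]:.3f}")

w_neg = class_weights["negative"]
print("easy negative (p_t=0.9):", round(focal_term(0.9, w_neg), 4))
print("hard negative (p_t=0.3):", round(focal_term(0.3, w_neg), 4))
```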
---

## 4. Training

* **Epochs:** up to 8; early stopping halted training after epoch 6 and restored the best checkpoint (epoch 4)
* **Batch size:** 16 (train & eval)
* **Learning rate:** 2e-5
* **Weight decay:** 0.01
* **Warmup ratio:** 0.06
* **Loss:** Focal Loss (with class weights)
* **Model-selection metric:** validation Macro-F1
* **Mixed precision:** fp16 enabled
* **Early stopping:** patience = 2 epochs

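The schedule and the early stop can be reconstructed from these settings and the per-epoch validation log. Assuming a single device and no gradient accumulation, one epoch is ceil(1,924 / 16) = 121 optimizer steps; when `warmup_steps` is unset, `Trainer` sizes warmup as ceil(warmup_ratio × total steps), and `EarlyStoppingCallback` stops once the monitored metric has failed to improve for `early_stopping_patience` consecutive evaluations. The sketch replays that rule on the logged validation Macro-F1 values; `replay_early_stopping` is a hypothetical helper that mirrors the logic, not the library code.

```python
import math

# Settings from the training configuration above.
n_train, batch_size, max_epochs, warmup_ratio, patience = 1924, 16, 8, 0.06, 2

steps_per_epoch = math.ceil(n_train / batch_size)    # last partial batch is kept (drop_last=False)
total_steps = steps_per_epoch * max_epochs           # linear schedule is sized for all 8 epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

# Validation Macro-F1 per epoch, copied from the training log.
val_f1_macro = [0.778957, 0.787869, 0.785749, 0.826967, 0.815209, 0.799541]

def replay_early_stopping(scores: list[float], patience: int) -> tuple[int, int]:
    """Return (best_epoch, stop_epoch), both 1-based, for a higher-is-better metric."""
    best, best_epoch, bad_epochs = None, 0, 0
    for epoch, score in enumerate(scores, start=1):
        if best is None or score > best:
            best, best_epoch, bad_epochs = score, epoch, 0
        else:
            bad_epochs += 1
        if bad_epochs >= patience:
            return best_epoch, epoch
    return best_epoch, len(scores)

best_epoch, stop_epoch = replay_early_stopping(val_f1_macro, patience)
print(steps_per_epoch, total_steps, warmup_steps)  # 121 968 59
print(best_epoch, stop_epoch)                      # 4 6
```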
---

## 5. Evaluation

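### Validation (10%, n = 240)

Scores of the best checkpoint (epoch 4). Because this split was also used to choose that checkpoint, they may be slightly optimistic.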

* Accuracy: **0.846**
* Macro-F1: **0.827**
* Weighted-F1: **0.845**

### Test (10%, n = 241)

* Accuracy: **0.822**
* Macro-F1: **0.812**
* Weighted-F1: **0.822**

### Per-class (Test)

| Class    | Precision | Recall | F1    | Support |
| -------- | --------- | ------ | ----- | ------- |
| Negative | 0.821     | 0.742  | 0.780 | 31      |
| Neutral  | 0.786     | 0.822  | 0.804 | 107     |
| Positive | 0.861     | 0.845  | 0.853 | 103     |

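Macro-F1 is the unweighted mean of the per-class F1 scores, while weighted-F1 weights each class by its support; since the minority Negative class has the lowest F1, it pulls macro-F1 below weighted-F1. Recomputing both from the four-decimal classification report printed by the notebook:

```python
# Per-class test F1 and support, copied from the classification report.
per_class = {
    "negative": (0.7797, 31),
    "neutral": (0.8037, 107),
    "positive": (0.8529, 103),
}

f1_scores = [f1 for f1, _ in per_class.values()]
supports = [s for _, s in per_class.values()]

macro_f1 = sum(f1_scores) / len(f1_scores)  # every class counts equally
weighted_f1 = sum(f1 * s for f1, s in zip(f1_scores, supports)) / sum(supports)  # support-weighted

print(f"macro-F1 = {macro_f1:.4f}, weighted-F1 = {weighted_f1:.4f}")
# macro-F1 = 0.8121, weighted-F1 = 0.8216
```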
---

## 6. Results

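* Per-class test F1 ranges from 0.78 to 0.85, with the strongest scores on the **Positive** class (F1 ≈ 0.85).
* **Negative**, the minority class (31 test sentences), still reaches F1 ≈ 0.78, but its recall (0.74) is the lowest of the three classes, so about a quarter of negative sentences are assigned another label.
* Test Macro-F1 (≈ **0.81**) is close to validation Macro-F1 (≈ 0.83), which suggests the checkpoint selected on validation generalizes to held-out data; with only 241 test sentences, these estimates carry noticeable sampling uncertainty.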

```python
# =========================
# 0) Setup
# =========================
# !pip -q install -U transformers datasets accelerate scikit-learn

import os, json, numpy as np, pandas as pd, torch, torch.nn as nn
from google.colab import drive
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, classification_report
from sklearn.utils.class_weight import compute_class_weight
from torch.utils.data import DataLoader, WeightedRandomSampler
from datasets import Dataset, DatasetDict
from transformers import (
    AutoTokenizer, AutoModelForSequenceClassification,
    DataCollatorWithPadding, TrainingArguments, Trainer, EarlyStoppingCallback
)

# -------------------------
# 1) Paths
# -------------------------
MOUNT             = "/content/drive"
BASE_DIR          = f"{MOUNT}/MyDrive/cbdc-bert-tone"
DATA_CSV          = f"{BASE_DIR}/cbdc_sentiment_training.csv"

# Hub ID; the model was originally published as "bilalzafar/cb-bert-mlm" (the ID in the run log below).
MODEL_NAME        = "bilalzafar/CentralBank-BERT"
OUTPUT_DIR        = os.path.join(BASE_DIR, "checkpoints_cbdc_sentiment_cbbert")
FINAL_MODEL_DIR   = os.path.join(BASE_DIR, "cbdc_sentiment_cbbert_model")

# -------------------------
# 2) Mount Drive & load data
# -------------------------
drive.mount(MOUNT)  # if Drive is already mounted, Colab prints a notice and continues
os.makedirs(OUTPUT_DIR, exist_ok=True)
os.makedirs(FINAL_MODEL_DIR, exist_ok=True)

df = pd.read_csv(DATA_CSV)  # expects columns: url, sentence, label
df = df[["sentence", "label"]].dropna().reset_index(drop=True)
df["sentence"] = df["sentence"].astype(str)
df["label"] = df["label"].str.strip().str.lower()

# -------------------------
# 3) Three-class labels (standardized)
# -------------------------
label_list = ["negative", "neutral", "positive"]
df = df[df["label"].isin(label_list)].copy()

label2id = {l: i for i, l in enumerate(label_list)}
id2label = {i: l for l, i in label2id.items()}
df["label_id"] = df["label"].map(label2id)

print("Label counts:\n", df["label"].value_counts())

# -------------------------
# 4) Train/Val/Test split (80/10/10 stratified)
# -------------------------
train_df, temp_df = train_test_split(
    df, test_size=0.2, stratify=df["label_id"], random_state=42
)
val_df, test_df = train_test_split(
    temp_df, test_size=0.5, stratify=temp_df["label_id"], random_state=42
)

def to_hf(ds):
    return Dataset.from_pandas(
        ds[["sentence","label_id"]].rename(columns={"sentence":"text","label_id":"labels"}),
        preserve_index=False
    )

dataset = DatasetDict({
    "train": to_hf(train_df),
    "validation": to_hf(val_df),
    "test": to_hf(test_df),
})

# -------------------------
# 5) Tokenizer & collator
# -------------------------
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=320)

dataset = dataset.map(
    tokenize, batched=True, remove_columns=["text"], load_from_cache_file=False
)

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)  # pads each batch to its longest sequence

# -------------------------
# 6) Model
# -------------------------
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    num_labels=len(label_list),
    id2label=id2label,
    label2id=label2id
)

# -------------------------
# 7) Imbalance handling
# -------------------------
# Class weights come from the tokenized train split, so their order matches the label ids;
# they are passed to the trainer and used inside the loss.
labels_train = np.array(dataset["train"]["labels"])
weights = compute_class_weight(
    class_weight="balanced",
    classes=np.arange(len(label_list)),
    y=labels_train
)
class_weights = torch.tensor(weights, dtype=torch.float)
print("Class weights (used in loss):", {id2label[i]: float(w) for i, w in enumerate(class_weights)})

# Per-sample weights follow the row order of dataset["train"], which is what the sampler indexes.
# sqrt(inverse class frequency) biases sampling toward minority classes more gently
# than plain inverse frequency.
class_counts = np.bincount(labels_train, minlength=len(label_list))
instance_weights = np.sqrt(1.0 / class_counts[labels_train]).astype("float32")
train_sample_weights_t = torch.tensor(instance_weights)

# -------------------------
# 8) Focal Loss (soft: gamma=1.0)
# -------------------------
class FocalLoss(nn.Module):
    """Class-weighted focal loss: (1 - p_t)^gamma * w_y * (-log p_t).

    With reduction="mean", per-example losses are averaged over the batch
    (not normalized by the sum of class weights).
    """
    def __init__(self, gamma: float = 1.0, weight: torch.Tensor | None = None, reduction: str = "mean"):
        super().__init__()
        self.gamma = gamma
        self.weight = weight
        self.reduction = reduction

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        targets = targets.long()
        log_probs = torch.nn.functional.log_softmax(logits, dim=-1)
        probs = log_probs.exp()
        # Class-weighted cross-entropy per example: -w_y * log p_y
        ce = torch.nn.functional.nll_loss(
            log_probs, targets, weight=self.weight, reduction="none"
        )
        # p_t: predicted probability of the true class
        pt = probs.gather(dim=-1, index=targets.view(-1, 1)).squeeze(1)
        loss = (1 - pt).pow(self.gamma) * ce
        if self.reduction == "mean":
            return loss.mean()
        if self.reduction == "sum":
            return loss.sum()
        return loss

# -------------------------
# 9) Custom Trainer: Focal + WeightedRandomSampler
# -------------------------
class FocalSamplerTrainer(Trainer):
    # Extra options are keyword-only so positional arguments still reach Trainer unchanged.
    def __init__(self, *args, class_weights=None, focal_gamma: float = 1.0, train_sample_weights=None, **kwargs):
        super().__init__(*args, **kwargs)
        device = self.args.device
        self.class_weights = class_weights.to(device) if class_weights is not None else None
        self.criterion = FocalLoss(gamma=focal_gamma, weight=self.class_weights)
        self.train_sample_weights = train_sample_weights

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)  # labels are withheld, so the model's built-in CE loss is skipped
        loss = self.criterion(outputs.logits, labels)
        return (loss, outputs) if return_outputs else loss

    def get_train_dataloader(self) -> DataLoader:
        if self.train_dataset is None:
            raise ValueError("Trainer: training requires a train_dataset.")
        if self.train_sample_weights is None:
            return super().get_train_dataloader()
        # One epoch = as many draws as training examples, with replacement, proportional to the weights.
        sampler = WeightedRandomSampler(
            weights=self.train_sample_weights,
            num_samples=len(self.train_sample_weights),
            replacement=True
        )
        loader = DataLoader(
            self.train_dataset,
            batch_size=self.args.train_batch_size,
            sampler=sampler,
            collate_fn=self.data_collator,
            drop_last=self.args.dataloader_drop_last,
            num_workers=self.args.dataloader_num_workers,
            pin_memory=self.args.dataloader_pin_memory,
            persistent_workers=self.args.dataloader_persistent_workers,
        )
        # Let Accelerate handle device placement and sharding, as Trainer's default loader does.
        return self.accelerator.prepare(loader)

# -------------------------
# 10) Metrics
# -------------------------
def compute_metrics(eval_pred):
    labels = eval_pred.label_ids
    preds = eval_pred.predictions.argmax(-1)
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1_macro": f1_score(labels, preds, average="macro"),
        "f1_weighted": f1_score(labels, preds, average="weighted"),
    }

# -------------------------
# 11) Training args
# -------------------------
training_args = TrainingArguments(
    output_dir=OUTPUT_DIR,
    eval_strategy="epoch",          # named evaluation_strategy in older transformers releases
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="f1_macro",
    greater_is_better=True,
    num_train_epochs=8,
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    weight_decay=0.01,
    warmup_ratio=0.06,
    fp16=True,
    save_total_limit=2,
    logging_steps=50,
    report_to="none",
    seed=42,
    dataloader_num_workers=0,
)

trainer = FocalSamplerTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    data_collator=data_collator,
    processing_class=tokenizer,     # older transformers releases use tokenizer=tokenizer
    compute_metrics=compute_metrics,
    class_weights=class_weights,
    focal_gamma=1.0,
    train_sample_weights=train_sample_weights_t,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)]
)

# -------------------------
# 12) Train
# -------------------------
print("🚀 Training…")
trainer.train()
print("✅ Training complete.")

# -------------------------
# 13) Evaluate on val & test
# -------------------------
print("📊 Validation:", trainer.evaluate(dataset["validation"]))
test_metrics = trainer.evaluate(dataset["test"])
print("📊 Test:", test_metrics)

preds = trainer.predict(dataset["test"])
y_true = preds.label_ids
y_pred = preds.predictions.argmax(-1)
print("\nClassification report (test):\n",
      classification_report(y_true, y_pred, target_names=label_list, digits=4))

# -------------------------
# 14) Save model + tokenizer + label map
# -------------------------
trainer.save_model(FINAL_MODEL_DIR)  # best checkpoint, restored by load_best_model_at_end
tokenizer.save_pretrained(FINAL_MODEL_DIR)
with open(os.path.join(FINAL_MODEL_DIR, "label_mapping.json"), "w") as f:
    json.dump({"label2id": label2id, "id2label": id2label}, f, indent=2)

print(f"\n💾 Saved best model to: {FINAL_MODEL_DIR}")
print(f"   Checkpoints in: {OUTPUT_DIR}")
```

```text
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Label counts:
 label
neutral     1068
positive    1026
negative     311
Name: count, dtype: int64

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bilalzafar/cb-bert-mlm and are newly initialized: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

Class weights (used in loss): {'negative': 2.5756359100341797, 'neutral': 0.7509757876396179, 'positive': 0.7811611890792847}
🚀 Training…
```

Per-epoch training log:

| Epoch | Training Loss | Validation Loss | Accuracy | F1 Macro | F1 Weighted |
| ----- | ------------- | --------------- | -------- | -------- | ----------- |
| 1     | 0.3487        | 0.299174        | 0.7875   | 0.778957 | 0.788831    |
| 2     | 0.186         | 0.357329        | 0.7875   | 0.787869 | 0.783521    |
| 3     | 0.1012        | 0.407685        | 0.8      | 0.785749 | 0.799994    |
| 4     | 0.0402        | 0.490367        | 0.845833 | 0.826967 | 0.84532     |
| 5     | 0.0257        | 0.560023        | 0.833333 | 0.815209 | 0.832394    |
| 6     | 0.0274        | 0.53845         | 0.820833 | 0.799541 | 0.820615    |

```text
✅ Training complete.

📊 Validation: {'eval_loss': 0.4903668165206909, 'eval_accuracy': 0.8458333333333333, 'eval_f1_macro': 0.8269668498403844, 'eval_f1_weighted': 0.8453203996815247, 'eval_runtime': 0.5446, 'eval_samples_per_second': 440.664, 'eval_steps_per_second': 27.542, 'epoch': 6.0}
📊 Test: {'eval_loss': 0.47107648849487305, 'eval_accuracy': 0.8215767634854771, 'eval_f1_macro': 0.8120850538187568, 'eval_f1_weighted': 0.8216319513767758, 'eval_runtime': 0.59, 'eval_samples_per_second': 408.476, 'eval_steps_per_second': 27.119, 'epoch': 6.0}

Classification report (test):
               precision    recall  f1-score   support

    negative     0.8214    0.7419    0.7797        31
     neutral     0.7857    0.8224    0.8037       107
    positive     0.8614    0.8447    0.8529       103

    accuracy                         0.8216       241
   macro avg     0.8228    0.8030    0.8121       241
weighted avg     0.8226    0.8216    0.8216       241

💾 Saved best model to: /content/drive/MyDrive/cbdc-bert
```

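One detail of the saved `label_mapping.json`: JSON object keys are always strings, so the integer keys of `id2label` come back as `"0"`, `"1"`, `"2"` when the file is read. A small stdlib sketch that writes the mapping the same way the training cell does and loads it with the keys converted back to `int`; the temporary directory stands in for `FINAL_MODEL_DIR` and is for illustration only.

```python
import json
import os
import tempfile

label_list = ["negative", "neutral", "positive"]
label2id = {l: i for i, l in enumerate(label_list)}
id2label = {i: l for l, i in label2id.items()}

# Write the mapping as the training cell does (temporary path stands in for FINAL_MODEL_DIR).
path = os.path.join(tempfile.mkdtemp(), "label_mapping.json")
with open(path, "w") as f:
    json.dump({"label2id": label2id, "id2label": id2label}, f, indent=2)

# Read it back: id2label keys are now strings, so convert them to int.
with open(path) as f:
    mapping = json.load(f)
raw_keys = list(mapping["id2label"])
id2label_loaded = {int(k): v for k, v in mapping["id2label"].items()}

print(raw_keys, id2label_loaded[2])
```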
## 7. Inference

```python
from transformers import pipeline

model_dir = "/content/drive/MyDrive/cbdc-bert-tone/cbdc_sentiment_cbbert_model"  # FINAL_MODEL_DIR from the training cell
clf = pipeline("text-classification", model=model_dir, tokenizer=model_dir,
               truncation=True, max_length=320)  # add device=0 to use a GPU

sentence = "CBDCs will revolutionize payment systems and improve financial inclusion."
pred = clf(sentence)[0]  # {'label': ..., 'score': softmax probability of that label}

print(f"Sentence: {sentence}")
print(f"Predicted label: {pred['label']} (score: {pred['score']:.4f})")
```

```text
Sentence: CBDCs will revolutionize payment systems and improve financial inclusion.
Predicted label: positive (score: 0.9789)
```
