# Policy Severity & Zero-Shot Classification

This notebook uses a pre-trained NLI model to assign each GDPR/CCPA policy update a severity label (LOW, MEDIUM, HIGH, CRITICAL). Inference runs locally at no cost; the model weights are downloaded once on first use.


In [1]:
%pip install transformers torch pandas


The ipywidgets library provides the interactive UI elements (sliders, buttons, progress bars, and so on) used in Jupyter notebooks and JupyterLab.

Here, Hugging Face’s pipeline tries to display a live progress bar as an interactive widget (an HBoxModel) inside the notebook. Without ipywidgets installed, Jupyter cannot render that widget and reports “Failed to load model class HBoxModel” errors.

In [11]:
%pip install ipywidgets
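
Before installing, it can help to confirm that the package is actually missing from the kernel's environment. A minimal sketch using only the standard library; `has_module` is a hypothetical helper name, not part of Jupyter or transformers:

```python
import importlib.util

def has_module(name: str) -> bool:
    """Return True if a top-level module `name` can be found in this environment."""
    return importlib.util.find_spec(name) is not None

if not has_module("ipywidgets"):
    print("ipywidgets is missing; run `%pip install ipywidgets` and restart the kernel.")
```

The check only looks the module up; it does not import it, so it has no side effects on the running kernel.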


## Imports/Load Libraries

In [6]:
import os
import pandas as pd
from datetime import datetime
from transformers import pipeline


## Configuration

In [3]:
# Paths – adjust if your repo layout differs
BASE_DIR   = os.getcwd()  # Notebook folder
# Absolute path: os.path.join(BASE_DIR, ...) would discard BASE_DIR here, so assign it directly
INPUT_CSV  = "/Volumes/Personal Drive/GitHub/gdpr-ccpa-risk-pipeline/data/processed/cleaned_policies.csv"

# Timestamped output
ts         = datetime.utcnow().strftime("%Y%m%dT%H%M%SZ")
OUTPUT_CSV = os.path.join(BASE_DIR, f"../data/processed/cleaned_with_severity_zeroshot_{ts}.csv")
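
Two details in the configuration cell are easy to trip over: a path join drops every earlier component once a later one is absolute, and `datetime.utcnow()` is deprecated as of Python 3.12. The sketch below illustrates both points under stated assumptions: `timestamped_csv` is a hypothetical helper, the example directories are made up, and `posixpath`/`PurePosixPath` are used only so the result is the same on any operating system.

```python
import posixpath
from datetime import datetime, timezone
from pathlib import PurePosixPath

# An absolute second argument makes join() drop everything before it.
joined = posixpath.join("/home/me/notebooks", "/Volumes/data/cleaned_policies.csv")
print(joined)  # /Volumes/data/cleaned_policies.csv

def timestamped_csv(directory: str, stem: str) -> PurePosixPath:
    """Build '<directory>/<stem>_<UTC timestamp>.csv' (hypothetical helper)."""
    # Timezone-aware replacement for the deprecated datetime.utcnow()
    ts = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return PurePosixPath(directory) / f"{stem}_{ts}.csv"

out = timestamped_csv("../data/processed", "cleaned_with_severity_zeroshot")
print(out.name)
```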



In [4]:
df = pd.read_csv(INPUT_CSV)
df.head()

Unnamed: 0,source,title,link,date
0,EDPB,The Italian SA imposes fines of 420 000 EUR on...,https://edpb.europa.eu/news/national-news/2025...,15 July 2025
1,EDPB,Biometrics for attendance recording. The Itali...,https://edpb.europa.eu/news/national-news/2025...,15 July 2025
2,EDPB,Swedish SA: Administrative fine against the Eq...,https://edpb.europa.eu/news/national-news/2025...,15 July 2025
3,EDPB,Targeted modifications of the GDPR: EDPB & EDP...,https://edpb.europa.eu/news/news/2025/targeted...,9 July 2025
4,EDPB,Irish Supervisory Authority fines TikTok €530 ...,https://edpb.europa.eu/news/news/2025/irish-su...,4 July 2025
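
The preview shows `source`, `title`, `link`, and `date` columns but no `summary`, which a later cell reads with `row.get("summary", "")`. A small column check makes that kind of gap visible early. This is a sketch: `missing_columns` is a hypothetical helper, and it accepts any iterable of names so that `df.columns` can be passed directly.

```python
from typing import Iterable, List

def missing_columns(columns: Iterable[str], required: Iterable[str]) -> List[str]:
    """Return the required column names that are absent from `columns`, in order."""
    present = set(columns)
    return [name for name in required if name not in present]

# Column names copied from the preview above; with a DataFrame, pass df.columns.
preview_columns = ["Unnamed: 0", "source", "title", "link", "date"]
print(missing_columns(preview_columns, ["title", "summary"]))  # ['summary']
```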


## Initialize Zero-Shot Classifier

In [12]:
# Uses a natural-language inference model to pick the best label
classifier = pipeline(
    "zero-shot-classification",
    model="facebook/bart-large-mnli",
    framework="pt",      # ← force PyTorch
    device=-1            # ← CPU (change to a GPU id if you have one)
)

LABELS = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]


Device set to use cpu
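
For context, the zero-shot pipeline turns each candidate label into an NLI hypothesis through a template (the transformers default is "This example is {}.") and scores how strongly the input text entails it. Bare codes such as "LOW" give the model little to work with, so one option is to pass descriptive labels and map the winner back to its code. The descriptions and the template below are illustrative assumptions, not taken from this notebook:

```python
# Hypothetical descriptive phrasing for each severity code (an assumption).
SEVERITY_DESCRIPTIONS = {
    "LOW": "low severity",
    "MEDIUM": "medium severity",
    "HIGH": "high severity",
    "CRITICAL": "critical severity",
}

# Illustrative template; the pipeline default is "This example is {}."
HYPOTHESIS_TEMPLATE = "This regulatory update is of {}."

def build_hypotheses(template: str, descriptions: dict) -> dict:
    """Show the hypothesis sentence that would be tested for each severity code."""
    return {code: template.format(desc) for code, desc in descriptions.items()}

hypotheses = build_hypotheses(HYPOTHESIS_TEMPLATE, SEVERITY_DESCRIPTIONS)
for code, hypothesis in hypotheses.items():
    print(code, "->", hypothesis)
```

To try this with the real classifier, pass `list(SEVERITY_DESCRIPTIONS.values())` as the candidate labels together with `hypothesis_template=HYPOTHESIS_TEMPLATE`, then map the top description back to its code. Whether this improves the labels on this dataset is untested here.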


## Define Local Classifier

In [13]:
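# Classification function
def classify_severity_local(title: str, summary: str = "") -> str:
    """Return the top-scoring severity label for one policy update."""
    text = f"Title: {title}"
    # The input CSV has no `summary` column, so skip empty or NaN summaries
    if isinstance(summary, str) and summary.strip():
        text += f"\nSummary: {summary}"
    result = classifier(text, LABELS, multi_label=False)
    return result["labels"][0]  # labels are sorted by descending score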
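
Returning the score alongside the label makes low-confidence predictions easier to spot. The sketch below assumes the result shape the zero-shot pipeline returns for a single input (a dict whose `labels` and `scores` lists are sorted by descending score). `classify_with_score` and `fake_classifier` are hypothetical names; the classifier is passed in as an argument so the example runs with a stand-in instead of downloading a model.

```python
from typing import Callable, List, Tuple

def classify_with_score(
    classify: Callable[..., dict], text: str, labels: List[str]
) -> Tuple[str, float]:
    """Return (top label, its score) from a zero-shot style classifier."""
    result = classify(text, labels, multi_label=False)
    return result["labels"][0], float(result["scores"][0])

# Stand-in with the same result shape; it assigns every label an equal score.
def fake_classifier(text, labels, multi_label=False):
    scores = [1.0 / len(labels)] * len(labels)
    return {"sequence": text, "labels": list(labels), "scores": scores}

label, score = classify_with_score(
    fake_classifier, "Title: example", ["LOW", "MEDIUM", "HIGH", "CRITICAL"]
)
print(label, round(score, 2))
```

In the notebook, the real `classifier` object would take the place of `fake_classifier`.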


## Apply It and Write Out the CSV

In [14]:
df["severity"] = df.apply(
    lambda row: classify_severity_local(row["title"], row.get("summary", "")),
    axis=1
)

# Timestamped output filename (absolute path, so no join with BASE_DIR is needed)
ts = datetime.utcnow().strftime("%Y%m%dT%H%M%SZ")
output_path = f"/Volumes/Personal Drive/GitHub/gdpr-ccpa-risk-pipeline/data/processed/cleaned_with_severity_zeroshot_{ts}.csv"

df.to_csv(output_path, index=False)
print("✅ Saved zero-shot classified policies to:", output_path)
df.head()

✅ Saved zero-shot classified policies to: /Volumes/Personal Drive/GitHub/gdpr-ccpa-risk-pipeline/data/processed/cleaned_with_severity_zeroshot_20250718T032242Z.csv


Unnamed: 0,source,title,link,date,severity
0,EDPB,The Italian SA imposes fines of 420 000 EUR on...,https://edpb.europa.eu/news/national-news/2025...,15 July 2025,MEDIUM
1,EDPB,Biometrics for attendance recording. The Itali...,https://edpb.europa.eu/news/national-news/2025...,15 July 2025,MEDIUM
2,EDPB,Swedish SA: Administrative fine against the Eq...,https://edpb.europa.eu/news/national-news/2025...,15 July 2025,MEDIUM
3,EDPB,Targeted modifications of the GDPR: EDPB & EDP...,https://edpb.europa.eu/news/news/2025/targeted...,9 July 2025,MEDIUM
4,EDPB,Irish Supervisory Authority fines TikTok €530 ...,https://edpb.europa.eu/news/news/2025/irish-su...,4 July 2025,MEDIUM
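
Every row in the preview above was labelled MEDIUM, so it is worth checking the label distribution before relying on the new column. A minimal sketch using the standard library; `severity_counts` and `is_degenerate` are hypothetical helpers, and with a DataFrame the input would be `df["severity"]`:

```python
from collections import Counter
from typing import Iterable

def severity_counts(labels: Iterable[str]) -> Counter:
    """Count how often each severity label was predicted."""
    return Counter(labels)

def is_degenerate(labels: Iterable[str]) -> bool:
    """True when at most one distinct label was predicted."""
    return len(set(labels)) <= 1

# The five labels shown in the preview above.
preview = ["MEDIUM"] * 5
print(severity_counts(preview))  # Counter({'MEDIUM': 5})
print(is_degenerate(preview))    # True
```

A single-label result over the full dataset would suggest revisiting the candidate labels or the hypothesis template rather than trusting the output as is.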
