
# Zero-Shot Category Inference for Report Views

This notebook uses a transformer-based model from HuggingFace to **infer the most likely category** of a report view from its textual description, without any task-specific training.  
It relies on **zero-shot classification** with the `facebook/bart-large-mnli` model.

Target categories:
- functional
- index
- executive
- informative
- self-service
- other
- master data
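
As background, NLI-based zero-shot classifiers such as `facebook/bart-large-mnli` score each candidate label by turning it into a hypothesis sentence and checking whether the description entails it. The sketch below only illustrates how those premise/hypothesis pairs are formed. The template string is assumed to be the default used by the HuggingFace zero-shot pipeline (`"This example is {}."`); `build_nli_pairs` and the sample description are hypothetical and introduced here purely for illustration.

```python
# Illustrative only: shows how label names become NLI hypotheses.
# `build_nli_pairs` is a hypothetical helper, not part of transformers.
HYPOTHESIS_TEMPLATE = "This example is {}."  # assumed pipeline default


def build_nli_pairs(description, labels, template=HYPOTHESIS_TEMPLATE):
    """Return one (premise, hypothesis) pair per candidate label."""
    return [(description, template.format(label)) for label in labels]


pairs = build_nli_pairs(
    "Monthly KPI summary for senior management",  # made-up description
    ["executive", "master data"],
)
for premise, hypothesis in pairs:
    print(f"{premise!r} -> {hypothesis!r}")
```

The model then assigns an entailment score to each pair, and the label whose hypothesis is most strongly entailed ranks first.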


<a href="https://colab.research.google.com/github/cbadenes/semantic-report-search/blob/main/data/analysis/31_zeroshot_classification.ipynb" target="_parent">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab"/>
</a>


In [1]:

#!pip install -q transformers

from transformers import pipeline
import pandas as pd

# Load the zero-shot classifier pipeline
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")


TypeError: unhashable type: 'list'

In [None]:

# Load Excel file with report views
df = pd.read_excel("../raw/Reporting_Inventory.xlsx", sheet_name="Views")

# Ensure descriptions are strings (note: missing values become the literal string "nan")
df["Description"] = df["Description"].astype(str)

# Define candidate labels (categories)
candidate_labels = [
    "functional", "index", "executive", "informative",
    "self-service", "other", "master data"
]


In [None]:

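def infer_category(description, labels=candidate_labels, multi_label=False):
    """Return the top-scoring label for a single description."""
    # Use the `labels` argument so callers can override the default categories
    result = classifier(description, labels, multi_label=multi_label)
    return result["labels"][0]  # labels are sorted by descending score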
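
Because the top label is returned even when the model is unsure, it can help to keep the score as well. The sketch below assumes the result has the shape the zero-shot pipeline returns for a single input: a dict with `labels` and `scores` sorted from most to least likely. The helper `top_label_with_score`, the `min_score` threshold of 0.5, and the fallback to `"other"` are illustrative choices, not part of the original notebook.

```python
def top_label_with_score(result, min_score=0.5, fallback="other"):
    """Pick the best label from a zero-shot result dict.

    Falls back to `fallback` when the best score is below `min_score`.
    Both defaults are assumptions chosen for illustration.
    """
    label, score = result["labels"][0], result["scores"][0]
    if score < min_score:
        return fallback, score
    return label, score


# Hand-written result in the pipeline's output shape (not real model output)
example = {
    "sequence": "List of cost centers and their owners",
    "labels": ["master data", "informative", "index"],
    "scores": [0.62, 0.25, 0.13],
}
print(top_label_with_score(example))
print(top_label_with_score(example, min_score=0.7))
```

A suitable threshold depends on the data, so it is worth inspecting the score distribution on a sample before fixing a value.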


In [None]:

# Filter views with no assigned category
df_unlabeled = df[df["Category"].isna()].copy()

# Apply zero-shot classification
df_unlabeled["Predicted Category"] = df_unlabeled["Description"].apply(infer_category)

# Show results
df_unlabeled[["Report View", "Description", "Predicted Category"]].head()
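
Inventories often repeat the same description across views, and each classifier call is comparatively slow. The following is a minimal sketch of memoizing the classification function so that each distinct description is scored only once. `make_cached` is a hypothetical helper, and `fake_infer` is a stand-in for `infer_category` so that the example runs without downloading a model.

```python
from functools import lru_cache


def make_cached(infer_fn, maxsize=None):
    """Wrap a description -> category function with an in-memory cache."""
    return lru_cache(maxsize=maxsize)(infer_fn)


calls = []


def fake_infer(description):
    # Stand-in for infer_category; records each real invocation.
    calls.append(description)
    return "executive" if "management" in description else "other"


cached_infer = make_cached(fake_infer)
descriptions = ["KPIs for management", "Misc", "KPIs for management"]
predictions = [cached_infer(d) for d in descriptions]
print(predictions, len(calls))
```

Under the same assumption, the cached function could be passed to `.apply` in place of `infer_category`, since the descriptions are plain strings and therefore hashable.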


In [None]:

df_unlabeled.to_csv("views_with_predicted_categories_zeroshot.csv", index=False)
