
# Zero-Shot Category Inference for Report Views

This notebook uses a transformer model from Hugging Face to **infer the most likely category** of a report view from its textual description, with no task-specific training.  
We use **zero-shot classification** with the `facebook/bart-large-mnli` model.

Target categories:
- functional
- index
- executive
- informative
- self-service
- other
- master data
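
As a rough illustration of what zero-shot classification does with these categories: NLI-based classifiers pair the input text with one hypothesis sentence per candidate label and rank the labels by how strongly each hypothesis is judged to be entailed. The sketch below mimics the ranking step with made-up numbers. The template string matches the pipeline's default `hypothesis_template`; the logit values and helper names are hypothetical and no model is loaded.

```python
import math

CANDIDATE_LABELS = [
    "functional", "index", "executive", "informative",
    "self-service", "other", "master data",
]

def build_hypotheses(labels, template="This example is {}."):
    """Turn each candidate label into an NLI hypothesis sentence."""
    return [template.format(label) for label in labels]

def softmax(logits):
    """Normalize raw scores so they are positive and sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

hypotheses = build_hypotheses(CANDIDATE_LABELS)

# Hypothetical entailment logits, one per hypothesis (NOT real model output).
fake_entailment_logits = [2.1, -0.3, 0.4, 1.2, -1.0, 0.0, -0.5]
scores = softmax(fake_entailment_logits)

# Rank labels from most to least likely, mirroring the pipeline's sorted result.
ranking = sorted(zip(CANDIDATE_LABELS, scores), key=lambda pair: pair[1], reverse=True)
for label, score in ranking:
    print(f"{label:<12} {score:.3f}")
```

With `multi_label=False`, the scores are normalized across labels in this way, so they compete with one another and sum to 1.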


<a href="https://colab.research.google.com/github/cbadenes/semantic-report-search/blob/main/data/analysis/31_zeroshot_classification.ipynb" target="_parent">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab"/>
</a>


In [15]:

#!pip install -q transformers

from transformers import pipeline
import pandas as pd

pd.set_option("display.max_colwidth", None)
pd.set_option("display.max_columns", 100)
pd.set_option("display.max_rows", 100)

# Load the zero-shot classifier pipeline
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")


Device set to use cpu


In [16]:
def infer_category(input_text, labels, multi_label=False):
    """Return the highest-scoring label for input_text among the candidate labels."""
    result = classifier(input_text, labels, multi_label=multi_label)
    # The pipeline returns labels sorted by descending score
    return result["labels"][0]  # top predicted category

my_text = "The government is planning to increase taxes on imported goods to support local industries."
my_labels = ["Politics", "Economics", "Entertainment", "Sports"]
print(infer_category(input_text=my_text, labels=my_labels))

Politics
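
`infer_category` returns only the top label, which hides how confident the model was. The pipeline result is a dictionary whose `labels` and `scores` lists are sorted from most to least likely, so a small helper can expose the score and fall back to a default label when the top score is low. This is only a sketch: the helper name, the `0.5` threshold, and the fallback label are assumptions, and the sample dictionary is hand-written rather than real model output.

```python
def top_label_with_score(result, min_score=0.5, fallback="other"):
    """Return (label, score); use `fallback` when the top score is below `min_score`."""
    label, score = result["labels"][0], result["scores"][0]
    if score < min_score:
        return fallback, score
    return label, score

# Hand-written stand-in for a pipeline result (values are illustrative only).
sample_result = {
    "sequence": "Quarterly forecast data consolidated on one tab.",
    "labels": ["functional", "informative", "other"],
    "scores": [0.46, 0.38, 0.16],
}

print(top_label_with_score(sample_result))                 # below threshold: fallback label
print(top_label_with_score(sample_result, min_score=0.4))  # threshold met: top label
```

With the real classifier, the helper could be called as `top_label_with_score(classifier(my_text, my_labels))`.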


In [24]:
# Load Excel file with report views
df = pd.read_excel("Reporting_Inventory.xlsx", sheet_name="Views")

# Ensure descriptions are strings (missing values become the string "nan",
# which the filter below excludes)
df["Description"] = df["Description"].astype(str)

# Define candidate labels (categories)
candidate_labels = [
    "functional", "index", "executive", "informative", "self-service", "other", "master data"
]

# Filter views with no assigned category
df_unlabeled = df[
    df["Category"].isna() &
    df["Description"].notna() &
    (df["Description"].str.strip() != "") &
    (df["Description"].str.lower().str.strip() != "nan")
]

df_unlabeled.head(10)

Unnamed: 0,ID Data Product,Report Name,Product Owner,PBIX_File,Report View,Description,Category,Status,Rename,Dimensions,KPIs,Other Terms,Filters,Tags,Priority
182,RPPBI0034,Corporate Market Share - 2024,Raven Jordan,CharacterReport.pbix,STR Forecast Dashboard 2024,"The reports sent by STR every 3 months with forecast data from some markets of %OCC, ADR and RevPar, are consolidated on this tab.",,Productive,,Cities available,"Occupancy, ADR, RevPar",%Chg last 2 forecast,"Forecast Month, Flag STR is Yes, Hotel_Name is not Hotel Puebla Finsa or Hotel Curitiba The Five or Hotel Lisboa Campo Grande","STR Forecast, Corporate Market Share, 2024",Priority 1
183,RPPBI0034,Corporate Market Share - 2024,Raven Jordan,CharacterReport.pbix,STR Forecast Dashboard 2025,"The reports sent by STR every 3 months with forecast data from some markets of %OCC, ADR and RevPar, are consolidated on this tab.",,Productive,,Cities available,"Occupancy, ADR, RevPar",%Chg last 2 forecast,"Forecast Month, Flag STR is Yes, Hotel_Name is not Hotel Puebla Finsa or Hotel Curitiba The Five or Hotel Lisboa Campo Grande","STR Forecast, Corporate Market Share, 2024",Priority 1
259,RPPBI0150,Corporate Market Share - 2025,Matthew Callahan,SameReport.pbix,STR Forecast Dashboard 2025,"The reports sent by STR every 3 months with forecast data from some markets of %OCC, ADR and RevPar, are consolidated on this tab.",,Productive,,Cities available,"Occupancy, ADR, RevPar",%Chg last 2 forecast,"Forecast Month, Flag STR is Yes, Hotel_Name is not Hotel Puebla Finsa or Hotel Curitiba The Five or Hotel Lisboa Campo Grande","STR Forecast, Corporate Market Share",Priority 1
320,RPPBI0173,Daily Revenue Report 2025,Tasha Hall,AboutReport.pbix,Pick Up Channel Detail,DELETED,,,,,,,,,Priority 1
358,RPPBI0062,Price Competitiveness,Nicole Carter,AboutReport.pbix,Booking Criteria,"This view is exclusively for Booking.com,given that they have their offensive criteria. They stablish that a searchis considered offensive when the price difference is greater then 3% and the ranking position is less than 4",,Productive,,"BU, Country, City, Hotel, Brand, META, OTA",,,,,Priority 1
362,RPPBI0062,Price Competitiveness,Nicole Carter,AboutReport.pbix,Page 1,internal,,Internal,,,,,,,Priority 1
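
The preview shows that placeholder descriptions such as `DELETED` and `internal` pass the filter even though they say nothing about the view. One possible refinement is to screen them out before classification. The sketch below encodes an assumption about what counts as a placeholder: the `PLACEHOLDERS` set and the `min_words` cutoff are hypothetical and would need tuning against the real inventory.

```python
# Hypothetical placeholder values; extend after inspecting the real data.
PLACEHOLDERS = {"", "nan", "deleted", "internal", "n/a", "tbd"}

def is_informative(description, min_words=3):
    """Return True when a description looks like real text, not a placeholder."""
    text = str(description).strip()
    if text.lower() in PLACEHOLDERS:
        return False
    return len(text.split()) >= min_words

examples = [
    "DELETED",
    "internal",
    "The reports sent by STR every 3 months are consolidated on this tab.",
]
for text in examples:
    print(f"{is_informative(text)!s:<5} {text}")
```

On the DataFrame, this could be applied as `df_unlabeled[df_unlabeled["Description"].apply(is_informative)]`.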


In [25]:
# Work on an explicit copy so the new column is not assigned to a slice of df
# (avoids pandas' SettingWithCopyWarning)
df_unlabeled = df_unlabeled.copy()

# Apply zero-shot classification using lambda to pass extra arguments
df_unlabeled["Predicted Category"] = df_unlabeled["Description"].apply(
    lambda desc: infer_category(desc, candidate_labels, multi_label=False)
)

# Show results
df_unlabeled[["Report View", "Description", "Predicted Category"]].head(20)

Unnamed: 0,Report View,Description,Predicted Category
182,STR Forecast Dashboard 2024,"The reports sent by STR every 3 months with forecast data from some markets of %OCC, ADR and RevPar, are consolidated on this tab.",functional
183,STR Forecast Dashboard 2025,"The reports sent by STR every 3 months with forecast data from some markets of %OCC, ADR and RevPar, are consolidated on this tab.",functional
259,STR Forecast Dashboard 2025,"The reports sent by STR every 3 months with forecast data from some markets of %OCC, ADR and RevPar, are consolidated on this tab.",functional
320,Pick Up Channel Detail,DELETED,other
358,Booking Criteria,"This view is exclusively for Booking.com,given that they have their offensive criteria. They stablish that a searchis considered offensive when the price difference is greater then 3% and the ranking position is less than 4",executive
362,Page 1,internal,other
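
Several rows share the same description (the three STR forecast views, for example), so the model runs repeatedly on identical text. Since inference can be slow on CPU, classifying each distinct description once and reusing the result may help. A minimal sketch, assuming a `classify` callable that maps one text to one label; `fake_classify` is a stand-in so the example runs without loading the model.

```python
def classify_unique(descriptions, classify):
    """Classify each distinct description once and reuse the result for duplicates."""
    cache = {}
    predictions = []
    for text in descriptions:
        if text not in cache:
            cache[text] = classify(text)  # only reached for unseen texts
        predictions.append(cache[text])
    return predictions

calls = []

def fake_classify(text):
    """Stand-in classifier that records each call (NOT the real model)."""
    calls.append(text)
    return "other" if len(text.split()) < 3 else "functional"

descriptions = [
    "Forecast data consolidated on this tab.",
    "Forecast data consolidated on this tab.",
    "DELETED",
    "Forecast data consolidated on this tab.",
]
predictions = classify_unique(descriptions, fake_classify)
print(predictions)
print(f"{len(calls)} model calls for {len(descriptions)} rows")
```

In the notebook, `classify` could be `lambda d: infer_category(d, candidate_labels)` and `descriptions` the `Description` column.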


In [20]:
# Save results (df_unlabeled holds the predictions computed above)
df_unlabeled.to_csv("views_with_predicted_categories_zeroshot.csv", index=False)
