
# Zero-Shot Category Inference for Report Views

This notebook uses a transformer-based model from Hugging Face to **infer the most likely category** of a report view from its textual description, without any task-specific training.  
We use **zero-shot classification** with the `facebook/bart-large-mnli` model.

Target categories:
- functional
- index
- executive
- informative
- self-service
- other
- master data
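
As a rough illustration of what zero-shot classification does: an NLI model scores the description (the premise) against one hypothesis sentence per candidate label, and the entailment scores are then normalized across labels. The sketch below only mimics that flow in plain Python. The logits are made-up numbers, not model output, and `build_hypotheses` and `softmax` are hypothetical helpers. The template string mirrors the default `hypothesis_template` of the `transformers` zero-shot pipeline.

```python
import math

CANDIDATE_LABELS = ["functional", "index", "executive", "informative",
                    "self-service", "other", "master data"]


def build_hypotheses(labels, template="This example is {}."):
    """Turn each candidate label into an NLI hypothesis sentence."""
    return [template.format(label) for label in labels]


def softmax(values):
    """Normalize raw scores into probabilities that sum to 1."""
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


hypotheses = build_hypotheses(CANDIDATE_LABELS)

# Hypothetical entailment logits, one per hypothesis (NOT real model output).
fake_entailment_logits = [2.1, -0.5, 0.3, 1.2, -1.0, 0.0, -0.8]
scores = softmax(fake_entailment_logits)

# Rank labels by score, highest first, as the pipeline's result does.
ranking = sorted(zip(CANDIDATE_LABELS, scores), key=lambda pair: pair[1], reverse=True)
print(hypotheses[0])
print(ranking[0][0])
```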


<a href="https://colab.research.google.com/github/cbadenes/semantic-report-search/blob/main/data/analysis/31_zeroshot_classification.ipynb" target="_parent">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab"/>
</a>


In [15]:

#!pip install -q transformers

from transformers import pipeline
import pandas as pd

pd.set_option("display.max_colwidth", None)
pd.set_option("display.max_columns", 100)
pd.set_option("display.max_rows", 100)

# Load the zero-shot classifier pipeline
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")


Device set to use cpu


In [16]:
def infer_category(input_text, labels, multi_label=False):
    """Return the highest-scoring label for input_text among labels."""
    result = classifier(input_text, labels, multi_label=multi_label)
    return result["labels"][0]  # labels are sorted by score, so the first is the top prediction

my_text = "The government is planning to increase taxes on imported goods to support local industries."
my_labels = ["Politics", "Economics", "Entertainment", "Sports"]
print(infer_category(input_text=my_text, labels=my_labels))

Politics
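
The pipeline's result also carries a `scores` list aligned with `labels`, sorted from most to least likely. A variant of `infer_category` could return the score as well and fall back to a default label when the model is not confident. This is a minimal sketch: `infer_category_with_score`, the `threshold` value, and the `fake_classifier` stub (which stands in for the real pipeline so the snippet runs without downloading a model) are all illustrative assumptions.

```python
def infer_category_with_score(classifier, input_text, labels,
                              threshold=0.5, fallback="other"):
    """Return (label, score); use `fallback` when the top score is below `threshold`."""
    result = classifier(input_text, labels, multi_label=False)
    top_label, top_score = result["labels"][0], result["scores"][0]
    if top_score < threshold:
        return fallback, top_score
    return top_label, top_score


def fake_classifier(text, labels, multi_label=False):
    """Hypothetical stub mimicking the pipeline's output shape with made-up scores."""
    n = len(labels)
    weights = [n - i for i in range(n)]  # first label gets the largest weight
    total = sum(weights)
    return {
        "sequence": text,
        "labels": list(labels),
        "scores": [w / total for w in weights],
    }


demo_labels = ["Politics", "Economics", "Entertainment", "Sports"]
confident = infer_category_with_score(fake_classifier, "some text", demo_labels, threshold=0.3)
unsure = infer_category_with_score(fake_classifier, "some text", demo_labels)
print(confident, unsure)
```

In the notebook, the real `classifier` object would be passed in place of the stub, and a suitable threshold would have to be chosen by inspecting actual scores.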


In [24]:
# Load Excel file with report views
df = pd.read_excel("Reporting_Inventory.xlsx", sheet_name="Views")

# Ensure descriptions are strings
df["Description"] = df["Description"].astype(str)

# Define candidate labels (categories)
candidate_labels = [
    "functional", "index", "executive", "informative", "self-service", "other", "master data"
]

# Filter views with no assigned category
df_unlabeled = df[
    df["Category"].isna() &
    df["Description"].notna() &
    (df["Description"].str.strip() != "") &
    (df["Description"].str.lower().str.strip() != "nan")
]

df_unlabeled.head(10)

Unnamed: 0,ID Data Product,Report Name,Product Owner,PBIX_File,Report View,Description,Category,Status,Rename,Dimensions,KPIs,Other Terms,Filters,Tags,Priority
182,RPPBI0034,Corporate Market Share - 2024,Raven Jordan,CharacterReport.pbix,STR Forecast Dashboard 2024,"The reports sent by STR every 3 months with forecast data from some markets of %OCC, ADR and RevPar, are consolidated on this tab.",,Productive,,Cities available,"Occupancy, ADR, RevPar",%Chg last 2 forecast,"Forecast Month, Flag STR is Yes, Hotel_Name is not Hotel Puebla Finsa or Hotel Curitiba The Five or Hotel Lisboa Campo Grande","STR Forecast, Corporate Market Share, 2024",Priority 1
183,RPPBI0034,Corporate Market Share - 2024,Raven Jordan,CharacterReport.pbix,STR Forecast Dashboard 2025,"The reports sent by STR every 3 months with forecast data from some markets of %OCC, ADR and RevPar, are consolidated on this tab.",,Productive,,Cities available,"Occupancy, ADR, RevPar",%Chg last 2 forecast,"Forecast Month, Flag STR is Yes, Hotel_Name is not Hotel Puebla Finsa or Hotel Curitiba The Five or Hotel Lisboa Campo Grande","STR Forecast, Corporate Market Share, 2024",Priority 1
259,RPPBI0150,Corporate Market Share - 2025,Matthew Callahan,SameReport.pbix,STR Forecast Dashboard 2025,"The reports sent by STR every 3 months with forecast data from some markets of %OCC, ADR and RevPar, are consolidated on this tab.",,Productive,,Cities available,"Occupancy, ADR, RevPar",%Chg last 2 forecast,"Forecast Month, Flag STR is Yes, Hotel_Name is not Hotel Puebla Finsa or Hotel Curitiba The Five or Hotel Lisboa Campo Grande","STR Forecast, Corporate Market Share",Priority 1
320,RPPBI0173,Daily Revenue Report 2025,Tasha Hall,AboutReport.pbix,Pick Up Channel Detail,DELETED,,,,,,,,,Priority 1
358,RPPBI0062,Price Competitiveness,Nicole Carter,AboutReport.pbix,Booking Criteria,"This view is exclusively for Booking.com,given that they have their offensive criteria. They stablish that a searchis considered offensive when the price difference is greater then 3% and the ranking position is less than 4",,Productive,,"BU, Country, City, Hotel, Brand, META, OTA",,,,,Priority 1
362,RPPBI0062,Price Competitiveness,Nicole Carter,AboutReport.pbix,Page 1,internal,,Internal,,,,,,,Priority 1


In [25]:
# Work on an explicit copy so the column assignment below does not raise SettingWithCopyWarning
df_unlabeled = df_unlabeled.copy()

# Apply zero-shot classification using a lambda to pass extra arguments
df_unlabeled["Predicted Category"] = df_unlabeled["Description"].apply(
    lambda desc: infer_category(desc, candidate_labels, multi_label=False)
)

# Show results
df_unlabeled[["Report View", "Description", "Predicted Category"]].head(20)


Unnamed: 0,Report View,Description,Predicted Category
182,STR Forecast Dashboard 2024,"The reports sent by STR every 3 months with forecast data from some markets of %OCC, ADR and RevPar, are consolidated on this tab.",functional
183,STR Forecast Dashboard 2025,"The reports sent by STR every 3 months with forecast data from some markets of %OCC, ADR and RevPar, are consolidated on this tab.",functional
259,STR Forecast Dashboard 2025,"The reports sent by STR every 3 months with forecast data from some markets of %OCC, ADR and RevPar, are consolidated on this tab.",functional
320,Pick Up Channel Detail,DELETED,other
358,Booking Criteria,"This view is exclusively for Booking.com,given that they have their offensive criteria. They stablish that a searchis considered offensive when the price difference is greater then 3% and the ranking position is less than 4",executive
362,Page 1,internal,other
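
Several rows in this output share the same description, so the model runs repeatedly on identical text. On CPU, that cost could be avoided by classifying each distinct description once and reusing the result. A minimal sketch, where `classify_unique` is a hypothetical helper, `fake_infer` stands in for `infer_category`, and the sample descriptions are shortened placeholders:

```python
def classify_unique(texts, infer):
    """Call `infer` once per distinct text and return a text -> label cache."""
    cache = {}
    for text in texts:
        if text not in cache:
            cache[text] = infer(text)
    return cache


calls = []  # records every text actually sent to the (fake) model


def fake_infer(text):
    """Hypothetical stand-in for infer_category; returns made-up labels."""
    calls.append(text)
    return "other" if text in {"DELETED", "internal"} else "functional"


descriptions = ["STR forecast data", "STR forecast data", "DELETED",
                "internal", "STR forecast data"]
cache = classify_unique(descriptions, fake_infer)
predicted = [cache[text] for text in descriptions]
print(len(calls), predicted)
```

With the notebook's DataFrame, the same cache could be applied with `df_unlabeled["Description"].map(cache)`.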


In [20]:
# Save Results
df_unlabeled.to_csv("views_with_predicted_categories_zeroshot.csv", index=False)


# 2. Extend candidates

In [30]:
# Build the text to classify by concatenating the relevant fields
df["TextToClassify"] = (
    df["Report View"].fillna("").astype(str).str.strip() + " - " +
    df["Report Name"].fillna("").astype(str).str.strip() + " - " +
    df["Description"].fillna("").astype(str).str.strip()
)

df_extended_sample = df.head(10).copy()

df_extended_sample.head(10)

Unnamed: 0,ID Data Product,Report Name,Product Owner,PBIX_File,Report View,Description,Category,Status,Rename,Dimensions,KPIs,Other Terms,Filters,Tags,Priority,TextToClassify
0,RPPBI0032,Feeder Market - 2024,Jonathan Shields,LifeReport.pbix,CRITERIA,Methodolody and definition of the algorithim of Feeder Market,Informative,Productive,,,,,,,Priority 1,CRITERIA - Feeder Market - 2024 - Methodolody and definition of the algorithim of Feeder Market
1,RPPBI0032,Feeder Market - 2024,Jonathan Shields,LifeReport.pbix,DESTINATION_OF_FEEDER_MARKETS,View focused on understand the performance by hotel for a specific feeder market o selection of feeder marktes.,Functional,Productive,,"Hotel, month, Feeder Market, Segment, Channel Mix, Room Type","Total Revenue, Room Revenue, RN, Lead Time, Lenght of Stay, AOV, ADR, ADR Net, %Cost",,,,Priority 1,DESTINATION_OF_FEEDER_MARKETS - Feeder Market - 2024 - View focused on understand the performance by hotel for a specific feeder market o selection of feeder marktes.
2,RPPBI0032,Feeder Market - 2024,Jonathan Shields,LifeReport.pbix,EXECUTIVE VIEW,Global view to understand Feeder Market Performance compared to previous years diferentiating between domestic and international,Executive,Productive,,"Hotel, month, Feeder Market, Segment, Channel Mix, Room Type","Total Revenue, Room Revenue, RN, Lead Time, Lenght of Stay, AOV, ADR, ADR Net, %Cost",,,,Priority 1,EXECUTIVE VIEW - Feeder Market - 2024 - Global view to understand Feeder Market Performance compared to previous years diferentiating between domestic and international
3,RPPBI0032,Feeder Market - 2024,Jonathan Shields,LifeReport.pbix,FEEDER MARKET FLOWS,"View focused on understanding the booking behaviour by Feeder Market. It allows to understand when, where and through which channels and segments are producing the different feeder markets for a selected booking period. Besides, it shows the flow (Feeder Market to Destination) by contribution of total revenue",Functional,Productive,,"Hotel, month, Feeder Market, Segment, Channel Mix, Room Type, Booked Year and Booked month","Total Revenue, Room Revenue, RN, Lead Time, Lenght of Stay, AOV, ADR, ADR Net, %Cost",,,,Priority 1,"FEEDER MARKET FLOWS - Feeder Market - 2024 - View focused on understanding the booking behaviour by Feeder Market. It allows to understand when, where and through which channels and segments are producing the different feeder markets for a selected booking period. Besides, it shows the flow (Feeder Market to Destination) by contribution of total revenue"
4,RPPBI0032,Feeder Market - 2024,Jonathan Shields,LifeReport.pbix,FEEDER_MARKET_DETAIL,"Detail view of Feeder Markets by Destination including more indepth view by channel, and including Top_Agency and Top_Company information",Functional,Productive,,"Hotel, month, Feeder Market, Segment, Channel Mix, Room Type","Total Revenue, Room Revenue, RN, Lead Time, Lenght of Stay, AOV, ADR, ADR Net, %Cost",,,,Priority 1,"FEEDER_MARKET_DETAIL - Feeder Market - 2024 - Detail view of Feeder Markets by Destination including more indepth view by channel, and including Top_Agency and Top_Company information"
5,RPPBI0032,Feeder Market - 2024,Jonathan Shields,LifeReport.pbix,FEEDER_MARKETS_OF_DESTINATION,VIew focused on understanding the feeder markets producing at a specific Destination,Functional,Productive,,"Hotel, month, Feeder Market, Segment, Channel Mix, Room Type","Total Revenue, Room Revenue, RN, Lead Time, Lenght of Stay, AOV, ADR, ADR Net, %Cost",,,,Priority 1,FEEDER_MARKETS_OF_DESTINATION - Feeder Market - 2024 - VIew focused on understanding the feeder markets producing at a specific Destination
6,RPPBI0032,Feeder Market - 2024,Jonathan Shields,LifeReport.pbix,MENU,Index page with interactive buttons to other views.,Index,Productive,,,,,,,Priority 1,MENU - Feeder Market - 2024 - Index page with interactive buttons to other views.
7,RPPBI0032,Feeder Market - 2024,Jonathan Shields,LifeReport.pbix,OE MARKET INSIGHTS,Benchmark by Destination. Outside information is provided by Oxford Economics providing a summary developed by AI,Functional,Productive,,"Country, City","Total Spending, Total Revenue, Arrivals, Nights,","Outbound, Inbound",,,Priority 1,OE MARKET INSIGHTS - Feeder Market - 2024 - Benchmark by Destination. Outside information is provided by Oxford Economics providing a summary developed by AI
8,RPPBI0032,Feeder Market - 2024,Jonathan Shields,LifeReport.pbix,TARGETS FOLLOW UP,"View that provides performance vs budget at a feeder Market level. It allows to drill down by destination, segment and channel",Functional,Productive,,"Hotel, month, Feeder Market, Segment, Channel Mix","Total Revenue, Room Revenue, RN,ADR",Budget,,,Priority 1,"TARGETS FOLLOW UP - Feeder Market - 2024 - View that provides performance vs budget at a feeder Market level. It allows to drill down by destination, segment and channel"
9,RPPBI0154,Feeder Market - 2025,Jonathan Shields,OfficerReport.pbix,CRITERIA,Methodolody and definition of the algorithim of Feeder Market,Informative,Productive,,"Hotel, month, Feeder Market, Segment, Channel Mix, Room Type","Total Revenue, Room Revenue, RN, Lead Time, Lenght of Stay, AOV, ADR, ADR Net, %Cost",,,,Priority 1,CRITERIA - Feeder Market - 2025 - Methodolody and definition of the algorithim of Feeder Market
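
When one of the three fields is empty, the plain concatenation leaves a dangling ` - ` separator in the text. A small helper that joins only the non-empty parts avoids this. `join_fields` and `_is_missing` are hypothetical names, and the sample values are taken from the first row above.

```python
import math


def _is_missing(value):
    """True for None and for float NaN (how pandas represents empty cells)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def join_fields(*fields, sep=" - "):
    """Join the non-empty, stripped fields with `sep`."""
    parts = [str(f).strip() for f in fields if not _is_missing(f)]
    return sep.join(p for p in parts if p)


full = join_fields(
    "CRITERIA",
    "Feeder Market - 2024",
    "Methodolody and definition of the algorithim of Feeder Market",
)
partial = join_fields("MENU", None, "   ")
print(full)
print(partial)
```

Applied row by row, this could look like `df.apply(lambda row: join_fields(row["Report View"], row["Report Name"], row["Description"]), axis=1)`.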


In [31]:
# Apply the classification using the new combined text
df_extended_sample["Predicted Category"] = df_extended_sample["TextToClassify"].apply(
    lambda text: infer_category(text, candidate_labels, multi_label=False)
)

# Show results
df_extended_sample[["Report View", "Description", "Category", "Predicted Category"]].head(20)

Unnamed: 0,Report View,Description,Category,Predicted Category
0,CRITERIA,Methodolody and definition of the algorithim of Feeder Market,Informative,informative
1,DESTINATION_OF_FEEDER_MARKETS,View focused on understand the performance by hotel for a specific feeder market o selection of feeder marktes.,Functional,informative
2,EXECUTIVE VIEW,Global view to understand Feeder Market Performance compared to previous years diferentiating between domestic and international,Executive,executive
3,FEEDER MARKET FLOWS,"View focused on understanding the booking behaviour by Feeder Market. It allows to understand when, where and through which channels and segments are producing the different feeder markets for a selected booking period. Besides, it shows the flow (Feeder Market to Destination) by contribution of total revenue",Functional,functional
4,FEEDER_MARKET_DETAIL,"Detail view of Feeder Markets by Destination including more indepth view by channel, and including Top_Agency and Top_Company information",Functional,informative
5,FEEDER_MARKETS_OF_DESTINATION,VIew focused on understanding the feeder markets producing at a specific Destination,Functional,informative
6,MENU,Index page with interactive buttons to other views.,Index,index
7,OE MARKET INSIGHTS,Benchmark by Destination. Outside information is provided by Oxford Economics providing a summary developed by AI,Functional,other
8,TARGETS FOLLOW UP,"View that provides performance vs budget at a feeder Market level. It allows to drill down by destination, segment and channel",Functional,functional
9,CRITERIA,Methodolody and definition of the algorithim of Feeder Market,Informative,informative
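
Because these ten rows already carry a manual `Category`, the predictions can be scored against it. The existing labels are capitalized while the candidate labels are lowercase, so the comparison has to ignore case. The pairs below are copied from the table above; `accuracy` and `misses` are hypothetical helpers.

```python
# (actual Category, Predicted Category) for the ten sample rows shown above
pairs = [
    ("Informative", "informative"),
    ("Functional", "informative"),
    ("Executive", "executive"),
    ("Functional", "functional"),
    ("Functional", "informative"),
    ("Functional", "informative"),
    ("Index", "index"),
    ("Functional", "other"),
    ("Functional", "functional"),
    ("Informative", "informative"),
]


def _same(actual, predicted):
    """Case-insensitive, whitespace-tolerant label comparison."""
    return actual.strip().lower() == predicted.strip().lower()


def accuracy(pairs):
    """Share of rows whose prediction matches the manual category."""
    return sum(1 for a, p in pairs if _same(a, p)) / len(pairs)


def misses(pairs):
    """Return the (actual, predicted) pairs that disagree."""
    return [(a, p) for a, p in pairs if not _same(a, p)]


print(accuracy(pairs))  # 0.6
print(misses(pairs))
```

On this small sample, six of the ten predictions match, and all four misses are views manually labeled Functional.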
