## Setup and Load Zero-Shot Pipeline

Load the Hugging Face `zero-shot-classification` pipeline using `facebook/bart-large-mnli`. The model will download on first run and then be cached locally.

In [3]:
from transformers import pipeline


In [4]:
pipe = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")




Device set to use cpu


In [5]:
result = pipe(
    "I love watching cricket.",
    ["sports", "finance", "politics"]
)

print(result)


{'sequence': 'I love watching cricket.', 'labels': ['sports', 'finance', 'politics'], 'scores': [0.9984119534492493, 0.0008899097447283566, 0.0006980590405873954]}
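The returned dictionary keeps `labels` and `scores` aligned and sorted by descending score, so the top prediction sits at index 0. The following is a minimal sketch of reading it, using the output printed above as a literal so that it runs without loading the model; `top_prediction` is a hypothetical helper name, not part of `transformers`.

```python
# Output copied from the cell above, so this runs without loading the model.
result = {
    "sequence": "I love watching cricket.",
    "labels": ["sports", "finance", "politics"],
    "scores": [0.9984119534492493, 0.0008899097447283566, 0.0006980590405873954],
}


def top_prediction(result):
    """Return (label, score) for the highest-scoring candidate label."""
    # Hypothetical helper: relies on labels/scores being sorted by descending score.
    return result["labels"][0], result["scores"][0]


label, score = top_prediction(result)
print(f"{label}: {score:.4f}")

# Pair every label with its score for easier inspection.
label_scores = dict(zip(result["labels"], result["scores"]))
```

With the default single-label setting, the scores behave like a probability distribution over the candidate labels, which is why the three values above sum to approximately 1.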


## Built-in 50-sample News Dataset

The dataset is a small curated list of 50 short news headlines and snippets across five categories: `sports`, `business`, `politics`, `technology`, and `entertainment`. It is used below to evaluate the zero-shot classifier.

The dataset is a list of dictionaries with the fields `text` and `label`.


In [6]:
dataset = [
# SPORTS (10)
{"text": "Manchester United beat Liverpool 2-1 in a thrilling match.", "label": "sports"},
{"text": "The Olympic sprinter set a new national record in the 100m final.", "label": "sports"},
{"text": "Tennis champion withdraws from the tournament due to injury.", "label": "sports"},
{"text": "Local high school basketball team wins the state championship.", "label": "sports"},
{"text": "Cricket captain announces retirement after the World Cup.", "label": "sports"},
{"text": "Star forward transfers to a top European club for a record fee.", "label": "sports"},
{"text": "The marathon drew thousands of runners from across the country.", "label": "sports"},
{"text": "Coach praises team for their defensive performance in the playoffs.", "label": "sports"},
{"text": "Formula 1 driver takes pole position ahead of the grand prix.", "label": "sports"},
{"text": "Swimmer wins gold and breaks the meet record at nationals.", "label": "sports"},

# BUSINESS (10)
{"text": "Tech startup raises $50 million in a Series B funding round.", "label": "business"},
{"text": "The central bank held interest rates steady amid inflation concerns.", "label": "business"},
{"text": "Company reports quarter-on-quarter revenue growth exceeding forecasts.", "label": "business"},
{"text": "Retail giant plans to open 100 new stores this year.", "label": "business"},
{"text": "Oil prices rose after reports of supply disruptions in the region.", "label": "business"},
{"text": "Automaker recalls thousands of vehicles over a safety issue.", "label": "business"},
{"text": "Local bakery expands online delivery service to new neighborhoods.", "label": "business"},
{"text": "Merger talks between two major airlines gain momentum.", "label": "business"},
{"text": "Startup pivots its business model to focus on enterprise customers.", "label": "business"},
{"text": "Cryptocurrency exchange announces new security measures after breach.", "label": "business"},

# POLITICS (10)
{"text": "Parliament passed a new education reform bill after lengthy debate.", "label": "politics"},
{"text": "The president met with foreign leaders to discuss trade agreements.", "label": "politics"},
{"text": "Opposition parties plan to stage protests over proposed tax changes.", "label": "politics"},
{"text": "Local elections saw a record turnout of young voters.", "label": "politics"},
{"text": "Government announced a stimulus package to boost the economy.", "label": "politics"},
{"text": "Senator faces criticism over comments made at a public event.", "label": "politics"},
{"text": "New diplomatic talks aim to reduce tensions between the countries.", "label": "politics"},
{"text": "The mayor unveiled a plan for urban development and housing.", "label": "politics"},
{"text": "Court rules on a landmark case that could affect future legislation.", "label": "politics"},
{"text": "Parliamentary committee to investigate recent policy failures.", "label": "politics"},

# TECHNOLOGY (10)
{"text": "A major software company launched its next-generation AI model.", "label": "technology"},
{"text": "Researchers unveiled a breakthrough in battery technology for EVs.", "label": "technology"},
{"text": "Cybersecurity experts warn of a new phishing campaign targeting users.", "label": "technology"},
{"text": "Mobile phone manufacturer reveals the specs of its flagship device.", "label": "technology"},
{"text": "Open-source project reaches one million downloads worldwide.", "label": "technology"},
{"text": "Cloud provider announces a new region and availability zones.", "label": "technology"},
{"text": "Startups competing to build quantum computing applications.", "label": "technology"},
{"text": "Developers discuss best practices for building reliable APIs.", "label": "technology"},
{"text": "New programming language features aim to improve developer productivity.", "label": "technology"},
{"text": "Researchers publish a paper on AI ethics and fairness in models.", "label": "technology"},

# ENTERTAINMENT (10)
{"text": "Film festival opens with a premiere of an anticipated indie movie.", "label": "entertainment"},
{"text": "Pop star releases a surprise album that tops the charts.", "label": "entertainment"},
{"text": "TV series renewed for another season after strong viewer ratings.", "label": "entertainment"},
{"text": "Actor wins award for best performance at the international ceremony.", "label": "entertainment"},
{"text": "Local theater stages a classic play to sold-out audiences.", "label": "entertainment"},
{"text": "Documentary explores the life of a famous musician.", "label": "entertainment"},
{"text": "Music festival lineup features several headline acts this summer.", "label": "entertainment"},
{"text": "Critics praise the director's bold new approach in the film.", "label": "entertainment"},
{"text": "Celebrity couple announces engagement on social media.", "label": "entertainment"},
{"text": "Box office receipts climb as moviegoers flock to theaters.", "label": "entertainment"},
]

len(dataset)


50
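Beyond the total count, it can be worth checking that every row has the expected fields and that the labels are balanced. The sketch below is one way to do that, under the assumption that rows follow the `text`/`label` layout described above; `label_counts` is a hypothetical helper, and `sample` is a three-row stand-in so the block runs on its own.

```python
from collections import Counter


def label_counts(rows):
    """Count rows per label, checking that each row has exactly the expected fields."""
    for row in rows:
        if set(row) != {"text", "label"}:
            raise ValueError(f"unexpected fields: {sorted(row)}")
    return Counter(row["label"] for row in rows)


# Stand-in rows copied from the dataset above; pass the full `dataset` in the notebook.
sample = [
    {"text": "Cricket captain announces retirement after the World Cup.", "label": "sports"},
    {"text": "Merger talks between two major airlines gain momentum.", "label": "business"},
    {"text": "Swimmer wins gold and breaks the meet record at nationals.", "label": "sports"},
]

counts = label_counts(sample)
print(counts)
```

Called on the full `dataset`, each of the five labels should come back with a count of 10, matching the section comments in the list.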

In [8]:
import random
random.shuffle(dataset)
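Because `random.shuffle` is called without a seed, the row order differs on every run, so the first rows displayed later will vary even though overall accuracy should not depend on order. If a repeatable order is wanted, a seeded generator is one option. This is a sketch only: the seed value and the `shuffled_copy` helper are assumptions, not part of the original notebook.

```python
import random


def shuffled_copy(rows, seed=42):
    """Return a shuffled copy of rows; the same seed always gives the same order."""
    rng = random.Random(seed)  # private generator, leaves the global random state alone
    out = list(rows)           # copy, so the caller's list is not modified
    rng.shuffle(out)
    return out


rows = list(range(10))  # stand-in for `dataset`
first = shuffled_copy(rows)
second = shuffled_copy(rows)
print(first == second)
```

Returning a copy rather than shuffling in place also keeps the original category-ordered list available for inspection.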


## Evaluation (50 samples)
Run the zero-shot pipeline on each sample with the candidate labels `['sports','business','politics','technology','entertainment']`. For each sample, record the top predicted label and its score, then compute accuracy by comparing the predictions to the ground-truth labels.

In [9]:
candidate_labels = ['sports','business','politics','technology','entertainment']

preds = []
correct = 0

for item in dataset:
    text = item['text']
    true_label = item['label']
    result = pipe(text, candidate_labels)
    predicted = result['labels'][0]
    score = result['scores'][0]
    preds.append({
        'text': text,
        'true_label': true_label,
        'predicted': predicted,
        'score': score
    })
    if predicted == true_label:
        correct += 1

accuracy = correct / len(dataset)
print(f"Total samples: {len(dataset)}, Correct: {correct}, Accuracy: {accuracy:.4f}")

# Show a few sample predictions
import pandas as pd
df = pd.DataFrame(preds)
df.head(10)


Total samples: 50, Correct: 47, Accuracy: 0.9400


| | text | true_label | predicted | score |
|---|---|---|---|---|
| 0 | Critics praise the director's bold new approac... | entertainment | entertainment | 0.61804 |
| 1 | New diplomatic talks aim to reduce tensions be... | politics | politics | 0.870285 |
| 2 | Tech startup raises $50 million in a Series B ... | business | technology | 0.849775 |
| 3 | The mayor unveiled a plan for urban developmen... | politics | politics | 0.691252 |
| 4 | Cryptocurrency exchange announces new security... | business | business | 0.603959 |
| 5 | Box office receipts climb as moviegoers flock ... | entertainment | business | 0.838008 |
| 6 | Film festival opens with a premiere of an anti... | entertainment | entertainment | 0.847542 |
| 7 | A major software company launched its next-gen... | technology | technology | 0.876697 |
| 8 | Documentary explores the life of a famous musi... | entertainment | entertainment | 0.899127 |
| 9 | Open-source project reaches one million downlo... | technology | technology | 0.761734 |
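A single accuracy figure hides which categories get confused with each other. The sketch below breaks results down per true label and lists the mismatches, using only the ten rows displayed above as input so that it runs without the model; `per_label_accuracy` is a hypothetical helper.

```python
from collections import defaultdict

# (true_label, predicted) pairs copied from the ten rows displayed above.
pairs = [
    ("entertainment", "entertainment"),
    ("politics", "politics"),
    ("business", "technology"),
    ("politics", "politics"),
    ("business", "business"),
    ("entertainment", "business"),
    ("entertainment", "entertainment"),
    ("technology", "technology"),
    ("entertainment", "entertainment"),
    ("technology", "technology"),
]


def per_label_accuracy(pairs):
    """Map each true label to (correct, total, accuracy)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for true_label, predicted in pairs:
        total[true_label] += 1
        correct[true_label] += int(true_label == predicted)
    return {lbl: (correct[lbl], total[lbl], correct[lbl] / total[lbl]) for lbl in total}


stats = per_label_accuracy(pairs)
confusions = [(t, p) for t, p in pairs if t != p]  # (true, predicted) mismatches

for lbl, (c, n, acc) in stats.items():
    print(f"{lbl}: {c}/{n} = {acc:.2f}")
print("confusions:", confusions)
```

In the notebook, the same breakdown over all 50 samples could be obtained by building the pairs from the results list, for example `[(p['true_label'], p['predicted']) for p in preds]`. Both mismatches visible in these ten rows involve headlines that plausibly straddle two categories: a startup funding round labelled `technology` and box office receipts labelled `business`.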
