# Task 1: News Topic Classifier Using BART (Zero-Shot)

## Objective
To classify news headlines into categories (Sports, Politics, Tech, etc.) using a pre-trained Transformer model in a zero-shot setting, i.e. without fine-tuning on labeled headlines.

## Methodology
1.  **Data Generation:** Simulating a dataset of news headlines.
2.  **Model Selection:** Using `facebook/bart-large-mnli`, a BART model fine-tuned on MultiNLI, as a zero-shot classifier.
3.  **Prediction:** Classifying text without explicit training on these specific labels.

In [1]:
# Import necessary libraries
import pandas as pd
import numpy as np
from transformers import pipeline
from sklearn.metrics import classification_report, accuracy_score

print("Libraries Imported Successfully!")


Libraries Imported Successfully!


### Step 1: Data Preparation
We generate a small synthetic dataset of ten hand-written news headlines spread across six categories: Sports, Politics, Tech, Business, Health, and Sci/Tech.

In [2]:
# Generate Synthetic News Data
data = {
    'text': [
        "The football match ended in a draw yesterday.",
        "Government announces new tax reforms for businesses.",
        "Apple releases the new iPhone with AI features.",
        "The stock market crashed due to global inflation.",
        "New health guidelines issued for flu season.",
        "NASA launches a new satellite to Mars.",
        "The cricket world cup final is scheduled for Sunday.",
        "Elections are going to be held next month.",
        "Google introduces a powerful quantum computer.",
        "Oil prices surge as tensions rise in the Middle East."
    ],
    'category': ['Sports', 'Politics', 'Tech', 'Business', 'Health', 'Sci/Tech', 'Sports', 'Politics', 'Tech', 'Business']
}

df = pd.DataFrame(data)
print("Data Preview:")
print(df.head())

Data Preview:
                                                text  category
0      The football match ended in a draw yesterday.    Sports
1  Government announces new tax reforms for busin...  Politics
2    Apple releases the new iPhone with AI features.      Tech
3  The stock market crashed due to global inflation.  Business
4       New health guidelines issued for flu season.    Health
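
With only ten rows, it helps to check how many headlines each category has before reading any metrics. The following is a minimal sketch that uses the standard library instead of pandas; the `categories` list simply repeats the `category` column defined above.

```python
from collections import Counter

# Same labels as the 'category' column in the DataFrame above
categories = ['Sports', 'Politics', 'Tech', 'Business', 'Health',
              'Sci/Tech', 'Sports', 'Politics', 'Tech', 'Business']

# Count how many headlines fall into each category
counts = Counter(categories)
for label, n in sorted(counts.items()):
    print(f"{label:<10}{n}")
```

Every category has just one or two examples, so a single misclassification shifts the per-class scores considerably.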


### Step 2: Load Pre-trained Model
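We use the Hugging Face `pipeline` for zero-shot classification. Because the model was trained on natural language inference, the pipeline can turn each candidate label into a hypothesis (by default "This example is {label}.") and score how strongly the headline entails it. This lets the model assign labels it was never explicitly trained on.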

In [3]:
# Load the model (the first run downloads the weights, which might take a minute)
print("Loading BART-MNLI model...")
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
print("Model Loaded!")

Loading BART-MNLI model...


Device set to use cpu


Model Loaded!
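
To make the zero-shot mechanism more concrete, here is a rough sketch of the scoring step in plain Python. It does not call the model: the entailment logits are made-up numbers for illustration, and `build_hypotheses` and `softmax` are hypothetical helper names, not part of the `transformers` API. The sketch assumes the default single-label mode, where the entailment logits are normalized across the candidate labels.

```python
import math

def build_hypotheses(labels, template="This example is {}."):
    """Turn each candidate label into an NLI hypothesis sentence."""
    return [template.format(label) for label in labels]

def softmax(logits):
    """Normalize logits into scores that sum to 1 (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["Sports", "Politics", "Tech"]
hypotheses = build_hypotheses(labels)

# Hypothetical entailment logits, one per (headline, hypothesis) pair.
# In the real pipeline these come from the NLI model.
entailment_logits = [3.2, -1.0, 0.4]

scores = softmax(entailment_logits)

# Rank labels from most to least likely
ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
for label, score in ranked:
    print(f"{label:<10}{score:.3f}")
```

The label whose hypothesis receives the highest entailment score ends up first in the ranking.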


### Step 3: Prediction & Evaluation
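We define our target labels (`Sports`, `Politics`, `Tech`, `Business`, `Health`, `Sci/Tech`) and let the model predict the category for each headline. Note that `Tech` and `Sci/Tech` overlap, so the model may plausibly assign either one to a technology headline.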
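
For a single input, the classifier returns a dictionary with the keys `sequence`, `labels`, and `scores`, where the labels are sorted from highest to lowest score. The sketch below shows one possible way to read that result and to flag low-confidence predictions. The `top_label` helper, its `min_score` threshold, and the scores in `example` are all hypothetical and chosen only for illustration.

```python
def top_label(result, min_score=0.5, fallback="Uncertain"):
    """Return the best label, or a fallback if its score is too low."""
    label, score = result["labels"][0], result["scores"][0]
    return label if score >= min_score else fallback

# Hypothetical result in the shape the pipeline returns for one headline;
# the scores are invented, not actual model output.
example = {
    "sequence": "NASA launches a new satellite to Mars.",
    "labels": ["Sci/Tech", "Tech", "Business"],
    "scores": [0.48, 0.41, 0.11],
}

print(top_label(example))                 # top score is below the 0.5 threshold
print(top_label(example, min_score=0.4))  # a lower threshold accepts the top label
```

A threshold like this is one way to surface headlines where two overlapping labels split the score.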

In [4]:
# Define candidate labels
candidate_labels = ["Sports", "Politics", "Tech", "Business", "Health", "Sci/Tech"]
predictions = []

print("Running Predictions...")
for text in df['text']:
    res = classifier(text, candidate_labels)
    predictions.append(res['labels'][0])  # Labels are sorted by score, so index 0 is the top prediction

# Evaluation
print("\n--- Model Performance ---")
print("Accuracy:", accuracy_score(df['category'], predictions))
print("\nClassification Report:\n", classification_report(df['category'], predictions))

Running Predictions...

--- Model Performance ---
Accuracy: 0.8

Classification Report:
               precision    recall  f1-score   support

    Business       0.50      0.50      0.50         2
      Health       1.00      1.00      1.00         1
    Politics       1.00      0.50      0.67         2
    Sci/Tech       1.00      1.00      1.00         1
      Sports       1.00      1.00      1.00         2
        Tech       0.67      1.00      0.80         2

    accuracy                           0.80        10
   macro avg       0.86      0.83      0.83        10
weighted avg       0.83      0.80      0.79        10
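
The numbers in the report can be reproduced by hand. The sketch below recomputes accuracy and the per-class precision and recall in plain Python. The report implies one Politics headline predicted as Business and one Business headline predicted as Tech; which specific headlines were misclassified is not shown in the output, so the positions of the two errors in `y_pred` are an assumption.

```python
# True labels, in the same order as the DataFrame
y_true = ['Sports', 'Politics', 'Tech', 'Business', 'Health',
          'Sci/Tech', 'Sports', 'Politics', 'Tech', 'Business']

# Predictions reconstructed from the report's counts.
# ASSUMPTION: the error positions (indices 1 and 3) are illustrative.
y_pred = list(y_true)
y_pred[1] = 'Business'  # a Politics headline predicted as Business
y_pred[3] = 'Tech'      # a Business headline predicted as Tech

def precision_recall(y_true, y_pred, label):
    """Compute precision and recall for a single class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    predicted = sum(p == label for p in y_pred)
    actual = sum(t == label for t in y_true)
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

labels = sorted(set(y_true))
per_class = {label: precision_recall(y_true, y_pred, label) for label in labels}

# Macro average: unweighted mean over the classes
macro_precision = sum(p for p, _ in per_class.values()) / len(labels)
macro_recall = sum(r for _, r in per_class.values()) / len(labels)

print(f"Accuracy: {accuracy:.2f}")
for label, (p, r) in per_class.items():
    print(f"{label:<10} precision={p:.2f} recall={r:.2f}")
print(f"Macro avg  precision={macro_precision:.2f} recall={macro_recall:.2f}")
```

With a support of one or two per class, each value moves in large steps, so these scores are best read as a smoke test and not as a reliable estimate of model quality.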

