<div class="alert alert-success"><h1>Topic Classification with Pretrained Models in Python</h1></div>

In this tutorial, we illustrate **Topic Classification** (also known as topic labeling or content classification), a technique that assigns predefined topics or categories, such as weather, sports, or finance, to a given piece of text. This approach is particularly useful for organizing and filtering large streams of textual data, such as news feeds or social media posts.

## Learning Objectives
By the end of this tutorial, you will:
+ **Build a topic classification pipeline:** Create an end-to-end pipeline that leverages pretrained models for topic labeling.
+ **Interpret model outputs:** Analyze and understand the model’s predictions, including the assigned topic labels and their confidence scores.

## Prerequisites
Before we begin, please ensure that you have:
+ A working knowledge of Python, including variables, functions, loops, and basic object-oriented programming.
+ Familiarity with deep learning model development in Python using Keras and TensorFlow.
+ A Python (version 3.x) environment with the `tensorflow`, `keras`, `ipywidgets`, and `transformers` packages installed.

Let's also reduce the log verbosity of the `transformers` package so that only errors are reported, not warnings or informational messages.

In [None]:
from transformers import logging
logging.set_verbosity_error()

<hr>

## 1. Instantiate a Pipeline for Topic Classification
First, we import the `pipeline` function from the Hugging Face `transformers` package. Then we instantiate a pipeline object called `topics`, specifying `"text-classification"` as the task.

Note that Hugging Face does not have a dedicated pipeline task for topic classification. Instead, we use the generic `"text-classification"` pipeline with a <u>model specifically trained for topic classification</u>.

Topic classification differs from sentiment analysis in that it assigns one or more topics (rather than a polarity) to a text sample. In this tutorial, we will use the `classla/multilingual-IPTC-news-topic-classifier` model from Hugging Face. This model is specifically designed to classify news articles or short texts into topics based on the **IPTC taxonomy** (see https://www.iptc.org/std/NewsCodes/treeview/mediatopic/mediatopic-en-GB.html). The taxonomy covers a wide range of subjects, including categories like "economy, business and finance," "sport," "science and technology," and "environment."

In [None]:
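from transformers import pipeline
# Hugging Face Hub ID of the IPTC news topic classifier
model_name = "classla/multilingual-IPTC-news-topic-classifier"
# The model is downloaded and cached the first time the pipeline is created
topics = pipeline(task="text-classification", model=model_name)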
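
By default, a `"text-classification"` pipeline reports only the highest-scoring label for each input. Since topic classification may assign more than one topic to a text, the sketch below shows one way to ask for several ranked candidates through the pipeline's `top_k` argument. The helper name `rank_topics` and the stand-in classifier `fake_topics` (with made-up scores) are assumptions for illustration, so that the snippet runs without downloading the model; in this notebook, the real `topics` pipeline would be passed in instead.

```python
from typing import Callable, Dict, List, Tuple


def rank_topics(classifier: Callable, text: str, k: int = 3) -> List[Tuple[str, float]]:
    """Return up to k (label, score) pairs for one text, best first."""
    # For a single string and an explicit top_k, a text-classification
    # pipeline returns a list of {'label': ..., 'score': ...} dictionaries.
    predictions = classifier(text, top_k=k)
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [(p["label"], p["score"]) for p in ranked]


def fake_topics(text: str, top_k: int = 1) -> List[Dict[str, object]]:
    """Hypothetical stand-in for the real pipeline; the scores are made up."""
    scores = [
        {"label": "sport", "score": 0.90},
        {"label": "human interest", "score": 0.06},
        {"label": "society", "score": 0.04},
    ]
    return scores[:top_k]


headline = "Local football team wins national championship."
for label, score in rank_topics(fake_topics, headline, k=2):
    print(f"{label}: {score:.2f}")
```

With the real pipeline, the equivalent call would be `rank_topics(topics, headline, k=2)`. Looking at the runner-up labels can help when a text plausibly belongs to more than one category.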

<div class="alert alert-info"><b>Note:</b> For guidance on how to choose the right pretrained model for a specific task from the Hugging Face Model Hub, watch the course video titled <b>"Choosing the right Model from the Hugging Face Hub"</b>.</div>

## 2. Run Topic Classification on Sample Text
Next, we prepare a set of sample texts that resemble news headlines or short news blurbs. We then pass these texts to the pipeline, which outputs the predicted topic labels along with corresponding confidence scores.

In [None]:
sample_texts = [
    "Government announces new economic stimulus plan to boost small businesses.",
    "Local football team wins national championship after a thrilling final match.",
    "Scientists discover potential treatment for a rare genetic disease.",
    "Tech giant reveals latest smartphone model with advanced AI features.",
    "Widespread protests erupt over environmental concerns and climate policies.",
]

results = topics(sample_texts)
for text, result in zip(sample_texts, results):
    print(f"Text: {text}\nPrediction: {result}\n")

For each input text, the pipeline returns a dictionary with the keys `'label'` and `'score'`, so `results` is a list containing one dictionary per text. The `'label'` is the predicted topic (for example, "economy, business and finance" or "sport"), while the `'score'` is a value between 0 and 1 that indicates the model's confidence in that prediction. Comparing the predictions across the five samples shows how the model handles different kinds of news content. The output can be used to automatically categorize incoming text data, which is particularly useful for news aggregation and content filtering applications.
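
As a rough illustration of the categorization use case, the sketch below groups texts by their predicted label and sets aside predictions whose score falls below a threshold. The function name `route_by_topic`, the `"unassigned"` bucket, and the `0.5` threshold are assumptions chosen for this example, and the predictions are placeholder values shaped like the pipeline output, not real model results.

```python
from collections import defaultdict
from typing import Dict, List


def route_by_topic(
    texts: List[str],
    predictions: List[Dict[str, object]],
    min_score: float = 0.5,
) -> Dict[str, List[str]]:
    """Group texts by predicted label, setting aside low-confidence ones."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for text, prediction in zip(texts, predictions):
        # Keep the predicted label only if the score reaches the threshold
        confident = prediction["score"] >= min_score
        topic = prediction["label"] if confident else "unassigned"
        buckets[topic].append(text)
    return dict(buckets)


# Placeholder data shaped like the pipeline output; the scores are made up
example_texts = [
    "Local football team wins national championship after a thrilling final match.",
    "Tech giant reveals latest smartphone model with advanced AI features.",
    "Widespread protests erupt over environmental concerns and climate policies.",
]
example_predictions = [
    {"label": "sport", "score": 0.99},
    {"label": "science and technology", "score": 0.85},
    {"label": "environment", "score": 0.45},
]

for topic, grouped in route_by_topic(example_texts, example_predictions).items():
    print(f"{topic}: {len(grouped)} text(s)")
```

In this notebook, the same function could be called as `route_by_topic(sample_texts, results)`. A suitable threshold depends on the application: a higher value sends more texts to manual review, while a lower value accepts more uncertain labels.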