<div class="alert alert-success"><h1>Named Entity Recognition with Pretrained Models in Python</h1></div>

**Named Entity Recognition (NER)** is a core natural language processing (NLP) task that turns unstructured text into structured data by identifying and classifying key entities such as persons, locations, and organizations. In this tutorial, we use pretrained Hugging Face models to perform NER.

## Learning Objectives
By the end of this tutorial, you will:
+ **Set up a token classification pipeline for NER:** Build an end-to-end pipeline using the Hugging Face API to perform token classification.
+ **Interpret model outputs:** Analyze the outputs from the NER pipeline, understanding entity labels, confidence scores, and token positions.


## Prerequisites
Before we begin, please ensure that you have:
+ A working knowledge of Python, including variables, functions, and basic object-oriented programming.
+ Familiarity with deep learning model development in Python using Keras and TensorFlow.
+ A Python (version 3.x) environment with the `tensorflow`, `keras`, `pandas`, `ipywidgets`, and `transformers` packages installed.

Let's also reduce the log verbosity of the `transformers` package so that only errors are reported, not warnings or informational messages.

In [None]:
from transformers import logging
logging.set_verbosity_error()

<hr>

## 1. Instantiate a Pipeline for Named Entity Recognition
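We start by importing the `pipeline` function from the Hugging Face `transformers` package, then instantiate a pipeline object called `recognizer` with `"token-classification"` as the task. NER is the most common token classification task, and `"ner"` is accepted as an alias for this task name. Because we do not pass a `model` argument, the pipeline falls back to a default pretrained model fine-tuned for NER; naming a model explicitly is safer when results need to be reproducible.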

In [None]:
from transformers import pipeline
recognizer = pipeline(task = "token-classification")

## 2. Run Named Entity Recognition on Sample Text
Next, we pass a short piece of text to our NER pipeline so it can extract the entities.

In [None]:
sample_text = "Barack Obama was born in Hawaii."
result = recognizer(sample_text)
display(result)

The output is a list of dictionaries, one for each token that the model tagged as part of an entity. Each dictionary includes the predicted label under `entity` (e.g., `I-PER` for person, `I-LOC` for location), the `score` (the model's confidence), the token's position in the tokenized input (`index`), the token text (`word`), and the character positions (`start` and `end`) of the token within the input text. In this case, the model correctly identifies **"Barack"** and **"Obama"** as referring to a person and **"Hawaii"** as a location.
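The `start` and `end` values are character offsets into the input string, so slicing the text with them recovers each span. The sketch below illustrates this with hand-written entries that mimic the shape of the pipeline's output for the sample sentence. The list `mock_result` is hypothetical, contains only the fields used here, and is not real model output.

```python
sample_text = "Barack Obama was born in Hawaii."

# Hypothetical entries shaped like the pipeline's output (not real model output).
mock_result = [
    {"word": "Barack", "start": 0, "end": 6},
    {"word": "Obama", "start": 7, "end": 12},
    {"word": "Hawaii", "start": 25, "end": 31},
]

# `end` is exclusive, so a plain slice returns the matching span.
spans = [sample_text[entry["start"]:entry["end"]] for entry in mock_result]
print(spans)
```

Slicing the original text can be more reliable than reading `word`, since `word` may contain word-piece markers such as `##` when a word is split into several tokens.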

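By default, each token is reported separately. To merge adjacent tokens that belong to the same entity into a single mention, we can pass `aggregation_strategy = "simple"` when calling the pipeline. This unifies `"Barack"` and `"Obama"` into `"Barack Obama"`, reports the label under `entity_group` instead of `entity`, and drops the `I-` prefix from the label (e.g., `PER`, `LOC`).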

In [None]:
result = recognizer(sample_text, aggregation_strategy = "simple")
display(result)
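To make the grouping idea concrete, here is a simplified sketch of how neighbouring token predictions of the same type can be merged into one mention. It is an illustration only, not the library's actual implementation: the helper `merge_adjacent` and the list `mock_tokens` are hypothetical, and the scores and token indices are placeholder values rather than real model output. It assumes token-level predictions carry the label under `entity` and grouped ones under `entity_group`.

```python
from statistics import mean


def merge_adjacent(tokens, text):
    """Group consecutive token predictions that share an entity type."""
    groups = []
    for tok in tokens:
        # Split a label such as "I-PER" into prefix "I" and type "PER".
        prefix, sep, label = tok["entity"].partition("-")
        if not sep:  # label without a B-/I- prefix
            prefix, label = "", tok["entity"]
        last = groups[-1] if groups else None
        same_run = (
            last is not None
            and last["entity_group"] == label
            and tok["index"] == last["last_index"] + 1  # tokens are neighbours
            and prefix != "B"  # "B-" marks the start of a new entity
        )
        if same_run:
            last["scores"].append(tok["score"])
            last["end"] = tok["end"]
            last["last_index"] = tok["index"]
        else:
            groups.append({
                "entity_group": label,
                "scores": [tok["score"]],
                "start": tok["start"],
                "end": tok["end"],
                "last_index": tok["index"],
            })
    # Report one entry per group: averaged score, text span from the input.
    return [
        {
            "entity_group": g["entity_group"],
            "score": mean(g["scores"]),
            "word": text[g["start"]:g["end"]],
            "start": g["start"],
            "end": g["end"],
        }
        for g in groups
    ]


sample_text = "Barack Obama was born in Hawaii."

# Hypothetical token-level predictions; scores and indices are placeholders.
mock_tokens = [
    {"entity": "I-PER", "score": 0.99, "index": 1, "word": "Barack", "start": 0, "end": 6},
    {"entity": "I-PER", "score": 0.99, "index": 2, "word": "Obama", "start": 7, "end": 12},
    {"entity": "I-LOC", "score": 0.99, "index": 6, "word": "Hawaii", "start": 25, "end": 31},
]

merged = merge_adjacent(mock_tokens, sample_text)
for mention in merged:
    print(mention["entity_group"], mention["word"])
```

In this sketch the two person tokens merge because they are adjacent and share a type, while "Hawaii" stays separate because other tokens sit between it and the name.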

For improved readability and further analysis, it can also be useful to convert the list of entity dictionaries into a pandas DataFrame.

In [None]:
import pandas as pd
named_entities = pd.DataFrame(result)
display(named_entities)

By converting the results into a DataFrame, we can quickly inspect and manipulate the data. This is especially useful for applications such as customer feedback analysis, where we may want to track mentions of brands, locations, or individuals over time.
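As a minimal sketch of this kind of manipulation, the snippet below filters entities by confidence and counts mentions per entity type. The DataFrame `mock_entities` is hypothetical: its rows, including the `"Example Corp"` entry, and the `min_score` threshold are placeholder values chosen for illustration, not real model output or a recommended setting.

```python
import pandas as pd

# Hypothetical aggregated entities; values are placeholders, not model output.
mock_entities = pd.DataFrame([
    {"entity_group": "PER", "score": 0.99, "word": "Barack Obama", "start": 0, "end": 12},
    {"entity_group": "LOC", "score": 0.99, "word": "Hawaii", "start": 25, "end": 31},
    {"entity_group": "ORG", "score": 0.40, "word": "Example Corp", "start": 40, "end": 52},
])

# Keep only entities the model is reasonably confident about.
min_score = 0.80  # illustrative threshold
confident = mock_entities[mock_entities["score"] >= min_score]

# Count how many mentions fall under each entity type.
counts = confident["entity_group"].value_counts()

# Pull out the text of all location mentions.
locations = confident.loc[confident["entity_group"] == "LOC", "word"].tolist()

print(counts)
print(locations)
```

The same pattern applies to the `named_entities` DataFrame built from real pipeline output, as long as it was produced with an aggregation strategy so that the `entity_group` column exists.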

To further illustrate the capabilities of our NER pipeline, let's apply it to a longer piece of text that resembles customer feedback or a product review. 

In [None]:
sample_text = """
I recently purchased the new Apple iPhone 14 from Best Buy in Manhattan. 
The staff, especially John Smith, was incredibly helpful and explained all the features in detail. 
I’ve also tried Samsung smartphones in the past, but this iPhone’s battery life and camera are a huge upgrade. 
I’d definitely recommend it to my friends!
"""
result = recognizer(sample_text, aggregation_strategy = "simple")
named_entities = pd.DataFrame(result)
display(named_entities)

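In this longer text, the NER model identifies multiple entities, including product names (e.g., "Apple iPhone 14"), organizations (e.g., "Best Buy", "Samsung"), locations (e.g., "Manhattan"), and person names (e.g., "John Smith"). The labels assigned depend on the model's tag set: a model trained on CoNLL-2003-style labels (`PER`, `ORG`, `LOC`, `MISC`) has no dedicated product category, so product names typically surface under a general label such as `MISC` or `ORG`. This demonstrates how NER can help you extract structured insights from unstructured text, which is particularly valuable in customer feedback analysis and market research.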