# Named Entity Recognition (NER)

**Named Entity Recognition (NER)** is the task of identifying named entities in text and classifying them into types such as people, organizations, locations, and dates.

## Common Entity Types in spaCy

| Label | Description | Example |
|-------|-------------|---------|
| `PERSON` | People's names | *Elon Musk* |
| `ORG` | Organizations | *Tesla Inc*, *Google* |
| `GPE` | Countries, cities, states | *India*, *New York* |
| `MONEY` | Monetary values | *$45 billion* |
| `DATE` | Dates or periods | *yesterday*, *2022* |
| `PRODUCT` | Products | *iPhone*, *Windows 11* |
| `EVENT` | Named events | *World War II* |
| `LOC` | Non-GPE locations | *Mount Everest*, *Pacific Ocean* |

## Use Cases
- Information extraction from documents
- Question answering systems
- Knowledge graph construction
- Content recommendation

In [1]:
import spacy

In [2]:
nlp = spacy.load("en_core_web_sm")
nlp.pipe_names

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']

## Loading a Pre-trained Model

Let's load spaCy's small English model and check its pipeline components. The `ner` component should appear in `nlp.pipe_names`.

In [3]:
doc = nlp("Tesla Inc is going to acquire twitter for $45 billion")
for ent in doc.ents:
    print(ent.text, " | ", ent.label_, " | ", spacy.explain(ent.label_))

Tesla Inc  |  ORG  |  Companies, agencies, institutions, etc.
$45 billion  |  MONEY  |  Monetary values, including unit


## Extracting Entities from Text

Entities are accessible via `doc.ents`. Each entity has:
- `text` - The entity text
- `label_` - The entity type
- `start_char` / `end_char` - Character positions
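The attributes listed above can be gathered into plain Python records, for example before writing them to JSON. The following is a minimal sketch, not the notebook's own code: it assumes a blank English pipeline (tokenizer only) and sets the entities by hand with `doc.char_span()`, so the result does not depend on what a trained model predicts. The variable name `records` is illustrative.

```python
import spacy

# A blank pipeline only tokenizes, so no model download is needed here.
nlp = spacy.blank("en")
doc = nlp("Tesla Inc is going to acquire twitter for $45 billion")

# Assumption: entities are set manually to mirror the two entities printed above.
doc.set_ents([
    doc.char_span(0, 9, label="ORG"),      # "Tesla Inc"
    doc.char_span(42, 53, label="MONEY"),  # "$45 billion"
])

# Collect the attributes of each entity into a dictionary.
records = [
    {
        "text": ent.text,
        "label": ent.label_,
        "start_char": ent.start_char,
        "end_char": ent.end_char,
    }
    for ent in doc.ents
]

for record in records:
    print(record)
```

With a trained pipeline such as `en_core_web_sm`, the same list comprehension works on `doc.ents` directly and the manual `set_ents` step is not needed.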

In [4]:
from spacy import displacy

displacy.render(doc, style="ent")

## Visualizing Entities

Use `displacy.render()` to create a visual representation of entities in the text.
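Outside a notebook, `displacy.render()` can return the markup as a string instead of displaying it. The sketch below assumes a blank English pipeline with one hand-set entity, purely so it runs without a trained model; passing `jupyter=False` asks displaCy to return the HTML rather than render it inline.

```python
import spacy
from spacy import displacy

# Blank pipeline plus a manually set entity (an assumption for illustration).
nlp = spacy.blank("en")
doc = nlp("Tesla Inc is going to acquire twitter for $45 billion")
doc.set_ents([doc.char_span(0, 9, label="ORG")])

# jupyter=False makes render() return the HTML markup as a string.
html = displacy.render(doc, style="ent", jupyter=False)

print(type(html).__name__)
print(len(html) > 0)
```

The returned string can then be written to a file or embedded in a web page.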

<h3>Listing all entity labels</h3>

In [5]:
nlp.pipe_labels['ner']

['CARDINAL',
 'DATE',
 'EVENT',
 'FAC',
 'GPE',
 'LANGUAGE',
 'LAW',
 'LOC',
 'MONEY',
 'NORP',
 'ORDINAL',
 'ORG',
 'PERCENT',
 'PERSON',
 'PRODUCT',
 'QUANTITY',
 'TIME',
 'WORK_OF_ART']

The entity labels are also documented on this page: https://spacy.io/models/en
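To see what each label means without leaving the notebook, `spacy.explain()` can be applied to every label. This sketch hardcodes the label list from the output above (an assumption that your model exposes the same labels); with a loaded pipeline you could iterate over `nlp.pipe_labels['ner']` instead. The name `glossary` is illustrative.

```python
import spacy

# Labels copied from the nlp.pipe_labels['ner'] output shown above.
labels = [
    "CARDINAL", "DATE", "EVENT", "FAC", "GPE", "LANGUAGE",
    "LAW", "LOC", "MONEY", "NORP", "ORDINAL", "ORG",
    "PERCENT", "PERSON", "PRODUCT", "QUANTITY", "TIME", "WORK_OF_ART",
]

# spacy.explain() looks each label up in spaCy's built-in glossary.
glossary = {label: spacy.explain(label) for label in labels}

for label, description in glossary.items():
    print(f"{label:<12} {description}")
```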

In [13]:
doc = nlp("Michael Bloomberg founded Bloomberg Inc in 1982")
for ent in doc.ents:
    print(ent.text, "|", ent.label_, "|", spacy.explain(ent.label_))

Michael Bloomberg | PERSON | People, including fictional
Bloomberg Inc | ORG | Companies, agencies, institutions, etc.
1982 | DATE | Absolute or relative dates or periods


## Entity Disambiguation Challenges

NER models can make mistakes when the same name can refer to a person or an organization. In the run shown above, spaCy labeled `Bloomberg Inc` as `ORG`, but a shorter mention such as "Bloomberg" on its own (the company) might be misidentified.

For comparison, let's try the same sentence with a Hugging Face model:

https://huggingface.co/dslim/bert-base-NER?text=Michael+Bloomberg+founded+Bloomberg+in+1982

On that page, also go through three of the sample examples for NER.

In [7]:
doc = nlp("Tesla Inc is going to acquire Twitter Inc for $45 billion")
for ent in doc.ents:
    print(ent.text, " | ", ent.label_, " | ", ent.start_char, "|", ent.end_char)

Tesla Inc  |  ORG  |  0 | 9
Twitter Inc  |  PERSON  |  30 | 41
$45 billion  |  MONEY  |  46 | 57


## Entity Character Positions

Use `start_char` and `end_char` to get the exact position of entities in the text. This is useful for highlighting or extracting entity spans.
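As one way to use these offsets for highlighting, the sketch below wraps each entity in brackets followed by its label. It is plain Python and takes `(start_char, end_char, label)` tuples, here copied from the output above; the function name `annotate` is hypothetical and not part of spaCy.

```python
def annotate(text, spans):
    """Wrap each (start_char, end_char, label) span as [entity](LABEL)."""
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        pieces.append(text[cursor:start])             # text before the entity
        pieces.append(f"[{text[start:end]}]({label})")  # the marked entity
        cursor = end
    pieces.append(text[cursor:])                      # text after the last entity
    return "".join(pieces)


text = "Tesla Inc is going to acquire Twitter Inc for $45 billion"
# Offsets and labels as printed above (including the PERSON mislabel).
spans = [(0, 9, "ORG"), (30, 41, "PERSON"), (46, 57, "MONEY")]

annotated = annotate(text, spans)
print(annotated)
# [Tesla Inc](ORG) is going to acquire [Twitter Inc](PERSON) for [$45 billion](MONEY)
```

With a spaCy `Doc`, the tuples could be built as `[(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]`.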

<h3>Setting custom entities</h3>

In [8]:
doc = nlp("Tesla is going to acquire Twitter for $45 billion")
for ent in doc.ents:
    print(ent.text, " | ", ent.label_)

Tesla  |  ORG
Twitter  |  PERSON
$45 billion  |  MONEY


In [9]:
s = doc[2:5]
s

going to acquire

## Understanding Spans

A **Span** is a slice of a document. Entities are essentially labeled spans. We can create spans manually using indexing.
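A span carries both token indices and character offsets, which is easy to mix up. The following sketch assumes a blank English pipeline, since slicing only needs the tokenizer, and prints both kinds of position for the same slice.

```python
import spacy

# Slicing a Doc only requires tokenization, so a blank pipeline is enough.
nlp = spacy.blank("en")
doc = nlp("Tesla is going to acquire Twitter for $45 billion")

span = doc[2:5]  # token indices; the end index is exclusive

print(span.text)                         # going to acquire
print(span.start, span.end)              # token positions: 2 5
print(span.start_char, span.end_char)    # character offsets: 9 25
print([token.text for token in span])    # ['going', 'to', 'acquire']
```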

In [10]:
type(s)

spacy.tokens.span.Span

In [11]:
from spacy.tokens import Span

s1 = Span(doc, 0, 1, label="ORG")
s2 = Span(doc, 5, 6, label="ORG")

doc.set_ents([s1, s2], default="unmodified")

### Creating Custom Entity Spans

We can manually add entities using `Span` objects. This is useful when the model misses an entity or when we need to add domain-specific entities.

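- `Span(doc, start, end, label)` creates a labeled span; `start` and `end` are token indices, and `end` is exclusive
- `doc.set_ents()` sets the given spans as entities; with `default="unmodified"`, entities on all other tokens are left as they were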
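The effect of `default="unmodified"` can be checked in isolation. The sketch below is an illustration under stated assumptions: it uses a blank English pipeline and first sets two entities by hand as a stand-in for a model's predictions (with "Twitter" mislabeled as `PERSON` and "Tesla" missed), then applies corrections the same way as the cell above.

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("Tesla is going to acquire Twitter for $45 billion")

# Stand-in for model predictions (assumption: set by hand for this sketch).
doc.set_ents([
    Span(doc, 5, 6, label="PERSON"),  # "Twitter", deliberately mislabeled
    Span(doc, 7, 10, label="MONEY"),  # "$45 billion"
])

# Corrections: add the missed entity and fix the mislabeled one.
corrections = [
    Span(doc, 0, 1, label="ORG"),  # "Tesla"
    Span(doc, 5, 6, label="ORG"),  # "Twitter"
]

# default="unmodified" keeps entities on tokens the corrections do not cover.
doc.set_ents(corrections, default="unmodified")

result = [(ent.text, ent.label_) for ent in doc.ents]
print(result)
# [('Tesla', 'ORG'), ('Twitter', 'ORG'), ('$45 billion', 'MONEY')]
```

Without `default="unmodified"`, `set_ents()` falls back to `default="outside"`, which would drop the `MONEY` entity because its tokens are not covered by the new spans.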

In [12]:
for ent in doc.ents:
    print(ent.text, " | ", ent.label_)

Tesla  |  ORG
Twitter  |  ORG
$45 billion  |  MONEY
