<center>

# Named Entity Recognition (NER)

</center>

---

## 1. Introduction

**Named Entity Recognition (NER)** is a key task in **Natural Language Processing (NLP)** that identifies **named entities** in text and classifies them into **predefined categories** such as:

* Person names (e.g., Monower, Elon Musk)
* Organizations (e.g., Google, UN)
* Locations (e.g., Dhaka, London)
* Dates, Times, Percentages, Money amounts, etc.

NER is crucial for **information extraction, question answering, and knowledge graph construction**.


## 2. Why NER is Important

* Extract structured information from unstructured text
* Enhance **search engines** with entity-aware search
* Power **chatbots and AI assistants** with understanding of entities
* Help in **text summarization** and **recommendation systems**


## 3. Common NER Categories

| Entity Type | Examples                     |
| ----------- | ---------------------------- |
| PERSON      | Monower, Elon Musk           |
| ORG         | Google, Microsoft            |
| GPE         | Bangladesh, London           |
| LOC         | Mount Everest, Sahara Desert |
| DATE        | January 13, 2026             |
| TIME        | 10:30 AM                     |
| MONEY       | $100, 50 Taka                |
| PERCENT     | 50%, 10 percent              |
| PRODUCT     | iPhone, Tesla Model 3        |
| EVENT       | World Cup, Olympics          |

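> **Note:** Different NLP libraries use slightly different label sets. In the examples in this document, spaCy labels organizations `ORG`, while NLTK labels them `ORGANIZATION`. `GPE` stands for geopolitical entity (countries, cities, states).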


## 4. Approaches to NER

### 4.1 Rule-Based NER

* Uses **predefined patterns, regular expressions, and dictionaries**
* Example: a run of capitalized words followed by "Inc." is tagged as an Organization
* Limitations: generalizes poorly to text the rules did not anticipate, and writing and maintaining the rules is labor-intensive
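
The "Inc." rule above can be sketched with regular expressions. This is a minimal illustration, not a production tagger: the two patterns, the `rule_based_ner` function, and the sample sentence are assumptions made up for this example.

```python
import re

# Hypothetical rule: one or more capitalized words followed by "Inc." -> ORG
ORG_PATTERN = re.compile(r"\b(?:[A-Z][A-Za-z]*\s)+Inc\.")
# Hypothetical rule: "$", digits, and an optional magnitude word -> MONEY
MONEY_PATTERN = re.compile(r"\$\d+(?:\.\d+)?(?:\s(?:million|billion))?")


def rule_based_ner(text):
    """Return (entity_text, label, start, end) tuples sorted by start offset."""
    entities = []
    for label, pattern in (("ORG", ORG_PATTERN), ("MONEY", MONEY_PATTERN)):
        for match in pattern.finditer(text):
            entities.append((match.group(), label, match.start(), match.end()))
    return sorted(entities, key=lambda entity: entity[2])


sample = "Apple Inc. is planning to acquire a small startup in the U.K. for $1 billion."
for entity in rule_based_ner(sample):
    print(entity)

# The same rules find nothing here: "Apple" appears without "Inc."
print(rule_based_ner("Tim Cook, the CEO of Apple, announced this."))
```

The second call shows the generalization problem: the organization is mentioned, but no rule covers the bare name, so a new pattern or dictionary entry would have to be added by hand.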

### 4.2 Machine Learning-Based NER

* Uses hand-crafted features such as:

  * Word shapes (capitalization, numbers)
  * POS tags
  * Contextual words
* Algorithms: Hidden Markov Models (HMM), Conditional Random Fields (CRF), Support Vector Machines (SVM)
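
The feature types listed above can be sketched as a per-token feature dictionary, the kind of input a CRF or SVM tagger is typically trained on. The function names, feature keys, and the tagged sample are assumptions for illustration; no particular library's feature format is implied.

```python
def word_shape(token):
    """Map each character to X (upper), x (lower), or d (digit); keep the rest."""
    shape = []
    for ch in token:
        if ch.isupper():
            shape.append("X")
        elif ch.islower():
            shape.append("x")
        elif ch.isdigit():
            shape.append("d")
        else:
            shape.append(ch)
    return "".join(shape)


def token_features(tagged_tokens, i):
    """Build a feature dict for the token at index i of a (word, POS) list."""
    word, pos = tagged_tokens[i]
    features = {
        "word.lower": word.lower(),
        "word.shape": word_shape(word),
        "word.istitle": word.istitle(),
        "word.isdigit": word.isdigit(),
        "pos": pos,
    }
    # Contextual words: neighbors on each side, or a sentence-boundary marker.
    if i > 0:
        features["prev.lower"] = tagged_tokens[i - 1][0].lower()
    else:
        features["BOS"] = True
    if i < len(tagged_tokens) - 1:
        features["next.lower"] = tagged_tokens[i + 1][0].lower()
    else:
        features["EOS"] = True
    return features


tagged = [("Tim", "NNP"), ("Cook", "NNP"), ("announced", "VBD"), ("this", "DT")]
print(token_features(tagged, 0))
```

A sequence model is then trained on one such dictionary per token, paired with a gold entity tag for that token.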

### 4.3 Deep Learning NER

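* Uses architectures such as **LSTM-CRF**, **BiLSTM-CRF**, and **Transformer-based models**
* Learns features automatically from data instead of relying on hand-crafted ones
* Handles **contextual ambiguity** and long sentences better than the approaches above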


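**Example output** (each line pairs an entity's text with its predicted label):

```
Monower Hossen -> PERSON
Dhaka University -> ORG
Python -> LANGUAGE
Microsoft -> ORG
```

In spaCy, these two values are read from each entity span:

* `ent.text` → the entity's text as it appears in the input
* `ent.label_` → the entity's type label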

## 5. Applications of NER

1. **Information Extraction:** Extract key information from news, research papers, or resumes
2. **Search Enhancement:** Make search engines entity-aware
3. **Chatbots & Virtual Assistants:** Understand user queries better
4. **Text Summarization:** Identify important entities to summarize content
5. **Knowledge Graphs:** Build structured knowledge from unstructured text


## 6. Challenges in NER

* **Ambiguity:** The same word or phrase can refer to different kinds of entities

  * Example: `Apple` → company or fruit
* **Nested Entities:** One entity can contain another, e.g., `Dhaka University` (ORG) contains `Dhaka` (GPE)
* **Domain-Specific Text:** Scientific, medical, or legal terms may not be recognized by models trained on general text
* **Out-of-Vocabulary Words:** Names or locations that never appeared in the training data are easy to miss
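
The nested-entity case can be made concrete with character spans. The sketch below assumes annotations are stored as `(start, end, label)` tuples; the sentence, the annotations, and `find_nested` are hypothetical and only illustrate the idea.

```python
def find_nested(spans):
    """Return (inner, outer) pairs where inner lies inside a longer outer span."""
    nested = []
    for inner in spans:
        for outer in spans:
            if inner is outer:
                continue
            contains = outer[0] <= inner[0] and inner[1] <= outer[1]
            longer = (outer[1] - outer[0]) > (inner[1] - inner[0])
            if contains and longer:
                nested.append((inner, outer))
    return nested


text = "Monower studies at Dhaka University."
org_start = text.index("Dhaka University")

# Hypothetical gold annotations: the GPE span sits inside the ORG span.
spans = [
    (0, len("Monower"), "PERSON"),
    (org_start, org_start + len("Dhaka University"), "ORG"),
    (org_start, org_start + len("Dhaka"), "GPE"),
]

for inner, outer in find_nested(spans):
    print(text[inner[0]:inner[1]], inner[2], "is nested in",
          text[outer[0]:outer[1]], outer[2])
```

A tagger that assigns exactly one label per token can return only one of the two overlapping spans, which is why nested entities need special handling.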



## 7. NER with spaCy

```python
import spacy
from spacy import displacy

# Load the English model
nlp = spacy.load("en_core_web_sm")

text = """
Apple Inc. is planning to acquire a small startup in the U.K. for $1 billion. 
Tim Cook, the CEO of Apple, announced this on January 10, 2026.
"""

# Process the text
doc = nlp(text)
```

```python
# Extract named entities with their labels and character offsets
print("Named Entities, Labels, and Position:")
for ent in doc.ents:
    print(ent.text, ent.label_, ent.start_char, ent.end_char)
```

**Output:**

```
Named Entities, Labels, and Position:
Apple Inc. ORG 1 11
U.K. GPE 58 62
$1 billion MONEY 67 77
Tim Cook PERSON 80 88
Apple ORG 101 106
January 10, 2026 DATE 126 142
```
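
The two numbers printed for each entity are character offsets into `text`, so slicing the string with them recovers the entity. The sketch below checks this without spaCy by copying the label and offset triples from the output above; the `reported` list and `by_label` grouping are names introduced for this example.

```python
from collections import defaultdict

# The same string as the example text, written with explicit newlines.
text = (
    "\n"
    "Apple Inc. is planning to acquire a small startup in the U.K. for $1 billion. \n"
    "Tim Cook, the CEO of Apple, announced this on January 10, 2026.\n"
)

# (label, start_char, end_char) triples copied from the printed output
reported = [
    ("ORG", 1, 11),
    ("GPE", 58, 62),
    ("MONEY", 67, 77),
    ("PERSON", 80, 88),
    ("ORG", 101, 106),
    ("DATE", 126, 142),
]

# Group the recovered entity strings by label.
by_label = defaultdict(list)
for label, start, end in reported:
    by_label[label].append(text[start:end])

for label, entities in by_label.items():
    print(label, entities)
```

The first entity starts at offset 1 rather than 0 because the triple-quoted string begins with a newline character.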


```python
# Visualize the entities inline in a Jupyter notebook
displacy.render(doc, style="ent", jupyter=True)
```

## 8. NER with NLTK

```python
import nltk
from nltk import word_tokenize, pos_tag, ne_chunk
from nltk.tree import Tree
```

```python
# Download the required NLTK resources (each call returns True when the package is available)
nltk.download('maxent_ne_chunker_tab')
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')
```

```python
# Example text (the same text used in the spaCy example)
text = """
Apple Inc. is planning to acquire a small startup in the U.K. for $1 billion. 
Tim Cook, the CEO of Apple, announced this on January 10, 2026.
"""
```

```python
# Tokenization
tokens = word_tokenize(text)
print("Tokens:", tokens)
```

**Output:**

```
Tokens: ['Apple', 'Inc.', 'is', 'planning', 'to', 'acquire', 'a', 'small', 'startup', 'in', 'the', 'U.K.', 'for', '$', '1', 'billion', '.', 'Tim', 'Cook', ',', 'the', 'CEO', 'of', 'Apple', ',', 'announced', 'this', 'on', 'January', '10', ',', '2026', '.']
```


```python
# Part-of-Speech (POS) tagging
pos_tags = pos_tag(tokens)
print("\nPOS Tags:", pos_tags)
```

**Output:**

```
POS Tags: [('Apple', 'NNP'), ('Inc.', 'NNP'), ('is', 'VBZ'), ('planning', 'VBG'), ('to', 'TO'), ('acquire', 'VB'), ('a', 'DT'), ('small', 'JJ'), ('startup', 'NN'), ('in', 'IN'), ('the', 'DT'), ('U.K.', 'NNP'), ('for', 'IN'), ('$', '$'), ('1', 'CD'), ('billion', 'CD'), ('.', '.'), ('Tim', 'NNP'), ('Cook', 'NNP'), (',', ','), ('the', 'DT'), ('CEO', 'NNP'), ('of', 'IN'), ('Apple', 'NNP'), (',', ','), ('announced', 'VBD'), ('this', 'DT'), ('on', 'IN'), ('January', 'NNP'), ('10', 'CD'), (',', ','), ('2026', 'CD'), ('.', '.')]
```


```python
# Named Entity Recognition (NER) using NLTK's ne_chunk
ner_tree = ne_chunk(pos_tags)
```

```python
def extract_named_entities(tree):
    """Collect (entity_text, entity_type) pairs from the top-level chunks of an ne_chunk tree."""
    entities = []
    for subtree in tree:
        # Named-entity chunks are Tree nodes; ordinary tokens are (word, POS) tuples
        if isinstance(subtree, Tree):
            entity_name = " ".join(token for token, pos in subtree.leaves())
            entity_type = subtree.label()
            entities.append((entity_name, entity_type))
    return entities
```

```python
named_entities = extract_named_entities(ner_tree)
print("\nNamed Entities (NLTK):")
for entity, label in named_entities:
    print(entity, "→", label)
```

**Output:**

```
Named Entities (NLTK):
Apple → PERSON
Inc. → ORGANIZATION
Tim Cook → PERSON
CEO → ORGANIZATION
Apple → GPE
```
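
The spaCy and NLTK results for the same text can be compared once their label names are aligned. The sketch below hard-codes the entity lists from the two printed outputs, so it runs without either library; the `NLTK_TO_SPACY` mapping and the `normalize` helper are assumptions for this example and cover only the labels seen here.

```python
# Hypothetical mapping from NLTK chunker labels to the spaCy-style names
NLTK_TO_SPACY = {"ORGANIZATION": "ORG"}

# Entities copied from the spaCy output
spacy_entities = [
    ("Apple Inc.", "ORG"), ("U.K.", "GPE"), ("$1 billion", "MONEY"),
    ("Tim Cook", "PERSON"), ("Apple", "ORG"), ("January 10, 2026", "DATE"),
]
# Entities copied from the NLTK output
nltk_entities = [
    ("Apple", "PERSON"), ("Inc.", "ORGANIZATION"), ("Tim Cook", "PERSON"),
    ("CEO", "ORGANIZATION"), ("Apple", "GPE"),
]


def normalize(entities, mapping):
    """Rename labels using mapping; labels without an entry are kept as-is."""
    return [(text, mapping.get(label, label)) for text, label in entities]


nltk_normalized = set(normalize(nltk_entities, NLTK_TO_SPACY))
agreed = sorted(set(spacy_entities) & nltk_normalized)
only_nltk = sorted(nltk_normalized - set(spacy_entities))

print("Both libraries:", agreed)
print("NLTK only:", only_nltk)
```

On this text the two libraries agree only on `Tim Cook`: NLTK splits `Apple Inc.` into two differently labeled chunks and does not return the money amount or the date.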


<div style="text-align: right;">
    <b>Author:</b> Monower Hossen <br>
    <b>Date:</b> January 14, 2026
</div>
