## **NER (Named Entity Recognition)**

---

### **What is NER?**

**NER (Named Entity Recognition)** is the task of **finding and labeling named entities** in text, that is, words or phrases that refer to specific real-world things, such as:

* People
* Organizations
* Places
* Dates
* Money
* Events

🔸 **Example:**

```text
Apple was founded by Steve Jobs in California in 1976.
```

**Entities found:**

* **Apple** → Organization
* **Steve Jobs** → Person
* **California** → Location
* **1976** → Date
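
Under the hood, an NER result is usually a list of labeled spans over the original text. The plain-Python sketch below (no NER library involved) stores the entities above as `(start_char, end_char, label)` tuples. That tuple layout is an illustrative assumption; spaCy's `Span` objects carry the same information as `start_char`, `end_char`, and `label_`. California is labeled GPE here, following the label table below.

```python
text = "Apple was founded by Steve Jobs in California in 1976."

# Each entity as (start_char, end_char, label); end_char is exclusive,
# so text[start:end] recovers the entity's surface form
entities = [
    (0, 5, "ORG"),
    (21, 31, "PERSON"),
    (35, 45, "GPE"),
    (49, 53, "DATE"),
]

for start, end, label in entities:
    print(text[start:end], "-->", label)
```

Keeping offsets rather than just the strings means each entity can be located again in the source text, for example to highlight it.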

---

### **Why is it Needed?**

* Turns unstructured text into **structured information** (names, places, dates) that programs can search, filter, and store
* Useful in **search engines, chatbots, news filtering, etc.**
* Helps identify **who**, **what**, **where**, and **when** in text

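---

### **Common Entity Labels:**

Some of the labels used by spaCy's English pipelines (such as `en_core_web_sm`):

| Label   | Meaning                                        |
| ------- | ---------------------------------------------- |
| PERSON  | People's names                                 |
| ORG     | Companies, agencies, institutions              |
| GPE     | Countries, cities, states                      |
| DATE    | Absolute or relative dates and periods         |
| TIME    | Times smaller than a day                       |
| MONEY   | Monetary values, including the unit            |
| PERCENT | Percentages, including "%"                     |
| LOC     | Non-GPE locations (mountains, bodies of water) |
| PRODUCT | Products (objects, vehicles, foods, etc.)      |
| EVENT   | Named events (hurricanes, wars, sports events) |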
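
These labels map naturally onto the who/what/where/when questions mentioned earlier. The plain-Python sketch below groups `(text, label)` pairs by the question they answer; the `QUESTION_FOR_LABEL` mapping is a hypothetical illustration, not something spaCy provides. It is applied to the entities from the Apple example.

```python
from collections import defaultdict

# Hypothetical mapping from entity labels to the question they answer
QUESTION_FOR_LABEL = {
    "PERSON": "who",
    "ORG": "who",
    "GPE": "where",
    "LOC": "where",
    "DATE": "when",
    "TIME": "when",
    "PRODUCT": "what",
    "EVENT": "what",
    "MONEY": "how much",
    "PERCENT": "how much",
}


def group_by_question(entities):
    """Group (text, label) pairs by the question their label answers."""
    grouped = defaultdict(list)
    for text, label in entities:
        # Labels missing from the mapping fall into an "other" bucket
        grouped[QUESTION_FOR_LABEL.get(label, "other")].append(text)
    return dict(grouped)


entities = [("Apple", "ORG"), ("Steve Jobs", "PERSON"), ("California", "GPE"), ("1976", "DATE")]
print(group_by_question(entities))
```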

---

### **Summary:**

* **NER identifies real-world named things in text**
* In spaCy, the detected entities are available as `doc.ents`
* It’s useful for extracting key data from unstructured text


## **Real-World Examples of NER Usage**

---

### 1. **News Article Tagging**

🔹 **Example Text**:

> "Elon Musk announced that Tesla will build a new Gigafactory in Berlin by 2025."

🔹 **NER Extracts**:

* **Elon Musk** → PERSON
* **Tesla** → ORG
* **Berlin** → GPE (GPE refers to countries, cities, states, or other politically defined locations.)
* **2025** → DATE

🔸 **Usefulness**:
Helps news platforms **auto-tag** articles with people, places, and companies for easy search and recommendation.
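
A minimal sketch of auto-tagging, assuming the NER step has already produced `(text, label)` pairs like the ones above. The function name `article_tags` and the set of labels that count as tags are hypothetical choices for illustration.

```python
# Labels worth turning into article tags (an illustrative choice)
TAG_LABELS = {"PERSON", "ORG", "GPE"}


def article_tags(entities):
    """Return unique tags such as 'PERSON:Elon Musk', in first-seen order."""
    tags = []
    for text, label in entities:
        tag = f"{label}:{text}"
        # Skip labels we don't tag on and repeated mentions of the same entity
        if label in TAG_LABELS and tag not in tags:
            tags.append(tag)
    return tags


entities = [("Elon Musk", "PERSON"), ("Tesla", "ORG"), ("Berlin", "GPE"), ("2025", "DATE")]
print(article_tags(entities))
```

Deduplication matters because a real article usually mentions the same person or company several times.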

---

### 2. **Customer Support Chatbots**

🔹 **Customer says**:

> "I ordered an iPhone 14 from New York last Friday but haven’t received it."

🔹 **NER Extracts**:

* **iPhone 14** → PRODUCT
* **New York** → GPE
* **last Friday** → DATE

🔸 **Usefulness**:
The chatbot uses entities to **understand the issue context** and route it faster.
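
How entities become a routing decision depends entirely on the product. The rules and queue names below (`order_tracking`, `product_support`, `general_support`) are hypothetical; the sketch only shows entity labels being used as structured input to a simple rule.

```python
def route_ticket(entities):
    """Pick a support queue from (text, label) pairs using simple rules."""
    labels = {label for _, label in entities}
    if "PRODUCT" in labels and "DATE" in labels:
        # A product plus a date suggests a delivery or order issue
        return "order_tracking"
    if "PRODUCT" in labels:
        return "product_support"
    return "general_support"


entities = [("iPhone 14", "PRODUCT"), ("New York", "GPE"), ("last Friday", "DATE")]
print(route_ticket(entities))  # order_tracking
```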

---

### 3. **Résumé Screening**

🔹 **Example Text**:

> "Worked at Google and Microsoft, completed BSc from MIT, currently living in Dhaka."

🔹 **NER Extracts**:

* **Google**, **Microsoft** → ORG
* **MIT** → ORG
* **Dhaka** → GPE

🔸 **Usefulness**:
Helps companies **auto-extract education, companies, and locations** from hundreds of résumés.
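
Across many résumés, the extracted entities can be aggregated, for example to see which organizations or cities appear most often. A sketch using `collections.Counter`, assuming each résumé has already been reduced to a list of `(text, label)` pairs; the second résumé below is hypothetical.

```python
from collections import Counter

# One list of (text, label) pairs per résumé; the second entry is hypothetical
resumes = [
    [("Google", "ORG"), ("Microsoft", "ORG"), ("MIT", "ORG"), ("Dhaka", "GPE")],
    [("Microsoft", "ORG"), ("Dhaka", "GPE")],
]

# Count every ORG and every GPE mention across all résumés
org_counts = Counter(text for ents in resumes for text, label in ents if label == "ORG")
city_counts = Counter(text for ents in resumes for text, label in ents if label == "GPE")

print(org_counts.most_common(1))   # [('Microsoft', 2)]
print(city_counts.most_common(1))  # [('Dhaka', 2)]
```

Note that a standard model labels universities and employers alike as ORG (MIT above), so separating education from work history needs extra rules or a custom model.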

---

### 4. **Medical Report Analysis**

🔹 **Doctor’s Notes**:

> "Patient diagnosed with diabetes and prescribed Metformin by Dr. Smith at Green Valley Hospital."

🔹 **NER Extracts** (DISEASE and DRUG are not built-in spaCy labels; extracting them requires a custom, domain-specific model):

* **diabetes** → DISEASE
* **Metformin** → DRUG
* **Dr. Smith** → PERSON
* **Green Valley Hospital** → ORG

🔸 **Usefulness**:
Used in **healthcare analytics** to track diseases, medicines, and doctors from thousands of reports.
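
A trained biomedical model is the usual way to get labels like DISEASE and DRUG. As a lighter stand-in, the sketch below uses spaCy's rule-based `entity_ruler` component with a hand-written term list. This is a dictionary lookup, not a learned model, and the two patterns are only examples. It assumes spaCy v3 is installed and starts from a blank English pipeline, so no model download is needed.

```python
import spacy

# Blank English pipeline: tokenizer only, no trained components
nlp = spacy.blank("en")

# Rule-based entity matcher; each pattern maps an exact phrase to a label
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "DISEASE", "pattern": "diabetes"},
    {"label": "DRUG", "pattern": "Metformin"},
])

doc = nlp("Patient diagnosed with diabetes and prescribed Metformin "
          "by Dr. Smith at Green Valley Hospital.")
for ent in doc.ents:
    print(ent.text, "-->", ent.label_)
```

String patterns match the exact token text by default, so "Diabetes" with a capital D would not match; the component's `phrase_matcher_attr` setting (for example `config={"phrase_matcher_attr": "LOWER"}` in `add_pipe`) makes phrase matching case-insensitive.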

---

### 5. **Financial Document Analysis**

🔹 **Financial News**:

> "Amazon’s shares rose 3% after its Q2 earnings beat expectations in July 2024."

🔹 **NER Extracts**:

* **Amazon** → ORG
* **3%** → PERCENT
* **Q2** → DATE
* **July 2024** → DATE

🔸 **Usefulness**:
Used by **investors** to extract data from **news, reports, and earnings statements** automatically.
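
Entities come out as strings, so an analysis pipeline usually normalizes them to numbers next. A minimal sketch with hypothetical helpers `parse_percent` and `parse_money`; it only handles simple US-style formats such as `3%`, `$54.20`, and `$44 billion`.

```python
# Word multipliers that can follow a number in a MONEY entity
SCALE = {"thousand": 1e3, "million": 1e6, "billion": 1e9, "trillion": 1e12}


def parse_percent(text):
    """Convert a PERCENT entity such as '3%' to a fraction (0.03)."""
    return float(text.strip().rstrip("%")) / 100


def parse_money(text):
    """Convert a MONEY entity such as '$44 billion' or '$54.20' to a float."""
    parts = text.replace("$", "").replace(",", "").split()
    value = float(parts[0])
    if len(parts) > 1:
        # Apply a scale word like "billion" if one follows the number
        value *= SCALE[parts[1].lower()]
    return value


print(parse_percent("3%"))         # 0.03
print(parse_money("$44 billion"))  # 44000000000.0
```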

---

### **Summary:**

| Industry         | Use of NER                              |
| ---------------- | --------------------------------------- |
| News & Media     | Auto-tag articles and summaries         |
| Customer Support | Understand issues and context           |
| HR & Recruitment | Extract education, employers, locations |
| Healthcare       | Analyze patient data                    |
| Finance          | Track companies, money, and performance |

---



### **How to Do It with spaCy:**

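```python
import spacy

# Load the small English pipeline
# (download it once with: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")

# Names of the pipeline components, in the order they run
print(nlp.pipe_names)
```

```text
['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']
```

The `ner` component is the one that fills in `doc.ents`.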

```python
# Input text
text = "Barack Obama was born in Hawaii and became the president of the United States."

# Process the text with the pipeline
doc = nlp(text)

# doc.ents holds the entities found by the "ner" component
print(doc.ents)
```

```text
(Barack Obama, Hawaii, the United States)
```

```python
# Print each named entity with its label
for ent in doc.ents:
    print(ent.text, "-->", ent.label_)
```

```text
Barack Obama --> PERSON
Hawaii --> GPE
the United States --> GPE
```


### **`for ent in doc.ents:`**

* This **loops through all named entities** that spaCy has detected in the `doc` (your processed text).
* `doc.ents` is a tuple of `Span` objects, one per detected entity (names of people, places, organizations, dates, etc.).

---

### **`ent`**

* Each `ent` is a **Span object** representing one named entity in the text.
* It contains the actual text of the entity and its label (type/category).

---

### **`ent.text`**

* This extracts the **actual text** of the named entity.
* Example: `"Barack Obama"`, `"Hawaii"`, etc.

---

### **`ent.label_`**

* This gives the **label/category** of the entity as a string. (`ent.label`, without the underscore, holds the same label as an integer hash ID.)
* Example labels:

  * `PERSON` → a person's name
  * `GPE` → a country, city, or state
  * `ORG` → organization
  * `DATE` → date expression

---

### **`print(ent.text, "-->", ent.label_)`**

* This prints the entity text, followed by an arrow, then its label.
* Example output:

  ```
  Barack Obama --> PERSON
  Hawaii --> GPE
  ```
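
Besides `text` and `label_`, each `Span` also records where it sits in the document. The sketch below assumes spaCy v3 and uses a blank English pipeline so it runs without a model download; because a blank pipeline has no `ner` component, the entities are set by hand with `doc.char_span` (a pretrained model would normally predict them). It then prints the token and character offsets of each entity.

```python
import spacy

nlp = spacy.blank("en")  # tokenizer only, no "ner" component
doc = nlp("Barack Obama was born in Hawaii.")

# char_span maps character offsets to a Span; it returns None if the
# offsets do not line up with token boundaries
doc.ents = [
    doc.char_span(0, 12, label="PERSON"),
    doc.char_span(25, 31, label="GPE"),
]

for ent in doc.ents:
    # start/end are token indices, start_char/end_char are character offsets
    print(ent.text, ent.label_, ent.start, ent.end, ent.start_char, ent.end_char)
```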

---


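```python
from spacy import displacy

# Highlight each entity in the text with its label
# (in a Jupyter notebook the visualization is displayed inline)
displacy.render(doc, style="ent")
```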
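
Outside a notebook, `displacy.render` can return the visualization as an HTML string instead of displaying it, which you can save and open in a browser (`displacy.serve` is the alternative; it starts a local web server). A sketch assuming spaCy v3, again with hand-set entities on a blank pipeline so no model is needed; the output filename `entities.html` is arbitrary.

```python
from pathlib import Path

import spacy
from spacy import displacy

nlp = spacy.blank("en")
doc = nlp("Barack Obama was born in Hawaii.")
doc.ents = [doc.char_span(0, 12, label="PERSON"), doc.char_span(25, 31, label="GPE")]

# jupyter=False makes render return an HTML string instead of displaying it;
# page=True wraps the markup in a complete HTML page
html = displacy.render(doc, style="ent", jupyter=False, page=True)
Path("entities.html").write_text(html, encoding="utf-8")
```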

### **More Examples**

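```python
text = """Elon Musk acquired Twitter (now X) for $44 billion. This deal, which took place in October 2022, 
                  saw Musk take ownership of the social media platform for $54.20 per share. """

doc = nlp(text)

for ent in doc.ents:
    print(ent.text, "-->", ent.label_)
```

```text
Elon Musk --> PERSON
Twitter --> PERSON
$44 billion --> MONEY
October 2022 --> DATE
54.20 --> MONEY
```

The small model mislabels **Twitter** as PERSON here; it should be ORG. The next steps fix this by hand: create a `Span` over that token with the correct label and write it back into the `doc`.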


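```python
# Slicing a Doc by token indices returns a Span (end index 4 is exclusive)
span = doc[0:4]
print(span)
print(type(span))
```

```text
Elon Musk acquired Twitter
<class 'spacy.tokens.span.Span'>
```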

```python
from spacy.tokens import Span

# Span covering token 3 only ("Twitter"); the end index 4 is exclusive
s1 = Span(doc, 3, 4)
print(s1.text)
```

```text
Twitter
```

```python
# The same span, this time carrying an entity label
s1 = Span(doc, 3, 4, label="ORG")
print(s1)
```

```text
Twitter
```


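```python
# Write the corrected entity into the Doc; default="unmodified" leaves
# the entity annotation of all other tokens as it was
doc.set_ents([s1], default="unmodified")

for ent in doc.ents:
    print(ent.text, "-->", ent.label_)
```

```text
Elon Musk --> PERSON
Twitter --> ORG
$44 billion --> MONEY
October 2022 --> DATE
54.20 --> MONEY
```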
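
The `default` argument of `set_ents` decides what happens to tokens that are *not* covered by the spans you pass. The sketch below contrasts `"unmodified"` with the default, `"outside"`. It assumes spaCy v3 and uses a blank pipeline where the existing entities (including the wrong PERSON label on Twitter) are set by hand to mimic the model output above; `make_doc` is a hypothetical helper.

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")


def make_doc():
    """Doc with hand-set entities mimicking the model's mistake."""
    doc = nlp("Elon Musk acquired Twitter (now X) for $44 billion.")
    doc.ents = [Span(doc, 0, 2, label="PERSON"), Span(doc, 3, 4, label="PERSON")]
    return doc


# "unmodified": only token 3 changes; Elon Musk keeps its entity
doc = make_doc()
doc.set_ents([Span(doc, 3, 4, label="ORG")], default="unmodified")
print([(e.text, e.label_) for e in doc.ents])

# "outside" (the default): tokens not in the new spans lose their entity
doc = make_doc()
doc.set_ents([Span(doc, 3, 4, label="ORG")], default="outside")
print([(e.text, e.label_) for e in doc.ents])
```

This is why the fix above passes `default="unmodified"`: with the default, every entity except Twitter would be wiped.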
