# Named Entity Recognition (NER)

---

**Definition:**  
Named Entity Recognition (NER) is a subtask of information extraction that locates named entities in unstructured text and classifies them into predefined categories such as person names, organizations, locations, time expressions, quantities, monetary values, and percentages.

---

## 📌 **Why is NER important?**

1. **Information Retrieval**: Improve search by indexing documents according to the entities they mention.
2. **Content Recommendation**: Recommend articles or news items that mention specific entities.
3. **Data Organization**: Categorize content by the entities it mentions, making large collections easier to manage.
4. **Research & Analysis**: Quickly extract key facts (who, where, when) from large bodies of text.

---

## 🛠 **How Does NER Work?**

NER typically involves two main steps:
1. **Entity Identification**: Detect the word or sequence of words (the span) that refers to an entity. Spans are not limited to proper nouns; dates, amounts, and percentages count too.
2. **Entity Classification**: Categorize the identified entities into predefined categories.
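
Many NER systems carry out both steps at once by assigning each token a BIO tag: `B-` marks the beginning of an entity, `I-` a token inside it, and `O` a token outside any entity (spaCy exposes a similar scheme through the token attributes `ent_iob_` and `ent_type_`). The minimal sketch below decodes such tags into entity spans. The tag sequence is hand-written for illustration, not real model output, and `bio_to_spans` is a hypothetical helper name.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Convert BIO tags into (start, end, label) spans; end is exclusive."""
    spans: List[Tuple[int, int, str]] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        # Close the open span on B-, on O, or on an I- tag with a different label
        if tag.startswith("B-") or tag == "O" or (tag.startswith("I-") and tag[2:] != label):
            if start is not None:
                spans.append((start, i, label))
                start, label = None, None
        # Step 1 (identification): a new span begins at this token.
        # Step 2 (classification): its category is the part after the prefix.
        # A stray I- tag with no open span is treated leniently as a beginning.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    if start is not None:
        spans.append((start, len(tags), label))
    return spans


tokens = ["Apple", "Inc.", "is", "planning", "to", "open", "a", "store",
          "in", "San", "Francisco", "by", "2023", "."]
tags = ["B-ORG", "I-ORG", "O", "O", "O", "O", "O", "O",
        "O", "B-GPE", "I-GPE", "O", "B-DATE", "O"]

for start, end, label in bio_to_spans(tags):
    print(" ".join(tokens[start:end]), label)
# Apple Inc. ORG
# San Francisco GPE
# 2023 DATE
```

Decoding tags this way keeps the model's job simple (one label per token) while still producing multi-word entities such as "San Francisco".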

---

## 🌐 **Common Entity Categories**:

- **Persons**: Names of individuals.
- **Organizations**: Names of companies, institutions, etc.
- **Locations**: Names of countries, cities, landmarks, etc.
- **Dates**: Days, months, years, etc.
- **Time**: Hours, minutes, etc.
- **Monetary Values**: Money values along with currency types.
- **Percentages**: Recognition of percentage values.

Many other categories are possible, depending on the domain and the task.
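
In practice, each library names these categories with its own label set. spaCy's English pipelines use OntoNotes-style labels (for example `PERSON`, `ORG`, `GPE`, `LOC`, `FAC`, `DATE`, `TIME`, `MONEY`, `PERCENT`, plus others such as `NORP`, `PRODUCT`, and `EVENT`). The sketch below maps the generic categories above onto those labels; the dictionary and `category_of` are hypothetical names for illustration. Note that "Locations" spans several labels: `GPE` covers countries, cities, and states, `LOC` other locations such as mountain ranges and bodies of water, and `FAC` buildings, airports, bridges, and similar facilities.

```python
# Generic categories from the list above -> OntoNotes-style labels
# as used by spaCy's English pipelines (e.g. en_core_web_sm)
CATEGORY_TO_LABELS = {
    "Persons": ["PERSON"],
    "Organizations": ["ORG"],
    "Locations": ["GPE", "LOC", "FAC"],
    "Dates": ["DATE"],
    "Time": ["TIME"],
    "Monetary Values": ["MONEY"],
    "Percentages": ["PERCENT"],
}

# Reverse lookup: model label -> generic category
LABEL_TO_CATEGORY = {
    label: category
    for category, labels in CATEGORY_TO_LABELS.items()
    for label in labels
}


def category_of(label: str) -> str:
    """Map a model label to a generic category, or 'Other' if it is not mapped."""
    return LABEL_TO_CATEGORY.get(label, "Other")


print(category_of("GPE"))   # Locations
print(category_of("NORP"))  # Other
```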

---

## 📚 **Applications of NER**:

1. **News & Media**: Tagging articles with relevant entities for better content discoverability.
2. **Research**: Rapid extraction of key facts from large text collections.
3. **Chatbots & Virtual Assistants**: Understand specific entities users refer to.
4. **Content Filtering**: Identify and possibly filter out content based on certain entities.
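
Entity tagging, as used for discoverability and recommendation, often boils down to an inverted index from each entity to the documents that mention it. The minimal sketch below assumes the entities have already been extracted (the `articles` data and both helper functions are hypothetical) and recommends articles that share at least one entity.

```python
from collections import defaultdict

# Hypothetical input: entities already extracted from each article by an NER model
articles = {
    "a1": ["Apple Inc.", "San Francisco"],
    "a2": ["Apple Inc.", "Tim Cook"],
    "a3": ["Paris", "UNESCO"],
}


def build_entity_index(docs):
    """Map each entity to the set of article IDs that mention it (an inverted index)."""
    index = defaultdict(set)
    for doc_id, entities in docs.items():
        for entity in entities:
            index[entity].add(doc_id)
    return index


def related_articles(doc_id, docs, index):
    """Return the other articles that share at least one entity with doc_id."""
    related = set()
    for entity in docs[doc_id]:
        related |= index[entity]
    related.discard(doc_id)
    return sorted(related)


index = build_entity_index(articles)
print(sorted(index["Apple Inc."]))              # ['a1', 'a2']
print(related_articles("a1", articles, index))  # ['a2']
```

A real system would usually normalize entity variants first and weight shared entities (for example by rarity) rather than treating every overlap equally.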

---

## 💡 **Insights from NER**:

By identifying and categorizing named entities in texts, NER can provide insights into:
- **Main Subjects**: Identify the people and organizations the text focuses on.
- **Contexts**: See which entities appear together, which hints at the setting the text describes.
- **Temporal Aspects**: If dates and times are mentioned frequently, understand the period the text covers.
- **Geographical Aspects**: If locations are mentioned frequently, understand the places the text is concerned with.
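
A simple way to approximate these insights is to count entity mentions per label group. The sketch below uses a hand-written list of `(text, label)` pairs standing in for NER output (for example, pairs collected from spaCy's `doc.ents`); the data and the `top_by_label` helper are hypothetical.

```python
from collections import Counter

# Hypothetical (text, label) pairs, e.g. collected as [(e.text, e.label_) for e in doc.ents]
entities = [
    ("London", "GPE"), ("Charles Dickens", "PERSON"), ("1843", "DATE"),
    ("London", "GPE"), ("Charles Dickens", "PERSON"), ("Christmas", "DATE"),
    ("Paris", "GPE"), ("London", "GPE"),
]


def top_by_label(pairs, labels, n=3):
    """Count mentions of entities whose label is in `labels`; return the n most common."""
    counts = Counter(text for text, label in pairs if label in labels)
    return counts.most_common(n)


print("Main subjects:", top_by_label(entities, {"PERSON", "ORG"}))
print("Temporal:", top_by_label(entities, {"DATE", "TIME"}))
print("Geographical:", top_by_label(entities, {"GPE", "LOC"}))
```

Raw counts are only a starting point: they are sensitive to spelling variants and to errors in the underlying model.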

---

## 🛑 **Challenges in NER**:

1. **Ambiguity**: Words with multiple meanings based on context (e.g., "Apple" as a fruit or the company).
2. **Variability**: Different ways to express the same entity (e.g., "USA", "U.S.A.", "United States").
3. **New Entities**: New or lesser-known entities that may not be in the training data.
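
Variability is often handled after recognition by mapping surface forms to one canonical name. The minimal sketch below assumes a hand-built alias table (`ALIASES`, `normalize_key`, and `canonicalize` are hypothetical names); production systems typically use larger knowledge bases or entity-linking models instead.

```python
import re

# Hypothetical alias table: normalized variant -> canonical name
ALIASES = {
    "usa": "United States",
    "united states": "United States",
    "united states of america": "United States",
}


def normalize_key(text: str) -> str:
    """Lowercase, drop periods, and collapse whitespace so 'U.S.A.' matches 'usa'."""
    text = text.lower().replace(".", "")
    return re.sub(r"\s+", " ", text).strip()


def canonicalize(entity_text: str) -> str:
    """Return the canonical name for a known variant, else the original text unchanged."""
    return ALIASES.get(normalize_key(entity_text), entity_text)


for mention in ["USA", "U.S.A.", "United States", "Canada"]:
    print(mention, "->", canonicalize(mention))
# USA -> United States
# U.S.A. -> United States
# United States -> United States
# Canada -> Canada
```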

---

## 🧪 **NER in spaCy**:

[spaCy](https://spacy.io/), a popular NLP library, provides out-of-the-box support for NER. Here's a simple example:

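```python
import spacy

# Requires the model to be installed first: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple Inc. is planning to open a new store in San Francisco by 2023.")
# Each entity is a Span; ent.label_ holds its category as a string
for ent in doc.ents:
    print(ent.text, ent.label_)
```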