## Named Entity Recognition (NER) with spaCy

### 1. **Context**
**Named Entity Recognition (NER)** is a core task in Natural Language Processing (NLP) that identifies and classifies entities in text, such as the names of people, organizations, and locations, as well as dates, monetary values, and other specific terms. By turning unstructured text into structured information, NER is an essential step in many NLP applications, including information extraction, question answering, and summarization.

In this notebook, we will explore **Named Entity Recognition (NER)** using the **spaCy** library, a powerful NLP toolkit.

---

### 2. **Install spaCy**
Before starting, make sure spaCy is installed, then download the small English pipeline (`en_core_web_sm`) used in this notebook:

```bash
pip install spacy
python -m spacy download en_core_web_sm
```

### 3. Load spaCy and the Language Model
Once spaCy is installed and the model is downloaded, let's load the language model.

```python
import spacy

# Load the small English pipeline (includes a trained NER component)
nlp = spacy.load('en_core_web_sm')
```
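If the model has not been downloaded, `spacy.load` raises an `OSError`. The sketch below wraps loading in a hypothetical helper, `load_pipeline` (not part of spaCy's API), that prints the download command and falls back to a blank English pipeline. The fallback only tokenizes and has no NER component, so check `pipe_names` before relying on `doc.ents`.

```python
import spacy


def load_pipeline(name="en_core_web_sm"):
    """Load a trained spaCy pipeline, or fall back to a blank English one.

    The blank fallback has a tokenizer but no NER component,
    so doc.ents will always be empty with it.
    """
    try:
        return spacy.load(name)
    except OSError:
        # spacy.load raises OSError when the named package cannot be found
        print(f"{name!r} is not installed; run: python -m spacy download {name}")
        return spacy.blank("en")


nlp = load_pipeline()
# A trained English pipeline lists "ner" among its components
print("Pipeline components:", nlp.pipe_names)
print("NER available:", "ner" in nlp.pipe_names)
```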

### 4. NER Example
Let's take a simple example to perform Named Entity Recognition on a sentence.

```python
# Example sentence
sentence = "Apple is looking to buy a startup based in San Francisco for $1 billion."

# Process the text through spaCy
doc = nlp(sentence)

# Extract named entities from the processed document
entities = [(entity.text, entity.label_) for entity in doc.ents]

print("Named Entities:", entities)
```

```text
Named Entities: [('Apple', 'ORG'), ('San Francisco', 'GPE'), ('$1 billion', 'MONEY')]
```


In the above example:

* **Apple** is recognized as an organization (ORG).
* **San Francisco** is recognized as a geopolitical entity (GPE).
* **$1 billion** is recognized as money (MONEY).
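Each item in `doc.ents` is a `Span`, so besides `.text` and `.label_` it also carries character offsets (`start_char`, `end_char`) and token indices (`start`, `end`), and every token records its position in an entity via `ent_iob_`. To show these attributes without depending on model predictions, this sketch builds the same three entities by hand on a blank English pipeline; with the trained `nlp` model, the spans in `doc.ents` expose the same attributes.

```python
import spacy

sentence = "Apple is looking to buy a startup based in San Francisco for $1 billion."

# Blank English pipeline: tokenizer only, no trained NER.
# Entities are set manually to mirror the output above, purely for illustration.
nlp_blank = spacy.blank("en")
doc_manual = nlp_blank(sentence)

spans = []
for text, label in [("Apple", "ORG"), ("San Francisco", "GPE"), ("$1 billion", "MONEY")]:
    start = sentence.find(text)
    # char_span returns None if the offsets don't align with token boundaries
    span = doc_manual.char_span(start, start + len(text), label=label)
    spans.append(span)
doc_manual.ents = spans

for ent in doc_manual.ents:
    # Character offsets index into the original string; start/end index tokens
    print(ent.text, ent.label_, ent.start_char, ent.end_char, ent.start, ent.end)

# Token-level view: B = begins an entity, I = inside one, O = outside any entity
print([(token.text, token.ent_iob_, token.ent_type_) for token in doc_manual])
```

Character offsets are what you need to highlight entities in the original text or to build training annotations like the ones used for custom models.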

### 5. Explanation of NER Output

Each entity in the output consists of:

- **Text**: The entity itself (e.g., "Apple", "San Francisco").
- **Label**: The type of entity, such as:
  - **ORG**: Organization
  - **GPE**: Geopolitical entity (e.g., countries, cities)
  - **MONEY**: Monetary values
  - **PERSON**: Names of people
  - **DATE**: Dates and time expressions

For a complete list of NER labels, you can refer to [spaCy’s documentation](https://spacy.io/usage/linguistic-features#named-entities).
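When an unfamiliar label shows up, `spacy.explain()` returns a short description from spaCy's built-in glossary (or `None` if the label is not in it). This sketch needs no trained model:

```python
import spacy

labels = ["ORG", "GPE", "MONEY", "PERSON", "DATE", "ORDINAL"]

# spacy.explain looks each label up in spaCy's glossary; unknown labels give None
descriptions = {label: spacy.explain(label) for label in labels}

for label, description in descriptions.items():
    print(f"{label:8} {description}")
```

To see which labels a specific loaded model can predict, inspect `nlp.get_pipe("ner").labels`.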


### 6. NER on Larger Text
NER can be applied to larger texts as well. Here’s how you can process paragraphs of text.

```python
# Example paragraph
paragraph = """
Barack Obama was born on August 4, 1961, in Honolulu, Hawaii. He was elected as the 44th President of the United States in 2008.
"""

# Process the paragraph through spaCy
doc_paragraph = nlp(paragraph)

# Extract named entities
entities_paragraph = [(entity.text, entity.label_) for entity in doc_paragraph.ents]

print("Named Entities in Paragraph:", entities_paragraph)
```

```text
Named Entities in Paragraph: [('Barack Obama', 'PERSON'), ('August 4, 1961', 'DATE'), ('Honolulu', 'GPE'), ('Hawaii', 'GPE'), ('44th', 'ORDINAL'), ('the United States', 'GPE'), ('2008', 'DATE')]
```
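For a collection of documents rather than a single paragraph, `nlp.pipe()` processes texts as a stream in batches, which is generally faster than calling `nlp()` on each text in a loop. The sketch below wraps it in a hypothetical helper, `entities_by_label`, that collects the unique entity strings per label. So that it runs without a downloaded model, the demo uses a blank English pipeline with a few hand-written `entity_ruler` patterns as a stand-in; passing the `nlp` object loaded from `en_core_web_sm` instead gives the statistical model's predictions.

```python
from collections import defaultdict

import spacy


def entities_by_label(nlp, texts, batch_size=64):
    """Run texts through nlp.pipe and collect unique entity strings per label."""
    grouped = defaultdict(set)
    for doc in nlp.pipe(texts, batch_size=batch_size):
        for ent in doc.ents:
            grouped[ent.label_].add(ent.text)
    # Sort the strings so the result is stable and easy to read
    return {label: sorted(values) for label, values in grouped.items()}


# Stand-in pipeline: rule-based patterns on a blank English model,
# used only so this sketch runs without a downloaded model.
nlp_demo = spacy.blank("en")
ruler = nlp_demo.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "PERSON", "pattern": "Barack Obama"},
    {"label": "GPE", "pattern": "Honolulu"},
    {"label": "GPE", "pattern": "Hawaii"},
])

texts = [
    "Barack Obama was born in Honolulu, Hawaii.",
    "Barack Obama visited Hawaii in 2008.",
]
result = entities_by_label(nlp_demo, texts)
print(result)
```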


### 7. Customizing NER (Training a Custom NER Model)

spaCy allows you to train a custom NER model for identifying specific types of entities that are not covered by the pre-trained model. Here's a high-level overview of how to add custom entities:

1. **Collect and Annotate Training Data**: Prepare a dataset with annotated examples of your custom entities.
2. **Train the NER Component**: Use the annotated data to train the NER component.

This process involves:
- Setting up a **blank model**.
- Annotating the data for your specific entities.
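- Training the model with spaCy’s **training API**: in spaCy v3, either the `spacy train` command with a config file or `nlp.update()` in a Python training loop.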

> Note: Training a custom NER model is an advanced process that requires a deeper understanding of spaCy and access to well-annotated data.
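A minimal sketch of the blank-model route with `nlp.update()`, assuming spaCy v3. The `DRUG` label and the two training sentences are hypothetical toy data: a usable model needs far more annotated examples, and for anything beyond experiments spaCy's documentation recommends config-based training with `spacy train`.

```python
import random

import spacy
from spacy.training import Example

# Hypothetical toy data: (text, {"entities": [(start_char, end_char, label)]})
TRAIN_DATA = [
    ("Aspirin is used to reduce fever.", {"entities": [(0, 7, "DRUG")]}),
    ("The patient was given ibuprofen.", {"entities": [(22, 31, "DRUG")]}),
]

random.seed(0)

# 1. Start from a blank English pipeline and add an empty NER component
nlp_custom = spacy.blank("en")
ner = nlp_custom.add_pipe("ner")
for _, annotations in TRAIN_DATA:
    for _, _, label in annotations["entities"]:
        ner.add_label(label)

# 2. Convert the annotations into Example objects (predicted doc + gold entities)
examples = [
    Example.from_dict(nlp_custom.make_doc(text), annotations)
    for text, annotations in TRAIN_DATA
]

# 3. Initialize the weights and train for a few passes over the data
optimizer = nlp_custom.initialize(lambda: examples)
for epoch in range(20):
    random.shuffle(examples)
    losses = {}
    nlp_custom.update(examples, sgd=optimizer, drop=0.2, losses=losses)
print("Final losses:", losses)

# 4. Try the (toy) model on new text; predictions on such tiny data are unreliable
doc_test = nlp_custom("The doctor prescribed aspirin.")
print([(ent.text, ent.label_) for ent in doc_test.ents])
```

A trained pipeline can be saved with `nlp_custom.to_disk(path)` and loaded again with `spacy.load(path)`.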


### 8. NER for Multiple Languages
spaCy supports NER for multiple languages. You can load different language models based on the language of your input text. For example, to use the French model, you would download and load it like this:

```bash
python -m spacy download fr_core_news_sm
```

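```python
# Load the small French pipeline
nlp_fr = spacy.load('fr_core_news_sm')

# Process a French sentence
sentence_fr = "Emmanuel Macron est le président de la France."
doc_fr = nlp_fr(sentence_fr)

# Extract named entities
entities_fr = [(entity.text, entity.label_) for entity in doc_fr.ents]
print("French Named Entities:", entities_fr)
```

```text
French Named Entities: [('Emmanuel Macron', 'PER'), ('la France', 'LOC')]
```

Note that the French pipeline uses a different label scheme from the English one: `PER`, `LOC`, `ORG`, and `MISC`, rather than labels such as `PERSON` and `GPE`. Code that filters entities by label therefore needs to know which model produced them.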
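Because the French and English pipelines name their labels differently, merging their output may call for a label mapping. The sketch below uses a hypothetical coarse mapping, `LABEL_MAP`, that you would adapt to your own needs; mapping `LOC` to `GPE` is a simplification, since the French `LOC` label also covers non-political locations such as rivers and mountains.

```python
# Hypothetical coarse mapping from French-model labels to English-model labels.
# LOC -> GPE is lossy: French LOC also covers non-political locations.
LABEL_MAP = {"PER": "PERSON", "LOC": "GPE", "ORG": "ORG", "MISC": "MISC"}


def normalize_entities(entities, label_map=LABEL_MAP):
    """Rename labels in (text, label) pairs; unknown labels are kept as-is."""
    return [(text, label_map.get(label, label)) for text, label in entities]


# The French output shown above, normalized to English-style labels
normalized = normalize_entities([("Emmanuel Macron", "PER"), ("la France", "LOC")])
print(normalized)  # [('Emmanuel Macron', 'PERSON'), ('la France', 'GPE')]
```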


### 9. Conclusion

Named Entity Recognition (NER) is a powerful tool for extracting useful information from text. By identifying entities like names of people, organizations, locations, dates, and more, NER helps in understanding the content of unstructured data. **spaCy** provides an easy-to-use API for performing NER, supports various languages, and can be customized for specific use cases.

#### Key Takeaways:
- **NER** helps identify and classify important entities in text.
- **spaCy** offers pre-trained models for English and several other languages.
- **Custom models** can be trained for domain-specific entities if required.

---

### 10. Further Enhancements

spaCy’s pre-trained NER models work well on text similar to what they were trained on, but they can miss entities in specialized domains. Several approaches can improve NER for a specific task:

1. **Domain-specific NER**: Train custom NER models to recognize entities relevant to specific industries (e.g., healthcare, finance).
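2. **Deep Learning Approaches**: Use transformer-based models such as **BERT** to improve NER accuracy. spaCy itself provides a transformer-based English pipeline, `en_core_web_trf`, which is generally more accurate than `en_core_web_sm` but slower.
3. **NER with Context**: Use context to disambiguate mentions that could refer to different kinds of entities (e.g., "Jordan" as a person or a country, or "Washington" as a person, a city, or a US state).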
