<a href="https://colab.research.google.com/github/SKumarAshutosh/natural-language-processing/blob/master/NLP_NER_models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Named Entity Recognition (NER)

### 1. What is NER?

Named Entity Recognition (NER) is a sub-task of information extraction that locates named entities in text and classifies them into predefined categories such as person names, organizations, locations, time expressions, quantities, monetary values, and percentages.

Example:

Input: "Apple is planning to buy U.K. startup for $1 billion."

Output:
- "Apple" → ORGANIZATION
- "U.K." → LOCATION
- "$1 billion" → MONEY
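
The output above can be made concrete as labeled character spans. The sketch below is only one possible representation: the `Entity` class and the `make_entity` helper are hypothetical names introduced for illustration, and the entities are filled in by hand rather than predicted by a model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str   # surface form as it appears in the input
    label: str  # entity category, e.g. ORGANIZATION
    start: int  # character offset where the entity begins
    end: int    # character offset just past the entity's last character


def make_entity(sentence: str, surface: str, label: str) -> Entity:
    """Locate the first occurrence of `surface` and wrap it as an Entity."""
    start = sentence.index(surface)
    return Entity(surface, label, start, start + len(surface))


sentence = "Apple is planning to buy U.K. startup for $1 billion."

# Hand-labeled entities matching the example output (not model predictions).
entities = [
    make_entity(sentence, "Apple", "ORGANIZATION"),
    make_entity(sentence, "U.K.", "LOCATION"),
    make_entity(sentence, "$1 billion", "MONEY"),
]

for ent in entities:
    # Slicing the sentence with the stored offsets recovers the entity text.
    assert sentence[ent.start:ent.end] == ent.text
    print(f"{ent.text!r:14} {ent.label:13} [{ent.start}:{ent.end}]")
```

Storing offsets alongside the text keeps each entity tied to its exact position, which matters when the same string appears more than once in a document.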

### 2. When is it required?

NER is required in various scenarios including:

- Information retrieval: Enhance search by tagging named entities in documents.
- Content recommendation: Recommend items based on the entities identified in the content.
- Data analytics: Extract structured information from unstructured data sources.
- Knowledge graph construction: Identify entities and the relations between them.
- Automating business processes: e.g., extracting financial information or client names from emails.

### 3. Why is it required?

- **Structure to Unstructured Data:** Most of the world's data is unstructured. NER provides a way to extract structured information.
- **Efficiency:** It's an automated way to process large datasets and extract valuable information.
- **Enhanced Analysis:** By identifying entities, data scientists and analysts can focus on more specific aspects of data analysis.

### 4. How can we do NER?

NER can be achieved through:

- **Rule-based Systems:** Using regular expressions and dictionaries to identify named entities.
- **Statistical Models:** Such as Conditional Random Fields (CRF).
- **Deep Learning:** Using architectures like Recurrent Neural Networks (RNNs) or Transformer-based models like BERT.
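
To illustrate the first approach, here is a minimal rule-based sketch that combines a dictionary lookup with a regular expression. The `GAZETTEER` contents, the `MONEY_PATTERN` expression, and the `rule_based_ner` function are all assumptions made up for this example; a real system would need far larger dictionaries and many more patterns.

```python
import re

# Hypothetical gazetteer (dictionary of known names); illustrative only.
GAZETTEER = {
    "Apple": "ORGANIZATION",
    "U.K.": "LOCATION",
}

# Illustrative pattern for amounts such as "$1 billion" or "$250".
MONEY_PATTERN = re.compile(r"\$\d+(?:\.\d+)?(?:\s(?:thousand|million|billion))?")


def rule_based_ner(text):
    """Return (start, end, text, label) tuples sorted by position."""
    found = []
    # Dictionary rule: every exact occurrence of a known name is an entity.
    for name, label in GAZETTEER.items():
        for match in re.finditer(re.escape(name), text):
            found.append((match.start(), match.end(), match.group(), label))
    # Pattern rule: anything shaped like a dollar amount is MONEY.
    for match in MONEY_PATTERN.finditer(text):
        found.append((match.start(), match.end(), match.group(), "MONEY"))
    return sorted(found)


result = rule_based_ner("Apple is planning to buy U.K. startup for $1 billion.")
for start, end, surface, label in result:
    print(surface, "->", label)
```

The sketch also shows the main weakness of rules: the lookup would tag "Apple" as an organization in a sentence about fruit, because it never looks at context.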

### 5. Different types of models available for NER:

1. **Rule-based Models:**
    - Pros: Highly specific, can be very accurate for known patterns.
    - Cons: Not adaptive; need manual effort for rule creation.

2. **Statistical Models:**
   - **Hidden Markov Models (HMM):** Use hidden states (the entity tags) and transitions between them, each with an associated probability.
   - **Conditional Random Fields (CRF):** Especially popular for sequence labeling tasks.
     - Pros: Take into account the context.
     - Cons: Require feature engineering; might be outperformed by deep models.

3. **Deep Learning Models:**
   - **RNNs (LSTM/GRU):** Effective for sequence labeling; consider sequence context.
   - **Bidirectional LSTMs:** Capture forward and backward context.
   - **Transformer-based Models (like BERT, RoBERTa, etc.):** Pre-trained on vast corpora and fine-tuned for NER.
     - Pros: State-of-the-art performance; capture deep contextual information.
     - Cons: Computationally intensive.

4. **Hybrid Models:**
   - Combine rule-based, statistical, and deep learning methods to benefit from each.

5. **Ensemble Models:**
   - Combine predictions from multiple models to achieve better accuracy.

6. **Pre-trained Language Models for Transfer Learning:**
   - Models like BERT, GPT-2, and others can be fine-tuned on specific NER tasks to leverage the knowledge they've acquired during their extensive pre-training.
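
As one concrete instance of the statistical models in this list, an HMM tagger chooses the most probable tag sequence with Viterbi decoding. The sketch below is a toy: the two tags (`ENT`, `O`), the single capitalization feature, and every probability are hand-picked assumptions, not values estimated from data. It also assumes a non-empty sentence.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for the observations."""
    # best[t][s] = (probability of the best path ending in s at step t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            # Pick the predecessor that maximizes the path probability into s.
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p) for p in states
            )
        best.append(column)
    # Backtrack from the best final state to recover the full path.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]


# Toy parameters (assumptions for illustration only).
states = ["O", "ENT"]
start_p = {"O": 0.7, "ENT": 0.3}
trans_p = {"O": {"O": 0.8, "ENT": 0.2}, "ENT": {"O": 0.6, "ENT": 0.4}}
emit_p = {"O": {"cap": 0.2, "lower": 0.8}, "ENT": {"cap": 0.9, "lower": 0.1}}

tokens = ["Apple", "buys", "startup"]
# The only observed feature is whether the token starts with a capital letter.
observations = ["cap" if tok[0].isupper() else "lower" for tok in tokens]
tags = viterbi(observations, states, start_p, trans_p, emit_p)
print(list(zip(tokens, tags)))
```

A trained HMM would estimate these tables from annotated text and would emit words (or richer features) rather than a single capitalization flag.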

### Final Note:

The best model often depends on the specific task, the amount and quality of training data, and computational resources. In many real-world applications, a combination of different approaches is used to achieve robust and accurate NER.



---
# Layman's perspective

### What is NER?

Named Entity Recognition (NER) is like highlighting the names of important things in a story or article. If you read a news article, NER will help underline names of people, places, companies, dates, and more.

### Why and When is it Used?

Imagine you're quickly skimming through a newspaper and just want to know the main people or places mentioned. NER helps with this. Businesses use it to quickly understand documents, researchers use it to summarize content, and search engines use it to categorize information.

### Is NER a Regression or Classification Problem?

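NER is a classification problem, not a regression problem: the model picks a category for each word instead of predicting a number. Think of it like sorting candies by their colors. Each word (or candy) is given a label (or color), and words that are not part of any name get a "not an entity" label.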
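
To make the "one label per word" idea concrete, the sketch below assigns a tag to every token using the BIO scheme (B = beginning of an entity, I = inside, O = outside). BIO is a common convention that the text above does not mention, so treat it as an assumption, along with the hand-made tokenization and the hypothetical `to_bio` helper.

```python
def to_bio(tokens, spans):
    """Turn (start_token, end_token_exclusive, label) spans into one tag per token."""
    tags = ["O"] * len(tokens)  # default: the token is not part of any entity
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the same entity
    return tags


# Hand-made tokenization of the example sentence (an assumption).
tokens = ["Apple", "is", "planning", "to", "buy", "U.K.",
          "startup", "for", "$", "1", "billion", "."]
spans = [(0, 1, "ORGANIZATION"), (5, 6, "LOCATION"), (8, 11, "MONEY")]

tags = to_bio(tokens, spans)
for token, tag in zip(tokens, tags):
    print(f"{token:10} {tag}")
```

Every token ends up with exactly one class, which is what makes NER a token-level classification task.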

### Different Models for NER:

1. **Rule-based Models:** Like using a grammar book. If a rule says a word is a name, it's a name.
   - Pros: Simple and straightforward.
   - Cons: Can't adapt to new patterns easily.

2. **Statistical Models:** Imagine asking many friends who often read news about which words are names of companies. Over time, you'd get a good list.
   - Example: Conditional Random Fields (CRF) is a popular method here.

3. **Deep Learning Models:** It's like a very observant person reading thousands of books and noting down names of people, places, and more. Over time, this person gets really good at spotting names.
   - **RNNs:** Think of someone reading a sentence and remembering the previous words to guess the current word's type.
   - **Bidirectional LSTMs:** The same as above, but they remember both previous and upcoming words.
   - **Transformer-based Models (like BERT):** Think of a genius who's read millions of pages and can instantly tell you what each word in a new sentence likely represents.

4. **Hybrid Models:** Combining rules, statistics, and deep learning. Like having a grammar book, friends' suggestions, and a keen observer together.

5. **Ensemble Models:** Asking multiple people (or models) about names in a sentence and then going with the majority vote.

6. **Pre-trained Models:** Imagine someone who's already an expert in recognizing names in English sentences now fine-tuning their skill to recognize names in science articles. That's how models like BERT are used for NER after being pre-trained on lots of general data.
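
The majority-vote idea behind ensemble models can be sketched in a few lines. The three prediction lists below are invented for illustration and stand in for the outputs of three hypothetical models; `majority_vote` is likewise a made-up helper name.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-token tags from several models by majority vote.

    `predictions` holds one tag sequence per model; all sequences must have
    the same length. Ties go to the tag that appears first among the models.
    """
    voted = []
    for token_tags in zip(*predictions):
        voted.append(Counter(token_tags).most_common(1)[0][0])
    return voted


# Invented outputs of three hypothetical models for a three-token sentence.
model_a = ["ORGANIZATION", "O", "LOCATION"]
model_b = ["ORGANIZATION", "O", "O"]
model_c = ["O", "O", "LOCATION"]

combined = majority_vote([model_a, model_b, model_c])
print(combined)
```

Voting token by token is the simplest option; it can occasionally produce an entity span that none of the individual models predicted as a whole.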

### Conclusion:

NER helps spot names of important things in text. There are many ways to do it, from simple rules to complex deep learning models. The best method often depends on how much data you have and what you want to achieve.




---

# Different NER models based on ML & DL techniques

Named Entity Recognition (NER) is a popular task in Natural Language Processing, and over the years, various machine learning and deep learning models have been proposed and used for it. Here's a list:

### Machine Learning Models:

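1. **Rule-based Systems:** These systems use hand-crafted rules (often regular expressions) to identify entities. They learn nothing from data, so they are not machine learning in the strict sense, but they are the usual baseline that learned models are compared against.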
  
2. **Decision Trees:** They predict the entity label based on features derived from the input text, though they're not the most popular choice for NER.

3. **Hidden Markov Models (HMMs):** These consider the sequence of words and their associated states to predict entities.

4. **Maximum Entropy Markov Models (MEMMs):** Like HMMs but more flexible, allowing for the inclusion of arbitrary features.

5. **Conditional Random Fields (CRFs):** A popular choice for NER in the pre-deep learning era. CRFs consider the entire sequence of words (and their context) to predict entity labels. They can include diverse features, like the word's position in a sentence, its capitalization pattern, and more.
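
The kind of hand-engineered features a CRF relies on can be sketched as a function that describes one token in its context. The feature names and the `token_features` helper below are assumptions chosen for illustration; this dict-per-token shape is what CRF toolkits such as sklearn-crfsuite typically consume, but no particular library is used here.

```python
def token_features(tokens, i):
    """Describe token i with simple features of the kind fed to a CRF."""
    word = tokens[i]
    features = {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),   # capitalization pattern
        "word.isupper": word.isupper(),
        "word.isdigit": word.isdigit(),
        "suffix3": word[-3:],
        "position": i,                    # position in the sentence
        "is_first": i == 0,
        "is_last": i == len(tokens) - 1,
    }
    # Neighboring words give the model local context.
    if i > 0:
        features["prev.lower"] = tokens[i - 1].lower()
    if i < len(tokens) - 1:
        features["next.lower"] = tokens[i + 1].lower()
    return features


tokens = ["Apple", "is", "planning"]
sentence_features = [token_features(tokens, i) for i in range(len(tokens))]
print(sentence_features[0])
```

The CRF then learns a weight for each feature-label pair plus weights for label-to-label transitions, which is how it scores whole tag sequences rather than isolated tokens.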

### Deep Learning Models:

1. **Recurrent Neural Networks (RNNs):** They process sequences word-by-word, maintaining a hidden state from previous words to inform predictions for the current word.

   - **Long Short-Term Memory (LSTM):** A type of RNN that's better at capturing long-range dependencies in the data.
   
   - **Bidirectional LSTMs (BiLSTM):** These process the sequence from both directions (start-to-end and end-to-start), providing a more comprehensive view of the context for each word.

2. **Gated Recurrent Units (GRUs):** A variation of RNNs that's simpler than LSTMs but offers similar performance for many tasks.

3. **Convolutional Neural Networks (CNNs):** While mostly used for image processing, they've also been employed for NER, capturing local patterns within the text.

4. **Transformer-based Models:** These models use self-attention mechanisms to weigh the importance of different words in the sequence relative to a given word.

   - **BERT (Bidirectional Encoder Representations from Transformers):** Pre-trained on vast amounts of text and can be fine-tuned for NER.
   
   - **RoBERTa, DistilBERT, ALBERT:** Variations and optimizations of the original BERT architecture.
   
   - **XLNet:** A generalized autoregressive model that outperformed BERT on several benchmarks.
   
   - **GPT (Generative Pre-trained Transformer):** While it's primarily used for generation tasks, with the right setup, it can be adapted for NER.

5. **CRF layer on top of Deep Learning models:** It's common to combine the strengths of CRFs and deep learning by using, for example, a BiLSTM to extract features from sequences, followed by a CRF layer to make the final predictions, considering the sequence's structure.
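
The self-attention mechanism mentioned for Transformer-based models can be sketched with plain Python. This is a deliberately stripped-down version: it uses the input vectors directly as queries, keys, and values, whereas real Transformers apply learned projections and several attention heads. The toy `embeddings` are made-up numbers, not real word vectors.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over all keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


def self_attention(vectors):
    """Replace each vector by a weighted average of all vectors in the sequence."""
    outputs = []
    for query in vectors:
        weights = attention_weights(query, vectors)
        outputs.append([
            sum(w * vec[j] for w, vec in zip(weights, vectors))
            for j in range(len(query))
        ])
    return outputs


# Made-up 2-dimensional "embeddings" for a three-word sequence.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights_for_first = attention_weights(embeddings[0], embeddings)
contextual = self_attention(embeddings)
print(weights_for_first)
```

Each word's new representation mixes in information from every other word, weighted by similarity, which is how these models take the whole sentence into account when labeling a single token.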

### Hybrid Models:

These combine elements from both traditional machine learning and deep learning. For instance, using rule-based entity recognition to guide or correct a deep learning model's predictions.
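
A minimal sketch of the "rules correct the model" pattern follows. The `model_tags` list is hard-coded to stand in for the output of some hypothetical learned model, and the single ISO-date rule is an assumption picked for illustration.

```python
import re

# Illustrative rule: tokens shaped like 2024-01-31 are always dates.
DATE_RULE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def apply_rules(tokens, model_tags):
    """Override the model's tag wherever a high-precision rule fires."""
    corrected = list(model_tags)  # copy so the model output is left untouched
    for i, token in enumerate(tokens):
        if DATE_RULE.match(token):
            corrected[i] = "DATE"
    return corrected


tokens = ["Invoice", "sent", "2024-01-31", "to", "Acme"]
# Stand-in for a learned model's predictions; it missed the date.
model_tags = ["O", "O", "O", "O", "ORGANIZATION"]

final_tags = apply_rules(tokens, model_tags)
print(list(zip(tokens, final_tags)))
```

The rule only overrides tokens it is very sure about and leaves every other prediction to the model.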

In practice, while traditional machine learning models like CRFs were once the state of the art for NER, deep learning models, particularly transformer-based architectures like BERT, currently dominate the field in terms of performance.
