# <center>NLP</center>

## What is NLP?

Natural Language Processing is a subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to process and analyze large amounts of natural language data.

The ultimate goal of NLP is to enable computers to understand, interpret, and generate human language in a way that is both meaningful and useful.

<center>

![image.png](attachment:image.png) 
 
 </center>

## Need For NLP

In neuropsychology, linguistics, and the philosophy of language, a **natural language** or **ordinary language** is any language that has evolved naturally in humans through use and repetition without conscious planning or premeditation.

Natural languages can take different forms, such as speech or signing. They are distinguished from constructed and formal languages such as those used to program computers or to study logic.

NLP is needed to bridge the gap between human communication and computer understanding, making interactions more intuitive and efficient.

## Real World Applications

1. Contextual Advertisements: Delivering ads based on the context of the content a user is viewing.
2. Email Clients: Features like spam filtering, auto-categorization, and smart reply suggestions.
3. Social Media: Removing inappropriate content, sentiment analysis, and opinion mining.
4. Search Engines: Improving search results by understanding user queries better.
5. Chatbots: Providing automated customer support and personal assistants.

## Common NLP Tasks

1. Text/Document Classification: Categorizing text into predefined groups.
2. Sentiment Analysis: Determining the sentiment or emotion expressed in a piece of text.
3. Information Extraction / Named Entity Recognition (NER): Extracting relevant information and identifying entities like names, dates, and places.
4. Parts of Speech Tagging: Identifying the grammatical category (noun, verb, adjective, etc.) of each word in a sentence.
5. Language Detection and Machine Translation: Identifying languages and translating text between languages.
6. Conversational Agents: Developing text-based or speech-based dialogue systems.
7. Knowledge Graph and QA Systems: Creating structured knowledge from text and building question-answering systems.
8. Text Summarization: Creating concise summaries of longer texts.
9. Topic Modelling: Identifying topics within a collection of documents.
10. Text Generation: Automatically generating human-like text.
11. Spell Checking and Grammar Corrections: Identifying and correcting spelling and grammatical errors.
12. Text Parsing: Analyzing the structure of sentences.
13. Speech to Text: Converting spoken language into written text.

## Approaches To NLP

1. Heuristic Methods
2. Machine Learning Based Methods
3. Deep Learning Based Methods

#### **1. Heuristic Methods**
A heuristic is a **mental shortcut that allows people to solve problems and make judgements quickly and efficiently.**

A heuristic, or heuristic technique, is any approach to problem solving or self-discovery that employs a practical method that is not guaranteed to be optimal, perfect, or rational, but is nevertheless sufficient for reaching an immediate, short-term goal or approximation.


**What are heuristic approaches? Examples:**
- Regular Expressions: Used for pattern matching in text, such as finding HTML tags or salutations.
- WordNet: A lexical database of English that groups words into sets of synonyms and records the relations between them.
- Open Mind Common Sense: A knowledge base of common-sense facts.
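
As a small illustration of the regular-expression heuristic, the sketch below lists the HTML tags in a string, strips them, and then pulls out salutations. The sample text and both patterns are assumptions made up for this example, and real HTML is better handled by a proper parser.

```python
import re

# Hypothetical sample text, invented for this illustration.
text = "<p>Dear Dr. Smith,</p><p>Please meet <b>Ms. Jones</b> and Mr. Lee.</p>"

# Heuristic 1: treat anything between '<' and '>' as an HTML tag.
TAG_PATTERN = re.compile(r"<[^>]+>")

# Heuristic 2: a salutation is a known title followed by a capitalized word.
SALUTATION_PATTERN = re.compile(r"\b(?:Mr|Mrs|Ms|Dr)\.\s+[A-Z][a-z]+")

tags = TAG_PATTERN.findall(text)          # every tag, in order of appearance
plain_text = TAG_PATTERN.sub(" ", text)   # text with the tags replaced by spaces
salutations = SALUTATION_PATTERN.findall(plain_text)

print(tags)
print(salutations)
```

The rules are quick to write and easy to read, but a title that is not in the list (for example "Prof.") is silently missed, which is a typical weakness of hand-written rules.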


**Advantages:**
- Quick to implement and run.
- Often accurate enough for simple, well-defined tasks.

**Disadvantages:**

- Not scalable for large, complex tasks.
- Can be brittle and fail when faced with unexpected input.

#### **2. Machine Learning Based Methods**

The Big Advantage:
    
- Helpful in solving open-ended problems.
- Converts textual data into numerical features, so the model learns its decision rules from data instead of relying on hand-written rules.


**Algorithms Used:**

- Naive Bayes: Simple probabilistic classifier based on Bayes' theorem.
- Logistic Regression: Statistical model for classification, most commonly binary.
- Support Vector Machines (SVM): Supervised learning models for classification and regression.
- Latent Dirichlet Allocation (LDA): A generative probabilistic model for topic modeling.
- Hidden Markov Models (HMM): Statistical models used for sequence data, like part-of-speech tagging.
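
To make the first algorithm in the list concrete, here is a minimal sketch of a multinomial Naive Bayes text classifier written with the standard library only. The whitespace tokenizer, the add-one smoothing, the function names, and the four training sentences are all assumptions chosen for this illustration; it is not a production implementation.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()

def train_naive_bayes(labeled_docs):
    """Count documents per class and word occurrences per class."""
    class_doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_docs:
        class_doc_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocabulary.add(token)
    return class_doc_counts, word_counts, vocabulary

def predict(text, model):
    """Return the class with the highest log-posterior score."""
    class_doc_counts, word_counts, vocabulary = model
    total_docs = sum(class_doc_counts.values())
    scores = {}
    for label, doc_count in class_doc_counts.items():
        score = math.log(doc_count / total_docs)  # log prior P(class)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # skip words never seen during training
            count = word_counts[label][token] + 1  # add-one (Laplace) smoothing
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)

# Toy training data, invented for this illustration.
training_data = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]

model = train_naive_bayes(training_data)
print(predict("claim your free prize", model))
print(predict("agenda for the team meeting", model))
```

The text is first reduced to word counts, and the classifier works only with those numbers. Working in log space avoids numerical underflow when many small probabilities are multiplied.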


**Advantages:**
- More flexible than heuristic methods.
- Can learn from data and improve over time.

**Disadvantages:**
- Supervised methods require labeled data for training.
- May not capture complex language patterns.

#### **3. Deep Learning Based Methods**

The Big Advantage:

- In classical ML approaches, text is converted into numbers (for example, word counts), and the sequential information of the text is often lost in the process. Deep learning models can instead work on the ordered sequence of tokens, preserving that sequential information.
- Feature generation is done automatically: the model learns useful features from raw data instead of relying on hand-crafted ones.
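
The loss of sequential information can be seen in a tiny example. The sketch below compares a bag-of-words representation with an order-preserving sequence of integer ids; the two sentences and the id mapping are assumptions made for this illustration, and no model is trained here.

```python
from collections import Counter

sentence_a = "dog bites man"
sentence_b = "man bites dog"

# Bag-of-words: keep only how often each word occurs.
bag_a = Counter(sentence_a.split())
bag_b = Counter(sentence_b.split())

# Sequence encoding: map each word to an integer id and keep the word order.
vocabulary = {word: idx for idx, word in enumerate(sorted(set(sentence_a.split())))}
sequence_a = [vocabulary[word] for word in sentence_a.split()]
sequence_b = [vocabulary[word] for word in sentence_b.split()]

print(bag_a == bag_b)          # True: the two bags are indistinguishable
print(sequence_a, sequence_b)  # [1, 0, 2] [2, 0, 1]: the ordered sequences differ
```

A model that only sees the bags cannot tell who bit whom, while a sequence model receives the ids in order and can, in principle, learn the difference.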


**Architectures Used:**

- Recurrent Neural Networks (RNN): Suitable for sequence data but can suffer from vanishing gradient problems.
- Long Short-Term Memory (LSTM): A type of RNN that can capture long-term dependencies.
- Gated Recurrent Unit (GRU): Similar to LSTM but with a simplified architecture.
- Convolutional Neural Networks (CNN): Effective for text classification and feature extraction.
- Transformers: State-of-the-art models for NLP tasks, known for their attention mechanisms (e.g., BERT, GPT).
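
The vanishing gradient problem mentioned for RNNs can be illustrated with a scalar toy network. The sketch below assumes a one-unit RNN with the update `h_t = tanh(w * h_{t-1} + u * x_t)`; the weights, the inputs, and the function name are hypothetical values chosen only to show the effect.

```python
import math

def gradient_through_time(w, u, inputs, h0=0.0):
    """Track d h_t / d h_0 for a one-unit RNN: h_t = tanh(w * h_{t-1} + u * x_t)."""
    h = h0
    grad = 1.0
    history = []
    for x in inputs:
        h = math.tanh(w * h + u * x)
        # Chain rule: d h_t / d h_{t-1} = w * (1 - h_t ** 2)
        grad *= w * (1.0 - h * h)
        history.append(grad)
    return history

# Assumed toy settings: recurrent weight 0.5, input weight 1.0, 20 time steps.
grads = gradient_through_time(w=0.5, u=1.0, inputs=[1.0] * 20)

print(grads[0])    # gradient after one step
print(grads[-1])   # gradient after twenty steps, many orders of magnitude smaller
```

Each step multiplies the gradient by a factor whose magnitude is at most `|w|`, so with `|w| < 1` the influence of early inputs shrinks exponentially. Gated architectures such as LSTM and GRU were designed to ease this problem.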

**Advantages:**
- Can handle large amounts of data.
- Automatically generate features, preserving the sequential information in text.

**Disadvantages:**
- Computationally intensive.
- Requires large datasets for training.

## Challenges In NLP

- Ambiguity: Words or sentences with multiple interpretations.
    - *I saw the boy on the beach with my binoculars.*
    - *I have never tasted a cake quite like that one before.*
- Contextual Words: Words that change meaning based on context.
    - *I **ran** to the store because we **ran** out of milk.*
- Colloquialisms and Slang: Informal expressions and idioms whose meaning differs from the literal words.
    - *piece of cake, pulling your leg*
- Synonyms: Different words with the same or similar meanings.
- Irony, Sarcasm, and Tonal Differences: Sentences whose intended meaning is the opposite of their literal meaning.
    - *That's just what I needed today!*
- Spelling Errors: Mistakes in writing.
- Creativity: Artistic use of language, like in poetry or dialogue.
    - *Poems, dialogues, scripts*
- Diversity: Variability in how people express the same idea.

# The NLP Landscape: From the 1960s to the 2020s