# ___Natural Language Processing___
---

## ___What is NLP?___

_NLP stands for Natural Language Processing, a field at the intersection of computer science, linguistics, and artificial intelligence. It is the technology that machines use to understand, analyse, manipulate, and interpret human languages. It helps developers organize knowledge for performing tasks such as translation, automatic summarization, Named Entity Recognition (NER), speech recognition, relationship extraction, and topic segmentation._

## ___Components of NLP___
_NLP has the following two components:_

### ___Natural Language Understanding (NLU)___

_Natural Language Understanding (NLU) helps the machine to understand and analyse human language by extracting metadata from content, such as concepts, entities, keywords, emotion, relations, and semantic roles._

_NLU is mainly used in business applications to understand customers' problems in both spoken and written language._

_NLU involves the following tasks:_

* _It is used to map the given input into a useful representation._
* _It is used to analyze different aspects of the language._


### ___Natural Language Generation (NLG)___

_Natural Language Generation (NLG) acts as a translator that converts computerized data into a natural language representation. It mainly involves text planning, sentence planning, and text realization._


### ___Difference between NLU and NLG___
<table>
<tbody>
<tr>
	<th>NLU</th>
	<th>NLG</th>
</tr>
<tr>
	<td>NLU is the process of reading and interpreting language.</td>
	<td>NLG is the process of writing or generating language.</td>
</tr>
<tr>
	<td>It produces non-linguistic outputs from natural language inputs.</td>
	<td>It constructs natural language outputs from non-linguistic inputs.</td>
</tr>
</tbody>
</table>

## ___A Few Applications of NLP___

* _Question Answering_
* _Spam Detection_
* _Sentiment Analysis/Opinion Mining_
* _Machine Translation_
* _Spelling correction_
* _Speech Recognition_
* _Chatbot_
* _Information extraction_
* _Autocomplete_
* _Predictive Typing_
* _Named Entity Recognition_

## ___Rule-based NLP vs. Statistical NLP___
_Natural Language Processing can be divided into two different approaches:_
### ___Rule-based Natural Language Processing___
_It relies on hand-written rules that encode common-sense reasoning. For instance, freezing temperatures can lead to death, and hot coffee can burn people's skin. However, writing such rules takes a lot of time and requires manual effort._
### ___Statistical Natural Language Processing___
_It uses large amounts of data and tries to derive conclusions from it. Statistical NLP uses machine learning algorithms to train NLP models. After successful training on large amounts of data, the trained model can make accurate inferences about new input._
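
_The contrast between the two approaches can be sketched with a toy spam filter. The keyword list, the four training sentences, and all function names below are hypothetical and chosen only for illustration. The statistical part is a naive-Bayes-style word count with add-one smoothing and equal class priors, not a production model._

```python
import math
from collections import Counter

# Rule-based: a person writes the rule by hand (hypothetical keyword list).
SPAM_KEYWORDS = {"free", "winner", "prize"}

def rule_based_is_spam(text):
    return any(word in SPAM_KEYWORDS for word in text.lower().split())

# Statistical: the decision is learned from labelled examples (toy data).
TRAINING_DATA = [
    ("win a free prize now", "spam"),
    ("free entry winner announced", "spam"),
    ("meeting moved to monday", "ham"),
    ("lunch with the team today", "ham"),
]

def train(examples):
    # Count how often each word occurs in each class.
    counts = {"spam": Counter(), "ham": Counter()}
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts

def statistical_is_spam(text, counts):
    vocab = set(counts["spam"]) | set(counts["ham"])
    scores = {}
    for label, word_counts in counts.items():
        total = sum(word_counts.values())
        # Sum of log-probabilities with add-one (Laplace) smoothing.
        scores[label] = sum(
            math.log((word_counts[w] + 1) / (total + len(vocab)))
            for w in text.lower().split()
        )
    return scores["spam"] > scores["ham"]

model = train(TRAINING_DATA)
print(rule_based_is_spam("You are a winner"))           # True
print(statistical_is_spam("free prize today", model))   # True
print(statistical_is_spam("team meeting today", model)) # False
```

_The rule-based function only changes when someone edits the keyword list, while the statistical function changes whenever the training data changes._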

## ___NLP pipeline___

_There are the following steps to build an NLP pipeline:_

### ___Sentence Segmentation___

_Sentence segmentation is the first step in building the NLP pipeline. It breaks a paragraph into separate sentences._

_Example: Consider the following paragraph:_

_Independence Day is one of the important festivals for every Indian citizen. It is celebrated on the 15th of August each year ever since India got independence from the British rule. The day celebrates independence in the true sense._

_Sentence segmentation produces the following result:_

* _"Independence Day is one of the important festivals for every Indian citizen."_
* _"It is celebrated on the 15th of August each year ever since India got independence from the British rule."_
* _"The day celebrates independence in the true sense."_
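
_A minimal sketch of sentence segmentation using only Python's standard `re` module. The function name `split_sentences` is an assumption made for this example. The rule used here (split after `.`, `!`, or `?` when followed by whitespace and a capital letter) is a simplification and would wrongly split abbreviations such as "Dr. Smith"._

```python
import re

def split_sentences(paragraph):
    # Split on whitespace that follows '.', '!' or '?' and precedes a capital letter.
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", paragraph.strip())
    return [part for part in parts if part]

paragraph = (
    "Independence Day is one of the important festivals for every Indian citizen. "
    "It is celebrated on the 15th of August each year ever since India got "
    "independence from the British rule. "
    "The day celebrates independence in the true sense."
)

for sentence in split_sentences(paragraph):
    print(sentence)
```

_Run on the example paragraph, this prints the three sentences on separate lines._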

### ___Word Tokenization___

_Word Tokenizer is used to break the sentence into separate words or tokens._

_Example:_

_We are Learning NLP._

_Word Tokenizer generates the following result:_

_"We", "are", "Learning", "NLP", "."_
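
_A minimal sketch of a word tokenizer built on a regular expression. The function name `tokenize` is an assumption for this example. Library tokenizers handle many more cases, such as contractions and URLs._

```python
import re

def tokenize(sentence):
    # \w+ matches a run of letters or digits; [^\w\s] matches one punctuation mark.
    return re.findall(r"\w+|[^\w\s]", sentence)

print(tokenize("We are Learning NLP."))
# ['We', 'are', 'Learning', 'NLP', '.']
```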

### ___Stemming___

_Stemming is used to normalize words into their base or root form. For example, celebrates, celebrated, and celebrating all originate from a single root word, "celebrate." The big problem with stemming is that it sometimes produces a root word that has no meaning._

_For example, intelligence, intelligent, and intelligently all reduce to a single root, "intelligen." In English, the word "intelligen" does not have any meaning._
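
_A toy suffix-stripping stemmer makes the problem visible. The suffix list and the function name `toy_stem` are hypothetical and were picked only to fit these examples. Established algorithms such as the Porter stemmer use far more elaborate rules._

```python
# Hypothetical suffix list, chosen only to illustrate the idea.
SUFFIXES = sorted(["ing", "tly", "ed", "es", "ce", "s", "t"], key=len, reverse=True)

def toy_stem(word, min_stem_length=3):
    word = word.lower()
    for suffix in SUFFIXES:  # longer suffixes are tried first
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_length:
            return word[: -len(suffix)]
    return word

for w in ["intelligence", "intelligent", "intelligently"]:
    print(w, "->", toy_stem(w))   # each prints "... -> intelligen"

for w in ["celebrates", "celebrated", "celebrating"]:
    print(w, "->", toy_stem(w))   # each prints "... -> celebrat"
```

_This toy version maps the three "celebrate" forms to "celebrat", which is another stem that is not an English word._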

### ___Lemmatization___

_Lemmatization is quite similar to stemming. It is used to group the different inflected forms of a word under a single base form, called the lemma. The main difference between stemming and lemmatization is that lemmatization produces a root word that has a meaning._

_For example: in lemmatization, the words intelligence, intelligent, and intelligently have the root word intelligent, which has a meaning._
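
_Lemmatization is usually dictionary-driven, which the sketch below imitates with a tiny lookup table. The table contents and the name `toy_lemmatize` are hypothetical. Real lemmatizers rely on a full lexicon and often on the word's part of speech._

```python
# Hypothetical lookup table; a real lemmatizer uses a full dictionary.
LEMMA_TABLE = {
    "celebrates": "celebrate",
    "celebrated": "celebrate",
    "celebrating": "celebrate",
    "is": "be",
    "are": "be",
    "mice": "mouse",
}

def toy_lemmatize(word):
    # Fall back to the lower-cased word itself when it is not in the table.
    word = word.lower()
    return LEMMA_TABLE.get(word, word)

print([toy_lemmatize(w) for w in ["celebrates", "celebrated", "celebrating"]])
# ['celebrate', 'celebrate', 'celebrate']
```

_Every value in the table is a real word, so the result is always a real word._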

### ___Identifying Stop Words___

_In English, there are a lot of words that appear very frequently like "is", "and", "the", and "a". NLP pipelines will flag these words as stop words. Stop words might be filtered out before doing any statistical analysis._

_Example: He is a good boy._
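
_A minimal sketch of stop-word filtering applied to the example sentence. The stop list holds only the four words named above, and the function name `remove_stop_words` is an assumption. Libraries ship much longer lists._

```python
import re

STOP_WORDS = {"is", "and", "the", "a"}  # only the four stop words named in the text

def remove_stop_words(sentence):
    tokens = re.findall(r"\w+", sentence)  # keep words, drop punctuation
    return [token for token in tokens if token.lower() not in STOP_WORDS]

print(remove_stop_words("He is a good boy."))
# ['He', 'good', 'boy']
```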

### ___Dependency Parsing___

_Dependency parsing is used to find how all the words in a sentence are related to each other._

### ___POS tags___

_POS stands for part of speech; the parts of speech include noun, verb, adverb, and adjective. A POS tag indicates how a word functions within a sentence, both in meaning and grammatically. A word can have one or more parts of speech, depending on the context in which it is used._

_Example: "Google" something on the Internet._

_In the above example, Google is used as a verb, although it is normally a proper noun._

### ___Named Entity Recognition (NER)___

_Named Entity Recognition (NER) is the process of detecting named entities such as person names, movie names, organization names, or locations._

_Example: Steve Jobs introduced the iPhone at the Macworld Conference in San Francisco, California._
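
_The simplest form of NER is a gazetteer lookup, sketched below for the example sentence. The entity table, its labels, and the name `find_entities` are hypothetical. Real NER systems are usually statistical models that can also recognise names they have never seen._

```python
# Hypothetical gazetteer mapping known entity names to labels.
GAZETTEER = {
    "Steve Jobs": "PERSON",
    "iPhone": "PRODUCT",
    "Macworld Conference": "EVENT",
    "San Francisco": "LOCATION",
    "California": "LOCATION",
}

def find_entities(text):
    found = []
    for name, label in GAZETTEER.items():
        start = text.find(name)  # position of the first occurrence, or -1
        if start != -1:
            found.append((start, name, label))
    # Report entities in the order they appear in the text.
    return [(name, label) for _, name, label in sorted(found)]

sentence = ("Steve Jobs introduced the iPhone at the Macworld Conference "
            "in San Francisco, California.")
for name, label in find_entities(sentence):
    print(f"{name}: {label}")
```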

### ___Chunking___

_Chunking collects individual pieces of information, such as tagged words, and groups them into larger units, such as phrases._
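
_A minimal sketch of noun-phrase chunking. It assumes the words have already been POS-tagged. The tag names, the hand-tagged sentence, and the function name `chunk_noun_phrases` are illustrative assumptions._

```python
def chunk_noun_phrases(tagged_tokens):
    """Group consecutive determiner/adjective/noun tokens into one chunk."""
    chunks, current = [], []
    for word, tag in tagged_tokens:
        if tag in {"DET", "ADJ", "NOUN"}:
            current.append(word)
        elif current:
            # Any other tag closes the chunk that was being built.
            chunks.append(" ".join(current))
            current = []
    if current:
        chunks.append(" ".join(current))
    return chunks

# Hand-tagged input; a real pipeline would get these tags from a POS tagger.
tagged = [
    ("The", "DET"), ("little", "ADJ"), ("dog", "NOUN"),
    ("chased", "VERB"),
    ("a", "DET"), ("red", "ADJ"), ("ball", "NOUN"),
]
print(chunk_noun_phrases(tagged))
# ['The little dog', 'a red ball']
```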

## ___Phases of NLP___

_NLP has the following five phases:_

![image.png](attachment:image.png)

### ___Lexical and Morphological Analysis___

_The first phase of NLP is lexical analysis. This phase scans the source text as a stream of characters and converts it into meaningful lexemes. It divides the whole text into paragraphs, sentences, and words._

### ___Syntactic Analysis (Parsing)___

_Syntactic analysis is used to check grammar and word arrangement, and to show the relationships among the words._

_Example: Agra goes to the Poonam_

_In the real world, "Agra goes to the Poonam" does not make any sense, so the syntactic analyzer rejects this sentence._

### ___Semantic Analysis___

_Semantic analysis is concerned with meaning representation. It mainly focuses on the literal meaning of words, phrases, and sentences. Phrases such as “hot ice-cream” are rejected because their literal meaning is contradictory._

### ___Discourse Integration___

_In discourse integration, the meaning of a sentence depends on the sentences that precede it and also influences the meaning of the sentences that follow it. For example: “He works at Google.” In this sentence, “he” can only be resolved by referring to a sentence before it._

### ___Pragmatic Analysis___

_Pragmatic analysis is the fifth and last phase of NLP. It helps you to discover the intended effect by applying a set of rules that characterize cooperative dialogues._

_For example: "Open the door" is interpreted as a request instead of an order._

## ___Why is NLP difficult?___

_NLP is difficult because ambiguity and uncertainty exist in language._

### ___Ambiguity___

_There are the following types of ambiguity:_

#### ___Lexical Ambiguity___
_Lexical ambiguity exists when a single word has two or more possible meanings._

_Example:_

_Manya is looking for a match._

_In the above example, the word "match" could mean either that Manya is looking for a partner or that Manya is looking for a game (a cricket match or some other match)._

#### ___Syntactic Ambiguity___
_Syntactic ambiguity exists when the structure of a sentence allows two or more possible meanings._

_Example:_

_I saw the girl with the binoculars._

_In the above example, did I have the binoculars? Or did the girl have the binoculars?_

#### ___Referential Ambiguity___
_Referential ambiguity exists when it is unclear what a pronoun refers to._

_Example: Kiran went to Sunita. She said, "I am hungry."_

_In the above sentence, you do not know who is hungry, Kiran or Sunita._

#### ___Pragmatic Ambiguity___
_This kind of ambiguity arises when the context of a phrase gives it multiple interpretations. In simple words, pragmatic ambiguity arises when the statement is not specific._

_For example, the sentence “I like you too” can have multiple interpretations: I like you (just as you like me), or I like you (just as someone else does)._

## ___Standard NLP Workflow___

![image.png](attachment:image.png)

## ___NLP Libraries___

<table>
<tbody>
<tr>
	<th>LIBRARY</th>
	<th>DESCRIPTION</th>
</tr>
<tr>
	<td>Scikit-learn</td>
	<td>It provides a wide range of algorithms for building machine learning models in Python.</td>
</tr>
<tr>
	<td>Natural Language Toolkit (NLTK)</td>
	<td>NLTK is a broad toolkit that covers many NLP techniques.</td>
</tr>
<tr>
	<td>Pattern</td>
	<td>It is a web mining module for NLP and machine learning.</td>
</tr>
<tr>
	<td>TextBlob</td>
	<td>It provides an easy interface to learn basic NLP tasks like sentiment analysis, noun phrase extraction, or pos-tagging.</td>
</tr>
<tr>
	<td>spaCy</td>
	<td>spaCy is an open-source NLP library that is used for data extraction, data analysis, sentiment analysis, and text summarization.</td>
</tr>
<tr>
	<td>Gensim</td>
	<td>Gensim works with large datasets and processes data streams.</td>
</tr>
<tr>
	<td>Quepy</td>
	<td>Quepy is used to transform natural language questions into queries in a database query language.</td>
</tr>
<tr>
	<td>Stanford CoreNLP</td>
	<td>A good choice for client-server architectures. It is written in Java, but it can also be used from Python, for example through NLTK.</td>
</tr>
<tr>
	<td>Polyglot</td>
	<td>Polyglot is well suited to massively multilingual applications. Its features include language identification and entity extraction.</td>
</tr>
<tr>
	<td>PyNLPl</td>
	<td>PyNLPl, pronounced 'pineapple', is a Python library. It provides parsers for many data formats, such as FoLiA, Giza, Moses, ARPA, Timbl, and CQL.</td>
</tr>
<tr>
	<td>Vocabulary</td>
	<td>This library is well suited to getting semantic information from the given text.</td>
</tr>
<tr>
	<td>Language Understanding (LUIS)</td>
	<td>A machine learning-based service for building natural language understanding into apps, bots, and IoT devices. It is used to create custom, enterprise-ready models that improve continuously.</td>
</tr>
<tr>
	<td>pyLDAvis</td>
	<td>pyLDAvis is designed to help users interpret the topics in a topic model that has been fit to a corpus of text data. The package extracts information from a fitted LDA topic model to inform an interactive web-based visualization.</td>
</tr>
</tbody>
</table>

### ___NLTK (Natural Language Toolkit)___
_The NLTK Python framework is generally used as an education and research tool. It is not usually used in production applications. However, its ease of use makes it a good choice for building prototypes and learning projects._

___Features___
* _Tokenization._
* _Part Of Speech tagging (POS)._
* _Named Entity Recognition (NER)._
* _Classification._
* _Sentiment analysis._
* _Chatbot packages._

___Use-cases___
* _Recommendation systems_
* _Sentiment analysis_
* _Building chatbots_

### ___spaCy___
_spaCy is an open-source natural language processing library for Python. It is designed to be fast and focuses on production use._

___Features___
* _Tokenization_
* _Part Of Speech tagging (POS)_
* _Named Entity Recognition (NER)_
* _Classification_
* _Sentiment analysis_
* _Dependency parsing_
* _Word vectors_

___Use-cases___
* _Autocomplete and autocorrect_
* _Analyzing reviews_
* _Summarization_

### ___Gensim___
_Gensim is an NLP Python framework generally used in topic modeling and similarity detection. It is not a general-purpose NLP library, but it handles tasks assigned to it very well._

___Features___
* _Latent semantic analysis_
* _Non-negative matrix factorization_
* _TF-IDF_

___Use-cases___
* _Converting documents to vectors_
* _Finding text similarity_
* _Text summarization_
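
_To make the TF-IDF feature and the "converting documents to vectors" use-case concrete, here is a pure-Python sketch. It does not use Gensim's API. The function name `tf_idf` and the three toy documents are assumptions, and the weighting shown (term frequency times the natural log of inverse document frequency) is only one common variant, so libraries may produce different numbers._

```python
import math
from collections import Counter

def tf_idf(documents):
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # (term frequency) * log(total documents / document frequency)
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

docs = ["the cat sat", "the dog barked", "the cat purred"]
vectors = tf_idf(docs)
print(vectors[0])
```

_In this toy corpus "the" occurs in every document, so its weight is zero. "sat" occurs in only one document, so it gets a higher weight than "cat", which occurs in two._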

### ___Pattern___
_Pattern is an NLP Python framework with straightforward syntax. It’s a powerful tool for scientific and non-scientific tasks. It is highly valuable to students._

___Features___
* _Tokenization_
* _Part of Speech tagging_
* _Named entity recognition_
* _Parsing_
* _Sentiment analysis_

___Use-cases___
* _Spelling correction_
* _Search engine optimization_
* _Sentiment analysis_

### ___TextBlob___
_TextBlob is a Python library designed for processing textual data._

___Features___
* _Part-of-Speech tagging_
* _Noun phrase extraction_
* _Sentiment analysis_
* _Classification_
* _Language translation_
* _Parsing_
* _WordNet integration_

___Use-cases___
* _Sentiment Analysis_
* _Spelling Correction_
* _Translation and Language Detection_