### What is Natural Language Processing?    
Natural Language Processing (NLP) is a field of computer science and artificial intelligence concerned with the interactions between computers and human languages and, in particular, with programming computers to fruitfully process large amounts of natural language data. The field lies at the intersection of computer science, artificial intelligence, and computational linguistics. Challenges in natural language processing frequently involve speech recognition, natural-language understanding, and natural-language generation. Fully understanding and representing the meaning of language (or even defining it) is a difficult goal; perfect language understanding is considered AI-complete.

__Source__:  
* [Wikipedia](https://en.wikipedia.org/wiki/Natural-language_processing)  
* [Deep Learning for Natural Language Processing, Stanford University School of Engineering](https://www.youtube.com/playlist?list=PL3FW7Lu3i5Jsnh1rnUwq_TcylNr7EkRe6)  

----------------------------------------------------------------------------------------------------------------------

### Levels of Natural Language Processing  

![Levels of Natural Language Processing](https://www.researchgate.net/profile/Puneet_Kumar26/publication/305807469/figure/fig1/AS:391235081654272@1470289120095/Figure-Levels-of-NLP.png)

The levels might differ depending on whether you are working with textual data or speech data.  

__Source__:  
[Kumar, Puneet. (2012), Natural Language Process: New Orientations in English Language Practices, Research Gate](http://bit.ly/2FYSTBa)  

----------------------------------------------------------------------------------------------------------------------

### Phonological Analysis - Interpretation of speech sounds

Phonology is a branch of linguistics concerned with the systematic organization of sounds in languages.

##### Difference between phonetics and phonology

Phonetics simply describes the articulatory and acoustic properties of phones (speech sounds).  Phonology studies how sounds interact as a system in a particular language.  Stated another way, phonetics studies which sounds are present in a language; phonology studies how these sounds combine and how they change in combination, as well as which sounds can contrast to produce differences in meaning (phonology describes the phones as allophones of phonemes). 
      
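Example: released vs. unreleased /t/ in English. The two are distinct phones, but speakers treat them as the same phoneme /t/ (they are allophones), so swapping one for the other does not change a word's meaning.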
      
__Source__:  
* [Wikipedia](https://en.wikipedia.org/wiki/Phonology)  
* [Phonology](http://pandora.cii.wwu.edu/vajda/ling201/test2materials/Phonology1.htm)  

----------------------------------------------------------------------------------------------------------------------

### Morphological Analysis - arrangement of parts in a word

Morphology is the branch of linguistics that studies words: how they are structured and formed, and how they relate to other words in the same language.

Example: {Call, Calling, Caller, ...} -> Call

![Morphological Analysis](https://www.cs.bham.ac.uk/~pjh/sem1a5/pt2/pt2_intro_morph_1.gif)
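
The reduction of `{Call, Calling, Caller, ...}` to `Call` can be illustrated with a minimal suffix-stripping sketch. The function name `naive_stem` and the suffix list are assumptions made for this illustration; real stemmers use many more rules and handle cases this one gets wrong.

```python
# Toy suffix-stripping stemmer. The suffix list is an illustrative assumption,
# not a complete description of English morphology.
SUFFIXES = ("ing", "er", "ed", "s")


def naive_stem(word, min_stem_len=3):
    """Strip the first matching suffix, keeping at least min_stem_len characters."""
    lowered = word.lower()
    for suffix in SUFFIXES:
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= min_stem_len:
            return lowered[: -len(suffix)]
    return lowered


words = ["Call", "Calling", "Caller", "Called", "Calls"]
stems = {naive_stem(w) for w in words}
print(stems)  # {'call'}
```

The `min_stem_len` guard keeps short words such as "is" from being stripped down to a single letter.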

__Source__:  
* [Morphological Analysis](https://www.packtpub.com/mapt/book/big_data_and_business_intelligence/9781787121423/3/ch03lvl1sec26/morphological-analysis)  
* [Wikipedia](http://bit.ly/2t3vtZZ)  

----------------------------------------------------------------------------------------------------------------------

### Lexical Analysis - meaning of each word

In computer science, lexical analysis, lexing or tokenization is the process of converting a sequence of characters into a sequence of tokens (strings with an assigned and thus identified meaning).

It involves identifying and analyzing the structure of words. The lexicon of a language is its collection of words and phrases. Lexical analysis divides a whole chunk of text into paragraphs, sentences, and words.
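
As a rough sketch of that division, the snippet below splits text into paragraphs, sentences, and words using only regular expressions. The helper names and the splitting rules are assumptions for illustration; the sentence splitter in particular is naive and will break on abbreviations such as "e.g.".

```python
import re


def split_paragraphs(text):
    """Split on one or more blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(paragraph):
    """Naively split after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s]


def split_words(sentence):
    """Extract runs of letters, digits, and apostrophes as word tokens."""
    return re.findall(r"[A-Za-z0-9']+", sentence)


text = "NLP is fun. It is also hard!\n\nTokens come first."
for paragraph in split_paragraphs(text):
    for sentence in split_sentences(paragraph):
        print(split_words(sentence))
```

Library tokenizers such as NLTK's `word_tokenize` handle punctuation and contractions more carefully than this sketch does.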

__Source__:  
* [Wikipedia](https://en.wikipedia.org/wiki/Lexical_analysis)  
* [Lexical Analysis](https://www.tutorialspoint.com/artificial_intelligence/artificial_intelligence_natural_language_processing.htm) 

----------------------------------------------------------------------------------------------------------------------

### Syntactic Analysis - Structure of the grammar

It involves analyzing the words in a sentence for grammar and arranging them in a manner that shows the relationships among them. A sentence such as "The school goes to boy" is rejected by an English syntactic analyzer.

![Syntactic Analysis](http://www.doc.ic.ac.uk/~nd/surprise_97/journal/vol1/ctt/tree.gif)
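
A minimal sketch of such an analyzer is shown below, assuming a toy lexicon and a four-rule grammar invented for this illustration. Under this grammar, "The school goes to boy" is rejected because "boy" is not preceded by a determiner; a real analyzer would use a far richer grammar and might reject the sentence for other reasons.

```python
# Toy lexicon and grammar (assumptions for illustration only):
#   S -> NP VP ; NP -> Det N ; VP -> V PP ; PP -> P NP
LEXICON = {
    "the": "Det", "a": "Det",
    "boy": "N", "school": "N",
    "goes": "V", "walks": "V",
    "to": "P",
}


def parse_np(tags, i):
    """Return the index after a Det N sequence starting at i, or None."""
    if tags[i:i + 2] == ["Det", "N"]:
        return i + 2
    return None


def parse_vp(tags, i):
    """Return the index after a V P NP sequence starting at i, or None."""
    if tags[i:i + 2] == ["V", "P"]:
        return parse_np(tags, i + 2)
    return None


def is_grammatical(sentence):
    """Accept the sentence only if it is exactly an NP followed by a VP."""
    words = sentence.lower().split()
    if any(w not in LEXICON for w in words):
        return False
    tags = [LEXICON[w] for w in words]
    after_np = parse_np(tags, 0)
    if after_np is None:
        return False
    return parse_vp(tags, after_np) == len(tags)


print(is_grammatical("The boy goes to the school"))  # True
print(is_grammatical("The school goes to boy"))      # False
```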

__Source__:  
* [Syntactic Analysis](https://www.tutorialspoint.com/artificial_intelligence/artificial_intelligence_natural_language_processing.htm)  
* [Natural Language Processing... Understanding what you say](http://www.doc.ic.ac.uk/~nd/surprise_97/journal/vol1/ctt/article1.htm)  

----------------------------------------------------------------------------------------------------------------------

### Semantic Analysis - Meaning of sentence  

The meaning of a text is called its semantics. Semantic analysis describes the process of understanding natural language (the way that humans communicate) based on meaning and context.

It draws the exact meaning, or dictionary meaning, from the text, and the text is checked for meaningfulness. This is done by mapping syntactic structures to objects in the task domain.

__Source__:  
* [Semantics - Meaning Representation in NLP](http://www.bowdoin.edu/~allen/nlp/nlp6.html)  
* [AI - Natural Language Processing](https://www.tutorialspoint.com/artificial_intelligence/artificial_intelligence_natural_language_processing.htm)  

----------------------------------------------------------------------------------------------------------------------

### Discourse Analysis - Analysis of larger speech units  

Discourse analysis may be defined as the process of analyzing text or language in units larger than a single sentence, which involves interpreting the text and understanding the social interactions behind it. Discourse analysis may involve dealing with morphemes, n-grams, tenses, verbal aspects, page layouts, and so on.

__Source__:  
* [Introduce discourse analysis](https://www.packtpub.com/mapt/book/big_data_and_business_intelligence/9781787285101/28/ch09lvl1sec0055/introducing-discourse-analysis)  

----------------------------------------------------------------------------------------------------------------------

### Pragmatic Analysis - Purposeful use for situation-specific meaning  

Pragmatic analysis deals with outside world knowledge, that is, knowledge external to the documents and/or queries. During this step, what was said is reinterpreted in terms of what was actually meant, deriving those aspects of language that require real-world knowledge.

__Source__:  
* [Pragmatic Analysis](https://www.packtpub.com/mapt/book/big_data_and_business_intelligence/9781787121423/3/ch03lvl1sec31/pragmatic-analysis)  
* [AI - Natural Language Processing](https://www.tutorialspoint.com/artificial_intelligence/artificial_intelligence_natural_language_processing.htm)  

----------------------------------------------------------------------------------------------------------------------

### Types of applications of Natural Language Processing  

* Automatic Document Summarization
* Information Extraction
* Machine Translation
* Named Entity Recognition
* Natural language generation
* Natural language search
* Natural language understanding
* Question Answering
* Speech Recognition
* Text-to-speech conversion
* Sentiment Analysis
* Text Classification
* Chatbots
* Conversational Search
* Spell checking

__Recommended reading__:  
https://medium.com/@datamonsters/artificial-neural-networks-in-natural-language-processing-bcf62aa9151a

----------------------------------------------------------------------------------------------------------------------

### Why is Natural Language Processing hard?  

* Complexity in representing, learning and using linguistic/situational/world/visual knowledge  
* Human languages are ambiguous (unlike programming and other formal languages)  
* Human language interpretation depends on the real world, common sense, and contextual knowledge  

----------------------------------------------------------------------------------------------------------------------

### NLTK Example

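In [None]:
import nltk
from urllib.request import Request, urlopen
# nltk.clean_html was removed in NLTK 3; BeautifulSoup's get_text() is used instead
from bs4 import BeautifulSoup

url = "https://en.wikipedia.org/wiki/Natural-language_processing"
# Assumption: a descriptive User-Agent is sent, since the default one may be rejected
request = Request(url, headers={"User-Agent": "nlp-notebook-example/0.1"})
html = urlopen(request).read()
raw = BeautifulSoup(html, "html.parser").get_text()
print(raw)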

In [None]:
print(len(raw))

In [None]:
from nltk import word_tokenize
# word_tokenize needs NLTK's Punkt tokenizer data, installable via nltk.download()
tokens = word_tokenize(raw)
print(type(tokens))
print(len(tokens))

In [None]:
# Index of the first and last occurrence of "Language" in the raw text
print(raw.find("Language"))
print(raw.rfind("Language"))

In [None]:
import enchant
d = enchant.Dict("en_US")
# Keep only tokens that the en_US dictionary recognizes as correctly spelled
clean_tokens = [token for token in tokens if d.check(token)]
print(len(clean_tokens))
print(clean_tokens)

In [None]:
# Alternative cleanup: keep only purely alphabetic tokens
clean_tokens = [token for token in tokens if token.isalpha()]
print(len(clean_tokens))
print(clean_tokens)

In [None]:
from nltk import word_tokenize, ne_chunk, pos_tag
chunked = ne_chunk(pos_tag(clean_tokens))

In [None]:
for i in chunked:
    print(i)

In [None]:
import nltk
from nltk.corpus import stopwords
stop_words = set(stopwords.words('english'))

In [None]:
# NLTK's English stop words are lowercase, so compare case-insensitively
filtered_tokens = [w for w in clean_tokens if w.lower() not in stop_words]
print(filtered_tokens)

In [None]:
from nltk.corpus import wordnet as wn
wn.synsets('language')