# Leveraging Linguistics for Chat

There is immense utility in being able to mine long human-written texts for particular information, or to manipulate them in meaningful ways. 

We will take a deep dive into the linguistic features of spaCy (and mention the older NLTK and Stanford CoreNLP for reference). 

We will cover dependency parsing, POS tagging, noun phrase chunking, and named entities. 

As mentioned in the Text Cleaning section, these techniques are most powerful in combination rather than in isolation. We will learn how to use spaCy pipelines to combine them. 

We will start with a walkthrough of the official spaCy guidelines and code examples. This uses the pre-trained models from spaCy itself. 

## Linguistics

### Linguistics Application: Chatbots
Chatbots or conversational systems such as Siri need an intricate understanding of language to do two main things: 
1. Understand human input in either text or voice
    - This input is different from how we use search. For instance, we might enter the exact item we want to buy in Amazon search, but we might ask Alexa for suggestions on the best toys for 3-year-olds.

2. Generate a language response
    - What does a Google search for *Steve Jobs Birthday* return for you? A list of web pages. On the other hand, you would expect Siri not only to tell you the exact date of Jobs's birth, but also to phrase it as a proper sentence such as: *Steve Jobs was born on 24 Feb 1955*


The study of language is called linguistics. This section covers language concepts as applied to Natural Language Processing. 

We saw some of this when we studied English grammar back in our school days. You might want to recap terms such as _noun_, _verb_, _gerund_ and so on. 


## Main Headings:
- HEADING 1: Linguistic Roots of English Language
- HEADING 2: Leveraging Language: Example Tasks - Chat bots and Search
- HEADING 3: PoS Tagging, NP Chunking
- HEADING 4: NER: Inbuilt models
- HEADING 5: Gluing it all together

## Skills learned:
- SKILL 1: Linguistic Concepts in NLP
- SKILL 2: Spelling Correction, Slot Filling
- SKILL 3: Linguistic Tasks using spaCy
- SKILL 4: NER with spaCy pipelines
- SKILL 5: End-to-end spaCy implementation with a toy example

### Slot Filling vs NER
Slot filling and NER have different goals. Slot filling looks for specific pieces of information with respect to something. For instance, you might ask Siri or Google Assistant - _Who is the spouse of Sachin Tendulkar?_

In this example, you are looking for _spouse_ with respect to _Sachin Tendulkar_. 

The answer can be a named entity (e.g. _who is the spouse of this person?_), but it can also be something else (e.g. _when was this person born?_). 

Exactly which pieces of information matter depends on the application, but Wikipedia info boxes are a good example. 
TKX Add Wiki info box screenshot of Sachin Tendulkar
TKX TIP: Similar thought process is quite useful for building a chatbot, say for customer service. 
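
A minimal sketch of the slot idea, assuming an info box can be modelled as a nested dictionary keyed by entity and then by slot name. The `INFOBOXES` table and the `fill_slot` helper are hypothetical names introduced for illustration; they are not part of any library.

```python
# Hypothetical, hand-written stand-in for Wikipedia-style info boxes:
# entity -> {slot name -> value}
INFOBOXES = {
    "Sachin Tendulkar": {
        "born": "24 April 1973",
        "spouse": "Anjali Tendulkar",
    },
}


def fill_slot(entity, slot):
    """Return the value of `slot` for `entity`, or None if it is unknown."""
    return INFOBOXES.get(entity, {}).get(slot)


print(fill_slot("Sachin Tendulkar", "spouse"))  # the answer is a named entity
print(fill_slot("Sachin Tendulkar", "born"))    # the answer is a date, not a name
```

Note that the slot names, not the entity types, drive the lookup: the same entity can have slots whose values are names, dates, or numbers.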

NER is more generic and just looks for things. By things, we usually mean nouns such as the names of people, companies, places, etc. The focus is not on the relation between these things. 

For instance, BookMyShow allows you to book tickets via WhatsApp. 

TKX: Add BMS screenshots from phone

TKX: This example needs to be worked out better
The relation is a movie, which has the following _slots_: date, screen, theatre/movie hall name. A NER system would just tag <Bengaluru> and <movie name> as names of things, as opposed to 'dfsdf,' which is not the name of a specific individual thing. If the sentence said 'adasda' instead of 'adasda,' NER would pick that out as a movie name. 
    
Some NER taggers will also extract dates, money, and other numbers because they're useful, even though they're not really named entities.
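
To make the distinction concrete, here is a small sketch that splits tagger output into named and numeric entities. The label names follow the OntoNotes scheme used by spaCy's English models. The `tagged` pairs are hand-written for illustration rather than real model output, and `split_entities` is a hypothetical helper.

```python
# Numeric and temporal labels from the OntoNotes scheme used by spaCy's English models.
NUMERIC_LABELS = {"DATE", "TIME", "PERCENT", "MONEY", "QUANTITY", "ORDINAL", "CARDINAL"}


def split_entities(pairs):
    """Split (text, label) pairs into (named, numeric) lists, preserving order."""
    named, numeric = [], []
    for text, label in pairs:
        (numeric if label in NUMERIC_LABELS else named).append((text, label))
    return named, numeric


# Hand-written (text, label) pairs standing in for tagger output.
tagged = [
    ("Steve Jobs", "PERSON"),
    ("24 Feb 1955", "DATE"),
    ("Bengaluru", "GPE"),
    ("$20", "MONEY"),
]

named, numeric = split_entities(tagged)
print("named:  ", named)
print("numeric:", numeric)
```

With a loaded pipeline, the pairs could instead be built as `[(ent.text, ent.label_) for ent in doc.ents]`.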

These tasks don't HAVE to be done using sequence tagging, either; slot-filling has been done with templates and multi-stage approaches (extract candidate phrases by tagging and then classify or rank).
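
As a rough illustration of the template approach, the sketch below matches a question against a couple of hand-written regular-expression templates and returns the entity and the slot being asked about. The `TEMPLATES` list and `parse_question` helper are assumptions made for this example; a real system would need many more templates or a learned model.

```python
import re

# Each entry is (pattern, fixed_slot). The named group `entity` captures who or
# what the question is about. If fixed_slot is None, the slot name is read from
# the pattern's own `slot` group instead.
TEMPLATES = [
    (re.compile(r"^who is the (?P<slot>\w+) of (?P<entity>.+?)\??$", re.IGNORECASE), None),
    (re.compile(r"^when was (?P<entity>.+?) born\??$", re.IGNORECASE), "born"),
]


def parse_question(question):
    """Return (entity, slot) for the first matching template, else None."""
    for pattern, fixed_slot in TEMPLATES:
        match = pattern.match(question.strip())
        if match:
            slot = fixed_slot or match.group("slot").lower()
            return match.group("entity"), slot
    return None


print(parse_question("Who is the spouse of Sachin Tendulkar?"))
print(parse_question("When was Steve Jobs born?"))
print(parse_question("How can I pay for my orders?"))
```

Templates are brittle but transparent: every answer can be traced to the pattern that produced it, which is often a reasonable trade-off for a narrow customer-service bot.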

In [1]:
import spacy
from spacy import displacy # for visualization
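# 'en' is a shortcut link that only exists in spaCy 2.x; the full package name also works in 3.x
nlp = spacy.load('en_core_web_sm')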

If there is an error above, try:
- Windows shell (run as **Administrator**): `python -m spacy download en`
- Linux terminal: `sudo python -m spacy download en`

## Part of Speech Tags



In [2]:
how_q_ex = 'How can I pay for my orders?'
assertion_ex = 'I make an order by mistake. I won’t pay.'
specific_qex = 'Can I use paypal for order #123?'
when_q_ex = 'When can I get my cashback?'
spelling_ex = 'My order number is #123. How can I pay?'

examples = [how_q_ex, assertion_ex, specific_qex, when_q_ex, spelling_ex]

In [3]:
doc = nlp(assertion_ex)
# Header row; the columns match the per-token fields printed below
print('text\t|lemma\t|postag\t|tag\t|tag_explain\t\t\t\t   |dep_parser\t   |dep_explain\t\t|stop_word')
for token in doc:
    # spacy.explain returns None for labels missing from its glossary,
    # and None cannot be formatted with a width, so fall back to the string 'None'
    dep_explain = spacy.explain(token.dep_) or 'None'
    tag_explain = spacy.explain(token.tag_) or 'None'
    print(f'{token.text:<8}|{token.lemma_:<7}|{token.pos_:<7}|{token.tag_:<7}|{tag_explain:42}|{token.dep_:15}|{dep_explain:<25}|{token.is_stop:<7}')

text	|lemma	|postag	|tag	|tag_explain				   |dep_parser	   |dep_explain		|stop_word
I       |-PRON- |PRON   |PRP    |pronoun, personal                         |nsubj          |nominal subject          |0      
make    |make   |VERB   |VBP    |verb, non-3rd person singular present     |ROOT           |None                     |1      
an      |an     |DET    |DT     |determiner                                |det            |determiner               |1      
order   |order  |NOUN   |NN     |noun, singular or mass                    |dobj           |direct object            |0      
by      |by     |ADP    |IN     |conjunction, subordinating or preposition |prep           |prepositional modifier   |1      
mistake |mistake|NOUN   |NN     |noun, singular or mass                    |pobj           |object of preposition    |0      
.       |.      |PUNCT  |.      |punctuation mark, sentence closer         |punct          |punctuation              |0      
I       |-PRON- |PRON   |PRP    |p

There is a wealth of detail in a single pipeline output. For instance, in the above example notice how spaCy assigns a Part-of-Speech tag to every token. These tags can be passed to your pipeline as a feature too, depending on the use case. 
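
As a rough sketch of using the tags as features, the snippet below turns (text, coarse POS, fine-grained tag) rows into per-token feature dictionaries. The rows are copied from the first sentence of the output above. The `token_features` helper and the feature names are assumptions for illustration; a real pipeline would read `token.pos_` and `token.tag_` directly.

```python
# (text, pos, tag) rows copied from the first sentence of the output above.
rows = [
    ("I", "PRON", "PRP"),
    ("make", "VERB", "VBP"),
    ("an", "DET", "DT"),
    ("order", "NOUN", "NN"),
    ("by", "ADP", "IN"),
    ("mistake", "NOUN", "NN"),
    (".", "PUNCT", "."),
]


def token_features(rows, i):
    """Build a feature dict for token i, including its neighbours' POS tags."""
    text, pos, tag = rows[i]
    return {
        "lower": text.lower(),
        "pos": pos,
        "tag": tag,
        "prev_pos": rows[i - 1][1] if i > 0 else "<START>",
        "next_pos": rows[i + 1][1] if i < len(rows) - 1 else "<END>",
    }


features = [token_features(rows, i) for i in range(len(rows))]
print(features[3])
```

Dictionaries of this shape can be fed to most classical sequence or text classifiers after vectorisation.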

In [4]:
options = {'compact': True, 
           'bg': '#09a3d5',
#            'bg': '#000',
           'color': 'white', 'font': 'Source Sans Pro'}

displacy.render(doc, jupyter=True, style='dep', options=options)



## Noun chunks

Sometimes, we want to quickly pull out keywords or keyphrases from a larger body of text. This helps us mentally paint a picture of what the text is about. This is particularly helpful when analysing email-length texts. 

We refer to these as noun chunks. Noun chunks are _noun phrases_ - not a single word, but a short phrase which describes the noun. For example, "the blue skies" or "the world’s largest conglomerate". 

To get the noun chunks in a document, simply iterate over `Doc.noun_chunks`: 

In [5]:
example_sentence = 'James B. Comey, the former F.B.I. director fired by President Trump, said in an ABC News interview that Mr. Trump was “morally unfit to be president,” portraying him as a danger to the nation.'

In [6]:
nlp = spacy.load('en_core_web_sm')  # full package name; the 'en' shortcut only exists in spaCy 2.x
doc = nlp(example_sentence)

In [8]:
for chunk in doc.noun_chunks:
    print(f'{chunk.text:<30},{chunk.root.text:<15},{chunk.root.dep_:<7},{spacy.explain(chunk.root.dep_):25},{chunk.root.head.text:<15}')

James B. Comey                ,Comey          ,nsubj  ,nominal subject          ,said           
the former F.B.I. director    ,director       ,appos  ,appositional modifier    ,Comey          
President Trump               ,Trump          ,pobj   ,object of preposition    ,by             
an ABC News interview         ,interview      ,pobj   ,object of preposition    ,in             
Mr. Trump                     ,Trump          ,nsubj  ,nominal subject          ,was            
president                     ,president      ,attr   ,attribute                ,be             
him                           ,him            ,dobj   ,direct object            ,portraying     
a danger                      ,danger         ,pobj   ,object of preposition    ,as             
the nation                    ,nation         ,pobj   ,object of preposition    ,to             
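
One way to turn these chunks into keyphrases, sketched under the assumption that pronouns and leading articles add little: drop chunks that are just a pronoun and strip a leading article from the rest. The chunk texts are copied from the output above. `keyphrases`, `ARTICLES`, and `PRONOUNS` are hypothetical names, and with a live `Doc` one would check `chunk.root.pos_` rather than rely on a word list.

```python
ARTICLES = {"a", "an", "the"}
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they", "him", "her", "them", "us", "me"}

# Chunk texts copied from the noun-chunk output above.
chunks = [
    "James B. Comey", "the former F.B.I. director", "President Trump",
    "an ABC News interview", "Mr. Trump", "president", "him",
    "a danger", "the nation",
]


def keyphrases(chunks):
    """Drop pronoun-only chunks, strip a leading article, and de-duplicate in order."""
    seen, result = set(), []
    for chunk in chunks:
        words = chunk.split()
        if words and words[0].lower() in ARTICLES:
            words = words[1:]
        phrase = " ".join(words)
        key = phrase.lower()
        if not phrase or key in PRONOUNS or key in seen:
            continue
        seen.add(key)
        result.append(phrase)
    return result


print(keyphrases(chunks))
```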
