# Building a Chatbot from scratch (without using any 3rd party API)

Train an NLP model that identifies the user's intent.

Here's how a sample conversation should go. I'll call the coffee bot Alex to make things easier.
- Me: Hello!
- Alex: Hi, what is your name?
- Me: I'm Surya.
- Alex: What do you want to have?
- Me: I'd like a latte.
- Alex: Thanks for ordering a latte, you'll have it in a few minutes.
- Me: Thank you.



A few use cases:
- Greetings, finding name
    - I'm Surya.
    - Raju.
    - My name is Sandeep.
- Ordering 
    - Would like something to eat
    - Show me the menu
    - Can I have a latte? 
- Cancelling
- Distinguishing between ordering a snack and ordering a drink.
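
The use cases above can be written down as labeled examples, which is the shape most intent classifiers expect for training. This is only a minimal sketch: the intent labels (`give_name`, `order`) and the helper `to_training_pairs` are hypothetical names chosen for illustration, and the utterances are the ones listed above. Cancelling and the snack-versus-drink distinction are left out because no example utterances are given for them yet.

```python
# Hypothetical intent labels mapped to the example utterances listed above.
INTENT_EXAMPLES = {
    "give_name": ["I'm Surya.", "Raju.", "My name is Sandeep."],
    "order": [
        "Would like something to eat",
        "Show me the menu",
        "Can I have a latte?",
    ],
}


def to_training_pairs(examples):
    """Flatten {intent: [utterance, ...]} into (utterance, intent) pairs."""
    return [
        (utterance, intent)
        for intent, utterances in examples.items()
        for utterance in utterances
    ]


training_pairs = to_training_pairs(INTENT_EXAMPLES)
for utterance, intent in training_pairs:
    print(f"{intent}: {utterance}")
```

Keeping the examples in one dictionary makes it easy to add more utterances per intent as new phrasings turn up.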

In [2]:
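# word_tokenize and pos_tag need NLTK data packages, downloaded once with e.g.
# nltk.download('punkt') and nltk.download('averaged_perceptron_tagger')
# (the exact resource names vary between NLTK versions).
import nltk
from nltk import word_tokenize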

## Exploring the use of NLTK POS tagger for identifying intent

In the ideal scenario where this works perfectly, we won't need any training.

In [21]:
# Different ways people respond to a question about their name
responses_name = ["I am Surya.", "Raju.", "My name is Sandeep.", "Surendra."]

In [22]:
# Tokenize, tag, and display each response
for response in responses_name:
    print(nltk.pos_tag(word_tokenize(response)))

[('I', 'PRP'), ('am', 'VBP'), ('Surya', 'NNP'), ('.', '.')]
[('Raju', 'NNP'), ('.', '.')]
[('My', 'PRP$'), ('name', 'NN'), ('is', 'VBZ'), ('Sandeep', 'NNP'), ('.', '.')]
[('Surendra', 'NNP'), ('.', '.')]


As you can see, in each case the name is tagged as "NNP". So the word in the sentence that is tagged "NNP" can be assumed to be the person's name.
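
This rule can be sketched as a small helper. `extract_name` is a hypothetical name, not part of NLTK; it works on the `(word, tag)` pairs that `nltk.pos_tag` returns, so the tagged output shown above is pasted in directly rather than recomputed.

```python
from typing import List, Optional, Tuple

TaggedSentence = List[Tuple[str, str]]


def extract_name(tagged: TaggedSentence) -> Optional[str]:
    """Return the first word tagged NNP, or None if there is none."""
    for word, tag in tagged:
        if tag == "NNP":
            return word
    return None


# Tagged output copied from the cell above.
tagged_with_period = [
    [("I", "PRP"), ("am", "VBP"), ("Surya", "NNP"), (".", ".")],
    [("Raju", "NNP"), (".", ".")],
    [("My", "PRP$"), ("name", "NN"), ("is", "VBZ"), ("Sandeep", "NNP"), (".", ".")],
    [("Surendra", "NNP"), (".", ".")],
]

names = [extract_name(sentence) for sentence in tagged_with_period]
print(names)  # ['Surya', 'Raju', 'Sandeep', 'Surendra']
```

Returning `None` when no "NNP" tag is present lets the caller notice a failed extraction instead of silently picking the wrong word.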

In [24]:
# Use this command to understand more about each tag
nltk.help.upenn_tagset('NNP')

NNP: noun, proper, singular
    Motown Venneboerger Czestochwa Ranzer Conchita Trumplane Christos
    Oceanside Escobar Kreisler Sawyer Cougar Yvette Ervin ODI Darryl CTCA
    Shannon A.K.C. Meltex Liverpool ...


However, this isn't robust enough. Just removing the full stops changes how NLTK tags the words, as shown below. This is not what we need.

In [26]:
# Different ways people respond to a question about their name
responses_name2 = ["I am Surya", "Raju", "My name is Sandeep", "Surendra"]

# Tokenize, tag, and display each response
for response in responses_name2:
    print(nltk.pos_tag(word_tokenize(response)))

[('I', 'PRP'), ('am', 'VBP'), ('Surya', 'JJ')]
[('Raju', 'NN')]
[('My', 'PRP$'), ('name', 'NN'), ('is', 'VBZ'), ('Sandeep', 'JJ')]
[('Surendra', 'NN')]


**Conclusion:** NLTK's POS tagger is unreliable when used directly: without the trailing full stop, none of the four names is tagged as "NNP". We need something better.
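
To put a number on this, the sketch below checks how often a "the NNP-tagged word is the name" rule would find any candidate at all in the two sets of tagged output recorded above. `nnp_hit_rate` is a hypothetical helper; the tags are copied from the cell outputs rather than recomputed, and a different NLTK version may tag the same inputs differently.

```python
def nnp_hit_rate(tagged_sentences):
    """Fraction of sentences that contain at least one NNP-tagged word."""
    hits = sum(
        any(tag == "NNP" for _, tag in sentence)
        for sentence in tagged_sentences
    )
    return hits / len(tagged_sentences)


# Tagged output for the responses that end with a full stop.
with_period = [
    [("I", "PRP"), ("am", "VBP"), ("Surya", "NNP"), (".", ".")],
    [("Raju", "NNP"), (".", ".")],
    [("My", "PRP$"), ("name", "NN"), ("is", "VBZ"), ("Sandeep", "NNP"), (".", ".")],
    [("Surendra", "NNP"), (".", ".")],
]

# Tagged output for the same responses without the full stop.
without_period = [
    [("I", "PRP"), ("am", "VBP"), ("Surya", "JJ")],
    [("Raju", "NN")],
    [("My", "PRP$"), ("name", "NN"), ("is", "VBZ"), ("Sandeep", "JJ")],
    [("Surendra", "NN")],
]

print(nnp_hit_rate(with_period))     # 1.0
print(nnp_hit_rate(without_period))  # 0.0
```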

For better tagging, Stanford's NER (https://nlp.stanford.edu/software/CRF-NER.shtml) could be used, but I don't think it will give the accuracy we need either.

## Using opennlp.apache.org