# BUILDING A CHATBOT
---

# &raquo; NLP
Natural Language Processing, or NLP for short, is a field of study that focuses on the interactions between human language and computers. It sits at the crossroads of computer science, artificial intelligence, and computational linguistics [Wikipedia]. NLP gives computers a way to analyse, understand, and infer meaning from human language. Developers can use it to organise and structure knowledge for tasks including automatic summarisation, translation, named entity recognition, relationship extraction, sentiment analysis, speech recognition, and topic segmentation.

# &raquo; Some pre-requisite topics for this exercise:
- NLTK
- Text Pre-processing with NLTK
- Bag of Words
- TF-IDF Approach
- Cosine Similarity
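
As a quick refresher on the Bag of Words and TF-IDF topics, here is a minimal sketch in plain Python (standard library only). It uses the textbook weighting, term count times `log(N / document frequency)`. scikit-learn's `TfidfVectorizer`, used later in this exercise, applies a smoothed variant and normalises each vector by default, so its numbers will differ. The toy documents and the helper name `tfidf` are illustrative assumptions, not part of any library.

```python
import math
from collections import Counter

docs = [
    "musk founded spacex",
    "musk leads tesla",
    "tesla builds electric cars",
]

# Bag of words: one word-count mapping per document.
bags = [Counter(doc.split()) for doc in docs]

# Document frequency: in how many documents each word appears.
df = Counter(word for bag in bags for word in bag)

def tfidf(bag, n_docs=len(docs)):
    """Textbook TF-IDF: term count times log(N / document frequency)."""
    return {word: count * math.log(n_docs / df[word]) for word, count in bag.items()}

weights = tfidf(bags[0])
for word, weight in sorted(weights.items()):
    print(f"{word}: {weight:.3f}")
```

In the first document, "musk" also appears in another document, so it receives a lower weight than "founded" or "spacex", which are unique to it.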

---
---

# Let's Begin!

___
### **Step 1:** Importing the necessary libraries

In [1]:
import nltk
import numpy as np
import random
import string # to process standard python strings

---
### **Step 2:** Reading the Corpus

> #### Corpus
For our example, we will use the Wikipedia page for Elon Musk as our corpus. Copy the contents of the page into a text file named `chatbot.txt`. However, you can use any corpus of your choice.

#### Reading in the data
We will read in the `chatbot.txt` file and convert the entire corpus into a list of sentences and a list of words for further pre-processing.

In [2]:
with open('chatbot.txt', 'r', errors='ignore') as f:  # the with-block closes the file afterwards
    raw = f.read()
raw = raw.lower()  # converts to lowercase
nltk.download('punkt') # first-time use only
nltk.download('wordnet') # first-time use only

[nltk_data] Downloading package punkt to
[nltk_data]     /Users/verniethorpe/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     /Users/verniethorpe/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

---
Next, we split the corpus into `sent_tokens` (a list of sentences) and `word_tokens` (a list of words).

In [3]:
sent_tokens = nltk.sent_tokenize(raw)# converts to list of sentences 
word_tokens = nltk.word_tokenize(raw)# converts to list of words
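
To get a feel for the shape of these two lists without downloading any NLTK data, the following rough sketch approximates sentence and word tokenisation with regular expressions. It is only an approximation: NLTK's `sent_tokenize` uses a trained Punkt model and handles abbreviations and other edge cases that these patterns do not. The sample string is made up for illustration.

```python
import re

sample = "elon musk was born in pretoria. he later moved to canada!"

# Naive sentence split: break after '.', '!' or '?' when followed by whitespace.
sentences = re.split(r"(?<=[.!?])\s+", sample)

# Naive word split: runs of word characters, or single punctuation marks.
words = re.findall(r"\w+|[^\w\s]", sample)

print(sentences)
print(words)
```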

> #### Note:
> * Run the following code to see the tokenised sentences:
> `print(sent_tokens)`
> * Run the following code to see the tokenised words:
> `print(word_tokens)`
<br><br>
> Try running the above-mentioned print statements in the space given below:

---
### **Step 3:** Pre-processing the raw text
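We'll now define a function called `LemTokens`, which accepts tokens as input and returns their lemmatised forms, and a function called `LemNormalize`, which lowercases the text, strips punctuation, tokenises it, and lemmatises the result.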

In [4]:
lemmer = nltk.stem.WordNetLemmatizer()
#WordNet is a semantically-oriented dictionary of English included in NLTK.
def LemTokens(tokens):
    return [lemmer.lemmatize(token) for token in tokens]
remove_punct_dict = dict((ord(punct), None) for punct in string.punctuation)
def LemNormalize(text):
    return LemTokens(nltk.word_tokenize(text.lower().translate(remove_punct_dict)))
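
The punctuation-stripping step in `LemNormalize` can be tried in isolation. The sketch below reuses the same `remove_punct_dict` idea but leaves out lemmatisation, so it runs without the WordNet data. The helper name `normalize_without_lemmatizing` is hypothetical and splits on whitespace instead of calling `nltk.word_tokenize`.

```python
import string

# Map every punctuation character's code point to None; str.translate deletes those characters.
remove_punct_dict = {ord(punct): None for punct in string.punctuation}

def normalize_without_lemmatizing(text):
    """Lowercase, strip punctuation and split on whitespace (hypothetical helper)."""
    return text.lower().translate(remove_punct_dict).split()

print(normalize_without_lemmatizing("Hello, World! What's up?"))
# ['hello', 'world', 'whats', 'up']
```

Note that the apostrophe is removed as well, which is why "what's" becomes "whats".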

---
### **Step 4:** Keyword matching
Next, we'll design a function for the chatbot's greeting, i.e., if a user's input is a greeting, the bot should respond with a greeting.

In [5]:
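GREETING_INPUTS = ("hello", "hi", "greetings", "sup", "what's up", "hey")
GREETING_RESPONSES = ["hi", "hey", "*nods*", "hi there", "hello", "I am glad! You are talking to me"]
def greeting(sentence):
    """Return a random greeting if the sentence contains a greeting word, else None."""
    # Note: multi-word entries such as "what's up" never match, because the
    # input is compared one whitespace-separated word at a time.
    for word in sentence.split():
        if word.lower() in GREETING_INPUTS:
            return random.choice(GREETING_RESPONSES)
    return None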
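
The `greeting` function above compares whitespace-separated words exactly, so inputs such as `hello!` or `what's up?` are not recognised. The following sketch shows one possible way to relax the match. The function name `greeting_relaxed` and its matching strategy are assumptions made for illustration, not part of the original exercise.

```python
import random
import string

GREETING_INPUTS = ("hello", "hi", "greetings", "sup", "what's up", "hey")
GREETING_RESPONSES = ["hi", "hey", "*nods*", "hi there", "hello"]

def greeting_relaxed(sentence):
    """Hypothetical variant of greeting() that tolerates punctuation and phrases."""
    # Strip punctuation from both ends of each word, so "hello!" becomes "hello".
    words = [w.strip(string.punctuation) for w in sentence.lower().split()]
    cleaned = " ".join(words)
    for phrase in GREETING_INPUTS:
        # Pad with spaces to compare whole words only, so "hi" does not match inside "this".
        if f" {phrase} " in f" {cleaned} ":
            return random.choice(GREETING_RESPONSES)
    return None

print(greeting_relaxed("What's up?"))
print(greeting_relaxed("this is a test"))
```

The first call prints one of the greeting responses, chosen at random; the second prints `None`.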

---
### **Step 5:** Generating Response
The concept of document similarity will be used to generate responses from our bot for input questions. As a result, we begin by importing the required modules.

In [6]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

These modules will be used to measure the similarity between the user's input and the sentences in the corpus. This is one of the most basic ways to construct a chatbot.
We define a function `response`, which finds the corpus sentence most similar to the user's input and returns it. If no sentence shares any terms with the input, it responds with "I am sorry! I don't understand you".

In [7]:
def response(user_response):
    # Add the query to the corpus so that it is vectorised together with the sentences.
    # The caller is expected to remove it again afterwards.
    sent_tokens.append(user_response)
    TfidfVec = TfidfVectorizer(tokenizer=LemNormalize, stop_words='english')
    tfidf = TfidfVec.fit_transform(sent_tokens)
    # Similarity of the query (last row) to every sentence, including itself.
    vals = cosine_similarity(tfidf[-1], tfidf)
    # The best score is the query matched with itself, so take the second best.
    idx = vals.argsort()[0][-2]
    req_tfidf = vals[0][idx]
    if req_tfidf == 0:
        return "I am sorry! I don't understand you"
    return sent_tokens[idx]
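
The "second best" indexing in `response` can be easier to follow on a toy example. The sketch below mirrors the same logic in plain Python, using raw word counts instead of TF-IDF weights and a hand-written cosine similarity instead of scikit-learn. The names `cosine` and `best_match` and the three sample sentences are assumptions for illustration only.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two word-count mappings."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

sentences = [
    "musk was born in pretoria",
    "tesla makes electric cars",
    "spacex launches rockets",
]

def best_match(query, sentences):
    """Return the sentence most similar to the query, or None if nothing overlaps."""
    # Append the query, as response() does, so that it is the last vector.
    texts = sentences + [query]
    vectors = [Counter(t.lower().split()) for t in texts]
    scores = [cosine(vectors[-1], v) for v in vectors]
    # Sort indices by score. For a non-empty query the last index is the query
    # itself (score 1.0), so the second-to-last is the best corpus sentence.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    idx = order[-2]
    return texts[idx] if scores[idx] > 0 else None

print(best_match("where was musk born", sentences))
print(best_match("hello there", sentences))
```

The first query shares three words with the first sentence, so that sentence is returned; the second query shares no words with any sentence, so the result is `None`.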

Finally, we will feed the lines that we want our bot to say while starting and ending a conversation, depending upon the user’s input.
> **Play with the chatbot by providing inputs like:**
> * `Where was Elon Musk born?`
> * `Tell me about Tesla`
> * `In which university did he study?`

In [None]:
flag = True
print("ELO: My name is ELO. I will answer your queries about Elon Musk. If you want to exit, type Bye!")
while flag:
    user_response = input().lower()
    if user_response != 'bye':
        if user_response in ('thanks', 'thank you'):
            flag = False
            print("ELO: You are welcome..")
        else:
            greet = greeting(user_response)  # call once and reuse the result
            if greet is not None:
                print("ELO: " + greet)
            else:
                print("ELO: ", end="")
                print(response(user_response))
                sent_tokens.remove(user_response)  # undo the append done inside response()
    else:
        flag = False
        print("ELO: Bye! take care..")

ELO: My name is ELO. I will answer your queries about Elon Musk. If you want to exit, type Bye!


 Where was Elon Musk born?


ELO: elon reeve musk was born on june 28, 1971, in pretoria, south africa.


 Tell me about Tesla


ELO: in 2004, he joined electric vehicle manufacturer tesla motors, inc. (now tesla, inc.) as chairman and product architect, becoming its ceo in 2008. in 2006, he helped create solarcity, a solar energy services company that was later acquired by tesla and became tesla energy.


 In which university did he study?


ELO: he was enrolled at queen's university and transferred to the university of pennsylvania two years later, where he received a bachelor's degree in economics and physics.
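
The branching inside the conversation loop can also be factored into a small function that handles a single turn, which makes it possible to try the logic without typing into `input()`. This is only a sketch: the name `reply` is hypothetical, and `demo_greet` and `demo_answer` are stand-ins for the `greeting` and `response` functions defined earlier.

```python
def reply(user_input, greet, answer):
    """Return (bot_text, keep_running) for one turn of the conversation.

    `greet` and `answer` are callables standing in for greeting() and response().
    """
    text = user_input.lower()
    if text == "bye":
        return "Bye! take care..", False
    if text in ("thanks", "thank you"):
        return "You are welcome..", False
    greeting_text = greet(text)
    if greeting_text is not None:
        return greeting_text, True
    return answer(text), True

# Stand-in callables, for illustration only.
def demo_greet(text):
    return "hello" if text == "hi" else None

def demo_answer(text):
    return f"(most similar sentence for: {text})"

print(reply("Hi", demo_greet, demo_answer))
print(reply("Tell me about Tesla", demo_greet, demo_answer))
print(reply("Bye", demo_greet, demo_answer))
```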
