**Imports**

In [23]:
import io
import random
import string  # to process standard Python strings
import warnings
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
warnings.filterwarnings('ignore')  # suppress warning messages

**Importing Natural Language Toolkit (nltk) for text processing**
*   NLTK stands for "Natural Language Toolkit". It is a Python library for building programs that work with human language.
*   It bundles lexical resources such as WordNet together with text-processing algorithms such as tokenizers, stemmers, and lemmatizers.
*   It gives programmers ready-made interfaces for tokenizing, classifying, and analyzing text.
*   In this notebook it is used to split the corpus into sentences and words and to lemmatize them.

In [24]:
%pip install nltk



In [25]:
import nltk
from nltk.stem import WordNetLemmatizer
nltk.download('popular', quiet=True) # for downloading packages
#nltk.download('punkt') # first-time use only
#nltk.download('wordnet') # first-time use only

True

**Open the text file containing the corpus for chatbot training**


In [26]:
# Read the corpus; errors='ignore' skips bytes that cannot be decoded
with open('/content/drive/MyDrive/projects/chatbot/chatbot.txt', 'r', errors='ignore') as f:
    raw = f.read()
raw = raw.lower()  # convert to lowercase

**Tokenisation**

In [27]:
sent_tokens = nltk.sent_tokenize(raw)  # converts to list of sentences
word_tokens = nltk.word_tokenize(raw)  # converts to list of words

**Preprocessing: Function to lemmatize tokens (converting words to their base form)**
*   The function LemTokens takes a list of tokens as input and returns the normalized (lemmatized) tokens.

In [28]:
lemmer = nltk.stem.WordNetLemmatizer()
# WordNet is a semantically oriented dictionary of English included in NLTK.
def LemTokens(tokens):
    return [lemmer.lemmatize(token) for token in tokens]
# Translation table that maps every punctuation character to None, so str.translate deletes it
remove_punct_dict = dict((ord(punct), None) for punct in string.punctuation)


**Function to normalize the text (lemmatize and remove punctuation)**


In [29]:
def LemNormalize(text):
    return LemTokens(nltk.word_tokenize(text.lower().translate(remove_punct_dict)))
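
As a small illustration of the punctuation step only, the sketch below applies the same kind of translation table without the tokenizer or lemmatizer, so it runs with the standard library alone. The helper name `strip_punctuation` is hypothetical and not part of the notebook.

```python
import string

# Map every ASCII punctuation code point to None so str.translate deletes it.
remove_punct_dict = {ord(punct): None for punct in string.punctuation}

def strip_punctuation(text):
    """Lowercase the text and delete ASCII punctuation (hypothetical helper)."""
    return text.lower().translate(remove_punct_dict)

print(strip_punctuation("Hello, World! What's up?"))  # hello world whats up
```

Note that the apostrophe is removed as well, so "what's" becomes "whats" before tokenization.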

**Greeting inputs and responses for the chatbot**


In [30]:
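GREETING_INPUTS = ("hello", "hi", "greetings", "sup", "what's up", "hey")
GREETING_RESPONSES = ["hi", "hey", "*nods*", "hi there", "hello", "I am glad you are talking to me!"]
def greeting(sentence):
    """Return a random greeting if the sentence contains a greeting word, else None."""
    # Matching is per whitespace-separated word, so the two-word entry "what's up"
    # and words with attached punctuation such as "hi!" never match.
    for word in sentence.split():
        if word.lower() in GREETING_INPUTS:
            return random.choice(GREETING_RESPONSES)
    return None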
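
Because the greeting function above compares whole whitespace-separated words, an input such as "Hi!" or "what's up" is not recognized. One possible workaround, offered only as a sketch, is to strip punctuation from the ends of each word and then look for whole phrases. The names `find_greeting` and `greeting_v2` are hypothetical, and the shortened response list is just for illustration.

```python
import random
import string

GREETING_INPUTS = ("hello", "hi", "greetings", "sup", "what's up", "hey")
GREETING_RESPONSES = ["hi", "hey", "*nods*", "hi there", "hello"]

def find_greeting(sentence):
    """Return a greeting phrase contained in the sentence, or None (hypothetical helper)."""
    # Strip punctuation from both ends of each word; inner apostrophes are kept.
    words = [w.strip(string.punctuation) for w in sentence.lower().split()]
    cleaned = " ".join(w for w in words if w)
    # Pad with spaces so only whole words or phrases match ("hi" must not match "chip").
    padded = f" {cleaned} "
    for phrase in GREETING_INPUTS:
        if f" {phrase} " in padded:
            return phrase
    return None

def greeting_v2(sentence):
    """Return a random greeting response if a greeting phrase is found, else None."""
    if find_greeting(sentence) is not None:
        return random.choice(GREETING_RESPONSES)
    return None

print(find_greeting("Hi!"))             # hi
print(find_greeting("So, what's up?"))  # what's up
print(find_greeting("this is a chip"))  # None
```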

**Function to generate a response from the chatbot**


*   After preprocessing, the text must be converted into numbers. The **bag-of-words** method represents each document by which words from a known vocabulary it contains and how often, with **no regard for word order**.
*   The underlying assumption is that documents sharing many words are similar in content.
*   One issue is that raw counts let **very frequent words dominate** even though they carry little information, and the counts grow with document length. The **TF-IDF method rescales word frequency** by how common each word is across all documents, so words that appear everywhere are weighted down.
*   **Cosine similarity** measures **how similar two documents are** as the cosine of the angle between their TF-IDF vectors: 1 means the vectors point in the same direction, 0 means the documents share no terms. The bot vectorizes the user's input together with the corpus sentences and returns the most similar sentence, or an "I don't understand" message when nothing matches.
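
The ideas in the list above can be sketched in plain Python. This is a minimal illustration, not the scikit-learn implementation used in the next cell: it tokenizes on whitespace, skips stop-word removal and lemmatization, and assumes one common smoothed IDF variant. The function names `tfidf_vectors`, `cosine`, and `best_match` and the tiny corpus are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for whitespace-tokenized documents."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed inverse document frequency (an assumed variant; others exist).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)  # bag of words: term counts, order ignored
        vectors.append([counts[t] * idf[t] for t in vocab])
    return vocab, vectors

def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def best_match(corpus, query):
    """Index of the corpus sentence most similar to the query, or None if no overlap."""
    _, vectors = tfidf_vectors(corpus + [query])
    query_vec = vectors[-1]
    scores = [cosine(query_vec, v) for v in vectors[:-1]]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best if scores[best] > 0 else None

corpus = [
    "a chatbot is a program that simulates conversation",
    "tf-idf rescales word counts",
    "cosine similarity compares two vectors",
]
print(best_match(corpus, "what is a chatbot"))  # 0
print(best_match(corpus, "bananas"))            # None
```

Unlike the notebook's approach, this sketch scores the query against the corpus rows only, so there is no need to skip the query's perfect match with itself.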


In [31]:
def response(user_response):
    # Temporarily add the user's input as the last "sentence" of the corpus
    # (the main loop removes it again after printing the reply)
    sent_tokens.append(user_response)
    TfidfVec = TfidfVectorizer(tokenizer=LemNormalize, stop_words='english')
    tfidf = TfidfVec.fit_transform(sent_tokens)
    # Similarity of the user's input (last row) to every sentence, including itself
    vals = cosine_similarity(tfidf[-1], tfidf)
    # The top score is the input matched with itself, so take the second highest
    idx = vals.argsort()[0][-2]
    req_tfidf = vals[0][idx]
    if req_tfidf == 0:
        # No corpus sentence shares any term with the input
        return "I am sorry! I don't understand you"
    return sent_tokens[idx]

**Main loop of the chatbot**


In [None]:
flag = True
print("Alexa: My name is Alexa. I will answer your queries about Chatbots. If you want to exit, type Bye!")
while flag:
    user_response = input().lower()
    if user_response != 'bye':
        if user_response in ('thanks', 'thank you'):
            flag = False
            print("Alexa: You are welcome..")
        else:
            # Call greeting() once so the check and the printed reply agree
            greeting_reply = greeting(user_response)
            if greeting_reply is not None:
                print("Alexa: " + greeting_reply)
            else:
                print("Alexa: ", end="")
                print(response(user_response))
                # response() appended the input to sent_tokens; remove it again
                sent_tokens.remove(user_response)
    else:
        flag = False
        print("Alexa: Bye! take care..")

Alexa: My name is Alexa. I will answer your queries about Chatbots. If you want to exit, type Bye!
