<h2> Simple Chat Bot Using NLTK </h2>

NLTK (Natural Language Toolkit) is a leading platform for building Python programs that work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, as well as wrappers for industrial-strength NLP libraries.

In [30]:
import io
import random
import string # to process standard python strings
import numpy as np
import warnings
warnings.filterwarnings('ignore')

In [31]:
import nltk
from nltk.stem import WordNetLemmatizer
nltk.download('popular', quiet=True) # for downloading packages
nltk.download('punkt') # first-time use only
nltk.download('wordnet') # first-time use only

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\tatav\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\tatav\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

<h2> Reading in the corpus </h2>

In [32]:
with open('chat.txt', 'r', errors='ignore') as f:  # the file is closed automatically
    raw = f.read()
raw = raw.lower()  # converts to lowercase

The main issue with text data is that it is all strings, whereas machine learning algorithms need some kind of numerical feature vector to perform their task. So before starting any NLP project, we need to pre-process the text to make it suitable for processing. Basic text pre-processing includes:

1. Converting the entire text to uppercase or lowercase, so that the algorithm does not treat the same word in different cases as different words.

2. Tokenization: converting normal text strings into a list of tokens, i.e. the words we actually want. A sentence tokenizer finds the list of sentences, and a word tokenizer finds the list of words in a string.

3. Stemming: reducing inflected (or sometimes derived) words to their stem, base, or root form, generally a written word form. For example, stemming the words “Stems”, “Stemming”, “Stemmed”, and “Stemtization” yields the single word “stem”.

4. Lemmatization: a slight variant of stemming. The major difference is that stemming can often create non-existent words, whereas lemmas are actual words. So a stem is not necessarily something you can look up in a dictionary, but a lemma is. For example, “run” is the base form of “running” and “ran”, and “better” and “good” share the same lemma, so they are treated as the same word.
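The four steps above can be sketched with the standard library alone. This is only a toy illustration, not what NLTK does internally: the regex tokenizer, the `naive_stem` suffix stripper, and the `LEMMAS` lookup table are hypothetical stand-ins for NLTK's tokenizers, a real stemmer, and WordNet.

```python
import re

# Hypothetical lookup table standing in for a real dictionary such as WordNet.
LEMMAS = {"running": "run", "ran": "run", "better": "good"}

def tokenize(text):
    """Lowercase the text and split it into word tokens (toy regex tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def naive_stem(word):
    """Strip a few common suffixes; the result may not be a real word."""
    for suffix in ("ming", "med", "ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def lemmatize(word):
    """Return the dictionary form if it is in the table, else the word unchanged."""
    return LEMMAS.get(word, word)

tokens = tokenize("Stems, Stemming and Stemmed")
stems = [naive_stem(t) for t in tokens]
lemmas = [lemmatize(t) for t in tokenize("Running is better")]

print(tokens)                 # ['stems', 'stemming', 'and', 'stemmed']
print(stems)                  # ['stem', 'stem', 'and', 'stem']
print(naive_stem("running"))  # runn  (a stem that is not a real word)
print(lemmas)                 # ['run', 'is', 'good']
```

The `runn` result shows the difference described in step 4: the stem is not a dictionary word, while the lemma `run` is.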

<h2> Tokenisation </h2>

In [33]:
sent_tokens = nltk.sent_tokenize(raw)# converts to list of sentences 
word_tokens = nltk.word_tokenize(raw)# converts to list of words

<h2> Stemming and lemmatization </h2>


In [34]:
lemmer = nltk.stem.WordNetLemmatizer()
#WordNet is a semantically-oriented dictionary of English included in NLTK.
def LemTokens(tokens):
    return [lemmer.lemmatize(token) for token in tokens]
remove_punct_dict = dict((ord(punct), None) for punct in string.punctuation)

def LemNormalize(text):
    return LemTokens(nltk.word_tokenize(text.lower().translate(remove_punct_dict)))
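To see what the `remove_punct_dict` mapping contributes to `LemNormalize`, the punctuation step can be run on its own. The sketch below uses only the standard library and leaves out tokenization and lemmatization; `strip_punctuation` is a hypothetical helper name, not part of the notebook.

```python
import string

# Same mapping as in the cell above: every punctuation code point maps to None,
# which tells str.translate to delete that character.
remove_punct_dict = dict((ord(punct), None) for punct in string.punctuation)

def strip_punctuation(text):
    """Hypothetical helper: lowercase the text and delete ASCII punctuation."""
    return text.lower().translate(remove_punct_dict)

cleaned = strip_punctuation("Hello, World! What's a chatbot?")
print(cleaned)  # hello world whats a chatbot
```

Note that the apostrophe is deleted too, so `what's` becomes `whats` before it reaches the tokenizer.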

<h2> Keyword matching </h2>

Next, we define a function for a greeting by the bot, i.e. if a user’s input is a greeting, the bot returns a greeting response. ELIZA uses simple keyword matching for greetings. We will use the same concept here.

In [35]:
GREETING_INPUTS = ("hello", "hi", "greetings", "sup", "what's up","hey",)
GREETING_RESPONSES = ["hi", "hey", "*nods*", "hi there", "hello", "I am glad! You are talking to me"]
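def greeting(sentence):
    """If the user's input contains a greeting keyword, return a random greeting; otherwise return None."""
    for word in sentence.split():
        if word.lower() in GREETING_INPUTS:
            return random.choice(GREETING_RESPONSES)
    return None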
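Because `greeting` compares whitespace-separated words, an input such as `hello!` (with punctuation attached) or the two-word entry `what's up` can never match. One possible variant is sketched below. `match_greeting` is a hypothetical name, and the response list is shortened for the example.

```python
import random
import string

GREETING_INPUTS = ("hello", "hi", "greetings", "sup", "what's up", "hey")
GREETING_RESPONSES = ["hi", "hey", "*nods*", "hi there", "hello"]

def match_greeting(sentence):
    """Return a random greeting if the sentence contains a greeting keyword, else None."""
    # Strip punctuation from word edges only, so the apostrophe in "what's" survives.
    words = [w.strip(string.punctuation) for w in sentence.lower().split()]
    padded = " " + " ".join(w for w in words if w) + " "
    for keyword in GREETING_INPUTS:
        # Pad with spaces so "hi" does not match inside "this".
        if f" {keyword} " in padded:
            return random.choice(GREETING_RESPONSES)
    return None

print(match_greeting("Hello!"))
print(match_greeting("What's up?"))
print(match_greeting("this is a chatbot"))  # None
```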

<h2> Generating Response </h2>

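Next, we generate responses using TF-IDF (term frequency-inverse document frequency) and cosine similarity: each sentence of the corpus and the user's input are turned into TF-IDF vectors, and the bot replies with the corpus sentence whose vector is most similar to the input.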
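To make the idea concrete, here is a minimal pure-Python sketch of TF-IDF weighting and cosine similarity on a three-sentence toy corpus. It is a simplified illustration under stated assumptions: it uses the smoothed weight `idf = ln((1 + n) / (1 + df)) + 1`, splits on whitespace, and removes no stop words, so its numbers will not match the scikit-learn pipeline used in this notebook. The function names and the corpus are made up for the example.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (dict: term -> weight) per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    return [{term: tf * idf[term] for term, tf in Counter(doc).items()} for doc in docs]

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

corpus = [
    "a chatbot simulates human conversation".split(),
    "customer service bots answer support questions".split(),
    "what is a chatbot".split(),  # the user's query goes last
]
vectors = tfidf_vectors(corpus)
scores = [cosine(vectors[-1], v) for v in vectors[:-1]]
best = max(range(len(scores)), key=scores.__getitem__)
print(best, scores)
```

The query shares the terms `a` and `chatbot` with the first sentence and nothing with the second, so the first sentence gets a positive score and the second scores exactly zero.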

In [36]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [37]:
def response(user_response):
    robo_response = ''
    sent_tokens.append(user_response)  # the query becomes the last "sentence"; the caller removes it afterwards
    TfidfVec = TfidfVectorizer(tokenizer=LemNormalize, stop_words='english')
    tfidf = TfidfVec.fit_transform(sent_tokens)  # fit on all sentences and build the TF-IDF matrix
    vals = cosine_similarity(tfidf[-1], tfidf)  # similarity of the query (last row) to every sentence, itself included
    idx = vals.argsort()[0][-2]  # index of the second-highest score; the highest is normally the query itself
    flat = vals.flatten()  # convert 2-D to 1-D so the scores are easy to retrieve
    flat.sort()
    req_tfidf = flat[-2]  # the second-highest similarity score
    if req_tfidf == 0:  # no similarity found
        robo_response = robo_response + "I am sorry! I don't understand you"
        return robo_response
    else:
        robo_response = robo_response + sent_tokens[idx]
        return robo_response
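The `[-2]` index in `response` is easy to misread. Since the query is appended to `sent_tokens` before vectorizing, its similarity with itself is normally the largest value in the row, and the best real match is the second largest. The plain-Python sketch below mimics that selection on a made-up row of scores; `sorted` over indices stands in for NumPy's `argsort`.

```python
# Hypothetical similarity row: the last entry is the query compared with itself.
vals = [0.0, 0.42, 0.17, 1.0]

# Indices ordered by ascending score, like argsort on a 1-D array.
order = sorted(range(len(vals)), key=lambda i: vals[i])

idx = order[-2]        # best match other than the query itself
req_tfidf = vals[idx]  # its similarity score

print(order)           # [0, 2, 1, 3]
print(idx, req_tfidf)  # 1 0.42
```

When the query shares no terms with any sentence, this second-largest score is 0, which is the case `response` reports as not understood.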

In [38]:
flag = True
print("ROBO: My name is Robo. I will answer your queries about Chatbots. If you want to exit, type Bye!")
while flag:
    user_response = input()
    user_response = user_response.lower()
    if user_response != 'bye':
        if user_response == 'thanks' or user_response == 'thank you':
            flag = False
            print("ROBO: You are welcome..")
        else:
            greeting_reply = greeting(user_response)  # call once; each call picks a new random reply
            if greeting_reply is not None:
                print("ROBO: " + greeting_reply)
            else:
                print("ROBO: ", end="")
                print(response(user_response))
                sent_tokens.remove(user_response)  # undo the append done inside response()
    else:
        flag = False
        print("ROBO: Bye! take care..")

ROBO: My name is Robo. I will answer your queries about Chatbots. If you want to exit, type Bye!
ROBO: I am sorry! I don't understand you
ROBO: I am sorry! I don't understand you
ROBO: understanding chatbots

at its core, a chatbot is a computer program designed to simulate human conversation through text or voice interactions.
ROBO: understanding chatbots

at its core, a chatbot is a computer program designed to simulate human conversation through text or voice interactions.
ROBO: customer service: one of the most prominent applications of chatbots is in customer service and support.
ROBO: I am sorry! I don't understand you
ROBO: I am sorry! I don't understand you
ROBO: I am sorry! I don't understand you
ROBO: broadly categorized, chatbots can be classified into rule-based chatbots and ai-powered chatbots.
ROBO: types of chatbots

chatbots come in various forms, each tailored to specific use cases and functionalities.
ROBO: Bye! take care..
