# A SIMPLE RETRIEVAL BASED CHATBOT - LERATO
A retrieval-based bot that uses cosine similarity between the words entered by the user and the words in the corpus.
We'll define a function greeting, which searches the user's utterance for known greeting keywords and returns one of several possible responses, and a function response, which returns the corpus sentence most similar to the user's input. If the input matches nothing in the corpus, the bot replies: "I am so sorry! I dont understand your words"

### IMPORT NECESSARY LIBRARIES OR DEPENDENCIES

In [1]:
import numpy as np # - Linear Algebra
import random # - Random values/strings
import nltk # - Natural Language Toolkit, used for natural language preprocessing
import string # - To process standard python strings
from sklearn.feature_extraction.text import TfidfVectorizer #convert a collection of raw text to a matrix of TF-IDF features
from sklearn.metrics.pairwise import cosine_similarity #used to measure the similarity between TF-IDF vectors

### READ TEXT DATA

In [2]:
with open('lerato.txt', 'r', errors='ignore') as corpus_file: #Open the corpus file; undecodable bytes are ignored
    Text = corpus_file.read() #Read the whole file into a string for preprocessing; the file is closed automatically

### TEXT PREPROCESSING

In [3]:
Text = Text.lower() # converts all text to lowercase so that the same word in different cases is treated as one token

In [4]:
sentence_tokens = nltk.sent_tokenize(Text) # splits the text into a list of sentences
word_tokens = nltk.word_tokenize(Text) # splits the text into a list of words

In [5]:
#preview the tokenized sentences
sentence_tokens[:2] #Output the first 2 tokenized sentences; change the slice to 1 to see a single sentence

['zindi is the first data science competition platform in africa and hosts an entire data science ecosystem of scientists, engineers, academics, companies, ngos, governments and institutions focused on solving africa’s most pressing problems,\n\nfor data scientists, from newbies to rock stars, zindi is a place to access african datasets and solve african problems.',
 'data scientists will find all the tools they need on zindi to compete, share ideas, hone their skills, build their professional profiles, find career opportunities, and have fun!']

In [6]:
#preview the tokenized words
word_tokens[:5] #Output the first 5 tokenized words; change the slice to see more or fewer

['zindi', 'is', 'the', 'first', 'data']

In [7]:
lemmer = nltk.stem.WordNetLemmatizer()
##WordNet is a semantically-oriented dictionary of English included in NLTK.

In [8]:
#Define a function to lemmatize the tokenized words
def LemTokens(tokens):
    lemmatized = [lemmer.lemmatize(token) for token in tokens]
    return lemmatized

In [9]:
#Build a translation table that maps every punctuation character to None, so that str.translate removes it
remove_punctuations = dict((ord(punctuations), None) for punctuations in string.punctuation)
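A minimal sketch of what this translation table does, using only the standard library. The sample strings are made up for illustration. Note that string.punctuation covers ASCII punctuation only, so a typographic apostrophe such as the one in "africa’s" is expected to pass through unchanged.

```python
import string

# Same idea as the notebook's table: map each ASCII punctuation code point to None
remove_punctuations = {ord(ch): None for ch in string.punctuation}

# Hypothetical sample input, lowercased first as LemNormalize does
sample = "Hello, world! What's up?"
cleaned = sample.lower().translate(remove_punctuations)
print(cleaned)  # hello world whats up

# A curly apostrophe is not in string.punctuation, so it is kept
curly = "africa’s".translate(remove_punctuations)
print(curly)
```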

In [10]:
#Define a function to normalize text: convert it to lower case, remove punctuation, tokenize and lemmatize
def LemNormalize(text):
    NormalizedLemmatized = LemTokens(nltk.word_tokenize(text.lower().translate(remove_punctuations)))
    return NormalizedLemmatized

### SIMPLE KEYWORD MATCHING
Next, define a function for greetings by the bot: if the user's input is a greeting, the bot responds with a greeting of its own.

In [11]:
GREETING_INPUTS = ("hello", "hi", "greetings", "what's up","hey")
GREETING_RESPONSES = ["i am here to attend to you", "hey", "how can i help you*", "what would you like to know about AXA Products/Services"]

In [12]:
def greeting(sentence):
    # Return a random greeting if any word of the input is a known greeting; otherwise return None
    for word in sentence.split():
        if word.lower() in GREETING_INPUTS:
            return random.choice(GREETING_RESPONSES)
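One detail of the word-by-word test is worth illustrating: because the input is split on whitespace, a multi-word entry such as "what's up" can never equal a single word. The sketch below reproduces the notebook's matching as greeting_by_word and adds greeting_by_phrase, a hypothetical variant (not part of the notebook) that also checks multi-word phrases. The response list is shortened for the example.

```python
import random

GREETING_INPUTS = ("hello", "hi", "greetings", "what's up", "hey")
GREETING_RESPONSES = ["hey", "i am here to attend to you"]  # shortened for the sketch

def greeting_by_word(sentence):
    # Same test as the notebook: compare each whitespace-separated word
    for word in sentence.split():
        if word.lower() in GREETING_INPUTS:
            return random.choice(GREETING_RESPONSES)
    return None

def greeting_by_phrase(sentence):
    # Hypothetical variant: single words must match exactly,
    # multi-word phrases are matched as substrings of the input
    text = sentence.lower()
    words = text.split()
    for phrase in GREETING_INPUTS:
        if phrase in words or (" " in phrase and phrase in text):
            return random.choice(GREETING_RESPONSES)
    return None

print(greeting_by_word("what's up"))    # None: no single word equals "what's up"
print(greeting_by_phrase("what's up"))  # one of GREETING_RESPONSES
```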

### GENERATING RESPONSE
To generate a response to an input question, we use document similarity: the input is compared with every sentence in the corpus, using the TfidfVectorizer and cosine_similarity functions imported above.

In [13]:
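def response(user_response):
    robo_response = ''
    # Temporarily add the user's input as the last document; the chat loop removes it afterwards
    sentence_tokens.append(user_response)
    TfidfVec = TfidfVectorizer(tokenizer=LemNormalize, stop_words='english')
    tfidf = TfidfVec.fit_transform(sentence_tokens)
    # Similarity of the user's input (last row) to every sentence, itself included
    vals = cosine_similarity(tfidf[-1], tfidf)
    # The highest score is the input compared with itself, so take the second highest
    idx = vals.argsort()[0][-2]
    flat = vals.flatten()
    flat.sort()
    req_tfidf = flat[-2]
    if req_tfidf == 0:
        # No sentence in the corpus shares a term with the input
        robo_response = robo_response + "I am so sorry! I dont understand your words"
        return robo_response
    else:
        robo_response = robo_response + sentence_tokens[idx]
        return robo_response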
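To make the retrieval step concrete, here is a small hand-rolled sketch of TF-IDF weighting and cosine similarity using only the standard library. It is not the notebook's implementation: it splits on whitespace instead of calling LemNormalize, applies no stop-word list, and uses the smoothed IDF form ln((1 + n) / (1 + df)) + 1 with L2 normalisation, which scikit-learn documents as its default. The two-sentence corpus and the helper names tfidf_vectors and cosine are made up for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, scaled to unit length."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Term frequency times smoothed inverse document frequency
        weights = {t: tf[t] * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t in tf}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors

def cosine(u, v):
    # Both vectors have unit length, so the dot product is the cosine
    return sum(w * v.get(t, 0.0) for t, w in u.items())

# Hypothetical two-sentence corpus and query
corpus = [
    "zindi is a data science competition platform",
    "teams can have at most four members",
]
query = "what is zindi"

vectors = tfidf_vectors(corpus + [query])  # the query is the last document
scores = [cosine(vectors[-1], v) for v in vectors[:-1]]
best = max(range(len(scores)), key=scores.__getitem__)
print(corpus[best])
```

Here the query is excluded from the candidates by slicing, whereas the notebook keeps it in the matrix and picks the second-highest score. Both approaches avoid returning the query itself as the answer.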

Finally, we set the lines the bot says when a conversation starts and ends, and route each user input to the matching reply.

In [None]:
flag = True
#welcome message
print("Welcome: I am LERATO from Zindi. I will answer your queries about Zindi Platforms. If you want to exit the conversation, type Bye!")
while flag:
    user_response = input() #read the user's message
    user_response = user_response.lower() #convert the user's message to lower case for the bot
    if user_response != 'bye':
        if user_response in ('thanks', 'thank you', 'okay'):
            flag = False
            print("LERATO: You are welcome..")
        elif greeting(user_response) is not None:
            print("LERATO: " + greeting(user_response))
        else:
            print("LERATO: ", end="")
            print(response(user_response))
            sentence_tokens.remove(user_response) #undo the append made inside response()
    else:
        flag = False
        print("LERATO: Thanks, Stay active on the platform and help others to learn, Bye.")

Welcome: I am LERATO from Zindi. I will answer your queries about Zindi Platforms. If you want to exit the conversation, type Bye!
hi
LERATO: how can i help you*
hello
LERATO: hey
what is zindi
LERATO: zindi is the first data science competition platform in africa and hosts an entire data science ecosystem of scientists, engineers, academics, companies, ngos, governments and institutions focused on solving africa’s most pressing problems,

for data scientists, from newbies to rock stars, zindi is a place to access african datasets and solve african problems.
how many numbers of submission per day
LERATO: maximum number of submissions are based on the challenge, kindly read instruction page
how many numbers of team in a challenge
LERATO: maximum number of individual in a team are 4, to earn some points be among the top 10 in a price or reward challenge.
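The branching in the chat loop can also be written as a pure function, which makes it possible to check each path without typing into input(). This is only a sketch: reply, stub_greet and stub_answer are hypothetical names, and the two stubs stand in for the notebook's greeting and response functions so the example runs on its own.

```python
def reply(user_text, greet, answer):
    """Return (bot_reply, keep_going) for one user turn.

    greet and answer are stand-ins for the notebook's greeting() and response().
    """
    text = user_text.lower()
    if text == 'bye':
        return "Thanks, Stay active on the platform and help others to learn, Bye.", False
    if text in ('thanks', 'thank you', 'okay'):
        return "You are welcome..", False
    greeted = greet(text)
    if greeted is not None:
        return greeted, True
    return answer(text), True

# Stub helpers (assumptions for this sketch only)
def stub_greet(text):
    return "hey" if "hi" in text.split() else None

def stub_answer(text):
    return "I am so sorry! I dont understand your words"

print(reply("hi", stub_greet, stub_answer))
print(reply("Bye", stub_greet, stub_answer))
```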


# Feel free to contribute and make it better