* Useful resources: https://www.nltk.org/

### Importing the required libraries

In [73]:
import numpy as np
import nltk
import string
import random


* Importing and reading the corpus

In [74]:
# Read the corpus; errors='ignore' skips any bytes that cannot be decoded
with open('chatbot.txt', 'r', errors='ignore') as f:
    raw_doc = f.read()
raw_doc = raw_doc.lower()  # convert to lowercase
nltk.download('punkt')  # download the punkt tokenizer, which is required for tokenization
nltk.download('wordnet')  # download WordNet, used by the lemmatizer to reduce words to their base form
sent_tokens = nltk.sent_tokenize(raw_doc)  # converts the document into a list of sentences
word_tokens = nltk.word_tokenize(raw_doc)  # converts the document into a list of words


[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\Franck\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\Franck\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


**Example of sentence tokens**

In [75]:
sent_tokens[:2]

['data science is an interdisciplinary academic field[1] that uses statistics, scientific computing, scientific methods, processing, scientific visualization, algorithms and systems to extract or extrapolate knowledge and insights from potentially noisy, structured, or unstructured data.',
 '[2]\n\ndata science also integrates domain knowledge from the underlying application domain (e.g., natural sciences, information technology, and medicine).']

**Example of word tokens**

In [76]:
word_tokens[:2]

['data', 'science']

**Text preprocessing**

In [77]:
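lemmer = nltk.stem.WordNetLemmatizer()
# WordNet is a semantically oriented dictionary of English included in NLTK.
def LemTokens(tokens):
    # Reduce each token to its WordNet lemma (base form)
    return [lemmer.lemmatize(token) for token in tokens]
# Translation table that maps every ASCII punctuation character to None, i.e. deletes it
remove_punct_dict = dict((ord(c), None) for c in string.punctuation)
def LemNormalize(text):
    # Lowercase, strip punctuation, tokenize, then lemmatize
    return LemTokens(nltk.word_tokenize(text.lower().translate(remove_punct_dict)))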
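
To see what the punctuation-stripping step of `LemNormalize` does in isolation, here is a minimal sketch that uses only the standard library. It is a simplification for illustration: whitespace splitting stands in for `nltk.word_tokenize`, lemmatization is left out, and the helper name `strip_and_split` is hypothetical.

```python
import string

# Same translation table as in the notebook: each ASCII punctuation
# character maps to None, which str.translate treats as "delete".
remove_punct_dict = dict((ord(c), None) for c in string.punctuation)

def strip_and_split(text):
    """Hypothetical helper: lowercase, drop punctuation, split on whitespace."""
    return text.lower().translate(remove_punct_dict).split()

print(strip_and_split("Hello, World! Data-science is fun."))
# ['hello', 'world', 'datascience', 'is', 'fun']
```

Note that the hyphen is also part of `string.punctuation`, so hyphenated words are fused into a single token rather than split in two.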


**Defining the greeting function**

In [78]:
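GREET_INPUTS = ("hello", "hi", "greetings", "sup", "what's up", "hey")
GREET_RESPONSES = ["Hello!", "Hi!", "Greetings!", "Sup!", "hi there", "nods", "i am glad! you are talking to me"]
def greet(sentence):
    # Return a random greeting if the input contains a known greeting, otherwise None
    text = sentence.lower()
    # Check multi-word greetings ("what's up") as phrases; they can never match a single word
    if any(' ' in g and g in text for g in GREET_INPUTS):
        return random.choice(GREET_RESPONSES)
    for word in text.split():
        if word.strip(string.punctuation) in GREET_INPUTS:  # "hello!" matches "hello"
            return random.choice(GREET_RESPONSES)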

**Response generation**

In [79]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [80]:
def response(user_response):
    # Assumes the caller has already appended user_response to sent_tokens,
    # so the last TF-IDF row is the user's query
    TfidfVec = TfidfVectorizer(tokenizer=LemNormalize, stop_words='english')
    tfidf = TfidfVec.fit_transform(sent_tokens)
    # Cosine similarity between the query and every sentence (including the query itself)
    vals = cosine_similarity(tfidf[-1], tfidf).flatten()
    # The top score belongs to the query itself, so take the second-highest one
    idx = vals.argsort()[-2]
    req_tfidf = vals[idx]
    if req_tfidf == 0:
        # No term overlap with any sentence in the corpus
        return "I am sorry! I don't understand you. Please try again with different words."
    return sent_tokens[idx]
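
The retrieval idea behind `response` (weight terms by TF-IDF, then pick the corpus sentence with the highest cosine similarity to the query) can be sketched without scikit-learn. The block below is a pure-Python stand-in, not the notebook's implementation: it tokenizes on whitespace, skips stop-word removal and lemmatization, uses one common smoothed IDF formula, and scores the query only against the corpus instead of taking the second-highest score. The names `tfidf_vectors`, `cosine`, and `best_match` are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one L2-normalized TF-IDF dict per document (whitespace tokens)."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors

def cosine(u, v):
    """Sparse dot product; equals cosine similarity when both vectors are unit length."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())

def best_match(sentences, query):
    """Return the sentence most similar to query, or None if nothing overlaps."""
    if not sentences:
        return None
    # Vectorize corpus and query together, as the notebook does
    vectors = tfidf_vectors(sentences + [query])
    query_vec = vectors[-1]
    scores = [cosine(query_vec, v) for v in vectors[:-1]]
    best = max(range(len(scores)), key=scores.__getitem__)
    return sentences[best] if scores[best] > 0 else None

corpus = [
    "data science uses statistics and algorithms",
    "machine learning is a part of data science",
    "the weather is nice today",
]
print(best_match(corpus, "what is machine learning"))
print(best_match(corpus, "bonjour"))
```

The first query shares "machine", "learning", and "is" with the second sentence, so that sentence is returned; the second query shares no term with the corpus, so the result is `None`, which corresponds to the "I don't understand you" branch.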

**Defining conversation start/end protocols**

In [81]:
flag = True
print("Bot: My name is Jarvis. Let's have a conversation! Also, if you want to exit any time, just type 'exit'")
while flag:
    user_response = input().lower()
    if user_response != 'exit':
        if user_response in ('thanks', 'thank you'):
            flag = False
            print("Bot: You are welcome...")
        else:
            # Call greet() once so the check and the printed reply use the same result
            greeting = greet(user_response)
            if greeting is not None:
                print("Bot: " + greeting)
            else:
                # Temporarily add the query to the corpus so response() can vectorize it
                sent_tokens.append(user_response)
                word_tokens = word_tokens + nltk.word_tokenize(user_response)
                print("Bot: " + response(user_response))
                # pop() removes the query we just appended, even if the same text is in the corpus
                sent_tokens.pop()
    else:
        flag = False
        print("Bot: It was nice chatting with you. Have a great day! <3")

Bot: My name is Jarvis. Let's have a conversation! Also, if you want to exit any time, just type 'exit'
Bot: hi there
Bot: "how data science will impact future of businesses?"
Bot: It was nice chatting with you. Have a great day! <3
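
The start/end protocol can also be expressed as a function that handles a single turn, which makes the routing logic testable without `input()`. This is only a sketch under assumptions: `handle_turn`, `fake_greet`, and `fake_answer` are hypothetical names, and the two stand-in callables replace the notebook's `greet` and `response` so the block runs without NLTK or scikit-learn.

```python
def handle_turn(user_input, greet_fn, answer_fn):
    """Hypothetical helper: return (reply, keep_going) for one user message."""
    text = user_input.lower().strip()
    if text == 'exit':
        return "It was nice chatting with you. Have a great day! <3", False
    if text in ('thanks', 'thank you'):
        return "You are welcome...", False
    greeting = greet_fn(text)
    if greeting is not None:
        return greeting, True
    # Anything that is neither a command nor a greeting goes to the retrieval step
    return answer_fn(text), True

# Stand-in callables so the sketch is self-contained
def fake_greet(text):
    return "Hello!" if "hello" in text.split() else None

def fake_answer(text):
    return "echo: " + text

for message in ["Hello", "what is data science?", "exit"]:
    reply, keep_going = handle_turn(message, fake_greet, fake_answer)
    print("Bot: " + reply)
    if not keep_going:
        break
```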
