# Vivabot

![](https://images.unsplash.com/photo-1527430253228-e93688616381?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=1191&q=80)

Photo by [Rock'n Roll Monkey](https://unsplash.com/photos/R4WCbazrD1g)

In this exercise, you will build your own bot: Vivabot. To do so, you will apply what you know about text preprocessing, TF-IDF and cosine similarity, along with some basic Python.

Begin by importing the needed libraries:

In [10]:
# TODO: import needed libraries
import pandas as pd
import numpy as np
import string
import random

import matplotlib.pyplot as plt
import seaborn as sns

import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize
from nltk.stem import WordNetLemmatizer

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

import warnings
warnings.filterwarnings("ignore", category=UserWarning)

First let's load our sentence database, stored in the file *chatbot_database.txt* and have a look at the data.

Warning: the file is not a CSV, so you may need to adjust the parameters of `pd.read_csv()` to open it correctly, or simply read it as plain text.
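
As one possible way to follow the `pd.read_csv()` hint, the sketch below reads a text source with one sentence per line into a single-column DataFrame. It assumes the text contains no tab characters, so a tab separator never splits a line. The column name `sentence` and the in-memory `sample` string (standing in for *chatbot_database.txt*) are illustrative choices, not part of the exercise.

```python
import csv
import io

import pandas as pd

# Stand-in for the real file: one sentence per line, with a blank line in between.
sample = (
    "A chatbot is a computer program, or an artificial intelligence.\n"
    "\n"
    'The term "ChatterBot" was coined in 1994.\n'
)

# sep="\t": assumes there are no tabs in the text, so each line stays in one column.
# quoting=csv.QUOTE_NONE: treat double quotes as ordinary characters.
# Blank lines are skipped by default (skip_blank_lines=True).
df = pd.read_csv(
    io.StringIO(sample),  # replace with "chatbot_database.txt" for the real file
    sep="\t",
    header=None,
    names=["sentence"],
    quoting=csv.QUOTE_NONE,
)

print(df.shape)
print(df["sentence"].tolist())
```

Commas inside a sentence are harmless here because the comma is no longer the separator.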

In [9]:
# TODO: load chatbot_database.txt
with open("chatbot_database.txt", "r") as file:
    chatbot_db = file.read()


In [10]:
print(type(chatbot_db))

<class 'str'>


In [11]:
lines = chatbot_db.split("\n")
num_lines = len(lines)
print("Number of lines:", num_lines)

Number of lines: 115


In [12]:
print(chatbot_db)

A chatbot (also known as a talkbot, chatterbot, Bot, IM bot, interactive agent, or Artificial Conversational Entity) is a computer program or an artificial intelligence which conducts a conversation via auditory or textual methods.
Such programs are often designed to convincingly simulate how a human would behave as a conversational partner, thereby passing the Turing test.
Chatbots are typically used in dialog systems for various practical purposes including customer service or information acquisition.
Some chatterbots use sophisticated natural language processing systems, but many simpler systems scan for keywords within the input, then pull a reply with the most matching keywords, or the most similar wording pattern, from a database.

The term "ChatterBot" was originally coined by Michael Mauldin (creator of the first Verbot, Julia) in 1994 to describe these conversational programs.
Today, most chatbots are either accessed via virtual assistants such as Google Assistant and Amazon A

It is necessary to compute the TF-IDF on this database. First, do not forget to preprocess the data, and then compute and store the TF-IDF.
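
A minimal sketch of what the preprocessing step could look like, using only the standard library: lowercasing, punctuation removal and stop word filtering. The function name `preprocess` and the tiny `TOY_STOPWORDS` set are assumptions made for illustration; in the exercise you would more likely use NLTK's `stopwords` corpus and `WordNetLemmatizer`, which need their data to be downloaded first.

```python
import string

# Hypothetical, deliberately tiny stop word list (NLTK's English list is much longer).
TOY_STOPWORDS = {"a", "an", "the", "is", "are", "of", "or", "and", "to", "in"}


def preprocess(text, stopwords=TOY_STOPWORDS):
    """Lowercase, strip punctuation, and drop stop words from a sentence."""
    # Translation table that deletes every ASCII punctuation character.
    table = str.maketrans("", "", string.punctuation)
    tokens = text.lower().translate(table).split()
    return " ".join(token for token in tokens if token not in stopwords)


print(preprocess("A chatbot is a computer program, or an artificial intelligence!"))
# prints: chatbot computer program artificial intelligence
```

Whatever preprocessing is applied to the database must also be applied to each user query, otherwise the two TF-IDF vectors are not comparable.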

In [13]:
# Tokenization of this DB into sentences
sentences = nltk.sent_tokenize(chatbot_db)

In [14]:
# TODO: preprocess and compute the TF-IDF of this database

from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer = TfidfVectorizer()

tf_idf = vectorizer.fit_transform(sentences).toarray()

tf_idf_df = pd.DataFrame(data=tf_idf, columns=vectorizer.get_feature_names_out())

tf_idf_df


Unnamed: 0,000,100,16,1950,1966,1972,1984,1994,2006,2008,...,workings,would,written,xico,yahoo,yekaliva,yet,york,your,zuckerberg
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,...,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,...,0.0,0.223085,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,...,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,...,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.257653,0.0,0.0,...,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
79,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,...,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.321862,0.0
80,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,...,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0
81,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,...,0.0,0.000000,0.0,0.0,0.504818,0.0,0.0,0.0,0.000000,0.0
82,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,...,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0


The next step is to find the sentence closest to a user query, using cosine similarity. This is done in the function `get_closest_sentence(query, tf_idf, vectorizer)`, which returns the index of the database sentence closest to `query`, using `vectorizer` to compute the TF-IDF of the query.

Do not forget to preprocess the query before computing the TF-IDF.
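
To make the idea concrete, here is a small sketch on a three-sentence toy database. The helper name `get_closest_index` and the `toy_*` variables are hypothetical and only serve the illustration; the query is transformed with the already fitted vectorizer (never refitted), then compared to every row of the database TF-IDF matrix.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def get_closest_index(query, tf_idf, vectorizer):
    """Return the row index of the database sentence most similar to the query."""
    # Reuse the fitted vocabulary and IDF weights: transform, do not fit again.
    query_tf_idf = vectorizer.transform([query])
    # Shape (1, n_sentences): one similarity score per database sentence.
    similarity = cosine_similarity(query_tf_idf, tf_idf)
    return int(similarity.argmax())


# Toy database standing in for the real sentences.
toy_sentences = [
    "Chatbots are used in customer service.",
    "The Turing test evaluates machine intelligence.",
    "Julia was the first Verbot.",
]
toy_vectorizer = TfidfVectorizer()
toy_tf_idf = toy_vectorizer.fit_transform(toy_sentences)

index = get_closest_index("What is the Turing test?", toy_tf_idf, toy_vectorizer)
print(index, toy_sentences[index])
```

Words of the query that are absent from the fitted vocabulary (here "what" and "is") are simply ignored by `transform`.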

Let's define greetings words and greetings answers in two separate variables.

Greetings words should be words or short sentences like "Hello", "Hey", "Hi", "What's up?" and so on.
The greetings answers can be any words or short sentences you like.

In [23]:
# TODO: Define the greetings words and answers in two variables
greetings_inputs = ['Hello', 'Hey', 'Hi', 'Good morning', 'Good evening', 'What\'s up?']
greetings_outputs = ['Hi, I am Vivabot. How can I assist you?', 'Hello! How may I help you?', 'Hey there! How can I assist you today?', 'Hey! What can I do for you?']


Now create a greetings function, called `greetings(sentence, greetings_inputs, greetings_outputs)`. If the variable `sentence` is in `greetings_inputs`, the function returns a random sentence from `greetings_outputs`. Otherwise the function returns nothing.

Handle differences in case too: for example, 'hello' and 'Hello' should both work!

In [24]:
# TODO: Implement the function greetings

def greetings(sentence, greetings_inputs, greetings_outputs):
    # Compare in lowercase so that 'hello' and 'Hello' both match
    if sentence.lower() in [x.lower() for x in greetings_inputs]:
        # Return (rather than print) a random answer, as the specification asks
        return random.choice(greetings_outputs)
    # Not a greeting: return nothing
    return None

text = input('User input:\n>> ')
print(greetings(text, greetings_inputs, greetings_outputs))




Hi, I am Vivabot. How can I assist you?
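
Lowercasing alone still misses inputs such as "Hello!" or " hi ". As an optional extension, the sketch below also strips surrounding whitespace and punctuation before comparing. The names `normalize` and `match_greeting` are hypothetical and are not required by the exercise.

```python
import random
import string


def normalize(text):
    """Trim whitespace and surrounding punctuation, then casefold."""
    return text.strip().strip(string.punctuation).strip().casefold()


def match_greeting(sentence, greetings_inputs, greetings_outputs):
    """Return a random greeting answer if the sentence is a known greeting, else None."""
    known = {normalize(greeting) for greeting in greetings_inputs}
    if normalize(sentence) in known:
        return random.choice(greetings_outputs)
    return None


print(match_greeting("hello!", ["Hello", "What's up?"], ["Hi there"]))
print(match_greeting("What is a chatbot?", ["Hello", "What's up?"], ["Hi there"]))
```

Only punctuation at the ends is removed, so the apostrophe in "What's up?" is preserved.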


In [25]:
from sklearn.feature_extraction.text import TfidfVectorizer

user_input = input('How can I help you?\n>> ')

user_tf_idf = vectorizer.transform([user_input])
pd.DataFrame(data=user_tf_idf.toarray(), columns=vectorizer.get_feature_names_out())

Unnamed: 0,000,100,16,1950,1966,1972,1984,1994,2006,2008,...,workings,would,written,xico,yahoo,yekaliva,yet,york,your,zuckerberg
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [26]:
# TODO: implement get_closest_sentence(query, tf_idf, vectorizer)
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

similarity = cosine_similarity(user_tf_idf, tf_idf)
print(f"Similarity: {similarity}")

sentences[similarity.argmax()]

Similarity: [[0.         0.18270619 0.         0.         0.         0.
  0.03491769 0.         0.         0.         0.         0.
  0.         0.         0.         0.03308813 0.         0.
  0.         0.04199073 0.         0.31044231 0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.04823607 0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.05506254
  0.         0.         0.         0.         0.         0.
  0.04537258 0.         0.         0.         0.         0.
  0.         0.         0.08169175 0.         0.         0.08940547
  0.04862898 0.         0.0536158  0.05376098 0.         0.03855684
  0.06542544 0.13289158 0.         0.04494953 0.         0.03627354
  0.         0.         0.         0.         0.         0.        ]]


'Thus, for example, online help systems can usefully employ chatbot techniques to identify the area of help that users require, potentially providing a "friendlier" interface than a more formal search or menu system.'

The next step is to put it all together: let's define a function `vivabot(greetings_inputs, greetings_outputs, tf_idf, vectorizer, database)` that does the following:
<ol>
<li> Print a generic introduction </li>
<li> Ask for text input </li>
<li> If the text input is a greeting: call the function `greetings` with `greetings_inputs` and `greetings_outputs`, and print its output</li>
<li> If the text input is not a greeting: call the function `get_closest_sentence` with `tf_idf`, `vectorizer` and `database`, and print the closest sentence</li>
<li> Go back to step 2 unless the text input is "Bye" </li>
</ol>



In [31]:
# TODO: implement the function vivabot


def greetings(sentence, greetings_inputs, greetings_outputs):
    if sentence.lower() in [x.lower() for x in greetings_inputs]:
        output = random.choice(greetings_outputs)
        return output
    else:
        return None

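def get_closest_sentence(query, tf_idf, vectorizer, database):
    # `database` must be the list of sentences that `tf_idf` was computed from
    query_tf_idf = vectorizer.transform([query])
    similarity = cosine_similarity(query_tf_idf, tf_idf)
    # Index of the database sentence most similar to the query
    closest_sentence_index = np.argmax(similarity)
    return database[closest_sentence_index]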

def vivabot(greetings_inputs, greetings_outputs, tf_idf, vectorizer, database):
    print("Welcome to Vivabot!")
    print("How can I assist you today?")
    
    while True:
        user_input = input("User input: ")
        
        if user_input.lower() == "bye":
            print("Goodbye! Have a great day!")
            break
        
        greetings_output = greetings(user_input, greetings_inputs, greetings_outputs)
        
        if greetings_output:
            print(greetings_output)
        else:
            closest_sentence = get_closest_sentence(user_input, tf_idf, vectorizer, database)
            print(closest_sentence)




Finally, call the function `vivabot` and see your chatbot coming to life!

If it does not work well, call the functions one by one and check they all work properly independently first.

In [None]:
# TODO: use your chatbot!
# Pass the list `sentences` (not the raw string `chatbot_db`) so the index maps to a sentence
vivabot(greetings_inputs, greetings_outputs, tf_idf, vectorizer, sentences)

**\[BONUS\]**: Let's implement some sentiment analysis features on our brand new chatbot:


When the chatbot does not understand the user query (meaning the best similarity is below a predefined threshold), it should fall back to a small talk function. The small talk function takes the query as input and returns a positive or negative message depending on the tone (polarity) of the user.
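
The threshold check itself could be sketched as follows. The function name `answer_or_small_talk`, the default `threshold=0.2`, the toy database and the stub `toy_fallback` are all assumptions for illustration; in the notebook, the fallback would be your own small talk function and the threshold a value you tune by hand.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def answer_or_small_talk(query, tf_idf, vectorizer, database, fallback, threshold=0.2):
    """Return the closest sentence, or fallback(query) when nothing is similar enough."""
    # One similarity score per database sentence.
    similarity = cosine_similarity(vectorizer.transform([query]), tf_idf)[0]
    best = similarity.argmax()
    if similarity[best] < threshold:
        # The bot "does not understand": delegate to the small talk function.
        return fallback(query)
    return database[best]


# Toy setup standing in for the real database and small talk function.
toy_db = [
    "Chatbots are used in customer service.",
    "The Turing test evaluates machine intelligence.",
    "Julia was the first Verbot.",
]
toy_vectorizer = TfidfVectorizer()
toy_tf_idf = toy_vectorizer.fit_transform(toy_db)


def toy_fallback(query):
    return "Please be more specific"


print(answer_or_small_talk("Turing test", toy_tf_idf, toy_vectorizer, toy_db, toy_fallback))
print(answer_or_small_talk("asdf qwerty", toy_tf_idf, toy_vectorizer, toy_db, toy_fallback))
```

A query made only of unknown words yields a zero vector, so every similarity is 0 and the fallback is used.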

In [4]:
small_talks_good = ["Thanks for getting in touch with me", "I am so sorry I do not understand your point", 
                   "I'll make sure to understand you after my next update"]

In [5]:
small_talks_bad = ["I can not understand a word of what you are saying", "Please be more specific"]

In [8]:
# TODO: implement the function small_talk
import random
from textblob import TextBlob

def small_talk(query):
    blob = TextBlob(query)
    polarity = blob.sentiment.polarity

    if polarity >= 0:
        return random.choice(small_talks_good)
    else:
        return random.choice(small_talks_bad)

In [11]:
user_query = "I don't understand your point."


response = small_talk(user_query)
print(response)

I am so sorry I do not understand your point


Many improvements are possible if you have time: for example, improve the preprocessing, or swap in a different database to use the bot for another purpose.

This is your bot!