### Very basic chat bot from Amazon's electronics Q&A data

For the purpose of the chatbot that we will create in this section, we will be using Amazon's Q&A data, which is a repository of questions and answers gathered from
Amazon's website for various product categories ( http://jmcauley.ucsd.edu/data/amazon/qa/ ). Since the dataset is massive, we will only
be using the Q&A data for electronic items. Being trained on Q&A data for electronic items, our chatbot could be deployed as automated Q&A support under the Electronic Items section. The corpus is in a JavaScript Object Notation (JSON)-like format. Each row of data is in a dictionary format with various key-value pairs. 

Now that we have familiarized ourselves with the corpus, let's design the architecture of the chatbot, as follows:

    * Store all the questions from the corpus in a list
    * Store all corresponding answers from the corpus in a list
    * Vectorize and preprocess the question data
    * Vectorize and preprocess the user's query
    * Assess the most similar question to the user's query using cosine similarity
    * Return the corresponding answer to the most similar question as a chat response

In [1]:
import numpy as np
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import CountVectorizer


#loading questions and answers in separate lists
import ast 
questions = []
answers = [] 
with open('qa_Electronics.json','r') as f:
    for line in f:
        data = ast.literal_eval(line)
        questions.append(data['question'].lower())
        answers.append(data['answer'].lower())

We will need to import the corpus ( qa_Electronics.json ) into Python. Dataset is under working directory for this notebook. We read the file as a text file and then use the ast library's literal_eval function to convert the rows from a string to a Python dictionary. We then iterate through each dictionary to extract and store questions and answers in separate lists. While importing, we also perform the preprocessing step of converting all characters to lowercase.

#### Transformations
Next, using the CountVectorizer module of the sklearn library, we convert the questions list into a sparse matrix and apply TF-IDF transformation.

In [8]:
vectorizer = CountVectorizer(stop_words='english')
X_vec = vectorizer.fit_transform(questions)

""" Transform data by applying term frequency inverse document frequency (TF-IDF) """

tfidf = TfidfTransformer() # by defualt norm 12 is applied
X_tfidf = tfidf.fit_transform(X_vec)

X_tfidf is the repository matrix that will be searched every time a new question is entered in the chatbot for the most similar question.

To implement this, we create a function to calculate the angle between every row of the X_tfidf matrix and the new question vector. We use the sklearn library's cosine_similarity module to calculate the cosine between each row and the vector, and then convert the cosine into degrees.

Finally, we search the row that has the maximum cosine (or the minimum angle) with the new question vector and return the corresponding answer to that question as the response. If the smallest angle between the question vector and every row of the matrix is greater than a threshold value, then we consider that question to be different enough to not warrant a response.

In [9]:
def conversation(im):
    global tfidf, answers, X_tfidf
    Y_vec = vectorizer.transform(im)
    Y_tfidf = tfidf.fit_transform(Y_vec)
    cos_sim = np.rad2deg(np.arccos(max(cosine_similarity(Y_tfidf, X_tfidf)[0])))
    if cos_sim > 60 :
        return "sorry, I did not quite understand that"
    else:
        return answers[np.argmax(cosine_similarity(Y_tfidf, X_tfidf)[0])]

def main():
    usr = input("Please enter your username: ")
    print("support: Hi, welcome to Q&A support. How can I help you?")
    while True:
        im = input("{}: ".format(usr))
        if im.lower() == 'bye':
            print("Q&A support: bye!")
            break
        else:
            print("Q&A support: "+conversation([im]))

In [10]:
main()

Please enter your username:  Sai Vyas


support: Hi, welcome to Q&A support. How can I help you?


Sai Vyas:  What is the warranty of my phone


Q&A support: the guarantee is one month. (the phone must be free of shocks or manipulated its hardware) the costs paid by the buyer.


Sai Vyas:  Does it have internal memory


Q&A support: no, that's the only thing i wish was different. other than that it's great. definitely worth every penny.


Sai Vyas:  Is it an Iphone?


Q&A support: it does not plug into an iphone. i love this plantronic headset, quality wonderful. i bought an adapter that fit the iphone but the sound quality was bad so i now use the white ear bud thing that came with the iphone. i have small ears so those things hurt when i use them for a long time. i don't recommend the plantronics headset with an iphone.


Sai Vyas:  Does my phone have long battery life?


Q&A support: i've used the hp laptop with out being plugged in for at least 8 hours. had to go out. so i turned it off after that. came home and turned bavk on. there was atleast half battery left.hope this info helps you....&#128540;


Sai Vyas:  does it have bluetooth?


Q&A support: no


Sai Vyas:  Theft coverage?


Q&A support: the coverage is for two years protection additional to the one year manufacturer's warranty to repair your broken item. the additional coverage protects your item to be fixed if you break it in an accident, i.e. drop it, spill water on it.


Sai Vyas:  Can I get a replacement if the screen is broken?


Q&A support: i went back to the seller and they were awesome at helping out.


Sai Vyas:  Fantastic, thank you


Q&A support: hi. frame china, optics swiss


Sai Vyas:  Alright, bye!


Q&A support: i would have to agree with samuel because most people can just use their iphone or droid phones to shoot great video for example. we have a droid x that has a generous 4.5" screen and takes hd video & sound just fine for the everyday user. videos can now be uploaded directly to youtube without having to convert them either (had to do this a year ago for quality)! i think little pocket-cams will become obsolete by 2012 (along with the world!!! lol).


Sai Vyas:  Bye!


Q&A support: i would have to agree with samuel because most people can just use their iphone or droid phones to shoot great video for example. we have a droid x that has a generous 4.5" screen and takes hd video & sound just fine for the everyday user. videos can now be uploaded directly to youtube without having to convert them either (had to do this a year ago for quality)! i think little pocket-cams will become obsolete by 2012 (along with the world!!! lol).


Sai Vyas:  bye


Q&A support: bye!


Basically, this chat bot returns answer statements and is on-par with accuracy.

Cosine similarity is the key here and so far, it performs decent enough, although it has some errors when not recognizing the question properly. This problem can be resolved with large corpus of data and rules that are fed in for usual common talk phrases in daily life of human communication.

Thanks for your time in reading this till here!