### CISB 63 - Midterm Project
### Angel Hernandez

### Building a basic chatbot

#### We will be using question-and-answer data from Amazon's website, which covers various product categories. Since the database is huge, we will use the data for electronic items only. 

#### The corpus is in a JavaScript Object Notation (JSON)-like format, with one record per line.

### Architecture of the chatbot

 1. Store all the questions from the corpus in a list
 2. Store all the corresponding answers from the corpus in a list
 3. Vectorize and preprocess the question data
 4. Vectorize and preprocess the user's query
 5. Assess the most similar question to the user's query using cosine similarity
 6. Return the corresponding answer to the most similar question as a chat response
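
#### The six steps above can be sketched end to end on a tiny, made-up corpus. This is only an illustration under assumptions: the `toy_*` names and the two sample questions are hypothetical and are not taken from the Amazon data.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Steps 1-2: hypothetical questions and their answers, kept in parallel lists.
toy_questions = ["is there a flip stand?", "will this fit a nook color?"]
toy_answers = ["no, there is not a flip stand.", "yes"]

# Step 3: vectorize the stored questions and weight them with TF-IDF.
toy_cv = CountVectorizer(stop_words="english")
toy_tf = TfidfTransformer()
Q = toy_tf.fit_transform(toy_cv.fit_transform(toy_questions))

# Step 4: transform the user's query with the already fitted objects.
q = toy_tf.transform(toy_cv.transform(["does it come with a flip stand"]))

# Steps 5-6: pick the most similar stored question and return its answer.
sims = cosine_similarity(q, Q)[0]
best = int(np.argmax(sims))
toy_reply = toy_answers[best]
print(toy_reply)
```

#### The query shares the words "flip" and "stand" only with the first stored question, so that question's answer is returned.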

#### Let's import some libraries

In [1]:
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
import pandas as pd

#### Let's add one more: 

### ast — Abstract Syntax Trees 

#### The ast module helps Python applications process trees of the Python abstract syntax grammar. We will use its literal_eval function to parse each line of the corpus.

In [2]:
import ast
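
#### Why ast rather than json? A small sketch, assuming the records are written as Python dictionary literals with single quotes (the sample `line` below is hypothetical, not copied from the dataset). `ast.literal_eval` only parses literals and never executes code, while the strict JSON parser rejects single-quoted keys.

```python
import ast
import json

# Hypothetical record in a Python-literal style (an assumption about the file).
line = "{'question': 'Does it fit?', 'answer': 'Yes'}"

# literal_eval safely turns the text into a dict.
record = ast.literal_eval(line)
print(record["question"])

# The same text is not valid JSON, because JSON requires double quotes.
try:
    json.loads(line)
    is_strict_json = True
except json.JSONDecodeError:
    is_strict_json = False
print(is_strict_json)
```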

#### Let's create two lists for our questions and answers

In [3]:
questions = []
answers = []

#### Let's load the questions and answers in separate lists

In [4]:
with open('dataset/qa_Electronics.json','r') as f:
    for line in f:
        data = ast.literal_eval(line)
        questions.append(data['question'].lower())
        answers.append(data['answer'].lower())

### Let's create a pandas dataframe with the questions and answers

In [6]:
DF = pd.DataFrame(answers, index=questions)

In [7]:
DF.head(20)

Unnamed: 0,0
is this cover the one that fits the old nook color? which i believe is 8x5.,yes this fits both the nook color and the same...
does it fit nook glowlight?,no. the nook color or color tablet
would it fit nook 1st edition? 4.9in x 7.7in ?,i don't think so. the nook color is 5 x 8 so n...
will this fit a nook color that's 5 x 8?,yes
will this fit the samsung galaxy tab 4 nook 10.1,"no, the tab is smaller than the 'color'"
does it have a flip stand?,"no, there is not a flip stand. it has a pocket..."
does this have a flip stand,"hi, no it doesn't"
also fits the hd+?,it should. they are the same size and the char...
does it have 2 positions for the reader? horizontal/vertical thank you kwod,yes
"is there a closure mechanism? bands, magnetic, etc.?",no- it is more like a normal book would be. it...


In [8]:
DF.tail()

Unnamed: 0,0
is the space from bottom of desktop to tray adjustable to more than one position?,no
can the mouse extension be mounted on the left side,"yes, you can put it on which ever side you want"
does it come with all the hardware,"it's been a while since i bought this, but i'm..."
how wide is it? i need a 19 inch length tray for my little desk,we just measured the tray and it is 21 inches ...
"can this be adapted to be clamped underneath a glass computer desk? the glass is 1/4"" thick",i do not think so.


#### Let's tokenize the text and convert the data into matrix format

In [9]:
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer(stop_words='english')
X_vec = vectorizer.fit_transform(questions)
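
#### To see what the vectorizer produces, here is a minimal sketch on two made-up sentences (the `toy_*` names are hypothetical). Each row of the matrix is a document, each column is a word from the learned vocabulary, and common English stop words are dropped.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Two hypothetical documents, not taken from the corpus.
toy_docs = ["will this fit a nook color", "nook color cover with flip stand"]

toy_cv = CountVectorizer(stop_words="english")
toy_counts = toy_cv.fit_transform(toy_docs)

# vocabulary_ maps each kept word to its column index.
print(sorted(toy_cv.vocabulary_))
# Dense view of the sparse count matrix: one row per document.
print(toy_counts.toarray())
```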

#### This is new! <br> Let's transform the data by applying term frequency-inverse document frequency (TF-IDF) 

In [10]:
tfidf = TfidfTransformer() 
X_tfidf = tfidf.fit_transform(X_vec)
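
#### A small sketch of what TF-IDF does, using a hand-written count matrix (the `toy_*` names and the numbers are hypothetical). With scikit-learn's default settings, the inverse document frequency is ln((1 + n) / (1 + df)) + 1, so a word that appears in every document gets a lower weight than a rare one, and each row is then scaled to unit length.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfTransformer

# Column 0: a word found in all three documents.
# Column 1: a word found only in the first document.
toy_counts = np.array([[1, 1],
                       [1, 0],
                       [1, 0]])

toy_tfidf = TfidfTransformer()
toy_weights = toy_tfidf.fit_transform(toy_counts).toarray()

# The rare word (column 1) has the larger IDF.
print(toy_tfidf.idf_)
# In the first document the rare word therefore carries more weight.
print(toy_weights[0])
```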

In [11]:
tfidf

In [12]:
X_tfidf

<314263x69189 sparse matrix of type '<class 'numpy.float64'>'
	with 2033712 stored elements in Compressed Sparse Row format>

#### X_tfidf is the repository matrix: every time a new question is entered in the chatbot, it is searched for the most similar stored question. 

#### Let's define a function, "conversation"

#### We need to calculate the angle between every row of the X_tfidf matrix and the new question vector. We use scikit-learn's cosine_similarity function to calculate the cosine between each row and the vector, and then convert the cosine into degrees. 

#### Finally, we search the row that has the maximum cosine (or the minimum angle) with the new question vector and return the corresponding answer. 

In [13]:
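def conversation(im):
    # Reuse the fitted vectorizer and TF-IDF transformer; only transform the query.
    Y_vec = vectorizer.transform(im)
    Y_tfidf = tfidf.transform(Y_vec)
    # Cosine similarity between the query and every stored question.
    sims = cosine_similarity(Y_tfidf, X_tfidf)[0]
    best = np.argmax(sims)
    # Convert the best cosine into an angle in degrees; clip guards against rounding above 1.
    angle = np.rad2deg(np.arccos(np.clip(sims[best], -1.0, 1.0)))
    if angle > 60:
        return "sorry, I did not quite understand that"
    return answers[best]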
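
#### To make the 60-degree cutoff concrete, here is a minimal sketch of the cosine-to-angle conversion. The helper name `angle_in_degrees` is hypothetical, and the clipping step is an added safeguard, not something the text above requires. A cosine of 0.5 corresponds to 60 degrees, so any query whose best cosine is below 0.5 gets the "did not understand" reply.

```python
import numpy as np

def angle_in_degrees(cosine):
    # Clip first: rounding can push a cosine slightly above 1,
    # and arccos would then return nan.
    return float(np.rad2deg(np.arccos(np.clip(cosine, -1.0, 1.0))))

print(angle_in_degrees(1.0))  # same direction as the stored question
print(angle_in_degrees(0.5))  # right at the cutoff used in conversation
print(angle_in_degrees(0.0))  # no words in common
```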


#### Finally, let's implement the chat where the user enters a question and the bot provides a response. The chat continues until the user types "bye"

In [14]:
def main():
    usr = input('Enter your name: ')
    print("Support: Hi, welcome to Q&A support. How can I help you?")
    while True:
        im = input("{}: ".format(usr))
        if im.lower() == 'bye':
            print("Q&A support: bye!")
            break
        else:
            print("Q&A support: "+conversation([im]))

### Let's test

In [15]:
main()

Enter your name: Angel
Support: Hi, welcome to Q&A support. How can I help you?
Angel: My computer doesn't work
Q&A support: sure they will. they work on any computer. if you have a headphone jack, plug into that. otherwise turn the computer volume waaay down and plug into the speaker jack. but before doing any of that look on your present speakers, they might have a headphone jack.
Angel: My laptop is not working
Q&A support: hi, you may get you laptop in 3 to 5 business day depending on you location. thanks for you interest. tech mark.
Angel: I have a problem with my computer
Q&A support: 7 year old computer with old operating system. doesn't have the memory to be updated. apple told me i have to just buy a knew one if i want it to work with many of the programs i use.
Angel: can I purchase a new computer?
Q&A support: yes
Angel: does

In [None]:
main()