# Chatbot Prototype

This notebook brings together all the components that were developed, trained, and saved in the other notebooks to form a fully functional chatbot.

A few small preparations are needed first, such as loading the models and extracting some information from the database, so that the chatbot can work properly.

## Importing the required libraries

In [1]:
# import of necessary libraries
import nltk 
from nltk.stem.lancaster import LancasterStemmer
stemmer = LancasterStemmer()

import datefinder
import numpy
import tflearn 
import tensorflow
import random 
import json
import pickle
import spacy
import sqlite3
import re
import os
from difflib import SequenceMatcher

Instructions for updating:
non-resource variables are not supported in the long term
curses is not supported on this machine (please install/reinstall curses for an optimal experience)
Scipy not supported!


## Preparations

#### Intent classification

The neural network was created, trained, and saved in a separate notebook beforehand. The following code first loads the training data "data.pickle" and defines the layers of the network, because the network structure has to exist before the saved weights can be restored. It then tries to load the saved network. If that fails, the network is trained from scratch and saved.

In [2]:
path = r"C:\Users\Sebi\OneDrive\Studium\Thesis_Chatbot\2. Intent Classififcation"
# os.path.join avoids the invalid "\d" escape of a hand-built path string
with open(os.path.join(path, "data.pickle"), "rb") as f:
    words, labels, training, output = pickle.load(f)

In [3]:
tensorflow.compat.v1.reset_default_graph()
    
# Creating the Neural Network
net = tflearn.input_data(shape=[None, len(training[0])])  # input layer: one neuron per word in the vocabulary
net = tflearn.fully_connected(net, 8)  # hidden layer: fully connected, 8 neurons
net = tflearn.fully_connected(net, len(output[0]), activation="softmax")  # output layer: one neuron per label
net = tflearn.regression(net)

model = tflearn.DNN(net)

model_path = os.path.join(path, "model.tflearn")
try:
    model.load(model_path)
except Exception:
    # Loading failed (e.g. no saved model yet): train from scratch and save
    model.fit(training, output, n_epoch=1000, batch_size=8, show_metric=True)
    model.save(model_path)

Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
INFO:tensorflow:Restoring parameters from C:\Users\Sebi\OneDrive\Studium\Thesis_Chatbot\2. Intent Classififcation\model.tflearn


In [4]:
model

<tflearn.models.dnn.DNN at 0x1090bcb0a90>

In [5]:
# Convert the input into numbers so that the neural network can read it
def bag_of_words(s, words):
    bag = [0 for _ in range(len(words))]
    
    s_words = nltk.word_tokenize(s)
    s_words = [stemmer.stem(word.lower()) for word in s_words]
    
    for se in s_words:
        for i, w in enumerate(words):
            if w == se:
                bag[i] = 1
    
    return numpy.array(bag)
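
The encoding performed by `bag_of_words` can be illustrated with a minimal, self-contained sketch. It is a simplification and not the notebook's function: it assumes whitespace tokenization instead of `nltk.word_tokenize`, skips stemming, and uses a made-up vocabulary (`vocabulary` and `simple_bag_of_words` are hypothetical names).

```python
def simple_bag_of_words(sentence, vocabulary):
    """Return a 0/1 vector marking which vocabulary entries occur in the sentence."""
    # Simplified tokenization: lowercase and split on whitespace (no stemming).
    tokens = set(sentence.lower().split())
    return [1 if word in tokens else 0 for word in vocabulary]


# Hypothetical vocabulary; the real one is loaded from data.pickle.
vocabulary = ["email", "hello", "office", "phone"]
vector = simple_bag_of_words("Hello what is the email", vocabulary)
print(vector)  # [1, 1, 0, 0]
```

The vector always has the same length as the vocabulary, which is why the input layer of the network is sized with `len(training[0])`.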

#### Slot filling

To extract the various pieces of additional information, the names that are already known have to be read from the database and stored in lists, among other things. A function compares the user's input with these names to check whether the professor mentioned by the user is known. In addition, the previously trained custom NER models are loaded and the slot-filling function is defined.

In [6]:
conn = sqlite3.connect("PROF_INFO_DB.db")
cur = conn.cursor()
cur.execute("SELECT first_name FROM PROF_INFO_TABLE")  # read-only query, no commit needed
rows = cur.fetchall()
conn.close()

#rows

In [7]:
# list of all first names
first_names = []
for row in rows:
    # strip() returns a new string, so its result has to be assigned
    answer = " ".join(row).strip()
    first_names.append(answer)

#first_names

In [8]:
conn = sqlite3.connect("PROF_INFO_DB.db")
cur = conn.cursor()
cur.execute("SELECT last_name FROM PROF_INFO_TABLE")  # read-only query, no commit needed
rows = cur.fetchall()
conn.close()

#rows

In [9]:
# list of all last names
last_names = []
for row in rows:
    # strip() returns a new string, so its result has to be assigned
    answer = " ".join(row).strip()
    last_names.append(answer)

#last_names
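
The two lookups above could also be combined into a single query. The following is a minimal sketch under stated assumptions: it uses an in-memory SQLite database with made-up rows as a stand-in for `PROF_INFO_DB.db`, and the `demo_` names are hypothetical.

```python
import sqlite3

# In-memory stand-in for PROF_INFO_DB.db; the rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PROF_INFO_TABLE (first_name TEXT, last_name TEXT)")
conn.executemany(
    "INSERT INTO PROF_INFO_TABLE VALUES (?, ?)",
    [(" Ada ", "Lovelace"), ("Alan", " Turing")],
)

# One SELECT returns both columns; each row is a (first_name, last_name) tuple.
demo_rows = conn.execute(
    "SELECT first_name, last_name FROM PROF_INFO_TABLE"
).fetchall()
conn.close()

# str.strip() returns a new string, so the stripped value is what gets stored.
demo_first_names = [first.strip() for first, _ in demo_rows]
demo_last_names = [last.strip() for _, last in demo_rows]
print(demo_first_names, demo_last_names)
```

Fetching both columns at once keeps first and last names aligned by row, which the two separate lists do not guarantee on their own.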

In [10]:
# load the custom NER models
nlp_research_area = spacy.load(r"C:\Users\Sebi\OneDrive\Studium\Thesis_Chatbot\3. Slot Filling/custom_ner_model_research_area")
nlp_study = spacy.load(r"C:\Users\Sebi\OneDrive\Studium\Thesis_Chatbot\3. Slot Filling/custom_ner_model_study")

In [14]:
def slot_filling(inp, intent):
    # lists of intents that need slot information
    intent_telephone = ["prof_name_query_telephone","extract_new_telephone" ]
    intent_email=["prof_name_query_email","extract_new_email" ]
    intent_office=["prof_name_query_office", "extract_new_office"]
    intent_research_area=["prof_name_query_research_area", "extract_new_research_area"]
    intent_study=["prof_name_query_study", "extract_new_study"]
    intent_last_name=["prof_name_query_lastname"]
    intent_first_name=["prof_name_query_firstname"]
    intent_prof_contact=["prof_telephone_query_name", "prof_email_query_name", "prof_office_query_name", "prof_research_area_query_name", "prof_study_query_name"]
    intent_generic_conversation=["greeting","greeting_response","courtesy_greeting","courtesy_greeting_response","real_name_query", "goodbye","task_response"]
    # default, so the function also returns a value when nothing is extracted
    extracted_info = None
    # get the phone number
    if intent in intent_telephone:
        re_number_1 = r"[\d]{2}? [\d]{4} [\d]{3} [\d]{4}"
        re_number_2 = r"[\d]{2} [\(][\d]{1}[\)] [\d]{4} [\d]{3} [\d]{4}"
        extracted_info = re.compile("(%s|%s)" % (re_number_1, re_number_2)).findall(inp)
        
    elif intent in intent_email:
        extracted_info = re.findall(r'\S+@\S+', inp)
        
    elif intent in intent_office:
        extracted_info = re.findall(r'[A-Z].\d{1}.\d{2}', inp)
        
    elif intent in intent_research_area:
        doc = nlp_research_area(inp)
        for ent in doc.ents:
            if ent.label_ == "RESEARCH_AREA":
                extracted_info = ent.text    
    
    elif intent in intent_study:
        doc = nlp_study(inp)
        for ent in doc.ents:
            if ent.label_ == "STUDY":
                extracted_info = ent.text  
                
    elif intent in intent_last_name:
        for last_name in last_names:
            for word in inp.split():
                if SequenceMatcher(None, last_name, word).ratio() >= 0.7:
                    extracted_info = last_name
    
    elif intent in intent_first_name:
        for first_name in first_names:
            for word in inp.split():
                if SequenceMatcher(None, first_name, word).ratio() >= 0.7:
                    extracted_info = first_name
        
    elif intent in intent_prof_contact:
        for last_name in last_names:
            for word in inp.split():
                if SequenceMatcher(None, last_name, word).ratio() >= 0.7:
                    extracted_info = last_name
                    
    elif intent in intent_generic_conversation:
        extracted_info = "generic intent no need for info extraction"
    
    return extracted_info
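
The name lookup in `slot_filling` relies on `SequenceMatcher` with a similarity threshold of 0.7. The sketch below shows that idea in isolation. It is a variation and not the function above: it compares case-insensitively, strips punctuation, and returns the best match instead of the last one. `best_name_match` and the list `known` are hypothetical.

```python
from difflib import SequenceMatcher


def best_name_match(text, known_names, threshold=0.7):
    """Return the known name most similar to any word in text, or None."""
    best_name, best_ratio = None, threshold
    for word in text.split():
        # Strip surrounding punctuation and compare case-insensitively.
        cleaned = word.strip(".,!?").lower()
        for name in known_names:
            ratio = SequenceMatcher(None, name.lower(), cleaned).ratio()
            if ratio >= best_ratio:
                best_name, best_ratio = name, ratio
    return best_name


known = ["Lanquillon", "Schmidt"]  # hypothetical list of last names
print(best_name_match("What is the email of mr. lanquilon?", known))
print(best_name_match("How are you?", known))
```

Because the comparison is fuzzy, a misspelled name such as "lanquilon" can still be matched, while sentences without any similar word yield `None`.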

## Chat

In [12]:
def chat():
    log = []
    print("Start talking with me!(type quit to stop):")
    while True:
        inp = input("You: ")
        log.append(inp) #saves all input of user in list
        if inp.lower() == "quit":
            print("Goodbye :)")
            break
        
        # predict the intent
        results = model.predict([bag_of_words(inp, words)])[0] #output is just a probability for each label
        results_index = numpy.argmax(results) #index of greatest value
        intent = labels[results_index] #output is the most probable label
        print(intent)
        extracted_info = slot_filling(inp, intent)
        
        print(extracted_info)
        


    return log # returns list of the inputs
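
The step from the model's probability vector to an intent label can be sketched without the trained network. The labels and probabilities below are made up, and `pick_intent` is a hypothetical helper that mirrors the `numpy.argmax` lookup in `chat()` using plain Python.

```python
def pick_intent(probabilities, labels):
    """Return the most probable label together with its probability."""
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return labels[best_index], probabilities[best_index]


# Hypothetical labels and softmax output, for illustration only.
demo_labels = ["goodbye", "greeting", "prof_email_query_name"]
demo_probabilities = [0.05, 0.15, 0.80]
print(pick_intent(demo_probabilities, demo_labels))  # ('prof_email_query_name', 0.8)
```

Returning the probability alongside the label would make it possible to add a fallback reply for low-confidence predictions, which the prototype does not do.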

In [13]:
chat()

Start talking with me!(type quit to stop):
You: Hello
greeting
test
You: How are you?
courtesy_greeting
test
You: What is the email of mr. lanquillon
prof_email_query_name
Lanquillon


KeyboardInterrupt: Interrupted by user