## Class: AIT 526                                             September 15, 2021
### Professor:  Dr. Duoduo (Lindi) Liao
### Team 5: 
- Anh "Tim" Hien Bach
- Robert "Robb" Jay Dunlap
- Vishnu Lasya Marthala
- David Earl Swanson

#### Programming Assignment – Chatbot Eliza
Write an Eliza program in Python. No chat packages/functions/libraries are allowed to use. The program should be called eliza.py, and it should run from the command line with no arguments. NOTE: if you use **jupyter notebook**, you can have all comments and required running outputs/logs in the notebook file(s). And then save into HTML(s) and zip all notebook file(s), HTML files, and other files into ONE zip file. Please submit only one zip file.

Your program should engage in a dialogue with the user, with your program Eliza playing the role of a psychotherapist. Your program should be able carry out "word spotting", that is it should recognize certain key words and respond simply based on that word being present in the input. It should also be able to transform certain simple sentence forms from statements (from the user) into questions (that Eliza will ask). Also, try to personalize the dialogue by asking and using the user's name. 

In addition, your program should be robust. If the user inputs gibberish or a very complicated question, Eliza should respond in some plausible way (I didn't quite understand, can you say that another way, etc.). “Word spotting”, sentence transformation, and robustness are the minimum requirements for your code. You can implement additional functionalities, inspired by the dialogues presented in Weizenbaum paper. You may receive up to 1 bonus point max for any additional functionalities. 

This program should rely heavily on the use of regular expressions, so please make sure to review some introductory material in Learning Python, Programming Python, or some other source before attempting this program.

 

#### Tasks
- 

### Be sure to comment your code. In particular, explain what words you are spotting for (and why) and what statement forms you are converting into questions (and why). Also make sure your name, class, etc. is clearly included in the comments.

Due: September 21, 2021

### ELIZA

***

### Load libraries 

In [1]:
# Load libraries
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import ast
import nltk 
import re
from nltk.corpus import wordnet 
from nltk.tokenize import RegexpTokenizer
from nltk.tokenize import word_tokenize,sent_tokenize
import spacy
import random
import sys
import os

from nltk import pos_tag
from nltk import RegexpParser, RegexpChunkParser
from nltk.tokenize import word_tokenize
from string import punctuation
#from nltk.stem import WordNetLemmatizer
#import contractions

# Using the TreebankTagger instead of the Perceptron model in NLTK (TreebankTagger is muuch better)
# I found an explanation and the below line of code in this SO post:
# https://stackoverflow.com/questions/30821188/python-nltk-pos-tag-not-returning-the-correct-part-of-speech-tag
treebankTagger = nltk.data.load('taggers/maxent_treebank_pos_tagger/english.pickle')

### We decided to "load" or read in several dictionaries that support the algorithm in each function rather than hard coding them into the notebook.
####    This section loads those dictionaries
Code written by:  The full team

In [2]:
# Read in a dictionary of contractions
file = open("contraction_dictionary.txt", "r")
contents = file.read()
contractions = ast.literal_eval(contents)
file.close

# Read in a dictionary of the emergency words
file = open("emergency_words.txt", "r")
content = file.read()
emergency_words = content.split(",")
file.close

# Read in the emergency response
file = open('emergecy_response.txt', 'r', encoding = 'utf-8')
emergency_response = file.read()
file.close

# Open the nonspecific responses that will be used when Eliza doesn't understand the patient's statement.
file = open("gibberish_responses.txt", "r")
nonspecific_response = file.readlines()
file.close

# Open the list of words to be spotted and find synonyms of them to creaste a dictionary of spotting words
file = open("spotted_words.txt", "r")
list_key = file.read()
file.close

dic_spotwords={}
ListKey = list_key.split()
for word in ListKey:
    dup_synonyms=[]
    for synonym in wordnet.synsets(word):
        for lemma in synonym.lemmas():
            dup_synonyms.append(lemma.name())

    unique_synonyms=[]
    for synonym in dup_synonyms:
        if synonym not in unique_synonyms:
            unique_synonyms.append(synonym)
    dic_spotwords[word]=unique_synonyms

# I don't think we use these regular expressions
reg_exp_keywords={
    r'.*[i|I] am (sad)':["Why do you think you are {0}?"],
    r'.*[i|I] am (dull)':["Why do you think you are {0}?"] 
    # r'.*I am (depressed|sad).*':["Why do you think you are" r' \1' "?" ] ,
    # r'.*I am (depressed|sad).*':["Why do you think you are (0) ?" ] 
    }

#These need to be greatly ezxpanded and turnedinto a file for reading
standardResponses={
       'hello':'Cool! Go ahead.',
       'help':'I am here to help you with your emotions and thought process.',
       'happy':'Awesome! What makes you happy?',
       'gibberish':"I'm not sure I'm following you, please explain."
    }

# This section loads the necessary information for the "alternate_phrase" function below. The csv contains:
# (1) RE patterns for creating specific chunks from using POS labels
# (2) Identifying the portion of the user input that will be "turned" back to them in the response
# (3) The framework response phrases that the portion identifiied in (2) will be married to

input_terms_df = pd.read_csv('input_chunk_terms.csv', 
                             dtype={'capture_expression':'str', 
                                     'insertion':'bool', 
                                     'response_template':'str'}
                            )

### Functions used by the main program

In [3]:
# Code written by Vishnu Lasya Marthala and edited by David Swanson 
    
def standard_phrase(text):
    for token in text:    
        for key, value in dic_spotwords.items(): 
            if token in value:
                for key_r,value_r in standardResponses.items():
                    if(key == key_r):
                        return(value_r)
    return("")

In [4]:
# Code by Robb Dunlap
# A function for counting the occurences of specific POS in a sentence - 
#his function takes in a target sentence and the a regular expression phrase, it then
# counts the occurence of the phrase in the 
def occurence_in_sent(sentence, regex):
    regex_of_word = re.compile(regex)
    occ_counter = 0
    for element in sentence:
        if re.match(regex_of_word, element[1]):
            occ_counter += 1
    return occ_counter

In [5]:
# Alternative method for parsing user input and generating a response - code written by Robb Dunlap

def alternate_phrase(user_sentence):
    
    # declaring empty strings that will hold the individual words from the sentence and the reconstructed 
    # sentence after the contractions are expanded 
    new_sentence = ""
    new_word = ""
    for word in user_sentence.split():
        for key, root in contractions.items():
            if key == word:
                word = root
        new_word = word+" "
        new_word += " "
        # Had to put the words back together as a string instead of as a list because the two separate base words on the contraction
        # end up being one entry in a list. So instead, reassemble the sentence as a string and split it again to be sure to get
        # individual words
        new_sentence += new_word

    # need to strip off extra space added to the end of the sentence
    new_sentence = new_sentence[:-2]

    # create a list of the sentence words in order
    split_sent = new_sentence.split()
    sent_words_wo_punct = [w.strip(punctuation) for w in split_sent]


    tokens_tag = treebankTagger.tag(sent_words_wo_punct)

    # using the occurence_in_sent function to count the number of nouns or verbs
    # if the count of either is greater than 4 then the sentence is too complicated
    # for turning the user's input. Instead, this module will return an empyt string
    # so the main loop can call the "gibberish/too complicated response"

    noun_pos_regex = "NN.?"
    count_of_nouns = occurence_in_sent(tokens_tag, noun_pos_regex)

    verb_pos_regex = "VB.?"
    count_of_verbs = occurence_in_sent(tokens_tag, verb_pos_regex)

    if count_of_nouns > 4 or count_of_verbs > 4:
        return elize_response

    else:
        index_in_df = -1
        # for statement in chunk_list:
        for statement in input_terms_df['capture_expression']:
            pattern_found = 0
            index_in_df+=1
            phrase_pattern = statement
            chunker = RegexpParser(phrase_pattern)
            if index_in_df<2:
                output = chunker.parse(tokens_tag)
            else:
                output = chunker.parse(output)

            key_word = []
            for item in output:
                if isinstance(item, nltk.tree.Tree):
                    pattern_found = 1
                    regex_desired_word = re.compile(input_terms_df.iloc[index_in_df,3])
                    for thing in item:
                        if re.match(regex_desired_word, thing[1]):     
                            key_word = thing[0]
            if pattern_found == 1:
                break

        # the below section captures the desired portion of the sentence so it can be flipped into the response

        position_in_sent_word = 0
        counter = 0
        for element in sent_words_wo_punct:
            if key_word == element:
                counter = position_in_sent_word
            position_in_sent_word += 1

        offset_in_sentence = counter + input_terms_df.iloc[index_in_df,4]
        turned_portion_of_sent = sent_words_wo_punct[offset_in_sentence:]

        # This section appends or inserts the flipped portion of the input text to the appropriate template 
        if pattern_found == 1:
            if not input_terms_df.iloc[index_in_df,1]:
                offset_in_sentence = counter + input_terms_df.iloc[index_in_df,4]
                turned_portion_of_sent = sent_words_wo_punct[offset_in_sentence:]
                if not turned_portion_of_sent:
                    return elize_response
                else:
                    temp_holder = ""
                    separator = " "
                    temp_holder = separator.join(turned_portion_of_sent)
                    question_mark = "?"
                    temp_holder.join(question_mark)
                    eliza_response = input_terms_df.iloc[index_in_df,2]+" "+temp_holder+"?"
            else:
                offset_in_sentence = counter + input_terms_df.iloc[index_in_df,4]
                turned_portion_of_sent = sent_words_wo_punct[offset_in_sentence:]
                if not turned_portion_of_sent:
                    return elize_response
                else:
                    temp_holder = ""
                    separator = " "
                    temp_holder = separator.join(turned_portion_of_sent)
                    turned_portion_of_sent = str(temp_holder)
                    eliza_response = input_terms_df.iloc[index_in_df,2].replace("XXYYMM",turned_portion_of_sent)
            return eliza_response    

        else:
            elize_response = ""
            return elize_response

### Eliza welcomes the patient, sets the tone for the conversation, and establishes an end of session codeword "quit."
#### We decided that Eliza could asked for the patient's name which is used to tag the patient's inputs lines as well as a salutation.
Code written by David Swanson and Vishnu Lasya Marthala

**To use Spacy en_core_web_sm** I had to load add the Spacy module to my Conda environment and then I also had to use the following code in a terminal in my environment to add the library: &emsp; `conda install -c conda-forge spacy-model-en_core_web_sm`


#### The following code is a loop that
1. Tokenizes the sentances in the response
2. If the patient talks too long (more than two sentences) Eliza asks in several ways for the patient to slow down 
3. Tokenizes the words of each sentence to be sent to four functions in sequence.
    1. Has the patient made a statement that is dangerous to themselves of the theoropist (aka emergency words)? 
                     - If so, instruct the patient to seek help and end the session 
    2. Can Eliza respond using spotted words?
    3. Can a sophisticated regular expression algorithm make a response?
    4. If all these fail, Eliza assumes the answer was gibberish and randomly selects a response for the patient to talk more.
    
Code written by: David Swanson

In [None]:
# print("Eliza: Hello! My name is Eliza.  Welcome to my practice.")
print("Eliza: I don't want you to think of these sessions as therapy but rather as an opportunity for self-reflection and growth.")
print('Eliza: As this is a safe place, you can stop our session at anytime by typing "Quit."')
print("")

# Get the patient's name
print("Eliza: What's your name?")
print("")
user_input = input("Patient: ")

nlp = spacy.load("en_core_web_sm")

doc = nlp(user_input)  
usernames=[e for e in doc.ents if e.label_ == 'PERSON']

if doc.ents==():
    patientName = "Patient: "
    print("Eliza: That's ok, we don't need to use names. So how are you feeling today?")
else:
    patientName=usernames[0].text
    print('Eliza: Hello '+ patientName +'. '+"How are you feeling today?")

    
# Entering while loop for continuing discussion with the patient

loopAgain = True

# Top of the loop
while (loopAgain == True):
    
    # Get the patient's input 
    print("")
    user_input = input(patientName+": ")
    
# First check the patient's response to see if they want to end the session.
    if (user_input.lower() == "quit"):
        print("Eliza: I hope our session was helpful.  Goodbye",patientName)
        loopAgain = False
        break
        
# Start a counter to cycle through some random responses when the patients drones on
    lengthyReponse = 0
    
# Separate the response into individual sentences
    PatientResponse = nltk.sent_tokenize(user_input)
    # print(PatientResponse)
    
# Send the patient's response word by word to check for emergency words
# Code written by Anh "Tim" Hien Bach

    HaltFlag = 0
    for sentences in PatientResponse:
        PatientWords = nltk.word_tokenize(sentences)
        for words in PatientWords: 
            for EmWord in emergency_words:
                # If any emergency word is found, return with a standard cautionary response
                if re.search(EmWord, words):
                    print(patientName,": ", emergency_response)
                    loopAgain = False
                    break
            if loopAgain == False:
                break
        if loopAgain == False:
            break

# If the response is more than 2 sentences
    if loopAgain == True and len(PatientResponse)>2:
        lengthyReponse = lengthyReponse + 1

# Cycle through two responses if the patient inputs 3 sentences 
        if(len(PatientResponse)==3): 
            if(lengthyReponse == 1):
                print("Eliza: Let's go slower and take things one at a time. What is on your mind?")
            if(lengthyReponse > 1):
                print("Eliza: Again, slow down. Try again.")
                lengthy_response = 0
                
# Cycle through three three responses if the patient inputs more than 3 sentences 
        else: 
            if(lengthyReponse == 1):
                print("Eliza: Whoa, that's a lot to unpack. Let's try again with shorter thoughts.")
            if(lengthyReponse == 2):
                print("Eliza: Again, slow down. Try again.")
            if(lengthyReponse > 2):
                print("Eliza: You have alot on your mind. Let's start again with your first thought.")
                lengthyReponse = 0

# If the patient gives one or two sentence response, address each            
    else:
        if loopAgain == False:
            break
        for sentence in PatientResponse:
            
            # We don't weant contractions in our evaluation of the patient's input so we expand them 
            newSentence = ""
            for word in sentence.split():
                for key, root in contractions.items():
                    if key == word:
                        word = root
                    elif key == word.lower():
                        word = root
            
            # Tokenize the updated sentence into words
            tokenizer = RegexpTokenizer (r'\w+')
            words_only = tokenizer.tokenize(newSentence)
            
            # Debugging Code
            # print ("The total number of words in sentence", i, "of the user's input is:", len(words_only)) 
            # print (words_only)
    
            # Set ElizaResponse to NULL
            ElizaResponse = ""
            
            # Send the patient's sentence to a word spotter routine 
            ElizaResponse = standard_phrase(words_only)
            if ElizaResponse != "":
                print("Eliza1: ", ElizaResponse)
                break
                
            # If the standard phrase routine fails to provide a response send to the alternate function 
            # We use the untokenized here because the function processes the phrase differently than the 
            # standard function
            ElizaResponse = alternate_phrase(PatientResponse[0])
            if ElizaResponse != "":
                print("Eliza2: ", ElizaResponse)
                break
                    
            # If the standard function and the alternate function aren't able to generate a response, then
            # a random pick from the gibberish/too complicated responses is called.
            print("Eliza3: ", random.choice(nonspecific_response))

Eliza: I don't want you to think of these sessions as therapy but rather as an opportunity for self-reflection and growth.
Eliza: As this is a safe place, you can stop our session at anytime by typing "Quit."

Eliza: What's your name?



Patient:  Rob


Eliza: Hello Rob. How are you feeling today?



Rob:  I'm feeling great


Eliza2:  What does "feeling great" mean to you?



Rob:  It means that I'm happy


Eliza2:  Why do you feel happy?



Rob:  Everything is working out great


Eliza3:  I see, please continue.




Rob:  Well, school is going swimmingly


Eliza2:  Is "going swimmingly" important to you



Rob:  Sure, I like when things are going well


Eliza2:  Is "going well" important to you



Rob:  Yes, that too is important to me


Eliza2:  Tell me more about your statement "too is important to me".



In [None]:
PatientResponse