
# LELA32051 Computational Linguistics Week 7

This week we are going to take a look at dialogue systems - first at chatbots and then at task-based dialogue systems.

## Rule-based chatbot: Eliza

As described in this week's lecture, Eliza is a chatbot that simulates a Rogerian therapist using a set of rules written as regular expressions. At the heart of Eliza are the substitution function (re.sub in Python) and grouping. We'll start with a quick recap of what these are.

We need to import the Regular Expressions module in Python.

In [1]:
import re

This gives us access to the very useful function re.sub.

### re.sub()

This finds all occurrences of a given sequence and replaces each one with the sequence provided:

In [2]:
utt = 'walked'
re.sub('ed','ing',utt)

'walking'
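
To make the "all occurrences" behaviour concrete, here is a minimal sketch using a made-up string (it is not part of the seminar materials). It also shows the optional count argument of re.sub, which limits how many replacements are made.

```python
import re

word = 'banana'  # illustrative string, not from the seminar materials

# By default every non-overlapping match is replaced.
all_replaced = re.sub('a', 'o', word)

# The optional count argument caps the number of replacements.
first_only = re.sub('a', 'o', word, count=1)

print(all_replaced)  # bonono
print(first_only)    # bonana
```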

### Groups

Grouping is a powerful technique for picking out substrings of a string that matches a specified pattern. Parentheses mark the start and end of each group. Grouping is especially useful when combined with substitution.

You can use parentheses to capture a particular substring within a pattern and then refer to it in the replacement string passed to re.sub, where \\1 stands for the first group, \\2 for the second, and so on. For example:


In [3]:
utt = "procrastinating"
re.sub('([a-z]+)ing','\\1ed',utt)

'procrastinated'

In [11]:
utt = "procrastinating on my homework"
utt = re.sub('([a-z]+ing)(.+)','Why are you \\1\\2?',utt)
re.sub('my','your',utt)

'Why are you procrastinating on your homework?'
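
If it is unclear what each group has captured, one option is to inspect the match object returned by re.search before writing the substitution. The sketch below reuses the pattern from the cell above. It writes the pattern and replacement as raw strings (r'...'), which is an equivalent spelling and not something the seminar code requires.

```python
import re

utt = 'procrastinating on my homework'
match = re.search(r'([a-z]+ing)(.+)', utt)

# group(1) and group(2) are the substrings captured by the two sets of parentheses.
print(match.group(1))  # procrastinating
print(match.group(2))  # ' on my homework', including the leading space

# In a raw string the backslash does not need to be doubled.
same_replacement = (r'\1' == '\\1')
print(same_replacement)  # True

question = re.sub(r'([a-z]+ing)(.+)', r'Why are you \1\2?', utt)
print(question)  # Why are you procrastinating on my homework?
```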

## A very simple Elizabot

The code below implements a very simple Eliza. The function respond takes an utterance as input and uses re.sub to generate a response. The loop below the function creates a simple interface that takes user input and prints the response.

We can extend Eliza's ability by adding additional rules.

In [75]:
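def respond(utt):
  # Each substitution is applied to the output of the previous one.
  utt = re.sub('hello my name is (.+)','Hello \\1, my name is Eliza. How are you feeling today?', utt)
  # Group 2 captures the last word of the feeling, e.g. "happy" in "very happy".
  utt = re.sub('i am feeling (.+ )*(.+)', 'Do you often feel \\2?',utt)
  # Turn a past-tense verb ("changed") into its -ing form ("changing").
  utt = re.sub('yes since i (.+)ed(.+)', 'Can you tell me about \\1ing\\2?',utt)
  # Swap "my" for "your" unless "name" follows (negative lookahead),
  # so that Eliza's own "my name is Eliza" is left alone.
  utt = re.sub('my (?!name)', 'your ', utt)
  return utt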

In [76]:
utt = ""
while utt != 'goodbye':
    utt = input('> ')
    reply = respond(utt)
    if reply != utt:
        print(reply)
    else:
        if utt != "goodbye":
            print("Can you rephrase that?")

> yes since i changed my job
Can you tell me about changing your job?
> goodbye
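
Because respond applies every substitution in turn, a response produced by one rule can be rewritten by a later rule. One possible alternative, sketched here under assumptions, is to keep the rules in a table and return as soon as one matches. The names RULES and respond_first_match are hypothetical and not part of the seminar code. re.fullmatch and Match.expand are standard functions of the re module.

```python
import re

# Hypothetical rule table: (pattern, response template) pairs, tried in order.
RULES = [
    (r'hello my name is (.+)',
     r'Hello \1, my name is Eliza. How are you feeling today?'),
    (r'i am feeling (?:.+ )*(.+)',
     r'Do you often feel \1?'),
]

def respond_first_match(utt):
    """Return the response for the first rule that matches, or None."""
    for pattern, template in RULES:
        match = re.fullmatch(pattern, utt)
        if match:
            # expand() fills \1, \2, ... in the template from the match groups.
            return match.expand(template)
    return None

print(respond_first_match('hello my name is emma'))
print(respond_first_match('i am feeling very happy'))
print(respond_first_match('the weather is nice'))  # None: no rule matched
```

With this design a caller can print a fallback such as "Can you rephrase that?" whenever the function returns None, and adding a rule means adding one pair to the table.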


### Activity

Add patterns (using substitutions and grouping) to the respond function that will allow Eliza to conduct both of these conversations. Test your system by conducting the conversation with Eliza.

User: hello my name is emma <br>
Eliza: Hello emma, my name is Eliza. How are you feeling today? <br>
User: i am feeling very happy <br>
Eliza: Do you often feel happy? <br>
User: yes since I started my new job <br>
Eliza: Can you tell me about starting your new job? <br>

User: hello my name is john <br>
Eliza: Hello john, my name is Eliza. How are you feeling today? <br>
User: i am feeling pretty happy <br>
Eliza: Do you often feel happy? <br>
User: yes since I moved house <br>
Eliza: Can you tell me about moving house? <br>
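
Note that some user turns above are written with a capital "I", while the patterns in respond use a lowercase "i". One possible way to handle this, offered as a suggestion and not as the intended solution, is the flags=re.IGNORECASE argument accepted by re.search and re.sub. The pattern and utterance below are only for illustration.

```python
import re

pattern = 'yes since i (.+)'
utt = 'yes since I moved house'

# Without the flag, the lowercase "i" in the pattern does not match "I".
strict = re.search(pattern, utt)
print(strict)  # None

# With re.IGNORECASE, letters match regardless of case.
relaxed = re.search(pattern, utt, flags=re.IGNORECASE)
print(relaxed.group(1))  # moved house
```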

### Reverse engineering the NLTK Eliza

There have been many implementations of Eliza over the years. One version is built into the NLTK toolkit. This can be run as follows:


In [45]:
import nltk

In [77]:
nltk.chat.eliza.demo()

Therapist
---------
Talk to the program by typing in plain English, using normal upper-
and lower-case letters and punctuation.  Enter "quit" when done.
Hello.  How are you feeling today?
>I am feeling annoyed
How do you feel about being feeling annoyed?
>I am feeling annoyed
How long have you been feeling annoyed?
>For a few hours
Why do you say that for a few hours?
>For a few hours
I see.
>For a few hours
Can you elaborate on that?
>I am feeling sad
How do you feel about being feeling sad?
>I am feeling sad
How do you feel about being feeling sad?
>I am feeling sad
How do you feel about being feeling sad?
>I am feeling annoyed
Did you come to me because you are feeling annoyed?
>Yes
You seem quite sure.
>I am sure
How do you feel about being sure?



Activity: Conduct a four-turn-each conversation of your own with the NLTK Eliza. Add the substitutions that you think Eliza is using to generate its responses to your own chatbot, using the code below. Where you find Eliza's response lacking, update the substitution to give a better response.

In [89]:
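def respond(utt):
  utt = re.sub('hello my name is (.+)','Hello \\1, my name is Eliza. How are you feeling today?', utt)
  # [iI] matches either case; [^.?!]+ captures everything up to any final punctuation.
  utt = re.sub('[iI] am feeling ([^.?!]+)[.?!]*', 'How long have you been feeling \\1?', utt)
  # Echo back utterances containing "for ..." or "For ...".
  utt = re.sub('[Ff](or .+)','Why do you say that f\\1?',utt)
  # Fallback for other "I am ..." statements.
  utt = re.sub('I am (.+)', 'How do you feel about being \\1?',utt)
  return utt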

In [91]:
utt = ""
while utt != 'goodbye':
    utt = input('> ')
    reply = respond(utt)
    if reply != utt:
        print(reply)
    else:
        if utt != "goodbye":
            print("Can you elaborate on that?")

> hello my name is cassie
Hello cassie, my name is Eliza. How are you feeling today?
> I am feeling annoyed
How long have you been feeling annoyed?
> For a few hours
Why do you say that for a few hours?
> I don't know
Can you elaborate on that?
> goodbye
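
When writing character classes for rules like these, it helps to remember that a class in square brackets is a set of single characters, so no "|" is needed between the options. Inside the brackets "|" is an ordinary character. The short sketch below uses a made-up test string to show the difference.

```python
import re

sample = 'i|I'  # illustrative string containing a literal "|"

# Inside [...], "|" is just another character in the set.
with_bar = re.findall('[i|I]', sample)
print(with_bar)  # ['i', '|', 'I']

# Listing the characters is enough to mean "i or I".
without_bar = re.findall('[iI]', sample)
print(without_bar)  # ['i', 'I']

# Outside a class, "|" does mean alternation.
alternation = re.findall('i|I', sample)
print(alternation)  # ['i', 'I']
```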


## Corpus-based chatbots

Training and running a corpus-based chatbot takes more steps than we have time for today. If you want to have a go in your own time, you will find a tutorial for doing so in PyTorch here:

https://pytorch.org/tutorials/beginner/chatbot_tutorial.html

There is a link at the top of the page to open the notebook in Colab.

To run this in Colab, put the following code block at the top of the notebook and run it before working through the rest.

In [92]:
! wget https://zissou.infosci.cornell.edu/convokit/datasets/movie-corpus/movie-corpus.zip
! unzip movie-corpus.zip
! mkdir data
! mv movie-corpus data/

--2025-03-10 17:13:05--  https://zissou.infosci.cornell.edu/convokit/datasets/movie-corpus/movie-corpus.zip
Resolving zissou.infosci.cornell.edu (zissou.infosci.cornell.edu)... 128.253.51.179
Connecting to zissou.infosci.cornell.edu (zissou.infosci.cornell.edu)|128.253.51.179|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 40854701 (39M) [application/zip]
Saving to: ‘movie-corpus.zip’


2025-03-10 17:13:06 (115 MB/s) - ‘movie-corpus.zip’ saved [40854701/40854701]

Archive:  movie-corpus.zip
   creating: movie-corpus/
  inflating: movie-corpus/utterances.jsonl  
  inflating: movie-corpus/conversations.json  
  inflating: movie-corpus/corpus.json  
  inflating: movie-corpus/speakers.json  
  inflating: movie-corpus/index.json  


### Intent classification
We are now going to look at task-based dialogue systems, and specifically at the sub-task of intent classification (the focus of your coursework). You should write rules that uniquely and correctly identify the intent of each of the following utterances:

PlayMusic: play the weather girls

AddToPlaylist: add this to my italian film soundtrack playlist

RateBook: give the restaurant guidebook 5 stars

SearchScreeningEvent: find screenings of the book thief at around 7

BookRestaurant: book me a table outside for 2 for dinner at the national theatre restaurant

GetWeather: will it be warm enough to eat dinner outside at around 7 tonight

SearchCreativeWork: find me songs films or books about restaurants

Here is the function from your coursework notebook:

In [128]:
import random

def assign_intent(utt, verbose=False):
  PlayMusic_Pattern = re.compile("play|music|song|album")
  AddToPlaylist_Pattern = re.compile("add|playlist")
  RateBook_Pattern = re.compile("rate|book|novel|star|give")
  SearchScreeningEvent_Pattern = re.compile("screening|find")
  BookRestaurant_Pattern = re.compile("^book|restaurant|food|table|dinner|lunch|meal")
  GetWeather_Pattern = re.compile("get|weather (on|in)|warm|hot|cold|cool|outside")
  SearchCreativeWork_Pattern = re.compile("creative|find|about|songs?|films?|books?")

  weights = {}
  weights['PlayMusic'] = len(re.findall(PlayMusic_Pattern,  utt))
  weights['AddToPlaylist'] = len(re.findall(AddToPlaylist_Pattern,  utt))
  weights['RateBook'] = len(re.findall(RateBook_Pattern,  utt))
  weights['SearchScreeningEvent'] = len(re.findall(SearchScreeningEvent_Pattern,  utt))
  weights['BookRestaurant'] = len(re.findall(BookRestaurant_Pattern,  utt))
  weights['GetWeather'] = len(re.findall(GetWeather_Pattern,  utt))
  weights['SearchCreativeWork'] = len(re.findall(SearchCreativeWork_Pattern,  utt))
  if verbose:
      print(weights)
  if max(weights.values()) == 0:
      return random.choice(list(weights.keys()))
  else:
      weights_as_list = list(weights.items())
      random.shuffle(weights_as_list)
      weights=dict(weights_as_list)
      return max(weights, key=lambda key: weights[key])
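
The scoring idea in assign_intent can be seen in isolation with a cut-down sketch. It copies three of the patterns above into a dictionary and counts keyword matches for one utterance. The names KEYWORDS and score_intents are hypothetical. Unlike assign_intent, this sketch does not break ties randomly: max simply returns the first of the highest-scoring intents.

```python
import re

# Three of the patterns from assign_intent, stored in a dictionary (illustrative subset).
KEYWORDS = {
    'PlayMusic': 'play|music|song|album',
    'BookRestaurant': '^book|restaurant|food|table|dinner|lunch|meal',
    'GetWeather': 'get|weather (on|in)|warm|hot|cold|cool|outside',
}

def score_intents(utt):
    """Count keyword matches per intent; findall returns one item per match."""
    return {intent: len(re.findall(pattern, utt))
            for intent, pattern in KEYWORDS.items()}

utt = 'will it be warm enough to eat dinner outside at around 7 tonight'
scores = score_intents(utt)
print(scores)  # {'PlayMusic': 0, 'BookRestaurant': 1, 'GetWeather': 2}

# The predicted intent is the one with the highest count.
best = max(scores, key=scores.get)
print(best)  # GetWeather
```

Calling assign_intent with verbose=True prints the same kind of dictionary for all seven intents, which is useful for spotting ties between intents.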

In [126]:
example_inputs = [
    'play the weather girls',
    'add this to my italian film soundtrack playlist',
    'give the restaurant guidebook 5 stars',
    'find screenings of the book thief at around 7',
    'book me a table outside for 2 for dinner at the national theatre restaurant',
    'will it be warm enough to eat dinner outside at around 7 tonight',
    'find me songs films or books about restaurants',
]
# Print the predicted intent next to each utterance.
for utt in example_inputs:
    print(f"{assign_intent(utt)} : {utt}")

PlayMusic : play the weather girls
AddToPlaylist : add this to my italian film soundtrack playlist
RateBook : give the restaurant guidebook 5 stars
SearchScreeningEvent : find screenings of the book thief at around 7
BookRestaurant : book me a table outside for 2 for dinner at the national theatre restaurant
GetWeather : will it be warm enough to eat dinner outside at around 7 tonight
SearchCreativeWork : find me songs films or books about restaurants