# Week 5: Parsing text

This week you will be learning about methods for parsing text in Python, including [Python String methods](https://www.w3schools.com/python/python_ref_string.asp) and [Python Regular Expressions (regex)](https://www.w3schools.com/python/python_regex.asp).

Before you get started though, let's just make sure that this notebook is set up to run using the `nlp` conda environment that you created last week.

To set this notebook to the right environment, click the **Select kernel** button in the top right corner of this notebook, then select **Python Environments...** and then select the environment `nlp`.

To double check you have done this correctly, hit the run cell button (▶) on the cell below:

In [None]:
import os
print(os.environ['CONDA_DEFAULT_ENV'])

## Python String methods

Python [has many String methods](https://www.w3schools.com/python/python_ref_string.asp) that allow you to parse and manipulate strings. Let's look at some of them.

##### Case manipulation:

In [None]:
my_string = 'My name is Terence Broad.'
print(my_string)
print(my_string.lower())
print(my_string.upper())
print(my_string.swapcase())

##### Replace

In [None]:
my_string = 'I like apples.'
print(my_string)
print(my_string.replace('I', 'You'))

##### Split

In [None]:
my_string = 'take key and open door'
print(my_string)
print(my_string.split(' '))

##### Find

In [None]:
my_string = 'take key and open door'
print(my_string)
print(my_string.find('open'))
print(my_string.find('door'))
print(my_string.find('lock'))

## Regular Expressions

Regular Expressions (regex) allow us to do more complex pattern matching with strings.

[regex101.com](https://regex101.com/) is a handy website for creating and testing regex patterns. 

The [regex cheatsheet](https://cheatography.com/davechild/cheat-sheets/regular-expressions/) is a handy resource for checking what each symbol does in regex.

Regex is much older than Python, and exists in many programming languages (though sometimes with slight variations in how they implement instructions).

Regex is so commonly used in Python code that it has its own module, `re`, in the standard library.

Run the cell below to import it:

In [6]:
import re

This notebook won't go over all the possible things you can do with regex or how to construct them. For that, refer to the slides or to the [regex cheatsheet](https://cheatography.com/davechild/cheat-sheets/regular-expressions/).

If you want to practice writing regexes, you can do the activities on [regex golf](https://alf.nu/RegexGolf?world=regex&level=r00) or [regex crossword](https://regexcrossword.com/).

The Python regex library has many functions similar to the string methods, but with the pattern matching power of regex:

##### Search

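`re.search` returns a match object if the pattern is found anywhere in the string, and `None` if it is not, so the result can be used directly in an `if` statement: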

In [None]:
test_strings = ['1234', '123', 'hello', '4567']

for test_str in test_strings:
    match = re.search(r'\d{4}', test_str)
    if match:
        print('its a match!')
    else:
        print('not a match')

##### Match

You can use the `re.match` function to extract a match at the start of the string, including any groups defined by parentheses `()`:

In [None]:
test_string = '01 April 2024'

match = re.match(r'(\d\d) (\w*) (\d\d\d\d)', test_string)

print(match.group(0))
print(match.group(1))
print(match.group(2))
print(match.group(3))
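
Note that `re.match` only looks for the pattern at the very start of the string, whereas `re.search` scans the whole string. Here is a minimal sketch of the difference, using a made-up example string that is not part of the notebook's own data:

```python
import re

example = 'Posted on 01 April 2024'  # hypothetical string for illustration
date_pattern = r'(\d\d) (\w*) (\d\d\d\d)'

# re.match is anchored at the start of the string, so it fails here
anchored = re.match(date_pattern, example)
print(anchored)

# re.search scans the whole string and finds the date
found = re.search(date_pattern, example)
print(found.group(0))
print(found.group(2))
```

If your pattern might not appear at the very beginning of the input, `re.search` is usually the safer choice.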

##### Find all

Find all matches in a string:

In [None]:
test_string = '1234, 5678, hello, 9876'
matches = re.findall(r'\d{4}', test_string)

for match in matches:
    print(match)

##### Substitute 

Substitute a match with a new string:

In [None]:
test_string = '1234, 5678, hello, 9876'
new_string = re.sub(r'\d{4}', '****', test_string)

print(new_string)

##### Split

Split a string using a regex pattern:

In [None]:
test_string = 'dogs, cats, guinea pigs and hamsters'
animals = re.split(r'and|,', test_string)

print(animals)
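
The split above keeps the spaces around each item. One possible variation (an assumption, not the only way to do it) is to let the pattern absorb the surrounding whitespace and to treat 'and' as a separator only when it is a whole word:

```python
import re

test_string = 'dogs, cats, guinea pigs and hamsters'

# The original pattern leaves the surrounding spaces in each item
untidy_animals = re.split(r'and|,', test_string)
print(untidy_animals)

# Assumed variation: absorb whitespace around each separator and
# only split on 'and' when it stands alone as a word
tidy_animals = re.split(r'\s*(?:,|\band\b)\s*', test_string)
print(tidy_animals)
```

The `(?:...)` is a non-capturing group, which stops `re.split` from including the separators themselves in the output list.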

## Tasks in notebook

### Task 1: Process String

Write a function for pre-processing your string. Your function should:

- (A) Make the string lowercase
- (B) Remove all [non-ascii characters from the string](https://stackoverflow.com/a/35579848)
- (C) Return the modified string

In [None]:
def process_input(user_input):
    # Put your code here

Test your function below:

In [None]:
my_str = "🌟 Hey ThErE!! 🌈 Have YoU HeArD tHe LaTeSt NeWs?! 😱 It’S tOtAlLy AmAzInG! ✨😆 I jUsT gOt BaCk FrOm ThE BeAcH 🏖️ AnD ThE wEaThEr wAs PeRfEcT! ☀️ I eVeN sAw A dOlPhIn! 🐬 (I cAn’T eVeN! 😂)"

new_str = process_input(my_str)
print(new_str)

### Task 2: Replace pronouns 

One feature of ELIZA is that it swaps pronouns, so that statements made by the user in the first and second person are reversed when part of the statement is relayed back to the user as a question by the chatbot.

Here are the pronoun pairings, given as a list of tuples. The first element of each tuple is the regex used to find a string pattern and the second is the replacement string.

In [6]:
pronoun_pairings = [
    (r"am", "are"),
    (r"was", "were"),
    (r"i", "you"),
    (r"i'd", "you would"),
    (r"i've", "you have"),
    (r"i'll", "you will"),
    (r"my", "your"),
    (r"are", "am"),
    (r"you've", "I have"),
    (r"you'll", "I will"),
    (r"your", "my"),
    (r"yours", "mine"),
    (r"you", "me"),
    (r"me", "you")
]

In the function below, write a `for` loop that iterates over every tuple in `pronoun_pairings`.

Use regex to find and substitute the matching string. 

> Tip: You will need to find a way of breaking out of the for loop the first time that a swap has been performed, otherwise you may swap it twice and end up with the same string you started with.
>
> Try combining `re.match` and `re.sub`, or use `re.subn` to get a count of how many replacements have been made.

Once substituted, return the string.
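
As a rough illustration of `re.subn`, here is a sketch using a made-up sentence that is unrelated to the pronoun list. It returns both the new string and the number of replacements, which you can check before deciding whether to stop looping:

```python
import re

# Hypothetical example, unrelated to the pronoun list
text = 'the cat sat with another cat'

# re.subn returns a tuple: (new string, number of replacements made)
new_text, count = re.subn(r'cat', 'dog', text)
print(new_text)
print(count)

# When nothing matches, the string is unchanged and the count is 0
unchanged, no_hits = re.subn(r'bird', 'dog', text)
print(unchanged)
print(no_hits)
```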

In [None]:
def swap_pronoun(input_str):
    # Put your code here

Test your code here:

In [None]:
test_strings = ["was good", "am happy", "your help", "you've made","i'll try", "i've messed up"]

for test_str in test_strings:
    print(f"input str: '{test_str}', output str:'{swap_pronoun(test_str)}' ")

Now let's test it with some more difficult examples:

In [None]:
test_strings = ["i like pie", "youve helped", "ive given it", "jam and ham","yummy", "fare"]

for test_str in test_strings:
    print(f"input str: '{test_str}', output str:'{swap_pronoun(test_str)}' ")

Can you figure out which text fragments in `pronoun_pairings` have been wrongly substituted here?

Try to edit the regexes in `pronoun_pairings` to prevent these erroneous substitutions from being made.
> Tip: Use `\b` to set word boundaries before and after the text in the regexes, and use `?` after a character that is optional, i.e. where the string is valid whether that character appears 0 or 1 times in the text.
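
Here is a small sketch of what `\b` and `?` do, using a couple of illustrative patterns. These are one possible way of writing them, not necessarily the patterns you should end up with:

```python
import re

# Without word boundaries, 'am' also matches inside other words
unbounded = re.sub(r'am', 'are', 'jam and ham')
print(unbounded)

# \b only matches at the edge of a word, so 'jam' and 'ham' are left alone
bounded = re.sub(r'\bam\b', 'are', 'jam and ham')
print(bounded)

# '? makes the apostrophe optional, so both spellings are matched
optional_pattern = r"\bi'?ve\b"
with_apostrophe = re.sub(optional_pattern, 'you have', "i've given it")
without_apostrophe = re.sub(optional_pattern, 'you have', 'ive given it')
print(with_apostrophe)
print(without_apostrophe)
```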

### Task 3: Identify pattern and return response 

The first text pattern that your ELIZA chatbot is going to respond to is the statement 'I need {...}'. When the chatbot finds the string 'I need {...}' it responds with one of the three following statements:

- Why do you need {...}?
- Are you sure you need {...}?
- Would it really help you to get {...}?

Create a regex that can match a string with the pattern 'I need {...}', e.g. 'I need help', 'I need a holiday', 'I need a pet to keep me company' 

The regex will need to capture whatever text follows 'I need' as a separate group. This group will then be extracted and used to formulate a response that is posed to the user.
> Tip: You can create groups in your regex with parentheses `()`
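
As a sketch of how a capturing group works, here is an example with a different phrase from the one in the task. The pattern `name_regex` is hypothetical and only for illustration:

```python
import re

# Hypothetical pattern for a different phrase than the one in the task
name_regex = r'my name is (.*)'

match = re.match(name_regex, 'my name is eliza')
if match:
    print(match.group(0))  # the whole match
    print(match.group(1))  # only the text captured by the parentheses
```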

Try building and testing your regex in [regex101.com](https://regex101.com/) before copying and pasting it in the cell below. 
> Tip: The [regex cheatsheet](https://cheatography.com/davechild/cheat-sheets/regular-expressions/) is a handy resource for checking what each symbol does.

In [14]:
need_regex = r''

Here are the preset responses to the statement 'I need {x}' in code form:

In [15]:
need_responses = ['Why do you need {x}?', 'Are you sure you need {x}?', 'Would it really help you to get {x}?']

Now write a function that takes in a regex, a list of responses and an input string.

This function should:
- Use `re.match` to check whether the regex matches the input string
- Extract the string from the group which captures the text that follows 'I need'
- Use the function `swap_pronoun` to change the pronouns in the extracted string
- Pick one of the responses (for example at random)
- Use `re.sub` to substitute '{x}' in the response with the extracted string
- Return the string

If there is no match, return `None`.
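
The substitution step on its own can be sketched as follows. The responses and the captured text here are made up for illustration, and picking the response with `random.choice` is an assumption about how you might choose one:

```python
import random
import re

# Hypothetical responses and captured text, for illustration only
example_responses = ['Tell me more about {x}.', 'How long have you had {x}?']
captured_text = 'your dog'

# Pick one response at random
template = random.choice(example_responses)

# Braces have a special meaning in regex (as in \d{4}), so escape them
# to match the literal text '{x}'
reply = re.sub(r'\{x\}', captured_text, template)
print(reply)
```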

In [None]:
def match_and_respond(regex, responses, input_str):
    # Your code goes here

Now test your code:

In [None]:
test_strings = ["i need a dog", "i need a holiday", "i need my life back", "don't tell me what i need"]

for test_str in test_strings:
    new_str = match_and_respond(need_regex, need_responses, test_str)
    print(f"Input str: '{test_str}', Output str: '{new_str}'")

The first three should return a response from `need_responses` with the statement substituted; the last one should return `None`.

## ELIZA chatbot tasks

Complete the following tasks in the file [week-5b-eliza-chatbot.py](week-5b-eliza-chatbot.py).

### Task 4: Create farewell message

In `ELIZA` create a class member function `farewell` that overrides `farewell` in chatbot_base.py.

Your function should randomly print out or return one of at least three messages (feel free to add your own to this list): 'Goodbye', 'Have a nice day', 'I hope you had a productive therapy session, see you next time'.

Your function should also set the boolean value `self.conversation_is_active` from `True` to `False`.

### Task 5: Add functions and variables from tasks to the chatbot

Add the functions that you have created for tasks [1](#task-1-process-string), [2](#task-2-replace-pronouns) and [3](#task-3-identify-pattern-and-return-response) as member functions to the chatbot Eliza.

Don't forget to include `self` as the first parameter when defining the functions!

Also make sure to add the variable `pronoun_pairings` from [Task 2](#task-2-replace-pronouns) to your chatbot when the constructor is called.

### Task 6: Generate response 

Create a member function `generate_response` in the class `ELIZA` that overrides `generate_response` from ChatbotBase. This function should return a response as a string.

This function should have a series of `if-else` statements that do the following:
- Use `re.search` to look for the words 'hi', 'hello', 'hey' and respond with one of the following:
  - 'Hello',
  - 'Nice to meet you',
  - 'We have already introduced ourselves...'
- Use `re.search` to look for the words 'bye', 'goodbye', 'quit', 'exit', then call the function `self.farewell`.
- Use the function `match_and_respond` with the regex `need_regex` and `need_responses` to create a response statement. If a response is generated, return it; otherwise carry on to the default responses.
- In the final `else` statement, randomly return one of the default responses in `self.default_responses`
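
One way to look for any of several words with `re.search` is to combine them with `|` inside a group. The pattern below is an assumption about how you might write it. The word boundaries stop it from matching 'hi' inside a word like 'this':

```python
import re

# Assumed pattern: whole-word match for any of the greeting words
greeting_regex = r'\b(hi|hello|hey)\b'

for text in ['hey there', 'oh hello', 'this is fine']:
    if re.search(greeting_regex, text):
        print(f"'{text}' contains a greeting")
    else:
        print(f"'{text}' does not contain a greeting")
```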

### Task 7: Test your chatbot 

Now you can test your chatbot. Try putting in statements to test all of the conditions that are in the function `generate_response` from Task 6.

Is it behaving as expected? If not, try to figure out what is going wrong in your code. If you suspect that a regex is causing a bug, it is a good idea to test it separately in [regex101.com](https://regex101.com/).

## Bonus tasks

Here are some bonus tasks:

### Task A: Add responses for different inputs

Here is a selection of more patterns and responses, add some (or all) of these to your chatbot code. 
> Tip: You may want to put these in a data structure owned by the class `ELIZA` or in a separate config file that gets loaded into a data structure when the constructor is called.

'{x} computer {x}'
- Are you really talking about me?
- Does it seem strange to talk to a computer?
- How do computers make you feel?
- Do you feel threatened by computers?

'Is it {x}'
- Do you think it is {x}?
- Perhaps it's {x} -- what do you think?
- If it were {x}, what would you do?
- It could well be that {x}.

'It is {x}'
- You seem very certain.
- If I told you that it probably isn't {x}, what would you feel?

'Can you {x}?'
- What makes you think I can't {x}?
- If I could {x}, then what?
- Why do you ask if I can {x}?

'Can I {x}?'
- Perhaps you don't want to {x}.
- Do you want to be able to {x}?
- If you could {x}, would you?

'You are {x}'
- Why do you think I am {x}?
- Does it please you to think that I'm {x}?
- Perhaps you would like me to be {x}.
- Perhaps you're really talking about yourself?

'You're {x}'
- Why do you say I am {x}?
- Why do you think I am {x}?
- Are we talking about you, or me?

'I don't {x}'
- Don't you really {x}?
- Why don't you {x}?
- Do you want to {x}?

'I feel {x}'
- Good, tell me more about these feelings.
- Do you often feel {x}?
- When do you usually feel {x}?
- When you feel {x}, what do you do?

'I have {x}'
- Why do you tell me that you've {x}?
- Have you really {x}?
- Now that you have {x}, what will you do next?

'I would {x}'
- Could you explain why you would {x}?
- Why would you {x}?
- Who else knows that you would {x}?

'Is there {x}'
- Do you think there is {x}?
- It's likely that there is {x}.
- Would you like there to be {x}?

'My {x}'
- I see, your {x}.
- Why do you say that your {x}?
- When your {x}, how do you feel?

'You {x}'
- We should be discussing you, not me.
- Why do you say that about me?
- Why do you care whether I {x}?

'Why {x}'
- Why don't you tell me the reason why {x}?
- Why do you think {x}?

'I want {x}'
- What would it mean to you if you got {x}?
- Why do you want {x}?
- What would you do if you got {x}?
- If you got {x}, then what would you do?

'{x} mother {x}'
- Tell me more about your mother.
- What was your relationship with your mother like?
- How do you feel about your mother?
- How does this relate to your feelings today?
- Good family relations are important.

'{x} father {x}'
- Tell me more about your father.
- How did your father make you feel?
- How do you feel about your father?
- Does your relationship with your father relate to your feelings today?
- Do you have trouble showing affection with your family?

'{x} child {x}'
- Did you have close friends as a child?
- What is your favorite childhood memory?
- Do you remember any dreams or nightmares from childhood?
- Did the other children sometimes tease you?
- How do you think your childhood experiences relate to your feelings today?

'{x}?'
- Why do you ask that?
- Please consider whether you can answer your own question.
- Perhaps the answer lies within yourself?
- Why don't you tell me?

'Yes'
- You seem quite sure.
- OK, but can you elaborate a bit?
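
As a sketch of the data structure suggested in the tip above, the patterns and responses could be stored as a list of `(regex, responses)` pairs that are checked in order. The regexes and the helper `find_responses` are illustrative guesses, not something the task prescribes:

```python
import re

# Assumed layout: a list of (regex, responses) pairs, checked in order.
# The regexes here are illustrative guesses, not prescribed by the task.
pattern_responses = [
    (r'i feel (.*)', ['Do you often feel {x}?', 'When do you usually feel {x}?']),
    (r'i want (.*)', ['Why do you want {x}?', 'What would you do if you got {x}?']),
]

def find_responses(input_str):
    """Return the captured text and responses for the first matching pattern."""
    for regex, responses in pattern_responses:
        match = re.match(regex, input_str)
        if match:
            return match.group(1), responses
    return None, []

captured, responses = find_responses('i feel tired')
print(captured, responses)
```

Because the list is checked in order, more specific patterns would need to come before more general ones such as '{x}?'.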

### Task B: Error handling 

In the function `process_input` do some checks on the string:
- If it exceeds 500 characters in length, raise a `ValueError` with the message 'input string is too long, keep responses within 500 characters'
- If the variable `user_input` is not a string, raise a `ValueError` with a message that says so (for example 'input must be a string')
>Tip: use the function `isinstance()` to check a variable's type.
- Override the function `respond` from `ChatbotBase` and call the function `process_input` in a [Try-Except block](https://www.w3schools.com/python/python_try_except.asp), [catch the error](https://stackoverflow.com/a/4690655) and return the error message as a string as a response.
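
The checks and the Try-Except pattern can be sketched outside of the chatbot class like this. The helper names `check_input` and `safe_check`, and the wording of the 'not a string' message, are assumptions for illustration:

```python
MAX_LENGTH = 500  # limit taken from the task description

def check_input(user_input):
    """Hypothetical helper: raise ValueError if the input is unusable."""
    if not isinstance(user_input, str):
        # Assumed wording for the message
        raise ValueError('input must be a string')
    if len(user_input) > MAX_LENGTH:
        raise ValueError('input string is too long, keep responses within 500 characters')
    return user_input

def safe_check(user_input):
    """Return the checked input, or the error message if a check fails."""
    try:
        return check_input(user_input)
    except ValueError as error:
        return str(error)

print(safe_check('hello'))
print(safe_check('a' * 501))
print(safe_check(42))
```

The type check comes first because calling `len()` on something that is not a string may fail or give a misleading result.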

### Task C: Log chat history

1. Log the chat history between the user and the chatbot. Any time the chatbot outputs a message in `respond`, `greeting` or `farewell`, append this string to a list called `self.chat_log`. Make sure to prefix your string with 'ELIZA: ' before logging it.
2. In the function `respond` or `receive_input` (you will have to override these from ChatbotBase in ELIZA) or `process_input`, append the response from the user to the list `self.chat_log`. Make sure to prefix your string with 'User: ' before logging it.
3. Create a [destructor](https://www.geeksforgeeks.org/destructors-in-python/) for your `ELIZA` class with the function `__del__`. In this function, save the chat log to a text file, where each string in the list `self.chat_log` is written on a new line.
> Tip: Check [this stackoverflow thread](https://stackoverflow.com/questions/33686747/save-a-list-to-a-txt-file) for how to print a list to a text file.
4. Use the code example from [Week-4-Classes.ipynb](Week-4-Classes.ipynb) to name this file with the date and time of the session.
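
Writing the log to a timestamped file could look roughly like the sketch below. The helper `save_chat_log`, the file name format and the use of the system temp directory are all assumptions; the Week 4 example may name the file differently:

```python
import os
import tempfile
from datetime import datetime

def save_chat_log(chat_log, directory):
    """Hypothetical helper: write one log entry per line to a timestamped file."""
    # Assumed file name format, e.g. chat_log_2024-04-01_14-30-00.txt
    timestamp = datetime.now().strftime('%Y-%m-%d_%H-%M-%S')
    path = os.path.join(directory, f'chat_log_{timestamp}.txt')
    with open(path, 'w', encoding='utf-8') as log_file:
        for line in chat_log:
            log_file.write(line + '\n')
    return path

example_log = ['ELIZA: Hello', 'User: i need a holiday', 'ELIZA: Why do you need a holiday?']
log_path = save_chat_log(example_log, tempfile.gettempdir())
print(log_path)
```

In your chatbot, the body of this helper would go inside `__del__` and write `self.chat_log` instead of `example_log`.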