<a href="https://colab.research.google.com/github/SeanGMONeill/nlpworkshop_instructor/blob/main/Lesson3_Checkpoint.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Example of how student code should look at the end of this lesson:

# This command will install chatterbot-corpus, a library which contains a corpus of conversations in YAML format
# You can view these raw files in the chatterbot-corpus GitHub repo: https://github.com/gunthercox/chatterbot-corpus/tree/master/chatterbot_corpus/data/english
!pip install chatterbot-corpus

import chatterbot_corpus
from yaml import load
import inspect
import os
import random



# Initialize the run variable to True
run = True

def tokenize(msg):
  msg = msg.lower()
  tokens = msg.split(' ')
  return tokens


def remove_punctuation(msg):
  symbols = ['?','-',',',':',';','!']
  for symbol in symbols:
    msg = msg.replace(symbol, '')
  return msg

elements = {
    'hydrogen': 1,
    'oxygen': 8,
    'carbon': 6,
    'plutonium': 94,
    'helium': 2,
    'lithium': 3
}


def normalize_text(msg):
  msg = msg.lower()
  symbols = ['?','-',',',':',';','!']
  for symbol in symbols:
    msg = msg.replace(symbol, '')
  return msg


def choose_response(msg):
  try:

    # Fetch the list of possible responses
    options = lookup[normalize_text(msg)]
    # Return a randomly selected item from the list (using the Python random library)
    return random.choice(options)

  # Handle the case where the input isn't in the dictionary
  except KeyError:
    return None


# Create a dict of msg->response from the files in the corpus
def load_conversations_from_corpus():
  # 1) Get the location of the corpus YAML files installed with the chatterbot corpus package
  data_path = os.path.join(os.path.dirname(inspect.getfile(chatterbot_corpus)), 'data/english')

  # 2) Build a list of conversations (each file holds a list of conversations on one topic)
  conversations = []
  for file in os.listdir(data_path):
    # A with-block closes each file as soon as it has been read
    with open(os.path.join(data_path, file), 'r') as f:
      convos = load(f)
    conversations = conversations + convos['conversations']

  # 3) Build a dictionary of all the msg->[response] pairs in every conversation
  lookup = {}
  for convo in conversations:
    lookup[normalize_text(convo.pop(0))] = convo # Note we're now normalizing the dictionary key. We're keeping the responses in their original case, with punctuation.
  return lookup

lookup = load_conversations_from_corpus()


# While run is still True, loop through the rest of the script
while run:
  # Wait for the user to input text, and store it in the msg variable
  msg = input().lower()
  msg = remove_punctuation(msg)
  tokens = tokenize(msg)
  # Try to get a response from the corpus - this might be None
  corpus_response = choose_response(msg)
  # Give a response, based on the input (if we recognise it)
  if msg == 'exit':
    print('Goodbye!')
    # Set run to False, so the loop won't run again
    # This means we won't be trapped in an infinite loop
    run = False
  elif corpus_response:
    print(corpus_response)
  elif msg == 'hello':
    print('Hi!')
  elif msg == 'how are you':
    print('I\'m pretty good, thanks!')
  elif 'rain' in tokens:
    print('I love rain!')
  elif 'atomic number' in msg:
    found_element = False
    for token in tokens:
      if token in elements:
        print('The atomic number for {element} is {number}'.format(element=token, number=elements[token]))
        found_element = True
    if not found_element:
      print('You asked about an atomic number, but I don\'t recognise an element name in your message')
  # If the input doesn't match any of our statements, print a generic answer
  else:
    print('Sorry, I don\'t understand')

hello
Greetings!
how are you?
I am doing well.
what's a computer?
Sorry, I don't understand
what is a computer?
The thing you're using to talk to me is a computer.
how are you?
I am doing well.
great!
Sorry, I don't understand
do you enjoy the rain?
I love rain!
goodbye
Sorry, I don't understand
exit
Goodbye!


This lesson is less prescriptive than lesson 2, so the code might differ somewhat and still be valid.

Here, they're introduced to a corpus of example conversations (chatterbot-corpus), and shown a way to build a very basic chatbot from it.

They're re-introduced to the concept of cleaning up the user input to improve the hit-rate (making the bot case-insensitive), and are shown how they could use Python's random library to create a non-deterministic chatbot which doesn't always serve the same response to a question (by picking at random from a bank of multiple replies).
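As a minimal sketch of the random-reply idea (not the lesson's own code), a made-up `replies` dict maps one normalized input to several replies, and a hypothetical `pick_reply` helper uses `random.choice` to pick one. Passing in a seeded `random.Random` instance is an assumption added only to make the output repeatable, which can be handy when checking a student's bot.

```python
import random

# Hypothetical bank of replies for one normalized input (not from the corpus)
replies = {
  'how are you': ["I am doing well.", "Pretty good, thanks!", "Can't complain."],
}

def pick_reply(msg, rng=random):
  """Return a random reply for msg, or None if msg isn't in the bank."""
  options = replies.get(msg)
  if not options:
    return None
  return rng.choice(options)

# A separately seeded generator makes the sequence of picks repeatable
rng = random.Random(42)
for _ in range(3):
  print(pick_reply('how are you', rng))
```

Without a seed, asking the same question several times will usually produce different replies from the bank, which is the non-deterministic behaviour the lesson is after.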



**TASK:**
* Modify your chatbot to give a randomized reply from this training data. 
* If the user's input isn't in the corpus, the bot should reply using your existing logic. 
* Ensure that your chemical element (atomic number) question still works.

There's only one task in this lesson - the students are given the freedom to implement a set of requirements in whatever way they can.

Below is an example of some steps they'll need to take for a sensible approach.

# Installation and Imports

Add imports (and install chatterbot-corpus) at the top of their notebook: 
```
# This command will install chatterbot-corpus, a library which contains a corpus of conversations in YAML format
# You can view these raw files in the chatterbot-corpus GitHub repo: https://github.com/gunthercox/chatterbot-corpus/tree/master/chatterbot_corpus/data/english
!pip install chatterbot-corpus

import chatterbot_corpus
from yaml import load
import inspect
import os
import random
```

This can be copy-pasted from the top two code cells of the lesson notebook. If students are hitting any *NameError*s, double-check that they have all the imports. Note that *import random* isn't in those top cells; it's introduced later, at the relevant point in the lesson.

If they hit any *ModuleNotFoundError*s, confirm that they have the *pip install* line, which fetches chatterbot-corpus before it's imported.
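One quick way to diagnose missing modules, assuming a standard Python environment, is to ask `importlib.util.find_spec` whether each module can be found. The `check_modules` helper below is hypothetical (it isn't part of the lesson); `find_spec` returns `None` for a top-level module that isn't installed, which is exactly the case that would otherwise raise a *ModuleNotFoundError* on import.

```python
import importlib.util

def check_modules(names):
  """Return the subset of top-level module names that cannot be found."""
  return [name for name in names if importlib.util.find_spec(name) is None]

missing = check_modules(['chatterbot_corpus', 'yaml', 'inspect', 'os', 'random'])
if missing:
  print('Missing modules (check the !pip install line):', missing)
else:
  print('All modules found')
```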

# Loading the corpus

Students will need to load the corpus, in order to access it. To avoid the tedium of writing this themselves, they're expected to copy-paste .... from the lesson.

This function needs to be defined *and* called (so that **lookup** exists) before the main while loop.

```python
def normalize_text(msg):
  msg = msg.lower()
  symbols = ['?','-',',',':',';','!']
  for symbol in symbols:
    msg = msg.replace(symbol, '')
  return msg

# Create a dict of msg->response from the files in the corpus
def load_conversations_from_corpus():
  # 1) Get the location of the corpus YAML files installed with the chatterbot corpus package
  data_path = os.path.join(os.path.dirname(inspect.getfile(chatterbot_corpus)), 'data/english')

  # 2) Build a list of conversations (each file holds a list of conversations on one topic)
  conversations = []
  for file in os.listdir(data_path):
    # A with-block closes each file as soon as it has been read
    with open(os.path.join(data_path, file), 'r') as f:
      convos = load(f)
    conversations = conversations + convos['conversations']

  # 3) Build a dictionary of all the msg->[response] pairs in every conversation
  lookup = {}
  for convo in conversations:
    lookup[normalize_text(convo.pop(0))] = convo # Note we're now normalizing the dictionary key. We're keeping the responses in their original case, with punctuation.
  return lookup

lookup = load_conversations_from_corpus()
```
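To show what step 3 produces without reading any files, the sketch below runs the same loop over two made-up conversations. The sample data is hypothetical; in the real function, each conversation is a list of lines read from the corpus YAML files. The first line of each conversation becomes the normalized key, and the remaining lines become the list of candidate responses.

```python
def normalize_text(msg):
  msg = msg.lower()
  symbols = ['?','-',',',':',';','!']
  for symbol in symbols:
    msg = msg.replace(symbol, '')
  return msg

# Hypothetical conversations in the same shape as the corpus: lists of lines
conversations = [
  ['How are you?', 'I am doing well.'],
  ['Hello', 'Hi', 'Greetings!'],
]

lookup = {}
for convo in conversations:
  # pop(0) removes the first line and uses it (normalized) as the key;
  # whatever is left in the list becomes the possible responses
  lookup[normalize_text(convo.pop(0))] = convo

print(lookup)
```

Because **pop(0)** modifies each conversation in place, the lists in **conversations** lose their first line. Also, if two conversations open with the same line, the later one overwrites the earlier one in **lookup**.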

If they run into any issues, they should check that they've copied the **load_conversations_from_corpus()** function from near the end of the lesson, not the **load_conversations_from_corpus_simple()** function (which doesn't clean up the text).

Note that this function depends on **normalize_text(msg)**, which they'll also need to copy from the lesson.
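A quick check of what **normalize_text(msg)** does to a few inputs helps explain both the hits and the misses in the example transcript. The inputs below are illustrative; the function body is the one from the lesson.

```python
def normalize_text(msg):
  msg = msg.lower()
  symbols = ['?','-',',',':',';','!']
  for symbol in symbols:
    msg = msg.replace(symbol, '')
  return msg

# Differently typed inputs collapse to the same dictionary key
print(normalize_text('How are you?'))   # how are you
print(normalize_text('HOW ARE YOU!'))   # how are you
# Apostrophes aren't in the symbol list and contractions aren't expanded,
# so this still won't match a key like 'what is a computer'
print(normalize_text("What's a computer?"))  # what's a computer
```

This is why *what's a computer?* got the fallback reply in the transcript, while *what is a computer?* found a match.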

# Choosing a Response

As the task involves returning a random response based on the user's input, the students can borrow from the **choose_response(msg)** function in the lesson.

They'll probably need to modify this slightly to tidily fit into their program. Below is one example of a modified version:

```python
def choose_response(msg):
  try:

    # Fetch the list of possible responses
    options = lookup[normalize_text(msg)]
    # Return a randomly selected item from the list (using the Python random library)
    return random.choice(options)

  # Handle the case where the input isn't in the dictionary
  except KeyError:
    return None
```

The only modification is the **return** in the KeyError case: it now returns Python's special value *None* when there's no matching response in the corpus.

The original version returned specific text instead, but *None* is easier and tidier to deal with within a program.
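To check the *None* behaviour in isolation, the sketch below pairs the modified **choose_response(msg)** with a tiny hypothetical **lookup** (in the lesson, the real one comes from **load_conversations_from_corpus()**). It also shows a hypothetical **choose_response_get(msg)** that uses `dict.get` instead of `try`/`except KeyError`; both return *None* on a miss, so which one students use is largely a matter of taste.

```python
import random

def normalize_text(msg):
  msg = msg.lower()
  for symbol in ['?', '-', ',', ':', ';', '!']:
    msg = msg.replace(symbol, '')
  return msg

# Hypothetical lookup standing in for load_conversations_from_corpus()
lookup = {'how are you': ['I am doing well.', 'Pretty good, thanks!']}

def choose_response(msg):
  try:
    options = lookup[normalize_text(msg)]
    return random.choice(options)
  except KeyError:
    # No matching key in the corpus: signal "no answer" with None
    return None

def choose_response_get(msg):
  # Hypothetical alternative using dict.get instead of try/except
  options = lookup.get(normalize_text(msg))
  if not options:  # also guards against an empty response list
    return None
  return random.choice(options)

print(choose_response('How are you?'))           # one of the two replies
print(choose_response('What is the time?'))      # None
print(choose_response_get('What is the time?'))  # None
```

The `if not options` check in the **dict.get** version also avoids the *IndexError* that `random.choice` raises when given an empty list.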

They could call **choose_response(msg)** and store the result in a variable before entering their if/elif/else cases, then add a case for it:


```python
  # Try to get a response from the corpus - this might be None
  corpus_response = choose_response(msg)
  # Give a response, based on the input (if we recognise it)
  if msg == 'exit':
    print('Goodbye!')
    # Set run to False, so the loop won't run again
    # This means we won't be trapped in an infinite loop
    run = False
  elif corpus_response:
    print(corpus_response)
  #elif ...(the rest of their existing cases)
```

*corpus_response* is truthy when it holds a response string and falsy when it's *None* (no response found in the corpus), so the *elif corpus_response* branch is only entered when there's a response to print.
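The effect of the branch order can be checked with a hypothetical **reply(msg, corpus_response)** helper that mirrors the if/elif chain above, with the student's existing cases (hello, rain, atomic number) left out for brevity.

```python
def reply(msg, corpus_response):
  """Hypothetical helper mirroring the order of the if/elif chain."""
  if msg == 'exit':
    return 'Goodbye!'
  elif corpus_response:  # truthy only for a non-empty string
    return corpus_response
  else:
    # The student's existing cases would go before this fallback
    return "Sorry, I don't understand"

print(reply('how are you', 'I am doing well.'))  # I am doing well.
print(reply('great', None))                      # Sorry, I don't understand
print(reply('exit', 'See you later'))            # Goodbye!
```

Because the *exit* check comes first, typing *exit* still ends the loop even if the corpus happened to contain a reply for it. An empty string is also falsy, so it would fall through to the existing cases just like *None*.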