# Reusing Code with Functions

At this point, you could write a pretty impressive chatbot if you put in the necessary time to write several hundred lines of code that cover all kinds of possible user inputs.
If you do so, you will quickly notice that you have to copy-paste the same code over and over again for distinct cases, so if you later realize you have to change a piece of code, there may be dozens of copies where you have to make that change.
That's not a nice situation to be in, and this is why Python offers an alternative to copy-pasting code.

## Seeing the problem: a more complex chatbot

Suppose we now want to design a more elaborate chatbot.
It should be able to do multiple things:

1. If a user reply starts with *Can you* or *Do you*, the bot will either say *Let's not talk about me.* or reuse the user input for a suitable reply.
1. If a user reply starts with *Can I* or *May I*, the bot will either say *Of course!*, *Certainly.*, *No.*, or reuse the user input for a suitable reply.
1. In all other cases, a random answer is chosen.
1. Whenever input is reused, the bot switches first and second person pronouns and fixes the punctuation.

Let us develop this chatbot step by step, starting with just the random replies without user input.

In [None]:
# a simple chatbot that gives random replies

# we need the random module
import random

# define some random greetings
greetings = ["Hi!",
             "Hello!",
             "Greetings!",
             "How do you do?",
             "How's it going?"]

# define some random answers
answers = ["Very interesting.",
           "I do not understand.",
           "Can you tell me more about this?"]

# pick a random greeting
print(random.choice(greetings))

# tell the user how to quit
print("I am a very chatty chatbot.")
print("If you want me to stop talking to you, just say stop.")
print("Alright, your turn!")

# get the first user reply
reply = input()

# we start our infinite talking loop
while str.lower(reply) != "stop":
    print(random.choice(answers))
    reply = input()

# farewell message
print("It was nice talking to you.")

### Input normalization and regular conditions with `re.search`

Before we continue adding more functionality to this chatbot, there is already one thing we would like to fix, and that's how the user can make the chatbot stop.
Right now, the user has to enter *Stop* or *stop*, but *Stop.* or *stop!!!* won't work.
That's because we only normalize the user input with `str.lower`.
We could instead normalize the input with `re.sub`:

In [None]:
# only the relevant part of the chatbot is shown here

# now we also need the re module
import re

# get the first user reply
reply = input()
# normalize reply by rewriting every string that contains no word except stop as "stop"
normalized_reply = re.sub(r"(?i)^[^\w]*stop[^\w]*$", r"stop", reply)

# we start our infinite talking loop
while normalized_reply != "stop":
    print(random.choice(answers))
    reply = input()
    # normalize reply by rewriting every string that contains no word except stop as "stop"
    normalized_reply = re.sub(r"(?i)^[^\w]*stop[^\w]*$", r"stop", reply)

This regular expression is a little complicated, so let's go through it step by step:

1. `(?i)`: case-insensitive matching
1. `^`: beginning of the string
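1. `[^\w]`: any character that is **not** a word character; for plain English text, `\w` is a shorthand for `[A-Za-z0-9_]` (in Python 3 it also matches letters and digits from other alphabets, such as *é* or *ß*)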
1.  `*`: 0 or more times
1.  `stop`: matches *stop* and any capitalization variant thereof
1.  `[^\w]*`: 0 or more instances of any characters that are not word characters
1.  `$`: end of string

Ignoring capitalization, this regular expression matches any string of the form *XstopY* where *X* and *Y* are arbitrary sequences of spaces, punctuation symbols, and any other characters that are not word characters.
So the user could type *@/,   sToP ???--* and the chatbot would still quit.
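To make the behavior of this pattern more concrete, the following sketch applies the same `re.sub` call to a handful of inputs. The sample inputs and the names `STOP_PATTERN`, `samples`, and `normalized` are made up for illustration; only the regular expression itself comes from the chatbot code.

```python
import re

# the pattern from the chatbot: non-word characters, "stop", non-word characters
STOP_PATTERN = r"(?i)^[^\w]*stop[^\w]*$"

# hypothetical user inputs, chosen only for this illustration
samples = ["stop", "Stop.", "stop!!!", "@/,   sToP ???--", "don't stop", "stopping"]

# collect the normalized version of each input
normalized = []
for sample in samples:
    result = re.sub(STOP_PATTERN, "stop", sample)
    normalized.append(result)
    print(repr(sample), "->", repr(result))
```

The first four inputs are rewritten as `stop`. The last two are left unchanged because they contain word characters other than *stop*, so the pattern does not match them.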

**Exercise.**
In the lecture slides on regular expressions we saw a more succinct regex for *not a word character*.
Adapt the regular expressions in the code below accordingly (it's the same code as in the previous cell).

In [None]:
# only the relevant part of the chatbot is shown here

# now we also need the re module
import re

# get the first user reply
reply = input()
# normalize reply by rewriting every string that contains no word except stop as "stop"
normalized_reply = re.sub(r"(?i)^[^\w]*stop[^\w]*$", r"stop", reply)

# we start our infinite talking loop
while normalized_reply != "stop":
    print(random.choice(answers))
    reply = input()
    # normalize reply by rewriting every string that contains no word except stop as "stop"
    normalized_reply = re.sub(r"(?i)^[^\w]*stop[^\w]*$", r"stop", reply)

**Exercise.**
Let's practice the use of `\w` for word characters and `^` for negation.
The code cell below contains several instances of `re.sub` using these symbols.
You can run the code to see what output each line produces.
Then add comments to the code that explain step by step how each regular expression works, similar to the list in the preceding text.

In [1]:
import re

# define an input string for testing purposes
test = "John and I --- who have been friends for a long time --- haven't talked to each other for a while, don't want to, and won't ever again."

# Print output of each substitution, with additional empty line at the end
print("Output 1:", re.sub(r"\w+", r"(word)", test), "\n")
print("Output 2:", re.sub(r"[^\w]+", r"(non-word)", test), "\n")
print("Output 3:", re.sub(r"[^\w ]+", r"(non-word/non-space)", test), "\n")
print("Output 4:", re.sub(r"\w+ \w+", r"(two words w/space)", test), "\n")
print("Output 5:", re.sub(r"\w+[^\w']+\w+", r"(two words w/misc)", test), "\n")

Output 1: (word) (word) (word) --- (word) (word) (word) (word) (word) (word) (word) (word) --- (word)'(word) (word) (word) (word) (word) (word) (word) (word), (word)'(word) (word) (word), (word) (word)'(word) (word) (word). 

Output 2: John(non-word)and(non-word)I(non-word)who(non-word)have(non-word)been(non-word)friends(non-word)for(non-word)a(non-word)long(non-word)time(non-word)haven(non-word)t(non-word)talked(non-word)to(non-word)each(non-word)other(non-word)for(non-word)a(non-word)while(non-word)don(non-word)t(non-word)want(non-word)to(non-word)and(non-word)won(non-word)t(non-word)ever(non-word)again(non-word) 

Output 3: John and I (non-word/non-space) who have been friends for a long time (non-word/non-space) haven(non-word/non-space)t talked to each other for a while(non-word/non-space) don(non-word/non-space)t want to(non-word/non-space) and won(non-word/non-space)t ever again(non-word/non-space) 

Output 4: (two words w/space) I --- (two words w/space) (two words w/space) (tw

**Exercise.**
Now let's do the opposite: the comments in the code tell you what the output should be, and you have to put in the correct regular expression.

In [None]:
import re

# define an input string for testing purposes
test = "John and I --- who have been friends for a long time --- haven't talked to each other for a while, don't want to, and won't ever again."

# Print output of each substitution, with additional empty line at the end
# Output 1: if a word contains an uppercase letter, replace the word by "(uppercase)"; do not use an if-else for this!
print("Output 1:", re.sub(r"", r"", test), "\n")
# Output 2: replace any letter except a by z; spaces and punctuation symbols (-;,!?.) should not be changed, either
print("Output 2:", re.sub(r"", r"", test), "\n")
# Output 3: replace all sequences of characters between spaces that do not contain a, b, or c by " (not abc) "
# (careful here, make sure you don't match too much;
#  if your output starts with "John and (not abc) have", you've matched too many words at once)
print("Output 3:", re.sub(r"", r"", test), "\n")

# hint: another way of saying this is "replace every instance of xUy by (uppercase),
# where x and y are sequences of 0 or more word characters and U is an uppercase letter."

*Hints for the exercise*:
- Another way of phrasing the condition for Output 1 is the following: replace every instance of `xUy` by `(uppercase)`, where `x` and `y` are sequences of 0 or more word characters and `U` is an uppercase letter.
- For Output 2, focus on what characters should not be changed into `z`.
- In Output 3, it is important that you get the shortest possible match. In *John or Bill left*, the first match should be *or*, not *or Bill*. You can do this by making sure that matches cannot contain spaces.

While the code for our chatbot is functional, it is also redundant.
We have one line for normalizing the reply outside the `while` loop, which is for the very first user reply.
But inside the `while` loop the exact same line reoccurs to normalize later inputs.
Duplicating code like this is usually a sign of bad design.
If it turns out later on that the regular expression is not quite right, we have to fix it in two places.
In a more complex program, it's easy to forget where the changes have to be made, so we might end up with an inconsistent program where inputs are sometimes normalized in one way and sometimes in another.

One way to fix this is to store the regular expression as a variable.

In [None]:
# only the relevant part of the chatbot is shown here

# now we also need the re module
import re

# define a normalization pattern
normalization = r"(?i)^[^\w]*stop[^\w]*$"

# get the first user reply
reply = input()
# normalize reply by rewriting every string that contains no word except stop as "stop"
normalized_reply = re.sub(normalization, r"stop", reply)

# we start our infinite talking loop
while normalized_reply != "stop":
    print(random.choice(answers))
    reply = input()
    # normalize reply by rewriting every string that contains no word except stop as "stop"
    normalized_reply = re.sub(normalization, r"stop", reply)

This code works exactly as before, except that now we have a single place where we can change the normalization pattern.
However, we still have to define `normalized_reply` twice.
And when you think about it, even defining this variable complicates our code because it is not immediately apparent why we define it in the first place.
All we want to do is to have a more powerful string matching condition for the `while` loop, and our way of accomplishing this is to first normalize the string so that it can be used with the simple condition `!= "stop"`.
A more direct solution is to use the `re.sub` call itself in the condition.

In [None]:
# only the relevant part of the chatbot is shown here

# now we also need the re module
import re

# get the first user reply
reply = input()

# we start our infinite talking loop,
# which now contains a regular substitution as part of the condition
while re.sub(r"(?i)^[^\w]*stop[^\w]*$", r"stop", reply) != "stop":
    print(random.choice(answers))
    reply = input()

Nice, this code is much shorter and easier to understand than what we had before!
But we can still make it a smidgen more elegant.
Right now, we rewrite strings according to some regular expression *R* and then check whether the output does or does not match a given string.
This is a very roundabout way of checking whether a string matches *R*.
Instead, we can just use `re.search(R, string)` to check if *string* matches the regular expression *R*.

In [None]:
# only the relevant part of the chatbot is shown here

# now we also need the re module
import re

# get the first user reply
reply = input()

# we start our infinite talking loop,
# as long as the reply does not match the search pattern
while not re.search(r"(?i)^[^\w]*stop[^\w]*$", reply):
    print(random.choice(answers))
    reply = input()

Now the `while` loop is run only as long as `reply` does **not** match the pattern `r"(?i)^[^\w]*stop[^\w]*$"`.
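The reason `re.search` can be used directly as a condition is that it returns a match object when the pattern is found and `None` otherwise, and Python treats `None` as false in conditions. Here is a minimal sketch of that behavior; the variable names and test strings are chosen only for this illustration.

```python
import re

pattern = r"(?i)^[^\w]*stop[^\w]*$"

# this input matches, so re.search returns a match object
match = re.search(pattern, "Stop!!!")
# this input does not match, so re.search returns None
no_match = re.search(pattern, "please continue")

print(match)
print(no_match)

# "not None" is True, which is why the while loop keeps running for such inputs
if not no_match:
    print("No match, so the chatbot would keep talking.")
```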

**Exercise.**
Write a program that asks the user who their favorite actor is.
Use `re.search` to check if the answer contains *Arnold*, possibly followed by *Schwarzenegger*, but not followed by any other last name and not preceded by any other word.
If so, the program says *Hasta la vista, baby!*.
Otherwise, it just says *Lame!*.

*Examples*
- `Arnold -> Hasta la vista, baby`
- ` arnold   schwarzenegger !!!  -> Hasta la vista, baby`
- `!!!Arnold SchwarzenEgger -> Hasta la vista, baby`
- `Tom Arnold -> Lame!`
- `Arnold Weizenstecker -> Lame!`

*Hints*:
- Remember that you can do case-insensitive matching.
- Any sequence of characters before *Arnold* is fine as long as they are not word characters.
- Any sequence of characters at the end is fine as long as they are not word characters.
- Remember the special use of `?` for optionality.

In [None]:
# put your code here

### Adding context-dependent replies

With the help of `re.search`, we can also add some `if` statements to the chatbot to produce more tailored replies for certain inputs.
Remember, these were our design goals for the chatbot:

1. If a user reply starts with *Can you* or *Do you*, the bot will either say *Let's not talk about me.* or reuse the user input for a suitable reply.
1. If a user reply starts with *Can I* or *May I*, the bot will either say *Of course!*, *Certainly.*, *No.*, or reuse the user input for a suitable reply.
1. In all other cases, a random answer is chosen.
1. Whenever input is reused, the bot switches first and second person pronouns and fixes the punctuation.

We already have the random answer part covered, now let's tackle the first two points.
First, we put the general structure in place with `if`, `else`, and `re.search`.

In [None]:
# only the relevant part of the chatbot is shown here

# now we also need the re module
import re

# get the first user reply
reply = input()

# we start our infinite talking loop,
# as long as the reply does not match the search pattern
while not re.search(r"(?i)^[^\w]*stop[^\w]*$", reply):
    # does the reply start with "can you" or "do you "?
    if re.search(r"(?i)^(can|do) you ", reply):
        print("Let's not talk about me.")
    # if not, we have other constructions
    else:
        # does the reply start with "can I " or "may I "?
        if re.search(r"(?i)^(can|may) I ", reply):
            print(random.choice(["Of course!", "Certainly.", "No."]))
        # we didn't match anything in the input, just give a random reply
        else:
            print(random.choice(answers))
    reply = input()

The code above already has all the conditions in place that are needed for the chatbot to provide specific answers to inputs that match certain patterns.
But it only gives canned replies for those special cases; it still does not reuse the user input.
Let's add at least a basic solution for the first case, i.e. strings that start with *can you* or *do you*.

In [None]:
# only the relevant part of the chatbot is shown here

# now we also need the re module
import re

# get the first user reply
reply = input()

# we start our infinite talking loop,
# as long as the reply does not match the search pattern
while not re.search(r"(?i)^[^\w]*stop[^\w]*$", reply):
    # does the reply start with "can you" or "do you "?
    # then we give a pretty sophisticated reply now
    if re.search(r"(?i)^(can|do) you ", reply):
        # define some possible starts for dynamically generated replies
        starts = ["You don't believe that I can",
                  "Do you want me to be able to"]
        # from reply of the form "can/do you [sentence] [end of sentence] [other stuff]" produce "[sentence]?"
        reply = re.sub(r"(?i)^(can|do) you ([^\.\?!]*).*", r"\2?", reply)
        # combine reply with a randomly chosen start
        bot_reply = random.choice(starts) + " " + reply
        # print either a precanned response or the bot_reply
        print(random.choice(["Let's not talk about me.",
                             bot_reply]))
    # if not, we have other constructions
    # nothing has changed here
    else:
        # does the reply start with "can I " or "may I "?
        if re.search(r"(?i)^(can|may) I ", reply):
            print(random.choice(["Of course!", "Certainly.", "No."]))
        # we didn't match anything in the input, just give a random reply
        else:
            print(random.choice(answers))
    reply = input()

The only change compared to the previous version of the bot is that now we also reuse parts of the input whenever the reply starts with *can you* or *do you*.
Compared to the solution we saw in an earlier unit, this code does many things in a single instance of `re.sub` by using a backreference.
The regular expression works as follows:

1. `(?i)`: case-insensitive matching
1. `^`: match beginning of string
1.  `(can|do)`: match *can* or *do*; since this is bracketed, it is also group 1 for backreferences
1.  `you`: match *you*
1.  `(`: begin the second group for backreferences
1.  `[^\.\?!]*`: zero or more instances of any character that is not `.`, `?`, or `!`
1.  `)`: end of second group for backreferences
1.  `.*`: 0 or more instances of any characters; this matches whatever appears after the first sentence-ending punctuation

By rewriting this as `\2` we instruct Python to keep only the part between *can you* or *do you* and the first sentence-ending punctuation symbol.
Everything else is thrown away.
We also take care of our own punctuation by rewriting the string as `\2?` instead of `\2`.
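To see the backreference at work in isolation, here is a small sketch that applies the same pattern to a single made-up input. The names `user_input`, `kept`, and `both` are only for illustration; the second substitution is not part of the chatbot and merely shows what group 1 and group 2 contain.

```python
import re

pattern = r"(?i)^(can|do) you ([^\.\?!]*).*"

# a hypothetical user input with two sentences
user_input = "Can you sing a song? Please do."

# keep only group 2 and add our own question mark
kept = re.sub(pattern, r"\2?", user_input)
print(kept)

# for illustration only: show group 1 and group 2 side by side
both = re.sub(pattern, r"\1 / \2", user_input)
print(both)
```

The first call produces `sing a song?`, and the second produces `Can / sing a song`. Note that group 1 keeps the capitalization of the input even though matching is case-insensitive.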

Alright, now let us do the same for the second case, where we match *can I* and *may I*.

In [None]:
# only the relevant part of the chatbot is shown here

# now we also need the re module
import re

# get the first user reply
reply = input()

# we start our infinite talking loop,
# as long as the reply does not match the search pattern
while not re.search(r"(?i)^[^\w]*stop[^\w]*$", reply):
    # does the reply start with "can you" or "do you "?
    # then we give a pretty sophisticated reply now
    if re.search(r"(?i)^(can|do) you ", reply):
        # define some possible starts for dynamically generated replies
        starts = ["You don't believe that I can",
                  "Do you want me to be able to"]
        # from reply of the form "can/do you [sentence] [end of sentence] [other stuff]" produce "[sentence]?"
        reply = re.sub(r"(?i)^(can|do) you ([^\.\?!]*).*", r"\2?", reply)
        # combine reply with a randomly chosen start
        bot_reply = random.choice(starts) + " " + reply
        # print either a precanned response or the bot_reply
        print(random.choice(["Let's not talk about me.",
                             bot_reply]))
    # if not, we have other constructions
    else:
        # does the reply start with "can I " or "may I "?
        if re.search(r"(?i)^(can|may) I ", reply):
            # again we define some possible starts for dynamically generated replies
            starts = ["Why do you want to",
                      "Why are you asking for permission to"]
            # from reply of the form "can/may I [sentence] [end of sentence] [other stuff]" produce "[sentence]?"
            reply = re.sub(r"(?i)^(can|may) I ([^\.\?!]*).*", r"\2?", reply)
            # combine reply with a randomly chosen start
            bot_reply = random.choice(starts) + " " + reply
            print(random.choice(["Of course!", "Certainly.", "No.",
                                 bot_reply]))
        # we didn't match anything in the input, just give a random reply
        else:
            print(random.choice(answers))
    reply = input()

Oh boy, our code is certainly getting long and unwieldy.
And if you look at the *can you* case and compare it to the *can I* case, you can see that the two are almost exactly the same.
We are replicating a lot of code verbatim without any changes.
The only meaningful differences are how we define `starts`, the final selection of random choices, and whether the regular expression contains `(can|do) you` or `(can|may) I`.
We even repeat many of the comments without any changes.
If we can find a way to reuse the code in a smarter way than just copy-pasting it with minor differences, the code will become much more readable.
Python offers *custom functions* for exactly this purpose.

## Custom functions

You already know a fair number of Python functions:

- `print`
- `input`
- `list.append`
- `str.lower`
- `str.upper`
- `str.title`
- `re.sub`
- `re.search`

But those are all functions that ship with Python.
Python also allows programmers to define their own custom functions.
Once defined, a custom function behaves like a built-in one:

1. It has a specific name.
1. It must be followed by `(` and `)`, with 0 or more arguments between the parentheses.
1. How many arguments a function allows depends on how the programmer defined it.

### Some simple examples

Before we use functions to clean up the chatbot code above, let us start with a very simple example to get a better idea of how functions are defined.

In [None]:
def greeting(name):
    string = "Nice to meet you, " + name + "!"
    return string

print(greeting("Mary"))
print(greeting("Gandalf the Gray"))
print(greeting("well this is a weird sentence"))

The code above defines a function `greeting`, which takes a single argument.
Within the function, we can refer to this argument as the variable `name`.
Whatever string the function gets as its argument, it will combine it with the strings `"Nice to meet you, "` and `"!"`.
The result of that concatenation is returned as the output of the function.

The general format for defining a function is as follows:

```
def name_of_function(argument1, argument2, ..., last_argument):
    # some Python code of your choice
    return output_of_the_function
```
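As a quick illustration of this format, the sketch below defines one function with no arguments and one with two. The function names `say_hello` and `introduce` are invented for this example. It also shows what happens when a function is called with the wrong number of arguments; the `try`/`except` construction used to catch the resulting error is not something we rely on elsewhere in this unit, so feel free to ignore its details.

```python
def say_hello():
    # a function with zero arguments still needs parentheses when called
    return "Hello!"

def introduce(name, hobby):
    # a function with exactly two arguments
    return name + " likes " + hobby + "."

print(say_hello())
print(introduce("Mary", "chess"))

# calling introduce with only one argument makes Python raise a TypeError
wrong_call_failed = False
try:
    introduce("Mary")
except TypeError as error:
    wrong_call_failed = True
    print("Python complains:", error)
```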

Here is a slightly more complex example.

In [None]:
def conditional_greeting(user_name, chatbot_name):
    if user_name == chatbot_name:
        return "Hey, my name is also " + user_name + "!"
    else:
        return "Nice to meet you, " + user_name + "! My name is " + chatbot_name + "."
    
print(conditional_greeting("Mary", "Mary"))
print(conditional_greeting("John", "Mary"))

Custom functions are like a more powerful version of variables.
Whereas variables store certain values, functions store specific pieces of code.
So we can use functions to assign names to certain computations and then call them later on with that name.
In fact, we can even call a function from within another function.

In [None]:
def same_name(name):
    return "Hey, my name is also " + name + "!"

def different_name(name1, name2):
    return "Nice to meet you, " + name1 + "! My name is " + name2 + "."

def conditional_greeting(user_name, chatbot_name):
    if user_name == chatbot_name:
        return same_name(user_name)
    else:
        return different_name(user_name, chatbot_name)
    
print(conditional_greeting("Mary", "Mary"))
print(conditional_greeting("John", "Mary"))

Here's what happens when we call `conditional_greeting("Mary", "Mary")`.
We run the code for the function, which starts with `if user_name == chatbot_name`, which in our case reduces to `if "Mary" == "Mary"`.
Since this condition is satisfied, we move on to the code in the scope of this if statement: `return same_name(user_name)`.
This tells us that the value returned by `conditional_greeting("Mary", "Mary")` is whatever is returned by `same_name("Mary")`.
So now we have to compute the output of `same_name("Mary")`.
But this is simple: the output is `"Hey, my name is also " + "Mary" + "!"`, which is the same as `"Hey, my name is also Mary!"`.
And this is therefore the output of `conditional_greeting("Mary", "Mary")`.

**Exercise.**
Explain how the output of `conditional_greeting("John", "Mary")` is computed in a step-by-step fashion.

*put your explanation here*

And this is all you need to know to get going with functions.
There are some additional bells and whistles that are convenient once in a while, for instance specifying default values for arguments.
We will introduce those later down the road if they should happen to suit our needs.
For now, let's get back to our chatbot.

### Factorizing the chatbot code with functions

For the sake of reference, the full chatbot code is shown below.
This is quite a bit longer than anything we have written so far, so take your time to fully absorb the code.
Make sure you fully understand each line and the overall structure of the code.

In [None]:
# a simple chatbot that gives random replies

# we need the random module
import random

# define some random greetings
greetings = ["Hi!",
             "Hello!",
             "Greetings!",
             "How do you do?",
             "How's it going?"]

# define some random answers
answers = ["Very interesting.",
           "I do not understand.",
           "Can you tell me more about this?"]

# pick a random greeting
print(random.choice(greetings))

# tell the user how to quit
print("I am a very chatty chatbot.")
print("If you want me to stop talking to you, just say stop.")
print("Alright, your turn!")

# now we also need the re module
import re

# get the first user reply
reply = input()

# we start our infinite talking loop,
# as long as the reply does not match the search pattern
while not re.search(r"(?i)^[^\w]*stop[^\w]*$", reply):
    # does the reply start with "can you" or "do you "?
    # then we give a pretty sophisticated reply now
    if re.search(r"(?i)^(can|do) you ", reply):
        # define some possible starts for dynamically generated replies
        starts = ["You don't believe that I can",
                  "Do you want me to be able to"]
        # from reply of the form "can/do you [sentence] [end of sentence] [other stuff]" produce "[sentence]?"
        reply = re.sub(r"(?i)^(can|do) you ([^\.\?!]*).*", r"\2?", reply)
        # combine reply with a randomly chosen start
        bot_reply = random.choice(starts) + " " + reply
        # print either a precanned response or the bot_reply
        print(random.choice(["Let's not talk about me.",
                             bot_reply]))
    # if not, we have other constructions
    else:
        # does the reply start with "can I " or "may I "?
        if re.search(r"(?i)^(can|may) I ", reply):
            # again we define some possible starts for dynamically generated replies
            starts = ["Why do you want to",
                      "Why are you asking for permission to"]
            # from reply of the form "can/may I [sentence] [end of sentence] [other stuff]" produce "[sentence]?"
            reply = re.sub(r"(?i)^(can|may) I ([^\.\?!]*).*", r"\2?", reply)
            # combine reply with a randomly chosen start
            bot_reply = random.choice(starts) + " " + reply
            print(random.choice(["Of course!", "Certainly.", "No.",
                                 bot_reply]))
        # we didn't match anything in the input, just give a random reply
        else:
            print(random.choice(answers))
    reply = input()

We will now make this code more readable by defining two functions, one for the *can you* case, the other one for the *can I* case.
Let us look at each function in isolation first.

In [None]:
def can_you_reply(reply):
    # define some possible starts for dynamically generated replies
    starts = ["You don't believe that I can",
              "Do you want me to be able to"]
    # from reply of the form "can/do you [sentence] [end of sentence] [other stuff]" produce "[sentence]?"
    reply = re.sub(r"(?i)^(can|do) you ([^\.\?!]*).*", r"\2?", reply)
    # combine reply with a randomly chosen start
    bot_reply = random.choice(starts) + " " + reply
    # return either a precanned response or the bot_reply
    return random.choice(["Let's not talk about me.",
                          bot_reply])

def can_I_reply(reply):
    # define some possible starts for dynamically generated replies
    starts = ["Why do you want to",
              "Why are you asking for permission to"]
    # from reply of the form "can/may I [sentence] [end of sentence] [other stuff]" produce "[sentence]?"
    reply = re.sub(r"(?i)^(can|may) I ([^\.\?!]*).*", r"\2?", reply)
    # combine reply with a randomly chosen start
    bot_reply = random.choice(starts) + " " + reply
    return random.choice(["Of course!", "Certainly.", "No.",
                          bot_reply])

As you can see, those functions are no different from the code we had above for the distinct cases.
So we haven't really enhanced our program in any way.
But now we can write the code in a way that is slightly easier to read.

In [None]:
# a simple chatbot that gives random replies

# we need the random module and the re module
import random
import re


# define some random greetings
greetings = ["Hi!",
             "Hello!",
             "Greetings!",
             "How do you do?",
             "How's it going?"]

# define some random answers
answers = ["Very interesting.",
           "I do not understand.",
           "Can you tell me more about this?"]

# define some custom functions
def can_you_reply(reply):
    # define some possible starts for dynamically generated replies
    starts = ["You don't believe that I can",
              "Do you want me to be able to"]
    # from reply of the form "can/do you [sentence] [end of sentence] [other stuff]" produce "[sentence]?"
    reply = re.sub(r"(?i)^(can|do) you ([^\.\?!]*).*", r"\2?", reply)
    # combine reply with a randomly chosen start
    bot_reply = random.choice(starts) + " " + reply
    # return either a precanned response or the bot_reply
    return random.choice(["Let's not talk about me.",
                          bot_reply])

def can_I_reply(reply):
    # define some possible starts for dynamically generated replies
    starts = ["Why do you want to",
              "Why are you asking for permission to"]
    # from reply of the form "can/may I [sentence] [end of sentence] [other stuff]" produce "[sentence]?"
    reply = re.sub(r"(?i)^(can|may) I ([^\.\?!]*).*", r"\2?", reply)
    # combine reply with a randomly chosen start
    bot_reply = random.choice(starts) + " " + reply
    return random.choice(["Of course!", "Certainly.", "No.",
                          bot_reply])

###################################
# the chatbot starts talking here #
###################################

# pick a random greeting
print(random.choice(greetings))

# tell the user how to quit
print("I am a very chatty chatbot.")
print("If you want me to stop talking to you, just say stop.")
print("Alright, your turn!")


# get the first user reply
reply = input()

# we start our infinite talking loop,
# as long as the reply does not match the search pattern
while not re.search(r"(?i)^[^\w]*stop[^\w]*$", reply):
    # does the reply start with "can you" or "do you "?
    # then we give a pretty sophisticated reply now
    if re.search(r"(?i)^(can|do) you ", reply):
        print(can_you_reply(reply))
    # if not, we have other constructions
    else:
        # does the reply start with "can I " or "may I "?
        if re.search(r"(?i)^(can|may) I ", reply):
            print(can_I_reply(reply))
        # we didn't match anything in the input, just give a random reply
        else:
            print(random.choice(answers))
    reply = input()

This code is much nicer because it is now easy to see how the chatbot moves through the distinct cases based on certain matches.
But we can still tighten it up a bit because the two functions `can_you_reply` and `can_I_reply` share a lot of code.
This means that we can break them up into subfunctions, too.

In [None]:
# define some custom functions
def produce_reply(starts, reply):
    # only keep part of reply until first punctuation symbol
    reply = re.sub(r"(?i)^([^\.\?!]*).*", r"\1?", reply)
    bot_reply = random.choice(starts) + " " + reply
    return bot_reply
    
def can_you_reply(reply):
    # define some possible starts for dynamically generated replies
    starts = ["You don't believe that I can",
              "Do you want me to be able to"]
    # delete sentence-initial "can|do you "
    reply = re.sub(r"(?i)^(can|do) you ", r"", reply)
    # return either a precanned response or a dynamic reply
    return random.choice(["Let's not talk about me.",
                          produce_reply(starts, reply)])

def can_I_reply(reply):
    # define some possible starts for dynamically generated replies
    starts = ["Why do you want to",
              "Why are you asking for permission to"]
    # delete sentence-initial "can|may I "
    reply = re.sub(r"(?i)^(can|may) I ", r"", reply)
    # return either a precanned response or a dynamic reply
    return random.choice(["Of course!", "Certainly.", "No.",
                          produce_reply(starts, reply)])

Here we define a general-purpose function `produce_reply` that takes two arguments: a list of strings for possible starts and a single string, called `reply` here.
It throws away everything from `reply` except the part that occurs before the first sentence-ending punctuation symbol and adds a question mark at the end.
Then it combines a randomly chosen start with the remainder of `reply`.
The functions `can_you_reply` and `can_I_reply` are now reduced to defining a list `starts` of possible starts and deleting the initial *can|do you* or *can|may I* from `reply`.
At the end, they randomly choose between precanned responses or the output of `produce_reply(starts, reply)`.
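
To make the effect of `produce_reply` concrete, here is a minimal sketch that calls it on a sample input.
The input string is made up for illustration, and the list of starts contains only one entry so that the randomly chosen start is predictable.

```python
import random
import re

def produce_reply(starts, reply):
    # only keep part of reply until first punctuation symbol,
    # then add a question mark
    reply = re.sub(r"(?i)^([^\.\?!]*).*", r"\1?", reply)
    return random.choice(starts) + " " + reply

# with a single start, random.choice has only one option to pick from
example = produce_reply(["Why do you want to"],
                        "open the window? It is hot in here.")
print(example)
# prints: Why do you want to open the window?
```

Everything after the first question mark is dropped, so the second sentence of the input never shows up in the bot's reply.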

Here is what the full chatbot code looks like now.
Notice that we did not change anything below the line `the chatbot starts talking here`.
This is one of the great advantages of functions: we can hammer out a general structure for where certain steps in a program take place, and then we can define and modify the respective function for each step somewhere else without changing the overall structure of the program.

In [None]:
# a simple chatbot that gives random replies

# we need the random module and the re module
import random
import re


# define some random greetings
greetings = ["Hi!",
             "Hello!",
             "Greetings!",
             "How do you do?",
             "How's it going?"]

# define some random answers
answers = ["Very interesting.",
           "I do not understand.",
           "Can you tell me more about this?"]

# define some custom functions
def produce_reply(starts, reply):
    # only keep part of reply until first punctuation symbol
    reply = re.sub(r"(?i)^([^\.\?!]*).*", r"\1?", reply)
    bot_reply = random.choice(starts) + " " + reply
    return bot_reply
    
def can_you_reply(reply):
    # define some possible starts for dynamically generated replies
    starts = ["You don't believe that I can",
              "Do you want me to be able to"]
    # delete sentence-initial "can|do you "
    reply = re.sub(r"(?i)^(can|do) you ", r"", reply)
    # return either a precanned response or a dynamic reply
    return random.choice(["Let's not talk about me.",
                          produce_reply(starts, reply)])

def can_I_reply(reply):
    # define some possible starts for dynamically generated replies
    starts = ["Why do you want to",
              "Why are you asking for permission to"]
    # delete sentence-initial "can|may I "
    reply = re.sub(r"(?i)^(can|may) I ", r"", reply)
    # return either a precanned response or a dynamic reply
    return random.choice(["Of course!", "Certainly.", "No.",
                          produce_reply(starts, reply)])

###################################
# the chatbot starts talking here #
###################################

# pick a random greeting
print(random.choice(greetings))

# tell the user how to quit
print("I am a very chatty chatbot.")
print("If you want me to stop talking to you, just say stop.")
print("Alright, your turn!")


# get the first user reply
reply = input()

# we start our infinite talking loop,
# as long as the reply does not match the search pattern
while not re.search(r"(?i)^[^\w]*stop[^\w]*$", reply):
    # does the reply start with "can you " or "do you "?
    # then we give a pretty sophisticated reply now
    if re.search(r"(?i)^(can|do) you ", reply):
        print(can_you_reply(reply))
    # if not, we have other constructions
    else:
        # does the reply start with "can I " or "may I "?
        if re.search(r"(?i)^(can|may) I ", reply):
            print(can_I_reply(reply))
        # we didn't match anything in the input, just give a random reply
        else:
            print(random.choice(answers))
    reply = input()
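
If you want to see which branch of the loop a given input triggers without actually chatting with the bot, a small helper can report it.
The function `which_branch` and its labels are hypothetical and not part of the chatbot; the sketch only reuses the three search patterns from the code above.

```python
import re

def which_branch(reply):
    # hypothetical helper: name the branch the chatbot would take
    if re.search(r"(?i)^[^\w]*stop[^\w]*$", reply):
        return "stop"
    if re.search(r"(?i)^(can|do) you ", reply):
        return "can_you_reply"
    if re.search(r"(?i)^(can|may) I ", reply):
        return "can_I_reply"
    # none of the patterns matched
    return "random answer"

# a few made-up inputs to try out
for text in ["  STOP!!", "don't stop", "Do you like tea?",
             "may i leave?", "Can you", "I can swim."]:
    print(repr(text), "->", which_branch(text))
```

Note that `Can you` with nothing after it ends up with a random answer because the pattern requires a space after *you*.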

**Exercise.**
The current code satisfies almost all of our initial design goals, except that pronouns do not get replaced.
The code fragment below gives you a starting point for how to do this.
Complete the definition of the function `substitute_pronouns` that carries out these substitutions with regular expressions.
It is alright to recycle code from an earlier unit; the important thing is that, in the end, `substitute_pronouns` correctly exchanges first and second person pronouns.
Once you have the function working the way you want, call it at the appropriate place in `produce_reply`.

In [None]:
# define some custom functions
def substitute_pronouns(string):
    # some magic happens here
    return string

# careful: don't forget to call the function somewhere below

def produce_reply(starts, reply):
    # only keep part of reply until first punctuation symbol
    reply = re.sub(r"(?i)^([^\.\?!]*).*", r"\1?", reply)
    bot_reply = random.choice(starts) + " " + reply
    return bot_reply
    
def can_you_reply(reply):
    # define some possible starts for dynamically generated replies
    starts = ["You don't believe that I can",
              "Do you want me to be able to"]
    # delete sentence-initial "can|do you "
    reply = re.sub(r"(?i)^(can|do) you ", r"", reply)
    # return either a precanned response or a dynamic reply
    return random.choice(["Let's not talk about me.",
                          produce_reply(starts, reply)])

def can_I_reply(reply):
    # define some possible starts for dynamically generated replies
    starts = ["Why do you want to",
              "Why are you asking for permission to"]
    # delete sentence-initial "can|may I "
    reply = re.sub(r"(?i)^(can|may) I ", r"", reply)
    # return either a precanned response or a dynamic reply
    return random.choice(["Of course!", "Certainly.", "No.",
                          produce_reply(starts, reply)])
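
Once your `substitute_pronouns` is written, you may want to check it on a few inputs before wiring it into the chatbot.
The sketch below is one possible way to do that; the helper `failed_cases` and the test pairs are assumptions made for illustration, so adjust them to whatever behavior you are aiming for.
The stub from the exercise is included so that the code runs on its own, and since the stub changes nothing, every case is reported as failing.

```python
def substitute_pronouns(string):
    # stub from the exercise; replace it with your own solution
    return string

def failed_cases(function, cases):
    # hypothetical helper: collect the inputs for which the
    # function's output differs from the expected output
    failures = []
    for text, expected in cases:
        if function(text) != expected:
            failures.append(text)
    return failures

# assumed examples of the desired behavior
cases = [("my dog", "your dog"),
         ("your cat", "my cat"),
         ("talk to me", "talk to you")]

# an empty list means that all cases passed
print(failed_cases(substitute_pronouns, cases))
```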

That's it for this unit.
Congrats on making it this far!
The code in this unit is more challenging because this is the first time we move beyond toy examples and instead look at something that is structured more like an actual program.
When you come across Python code in the wild, you will probably be intimidated at first because there may be several dozen lines of code, if not hundreds or thousands, and they combine all the tricks we know: variables, if, while, complex conditionals, commands from Python modules, and custom functions.
The important thing is not to get discouraged.
Sure, figuring out a program with 100 lines takes more concentration than one with 10 lines, but it isn't impossible.
Running 3 miles is harder than running 1 mile, but neither one is particularly challenging once you've been working out for a few weeks.
The code above may look intimidating, but with a few more weeks of practice you'll breeze right through it.
Just keep at it!