# Combining and comparing strings

You now have a few basic skills under your belt that are useful for any kind of programming project that involves language:

- You can show text on the screen using `print`.
- With `input` you can accept typed text from the user.
- Variables allow you to name certain pieces of information and retrieve them later on.

That is already enough for some simple programming projects, but it's clearly not enough to create the next [Cleverbot](http://www.cleverbot.com) or [Postmodernism Generator](http://www.elsewhere.org/journal/pomo/), let alone large-scale projects like [Google Translate](http://translate.google.com).

But at least for a chatbot, we actually aren't too far away from producing something that can compete with the likes of Cleverbot.
The most important tool for that is the ability to manipulate text, or as Python calls it, *strings*.

## What is a string?

All programs are based on text, since that's the easiest way for humans to construct files.
In principle, one could also design a programming language that revolves around drawing boxes and connecting them with lines.
In fact, the video game engine *Unreal Engine 4* has something along those lines, called [blueprint visual scripting](https://docs.unrealengine.com/latest/INT/Engine/Blueprints/).
While such visual languages are often easier for novices to use, they are clunky and scale badly.
Entering text is much faster, and that's why almost all programs are written as text.

But that does not mean that your program only operates with text.
For example, the Python code `x = 5` is just a sequence of the five text characters `x`, space, `=`, space, and `5`.
But Python instead treats the `5` as a number, rather than the character 5.
This makes a huge difference:

In [None]:
5 + 3

In [None]:
"5" + "3"

Don't worry too much about what the `+` does here; the crucial thing is that we get very different results for `5` and `"5"`.
That's because the quotation marks tell Python that we don't want 5 to be a number, but text.
Or as Python calls it, a *string*.
In Python, a string is anything that occurs between single quotation marks `'` or double quotation marks `"`.
The two behave more or less the same, and in this class we will always use `"` for strings.
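
As a small sketch of this difference, the built-in `type` function (not introduced so far in this unit) reports what kind of value Python sees. The variable names below are made up purely for illustration.

```python
# hypothetical variable names, chosen only for this illustration
number = 5
text = "5"

# type() reports what kind of value Python sees
print(type(number))  # <class 'int'>, a whole number
print(type(text))    # <class 'str'>, a string

# single and double quotation marks produce the same kind of value
single = 'hello'
double = "hello"
print(type(single))  # <class 'str'>
print(type(double))  # <class 'str'>
```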

Since we are mostly interested in working with written language, strings are of great importance to us.

## Putting strings together

In the last unit, we wrote a small chatbot.
As a reminder, here's the initial code:

In [None]:
# A very simplistic chatbot
print("Greetings and salutations! I am Bending Unit 22.")
print("What is your name?")
name = input()
print("Wow, really? I have a cousin whose name is also")
print(name)

One thing that is dissatisfying about this code is that the bot's answer is split across two lines in a very unnatural fashion.
That's because the output of each `print` call goes on its own line.
So if we want the user's name to appear on the same line as the first part of the reply, they need to be part of the same `print` call.
But as you also know by now, the code below does not work.

In [None]:
print("Wow, really? I have a cousin whose name is also" name)

When you run this code, you get a `SyntaxError` (the exact wording of the message depends on your Python version).
So Python won't allow us to do this.
However, a minor variation of the above command does work.

In [None]:
print("Wow, really? I have a cousin whose name is also", name)

Can you spot the difference? It is very subtle: there is now a `,` between the string and the variable `name`.
Why is the comma so important?
Well, `print` is what is called a *function*.
That's a piece of code that takes one or more *arguments* as input and returns an output.
It's similar to the mathematical notion of function that you have probably learned about in high school, for instance $f(x) = x + 1$ or $f(x,y) = x^y$.
In these mathematical examples, $f$ is the *name* of the function, and $x$ and $y$ are its arguments.
For the `print` function, `print` is the name and its arguments are whatever appears between `(` and `)`, separated by commas.
The `print` function converts each of its arguments to text, joins them with a single space in between, and outputs the result on one line.
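
Here is a quick sketch of this behavior. The value of `name` is made up and stands in for whatever the user would type at `input()`.

```python
# hypothetical value, standing in for what the user would type
name = "Flexo"

# two arguments: print puts one space between them
print("Wow, really? I have a cousin whose name is also", name)

# print accepts any number of arguments, and they need not all be strings
print("Bending Unit", 22, "says hello to", name)
```

Because `print` supplies the space itself, the first string does not need to end in a space.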

[*Remark:* If you can't see the mathematical functions in this cell, make sure javascript isn't deactivated in your browser.]

**Exercise.**
Now that you know that `print` can take multiple arguments and output them as a single string, copy-paste the code for your chatbot from Unit 02 into the cell below and modify it so that it no longer produces weird linebreaks.

In [None]:
# put your revised chatbot code here

**Exercise.**
With string concatenation you can also play a round of *Mad Libs*.
If you aren't familiar with the game, here's how it works: you have a predetermined text where certain words have been replaced by their part of speech, for example *verb*, *noun*, *adjective*.
For each gap, you ask a friend to say a word of that part of speech.
You then put those words in the gaps and read the text aloud.
Ideally, hilarity ensues.

Here's an example adapted from the very first Mad Libs book:

~~~
"[exclamation]! he said [adverb] as he jumped into his convertible
[noun] and drove off with his [adjective] wife."

"Ron! he said better as he jumped into his convertible
Laniado and drove off with his irritating wife."
~~~

Write a program that allows the user to play a single round of Mad Libs with the computer.
So you should have some kind of prefabricated text, and a way of filling in the gaps with the words the user picked.

In [None]:
# put your Mad Libs code here

## Comparing strings

In Python, strings aren't just some unanalyzable object that you input, print to the screen, or save in a variable.
As we just saw, the `print` function can produce a new string from its arguments, and quite generally strings are objects that can be altered and compared.

For example, we can ask Python whether two strings are the same.

In [None]:
"John" == "John"

In [None]:
"Mary" == "John"

In [None]:
john = "John"
john == "John"

In [None]:
"John" == 'John'

Here we are using the `==` operator to compare two expressions and see if they are the same value.
Note that this is two equals signs, rather than just one.
We use `=` to define variables, and `==` to test values for equality.
Never confuse the two: the mix-up either causes an error that immediately stops your program or, worse, produces code that runs but quietly does the wrong thing.

In [None]:
# a variable definition accidentally becomes an equality test
mary == "Mary"
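
In a fresh notebook where `mary` has never been defined, the cell above stops with a `NameError`. The mix-up can also go unnoticed, though. The sketch below, which uses a made-up starting value, shows what happens when the variable already exists.

```python
# one equals sign: define the variable (the value is made up)
mary = "Maria"

# two equals signs by accident: this only tests for equality,
# it does not change the variable, and no error is raised
mary == "Mary"

# mary still holds its old value
print(mary)  # Maria
```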

**Exercise.**
You know the drill: experimentation time!
Play around with the `==` operator to figure out how it can and cannot be used.
As usual, try to think of less obvious cases, e.g. whether one can compare more than two expressions at the same time, how it works with variables, and so on.
Add at least 5 instructive comparisons to the cell below, and explain in short comments what each comparison demonstrates.

In [None]:
# put your 5+ comparisons here

Note that `==` can only produce two values: `True` if the equality holds, and `False` otherwise.
These two values are also called *Booleans*, named after [George Boole](https://en.wikipedia.org/wiki/George_Boole) (a brilliant 19th-century mathematician, who had the misfortune of dying a rather funny death at the hands of his doting wife).
Many different Python operations produce Booleans as values.
For example, there is also `in`.

In [None]:
"Dan" in "Mary met Dan and Paul."

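Since `True` and `False` are values like any other, the result of a comparison can be stored in a variable. A brief sketch, with variable names chosen only for illustration and `type` being a built-in that this unit has not introduced:

```python
# hypothetical variable names; the comparisons are the ones shown above
same = "John" == "John"
found = "Dan" in "Mary met Dan and Paul."

print(same)        # True
print(found)       # True
print(type(same))  # <class 'bool'>
```
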
**Exercise.**
The next few cells contain a number of Python constructions with `in`.
Before you run the code, try to make an educated guess for each one as to whether it is well-formed or not, and if so, whether it evaluates to `True` or `False`.
Based on the outcome, formulate a hypothesis as to how `in` can be used and what it does.
Feel free to add your own code to test your hypothesis.

In [None]:
"Dan" in "Daniel"

In [None]:
"Dan" in "Dan"

In [None]:
"Dan" in "Dandruff is a common problem for many Americans"

In [None]:
"Dan" in "Dandruff" in "Dandruff is a common problem for many Americans"

In [None]:
"Dan" in "Mary" in "Dandruff"

In [None]:
"Dan" in "D and E are letters of the alphabet"

In [None]:
"Dan" in "DaDaDaDan"

In [None]:
"the" in "The woman saw a man"

In [None]:
"the" in "The theory is wrong"

In [None]:
# you can put your own tests in this cell

*Write down your hypothesis in this cell*

**Exercise.**
There are two more operators that are very useful for comparing strings.
Those are `!=` and `not in`.
Fill the cell below with at least 5 examples for each operator to determine how they are used and what they do.
Then add your own description of each operator.

In [None]:
# put your 5+ examples for each operator in this cell

*Write down your descriptions of `!=` and `not in` in this cell*

You might be wondering at this point what these comparisons are good for.
The answer is "A million things!", but let's look at how we can use them to make our chatbots a bit more responsive.

## Testing user input with `if`

Let's look one more time at our very simple chatbot.

In [None]:
# A very simplistic chatbot
print("Greetings and salutations! I am Bending Unit 22.")
print("What is your name?")
name = input()
print("Wow, really? I have a cousin whose name is also", name)

This chatbot cannot really react to the user's input; it just repeats it.
This allows the user to shamelessly make fun of our poor chatbot.

In [None]:
print("What is your name?")
# the user gives a facetious reply
name = "freaking stupid"
print("Wow, really? I have a cousin whose name is also", name)

Wouldn't it be nice if we could show that smart alec who's the boss?
If we could somehow check if the user's reply contains undesirable words like *stupid*, then the chatbot could give an appropriately snarky reply.
Fortunately, that's exactly what string comparisons allow us to do.
All we need in addition is the `if` statement.

In [None]:
print("What is your name?")
# the user's reply
name = input()

# but this time we're prepared for shenanigans:
if "stupid" in name:
    # the user's reply contained "stupid",
    # so we retaliate
    print("Damn, man, that is one stupid name.")
    print("I'll just call you Smart Alec instead.")
    name = "Smart Alec"
    
# the program continues normally
print("Nice to meet you", name)

Run the program above twice, once with a name that contains *stupid*, and once with a regular name like *John* or *Adolph Blaine Charles David Earl Frederick Gerald Hubert Irvin John Kenneth Lloyd Martin Nero Oliver Paul Quincy Randolph Sherman Thomas Uncas Victor William Xerxes Yancy Wolfeschlegelsteinhausenbergerdorff* (no, I didn't make that one up).

As you can see, you always get the *Nice to meet you {name}* part.
But if the name contains *stupid*, some additional lines of text get printed.
This is the result of using the `if` statement.
This tells Python that a portion of code should be run only if a specific condition is met --- in our case, only if the user wrote *stupid* in their reply.
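
The condition after `if` is just a comparison like the ones from the previous section, so it evaluates to `True` or `False`. Here is a minimal sketch with a hard-coded reply instead of `input()`; the variable `is_rude` is a made-up name.

```python
# hard-coded reply, standing in for input()
name = "freaking stupid"

# the condition on its own is a Boolean
is_rude = "stupid" in name
print(is_rude)  # True

# this assignment runs only when the condition is True
if is_rude:
    name = "Smart Alec"

print("Nice to meet you", name)  # Nice to meet you Smart Alec
```

Changing the first line to a regular name such as `"John"` makes `is_rude` come out as `False`, and the greeting then uses the name unchanged.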

**Exercise.**
Experiment with how the program changes if you change the indentation of lines.
Do you get errors?
Or does the program run but behave differently?
Based on your observations, try to formulate a hypothesis of how `if` statements work.

*put your hypothesis here*

An `if` statement is a way of making the program more flexible.
Our programs so far were like a long straight road with only one way from the start to the end.
The `if` statement introduces forks: "if you meet the condition, take this detour".
And just like a road may contain multiple forks, a program can contain multiple `if` statements.

In [None]:
print("What is your name?")
# the user's reply
name = input()

# but this time we're prepared for shenanigans:
if "stupid" in name:
    # the user's reply contained "stupid",
    # so we retaliate
    
    # did user actually say both stupid and asshole?
    # that son of a gun, we'll show him
    if "asshole" in name:
        print("You know what?")
        print("I don't want to talk to a rude person like you.")
        print("In fact, I don't want to live in a world with such rude people!")
        print("Farewell, my beloved human overlords!")
        # Chatbot prematurely ends its life with an error message
        raise Exception("Chatbot lost will to live")

    print("Damn, man, that is one stupid name.")
    print("I'll just call you Smart Alec instead.")
    name = "Smart Alec"
    
if "Nemo" in name:
    print("Ah, a classically trained user. Or a Pixar fan.")

# the program continues normally
print("Nice to meet you", name)

**Exercise.**
Carefully test the code above with various inputs.
Also try one that contains both *stupid* and *Nemo*.
Then draw a flowchart of the program's "road".
Here is an example for the simpler version with only one `if` statement.

```
greet user
|
get name
|
|--------------> name contains stupid
|                 |
|                mock user
|                 |
|                change name to Smart Alec
|                 |
|<----------------|
|
greet user with name
```

*put your flowchart here*

**Maintenance.**
Your Python vocabulary has more than doubled with this unit.
It has grown from `print`, `input`, and variables, to also include `==`, `!=`, `in`, `not in`, and the very important `if` statement.
Make sure you include all of them in your reference notebook, with detailed descriptions of their usage.