# Week 4: Conditionals, Iteration, and Counting Types

This week we will focus on control flow. When we run a cell in a Jupyter Notebook (or a full Python program), the system evaluates and runs one line at a time, in order, just as you would when following a recipe for a cake. But some control flow features allow us to do more sophisticated things.

### Part 1: Conditionals

We often have to make decisions on what to do next based on some *condition*. By *condition* we typically mean the values of one or more variables. Here, we learn all about `if`, `elif`, `else`, and indentation. 

### Part 2: Loops and Iteration

We also frequently want to do the same operation multiple times, or to each element in a collection. We will learn all about `for` loops and `while` loops, and learn how to embed conditionals *inside* loops.


### Part 3: Using Loops and Iteration to Calculate Types

Now, we can put together what we learned about conditionals and iteration, plus some new operations on lists (`in` and `list.append()`) in order to calculate the number of unique words in a text.

### Links

You may find these sections of Melanie Walsh's textbook useful:
* [Python Comparisons and Conditionals](https://melaniewalsh.github.io/Intro-Cultural-Analytics/02-Python/08-Comparisons-Conditionals.html)
* [Python Lists and Loops](https://melaniewalsh.github.io/Intro-Cultural-Analytics/02-Python/09-Lists-Loops-Part1.html)



# 1. Conditionals

## Revisiting comparisons

Two whole weeks ago (how the time flies!), you may recall that we met a lot of **operators**.

* `==`: equal to
* `!=`: not equal to
* `>`: greater than
* `>=`: greater than or equal to
* `<`: less than
* `<=`: less than or equal to

We used these operators to make **comparisons**. For instance, 


In [9]:
name = "Karen"
distance = 63.5

In [10]:
name == "Karen"

True

In [11]:
distance < 100

True

Let's add one further layer of complexity to comparisons with the logical operators `and`, `or`, and `not`.

| **Logical Operator** | **Explanation**                                                                                   |
|:-------------:|:---------------------------------------------------------------------------------------------------:|
| `x and y`         | `True` if x and y are both True                                                                             |
| `x or y`         | `True` if either x or y is True                                              |
| `not x` | `True` if x is not True |

In [12]:
# Try changing the values of these variables and then rerunning the cells 
# below to see how the results change.
sugar = False
cream = True

In [13]:
sugar == True and cream == True

False

In [14]:
sugar == True or cream == True

True

Note that Python's `or` is not to be confused with "the exclusive or". 

`Or` returns `True` if EITHER (or BOTH) of the conditions is true.

The "exclusive or" returns `True` ONLY if ONE but **not BOTH** of the conditions is true.

* **`or`**: "Do you take sugar or cream in your coffee?" You can choose one, the other, or both.
* **"exclusive or"**: "Would you like fries or a salad with your burger?" You're being asked to choose one or the other, not both. 

The `or` we're talking about today is the "sugar or cream" `or`; **not** the "fries or salad" "exclusive or."
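
To make the contrast concrete, here is a minimal sketch. Python has no `xor` keyword; for two `bool` values, one common way to express the "exclusive or" is `!=`, which is `True` only when the two values differ. The variable names `fries` and `salad` are made up for this illustration.

```python
# Hypothetical side-order choices (illustration only).
fries = True
salad = True

# Inclusive "or": True if either value, or both, is True.
either_or_both = fries or salad

# "Exclusive or" for two booleans: True only if exactly one is True.
exactly_one = fries != salad

print(either_or_both)
print(exactly_one)
```

With both values set to `True`, the inclusive expression is `True` and the exclusive one is `False`. Try the other three combinations to see where the two agree.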

The following "truth table" shows the values of the expressions for each value of x and y. In the cells above, try all of the variations of True and False for sugar and cream to see if the results match this table.

| **`x`** | **`y`** | **`x and y`** | **`x or y`**
|:-------------:|:-------------:|:-------------:|:---------------------------------------------------------------------------------------------------:|
| `True`        |   `True` | `True` | `True`                                  |
| `True`        |   `False` | `False` | `True`                                  |
| `False`        |   `True` | `False` | `True`                                  |
| `False`        |   `False` | `False` | `False`                                  |


Try some examples of more interesting expressions to solidify your understanding.  Before you run each line of code, decide in your head what the result should be to check your understanding.

In [15]:
distance < 100 and name == "Karen"

True

In [16]:
distance < 100 or name == "Karen"

True

In [17]:
distance < 100 and name == "Nat"

False

In [18]:
distance < 100 or name == "Nat"

True

In [19]:
distance > 80 or name == "Nat"

False

In [20]:
distance > 80 or name == "Karen"

True
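
The cells above exercise `and` and `or`, but not `not`. As a small sketch using the same `sugar` and `cream` values as earlier, `not` flips a single truth value, and parentheses let it flip a whole expression. The names `no_sugar`, `no_cream`, and `black_coffee` are made up for this illustration.

```python
sugar = False
cream = True

no_sugar = not sugar                   # flips False to True
no_cream = not cream                   # flips True to False
black_coffee = not (sugar or cream)    # True only if neither was chosen

print(no_sugar)
print(no_cream)
print(black_coffee)
```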

**Conditionals**, the thing we're learning about right now, allow us to actually **do something** with the comparisons that we make.

For instance, we might want to write a congratulatory note if the distance was greater than 50 km. (I am making an assumption about the units for `distance`.)

You do it with an **`if` statement**. 

## `if` statements

An `if` statement is an instruction to do something *if* a particular condition is met.

A common Python conditional is made up of two lines:
* On the first line, you type the English word `if` followed by an **expression** (for instance, a **comparison**) and then a colon (`:`)
* On the second line, you **indent** (Jupyter will automatically insert this indentation for you\*\*, but you could also use the `Tab` key on your keyboard) and write an instruction or "statement" to be completed if the condition is met

\*\* The distinction between tabs and spaces is a fraught topic in the programming community. Fortunately, Jupyter Lab (and Hub) make it easy for us by converting all tab characters into four spaces.

Here's a Python `if` statement:

In [21]:
if distance > 50:
    print("That's an impressive bike ride")

That's an impressive bike ride


As humanities students, you're all ready to handle the syntax of an `if` statement, because it's a lot like the way you introduce a block quotation in an essay.

```
The opening of Eliot's The Waste Land immediately establishes a mood of dread:
    April is the cruellest month, breeding
    Lilacs out of the dead land, mixing
    Memory and desire, stirring
    Dull roots with spring rain. (1-4)
From this point onward in the poem, it is all just further downhill.
```

The first line introduces the quotation — it sets up what's to follow — and ends with a colon `:`, which signals that we're about to move into something else. 

Then all subsequent lines are *indented*, to signal that they are in some way subordinate to that introductory phrase. The power of the colon `:` — the subordination of all subsequent lines to that introductory phrase — goes away only when we stop indenting.

**So much is communicated with one bit of punctuation (`:`) and one element of layout (indentation)!**

Same with an `if` statement in Python. Other languages are hopelessly inelegant in the way they handle them. Python, like our convention for introducing block quotes, does a lot with a little: mere colons and indentation.

The opening line of an `if` statement names the condition we're looking to meet; and it ends with a `:`, signalling we're about to specify what will actually happen if that condition is met. The `:` leaves us hanging, waiting to know what action will occur if the condition is met!

The second line is indented, to show that it's subordinated to the first line. It picks up where the `:` left off, filling in the blank: if the condition is met, **do this**.

In [22]:
if distance > 50:
    print("Wow, your bike ride was really long today!")

Wow, your bike ride was really long today!


Although this syntax is gorgeously and elegantly minimalist, it is also quite unforgiving. Think of Python as a demanding aesthete: it has exquisite taste, and so will not tolerate even the slightest gaffe.

In [23]:
if distance > 50
    print("Wow, your bike ride was really long today!")

SyntaxError: expected ':' (1600612911.py, line 1)

Actually, what I just said is not fair. Because, unlike a demanding aesthete — who would merely turn up their nose and shoo you away — Python is kind enough to explain where we've gone wrong when we make a faux-pas.

In [None]:
if distance > 50:
print("Wow, your bike ride was really long today!")

## `else` statements

You can add more complexity to your `if` statement by specifying what to do if the condition in the `if` statement **isn't** met.

An `else` statement comes after an `if` statement and is formatted in the same way, except that you don't have to specify a condition (because it serves as an "if-all-else-fails-do-*this*" bucket).

In [None]:
if distance > 50:
    print("Wow, your bike ride was really long today!")
else:
    print("Great work, you made it out on your bike today.")

## `elif` statements

We can add even **more** nuance with `elif` — "else if" — statements.

The Python Interpreter (🐍) will evaluate the `if` statement first. Then, if it's not true, Python will go to the `elif` statement (and we can stack as many of these as we like) until Python finds a true one. If none of those are true, Python will go to the `else` statement, if we've provided one.

In [None]:
if distance > 50:
    print("Wow, your bike ride was really long today!")
elif distance < 5:
    print("That's a good start. Can you go farther tomorrow?")
else:
    print("That's a solid ride.")

In [None]:
# What happens if we try the following?  Try giving distance different values.
# Especially try it with a distance value of 4. Can you explain the result?
# Can you give distance a value that will get to the else statement?

distance = 4
if distance > 50:
    print("Wow, your bike ride was really long today!")
elif distance <= 50:
    print("That's a solid ride.")
elif distance < 5:
    print("That's a good start. Can you go farther tomorrow?")
else:
    print("Is there a value for distance to ever cause this to be printed?")
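
The experiment above turns on the order in which Python checks the branches: it stops at the first condition that is true, even when a later one is true as well. A minimal sketch, using a made-up variable `ride_km` and made-up thresholds:

```python
ride_km = 75  # hypothetical value, for illustration only

if ride_km > 50:
    message = "over 50"
elif ride_km > 20:          # also true for 75, but never reached
    message = "over 20"
else:
    message = "20 or under"

print(message)
```

Only the first branch runs here. Swapping the first two conditions would change the result for the same value, which is why the most specific test usually goes first.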

# 🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍

# 2. Iteration and Loops

I'm just going to show you some code for a particular kind of loop, and let's see if you can figure out its syntax and what it does.

In [24]:
number = 10

while number > 0:
    print(number)
    number = number - 1
print(f"{number} Blastoff!")

10
9
8
7
6
5
4
3
2
1
0 Blastoff!


## `for` loops

That's one kind of Python loop — a `while` loop. And `while` it's very cool and useful, it's not as useful `for` us as another kind of loop: the `for` loop.

Last class, we talked about **indexing** and **slicing** in relation to two data types: `str`s and `list`s. 

This taught us the way that Python "breaks down" those two data types:
* `str`s are broken up into...
* and `lists` are broken up into...

In [25]:
# Let's use a line from "Poetry’s Data: Digital Humanities and the History of Prosody"
text = "Poetry is full of data. We read poetry informed by principles that we accept based on how we have been trained to read, speak, and interpret."
text_words = text.split()

In [26]:
text[:6]

'Poetry'

In [27]:
text_words[:5]

['Poetry', 'is', 'full', 'of', 'data.']

A `for` loop allows us to **move through the parts of** a particular variable — **iterate over it**, in the stylish and fashionable lexicon of Python — and **do something** to each part of it.

In [None]:
for character in text:
    print(character.upper())

In [None]:
for word in text_words:
    print(word.upper())

Here's a way of thinking about the syntax of a `for` loop.

In the below `for` loop, you're telling Python, 
> **Hey, Python! `for` every `element` that's "`in`" the variable `whatever`, please go in and do the following thing to it`:` `print()` off that `element`**

In [None]:
whatever = "blah blah blah"

for element in whatever:
    print(element)

Below is a more formal overview of a `for` loop.

The above `for` loops consist of two lines and have this syntax:

* On the first line, you type the word `for`, then a **variable name** for each item in the thing you'll be iterating over, then the word `in`, then the **name of the variable you want to iterate over**, and then a colon (`:`)
* On the second line, you indent and write an instruction or “statement” to be completed for each item in the list.

Note that the **variable name** you provide between `for` and `in` can be anything (as long as it follows variable naming conventions). It's nice to give it a descriptive name that corresponds to what the individual items of the string or list *are*, but it doesn't need to be.

In [None]:
instructors = ["Karen", "Nat", "Mitchell", "Sarah", "Alexandra", "Kevin", "Cameron", "Hangrui"]

In [None]:
for name in instructors:
    print(f"This instructor's name is {name}.")

In [None]:
for x in instructors:
    print(f"This instructor's name is {x}.")
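
The indented body of a `for` loop is not limited to one line: every line at the same indentation runs once per item. The sketch below keeps a running total across iterations. The shortened list `some_instructors` and the variable `total_letters` are made up for this illustration.

```python
some_instructors = ["Karen", "Nat", "Mitchell"]  # shortened list for illustration

total_letters = 0
for name in some_instructors:
    # Both indented lines run once for each name.
    print(f"{name} has {len(name)} letters.")
    total_letters = total_letters + len(name)

# This line is not indented, so it runs once, after the loop ends.
print(total_letters)
```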

## Combining loops and conditionals

So, it turns out that our new friends `if` and `for` are already friends! They get along really well with one another. 

For instance:

In [None]:
for name in instructors:
    print(f"Your name is {name}.")
    if name == "Karen":
        print("You are standing at the front of the room.")
    elif name == "Nat":
        print("You are sitting at the front of the room.")
    elif name == "Mitchell" or name == "Sarah" or name == "Kevin":
        print("You are sitting at the back of the room.")

Notice, in the above, how indentation and colons work to indicate how everything fits together, keeping everything neatly **nested** like a matryoshka doll.

![Matryoshka dolls](matryoshki.jpg)

```
On the outside is the for loop:
    Inside that is a print() function.
    Then there is an if statement:
        Which contains a print() function.
    Then there is an elif statement:
        Which contains another print() function.
    Then there is yet another elif statement:
        Which contains yet another print() function.
    And then we go back to the start of the for loop, for as long as there are items to iterate over.
And then when there are no items left to iterate over, the for loop is done, and we are outside it.
```
On your own time, play around with the levels of indentation, breaking the logical, nested structure in multiple ways and then bringing it back to life!

How could we combine our big distance-related `if` statement above,

```
if distance > 50:
    print("Wow, your bike ride was really long today!")
elif distance < 5:
    print("That's a good start. Can you go farther tomorrow?")
else:
    print("That's a solid ride.")
```

with a for loop that iterates over a list of distances?



In [None]:
# Here is a space for you to write this.
distances = [4, 63.5, 22]
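
Once you have tried it yourself, here is one possible answer, offered as a sketch rather than the only correct solution. The whole `if`/`elif`/`else` block moves inside the loop, indented one level, so it runs once per distance. The `message` variable is an addition for this sketch.

```python
distances = [4, 63.5, 22]

for distance in distances:
    # Choose a message for this distance, then print it.
    if distance > 50:
        message = "Wow, your bike ride was really long today!"
    elif distance < 5:
        message = "That's a good start. Can you go farther tomorrow?"
    else:
        message = "That's a solid ride."
    print(message)
```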

# 3. Using Loops and Iteration to Calculate Types

Okay, let's work together to think through what we would actually need to do in order to calculate the number of unique words or **types** in a text file that we've created.

* First we would need to load the text into a string.
* Then we would need to break it up into a list of words.
* Then we would need to go through that list of words and, at every step, determine if we've already met that word before. If it's a new word, we would store it in a new list of unique words.
* When we're done, we need to count how many words are in that new list of unique words.

We already have pretty much all the tools we need to do this. 

## The `in` operator — and `not`

`in` checks whether a particular item is in a particular list.

It can be combined with `not` — cousin to `and` and `or`, which we met above — to check if a particular item is **absent from** a particular list.
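
With a list, `in` looks for an item that matches exactly, so capitalization and attached punctuation both matter. With a string, by contrast, `in` checks for a substring. A short sketch using the first sentence of the example text, stored under the made-up names `sentence` and `sentence_words`:

```python
sentence = "Poetry is full of data."
sentence_words = sentence.split()   # ['Poetry', 'is', 'full', 'of', 'data.']

in_list_bare = "data" in sentence_words        # False: the item is 'data.' with a period
in_list_exact = "data." in sentence_words      # True: exact match
in_list_lower = "poetry" in sentence_words     # False: this sentence only has 'Poetry'
in_string = "data" in sentence                 # True: substring check on the string
absent = "prosody" not in sentence_words       # True: the word is not in the list

print(in_list_bare, in_list_exact, in_list_lower, in_string, absent)
```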

In [None]:
print(instructors)

In [None]:
"Karen" in instructors

In [None]:
"Rachel" in instructors

In [None]:
"Karen" not in instructors

In [None]:
"Kevin" not in instructors

In [None]:
text_words

In [None]:
"poetry" in text_words

## A few list methods and we're done :)

We need one new list method to finish our task... but we may as well use this as an opportunity to learn about a few other list methods, since they will come in handy down the line.

* `list.append(another_item)`: adds a new item (for example, a `str`, `int`, `float`, or `bool`) to the end of the list
* `list.extend(another_list)`: adds each item from another_list (typically a `list`) to the end of the list
* `list.remove(item)`: removes first instance of item from the list
* `list.sort()`: sorts the list alphabetically (for reverse alphabetical order, use `list.sort(reverse=True)`)
* `list.reverse()`: reverses current order of list

In [None]:
instructors = ["Nat", "Karen", "Kevin"]
print(instructors)

In [None]:
instructors.reverse()
print(instructors)

Note that unlike the string methods we met last time, these are all **mutating methods** (more awesome Python terminology!!) meaning they don't just spew things out — they actually go into the variable and change its contents.
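
A short sketch of the other methods in the list above, using a made-up `fruits` list. Because these methods change the list in place, they return `None`, so assigning their result to a variable is a common slip:

```python
fruits = ["pear", "apple", "fig"]   # hypothetical list for illustration

fruits.extend(["apple", "plum"])    # add every item from another list
fruits.remove("apple")              # remove only the first "apple"
fruits.sort()                       # sort in place, alphabetically
print(fruits)

result = fruits.sort()              # the method returns None, not the list
print(result)
```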

The list method we're interested in right now is `list.append(another_item)`.

In [None]:
print(instructors)
instructors.append("Mitchell")
print(instructors)

Note what happens if we run the above cell multiple times.

This is an important point about Jupyter Notebooks: **when cells are run multiple times, they can yield different results.**

Imagine a scenario in which your homework gets the results you want... but only if someone runs a particular cell multiple times (which they won't know to do). Since the autograding software only runs each cell once, this would yield an "incorrect" evaluation in an autograding situation.

**To make sure your code runs correctly without depending on particular cells running more than once, you should regularly "Run All Cells"** (under the Cell menu).
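
One way to make an appending cell safe to rerun, sketched here with the `not in` check from earlier, is to append only when the item is absent. The two identical `if` blocks stand in for running the same cell twice, and the `team` list is made up for this illustration.

```python
team = ["Nat", "Karen", "Kevin"]    # stand-in list for illustration

# First "run" of the cell: "Mitchell" is absent, so it is appended.
if "Mitchell" not in team:
    team.append("Mitchell")

# Second "run" of the same cell: "Mitchell" is present, so nothing changes.
if "Mitchell" not in team:
    team.append("Mitchell")

print(team)
```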

Let's try using the `list.append()` method in a `for` loop.

Let's make a little loop that goes through a string and, for each of its letters, adds it to an empty list.

In [None]:
word = "plenipotentiary"

new_list = []

for letter in word:
    new_list.append(letter)

In [None]:
new_list

Now let's stick a conditional inside a loop, *and* use the `list.append()` method within that conditional statement. 

### **This will stack together all the skills we need to do today's task!**

Let's look through every word from our `text_words` variable (a line from *Poetry’s Data: Digital Humanities and the History of Prosody*, split up into words) and store all the ones that begin with a vowel in a new list variable called `vowel_words`. (The internet informs me that the "sometimes Y" very seldom applies to Ys at the beginning of words...)

In [None]:
vowel_words = []

for word in text_words:
    if word[0] == "a" or word[0] == "e" or word[0] == "i" or word[0] == "o" or word[0] == "u":
        vowel_words.append(word)

In [None]:
print(vowel_words)
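
The long chain of `or` comparisons can also be written with `in`, since `in` works on strings too. The sketch below additionally lowercases the first letter so that capitalized words are caught. The `sample_words` list and the other names are made up for this illustration.

```python
sample_words = ["Apple", "pie", "is", "an", "Old", "favourite"]  # hypothetical

sample_vowel_words = []

for word in sample_words:
    first_letter = word[0].lower()      # compare in lowercase
    if first_letter in "aeiou":         # is it one of these five characters?
        sample_vowel_words.append(word)

print(sample_vowel_words)
```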

## Okay, now we're ready to calculate the number of unique words in `text_words`

Read through the code below and try to figure out what every line does.

In [None]:
unique_words = []

for word in text_words:
    if word not in unique_words:
        unique_words.append(word)

In [None]:
unique_words
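
As a cross-check, Python's built-in `set` type (not otherwise covered in this lesson) keeps only one copy of each item, so the size of a set built from the word list should equal the length of `unique_words`. A sketch that repeats the cells above so that it runs on its own:

```python
text = "Poetry is full of data. We read poetry informed by principles that we accept based on how we have been trained to read, speak, and interpret."
text_words = text.split()

unique_words = []
for word in text_words:
    if word not in unique_words:
        unique_words.append(word)

# A set discards duplicates automatically, but does not preserve word order.
print(len(unique_words))
print(len(set(text_words)))
```

The two numbers should match. The loop has the advantage of keeping words in the order they first appear, while a set is typically much faster on long texts.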

## Now we're ready to calculate a type-token ratio!

In [None]:
len(text_words)

In [None]:
len(unique_words)

In [None]:
(len(unique_words) / len(text_words)) * 100

Note that I wrapped (types / tokens) in `()` to make sure that the "order of operations" is calculated correctly. It doesn't actually matter in this case — but might as well get used to it!
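
A tiny worked example may help fix the idea. In the made-up phrase below there are five tokens and four types (`the` occurs twice), so the ratio is 4 / 5, or 80 percent. All variable names here are hypothetical.

```python
phrase_words = "the cat and the dog".split()   # hypothetical example phrase

phrase_types = []
for word in phrase_words:
    if word not in phrase_types:
        phrase_types.append(word)

n_tokens = len(phrase_words)    # every word counts, including repeats
n_types = len(phrase_types)     # each distinct word counts once
ttr = (n_types / n_tokens) * 100
print(ttr)
```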

## Shall we give this a try with an actual text??

In [None]:
sot4 = open("sign-of-four.txt", encoding="utf-8").read()

In [None]:
sot4[:20]

In [None]:
sot4_words = sot4.split()

In [None]:
sot4_words[:20]

In [None]:
sot4_unique_words = []

for word in sot4_words:
    if word not in sot4_unique_words:
        sot4_unique_words.append(word)

In [None]:
sot4_unique_words[:20]

In [None]:
sot4_ttr = len(sot4_unique_words) / len(sot4_words) * 100
print(sot4_ttr)

Let's have a peek inside our `sot4_unique_words` variable to see how well we're doing in finding unique words. Let's apply the `list.sort()` method to make our list more legible. (We'll lose word order, but that's okay in this case!)

In [None]:
sot4_unique_words.sort()

In [None]:
sot4_unique_words[:50]

In [None]:
sot4_unique_words[-50:]

How could we improve our list of unique words?
* remove punctuation
* remove capitalization

The former is tricky, but we already know how to do the latter. How could we add that to the for loop that looks for the number of unique words?

In [None]:
sot4_unique_words = []

for word in sot4_words:
    word = word.lower()
    if word not in sot4_unique_words:
        sot4_unique_words.append(word)

In [None]:
sot4_unique_words[:50]

In [None]:
sot4_ttr_lowered = (len(sot4_unique_words) / len(sot4_words)) * 100

Let's compare our two TTR results: `sot4_ttr` (capitalization present) and `sot4_ttr_lowered` (capitalization removed). Which do you think will be higher? Why? How much do you expect the two numbers to differ?

In [None]:
print(sot4_ttr)
print(sot4_ttr_lowered)
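
As for the trickier improvement, removing punctuation, one possible approach (an assumption here, not something this lesson has covered) is the string method `str.strip()` together with `string.punctuation` from the standard library, which trims punctuation from both ends of each word. The `sample_words` list is made up for this illustration.

```python
import string

# Hypothetical words, as they might come out of text.split().
sample_words = ["Holmes,", "holmes", '"Watson!"', "well-known", "Holmes."]

cleaned_unique_words = []
for word in sample_words:
    # Lowercase, then trim punctuation from the start and end only.
    word = word.lower().strip(string.punctuation)
    if word not in cleaned_unique_words:
        cleaned_unique_words.append(word)

print(cleaned_unique_words)
```

This leaves internal punctuation such as hyphens and apostrophes alone, and `string.punctuation` covers only ASCII characters, so typographic marks such as curly quotation marks would need separate handling.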