In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("lab09.ipynb")

<img src="data6.png" style="width: 15%; float: right; padding: 1%; margin-right: 2%;"/>

# Lab 9 – Functions, Control, and Iteration

## Introduction to Computational Thinking with Data Science and Society

Welcome to Lab 9! This week we will be covering boolean operators, functions, **control**, **iteration**, and **strings**.

In [None]:
# Just run this cell
from datascience import *
import numpy as np
import warnings
warnings.simplefilter('ignore')

import seaborn as sns
import matplotlib.pyplot as plt
plt.style.use("fivethirtyeight")
%matplotlib inline

<hr style="border: 1px solid #fdb515;" />

# [Tutorial] Part 1: Boolean Expressions

---

## [Tutorial] Comparison Operators

In Python, the boolean is a data type with only two possible values: `True` and `False`. Expressions containing comparison operators such as `<` (less than), `>` (greater than), and `==` (equal to) evaluate to Boolean values. A list of common comparison operators:

| Comparison | Operator | True Example | False Example |
| --- | --- | --- | --- |
| Less than | `<` | `2 < 3` | `2 < 2` |
| Greater than | `>` | `3 > 2` | `3 > 3` |
| Less than or equal | `<=` | `2 <= 2` | `3 <= 2` |
| Greater than or equal | `>=` | `3 >= 3` | `2 >= 3` |
| Equal | `==` | `3 == 3` | `3 == 2` |
| Not equal | `!=` | `3 != 2` | `2 != 2` |

Run the following series of cells to see some examples of these comparison operators in practice. For each cell, make sure you understand the cell output before continuing.

In [None]:
1 < 2

In [None]:
min(20, 30) <= max(-10, 20)

In [None]:
make_array(1, 2, 3, 4) != 3

In [None]:
"abc" < "def"

Note that `==` and `=` are **not** the same!
    
>`=` is used for **assignment**, i.e., to assign a Python name to a value.<br/>On the other hand, `==` is used as the **comparison operator** to compare two values. 

In [None]:
result = (3 + 4) == (3 * 4)
result

In [None]:
# Set the name 'a' to have value 5
a = 5

In [None]:
# Ask if it is equal to the value you think it should be
a == 5

---

## [Tutorial] Membership Operators: `in`
The keyword `in` allows you to check for membership, i.e., whether an element is contained in a larger sequence of elements.

As one use case, `in` can be used to check for **substrings**, i.e., strings that are contained in larger strings:

In [None]:
"hello" in "hello world"

In [None]:
"i" in "rhythm"

In another use case, `in` can also check whether an array *contains* a certain element:

In [None]:
my_array = make_array(3,2,5,6)
5 in my_array

In [None]:
9 in my_array

---

## [Tutorial] Compound expressions with Boolean operators `and`, `or`, and `not`

`and`, `or`, and `not` are boolean operators that operate directly on booleans, creating **compound** boolean expressions. For two booleans `x` and `y`:

* `x and y`: Evaluates to `True` when both operands are `True`; otherwise, it evaluates to `False`.
* `x or y`: Evaluates to `False` when both operands are `False`; otherwise, it evaluates to `True`.
* `not x`: Evaluates to the other boolean, i.e., "flips" the value.

| `x` | `y` | `x and y` | `x or y` | `not x` |
| --- | --- | --- | --- | --- |
| `False` | `False` | `False` | `False` | `True` | 
| `False` | `True` | `False` | `True` | `True` |
| `True` | `False` | `False` | `True` | `False` | 
| `True` | `True` | `True` | `True` | `False` |
    
These three operators are useful when writing functions because they help control a function's logic. You will find them useful when writing more complicated functions in the course.
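As a quick check of the table above, the sketch below walks through every combination of `x` and `y` and prints each compound expression. The `for` loops are only a convenience for visiting all four rows, and the name `rows` is illustrative.

```python
# Rebuild the truth table for `and`, `or`, and `not`, one row per combination.
rows = []
for x in (False, True):
    for y in (False, True):
        row = (x, y, x and y, x or y, not x)
        rows.append(row)
        print(row)
```

Each printed row should match the corresponding row of the table.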

**Tip**: Use parentheses to ensure your boolean operators are being applied correctly and without ambiguity, as below:

In [None]:
(3 * 4 == 12) or (3 + 4 == 50)

In [None]:
(3 * 4 == 12) and (3 + 4 == 50)

In [None]:
not (3 + 4 == 50)

<hr style="border: 1px solid #fdb515;" />

# Part 2: Functions

Writing functions is useful when we want to reuse the same code without writing it out every time. Defining a function does not execute the code in its body; the body runs only when the function is called, at which point it operates on the arguments passed in.

Below is an example of a Python function that doubles its input:

In [None]:
# just run this cell
def double(x):
    return 2 * x

Let's break down the different parts of this function:
- The `def` at the beginning tells Python that we are making a new function.
- The blue `double` text is the name of our function. If we want to use this function, we will call it using this name.
- The `x` inside the parentheses is the input to the function, also known as a **parameter**. When we call the function `double`, we pass in different **arguments** to the parameter `x`. A function can have many parameters (even zero), as long as they have different names.
- The `:` indicates we are done defining our function **signature**.
- All lines after this `def` line are the **function body**; these lines are **indented**.
    - You can have as many lines as you want in your functions, but in this case we only have 1.
- The `return` keyword indicates the output of the function, also known as the **return value**. In our case, we want to multiply the parameter `x` by 2 before returning.
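
To illustrate the point about the number of parameters, here is a small sketch with one function that takes zero parameters and one that takes two. Both functions (`lab_number` and `rectangle_area`) are hypothetical and are not used elsewhere in this lab.

```python
# Hypothetical examples; these names are not part of the lab's exercises.
def lab_number():
    # Zero parameters: the function returns the same value on every call.
    return 9

def rectangle_area(width, height):
    # Two parameters with different names; arguments are matched by position.
    return width * height

lab = lab_number()           # called with no arguments
area = rectangle_area(3, 4)  # 3 is passed to width, 4 to height
print(lab, area)
```

Note that a function with zero parameters still needs the parentheses, both in its definition and when it is called.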

In [None]:
# just run this cell
answer = double(5)
answer

---

## Question 2.1

Write code that defines the `satisfies` function, which returns `True` if the parameter `x` satisfies *at least one* of the following conditions:
1. Has a value between 24 and 32 (inclusive)
2. Is even

Otherwise, if `x` satisfies neither condition, `satisfies` returns `False`. Some examples:

```python
satisfies(27) # True
satisfies(64) # True
satisfies(32) # True
satisfies(33) # False
```

*Hint*: You should try using the modulo operator (%) to help check if a value is odd or even! The `%` operator will provide you with the *remainder* after dividing by a certain number.
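
As a brief illustration of how `%` behaves, the sketch below computes a few remainders. The numbers are arbitrary and chosen only for demonstration.

```python
# The remainder left over after division (values are illustrative only).
r1 = 7 % 3    # 7 = 2 * 3 + 1, so the remainder is 1
r2 = 9 % 3    # 9 = 3 * 3 + 0, so the remainder is 0
r3 = 14 % 5   # 14 = 2 * 5 + 4, so the remainder is 4
print(r1, r2, r3)
```

A remainder of 0 means the first number is exactly divisible by the second.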

Feel free to adjust the example value `val` below to test your function.

In [None]:
def satisfies(x):
    ...

val = 33
satisfies(val)

In [None]:
grader.check("q2_1")

## Scope

We can also use values that are assigned **outside** of the function, including other pre-existing functions or named values, **inside** a new function that we define. We know this through our current use of Jupyter notebooks: We can run cells sequentially and access values assigned to names in previously-executed cells.

For example, the name `outside_value` can be accessed and read inside the function `greeting`:

In [None]:
outside_value = "Hello, "

def greeting(place):
    return outside_value + place

In [None]:
greeting("World!")

However, any names assigned **inside** a function cannot be used **outside** the function; this is because these names only exist in the **local scope** of the function, while it is executing.
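
Here is a minimal sketch of that behavior, using a hypothetical function `area_of_square`. The `try`/`except` statement is not something this lab covers; it is used here only so that the expected error is caught instead of stopping the notebook.

```python
def area_of_square(side):
    # `squared` is a local name: it exists only while the function runs.
    squared = side * side
    return squared

result = area_of_square(3)

# Reading `squared` out here fails because the name was never assigned
# outside the function. The error is caught so the cell keeps running.
try:
    squared
    message = "squared is defined out here"
except NameError:
    message = "NameError: squared is not defined out here"

print(result)
print(message)
```

The return value is available outside the function because it was assigned to the name `result`; the local name `squared` is not.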

The locally named `current_year` is only accessible inside the function `age`, and inaccessible otherwise:

In [None]:
def age(birth_year):
    """ This is called a docstring. This explains what the function does:
    Return the age of a person, given their birth year."""
    current_year = 2025
    return current_year - birth_year

In [None]:
age(2004)

In [None]:
# Uncomment the line below to see the error. Re-comment it if you want to run all cells.
# current_year

---

## Question 2.2

Complete the function `triple_double` so that it uses the function `double` defined earlier to double its input, then triples that doubled value and returns the result.

For example: if we called `triple_double(2)`, the input 2 should be doubled to 4, and then tripled to 12. The function would then output 12.  


In [None]:
def triple_double(x):
    ...

In [None]:
grader.check("q2_2")

<hr style="border: 1px solid #fdb515;" />

# Part 3: Conditional Statements

When writing functions in Python, we may want the function to behave differently depending on the input, a concept in computer science called **control**. We could write several similar functions to accomplish this, but that would require copying and pasting much of the same code over and over, making only small changes. Instead, we need a way to tell a single function to execute different code for different inputs.

A conditional statement is a multi-line statement that allows Python to choose among different alternatives based on the truth value of an expression.

Here is a basic example:

In [None]:
# just run this cell
def sign(x):
    if x > 0:
        return 'Positive'
    else:
        return 'Negative'

If the input `x` is greater than `0`, we return the string `'Positive'`. Otherwise, we return `'Negative'`.

In [None]:
sign(3)

In [None]:
sign(-2)

If we want to test multiple conditions at once, we use the following general format.

```python
if <if expression>:
    <if body>
elif <elif expression 0>:
    <elif body 0>
elif <elif expression 1>:
    <elif body 1>
...
else:
    <else body>
```
        
Only the body of the first conditional expression that is true will be executed. Each `if` and `elif` expression is evaluated in order, starting at the top. An `elif` clause can only be used if an `if` clause precedes it. As soon as a true value is found, the corresponding body is executed, and the rest of the conditional statement is skipped. If none of the `if` or `elif` expressions are true, then the `else` body is executed.
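
The ordering rule can be seen in a short sketch. The function `describe` below is hypothetical; it is written so that some inputs make more than one condition true, which shows that only the first matching body runs.

```python
# Hypothetical function: classify a number using if / elif / else.
def describe(x):
    if x > 100:
        return 'Large'
    elif x > 0:
        # Reached only when the first test was false, so 0 < x <= 100 here.
        return 'Positive'
    elif x == 0:
        return 'Zero'
    else:
        # None of the conditions above were true, so x is negative.
        return 'Negative'

labels = [describe(500), describe(7), describe(0), describe(-2)]
print(labels)
```

For the input `500`, both `x > 100` and `x > 0` are true, but the function returns `'Large'` because that condition is checked first.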

---

## Question 3.1(a)

Water has a different state of matter at different temperatures:

<img src='water.jpeg' width=300>

For a given temperature (in Fahrenheit), the function `state_of_matter` should return:
* `'ice'`, the solid state of water, below 32°F
* `'water'`, the liquid state of water, between 32°F and 212°F, inclusive
* `'steam'`, the gaseous state of water, above 212°F

Implement the function `state_of_matter` below:

In [None]:
def state_of_matter(temperature):
    if ...:
        ...
    elif ...:
        ...
    else:
        ...

In [None]:
grader.check("q3_1_a")

We can call the function on different inputs, and it handles each input differently based on the control logic you gave it. This becomes important when you write functions for large, sometimes unpredictable datasets, where your function must handle inputs you did not specifically anticipate.

Let's use this `state_of_matter` function you defined above to determine the state of water in the following locations during winter:

In [None]:
# just run this cell
city_temps = Table().with_columns(
    "City", make_array("Berkeley", "New York", "Miami", "Earth's Core"),
    "Temperature", make_array(58, 24, 78, 10800)
)
city_temps

---

## Question 3.1(b): `apply`

Create a table `city_temps_water`, a new table that copies the two columns of `city_temps` and adds a third column, `State of Matter`, with the state of water in each location.

_Hint_: Consider using the `apply` method, which returns an array.

In [None]:
city_temps_water = ...
city_temps_water

In [None]:
grader.check("q3_1_b")

---
## Question 3.2: Hailstone

The Hailstone sequence from Hofstadter is as follows:
1. Pick a positive integer n as the start value.
2. If n is even, divide it by 2.
3. If n is odd, multiply it by 3 and add 1.
4. If you continue this process, n will eventually reach 1 (this has held for every starting value tested so far, though it has never been proven in general).

Write a function that uses if/else statements to do **one step** of the hailstone sequence. It should take in an integer n and then return the corresponding Hailstone number for that value. For example, `hailstone(10)` should return 5 and `hailstone(9)` should return 28. 


In [None]:
def hailstone(n):
    if ...:
        ...
    else:
        ...

In [None]:
grader.check("q3_2")

In [None]:
# Run this cell to see the Hailstone sequence in action!
print("Starting value: 10")
n1 = hailstone(10)
print("Step 1:", n1)
n2 = hailstone(n1)
print("Step 2:", n2)
n3 = hailstone(n2)
print("Step 3:", n3)
n4 = hailstone(n3)
print("Step 4:", n4)
n5 = hailstone(n4)
print("Step 5:", n5)
n6 = hailstone(n5)
print("Step 6:", n6)

---

## Question 3.3: XOR

The XOR operation on two boolean values is an "exclusive or." The function `xor(x, y)` returns:
* `True` only if exactly one of `x` or `y` is `True` (or truthy).
* `False` otherwise.

Some examples (note that the second-to-last example differentiates an XOR from the `or` operator):

| `x` | `y` | `xor(x, y)` |
| --- | --- | --- |
| `False` | `False` | `False` |
| `False` | `True` | `True` |
| `True` | `False` | `True` | 
| `True` | `True` | `False` |
| `None` | `3` | `True` | 
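
The last row of the table relies on truthiness: values that are not booleans can still be treated as true or false. As a short sketch, the built-in `bool` function shows how Python treats a few common values (the values are chosen only for illustration).

```python
# bool(...) reports whether Python treats a value as true or false.
print(bool(None))    # False
print(bool(0))       # False
print(bool(""))      # False: the empty string is falsy
print(bool(3))       # True: any nonzero number is truthy
print(bool("abc"))   # True: any non-empty string is truthy

truthiness = (bool(None), bool(0), bool(""), bool(3), bool("abc"))
```

So in the last row, `None` counts as false and `3` counts as true.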

Implement the `xor` function below.
* As a first pass, try using conditional statements.
* As a challenge, try using only parentheses and boolean operators `and`, `or`, and `not`!

In [None]:
def xor(x, y):
    ...

In [None]:
grader.check("q3_3")

_Comment_: XOR is a classic paradigm in computer science and hardware design, but implementing it requires a solid understanding of boolean operations. You should feel very proud of yourself for successfully implementing this core function! :-)

<hr style="border: 1px solid #fdb515;" />

# Part 4: Apply-Filter-Drop

Until now, we have filtered table rows using the `are` predicates that the `datascience` package provides for `where`: `are.equal_to`, `are.above`, `are.containing`, and so on. Now that we know how to define our own functions, we can create our own custom row filters with the following multi-step process (we'll call it the unoriginal **"apply-filter-drop"** approach):

1. Define a function that returns a boolean value based on a row's values.
1. `apply` this function to create a new (but temporary) boolean column of `True`s and `False`s.
1. Filter (use `where`) rows based on this new Boolean column.
1. `drop` the temporary boolean column so that our resulting table has the original table's columns.

The resulting table will have the original table's columns but will have only the rows that satisfy the given filtering condition. This is very likely the **algorithm** that the original `datascience` library creators used to implement the `are` predicates...!

In this part, we will:

1. See a tutorial that implements the same filtering approach with multiple strategies, including some advanced method chaining; and
1. Implement our own custom filter.

We will use the classic [iris flower dataset](https://en.wikipedia.org/wiki/Iris_flower_data_set). The `iris` dataset measures flowers of three different species of the iris plant. The petal and sepal are different parts of a flower. Each row in the `iris` table records one flower's species, its petal dimensions, and its sepal dimensions.

In [None]:
# just run this cell
iris = Table.read_table("iris.csv")
iris

---

## [Tutorial] Method-chaining strategies

There are three different species of iris represented in the `iris` table: _iris versicolor_, _iris setosa_, and _iris virginica_. The below cells are all ways to filter out the _setosa_ species, keeping only _versicolor_ and _virginica_ flowers.

For each approach below, verify that you understand the code procedures being executed. **It may take you several passes through the code; we encourage you to discuss with a partner.** _You may find it useful to comment out different parts of the chain to view temporary tables!_

### Approach 1

In [None]:
approach1 = iris.where("species", are.contained_in(["versicolor", "virginica"]))
approach1.group("species")

### Approach 2

In [None]:
def either_species(species):
    return species == "versicolor" or species == "virginica"

approach2 = (iris.with_columns("temp_col", iris.apply(either_species, "species"))
             .where("temp_col", True)
             .drop("temp_col")
            )
approach2.group("species")

### Approach 3

In [None]:
def is_setosa(species):
    return species == "setosa"

approach3 = (iris.with_columns("temp_col", iris.apply(is_setosa, "species"))
             .where("temp_col", False)
             .drop("temp_col")
            )
approach3.group("species")

**Discuss 1**, no response needed: Which approach do you prefer? Why?

**Discuss 2**, no response needed: Why would method chaining two `where` calls *not* achieve our desired filter?

```python
# the below doesn't work!
iris.where("species", are.equal_to("versicolor")).where("species", are.equal_to("virginica"))
```

---

The _setosa_ species of iris has the smallest petal and sepal lengths of the three species in the `iris` dataset:

<img src='iris.png' width=800>

In this part, you will find the flowers in `iris` that have above-average petal length **or** sepal length.

---

## Question 4.1

First, compute the mean petal length and the mean sepal length across all flowers in the `iris` table. Assign these values to `mean_petal_length` and `mean_sepal_length`, respectively.

In [None]:
mean_petal_length = ...
mean_sepal_length = ...

# these lines are for display
print("mean petal length", mean_petal_length)
print("mean sepal length", mean_sepal_length)

In [None]:
grader.check("q4_1")

---

## Question 4.2

Next, assign `big_flowers` to a table with only the flowers in `iris` that have above-average petal length _**or**_ above-average sepal length. "Above average" is defined as _at least as large as_ the mean value. Your new table `big_flowers` should have the same columns as the original `iris` table but will likely have fewer rows.

_Hints_:
* You may want to implement the custom function `is_big_flower` and follow our apply-filter-drop approach.
* The custom function `is_big_flower` might need to take multiple arguments. Consider how this changes your function signature **and** the number of arguments you pass to `apply`.
* The `mean_petal_length` and `mean_sepal_length` you defined above are accessible within the scope of `is_big_flower`, so you don't have to recompute these values.

In [None]:
def is_big_flower(...):
    ...

big_flowers = ...


big_flowers

In [None]:
grader.check("q4_2")

Analysis: When we look at what these bigger irises are, none are _setosa_; they're all _versicolor_ or _virginica_!

In [None]:
# just run this cell
big_flowers.scatter("petal_length", "sepal_length", group="species")

# Done! 😇

## Pet of the Day

You too can be a hero! **Hero** just wanted to let you know!!!

<img src="hero.jpeg" width="50%" alt="Cute dog named hero"/>

---

To double-check your work, the cell below will rerun all of the autograder tests.

In [None]:
grader.check_all()

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export(pdf=False)