# Control Flow and Functions

* * * 

### Icons Used In This Notebook
🔔 **Question**: A quick question to help you understand what's going on.<br>
🥊 **Challenge**: Interactive exercise. We'll work through these in the workshop!<br>
💭 **Reflection**: Helping you think about programming.<br>
⚠️ **Warning**: Heads-up about tricky stuff or common mistakes.<br>
💡 **Tip**: How to do something a bit more efficiently or effectively.<br>
🎬 **Demo**: Showing off something more advanced – so you know what Python can be used for!<br>

### Learning Objectives
1. [Reflection: Vocabulary Recap](#recap) 
2. [Loops](#loops)
3. [Conditionals](#cond)
4. [Functions and Arguments](#func)

<a id='recap'></a>

# 💭 Reflection: Vocabulary Recap

**Variables** are names attached to particular values.
   * To create a variable, you assign it a value and then start using it.
   * Assignment is done with a single equals sign `=`.
   * When we write `n = 300`, we are assigning 300 to the variable `n` via the assignment operator `=`.

**Statements** are instructions that a Python interpreter can execute.
   * Simple statements usually end with a newline, so in most cases each line of Python code is one statement.
   * For instance, `print('Hello')` is a statement; `x = 20` is a statement; and `x + y + z` is a statement. 
    
**Functions** perform actions on "things".
   * `print()`, `len()`, and `type()` are some of the most commonly used functions.
   * You can identify a function because of its trailing round parentheses.  

**Arguments** are the "things" we perform the action on within a function.
   * They can be variables, datasets, or even other functions!
   * Arguments go inside the trailing parentheses of functions when we call them.
   * Arguments are also called inputs or parameters.

**Methods** are type-specific functions.
   * Different data types and structures have functions that only apply to them.
   * For instance, strings have methods that only apply to them (lowercasing, uppercasing, etc.) that won't work with other data types.
   * Methods are accessed using dot notation – e.g. `some_string.lower()`.
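To tie these terms together, here is a minimal sketch that uses each of them once. The variable names (`city`, `n_chars`, `shouted`) are made up for illustration.

```python
# Variable: the name `city` is attached to the string value 'Berkeley'
city = 'Berkeley'

# Function call: len() is the function, `city` is the argument
n_chars = len(city)

# Method: .upper() belongs to strings and is reached with dot notation
shouted = city.upper()

print(n_chars)   # 8
print(shouted)   # BERKELEY
```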

Check out our [Python glossary](https://github.com/dlab-berkeley/Python-Fundamentals/blob/main/glossary.md) for definitions of other key vocabulary.

<a id='loops'></a>

# Loops

Let's say we have three tire pressure measurements: `40.9`, `35.2`, and `28.4`. We want to round each measurement to a whole number. How would we do this using the tools available to us?

Here's one way:

In [None]:
tire1 = 40.9
tire2 = 35.2
tire3 = 28.4

print(round(tire1))
print(round(tire2))
print(round(tire3))

Here's another:

In [None]:
# We could also use a list
tires = [40.9, 35.2, 28.4]
print(round(tires[0]), round(tires[1]), round(tires[2]))

🔔 **Question:** Which method do you prefer?

Let's say that we have 1000 tire pressures in a list. Using either method would require rounding each value separately.

Our current approach is not particularly scalable. It's also not very flexible. For example, what if you want to round every tire pressure to two decimal places?

## Loops for Repeated Computation

The strength of using computers is their speed. We can leverage this by facilitating repeated computation with **loops**. In programming, there are generally two kinds of loops: for loops and while loops. 

A **[for loop](https://www.w3schools.com/python/python_for_loops.asp)** executes some statements once *for* each value in an iterable (like a list or a string). It says: "*for* each thing in this group, *do* these operations".

A **[while loop](https://www.w3schools.com/python/python_while_loops.asp)** says: "*while* Condition A is true, *do* these operations".  We don't use these loops frequently in this type of programming so we won't cover them here.

Let's take a look at the syntax of a for loop using the above example:

In [None]:
# We use a variable containing a list with the values to be iterated through
tires = [40.9, 35.2, 28.4]

# Initialize the loop
for pressure in tires:
    print(round(pressure))

# This will only be printed when the loop has ended!
print('The loop has ended.')

Note that the above example is pretty easy to read:

"**for** each pressure **in** tires, print out the rounded pressure".

This only works if you choose meaningful names for your variables, though!

## Loop Syntax

Let's break down the syntax of the `for` loop more closely.

*   The colon at the end of the first line signals the start of a *block* of statements.
*   The indented line(s) following the colon indicate the lines to run as a part of the loop (also known as the body).
*   Unindented lines following the loop will execute **after** all iterations of the for loop are complete.
*   `for loop_variable in collection:` The loop variable is what gets plugged into the calculations in the body of the loop, and the collection is the group of values being looped through.
*   Loop variables:
    *   Are created on demand.
    *   Can have any name (though your code is more readable if these names are meaningful!).
    *   Act as placeholders for the loop.
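The pieces listed above can be seen together in a small annotated sketch. The names `collection`, `value`, and `doubled` are placeholders chosen for this illustration.

```python
collection = [2, 3, 5]          # the collection being looped through

for value in collection:        # `value` is the loop variable; note the colon
    doubled = value * 2         # indented lines form the body of the loop
    print(value, 'doubled is', doubled)

# Unindented, so this runs once, after the last iteration
print('Done. The loop variable still holds the last value:', value)
```

Note that the loop variable is not deleted when the loop ends: it keeps the last value it was given.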

## 🥊 Challenge: Fixing Loop Syntax

The following block of code contains **three errors** that are preventing it from running properly. What are the errors? How would you fix them?

In [None]:
for number in [2, 3, 5]
print(n)


## Loops with Strings, Series, and `range`

A `for` loop can run over any iterable. An **iterable** is any object that can hand back its items one at a time, such as a list or a string. Generally, anything that can be indexed (e.g. accessed with `values[i]`) is an iterable.

For example, a string is iterable, so it is possible to loop through a string!

Let's take a look at an example:

In [None]:
example_string = 'yosemite'

for char in example_string:
    # Use the upper() method on char
    print(char.upper())
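The built-in `range()` function is another iterable that is often used with loops. The following is a minimal sketch of how it behaves; the variable `indices` is only there to show the values that `range(3)` produces.

```python
# range(3) yields the integers 0, 1, 2 (the stop value is excluded)
for i in range(3):
    print(i)

# range() is handy for looping over the positions in a sequence
example_string = 'yosemite'
for i in range(len(example_string)):
    print(i, example_string[i])

indices = list(range(3))   # [0, 1, 2]
```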

## 🥊 Challenge: Looping Through a Series

We can loop through a `pandas` Series just like we can through a list.

Let's do this for the following DataFrame, which contains a bunch of mountains and their elevation in feet. Proceed as follows:

1. Extract the column `elevation` as a Series.
2. Loop through the series using a `for` loop.
3. Convert each value to meters using the conversion: 1 ft = 0.3048 m.
4. Print the results. 

In [None]:
import pandas as pd

mountains_df = pd.DataFrame(
    {'mountain': ['Mt. Whitney',
                  'Mt. Williamson',
                  'White Mountain Peak',
                  'North Palisade',
                  'Mt. Shasta',
                  'Mt. Humphreys'],
     'range': ['Sierra Nevada',
               'Sierra Nevada',
               'White Mountains',
               'Sierra Nevada',
               'Cascade Range',
               'Sierra Nevada'],
     'elevation': [14505, 14379, 14252, 14248, 14179, 13992]}
)

mountains_df

In [None]:
# YOUR CODE HERE


💡 **Tip**: Using for-loops in Pandas is generally not advisable. We can loop over data much more efficiently using something called [**vectorization**](https://www.geeksforgeeks.org/vectorized-operations-in-numpy/). We can tell Pandas to multiply the whole column, and it will multiply each element individually! 

In [None]:
mountains_df.elevation * 0.3048
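To see that a vectorized operation gives the same result as a loop, here is a small sketch on separate, made-up data. The Series `temps_c` and its values are hypothetical and only serve as an illustration.

```python
import pandas as pd

# Hypothetical example data: temperatures in degrees Celsius
temps_c = pd.Series([0.0, 25.0, 100.0])

# Loop version: convert one value at a time and collect the results
temps_f_loop = []
for temp in temps_c:
    temps_f_loop.append(temp * 9 / 5 + 32)

# Vectorized version: the arithmetic is applied to every element at once
temps_f_vectorized = temps_c * 9 / 5 + 32

print(temps_f_loop)
print(temps_f_vectorized.tolist())
```

Both versions produce the same numbers; the vectorized one is shorter and, on large columns, typically much faster.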

## Aggregating Values with Loops

A common strategy in programs is to:
1.  Initialize an *accumulator* variable appropriate to the datatype of the output:
    * `int` : `0`
    * `str` : `''`
    * `list` : `[]`
2.  Update the variable with values from a collection through a for loop. Typical update operations are:
    * `int` : `+`
    * `str` : `+`
    * `list` : `.append()`
    
The result of this is a single list, number, or string with a summary value for the entire collection being looped over.

Returning to the tire pressure example, we can make a new list with all of the tire pressures rounded:

In [None]:
rounded_pressures = []

for pressure in tires: 
    rounded = round(pressure)
    rounded_pressures.append(rounded)

print('Rounded tire pressures:', rounded_pressures)
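The same pattern works with a number as the accumulator. As a sketch, the cell below adds up the tire pressures to compute their average; `total_pressure` and `average_pressure` are names introduced here for illustration.

```python
tires = [40.9, 35.2, 28.4]

total_pressure = 0            # an accumulator for a number starts at 0
for pressure in tires:
    total_pressure = total_pressure + pressure

average_pressure = total_pressure / len(tires)
print('Average tire pressure:', round(average_pressure, 1))
```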

## 🥊 Challenge: Aggregation Practice

Below are a few examples showing the different types of quantities you might aggregate using a for loop. These loops are partially filled out. Finish them and test that they work!

1. Find the total length of the strings in the given list. Store this quantity in a variable called `total`.

In [None]:
total = 0
words = ['red', 'green', 'blue']

for w in words:
    ____ = ____ + len(w)

print(total)

2. Find the length of each word in the list, and store these lengths in another list called `lengths`.

In [None]:
lengths = ____
words = ['red', 'green', 'blue']

for w in words:
    lengths.____(____)

print(lengths)

3. Concatenate all words into a single string called `result`.

In [None]:
words = ['red', 'green', 'blue']
result = ____

for ____ in ____:
    ____

print(result)

4. Create an acronym, as a single string, representing the list of words. Each part of the acronym should consist of the first letter of each word, capitalized. For example, your loop should output `"RGB"` for the input `["red", "green", "blue"]`. For this one, write the entire loop yourself!

In [None]:
words = ['red', 'green', 'blue']

# YOUR CODE HERE


💡 **Tip**: Python runs loops without showing you all the steps it takes. If you want to visualize all steps, check out [pythontutor.com](https://pythontutor.com/python-debugger.html#mode=edit). Try copy-pasting one of your answers in the last challenge!

<a id='cond'></a>

# Conditionals

**Booleans** are a fundamental data type in programming. Boolean values are **binary**: they can be either `True` or `False` (written with a capital first letter).

Why do we use these? They're very useful for **control flow**: changing the course of a program depending on certain conditions. Booleans allow decision making in these contexts.

## Boolean Masks

A **boolean mask** is the result of applying a comparison to a column of a data frame. It is a `Series` object containing `True` and `False` values, one per row, which you can then use for other purposes. 

Let's say we're working with a `DataFrame` containing information about mountains.

In [None]:
import pandas as pd

mountains_df = pd.DataFrame(
    {'mountain': ['Mt. Whitney',
                  'Mt. Williamson',
                  'White Mountain Peak',
                  'North Palisade',
                  'Mt. Shasta',
                  'Mt. Humphreys'],
     'range': ['Sierra Nevada',
               'Sierra Nevada',
               'White Mountains',
               'Sierra Nevada',
               'Cascade Range',
               'Sierra Nevada'],
     'elevation': [14505, 14379, 14252, 14248, 14179, 13992]}
)

Let's use some boolean masks with different **comparison operators**. These are operators that are used to compare two values.

First, equality. This is signaled in Python (and many other languages) by the double equals sign `==`. It's distinct from the assignment operator (single equals sign `=`) used in variable assignment. 

In [None]:
mountains_df['range'] == 'Sierra Nevada'

We can also use `<` (less than), `>` (greater than), and `!=` (not equal to). For instance:

In [None]:
# Select the elevation column and apply a boolean mask
mountains_df['elevation'] > 14200

Let's add this last `Series` as a column to `mountains_df`. We can add a column by assigning a series to a new column name in bracket notation. 

In [None]:
mountains_df['over_14200'] = mountains_df['elevation'] > 14200
mountains_df
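One common use of a mask is to keep only the rows where it is `True`. The sketch below uses a small stand-in DataFrame called `peaks` (a made-up name, with three of the mountains from above) so that it runs on its own.

```python
import pandas as pd

# Small stand-in for mountains_df so this cell runs on its own
peaks = pd.DataFrame(
    {'mountain': ['Mt. Whitney', 'Mt. Shasta', 'Mt. Humphreys'],
     'elevation': [14505, 14179, 13992]}
)

# The mask is a Series of True/False values, one per row
mask = peaks['elevation'] > 14000

# Indexing the DataFrame with the mask keeps only the rows where it is True
tall_peaks = peaks[mask]
print(tall_peaks)
```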

🔔 **Question**: Do you understand the code below?

In [None]:
sum(mountains_df['elevation'] > 14200) / len(mountains_df['elevation'])

💡 **Tip**: Python also has "logic operators" such as `and` and `or` that can be used to combine Boolean values. See [here](https://www.w3schools.com/python/python_operators.asp) for a list of all operators!
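As a quick illustration of the logic operators, here is a minimal sketch using single values. The variables `elevation` and `mountain_range` are set by hand for this example.

```python
elevation = 14179
mountain_range = 'Cascade Range'

# `and` is True only when both sides are True
is_tall_cascade = elevation > 14000 and mountain_range == 'Cascade Range'

# `or` is True when at least one side is True
is_sierra_or_very_tall = mountain_range == 'Sierra Nevada' or elevation > 14500

# `not` flips a Boolean
is_not_sierra = not mountain_range == 'Sierra Nevada'

print(is_tall_cascade, is_sierra_or_very_tall, is_not_sierra)  # True False True
```

When combining boolean masks in `pandas`, the element-wise operators `&`, `|`, and `~` are used instead, with parentheses around each comparison.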

## If-Statements

A fundamental structure in programming is the **conditional**. Conditionals allow different blocks of code to run, *conditional* on specific things being true.

The most widely used conditional is the **if-statement**. An if-statement controls whether some block of code is executed or not. Its structure is similar to that of a for loop: 

*   The first line opens with the `if` keyword and contains a Boolean variable or expression. It ends with a colon. If the expression evaluates to `True`, the block of code will run.
*   The body, containing whatever code to execute if the condition is met, is indented.

So, if the Boolean expression is `True`, the body of an if-statement is run. If not, it's skipped. Let's look at an example:

In [None]:
number = 105

In [None]:
# Body is executed
if number > 100:
    print(number, 'is greater than 100.')

In [None]:
# Body is not executed
if number > 110:
    print(number, 'is greater than 110.')

## Conditionals and Loops

Conditionals are particularly useful when we're iterating through a list, and want to perform some operation only on specific components of that list that satisfy a certain condition.

🔔 **Question**: What will the output of the following code be?

In [None]:
numbers = [12, 20, 43, 88, 97, 100, 105, 110]

for number in numbers:
    if number > 100:
        print(number, 'is greater than 100.')

## Conditionals: Else-statements

Else-statements supplement if-statements. They allow us to specify an alternative block of code to run if the if-statement's conditional evaluates to `False`.

🔔 **Question**: What is the difference between the following cell and the previous if statement? How will that affect the output?

In [None]:
numbers = [12, 20, 43, 88, 97, 100, 105, 110]

for number in numbers:
    if number > 100:
        print(number, 'is greater than 100.')
    else:
        print(number, 'is less than or equal to 100.')

## Conditionals: Else-if Statements

We may want to check several conditionals at the same time. **Else-if (Elif-)** statements allow us to specify as many conditional checks as we'd like in the same block.

Elif-statements must follow an if-statement. They are only checked if the if-statement's condition is `False`. The elif-statements are then checked in order, and only the body of the first one whose condition evaluates to `True` is run.

An else statement at the end can act as a "catch all", when the if statement and all following else-if statements fail.

In Python, else if statements are indicated by the `elif` keyword. Consider the following conditional cell.

In [None]:
numbers = [12, 20, 43, 88, 97, 100, 105, 110]

for number in numbers:
    if number > 100:
        print(number, 'is greater than 100.')
    elif number > 50:
        print(number, 'is greater than 50.')
    elif number > 25:
        print(number, 'is greater than 25.')
    else:
        print(number, 'is less than or equal to 25.')

The order of the if and elif statements matters. As soon as one if/elif condition is met, all following elif and else branches are skipped. If you instead write several separate if statements, each one is evaluated on its own. Getting this wrong won't raise an error, but it can produce results that don't make sense, and such logic bugs often take longer to find and fix.
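The effect of ordering can be seen in a short sketch. The variable names `label`, `misordered_label`, and `matches` are introduced here for illustration only.

```python
number = 105

# Conditions ordered from most to least specific: only the first match runs
if number > 100:
    label = 'greater than 100'
elif number > 50:
    label = 'greater than 50'
else:
    label = '50 or less'

# Same checks in the wrong order: 105 > 50 matches first,
# so the more specific branch can never be reached
if number > 50:
    misordered_label = 'greater than 50'
elif number > 100:
    misordered_label = 'greater than 100'
else:
    misordered_label = '50 or less'

# Separate if-statements are each evaluated on their own
matches = []
if number > 50:
    matches.append('greater than 50')
if number > 100:
    matches.append('greater than 100')

print(label)             # greater than 100
print(misordered_label)  # greater than 50
print(matches)           # ['greater than 50', 'greater than 100']
```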

## 🥊 Challenge: Conditionals and Aggregation
Below, we've created a list of US Presidents. Create a new list containing all Presidents whose first name starts with the letter J. How many presidents are on this list?

**Hint:** The `.split()` string method will be useful for this. Also, remember that strings are indexed: you can access any character of the string using bracket notation!
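As a reminder of how these two tools behave, here is a minimal sketch on a made-up name that is not part of the list below.

```python
full_name = 'Ada Lovelace'       # hypothetical example name, not from the list

name_parts = full_name.split()   # splits on whitespace -> ['Ada', 'Lovelace']
first_name = name_parts[0]       # 'Ada'
first_letter = first_name[0]     # 'A'

print(name_parts, first_name, first_letter)
```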

In [None]:
presidents = [
    "George Washington",
    "John Adams",
    "Thomas Jefferson",
    "James Madison",
    "James Monroe",
    "John Quincy Adams",
    "Andrew Jackson",
    "Martin Van Buren",
    "William Henry Harrison",
    "John Tyler",
    "James K. Polk",
    "Zachary Taylor",
    "Millard Fillmore",
    "Franklin Pierce",
    "James Buchanan",
    "Abraham Lincoln",
    "Andrew Johnson",
    "Ulysses S. Grant",
    "Rutherford B. Hayes",
    "James A. Garfield",
    "Chester A. Arthur",
    "Grover Cleveland",
    "Benjamin Harrison",
    "Grover Cleveland",
    "William McKinley",
    "Theodore Roosevelt",
    "William Howard Taft",
    "Woodrow Wilson",
    "Warren G. Harding",
    "Calvin Coolidge",
    "Herbert Hoover",
    "Franklin D. Roosevelt",
    "Harry S. Truman",
    "Dwight D. Eisenhower",
    "John F. Kennedy",
    "Lyndon B. Johnson",
    "Richard Nixon",
    "Gerald Ford",
    "Jimmy Carter",
    "Ronald Reagan",
    "George H. W. Bush",
    "Bill Clinton",
    "George W. Bush",
    "Barack Obama",
    "Donald Trump",
    "Joe Biden"]

In [None]:
first_name_j = ___
for p in presidents:
    if ___:
        ____.append(___)
print(first_name_j)

<a id='func'></a>

# Functions and Arguments

As you might have noticed, **functions** are a core part of programming: they allow us to run complex operations repeatedly without rewriting the code each time. **Arguments**, or values passed to a function, allow us to be more specific about how functions work.

🔔 **Question**: Look at the error below. From the error message, how many arguments does the function take?

In [None]:
len()

Below, we use a function name without parentheses. What does the output say? 

In [None]:
round

💡 **Tip**: This is referring to the stored function in memory. In order for the function to actually run, it must be called with `()`.
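Because a function name without parentheses is just a reference to the function, it can be assigned to another name like any other value. This is a small sketch; `rounder` is a made-up name.

```python
rounder = round          # no parentheses: just another name for the same function
print(rounder)           # shows the function object, nothing is rounded

result = rounder(2.7)    # parentheses actually call the function
print(result)            # 3
```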

## Default Arguments

⚠️ **Warning**: Calling a function with the wrong number of arguments raises an error. The error message gives some information about which arguments the function needs.

Luckily, some functions do not require you to enter a value for each argument. In these cases, the function uses a **default argument** specified in its definition.

*   For example, `round()` will round a number. It accepts two arguments: the number, and the number of decimal places to round off to.
*   By default, it rounds to a whole number.

In [None]:
round(3.712)

We can specify the number of decimal places we want:

In [None]:
round(3.712, 1)

The order of arguments matters if we do not specify the so-called **keywords**. Let's have a look at the documentation.

In [None]:
?round

The **keywords** are the parameter names listed inside the parentheses of the function signature. In this case, these are `number` and `ndigits`. A parameter followed by an `=` sign, such as `ndigits=None`, has a default value.

We can't just reverse the order of the arguments in `round()`: this will result in an error.

In [None]:
# This works
round(3.000, 2)

In [None]:
# This doesn't
round(2, 3.000)

However, if we specify the **keywords** that we can find in the documentation, we can use any order we want.

In [None]:
round(ndigits=2, number=3.000)

⚠️ **Warning**: You can mix both styles in one call, but positional arguments must always come before keyword arguments. For example, `round(3.712, ndigits=1)` works, while `round(number=3.712, 1)` raises a `SyntaxError`.
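The rule Python enforces is that positional arguments come first and keyword arguments come after them. The sketch below checks both orders; it uses the built-in `compile()` only so that the invalid call can be tested without stopping the cell, and the flag `syntax_error_raised` is a name made up for this example.

```python
# Positional argument first, keyword argument second: this works
mixed = round(3.712, ndigits=1)
print(mixed)   # 3.7

# Keyword argument first, positional argument second: not valid Python.
# compile() lets us check the code as text without crashing the cell.
syntax_error_raised = False
try:
    compile('round(number=3.712, 1)', '<example>', 'eval')
except SyntaxError as error:
    syntax_error_raised = True
    print('SyntaxError:', error.msg)
```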

💡 **Tip:** You can use the built-in function `help` to get help for a function. You can also use a question mark:

In [None]:
help(round)

In [None]:
round?

## Every Function Returns a Value

*   Every [function call](https://github.com/dlab-berkeley/Python-Fundamentals/blob/main/glossary.md) produces some result.
*   If the function doesn't have a useful result to return,
    it usually returns the special value `None`.
*   Unless the goal of the function is to print results, you usually want to save the output so you can refer to it later.

In [None]:
output = len('Getting the length of a sentence can be pretty useful!')
print('The length is', output)
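To see the special value `None` in action, here is a minimal sketch with two built-in tools that do their work without returning a useful result. The variable names are chosen for illustration.

```python
printed = print('Hello')   # print() displays text but has no useful result
print(printed)             # None
print(printed is None)     # True

numbers = [3, 1, 2]
sorted_result = numbers.sort()   # .sort() reorders the list in place and returns None
print(sorted_result, numbers)    # None [1, 2, 3]
```

Saving the output of such a function is a common mistake: the variable ends up holding `None` rather than the data you expected.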