# Python Intermediate: Control Flow and Functions

* * * 

<div class="alert alert-success">  
    
### Learning Objectives 
    
* Create conditions in a dataframe.
* Understand if-statements.
* Understand how arguments work in functions.
* Build your own functions.
    
</div>


### Icons Used In This Notebook
🔔 **Question**: A quick question to help you understand what's going on.<br>
🥊 **Challenge**: Interactive exercise. We'll work through these in the workshop!<br>
⚠️ **Warning**: Heads-up about tricky stuff or common mistakes.<br>
💡 **Tip**: How to do something a bit more efficiently or effectively.<br>
🎬 **Demo**: Showing off something more advanced – so you know what Python can be used for!<br>

### Sections
1. [Recap](#recap) 
2. [Conditionals](#cond)
3. [Functions and Arguments](#func)
4. [Write Your Own Functions](#write)

<a id='recap'></a>

# Recap

**Variables** are names attached to particular values.
   * To create a variable, you assign it a value and then start using it.
   * Assignment is done with a single equals sign `=`.
   * When we write `n = 300`, we are assigning 300 to the variable `n` via the assignment operator `=`.

**Functions** perform actions on "things".
   * `print()`, `len()`, and `type()` are some of the most commonly used functions.
   * You can identify a function because of its trailing round parentheses.  

**Arguments** are the "things" we perform the action on within a function.
   * They can be variables, datasets, or even other functions!
   * Arguments go inside the trailing parentheses of functions when we call them.
   * Arguments are also called inputs or parameters.

**Methods** are type-specific functions.
   * Different data types and structures have functions that only apply to them.
   * For instance, strings have methods that only apply to them (lowercasing, uppercasing, etc.) that won't work with other data types.
   * Methods are accessed using dot notation – e.g. `some_string.lower()`.
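
As a quick sketch of the difference between a function and a method, the cell below uses a made-up string (the variable names `greeting`, `n_chars`, `lowered`, and `kind` are only for illustration):

```python
greeting = "Hello, World"

# A function takes the value as an argument inside its parentheses.
n_chars = len(greeting)

# A method is attached to the value with dot notation.
lowered = greeting.lower()

# An argument can itself be a function call: type() receives the result of len().
kind = type(len(greeting))

print(n_chars, lowered, kind)
```

Note that `len(greeting)` works on many data types, while `.lower()` only exists for strings.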

Check out our [Python glossary](https://github.com/dlab-berkeley/Python-Fundamentals/blob/main/glossary.md) for definitions of other key vocabulary.

# This workshop
The best way to learn how to code is to do something useful, so this workshop is built around data analysis.

`pandas` is the most common package used in data analysis, with a focus on data manipulation and processing. We will work some more with `pandas` here, and work towards visualizing our data.

The data we will be using in this workshop comes from [Gapminder](https://www.gapminder.org/data/), an independent educational non-profit fighting global misconceptions. The dataset contains data for 142 countries, with values for life expectancy, GDP per capita, and population, every five years, from 1952 to 2007.

In [None]:
# Recall that pandas is frequently imported with the alias pd
import pandas as pd
import numpy as np

In [26]:
df = pd.read_csv('../data/gapminder-FiveYearData.csv')
df.head()

Unnamed: 0,country,year,pop,continent,lifeExp,gdpPercap
0,Afghanistan,1952,8425333.0,Asia,28.801,779.445314
1,Afghanistan,1957,9240934.0,Asia,30.332,820.85303
2,Afghanistan,1962,10267083.0,Asia,31.997,853.10071
3,Afghanistan,1967,11537966.0,Asia,34.02,836.197138
4,Afghanistan,1972,13079460.0,Asia,36.088,739.981106


🔔 **Question**: How many rows are in the data set?

## Aggregating Values with Loops

A common strategy in programs is to:
1.  Initialize an *accumulator* variable appropriate to the datatype of the output:
    * `int` : `0`
    * `str` : `''`
    * `list` : `[]`
2.  Update the variable with values from a collection through a for loop. Typical update operations are:
    * `int` : `+`
    * `str` : `+`
    * `list` : `.append()`
    
The result of this is a single list, number, or string with a summary value for the entire collection being looped over.

Returning to the tire pressure example, we can make a new list with all of the tire pressures rounded:

In [None]:
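# `tires` comes from the earlier tire pressure example; the values below are assumed stand-ins
tires = [32.4, 31.9, 33.2, 30.7]
rounded_pressures = []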

for pressure in tires: 
    rounded = round(pressure)
    rounded_pressures.append(rounded)

print('Rounded tire pressures:', rounded_pressures)

💡 **Tip**: Remember: indenting matters in Python! Jupyter automatically indents for you – but if you want to move multiple lines of code at once, you can select them and then hit `Control + ]` to indent them (move to the right), or `Control + [` to dedent them (move to the left). If you are on a Mac, use `Command` instead of `Control`.

## 🥊 Challenge 1: Aggregation Practice

Below are a few examples showing the different types of quantities you might aggregate using a for loop. These loops are partially filled out. Finish them and test that they work!

1. Find the total length of the strings in the given list. Store this quantity in a variable called `total`.

In [None]:
total = 0
words = ['red', 'green', 'blue']

for w in words:
    ... = ... + len(w)

print(total)

2. Find the length of each word in the list, and store these lengths in another list called `lengths`.

In [None]:
lengths = ...
words = ['red', 'green', 'blue']

for w in words:
    lengths....(...)

print(lengths)

3. Concatenate all words into a single string called `result`.

In [None]:
words = ['red', 'green', 'blue']
result = ...

for ... in ...:
    ...

print(result)

4. Create an acronym, as a single string, representing the list of words. Each part of the acronym should consist of the first letter of each word, capitalized. For example, your loop should output `"RGB"` for the input `["red", "green", "blue"]`. For this one, write the entire loop yourself!

In [None]:
words = ['red', 'green', 'blue']

# YOUR CODE HERE


💡 **Tip**: Python runs loops without showing you all the steps it takes. If you want to visualize all steps, check out [pythontutor.com](https://pythontutor.com/python-debugger.html#mode=edit). Try copy-pasting one of your answers in the last challenge!

<a id='cond'></a>

# Conditionals

**Booleans** are a fundamental data type in programming. Booleans are variables that are **binary**: they can either be `True` or `False` (written with capital letters).

Why do we use these? They're very useful for **control flow**: changing the course of a program depending on certain conditions. Booleans allow decision making in these contexts.

## Boolean Masks

A **boolean mask** is how Booleans are used with data frames. Comparing a column to a value returns a `Series` object containing `True` and `False` values, one per row, which you can then use for other purposes, such as selecting rows.

Let's use some boolean masks with different **comparison operators**. These are operators that are used to compare two values.

First, equality. This is signaled in Python (and many other languages) by the double equals sign `==`. It's distinct from the assignment operator (single equals sign `=`) used in variable assignment. 

In [39]:
df['country'] == 'Afghanistan'

0        True
1        True
2        True
3        True
4        True
        ...  
1699    False
1700    False
1701    False
1702    False
1703    False
Name: country, Length: 1704, dtype: bool
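
One common use of a mask is to keep only the rows where it is `True`. The cell below is a minimal sketch of that idea on a tiny invented data frame (`toy` is not the Gapminder data; its values are made up for illustration):

```python
import pandas as pd

# Toy data, invented for illustration (not the Gapminder file).
toy = pd.DataFrame({
    'country': ['Afghanistan', 'Afghanistan', 'Zimbabwe'],
    'year': [1952, 1957, 1952],
})

# Step 1: the comparison produces a Series of True/False values.
mask = toy['country'] == 'Afghanistan'

# Step 2: passing the mask in square brackets keeps only the True rows.
afghanistan_rows = toy[mask]

print(mask.tolist())
print(afghanistan_rows)
```

The same pattern should work on `df`, since the mask has one value per row of the data frame it was built from.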

## 🥊 Challenge 2: Working With Comparison Operators 

We can also use `<` (less than), `>` (greater than), and `!=` (not equal to). 
Select the `gdpPercap` column and apply a boolean mask to select all values higher than 800.

In [40]:
# YOUR CODE HERE


0       False
1        True
2        True
3        True
4       False
        ...  
1699    False
1700    False
1701    False
1702    False
1703    False
Name: gdpPercap, Length: 1704, dtype: bool

Let's add this last `Series` as a column to our data frame. We can add a column by assigning a series to a new column name in bracket notation. 

In [41]:
df['gdpPercap_over_800'] = df['gdpPercap'] > 800
df

Unnamed: 0,country,year,pop,continent,lifeExp,gdpPercap,gdpPercap_over_800
0,Afghanistan,1952,8425333.0,Asia,28.801,779.445314,False
1,Afghanistan,1957,9240934.0,Asia,30.332,820.853030,True
2,Afghanistan,1962,10267083.0,Asia,31.997,853.100710,True
3,Afghanistan,1967,11537966.0,Asia,34.020,836.197138,True
4,Afghanistan,1972,13079460.0,Asia,36.088,739.981106,False
...,...,...,...,...,...,...,...
1699,Zimbabwe,1987,9216418.0,Africa,62.351,,False
1700,Zimbabwe,1992,10704340.0,Africa,60.377,693.420786,False
1701,Zimbabwe,1997,11404948.0,Africa,46.809,792.449960,False
1702,Zimbabwe,2002,11926563.0,Africa,39.989,672.038623,False


🔔 **Question**: Do you understand the code below?

In [42]:
sum(df['gdpPercap'] > 800) / len(df['gdpPercap'])

0.8039906103286385

💡 **Tip**: Python also has "logic operators" such as `and` and `or` that can be used to combine Boolean values with logic. See [here](https://www.w3schools.com/python/python_operators.asp) for a list of all operators!
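
Here is a small sketch of logic operators in action. With single Boolean values, Python uses the keywords `and` and `or`. With boolean masks, pandas uses `&` and `|` instead, because `and`/`or` on a whole `Series` raises an error. The data frame `toy` and all variable names below are invented for illustration:

```python
import pandas as pd

# Plain Booleans: use the keywords and / or.
is_recent = True
is_large = False
both = is_recent and is_large
either = is_recent or is_large

# Toy data, invented for illustration.
toy = pd.DataFrame({'year': [1952, 2007, 2007], 'pop': [8e6, 3e7, 5e5]})

# Boolean masks: use & and |, with each comparison wrapped in parentheses.
recent_and_large = (toy['year'] > 2000) & (toy['pop'] > 1e6)
recent_or_large = (toy['year'] > 2000) | (toy['pop'] > 1e6)

print(both, either)
print(recent_and_large.tolist(), recent_or_large.tolist())
```

The parentheses around each comparison matter: `&` and `|` bind more tightly than `>` or `==`, so leaving them out changes what gets compared.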

## If-Statements

A fundamental structure in programming is the **conditional**. Conditionals allow different blocks of code to run, *conditional* on specific things being true.

The most widely used conditional is the **if-statement**. An if-statement controls whether some block of code is executed or not. Its structure is similar to that of a for loop: 

*   The first line opens with the `if` keyword and contains a Boolean variable or expression. It ends with a colon. If the expression evaluates to `True`, the block of code will run.
*   The body, containing whatever code to execute if the condition is met, is indented.

So, if the Boolean expression is `True`, the body of an if-statement is run. If not, it's skipped. Let's look at an example:

In [None]:
number = 105

In [None]:
# Body is executed
if number > 100:
    print(number, 'is greater than 100.')

In [None]:
# Body is not executed
if number > 110:
    print(number, 'is greater than 110.')

## Conditionals and Loops

Conditionals are particularly useful when we're iterating through a list, and want to perform some operation only on specific components of that list that satisfy a certain condition.

🔔 **Question**: What will the output of the following code be?

In [None]:
numbers = [12, 20, 43, 88, 97, 100, 105, 110]

for number in numbers:
    if number > 100:
        print(number, 'is greater than 100.')

## Conditionals: Else-statements

Else-statements supplement if-statements. They allow us to specify an alternative block of code to run if the if-statement's conditional evaluates to `False`.

🔔 **Question**: What is the difference between the following cell and the previous if-statement? How will that affect the output?

In [None]:
numbers = [12, 20, 43, 88, 97, 100, 105, 110]

for number in numbers:
    if number > 100:
        print(number, 'is greater than 100.')
    else:
        print(number, 'is less than or equal to 100.')

## Conditionals: Else-if Statements

We may want to check several conditions in sequence. **Else-if (elif)** statements allow us to chain as many conditional checks as we'd like in the same block.

Elif-statements must follow an if-statement, and they are only checked if the if-statement's condition is `False`. The elif-statements are then checked in order, and the body of the first one whose condition evaluates to `True` is run; any remaining checks are skipped.

An else statement at the end can act as a "catch all", when the if statement and all following else-if statements fail.

In Python, else if statements are indicated by the `elif` keyword. Consider the following conditional cell.

In [None]:
numbers = [12, 20, 43, 88, 97, 100, 105, 110]

for number in numbers:
    if number > 100:
        print(number, 'is greater than 100.')
    elif number > 50:
        print(number, 'is greater than 50.')
    elif number > 25:
        print(number, 'is greater than 25.')
    else:
        print(number, 'is less than or equal to 25.')

The order of the if and elif statements matters. Once the condition of one if/elif statement is met, all following elif and else statements in that block are skipped. If you instead write several separate if statements, each one is evaluated independently, so more than one body can run. Mistakes of this kind don't raise errors, but they produce results that may not make sense, which makes them harder to find and debug.
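
To make this concrete, here is a small sketch of both pitfalls. The list `numbers` and the label lists are made up for illustration:

```python
numbers = [30, 60, 105]

# Pitfall 1: the broadest condition comes first. It catches every number
# above 25, so the two elif branches below can never run.
wrong_labels = []
for number in numbers:
    if number > 25:
        wrong_labels.append('greater than 25')
    elif number > 50:
        wrong_labels.append('greater than 50')
    elif number > 100:
        wrong_labels.append('greater than 100')

# Pitfall 2: separate if statements. Each one is checked on its own,
# so a single number can match several of them.
separate_labels = []
for number in numbers:
    if number > 100:
        separate_labels.append(str(number) + ' is greater than 100')
    if number > 50:
        separate_labels.append(str(number) + ' is greater than 50')
    if number > 25:
        separate_labels.append(str(number) + ' is greater than 25')

print(wrong_labels)
print(separate_labels)
```

Neither loop raises an error, but the first gives every number the same label and the second gives some numbers several labels.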

## 🥊 Challenge 3: Conditionals 

IDEA 1: 
We will be using the Gapminder dataset. Create a new empty list called `gdp_bin`. Next, loop over the values in the `gdpPercap` column and write an if-elif-else statement that checks each value:

- If `gdpPercap` is >= 10000, add a 5 to `gdp_bin`.
- If `gdpPercap` is >= 7000, add a 4 to `gdp_bin`.
- If `gdpPercap` is >= 5000, add a 3 to `gdp_bin`.
- If `gdpPercap` is >= 1000, add a 2 to `gdp_bin`.
- Else, add a 1 to `gdp_bin`.

Add this list as a new column to the dataframe called `gdp_bin`.

In [52]:
# YOUR CODE HERE

gdp_bin = []

for number in df['gdpPercap']:
    if number >= 10000:
        gdp_bin.append(5)
    elif number >= 7000:
        gdp_bin.append(4)
    elif number >= 5000:
        gdp_bin.append(3)
    elif number >= 1000:
        gdp_bin.append(2)
    else:
        gdp_bin.append(1)

df['gdp_bin'] = gdp_bin

IDEA 2: Write a script that prompts the user to enter a country, and then prints the average `gdpPercap` for that country, if it exists in the dataframe. If the country is not found in the dataframe, the script should print an error message.

In [68]:
# YOUR CODE HERE

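country = input("Enter a country")
# Compare case-insensitively, and reuse the same mask for the check and the lookup
mask = df['country'].str.lower() == country.strip().lower()
if mask.any():
    print(df.loc[mask, 'gdpPercap'].mean())
else:
    print("Country not found")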

Enter a country iiiiii


Country not found


<a id='func'></a>

# Functions and Arguments

Recall that arguments are information that goes into a function. The order of arguments matters if we do not specify the so-called **keywords**. For instance, let's see the documentation of the `round()` function:

In [43]:
?round

Signature: round(number, ndigits=None)
Docstring:
Round a number to a given precision in decimal digits.

The return value is an integer if ndigits is omitted or None.  Otherwise
the return value has the same type as the number.  ndigits may be negative.
Type:      builtin_function_or_method


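The **keywords** are the parameter names listed inside the parentheses of the signature. In this case, these are `number` and `ndigits`. A parameter followed by an `=` sign, such as `ndigits=None`, has a default value that is used when you leave that argument out.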

We can't just reverse the order of the arguments in `round()`: this will result in an error.

In [None]:
# This works
round(3.000, 2)

In [None]:
# This doesn't
round(2, 3.000)

However, if we specify the **keywords** that we can find in the documentation, we can use any order we want.

In [None]:
round(ndigits=2, number=3.000)
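
The cell below is a sketch that puts the three calls side by side, using 3.14159 so the rounding is visible. `try`/`except` is not covered in this workshop; it is used here only so the cell finishes running instead of stopping at the error. The variable names are made up for illustration:

```python
# Positional call: the first value is `number`, the second is `ndigits`.
by_position = round(3.14159, 2)

# Keyword call: the names decide which value goes where, so the order is free.
by_keyword = round(ndigits=2, number=3.14159)

# Reversed positional call: 3.14159 lands in `ndigits`, which must be an integer.
try:
    round(2, 3.14159)
    error_name = None
except TypeError as err:
    error_name = type(err).__name__
    print('TypeError:', err)

print(by_position, by_keyword)
```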

⚠️ **Warning**: You can mix positional and keyword arguments in one call, but the positional arguments must come first: once you use a keyword for one argument, every argument after it needs a keyword too!

<a id='write'></a>

# Write Your Own Functions

Remember, functions are pieces of code that we expect to use over and over again.

Writing our own functions, with functionality specific to our goals, is one of the most useful things we can do in Python.

## Basic Function Syntax

Writing a function in Python is pretty easy! You need to know a few things:

*   Functions begin with the keyword `def`.
*   This keyword is followed by the function *name*.
    *   The name must obey the same rules as variable names.
*   The **arguments** or **parameters** are defined in parentheses as variable names.
    *   Use empty parentheses if the function doesn't take any inputs.
*   A colon indicates the end of the function *signature* (the first line).
*   The indented block of code that follows is the function *body*.
*   The final line should be a `return` statement with the value(s) to be returned from the function.

Let's take a look at a simple function:

In [None]:
def feet_to_meters(feet):
    meters = feet * .304  # approximate conversion factor (1 foot = 0.3048 m)
    return meters

Notice how there is **no output** from running the block of code above. This is because defining a function does not run it. The function needs to be **called**, or run, with appropriate arguments to execute the code it contains. 

Let's run this function. We can save the output to a variable and print the result.

In [None]:
meters = feet_to_meters(100)
print(meters)

## Variables and "Scope"

Note how we've used the name `meters` twice above: both within the function definition, and for the variable that takes the output of the function. What's going on here?

Arguments and variables created within the function **only exist within the scope of the function!** So `meters` within the function definition is a *different variable* than `meters` which now holds `30.4`.

In fancy words, the variable `meters` in the function definition only exists **within the scope of** that function definition. This is very important to remember!
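
A small sketch of scope in action, reusing the function from above. `try`/`except` is not covered in this workshop; it is used here only so the cell keeps running after the error. The names `result` and `feet_is_visible` are made up for illustration:

```python
def feet_to_meters(feet):
    meters = feet * .304  # local variable: exists only while the function runs
    return meters

result = feet_to_meters(100)

# `result` exists out here, but the function's parameter `feet` does not.
try:
    print(feet)
    feet_is_visible = True
except NameError:
    feet_is_visible = False

print(result, feet_is_visible)
```

Asking for `feet` outside the function raises a `NameError`, because that name was only ever defined inside the function.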

## 🥊 Challenge 4: My First Function

Write a function that converts Celsius temperatures to Fahrenheit. The formula for this conversion is:

$$F = 1.8 * C + 32$$

You can name this function whatever you want. But it makes sense to name it something sensible!

In [None]:
def ...:
    # YOUR CODE HERE
    return ...

## 🥊 Challenge 5: DataFrame Function

IDEA 1: Let's say that in our Gapminder DataFrame, we want to create a function that selects only the rows where the `year` is greater than a certain value. The function:
1. Takes in a `DataFrame` object and a cutoff year, passed as a keyword argument.
2. Selects only the rows where the `year` is greater than that cutoff (for example, 2000).
3. Returns a `DataFrame` with those rows.

In [None]:
# YOUR CODE HERE



IDEA 2: Using `apply()` with an if-statement. 

Create a function that wraps the if-elif-else statement you wrote above. Instead of collecting values in a list, pass the function to the `.apply()` method of the `gdpPercap` column: 


In [None]:
def assign_bins(number):
    if number >= 10000:
        return 5
    elif number >= 7000:
        return 4
    elif number >= 5000:
        return 3
    elif number >= 1000:
        return 2
    else:
        return 1

# apply the function to each gdpPercap value in the dataframe
df['gdp_bin'] = df['gdpPercap'].apply(assign_bins)


<div class="alert alert-success">

## ❗ Key points

* Booleans (`bool`) are binary variables: they can be either `True` or `False`.
* "Boolean masks" are used when we apply comparison operators such as `==` in Pandas; they allow us to retrieve data based on some condition. 
* `if` and `else` statements allow us to control whether parts of our code are being run.
* Writing a function in Python begins with the keyword `def`, followed by the function name, parameters in parentheses, and a colon.
* Functions end with a `return` statement: this is the output value of the function.
    
</div>