# Python Intermediate: Control Flow and Functions

* * * 

<div class="alert alert-success">  
    
### Learning Objectives 
    
* Understand how to handle conditions in Python and Pandas.
* Practice with arguments when calling functions.
* Write your own functions and apply them to a DataFrame.
    
</div>


### Icons Used in This Notebook
🔔 **Question**: A quick question to help you understand what's going on.<br>
🥊 **Challenge**: Interactive exercise. We'll work through these in the workshop!<br>
💡 **Tip**: How to do something a bit more efficiently or effectively.<br>
⚠️ **Warning**: Heads-up about tricky stuff or common mistakes.<br>

### Sections
1. [Recap](#recap) 
2. [Conditionals](#cond)
3. [Functions and Arguments](#func)
4. [Writing Your Own Functions](#write)

<a id='recap'></a>

# Recap

**Variables** are names attached to particular values.
   * To create a variable, you assign it a value and then start using it.
   * Assignment is done with a single equals sign `=`.
   * When we write `n = 300`, we are assigning 300 to the variable `n` via the assignment operator `=`.

**Functions** perform actions on "things".
   * `print()`, `len()`, and `type()` are some of the most commonly used functions.
   * You can identify a function because of its trailing round parentheses.  

**Arguments** are the "things" we perform the action on within a function.
   * They can be variables, datasets, or even other functions!
   * Arguments go inside the trailing parentheses of functions when we call them.
   * Arguments are also called inputs or parameters.

**Methods** are type-specific functions.
   * Different data types and structures have functions that only apply to them.
   * For instance, strings have methods that only apply to them (lowercasing, uppercasing, etc.) that won't work with other data types.
   * Methods are accessed using dot notation – e.g. `some_string.lower()`.
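
As a quick refresher, the minimal sketch below contrasts a function call with a method call on the same value. The `greeting` variable is made up for this illustration.

```python
greeting = 'Hello, World'

# A function takes the value as an argument inside the parentheses
print(len(greeting))

# A method is called on the value itself, using dot notation
print(greeting.lower())
```

Here `len()` works on many kinds of objects, while `.lower()` is only available on strings.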

Check out our [Python glossary](https://github.com/dlab-berkeley/Python-Fundamentals/blob/main/glossary.md) for definitions of other key vocabulary.

# This Workshop
The best way to learn how to code is to do something useful, so this workshop is built around data analysis.

`pandas` is one of the most widely used Python packages for data analysis, with a focus on data manipulation and processing. We will work some more with `pandas` here, and work towards visualizing our data.

The data we will be using in this workshop comes from [Gapminder](https://www.gapminder.org/data/), an independent educational non-profit fighting global misconceptions. The dataset contains data for 142 countries, with values for life expectancy, GDP per capita, and population, every five years, from 1952 to 2007.

In [None]:
# Recall that pandas is frequently imported with the alias pd
import pandas as pd
import numpy as np

In [None]:
df = pd.read_csv('../data/gapminder_gni.csv')
df.head()

🔔 <span style="color:purple"> **Question**: How many rows are in the data set?</span>

<a id='cond'></a>

# Conditionals

**Booleans** are a fundamental data type in programming. Booleans are **binary** values: they can be either `True` or `False` (in Python, written with a capital first letter).

Why do we use these? They're useful for **control flow**: changing the course of a program depending on certain conditions. Booleans allow decision making in these contexts.

## Boolean Masks

A **boolean mask** is what you get when you apply a comparison to a column of a data frame. The result is a `Series` object containing one `True` or `False` value per row, which you can then use for other purposes, such as selecting rows. 

Let's create some boolean masks with different **comparison operators**. These are operators that are used to compare two values.

First, equality. This is signaled in Python (and many other languages) by the double equals sign `==`. It's distinct from the assignment operator (single equals sign `=`) used in variable assignment. 

In [None]:
df['country'] == 'Afghanistan'
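
To see what a mask does on a small scale, here is a minimal sketch using a made-up three-row data frame. The `toy` name and its values are hypothetical and are not taken from the Gapminder data. Indexing a data frame with a mask keeps only the rows where the mask is `True`.

```python
import pandas as pd

# A tiny, made-up data frame for illustration only
toy = pd.DataFrame({
    'country': ['Afghanistan', 'Albania', 'Afghanistan'],
    'year': [1952, 1952, 1957],
})

# The comparison returns a Series of True/False values, one per row
mask = toy['country'] == 'Afghanistan'
print(mask.tolist())

# Using the mask inside square brackets keeps only the True rows
afghanistan_rows = toy[mask]
print(afghanistan_rows)
```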

## 🥊 <span style="color:purple">Challenge 1: Working With Comparison Operators </span>

We can also use `<` (less than), `>` (greater than), and `!=` (not equal to). 
Select the `gdpPercap` column and apply a boolean mask to select all values higher than 800.

In [None]:
# YOUR CODE HERE


Let's add this last `Series` as a column to our data frame. We can add a column by assigning a series to a new column name in bracket notation. 

In [None]:
df['gdpPercap_over_800'] = df['gdpPercap'] > 800
df

🔔 **Question**: Do you understand the code below?

In [None]:
sum(df['gdpPercap'] > 800) / len(df['gdpPercap'])

💡 **Tip**: Python also has "logical operators" such as `and` and `or` that can be used to combine Boolean values with logic. See [here](https://www.w3schools.com/python/python_operators.asp) for a list of all operators!
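
As a small illustration with plain Python Booleans (the variable names are made up for this sketch), `and` is `True` only when both sides are `True`, while `or` is `True` when at least one side is:

```python
is_large = 105 > 100      # True
is_even = 105 % 2 == 0    # False

both = is_large and is_even     # both conditions must hold
either = is_large or is_even    # at least one condition must hold
neither = not either            # `not` flips a Boolean

print(both, either, neither)
```

Note that `and` and `or` work on single Boolean values. To combine two pandas masks element by element, pandas uses the `&` and `|` operators instead, with each comparison wrapped in parentheses.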

## If-Statements

A fundamental structure in programming is the **conditional**. Conditionals allow different blocks of code to run, *conditional* on specific things being true.

The most widely used conditional is the **if-statement**. An if-statement controls whether some block of code is executed or not. Its structure is similar to that of a for loop: 

*   The first line opens with the `if` keyword and contains a Boolean variable or expression. It ends with a colon. If the expression evaluates to `True`, the block of code will run.
*   The body, containing whatever code to execute if the condition is met, is indented.

So, if the Boolean expression is `True`, the body of an if-statement is run. If not, it's skipped. Let's look at an example:

In [None]:
number = 105

In [None]:
# Body is executed
if number > 100:
    print(number, 'is greater than 100.')

In [None]:
# Body is not executed
if number > 110:
    print(number, 'is greater than 110.')

## Conditionals and Loops

Conditionals are particularly useful when we're iterating through a list, and want to perform some operation only on specific components of that list that satisfy a certain condition.

🔔 **Question**: How would you explain the output of the following code?

In [None]:
numbers = [12, 20, 43, 88, 97, 100, 105, 110]

for number in numbers:
    if number > 100:
        print(number, 'is greater than 100.')

## Conditionals: Else-statements

Else-statements supplement if-statements. They allow us to specify an alternative block of code to run if the if-statement's conditional evaluates to `False`.

🔔 **Question**: Look at the difference between the following cell and the previous if-statement. How will this else-statement affect the output?

In [None]:
numbers = [12, 20, 43, 88, 97, 100, 105, 110]

for number in numbers:
    if number > 100:
        print(number, 'is greater than 100.')
    else:
        print(number, 'is less than or equal to 100.')

## Conditionals: Else-if Statements

We may want to check several conditions in sequence. **Else-if (Elif-)** statements allow us to specify as many conditional checks as we'd like in the same block.

Elif-statements must follow an if-statement, and they are only checked if the if-statement's condition evaluates to `False`. The elif-statements are then checked in order, and the body of the first one whose condition evaluates to `True` is run; the rest are skipped.

An else statement at the end can act as a "catch all", when the if statement and all following else-if statements fail.

In Python, else if statements are indicated by the `elif` keyword. Consider the following conditional cell.

In [None]:
numbers = [12, 20, 43, 88, 97, 100, 105, 110]

for number in numbers:
    if number > 100:
        print(number, 'is greater than 100.')
    elif number > 50 and number <= 100:
        print(number, 'is greater than 50 and less than or equal to 100.')
    elif number > 25 and number <= 50:
        print(number, 'is greater than 25 and less than or equal to 50.')
    else:
        print(number, 'is less than or equal to 25.')

🔔 **Question**: What is the `and` operator doing here?

The order of the if- and elif-statements matters. As soon as one condition in an if/elif chain is met, all following elif- and else-statements in that chain are skipped. If we instead write several separate if-statements, each one is evaluated on its own, so more than one body may run. Mixing these up won't raise an error, but it can produce results that don't make sense, and such mistakes can take longer to find and debug.
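
The difference can be seen in a short sketch. The variable names below are made up for this comparison: the first block uses an if/elif chain, the second uses two separate if-statements on the same number.

```python
number = 105

# if/elif chain: checking stops at the first condition that is True
chain_labels = []
if number > 100:
    chain_labels.append('greater than 100')
elif number > 50:
    chain_labels.append('greater than 50')

# Two separate if-statements: each one is checked on its own
separate_labels = []
if number > 100:
    separate_labels.append('greater than 100')
if number > 50:
    separate_labels.append('greater than 50')

print(chain_labels)
print(separate_labels)
```

The chain records one label for 105, while the separate if-statements record two, even though neither version raises an error.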

<a id='func'></a>

# Functions and Arguments

Recall that arguments are information that goes into a function. The order of arguments matters if we do not specify the so-called **keywords**. For instance, let's see the documentation of the `round()` function:

In [None]:
?round

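The **keywords** are the parameter names listed inside the parentheses of the function signature. In this case, these are `number` and `ndigits`. A parameter followed by an `=` sign, such as `ndigits=None`, has a default value that is used when we don't supply that argument.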

We can't just reverse the order of the arguments in `round()`: this will result in an error.

In [None]:
# This works
round(3.000, 2)

In [None]:
# This doesn't
round(2, 3.000)

However, if we specify the **keywords** that we can find in the documentation, we can use any order we want.

In [None]:
round(ndigits=2, number=3.000)

⚠️ **Warning**: You can mix positional and keyword arguments in one call, but positional arguments must come first. Once you specify a keyword for one argument, every argument after it must also be given with its keyword!
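
Here is a minimal sketch of the three ways to pass arguments to `round()`, using the keywords `number` and `ndigits` from its documentation. The variable names `a`, `b`, and `c` are just for illustration.

```python
# Positional only: the order decides which value is which
a = round(3.14159, 2)

# Keywords only: the order no longer matters
b = round(ndigits=2, number=3.14159)

# Mixed: positional arguments first, keyword arguments after
c = round(3.14159, ndigits=2)

print(a, b, c)
```

All three calls give the same result. Writing `round(number=3.14159, 2)` would raise a `SyntaxError`, because a positional argument may not follow a keyword argument.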

<a id='write'></a>

# Writing Your Own Functions

Remember, functions are pieces of code that we expect to use over and over again.

One of the most useful things we can do in Python is write our own functions, with custom functionality that is specific to our goals.

## Basic Function Syntax

Writing a function in Python is pretty easy! You need to know a few things:

*   Functions begin with the keyword `def`.
*   This keyword is followed by the function *name*.
    *   The name must obey the same rules as variable names.
*   The **arguments** or **parameters** are defined in parentheses as variable names.
    *   Use empty parentheses if the function doesn't take any inputs.
*   A colon indicates the end of the function *signature* (the first line).
*   An indented block of code forms the *body*.
*   The final line should be a `return` statement with the value(s) to be returned from the function.

Let's take a look at a simple function:

In [None]:
def feet_to_meters(feet):
    meters = feet * 0.3048
    return meters

Notice how there is **no output** from running the block of code above. This is because defining a function does not run it. The function needs to be **called**, or run, with appropriate arguments to execute the code it contains. 

Let's run this function. We can save the output to a variable and print the result.

In [None]:
meters = feet_to_meters(100)
print(meters)

## Variables and "Scope"

Note how we've used the name `meters` twice above: both within the function definition, and for the variable that takes the output of the function. What's going on here?

Arguments and variables created within the function **only exist within the scope of the function!** So `meters` within the function definition is a *different variable* from the `meters` outside it, which now holds approximately `30.48`.
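
A minimal sketch of this behaviour, using a hypothetical `inches_to_cm` function: the variable `cm` is created inside the function, so trying to use it outside the function raises a `NameError`. The `try`/`except` block is only there so that the error is printed instead of stopping the cell.

```python
def inches_to_cm(inches):
    cm = inches * 2.54  # `cm` exists only while the function runs
    return cm


result = inches_to_cm(10)
print(result)

try:
    print(cm)  # `cm` was never defined outside the function
    cm_visible_outside = True
except NameError as error:
    print('NameError:', error)
    cm_visible_outside = False
```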

## 🥊 Challenge 2: My First Function

Write a function that converts Celsius temperatures to Fahrenheit. It takes in one argument, which is expected to be a temperature in Celsius. The formula for the conversion is:

$$F = 1.8 * C + 32$$

You can name this function whatever you want. But it makes sense to name it something sensible!

In [None]:
def ...:
    # YOUR CODE HERE
    return ...

## How to `apply()` a Function

The Pandas `.apply()` method allows you to apply a function along an axis of a DataFrame, or to each value of a single column (a `Series`). Here's an example: 

In [None]:
def add_10(x):
    return x + 10

df['gniPercap'].apply(add_10)

In the code above, we create a function that adds 10 to whatever comes into it.

🔔 <span style="color:purple"> **Question**: What happens when we `apply()` our function over the `gniPercap` column?</span>
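
To check your answer on a smaller scale, here is a minimal sketch on a made-up `Series`. The values and the `double` function are hypothetical. `.apply()` calls the function once per element and collects the results in a new `Series`.

```python
import pandas as pd


def double(x):
    return x * 2


# A small, made-up Series for illustration only
values = pd.Series([1, 5, 10])

doubled = values.apply(double)
print(doubled.tolist())

# The original Series is left unchanged
print(values.tolist())
```

Because `.apply()` returns a new `Series`, the result needs to be assigned to a variable or a column if we want to keep it.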

## 🥊 Challenge 3: `apply()` a Function

Let's put everything we've learned together.

Say that we want to create a new column in our dataset that classifies our datapoints in terms of the level of development, as measured by per capita gross national income (GNI). [This UN document](https://www.un.org/en/development/desa/policy/wesp/wesp_current/2014wesp_country_classification.pdf) outlines some rules for this.

Here's what you need to do:

1. Start a function called `assign_level` that takes in one parameter, `i`.
2. Write an if-elif-else statement that checks `i`, based on the following rules:
    - If it is more than 12615, `return` the string `high-income`. 
    - If it is more than 4086 and less than or equal to 12615, `return` the string `upper middle income`. 
    - If it is more than 1035 and less than or equal to 4086, `return` the string `lower middle income`. 
    - If it is less than or equal to 1035, `return` the string `low-income`. 
    - Else, return `np.nan` (this is a NaN value).
3. Use `.apply()` on the `gniPercap` column, using your new `assign_level` function as the argument. Assign the output to a new column in our DataFrame, called `income_level`.

In [None]:
# YOUR CODE HERE


If you've done this correctly, the following code should produce a bar plot of the different income levels in our data.

In [None]:
df['income_level'].value_counts().plot(kind='bar')

<div class="alert alert-success">

## ❗ Key Points

* Booleans (`bool`) are binary variables: they can be either `True` or `False`.
* "Boolean masks" are used when we apply comparison operators such as `==` in Pandas; they allow us to retrieve data based on some condition. 
* `if`, `elif`, and `else` statements allow us to control which parts of our code are run.
* Writing a function in Python begins with the keyword `def`, followed by the function name, parameters in parentheses, and a colon.
* Functions typically end with a `return` statement, which specifies the output value of the function.
* The `.apply()` method in Pandas allows you to apply a function along an axis of a DataFrame, or to each value of a column.
    
</div>