# Session 14

[![Open and Execute in Google Colaboratory](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/astrojuanlu/ie-mbd-python-data-analysis-i/blob/main/sessions/Session%2014.ipynb)

- `if` conditionals: controlling where the execution goes based on conditions
- Gotchas of conditionals and pandas DataFrames
- Nested conditionals
- Debugging step by step

## `if` conditionals

Conditionals are used to "branch" the code depending on a boolean value. If the condition is true, part of the code will execute:

In [None]:
age = 16
age

In [None]:
if age >= 18:
    print("You can drive")

# Nothing gets printed

Notice that Python syntax requires the statement "inside" the conditional to be indented:

In [None]:
if age >= 18:
print("You can drive")  # IndentationError

Similarly, to "exit" the conditional, you remove one level of indentation. Whatever is written outside the conditional will execute regardless of the condition:

In [None]:
if age >= 18:
    print("You can drive")
print("This gets printed regardless")

More branches can be added with `elif`, which checks further conditions, and a final branch can be added with `else`, which executes only if no other branch did:

In [None]:
if age >= 18:
    print("You can drive")
elif age < 0:
    print("The age is invalid")
else:
    print("You cannot drive yet")
print("Out of the conditional")
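
To see which branch runs for different inputs, the same chain can be wrapped in a small helper and tried with several ages. This is only a sketch: the function name `driving_message` and the example ages are hypothetical and not part of the original notebook.

```python
def driving_message(age):
    """Return the message that the if/elif/else chain above would print."""
    if age >= 18:
        return "You can drive"
    elif age < 0:
        return "The age is invalid"
    else:
        return "You cannot drive yet"


# Conditions are checked top to bottom, and only the first
# branch whose condition is true gets executed
for example_age in [25, 16, -3]:
    print(example_age, "->", driving_message(example_age))
```

Since the conditions are checked in order, an age of 25 never reaches the `elif` or `else` branches.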

Anything that evaluates to `True` or `False` can be used for a condition:

In [None]:
condition = (2 + 2 == 4) and ("hello" == "he" + "llo")
if condition:
    print("Addition and string concatenation work as expected")
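
Conditions do not have to be comparisons between numbers. As a further illustration, the sketch below builds booleans with the `in` and `not` operators and stores them in variables before using them; all variable names and values are made up for this example.

```python
allowed_ages = [18, 21, 30]  # illustrative values, not from the dataset
candidate_age = 21
has_license = False

is_listed = candidate_age in allowed_ages  # membership test gives a boolean
needs_license = not has_license  # negation gives a boolean

if is_listed and needs_license:
    message = "Old enough, but no license yet"
else:
    message = "Nothing to do"
print(message)
```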

## Exercises

### 1. Length check

Load the wildfires dataset. Write a conditional block that prints "Long dataset" if the number of rows is larger than 20 000, and "Short dataset" otherwise.

In [None]:
WILDFIRES_URL = (
    "https://github.com/astrojuanlu/ie-mbd-python-data-analysis-i/raw/main/"
    "data/fires-subset.csv"
)

## Gotchas of conditionals and pandas DataFrames

We mentioned that "anything that evaluates to `True` or `False` can be used for a condition".

But a `Series` or `DataFrame` does not have a single `True` or `False` value:

In [None]:
import pandas as pd

In [None]:
if pd.Series([1, 2, 3, 4]) > 2:  # This fails!
    print("?")

As the error message says, in these situations you will probably need to pick `.any()`, `.all()`, or some other aggregation function.

In [None]:
if (pd.Series([1, 2, 3, 4]) > 2).any():
    print("Some values are larger than 2")
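
The difference between the failing condition and the aggregated ones can be made explicit. The sketch below catches the `ValueError` to display its message and then compares `.any()` with `.all()` on the same boolean `Series`; the variable names are chosen for illustration only.

```python
import pandas as pd

values = pd.Series([1, 2, 3, 4])
mask = values > 2  # element-wise comparison: a boolean Series, not a single bool

try:
    if mask:  # ambiguous: which of the four booleans should decide?
        print("?")
except ValueError as error:
    print("ValueError:", error)

any_large = bool(mask.any())  # at least one value is larger than 2
all_large = bool(mask.all())  # every value is larger than 2

if all_large:
    print("All values are larger than 2")
elif any_large:
    print("Some, but not all, values are larger than 2")
```

Here `.any()` and `.all()` each reduce the whole `Series` to a single boolean, which is what `if` needs.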

## Exercises

### 2. Presence of a value

Load the big landlords dataset. Write a conditional block that prints "Yes" if there is more than one row with `Matriz = "Aena"`, and "No" otherwise.

In [None]:
BIG_LANDLORDS_URL = (
    "https://github.com/astrojuanlu/ie-mbd-python-data-analysis-i/raw/main/"
    "data/grandes-tenedores-madrid.csv"
)

## Nested conditionals

Conditionals can be arbitrarily nested according to our needs.

In [None]:
age = 16

if age < 0:
    print("The age is invalid")
else:
    print("Age is valid")
    if age >= 18:
        print("You can drive")
    else:
        print("You cannot drive yet")

print("Out of the conditional")
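
To trace which lines of the nested block run for different ages, the messages can be collected in a list instead of printed. This is a sketch under that assumption; the helper name `nested_messages` is hypothetical.

```python
def nested_messages(age):
    """Collect the messages the nested conditional above would print."""
    messages = []
    if age < 0:
        messages.append("The age is invalid")
    else:
        messages.append("Age is valid")
        # The inner conditional is only reached when the age is valid
        if age >= 18:
            messages.append("You can drive")
        else:
            messages.append("You cannot drive yet")
    return messages


for example_age in [-1, 16, 20]:
    print(example_age, "->", nested_messages(example_age))
```

For a negative age the inner `if` is skipped entirely, so only one message is produced.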

## Debugging step by step

To visualize the execution of a block of Python code line by line, use the [Jupyter debugger](https://jupyterlab.readthedocs.io/en/4.4.x/user/debugger.html).

![Jupyter debugger](../img/jupyter-debugger.gif)

## Exercises

### 3. Outlier strategy

Using the wildfires dataset, implement an automatic decision algorithm that chooses an outlier treatment strategy. To do so, write a conditional block that follows this logic:

- If there are rows with a value for `superficie` above the 99th percentile:
   - If there are more than 10 such values: print "Cap at 99th percentile"
   - Otherwise: print "Remove outliers individually"
- If there are no outliers: print "No outliers detected"