# Controlling Repetition with Iteration Statements

<a href="https://colab.research.google.com/github/bradleyboehmke/uc-bana-4080/blob/main/example-notebooks/16_iteration_statements.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook accompanies [this textbook chapter](https://bradleyboehmke.github.io/uc-bana-4080/17-iteration-statements.html) and allows you to run the code examples interactively.

## Prerequisites

In [None]:
import glob
import os
import random
import pandas as pd

## `for` loop

In [None]:
for number in range(10):
    print(number)

We can add multiple lines to our `for` loop; we just need to ensure that each line follows the same indentation pattern:

In [None]:
for number in range(10):
    squared = number * number
    print(f'{number} squared = {squared}')

Rather than just printing a result, we can also store each computation in an object. For example, say we want to store the squared results from the previous `for` loop in a dictionary where each key is the original number and each value is its square.

In [None]:
squared_values = {}

for number in range(10):
    squared = number * number
    squared_values[number] = squared

squared_values

## Controlling sequences

There are two ways to control the progression of a loop:

* `continue`: terminates the current iteration and advances to the next.
* `break`: exits the loop entirely (when loops are nested, only the innermost loop containing the `break`).

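Both are typically used together with `if` statements. For example, the `break` loop below iterates over each element in `years`; however, when it reaches the element equal to the `covid` year (2020), it `break`s out and ends the loop, so only 2018 and 2019 are printed. The `continue` loop that follows skips only 2020 and keeps going, so it prints every other year.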

In [None]:
# range produces numbers starting at 2018, up to but not including 2023
years = range(2018, 2023)
list(years)

In [None]:
covid = 2020

for year in years:
    if year == covid:
        break
    print(year)

In [None]:
for year in years:
    if year == covid:
        continue
    print(year)
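Note that `break` only exits the innermost loop it appears in. The minimal sketch below uses a hypothetical `pairs` list to show that, after the inner loop breaks, the outer loop still moves on to its next value:

```python
pairs = []

for outer in range(3):
    for inner in range(3):
        # break ends only this inner loop; the outer loop continues with its next value
        if inner == outer:
            break
        pairs.append((outer, inner))

print(pairs)  # [(1, 0), (2, 0), (2, 1)]
```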

## List comprehensions

List comprehensions offer a shorthand syntax for `for` loops that build lists, and they are very common in the Python community. Although a little odd at first, one way to think of a list comprehension is as a backward `for` loop: we state the expression first, and then the sequence. For example, here is a standard `for` loop that builds a list of squared values:

In [None]:
squared_values = []
for number in range(5):
    squared = number * number
    squared_values.append(squared)

squared_values

A list comprehension allows us to condense this pattern to a single line:

In [None]:
squared_values = [number * number for number in range(5)]
squared_values

List comprehensions even allow us to add conditional statements. For example, here we use a conditional statement to skip even numbers:

In [None]:
squared_odd_values = [number * number for number in range(10) if number % 2 != 0]
squared_odd_values
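Where the `if` goes matters. A trailing `if` filters items out of the result, whereas an `if ... else` placed before the `for` is a conditional expression that keeps every item but chooses which value to produce. A small sketch (the names `odd_only` and `labels` are just illustrative):

```python
# a trailing `if` filters: only odd numbers make it into the list
odd_only = [number for number in range(6) if number % 2 != 0]

# `if ... else` before `for` is a conditional expression:
# every number is kept, but each one is mapped to a label
labels = ['odd' if number % 2 != 0 else 'even' for number in range(6)]

print(odd_only)  # [1, 3, 5]
print(labels)    # ['even', 'odd', 'even', 'odd', 'even', 'odd']
```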

For more complex conditional statements, or if the list comprehension gets a bit long, we can use multiple lines to make it easier to digest:

In [None]:
squared_certain_values = [
    number * number for number in range(10)
    if number % 2 != 0 and number != 5
    ]

squared_certain_values

There are other forms of comprehensions as well. For example, a dictionary comprehension follows the same pattern, but it uses curly braces (`{}`) instead of square brackets (`[]`) and produces `key: value` pairs:

In [None]:
squared_values_dict = {number: number*number for number in range(10)}
squared_values_dict
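Two other forms you may run into are set comprehensions (curly braces with a single expression rather than a `key: value` pair) and generator expressions (parentheses), which produce values lazily, one at a time. A brief sketch with illustrative names:

```python
# set comprehension: duplicate results are dropped automatically
remainders = {number % 3 for number in range(10)}

# generator expression: values are produced on demand and consumed by sum()
total = sum(number * number for number in range(10))

print(remainders)  # {0, 1, 2}
print(total)       # 285
```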

## `while` loop

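A `while` loop repeats its body as long as its condition evaluates to `True`, which makes it useful when we don't know in advance how many iterations we need. For example, the probability of flipping 10 coins and getting all heads is $(\frac{1}{2})^{10} = 0.0009765625$ (1 in 1,024), and the probability of getting all tails is the same, so the probability of getting all heads *or* all tails is $2 \times (\frac{1}{2})^{10} = 0.001953125$ (1 in 512). Let's simulate this and see how many tries it takes to accomplish this feat.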

The following `while` loop flips 10 coins and checks whether the flips contain only one unique value, which means we flipped all heads or all tails. If not, it repeats the process of flipping 10 coins and increments the number of tries. Once `ten_of_a_kind` becomes `True`, the loop condition `not ten_of_a_kind` is `False` and the `while` loop stops.

In [None]:
# create a coin
coin = ['heads', 'tails']

# we'll use this to track how many tries it takes to get 10 heads or 10 tails
n_tries = 0

# signals if we got 10 heads or 10 tails
ten_of_a_kind = False

while not ten_of_a_kind:
    # flip coin 10 times
    ten_coin_flips = [random.choice(coin) for _ in range(10)]

    # check whether all 10 flips are identical (only one unique value)
    ten_of_a_kind = len(set(ten_coin_flips)) == 1

    # add iteration to counter
    n_tries += 1


print(f'After {n_tries} tries: {ten_coin_flips}')
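As a quick check on the arithmetic, the sketch below computes these probabilities exactly with Python's built-in `fractions` module. All heads and all tails are two separate outcomes, each with probability $(\frac{1}{2})^{10}$, so their probabilities add. It also assumes a standard result not stated in the text above: the number of tries until the first success follows a geometric distribution with mean $1/p$, so across many runs the number of tries the loop reports should average roughly 512.

```python
from fractions import Fraction

# probability that 10 fair flips are all heads (all tails is equally likely)
p_all_heads = Fraction(1, 2) ** 10

# all heads OR all tails: two mutually exclusive outcomes, so we add them
p_ten_of_a_kind = 2 * p_all_heads

# expected number of tries until the first success (geometric distribution)
expected_tries = 1 / p_ten_of_a_kind

print(p_all_heads)      # 1/1024
print(p_ten_of_a_kind)  # 1/512
print(expected_tries)   # 512
```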

## Iterables

Python relies heavily on the concept of _iterable objects_. An object is considered _iterable_ if it is either a physically stored sequence or an object that produces one result at a time in the context of an iteration tool like a `for` loop. Up to this point, our example looping structures have primarily iterated over a DataFrame or a list.

When a `for` loop iterates over a DataFrame, under the hood it first gets an iterator from the object and then repeatedly asks that iterator for the next item. As the following illustrates, iterating over a DataFrame yields its column labels:

In [None]:
df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [3, 4, 5], 'col3': [6, 6, 6]})

I = df.__iter__() # get an iterator over the column labels
print(next(I))    # first iteration
print(next(I))    # second iteration
print(next(I))    # third iteration

When a `for` loop iterates over a list, the same procedure unfolds. Note that when no more items are available, `next()` raises a `StopIteration` exception, which signals to the `for` loop that no more iterations should be performed. Outside of a `for` loop, as in the last line of the following cell, that exception shows up as an error.

In [None]:
names = ['Robert', 'Sandy', 'John', 'Patrick']

I = names.__iter__() # get an iterator over the list
print(next(I))       # first iteration
print(next(I))       # second iteration
print(next(I))       # third iteration
print(next(I))       # fourth iteration
print(next(I))       # no more items: raises StopIteration
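To make this concrete, the sketch below mimics roughly what a `for` loop does behind the scenes: get an iterator with `iter()`, call `next()` repeatedly, and stop when `StopIteration` is raised. It is a simplified illustration of the iterator protocol, not CPython's actual implementation; `visited` is just an illustrative name.

```python
names = ['Robert', 'Sandy', 'John', 'Patrick']

# roughly equivalent to: for name in names: visited.append(name)
visited = []
iterator = iter(names)            # same as names.__iter__()
while True:
    try:
        name = next(iterator)     # same as iterator.__next__()
    except StopIteration:         # no more items, so leave the loop
        break
    visited.append(name)

print(visited)  # ['Robert', 'Sandy', 'John', 'Patrick']
```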

Dictionaries and tuples are also iterable objects. Iterating over a dictionary returns one key at a time, which lets us access both the key and its associated value in the same loop:

In [None]:
D = {'a':1, 'b':2, 'c':3}

I = D.__iter__()  # get an iterator over the dictionary's keys
print(next(I))    # first iteration
print(next(I))    # second iteration
print(next(I))    # third iteration

In [None]:
for key in D:
    print(key, D[key])
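The examples above only show a dictionary, so here is a small sketch that also iterates over a tuple (`T` is a hypothetical example). It uses the dictionary's `.items()` method, which yields `(key, value)` tuples that we can unpack directly in the loop header instead of looking up `D[key]`:

```python
D = {'a': 1, 'b': 2, 'c': 3}
T = ('x', 'y', 'z')

# iterating over a tuple yields one element at a time
for item in T:
    print(item)

# .items() yields (key, value) tuples, unpacked in the loop header
for key, value in D.items():
    print(key, value)
```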

Although iterating directly over these objects in a `for` loop is quite common, you will often see two other approaches that use the built-ins `range()` and `enumerate()`. `range` is often used to generate indexes in a `for` loop, but you can use it anywhere you need a series of integers. Unlike a list, a `range` does not store all of its values; it generates each integer on demand:

In [None]:
values = range(5)

I = values.__iter__()
print(next(I))
print(next(I))
print(next(I))
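Because a `range` only stores its start, stop, and step, it can represent a very large sequence cheaply. In the sketch below (`big_range` is an illustrative name), none of the operations builds the full list of values in memory:

```python
# every even number below one billion, without storing them all
big_range = range(0, 1_000_000_000, 2)

# length, indexing, and membership are computed from start/stop/step
print(len(big_range))            # 500000000
print(big_range[3])              # 6
print(big_range[-1])             # 999999998
print(999_999_998 in big_range)  # True
```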

So if you want to iterate over each column in our DataFrame, an alternative is to use `range`. In this example, `range` produces the numeric position of each column, which we use to select that column with `.iloc` inside the `for` loop:

In [None]:
unique_values = []
for col in range(len(df.columns)):
    value = df.iloc[:, col].nunique()
    unique_values.append(value)

unique_values

Another common iterator you will see is `enumerate`. Strictly speaking, `enumerate()` returns an `enumerate` object, which is an iterator and therefore supports the same `next()` protocol shown above. The benefit of `enumerate` is that it returns an `(index, value)` tuple each time through the loop:

In [None]:
E = enumerate(df)  # the enumerate object is itself an iterator
print(next(E))    # first iteration
print(next(E))    # second iteration
print(next(E))    # third iteration

The `for` loop steps through these tuples automatically and lets us unpack their values with tuple assignment in the loop header. In the following example, we unpack each tuple into the variables `index` and `col`, and we can then use both values however necessary inside the loop.

In [None]:
for index, col in enumerate(df):
    print(f'{index} - {col}')
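Two details of `enumerate` are worth knowing. Its `start` parameter changes where the counter begins, and because it returns an iterator, it can only be consumed once. The sketch below uses a plain list named `columns` (a stand-in for the DataFrame's column labels, so it runs without pandas):

```python
columns = ['col1', 'col2', 'col3']

# start=1 makes the counter begin at 1 instead of the default 0
for position, col in enumerate(columns, start=1):
    print(f'{position} - {col}')

# enumerate returns an iterator: iter() gives back the same object,
# and once it has been consumed it produces nothing more
E = enumerate(columns)
print(iter(E) is E)  # True
print(list(E))       # [(0, 'col1'), (1, 'col2'), (2, 'col3')]
print(list(E))       # []
```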

## Exercise: Practicing Looping and Iteration Patterns

In this exercise set, you’ll practice using `for` loops, `while` loops, conditional logic, and comprehensions. These tasks will help you build fluency with the iteration patterns that show up frequently in data wrangling and automation tasks.

## 1. Filter Capitalized Names with a Comprehension

Use the list of names below to write a **list comprehension** that returns only the values that start with a capital letter (i.e. a "title case" word).

In [None]:
names = ['Steve Irwin', 'koala', 'kangaroo', 'Australia', 'Sydney', 'desert']

*Hint: Try using the `.istitle()` method.*

Which names are included in the result?

## 2. Generate the Fibonacci Sequence

The **Fibonacci Sequence** starts with the numbers 0 and 1, and each subsequent number is the sum of the two previous numbers. For example:
`[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, ...]`

Write a `for` loop that generates the **first 25 Fibonacci numbers** and stores them in a list.

## 3. Sum with Conditional Skip

Write a `for` loop that computes the sum of all numbers from 0 through 100, **excluding** the numbers in the list below:

In [None]:
skip_these_numbers = [8, 29, 43, 68, 98]

Tip: Use a `continue` statement to skip over those values. What is the resulting sum?