# CDCS Summer School
# A Gentle Introduction to Coding for Data Analysis
## Session 9: All FOR One, One FOR All

---------------

### Learning objectives for this session:

At the end of this notebook you will know:

1. What a 'FOR' loop is and why we use it.
2. When a 'WHILE' loop is the better choice.
3. How to use the control statements 'BREAK' and 'CONTINUE' within loops.
4. What the 'itertools' module is and how it helps with looping.

--------

## 1. What is a 'FOR' loop?

The `for` loop in Python is a control flow statement that is used to repeat a block of code a certain number of times or over a sequence. This loop is particularly useful when you know in advance how many times you want to execute a block of code.

In the example below, we iterate through a list of coffee orders. Each order is a dictionary that might include the type of coffee, the kind of milk used, and how many sugars are added.

In [None]:
coffee_orders = [
    {"type": "Latte", "milk": "Cow", "sugar": 1},
    {"type": "Espresso", "milk": "None"},
    {"type": "Cappuccino", "milk": "Almond", "sugar": 0}
]

for order in coffee_orders:
    print(order)

When accessing dictionary keys that might not exist in every element (such as 'sugar' in our coffee orders), it's useful to handle missing values gracefully. The dictionary `.get()` method is well suited to this because it lets you specify a default value to return if the key is missing.

In [None]:
# Here, we extract and print specific details from each coffee order.
for order in coffee_orders:
    coffee_type = order['type']
    milk_type = order.get('milk', 'No milk')  # Using .get() to provide a default value if 'milk' key is missing
    print(f"Order: {coffee_type}, Milk: {milk_type}")

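This loop prints each order's type and milk preference. If an order had no 'milk' key, `.get()` would return "No milk". Note that in our data every order does have a 'milk' key (the Espresso stores the string "None"), so the default is not triggered here.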
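To see why `.get()` matters, the sketch below compares it with square-bracket access on an order that has no 'sugar' key. The `espresso` dictionary is a hypothetical stand-in that mirrors the Espresso order above, and the example uses `try`/`except`, which you may not have met yet: it simply catches the error so the cell keeps running.

```python
# Hypothetical single order with no 'sugar' key (mirrors the Espresso order above).
espresso = {"type": "Espresso", "milk": "None"}

# Square brackets raise a KeyError when the key is missing.
try:
    sugar = espresso["sugar"]
except KeyError:
    sugar = "key missing"

# .get() returns the supplied default instead of raising an error.
sugar_with_default = espresso.get("sugar", 0)

# Without a default, .get() returns None.
sugar_no_default = espresso.get("sugar")

print(sugar, sugar_with_default, sugar_no_default)
```

Choosing between the two is a design decision: use square brackets when a missing key means something has gone wrong, and `.get()` when a missing key is expected.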

Next, this segment demonstrates how to preprocess our data by adding default values directly to our dictionaries when certain keys are missing. In this case, we add a default value for 'sugar' if it is not already included in the order. This preprocessing step is vital for ensuring data consistency before performing further operations or analyses.

In [None]:
# Suppose we want to add a default number of sugars if none is specified.
for order in coffee_orders:
    if 'sugar' not in order:
        order['sugar'] = 1  # Adding a default sugar value

# Print the updated list to see the changes.
for order in coffee_orders:
    print(f"Order: {order['type']}, Sugar: {order['sugar']}")

If we are dealing with more complex nested structures, such as a list of dictionaries or lists within dictionaries, 'for' loops are indispensable. They let us extract data from these structures based on conditions.

In this code block, we iterate through a list of dictionaries where each dictionary represents data about a penguin. The loop prints selected information for each penguin, demonstrating how to access multiple fields from dictionaries within a `for` loop.

In [None]:
penguin_data = [
    {"species": "Adelie", "island": "Torgersen", "bill_length_mm": 38.9, "body_mass_g": 3750},
    {"species": "Chinstrap", "island": "Dream", "bill_length_mm": 48.7, "body_mass_g": 3550},
    {"species": "Gentoo", "island": "Biscoe", "bill_length_mm": 47.2, "body_mass_g": 4950}
]

# Displaying basic information about each penguin
for penguin in penguin_data:
    print(f"Species: {penguin['species']}, Island: {penguin['island']}, Body Mass: {penguin['body_mass_g']} grams")

In [None]:
# Calculating the average bill length and body mass of the penguins.
total_bill_length = 0
total_body_mass = 0
count = 0

for penguin in penguin_data:
    total_bill_length += penguin['bill_length_mm']
    total_body_mass += penguin['body_mass_g']
    count += 1

average_bill_length = total_bill_length / count
average_body_mass = total_body_mass / count

print(f"Average Bill Length: {average_bill_length:.2f} mm")
print(f"Average Body Mass: {average_body_mass:.2f} g")

In [None]:
# Using 'for' loops to filter penguins by species and print their details.
print("Details of Gentoo Penguins:")
for penguin in penguin_data:
    if penguin['species'] == "Gentoo":
        print(f"Island: {penguin['island']}, Bill Length: {penguin['bill_length_mm']} mm, Body Mass: {penguin['body_mass_g']} g")

In [None]:
# Safely accessing a key that might not exist in every dictionary.
print("Penguin Sex Information:")
for penguin in penguin_data:
    # Assume 'sex' key may not be present in all records
    sex = penguin.get('sex', 'Unknown')
    print(f"Species: {penguin['species']}, Sex: {sex}")
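The cells above loop over a list of dictionaries. For the other nested case mentioned earlier, lists within a dictionary, one `for` loop can be placed inside another. The sketch below is only an illustration: `island_masses` and its values are made up for this example, and `.items()` (which yields each key together with its value) may be new to you.

```python
# Hypothetical data: each island maps to a list of body masses in grams.
island_masses = {
    "Torgersen": [3750, 3800],
    "Dream": [3550],
    "Biscoe": [4950, 5100, 4800],
}

heaviest = {}
for island, masses in island_masses.items():  # outer loop: one island at a time
    heaviest[island] = 0
    for mass in masses:  # inner loop: each body mass recorded on that island
        if mass > heaviest[island]:
            heaviest[island] = mass

for island in heaviest:
    print(f"{island}: heaviest penguin is {heaviest[island]} g")
```

The inner loop runs in full for every single pass of the outer loop, which is why its body is indented twice.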

It is important to understand the anatomy of a `for` loop:

`for`: The keyword that initiates the loop.

`variable`: A temporary variable that holds the value of the item in the sequence during each iteration.

`in`: A keyword that is used before the sequence being iterated over.

`sequence`: The collection to iterate over (a list, tuple, dictionary, etc.). Iterating over a dictionary yields its keys.

`:`: The colon that ends the for line.

Note that everything within the 'for' loop is then indented. This is also a key part of the syntax.
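The parts listed above can be seen together in a minimal sketch. The names `species_list` and `lengths` are chosen for illustration only.

```python
species_list = ["Adelie", "Chinstrap", "Gentoo"]  # the sequence
lengths = []

# for <variable> in <sequence>:
for species in species_list:
    # Indented lines form the loop body and run once per item.
    lengths.append(len(species))

# This line is not indented, so it runs once, after the loop has finished.
print(lengths)
```

Moving the `print` call inside the loop (by indenting it) would print the growing list three times instead of once.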

----

## 2. 'While' style loops.

Sometimes we might not have a sequence to iterate through. Instead we may want to keep doing something until a condition is met (or indeed broken). For this we introduce a 'while' style loop.

In [None]:
user_response = ''
while user_response.lower() != 'no':
    user_response = input("Do you want to continue? (yes/no): ")

In [None]:
# Suppose we collect penguin data in the field and continue until no more penguins are found.
penguin_count = 0
more_penguins = 'yes'
while more_penguins.lower() == 'yes':
    penguin_count += 1
    more_penguins = input("Did you find another penguin? (yes/no): ")

print(f"Total penguins counted: {penguin_count}")

In [None]:
# Monitoring the habitat temperature until it falls outside an acceptable range.
temperature = 15  # degrees Celsius
while 10 <= temperature <= 20:
    print(f"Current temperature is suitable for penguins: {temperature}°C")
    # Create a change in temperature
    temperature_change = float(input("Enter temperature change: "))
    temperature += temperature_change

print("Temperature is no longer suitable for penguins.")

We can build quite complex logic into a 'while' loop. Consider the following example, set in the coffee shop as before. Here we have a constraint: only two lattes can be served, and we keep working through the orders until the lattes run out.

In [None]:
coffee_orders = ['Latte', 'Espresso', 'Cappuccino', 'Mocha', 'Latte', 'Latte']
available_lattes = 2  # Only two Lattes can be served

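index = 0
# Stop when we reach the end of the list or run out of Lattes.
while index < len(coffee_orders) and available_lattes > 0:
    order = coffee_orders[index]
    if order == 'Latte':
        available_lattes -= 1  # use up one of the available Lattes
    print(f"Serving {order}")
    index += 1

# The loop condition guarantees a Latte was available whenever one was served,
# so the "none left" message belongs after the loop.
if available_lattes == 0:
    print("No more Lattes available.")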

----

## 3. 'Break' and 'Continue' within loops.

Sometimes we want a loop to stop early once a certain condition is met, or to skip over some items without stopping. This is where the `break` and `continue` statements are very useful.

**break**: Used to exit a loop prematurely. It is particularly useful in while loops when an external condition triggers the end of the loop, and in for loops when a specific condition within the loop suggests stopping all operations.

**continue**: Skips the current iteration of the loop and proceeds to the next cycle of the loop. This can be used to skip over certain data points or conditions that do not require the same handling as others.

```
for item in collection:
    if some_condition:
        break  # Exit the loop
    if another_condition:
        continue  # Skip to the next iteration
```

In [None]:
coffee_orders = ['Latte', 'Espresso', 'Black Coffee', 'Latte', 'Mocha', 'Espresso']
available_espressos = 1

for order in coffee_orders:
    if order == 'Espresso':
        if available_espressos > 0:
            print(f"Serving {order}")
            available_espressos -= 1
        else:
            print("Out of Espresso. Cannot fulfill further orders.")
            break
    else:
        print(f"Serving {order}")

In [None]:
for order in coffee_orders:
    if order == 'Black Coffee':
        print("Skipping Black Coffee")
        continue
    print(f"Serving {order}")

In [None]:
coffee_temp = 70  # initial temperature
while True:
    print(f"Current coffee temperature: {coffee_temp} degrees")
    if coffee_temp >= 85:  # Optimal temperature for serving coffee
        print("Coffee is ready to serve!")
        break
    coffee_temp += 5  # simulate the heating process

In [None]:
coffee_orders = ['Latte', 'Decaf Espresso', 'Cappuccino', 'Decaf Mocha']
index = 0
while index < len(coffee_orders):
    order = coffee_orders[index]
    index += 1
    if "Decaf" in order:
        print(f"Skipping {order}")
        continue
    print(f"Processing {order}")
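One detail in the `while` loop above is easy to miss: `index += 1` sits before the `continue`. The sketch below shows what happens if the increment is placed after the `continue` instead: the loop gets stuck on the first 'Decaf' order. The safety cap `max_steps` is an addition for this demonstration only, so that the cell cannot run forever.

```python
orders = ['Latte', 'Decaf Espresso', 'Cappuccino', 'Decaf Mocha']
processed = []
index = 0
steps = 0
max_steps = 10  # safety cap (an assumption for this demo) so the loop always ends

while index < len(orders) and steps < max_steps:
    steps += 1
    order = orders[index]
    if "Decaf" in order:
        continue  # BUG: jumps back to the top before index is increased
    processed.append(order)
    index += 1

print(f"Processed: {processed}, stuck at index {index} after {steps} steps")
```

A `for` loop does not have this problem, because it moves on to the next item automatically, even after `continue`.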

In [None]:
coffee_orders = [('Espresso', 2), ('Latte', 0), ('Black Coffee', 5), ('Espresso', 1)]
for beverage, stock in coffee_orders:
    # Check for empty stock before the while loop: inside it, stock is always > 0.
    if stock == 0:
        print(f"Out of {beverage}. Skipping...")
        continue  # move on to the next beverage in the for loop
    while stock > 0:
        if beverage == 'Espresso' and stock == 1:
            print("Only one Espresso left, processing last order now...")
            print("No more Espresso to serve after this.")
            break  # leaves the while loop only; the for loop carries on
        print(f"Serving {beverage}")
        stock -= 1

In [None]:
penguin_data = [
    {"species": "Adelie", "island": "Torgersen", "body_mass_g": 3750},
    {"species": "Gentoo", "island": "Biscoe", "body_mass_g": 4950},
    {"species": "Chinstrap", "island": "Dream", "body_mass_g": 3550}
]

for penguin in penguin_data:
    if penguin['species'] == 'Chinstrap':
        continue  # Skip all Chinstrap penguins
    print(f"Processing data for {penguin['species']} on {penguin['island']}")

----

## 4. Using 'itertools' with loops.

'itertools' is a module in Python's standard library, so it is installed automatically with Python. It provides a collection of tools for creating and working with iterators efficiently and succinctly, helping to make Python code faster, more memory-efficient, and cleaner.

In [2]:
# Although the package is installed with Python by default, we still need to import it.
import itertools

In [None]:
orders_day1 = ['Espresso', 'Latte']
orders_day2 = ['Cappuccino', 'Mocha']
all_orders = itertools.chain(orders_day1, orders_day2)

for order in all_orders:
    print(order)

In [None]:
count = 0
for item in itertools.cycle(['yes', 'no']):
    if count > 5:  # prevent infinite loop for demonstration
        break
    print(item)
    count += 1
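The manual counter above is one way to stop an endless iterator. As an alternative sketch, `itertools.islice()` takes only the first few items from any iterator, so no `break` is needed. The name `first_six` is chosen for illustration.

```python
import itertools

# Take the first six items from an endless cycle without a manual counter.
first_six = list(itertools.islice(itertools.cycle(['yes', 'no']), 6))

for item in first_six:
    print(item)
```

This prints the same six lines as the counter version, because that loop stops once `count` exceeds 5.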

**Don't panic point!**

Below you will see some new notation: `lambda`. A lambda is a built-in Python feature (it is not part of `itertools`) for writing a small, unnamed function in a single line. Here it tells `groupby()` which part of each item to group by. Have a look at the example below and see if you can work out what it is doing...

In [9]:
data = [('penguin', 'Adelie'), ('penguin', 'Chinstrap'), ('penguin', 'Adelie')]
for penguin_type, entry in itertools.groupby(sorted(data), lambda x: x[1]):  # Sort needed for meaningful grouping
    print(penguin_type, list(entry))

Adelie [('penguin', 'Adelie'), ('penguin', 'Adelie')]
Chinstrap [('penguin', 'Chinstrap')]


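The way to read `lambda x: x[1]` is: a small unnamed function that takes one item `x` (in this case a tuple) and returns its second entry, written in Python as index 1 since Python begins counting at 0. `groupby()` calls this function on every item and groups together consecutive items that give the same result. Because only neighbouring items are grouped, the data is passed through `sorted()` first so that matching species sit next to each other.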
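To make the lambda less mysterious, the sketch below swaps it for an ordinary named function that does the same job. The names `second_entry` and `grouped` are hypothetical, chosen for this illustration; the function is also used as the sort key so that sorting and grouping rely on the same value.

```python
import itertools

data = [('penguin', 'Adelie'), ('penguin', 'Chinstrap'), ('penguin', 'Adelie')]

def second_entry(x):
    """Return the second entry of a tuple (same job as `lambda x: x[1]`)."""
    return x[1]

# Sort and group with the same key so that equal species sit next to each other.
grouped = {}
for species, entries in itertools.groupby(sorted(data, key=second_entry), key=second_entry):
    grouped[species] = list(entries)

print(grouped)
```

A lambda is handy when the function is short and used only once; a named function is easier to read and reuse when the logic grows.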

In [16]:
for perm in itertools.permutations(['A', 'B', 'C'], 2):
    print(''.join(perm))

AB
AC
BA
BC
CA
CB


------

## ⭐️⭐️⭐️💥 What you learned in this session: Three stars and a wish.
**In your own words** write in the Markdown cell below:

- 3 things you would like to remember from this notebook.
- 1 thing you wish to understand better in the future or a question you'd like to ask.

*Add your reflections here.*

--------------

## Topic Overview

In [None]:
# For loops can be used to iterate over a collection of things in a list.
# Print each penguin species from a list
penguin_species = ['Adelie', 'Chinstrap', 'Gentoo']
for species in penguin_species:
    print(species)

In [None]:
# Alternatively, we can use a while loop to repeat a process until a condition is met (or breached)
# Keep serving coffee until all orders are processed
coffee_orders = ['Espresso', 'Latte']
index = 0
while index < len(coffee_orders):
    print(f"Serving {coffee_orders[index]}")
    index += 1

In [None]:
# It is possible to manipulate our loops to stop and continue based on some condition too.
# Skip serving 'Latte' and break after serving 'Espresso'
for order in ['Espresso', 'Latte', 'Mocha']:
    if order == 'Latte':
        continue
    print(f"Serving {order}")
    if order == 'Espresso':
        break

In [None]:
# itertools is a built-in module which can aid looping.
import itertools
# Chain multiple lists of coffee orders together
orders_day1 = ['Espresso', 'Cappuccino']
orders_day2 = ['Latte', 'Flat White']
for order in itertools.chain(orders_day1, orders_day2):
    print(order)

----

# ⛏ Exercise: Group the penguins.

You are tasked with analyzing a dataset of penguin observations. Your goal is to group the penguins by species, then for each group, summarize the data, including counting members and identifying if any members exceed certain size thresholds.

- Use `itertools.groupby()` to group penguin data by species (remember that the data must be sorted by species first for the grouping to be meaningful).
- Use a 'for' loop to process each group, applying continue to skip over any data entries that are incomplete or improperly formatted.
- Use nested loops to process each penguin in the groups, summarizing data.

In [None]:
penguins = [
    {'species': 'Adelie', 'bill_length_mm': 39.1, 'body_mass_g': 3750},
    {'species': 'Gentoo', 'bill_length_mm': 47.2, 'body_mass_g': 4975},
    {'species': 'Adelie', 'bill_length_mm': 38.5, 'body_mass_g': 3500},
    {'species': 'Gentoo', 'bill_length_mm': None, 'body_mass_g': 4500},  # Incomplete data
]

In [None]:
# try to solve the task here

# ⛏ Exercise: Dance Monkey

Earlier this week we saw functions put to use in recreating the penguin song. Legend has it that long before penguins were singing, monkeys were dancing. In this exercise, investigate the lyrics of the song 'Dance Monkey' by Tones and I, and try to identify where a loop may help in printing sections of the song. Try to do this in more than one way, using both 'for' and 'while' loops.

In [None]:
# try to solve the task here