# 06: Loops

**Author:** Benedikt Goodman \
**Assistants:** Mistral Large, ChatGPT-4

The main point of writing programs is often to automate something. We usually want the program to do as much as possible for us with as little effort as possible. The same goes for the programming itself: the more concise your code is, the easier it is to understand and maintain, and the less code you have to write. Alongside encapsulation and vectorised operations, loops are a very good tool for achieving this. Why perform the same operation manually on 10 datasets when a loop can do it in a couple of lines? Before we dive into the lesson, here is some more conceptual information about where and when loops are commonly used in programming (including in Python).


## A brief introduction to loops


Loops are a fundamental programming construct in Python that allow us to execute a block of code repeatedly based on a certain condition or a specified number of times. Loops are an essential tool for processing collections of data, such as lists or strings, and for automating repetitive tasks. Here are some common use cases for loops:


1. Iterating over collections of data, such as lists, tuples, and dictionaries
2. Automating repetitive tasks, such as data entry or file processing
3. Generating sequences of values, such as numbers or dates
4. Searching for specific values or patterns in data
5. Implementing algorithms that require repeated calculations or transformations
6. **Anywhere you want to repeatedly apply a function or an object**


In Python, there are two main types of regular loops: `for` loops and `while` loops. Both types of loops can be used to iterate over objects, such as `lists`, `tuples`, and `dictionaries`, and to execute code repeatedly based on a condition.
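As a first taste of the two loop types, here is a minimal sketch. The variable names (`total`, `countdown`, `steps`) are made up for illustration only.

```python
# A for loop repeats once per item in an iterable
total = 0
for number in [1, 2, 3]:
    total += number
print(total)  # 6

# A while loop repeats for as long as a condition holds
countdown = 3
steps = []
while countdown > 0:
    steps.append(countdown)
    countdown -= 1
print(steps)  # [3, 2, 1]
```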



### What do we mostly apply loops to?

We mostly tend to loop over inbuilt datatypes, as many of the other objects we manipulate (like `DataFrames` and `numpy.ndarrays`) tend to have methods which either do the looping for you, or make looping redundant because they apply vectorised operations instead.

- `str`: A sequence of characters enclosed in single, double, or triple quotes (triple-quoted strings are what docstrings are written with).
- `list`: A sequence of values enclosed in square brackets, where each value can be of any data type.
- `tuple`: A sequence of values enclosed in parentheses, where each value can be of any data type. Tuples are immutable, meaning that their values cannot be changed once they are created.
- `set`: An unordered collection of unique values enclosed in curly braces, where each value must be hashable (in practice, an immutable data type). Sets do not allow duplicate values.
- `dict`: A collection of key-value pairs enclosed in curly braces, where each key must be hashable (in practice, an immutable data type) and each value can be of any data type. Dictionaries preserve insertion order (since Python 3.7) and allow duplicate values, but not duplicate keys.
- `range`: A sequence of integers generated by the `range()` function, which can be used to generate a sequence of numbers for loops and other operations.
- `bytes` and `bytearray`: A sequence of bytes, which can be used to represent binary data. Byte sequences are typically used for low-level operations, such as network communication or file I/O. Rarely used for the type of programming we do at SSB.
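To see what looping over each of these types actually yields, here is a small sketch. The sample values are hypothetical, and `list()` is used because it consumes an iterable the same way a `for` loop would.

```python
# Hypothetical sample values, one per built-in iterable type
samples = {
    'str': 'abc',
    'list': [1, 2, 3],
    'tuple': (1, 2, 3),
    'set': {1, 2, 3},
    'dict': {'a': 1, 'b': 2},
    'range': range(3),
    'bytes': b'abc',
}

collected = {}
for type_name, sample in samples.items():
    # list() pulls the items out one by one, just like a for loop
    collected[type_name] = list(sample)
    print(type_name, collected[type_name])
```

Note that looping over a `dict` yields its keys, and looping over `bytes` yields integers rather than characters.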


**NB:** *Loops must be applied to objects that are iterable. This means that the object we are looping over needs to implement the `__iter__` method. Most of the objects we loop over already have this method, so it is not something we need to concern ourselves with. But it explains why we can loop through `strings`, `lists`, `tuples`, `arrays` and so forth, but not atomic types like `floats` and `ints`.*

And lastly, the reason that looping through container objects is such an integral part of programming is that we can put whatever we like in these container objects. This is a *really* powerful feature that will make for concise code and less work on your part :)
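As one illustration of "whatever we like", a list can hold functions. The sketch below uses three built-in functions; the names `operations` and `results` are made up for the example.

```python
# A list can hold functions just as easily as numbers or strings
operations = [abs, float, str]

results = []
for operation in operations:
    # Apply each function in turn to the same value
    results.append(operation(-3))

print(results)  # [3, -3.0, '-3']
```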


In [6]:
help(list.__iter__)

Help on wrapper_descriptor:

__iter__(self, /)
    Implement iter(self).



In [7]:
import pandas as pd
import numpy as np

# Any object with the __iter__ method is iterable and can as such be looped through
{str(item): '__iter__' in dir(item) for item in (tuple, list, dict, np.ndarray, pd.Series, pd.DataFrame, str, int, float)}


{"<class 'tuple'>": True,
 "<class 'list'>": True,
 "<class 'dict'>": True,
 "<class 'numpy.ndarray'>": True,
 "<class 'pandas.core.series.Series'>": True,
 "<class 'pandas.core.frame.DataFrame'>": True,
 "<class 'str'>": True,
 "<class 'int'>": False,
 "<class 'float'>": False}

In [8]:
# Will work
text = "a"
for t in text:
    print(t)


a


In [9]:
# Will not work
number = 1
for num in number:
    print(num)


TypeError: 'int' object is not iterable
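
The `__iter__` method is what a `for` loop relies on behind the scenes. The following is a rough sketch of that mechanism using the built-in `iter()` and `next()` functions; the variable names are illustrative only.

```python
letters = ['a', 'b']

# iter() calls the object's __iter__ method and returns an iterator
iterator = iter(letters)

# next() fetches one item at a time, which is what a for loop does for us
first = next(iterator)
second = next(iterator)
print(first, second)

# When the items run out, next() raises StopIteration.
# A for loop catches this silently and simply stops.
try:
    next(iterator)
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)
```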

## `for` loops

For our purposes in SSB, these are perhaps the most handy loop types to use. `for` loops are used to iterate over a sequence or other iterable object. The syntax for a `for` loop is as follows:

```python
for variable in sequence:
    # code to be executed
```


In [14]:
# A list is a sequence
my_list = [1, 2, 3, 4, 5]

# Here we'll simply iterate through a list of numbers
for kamelåså in my_list:
    print('ææææææææææææææ')
    

ææææææææææææææ
ææææææææææææææ
ææææææææææææææ
ææææææææææææææ
ææææææææææææææ


### Looping with nested sequences

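A tuple is also a sequence, and sequences can contain other sequences. The same applies to lists and other containers that can be nested. In the example below, the tuple holds two strings, and each string is itself a sequence of characters.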

In [15]:
# Parentheses around a single string do not create a tuple, so this is a tuple of two strings
my_tuple = ('Eirik: Magnus, du er sabla dyktig.', 'Magnus: Bløsj, du er god med meg.')

# Note how we can call the item we extract from the sequence anything as long as we are consistent with the names
for whatever in my_tuple:
    print(whatever)

Eirik: Magnus, du er sabla dyktig.
Magnus: Bløsj, du er god med meg.


If we wanted to loop through every item inside the inner sequences (here, every character of each string) we could do so with a nested loop. Note how the inner loop must refer to the temporary variable of the outer loop. The syntax works like this:

```python
for inner_sequence in sequence:
    for variable in inner_sequence:
        for item in variable:
            for i in item:
                # and so forth...
```

*Sidenote*: It is good practice to give loop variables descriptive names. The example above is much more informative than the following:

```python
for i in sequence:
    for j in i:
        for k in j:
            for l in k:
                for m, n in l:
                    # and so forth...
```

In [16]:
# How it works in action: each item in my_tuple is a string, so the inner loop yields single characters
for sentence in my_tuple:
    for letter in sentence:
        print(letter)
        

E
i
r
i
k
:
 
M
a
g
n
u
s
,
 
d
u
 
e
r
 
s
a
b
l
a
 
d
y
k
t
i
g
.
M
a
g
n
u
s
:
 
B
l
ø
s
j
,
 
d
u
 
e
r
 
g
o
d
 
m
e
d
 
m
e
g
.
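
The strings above are split into single characters because a string is itself a sequence. With a tuple that really contains other tuples, the inner loop yields whole items instead. A minimal sketch with made-up data (`grid` is a hypothetical name):

```python
# A tuple of tuples: the outer loop yields rows, the inner loop yields values
grid = ((1, 2, 3), (4, 5, 6))

flattened = []
for row in grid:
    for value in row:
        flattened.append(value)

print(flattened)  # [1, 2, 3, 4, 5, 6]
```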


In [17]:
# A list is a sequence
my_list = [0, 1, 2, 3, 4]

# Normally we carry out operations in loops, and usually with a bit of logic added to it.
for item in my_list:
    # Makes a new item
    new_item = item + item
    if new_item >= 6:
        print(new_item)

6
8


In [18]:
str_list = ['a', 'b', 'c']


# We can also loop through things with enumerate()
for idx, item in enumerate(str_list):
    print(idx, item)
    

0 a
1 b
2 c
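
`enumerate()` also accepts a `start` argument, which is handy when the counter should begin at 1 instead of 0. A short sketch (the `numbered` list is only there to collect the result):

```python
str_list = ['a', 'b', 'c']

numbered = []
# start=1 changes the first counter value; the items themselves are unchanged
for position, item in enumerate(str_list, start=1):
    numbered.append(f'{position}: {item}')

print(numbered)  # ['1: a', '2: b', '3: c']
```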


In [19]:

# It is handy to have access to the index of an iterable if you want to access several parts of the iterable at a given time
result = []

for idx, item in enumerate(my_list.copy()):
    
    # Notice how we can now extract whichever items we want using indexing
    item_1 = my_list[-idx + 3]
    item_2 = item
    
    if item_1 ** item_2 == 0:
        result.append(item_2**item_1)
        
    elif item_2 - item_1 > 0:
        result.append(item_2 + item_1)
        
    else:
        result.append(item_1 * item_2)
        
        
result
    

[0, 2, 3, 1, 16]

In [22]:
# What's indexing? It's just accessing an item via its index number (slicing, by contrast, extracts a range of items). This works for list-like objects (i.e. not dicts) which are indexed.
str_list[1]

'b'
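
Strictly speaking, fetching one item by its position is called indexing, while slicing extracts a range of items. A minimal sketch of the difference, with illustrative variable names:

```python
str_list = ['a', 'b', 'c']

single_item = str_list[1]       # indexing: one item
last_item = str_list[-1]        # negative indices count from the end
first_two = str_list[:2]        # slicing: a new list with items 0 and 1
reversed_list = str_list[::-1]  # slicing with a step of -1 reverses the list

print(single_item, last_item, first_two, reversed_list)
```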

### Looping with dictionaries

Looping over dictionaries is somewhat different from looping over regular sequences, as you need to decide whether you want to iterate through the keys, the values, or both.

Iterating over a dict directly yields only its keys. To access both the keys and the values we use the `.items()` method, which yields one `(key, value)` pair per entry.



In [23]:
# Let's make a simple dictionary
my_dict = {'a': 1, 'b': 2, 'c': 3}

In [24]:
# Will not work: iterating over a dict yields only the keys, so there is no second value to unpack
for key, value in my_dict:
    print(key, value)

ValueError: not enough values to unpack (expected 2, got 1)

In [29]:
# will work
for key, value in my_dict.items():
    print(key, value)

a 1
b 2
c 3


In [30]:
# Accessing just the keys
for key in my_dict.keys():
    print(key)

# Will also work, but it's less clear that we are accessing the keys.
for key in my_dict:
    print(key)

a
b
c
a
b
c


In [31]:
# If we wanted to access just the values
for value in my_dict.values():
    print(value)

1
2
3


In [32]:
# An example with nested dicts
d = {'dict_1': {'a': 1}, 'dict_2': {'b': 2}}

for key, inner_dict in d.items():
    print('This is in the outer dict:', key, inner_dict)
    for inner_key, inner_value in inner_dict.items():
        print('This is in the inner dict:',inner_key, inner_value)

This is in the outer dict: dict_1 {'a': 1}
This is in the inner dict: a 1
This is in the outer dict: dict_2 {'b': 2}
This is in the inner dict: b 2


## `while` loops

We don't use these as much in the programming we do here at SSB, but they are still handy to know about. You should consider using a while-loop in the following scenarios:

1. **Indeterminate Iteration**: When you don't know in advance how many times you'll need to iterate, a while-loop is often a better choice. This loop continues executing as long as a condition remains true, making it suitable for situations where the loop must run until a certain state is reached or changed.

2. **Waiting for Events**: If your loop needs to keep running until a specific event occurs (like user input or a change in data received from a network), a while-loop is typically used because it can efficiently wait for the condition to change without iterating over a set sequence.

3. **Conditional Logic Based on Complex State**: Sometimes, the decision to continue looping depends on multiple conditions or a more complex state evaluation that doesn’t neatly map to a sequence of numbers or iterable items. In such cases, a while-loop can provide clearer and more direct control over when and why the loop should continue or terminate.

Below is an example of indeterminate iteration, as we don't know in advance how many tries the user needs to guess the right number.

In [33]:
import random

target_number = random.randint(1, 5)  # randint includes both endpoints, so this matches the prompt below
guess = None

while guess != target_number:
    guess = int(input("Guess a number between 1 and 5: "))
    if guess < target_number:
        print("Too low!")
    elif guess > target_number:
        print("Too high!")

print("Congratulations, you guessed the number!")

Guess a number between 1 and 5:  1


Too low!


Guess a number between 1 and 5:  1


Too low!


Guess a number between 1 and 5:  1


Too low!


Guess a number between 1 and 5:  1


Too low!


Guess a number between 1 and 5:  1


Too low!


Guess a number between 1 and 5:  1


Too low!


Guess a number between 1 and 5:  1


Too low!


Guess a number between 1 and 5:  2


Too low!


Guess a number between 1 and 5:  3


Too low!


Guess a number between 1 and 5:  4


Too low!


Guess a number between 1 and 5:  5


Congratulations, you guessed the number!
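
A `while` loop that waits for a condition can run forever if the condition never becomes true, so it is often worth adding a second stopping condition. The sketch below is a non-interactive variant of the guessing game: random guesses stand in for user input, and `max_attempts` is a hypothetical limit chosen for illustration.

```python
import random

random.seed(0)  # fixed seed so the sketch behaves the same on every run
target_number = random.randint(1, 5)
max_attempts = 10  # hypothetical safety limit
attempts = 0
guess = None

# Stop when the number is found OR when we run out of attempts
while guess != target_number and attempts < max_attempts:
    guess = random.randint(1, 5)  # stand-in for input()
    attempts += 1

if guess == target_number:
    print(f'Found the number after {attempts} attempts')
else:
    print('Gave up')
```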


## List comprehensions

A list comprehension in Python is a concise way to create lists, and also a very handy way to loop through items. It is a single expression that replaces what would otherwise require several lines of loops and conditional logic. List comprehensions construct a list from an existing iterable (like a list, tuple, or string). The syntax consists of an expression followed by a `for` clause and, optionally, an `if` clause for filtering. There is no `elif` or `else` in the filter clause; to choose between values, use a conditional expression (`a if condition else b`) in the expression part instead.

The basic syntax of a list-comprehension is:
```python
# With condition
[do_something for item in iterable if condition]

# Without condition
[do_something for item in iterable]
```
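
To make the syntax concrete, the sketch below writes the same logic twice: once as a regular loop and once as a list comprehension. The variable names are illustrative only.

```python
numbers = [1, 2, 3, 4, 5]

# Regular loop: square the even numbers
even_squares_loop = []
for n in numbers:
    if n % 2 == 0:
        even_squares_loop.append(n ** 2)

# Equivalent list comprehension
even_squares_comp = [n ** 2 for n in numbers if n % 2 == 0]

print(even_squares_loop == even_squares_comp)  # True
```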

### Why are list-comps (and dict-comps) so powerful?

- **Simplicity and Readability**: It reduces the complexity of code that involves loops with a more readable and straightforward syntax. This makes the code easier to write and easier to understand at a glance.

- **Versatility**: You can perform complex tasks (like filtering and applying functions) within a single line of code, which can otherwise take multiple lines. This includes creating sublists, transforming list contents, and integrating conditional logic seamlessly.

In [34]:
numbers = [1, 2, 3, 4, 5]

# This will square all numbers
[n ** 2 for n in numbers]

[1, 4, 9, 16, 25]

In [35]:
# filter for even numbers
[n for n in numbers if n % 2 == 0]

[2, 4]

In [38]:
# Can also work with range items
[n for n in range(30) if n % 2 == 0]

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
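
A trailing `if` drops the items that fail the condition. To keep every item but treat it differently depending on a condition, a conditional expression can be placed in the first part of the comprehension instead. A minimal sketch (`labels` is a made-up name):

```python
numbers = [1, 2, 3, 4, 5]

# The conditional expression picks a value for every item; nothing is filtered out
labels = ['even' if n % 2 == 0 else 'odd' for n in numbers]

print(labels)  # ['odd', 'even', 'odd', 'even', 'odd']
```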

We can also apply a function to a sequence in a list-comp. This is perhaps one of the handiest features.

In [39]:
def some_func(x):
    if not isinstance(x, (int, float)):
        return f'I can only do math on numbers, you gave me a {type(x)} and now I am sad :('
    else:
        return x ** x + 42

l = numbers.copy()
l.append('6')

# Becomes really concise :)
[some_func(n) for n in l]

[43,
 46,
 69,
 298,
 3167,
 "I can only do math on numbers, you gave me a <class 'str'> and now I am sad :("]

## Dict-comprehensions

Dict-comprehensions behave much like list-comprehensions and are useful in the same ways, except that we now create a dictionary of the results instead of a list.

The basic syntax is:

```python
# With condition from a dict
{key: value for key, value in dictionary.items() if condition}

# Without condition from a dict
{key: value for key, value in dictionary.items()}

```
Dict-comprehensions are a great way to filter existing dictionaries.

Note that, just as when looping over a dictionary in a regular loop, we need to call the `.items()` method on the dict in order to access both the keys and the values.

In [40]:
# Dictionary of stock prices
prices = {'apple': 240, 'banana': 150, 'cherry': 180}

# Filter to include only items with price greater than 160
{key: value for key, value in prices.items() if value > 160}

{'apple': 240, 'cherry': 180}
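
Besides filtering, a dict-comprehension can also transform the keys, the values, or both. The sketch below upper-cases the keys and halves the prices; the halving is an arbitrary operation chosen for illustration.

```python
prices = {'apple': 240, 'banana': 150, 'cherry': 180}

# Transform both the key and the value of every entry
half_prices = {key.upper(): value // 2 for key, value in prices.items()}

print(half_prices)  # {'APPLE': 120, 'BANANA': 75, 'CHERRY': 90}
```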

They are also a great option for combining lists.

In [47]:
# Lists for keys, values, and conditions
keys = ['a', 'b', 'c', 'd']
values = [1, 2, 3]
conditions = [True, False, True, False]

# Create dictionary where condition is True; zip() stops at the shortest input, so 'd' is never paired
{k: v for k, v, cond in zip(keys, values, conditions) if cond}

{'a': 1, 'c': 3}
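
Note that `zip()` silently stops at the shortest input, which is easy to miss when the lists differ in length. The sketch below contrasts it with `itertools.zip_longest`, which pads the missing values with `None` by default. From Python 3.10, `zip(..., strict=True)` is another option that raises an error on unequal lengths.

```python
from itertools import zip_longest

keys = ['a', 'b', 'c', 'd']
values = [1, 2, 3]

# zip() stops at the shortest list, so 'd' is dropped
truncated = dict(zip(keys, values))

# zip_longest() keeps going and fills the gaps with None
padded = dict(zip_longest(keys, values))

print(truncated)
print(padded)
```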

But where I think they shine the most is when you need to batch-process objects (like dataframes) while retaining, for example, filename information through the process.

In [49]:
import pandas as pd
import numpy as np
import os

# Create a random dataframe with 100 rows and 5 columns
df1 = pd.DataFrame(np.random.randn(100, 5), columns=list('ABCDE'))

# Write the dataframe to a parquet file
df1.to_parquet('df1.parquet')

# Create another random dataframe with 50 rows and 3 columns
df2 = pd.DataFrame(np.random.randn(50, 3), columns=list('XYZ'))

# Write the dataframe to a parquet file
df2.to_parquet('df2.parquet')

# Create a third dataframe with some string data
df3 = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie', 'Dave'],
                    'age': [25, 30, 35, 40],
                    'city': ['New York', 'Los Angeles', 'Chicago', 'Houston']})

# Write the dataframe to a parquet file
df3.to_parquet('df3.parquet')

In [50]:
# Read all parquet files in the working directory into a dictionary of dataframes
file_dict = {filename: pd.read_parquet(filename) for filename in os.listdir() if filename.endswith('.parquet')}
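
An alternative way to collect the files is `pathlib`, which can match on a pattern and strip the file extension from the key. The sketch below is self-contained and only builds a dictionary of paths from empty placeholder files in a temporary folder; in practice the value could be `pd.read_parquet(path)` instead of `path`. The file names are hypothetical.

```python
from pathlib import Path
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    folder = Path(tmp_dir)

    # Create empty placeholder files so the sketch runs anywhere
    for name in ['df1.parquet', 'df2.parquet', 'notes.txt']:
        (folder / name).touch()

    # glob() only returns paths matching the pattern; .stem drops the extension
    parquet_paths = {path.stem: path for path in sorted(folder.glob('*.parquet'))}

print(list(parquet_paths))  # ['df1', 'df2']
```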

In [51]:
# We now have a dictionary where filenames are keys and the dataframes are values
file_dict.keys()

dict_keys(['df1.parquet', 'nyt1f_2020.parquet', 'df3.parquet', 'df2.parquet'])

            A          B          C          D          E
0   -0.980686   0.521604  -0.000412   0.109531   0.341736
1   -0.408090  -0.402243   0.175182  -1.611058  -0.551513
2    0.654587   0.973432   0.495205  -0.701070  -0.723417
3    0.806391   0.606994   0.212919  -0.902387  -1.791289
4    0.157832   0.275169   0.513370  -0.877116  -1.011929
..        ...        ...        ...        ...        ...
95   2.024806   0.327327   1.779660   0.254995  -1.180320
96  -0.733216  -0.094099   0.895118  -0.255883   1.089144
97  -0.739327   0.420199  -1.007526  -0.245742  -0.146391
98   0.521399   0.720201   0.821715   0.783412   0.807434


In [53]:
# Select numerical columns, then calculate column sums
num_col_sums = {filename: df.select_dtypes('number').sum() for filename, df in file_dict.items()}

In [54]:
# Show results stored in dictionary
num_col_sums.get('df1.parquet')

A    -0.381915
B   -10.299051
C    13.105788
D     9.112410
E   -10.972707
dtype: float64