# Welcome to the Intermediate Python Workshop

## Loops beyond for and while

This notebook gives you an intermediate introduction to loops in Python.
Here is a [beginner's guide to `for` and `while` loops](https://www.youtube.com/watch?v=6iF8Xb7Z3wQ). We won't cover those basics here!

Eoghan O'Connell, Guck Division, MPL, 2023

In [1]:
# notebook metadata you can ignore!
info = {"topic": ["list comprehensions", "numpy", "tqdm"],
        "version" : "0.0.1"}

### How to use this notebook

- Click on a cell (each box is called a cell) and press "Shift+Enter" to run it!
- You can run the cells in any order, but some cells use variables or imports from earlier cells, so running them from top to bottom is safest.
- The output of runnable code is printed below the cell.
- Check out this [Jupyter Notebook Tutorial video](https://www.youtube.com/watch?v=HW29067qVWk).

See the help tab above for more information!


# What is in this Workshop?
In this notebook we cover:

- List comprehensions
   - Generator expressions
   - Dict comprehensions

- When not to use loops
- TQDM progress bar

-----------
## List comprehensions

In Python we can loop over an iterable with a `for` or `while` loop.

We can also loop over any iterable with a list comprehension, which builds a new list in a single expression.

- A similar syntax is used for generator expressions. Python has no tuple comprehension, but passing a generator expression to `tuple()` gives you a tuple.
- There are also dict comprehensions!

List comprehensions are compact and often faster than the equivalent `for` loop. We will measure their speed below.
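Python has no dedicated tuple comprehension, but the same bracket-free syntax can feed `tuple()`, `set()` or a dict comprehension. A minimal sketch (the `squares_*` names are just for illustration):

```python
nums = [2, 3, 5]

# Parentheses alone make a generator expression, NOT a tuple
squares_gen = (n**2 for n in nums)
print(type(squares_gen).__name__)  # generator

# Pass a generator expression to tuple() to build a tuple
squares_tuple = tuple(n**2 for n in nums)
print(squares_tuple)  # (4, 9, 25)

# The same pattern works with set()
squares_set = set(n**2 for n in nums)

# Braces with key: value make a dict comprehension
squares_dict = {n: n**2 for n in nums}
print(squares_dict)  # {2: 4, 3: 9, 5: 25}
```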

### Example: list comprehension

Imagine we have a list of numbers and want to square each value. We would usually use a `for` loop...

In [2]:
my_lst = [2, 3, 5]

my_sq_lst = []
for i in my_lst:
    my_sq_lst.append(i**2)

print(my_sq_lst)

[4, 9, 25]


In [3]:
# let's do this with a list comprehension

my_sq_lst = [i**2 for i in my_lst]  # break this down!
print(my_sq_lst)

[4, 9, 25]


Read the list comprehension starting with the `for i in my_lst` part, then apply `i**2` to every `i`.

We can also add conditions, which makes it slightly more complicated...

In [4]:
# using if else statements in a list comprehension

my_sq_lst = [i**2 if i>2 else i-5 for i in my_lst]  # break this down!
print(my_sq_lst)

[-3, 9, 25]


Let's break that down... How do we make sense of it?

Read it like this:
- `for i in my_lst`
- for every `i`, evaluate the expression before the `for`
- that expression is a normal conditional expression: `i**2 if i>2 else i-5` gives `i**2` when `i>2` and `i-5` otherwise

It can take a while to get used to this syntax. But it is super fun, compact and can be very fast.
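A common source of confusion is where the `if` goes. An `if ... else` *before* the `for` transforms every item, while an `if` *after* the `for` filters items out. A short sketch using the same `my_lst`:

```python
my_lst = [2, 3, 5]

# Conditional expression BEFORE the `for`: every item is kept, just transformed differently
transformed = [i**2 if i > 2 else i - 5 for i in my_lst]
print(transformed)  # [-3, 9, 25]

# Filter clause AFTER the `for`: items failing the test are dropped
filtered = [i**2 for i in my_lst if i > 2]
print(filtered)  # [9, 25]

# The filtered comprehension is equivalent to this loop
filtered_loop = []
for i in my_lst:
    if i > 2:
        filtered_loop.append(i**2)
print(filtered_loop == filtered)  # True
```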

Let's look at how fast it is...

In [5]:
# speed of for loop vs. list comprehension
my_lst = list(range(10000))
my_sq_lst = []

%timeit for i in my_lst: my_sq_lst.append(i**2)

2.64 ms ± 240 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [6]:
my_lst = list(range(10000))
my_sq_lst = []

%timeit my_sq_lst = [i**2 for i in my_lst]

2.19 ms ± 74 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


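The list comprehension is about 1.2 times as fast here (2.19 ms vs. 2.64 ms). Nice! Your numbers will vary by machine and Python version. (The `for` loop timing is also slightly unfair: it keeps appending to the same `my_sq_lst` across all timed runs, so that list grows very large.)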

We can also use similar syntax for generator expressions (which we can easily convert to tuples or lists if need be).


### Generator expressions

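A generator is an iterable that produces its values lazily. It doesn't hold all of its values in memory at once; it "generates" each one on the fly when you ask for it. This saves memory, and creating a generator is almost instant because no work is done up front. The catch: a generator can only be looped over once.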
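To see this laziness (and the "only once" behaviour) in action, here is a minimal stdlib-only sketch. Note that `sys.getsizeof` reports the size of the container object itself, not of the numbers inside it:

```python
import sys

n = 10_000
my_lst = [i for i in range(n)]        # all 10,000 values are created right now
my_generator = (i for i in range(n))  # nothing is computed yet

# The list object grows with n; the generator object stays small
print(sys.getsizeof(my_lst), sys.getsizeof(my_generator))

# Values are produced one at a time with next()
print(next(my_generator), next(my_generator))  # 0 1

# Consuming the rest exhausts the generator...
rest = list(my_generator)
print(len(rest))           # 9998

# ...so a second pass gives nothing
print(list(my_generator))  # []
```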

Let's compare syntax and speed of a generator expression to a list comprehension.

In [7]:
# create generator and list

my_generator = (i for i in range(10000))
my_lst = [i for i in range(10000)]

In [8]:
# loop over the generator and list in different ways...

print("\nloop with list comprehension")
my_sq_lst1 = []
%timeit my_sq_lst1 = [i**2 for i in my_lst]  # loop with list comprehension

print("\nloop with generator within list compr")
my_sq_lst2 = []
%timeit my_sq_lst2 = [i**2 for i in my_generator]  # loop with generator within list compr

print("\nloop with generator within generator expression")
my_sq_lst3 = []
%timeit my_sq_lst3 = (i**2 for i in my_generator)  # loop with generator within generator expression

print("\nloop with generator in a for loop")
my_sq_lst4 = []
%timeit for i in my_generator: my_sq_lst4.append(i**2)  # loop with generator in a for loop

assert my_sq_lst1 == my_sq_lst2 == my_sq_lst3 == my_sq_lst4  # make sure they are all the same result


loop with list comprehension
2.16 ms ± 67.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

loop with generator within list compr
104 ns ± 3.91 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

loop with generator within generator expression
184 ns ± 10.6 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

loop with generator in a for loop
18.6 ns ± 0.408 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)


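Careful: these numbers are misleading! A generator can only be looped over once. The first timed run of `[i**2 for i in my_generator]` uses up `my_generator`, so every later run (and the `for` loop line) loops over an empty generator, which takes only about 100 ns. `(i**2 for i in my_generator)` just creates a new generator object and computes nothing. And because assignments made inside `%timeit` are local to its timing function, `my_sq_lst1` to `my_sq_lst3` are never reassigned and `my_sq_lst4` gets nothing appended, so all four stay `[]` and the `assert` passes trivially.

Lesson learned: generators save memory and let you stop early, but they don't make computing every value dramatically faster. When you time a generator, create it inside the timed statement, e.g. `%timeit [i**2 for i in (j for j in range(10000))]`.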
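A fair comparison has to rebuild the result list and the generator on every timed run, because a generator can only be consumed once. Here is a minimal sketch using the standard-library `timeit` module (so it also runs outside Jupyter); the function names are hypothetical. We don't quote numbers, but you should not expect the generator version to be orders of magnitude faster than the list comprehension.

```python
import timeit

N = 10_000
my_lst = list(range(N))

def with_for_loop():
    result = []  # fresh list on every run, so it does not keep growing
    for i in my_lst:
        result.append(i**2)
    return result

def with_list_comprehension():
    return [i**2 for i in my_lst]

def with_generator():
    # the generator is recreated inside the function, so every run does the full work
    gen = (i for i in range(N))
    return [i**2 for i in gen]

# Check that all three really compute the same thing before timing them
assert with_for_loop() == with_list_comprehension() == with_generator()

for func in (with_for_loop, with_list_comprehension, with_generator):
    # best of 3 repeats, each calling the function 20 times
    best = min(timeit.repeat(func, number=20, repeat=3)) / 20
    print(f"{func.__name__:25s} {best * 1e3:.3f} ms per call")
```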

### Dict comprehensions

The syntax for a dict comprehension is slightly more complicated than for a list comprehension because we have both a key and a value!
Here is the syntax: `{key: value for (key, value) in iterable}`

**HOWEVER**, you might be able to use the `dict` constructor directly for your use case. Let's compare all three!


In [9]:
# I have two lists and want to make a dictionary. We can do this with a for loop or a dict comprehension

my_keys = ["name", "age", "appearance"]
my_values = ["Earth", 4.54e9, ["tiny", "blue", "dot"]]

# use a for loop
my_dict = {}
for key, value in zip(my_keys, my_values):
    my_dict[key] = value
print(f"for loop:           {my_dict}")


# dict comprehension
my_dict = {key:value for (key, value) in zip(my_keys, my_values)}
print(f"dict comprehension: {my_dict}")


# dict constructor called directly
my_dict = dict(zip(my_keys, my_values))
print(f"dict constructor:   {my_dict}")


for loop:           {'name': 'Earth', 'age': 4540000000.0, 'appearance': ['tiny', 'blue', 'dot']}
dict comprehension: {'name': 'Earth', 'age': 4540000000.0, 'appearance': ['tiny', 'blue', 'dot']}
dict constructor:   {'name': 'Earth', 'age': 4540000000.0, 'appearance': ['tiny', 'blue', 'dot']}


In [10]:
# and the speeds...

print("\nfor loop")
my_dict = {}
%timeit for key, value in zip(my_keys, my_values): my_dict[key] = value

print("\ndict comprehension")
my_dict = {}
%timeit my_dict = {key:value for (key, value) in zip(my_keys, my_values)}

print("\ndict constructor")
my_dict = {}
%timeit my_dict = dict(zip(my_keys, my_values))



for loop
258 ns ± 2.05 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

dict comprehension
360 ns ± 2.65 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

dict constructor
285 ns ± 11.9 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)


The for loop and the dict constructor are the fastest here, but with only three items the differences (about 100 ns) likely reflect fixed setup costs rather than how fast each approach loops. Time them on data closer to your real use case before choosing.

The for loop and the dict comprehension give you extra options. For example, you can manipulate the keys or values during dict creation.
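A few sketches of what "manipulating keys or values" can look like in a dict comprehension, reusing the lists from above (the names `upper`, `strings_only`, `codes` and `inverted` are just for illustration):

```python
my_keys = ["name", "age", "appearance"]
my_values = ["Earth", 4.54e9, ["tiny", "blue", "dot"]]

# Transform keys while building the dict (here: upper-case them)
upper = {key.upper(): value for key, value in zip(my_keys, my_values)}
print(upper)

# Filter entries: keep only the string values
strings_only = {key: value for key, value in zip(my_keys, my_values)
                if isinstance(value, str)}
print(strings_only)  # {'name': 'Earth'}

# Swap keys and values of an existing dict (the values must be hashable)
codes = {"a": 1, "b": 2}
inverted = {value: key for key, value in codes.items()}
print(inverted)  # {1: 'a', 2: 'b'}
```

None of these can be done with `dict(zip(...))` alone, which is where the comprehension earns its place.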

## When not to use loops

Sometimes you don't need to use loops, or you might find that your code is slow because of many loops.

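If you are working with lists of numbers, arrays or matrices, you can use numpy to do very fast vectorized calculations: numpy applies an operation to a whole array at once, running the loop in compiled code instead of in Python.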
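For instance, the squaring from the list comprehension section can be done on a whole numpy array at once. A minimal sketch, assuming numpy is installed; try `%timeit` on both versions yourself to compare:

```python
import numpy as np

my_lst = list(range(10_000))

# Pure Python: the interpreter runs one loop iteration per item
sq_comprehension = [i**2 for i in my_lst]

# numpy: convert once, then square the whole array in a single vectorized operation
my_arr = np.array(my_lst)
sq_numpy = my_arr**2

# Same numbers, different container (numpy array vs. Python list)
print(sq_numpy.tolist() == sq_comprehension)  # True
```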

**Example**

You have 5 images and want to find the total (sum) of the pixel values in each image.

- First we will do this with a for loop
- Second we will do this purely with numpy


In [11]:
# example images
import cv2
import random

my_image = cv2.imread("../../data/channel_example.png", cv2.IMREAD_GRAYSCALE) + random.randint(10, 30)

images = []
for i in range(5):  # pretend we have 5 images
    images.append(my_image)

print(len(images))

5


In [12]:
# get the sum of the pixel values of each image

sum_pixel = []
for im in images:
    sum_pixel.append(im.sum())

print(sum_pixel)

[12326509, 12326509, 12326509, 12326509, 12326509]


In [13]:
# use numpy to do this without a loop
import numpy as np

images_arr = np.array(images)  # wouldn't be needed if we loaded as a stack
sum_pixel = images_arr.sum(axis=(1, 2))  # sum over both pixel axes (height, width): one value per image
print(sum_pixel)

[12326509 12326509 12326509 12326509 12326509]
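If you want the maximum pixel value of each image instead of the sum, the same axis trick works with `.max()`. A sketch with hypothetical random 8-bit images (made with `np.random.default_rng`) so it runs without `channel_example.png`:

```python
import numpy as np

# Hypothetical stand-in for our images: a stack of 5 random 8-bit images, 64 x 48 pixels
rng = np.random.default_rng(seed=0)
images_arr = rng.integers(0, 256, size=(5, 64, 48), dtype=np.uint8)

# Loop version: one .max() call per image
max_loop = [im.max() for im in images_arr]

# numpy version: reduce over the pixel axes (1 = rows, 2 = columns), keep axis 0 (images)
max_numpy = images_arr.max(axis=(1, 2))

print(max_numpy.shape)  # (5,)
print(np.array_equal(max_numpy, max_loop))  # True
```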


Let's see how fast they are...

In [14]:
# we make a larger list/array for more realistic timing

my_image = cv2.imread("../../data/channel_example.png", cv2.IMREAD_GRAYSCALE) + random.randint(10, 30)

images = []
for i in range(500):
    images.append(my_image)


print(f"For loop with sum():")
sum_pixel = []
%timeit for im in images: sum_pixel.append(sum(im))
    

print(f"\nFor loop with np.sum():")
sum_pixel = []
%timeit for im in images: sum_pixel.append(np.sum(im))
    
    
print(f"\nNumpy only:")
images_arr = np.array(images)
%timeit sum_pixel = np.sum(images_arr, axis=(1, 2))

    
print(f"\nList comprehension:")
%timeit sum_pixel = [sum(im) for im in images]


print(f"\nList comprehension with np.sum:")
%timeit sum_pixel = [im.sum() for im in images]


For loop with sum():
44.2 ms ± 863 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

For loop with np.sum():
23.6 ms ± 436 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Numpy only:
21.9 ms ± 174 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

List comprehension:
43.7 ms ± 1.01 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

List comprehension with np.sum:
22.4 ms ± 137 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


**Outcome**

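The numpy-only version (21.9 ms) is about twice as fast as looping with the built-in `sum()` (44.2 ms), but only slightly faster than looping with `np.sum()` (23.6 ms) or a list comprehension with `.sum()` (22.4 ms). Each image is large, so this suggests most of the time is spent summing pixels inside numpy, and the Python loop over 500 images adds little overhead.

Also note that the built-in `sum(im)` does not compute the same thing as `np.sum(im)`: it adds the image rows together and returns one value per column (and for an 8-bit image like this one, those column totals can overflow).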

This **doesn't mean that list comprehensions or numpy should always be used. It just depends on your use-case!**

If you need to do many matrix or high-dimensional calculations, numpy is often very easy to use and fast.
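A tiny hypothetical 2 x 3 `uint8` array makes the difference between the built-in `sum()` and `np.sum()` concrete:

```python
import numpy as np

# A tiny hypothetical 8-bit "image": 2 rows x 3 columns
im = np.array([[200, 100, 1],
               [100, 100, 2]], dtype=np.uint8)

# Built-in sum() iterates over the rows and adds them element-wise.
# The result is one value per column, still uint8, so 200 + 100 wraps around to 44.
print(sum(im))   # [ 44 200   3]

# np.sum / .sum() adds every pixel and accumulates in a wider integer type
print(im.sum())  # 503
```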


So is there a definitive rule? Not really; it involves some trial and error, so measure with `%timeit`. This [answer on Stack Overflow is very good](https://stackoverflow.com/questions/41325427/numpy-ufuncs-speed-vs-for-loop-speed).


## TQDM progress bar

One can add a nifty progress bar to any iterable in Python with the TQDM Python package. Just install it with `pip install tqdm` and use it as below...

Import with 
- `from tqdm import tqdm`
- `from tqdm.notebook import tqdm` when using a notebook

See documentation here: https://tqdm.github.io/

In [15]:
# Normally we do: from tqdm import tqdm
# but in notebooks we do
from tqdm.notebook import tqdm

In [16]:
# using tqdm with for loop

for i in tqdm(range(1_000_000)):
    _ = i**6


  0%|          | 0/1000000 [00:00<?, ?it/s]

In [17]:
# use tqdm with list comprehensions

_ = [i**6 for i in tqdm(range(1_000_000))]

  0%|          | 0/1000000 [00:00<?, ?it/s]

In [18]:
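# tqdm can't show a percentage for generators because they don't know their length! A generator only produces the "next" item when asked...
# If you know the length, pass it yourself: tqdm(my_generator, total=1_000_000)
# Also, the generator expression below is never looped over, so tqdm counts 0 iterations ("0it")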

my_generator = (i for i in range(1_000_000))

_ = (i**6 for i in tqdm(my_generator))

0it [00:00, ?it/s]
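Why does tqdm only show a count (`0it`) for a generator? It needs a length to draw a percentage, and generators don't have one. A stdlib-only sketch of the difference; the `total=` usage is left as a comment so the cell runs even without tqdm installed:

```python
# range and list support len(); generators do not
my_range = range(1_000_000)
my_generator = (i for i in range(1_000_000))

print(len(my_range))  # 1000000

try:
    len(my_generator)
except TypeError as err:
    print(f"TypeError: {err}")  # TypeError: object of type 'generator' has no len()

# If you know the length in advance, give it to tqdm via its `total` argument, e.g.
#     for i in tqdm(my_generator, total=1_000_000):
#         ...
```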