# Welcome to the Intermediate Python Workshop

## Loops beyond for and while

This notebook gives you an introduction to looping with list comprehensions and NumPy in Python, as well as some other tricks.
Here is a [beginner's guide to `for` and `while` loops](https://www.youtube.com/watch?v=6iF8Xb7Z3wQ). We won't cover those here!

Eoghan O'Connell, Guck Division, MPL, 2023

In [None]:
# notebook metadata you can ignore!
info = {"topic": ["list comprehensions", "generator expressions", "timeit", "numpy", "tqdm"],
        "version" : "0.0.2"}

### How to use this notebook

- Click on a cell (each box is called a cell) and hit "Shift+Enter" to run it!
- You can run the cells in any order, but later cells often rely on variables defined in earlier ones, so running from top to bottom is safest.
- The output of runnable code is printed below the cell.
- Check out this [Jupyter Notebook Tutorial video](https://www.youtube.com/watch?v=HW29067qVWk).

See the help tab above for more information!


# What is in this Workshop?
In this notebook we cover:

- List comprehensions
   - Generator expressions
   - Dict comprehensions
- When not to use loops (when you can use NumPy)
- tqdm progress bar

-----------
## List comprehensions

In Python we can loop over an iterable with a `for` or `while` loop.

We can also loop over an iterable with a compact syntax called a **list comprehension**.

- A similar syntax is used for generator expressions (which can be converted to tuples or lists).
- There are also dict comprehensions!

List comprehensions are compact and can be quite fast! We will compare their speed to other kinds of loops below.

### Example List comprehension

Imagine we have a list of numbers and want to square each value. We would usually use a `for` loop...

In [None]:
my_lst = [2, 3, 5]

my_sq_lst = []
for i in my_lst:
    my_sq_lst.append(i**2)

print(my_sq_lst)

In [None]:
# let's do this with a list comprehension

my_sq_lst = [i**2 for i in my_lst]

print(my_sq_lst)

In the list comprehension, you can read it as the `for i in my_lst` part first, then do `i**2` to every `i`.

It will return a new list.

Of course, we can add extra stuff which makes it slightly more complicated...

In [None]:
# using if else statements in a list comprehension.

my_sq_lst = [i**2 if i>2 else i-5 for i in my_lst]

print(my_sq_lst)

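Let's break that down... How do we make sense of it?

Read it like this:
- `for i in my_lst`: take each `i` from the list
- `i**2 if i>2 else i-5`: for every `i`, use `i**2` if `i` is greater than 2, otherwise use `i-5`
- the if-else part behaves like a normal if-else clause, just written on one line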

It can take a while to get used to this syntax. But it is super fun, compact and can be very fast.

Here is what it would look like as a normal for loop...

In [None]:
my_sq_lst = []
for i in my_lst:
    if i>2:
        i = i**2
    else:
        i = i-5
    my_sq_lst.append(i)

print(my_sq_lst)

So you can see that the list comprehension is much more compact!
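
A related pattern that is easy to confuse with the one above: an `if` placed *after* the `for` part filters items out instead of choosing between two values. The short sketch below is an illustrative addition (the names `transformed` and `filtered` are made up for this example) and shows the two forms side by side.

```python
my_lst = [2, 3, 5]

# conditional expression (if/else BEFORE the `for`): every item is kept,
# the if/else only decides which value goes into the new list
transformed = [i**2 if i > 2 else i - 5 for i in my_lst]
print(transformed)  # [-3, 9, 25]

# filter (if AFTER the `for`, no else): items that fail the test are dropped
filtered = [i**2 for i in my_lst if i > 2]
print(filtered)  # [9, 25]
```

The first form always returns a list of the same length as the input; the second can return a shorter one.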

Let's look at how fast list comprehensions are when compared to `for` loops...

*(you don't need to understand how this timing is happening)*

In [None]:
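# speed of for loop
my_lst = list(range(10000))
my_sq_lst = []

# note: my_sq_lst is not reset between timing runs, so it keeps growing while %timeit repeats the loop
%timeit for i in my_lst: my_sq_lst.append(i**2)

# speed of list comprehension (a new list is built on every run)
my_sq_lst = []
%timeit my_sq_lst = [i**2 for i in my_lst]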

Wow, 15% faster (on my PC)! That's great!
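
The `%timeit` magic only works inside IPython/Jupyter. As a rough sketch of how a similar comparison could be done in a plain Python script, the standard-library `timeit` module can be used. The helper function names and the number of runs below are assumptions chosen for illustration, and the timings will differ from machine to machine.

```python
import timeit

my_lst = list(range(10_000))


def square_with_for_loop(values):
    """Build the list of squares with an explicit for loop."""
    result = []
    for i in values:
        result.append(i**2)
    return result


def square_with_comprehension(values):
    """Build the same list with a list comprehension."""
    return [i**2 for i in values]


# each call builds a fresh list, so nothing grows between timing runs
n_runs = 20
t_loop = min(timeit.repeat(lambda: square_with_for_loop(my_lst), number=n_runs, repeat=3))
t_comp = min(timeit.repeat(lambda: square_with_comprehension(my_lst), number=n_runs, repeat=3))

print(f"for loop:           {t_loop / n_runs * 1e3:.3f} ms per run")
print(f"list comprehension: {t_comp / n_runs * 1e3:.3f} ms per run")
```

Taking the minimum of several repeats is a common way to reduce the influence of other processes running on the machine.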

We can also use a similar syntax for generator expressions (which we can convert to tuples or lists easily if need be).


### Generator expressions

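A generator is an iterable that produces its items lazily. It doesn't hold all of its values in memory at once; it "generates" each one on the fly when you ask for it. This makes generators cheap to create and memory-efficient, but note that a generator can only be looped over once.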
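
To make the "on the fly" behaviour concrete, here is a small illustrative sketch (the variable names are made up for this example). It compares the memory footprint of a list and a generator, and shows what happens when a generator is looped over a second time.

```python
import sys

n = 10_000
squares_list = [i**2 for i in range(n)]  # all values are computed and stored now
squares_gen = (i**2 for i in range(n))   # nothing is computed yet

# the list stores a reference to every item; the generator only stores its current state
print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))  # True

first_pass = sum(squares_gen)   # consumes the generator
second_pass = sum(squares_gen)  # the generator is now empty, so this is 0
print(first_pass, second_pass)

# a generator expression can be turned into a tuple (or list) if needed
squares_tuple = tuple(i**2 for i in range(5))
print(squares_tuple)  # (0, 1, 4, 9, 16)
```

The second pass returning `0` matters for timing: a loop over an already-consumed generator does no work at all.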

Let's compare syntax and speed of a generator expression to a list comprehension.

**Variables assigned inside a `%%timeit` cell are not saved to the notebook, so checking their type afterwards won't be reliable!**

In [None]:
# create a list and a generator

my_lst = list(range(10000))
my_generator = (i for i in range(10000))  # lazy: it can only be consumed once

In [None]:
%%timeit

# use list in for loop
my_sq_lst = []
for i in my_lst: my_sq_lst.append(i**2)

In [None]:
%%timeit

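# use generator in for loop
# a generator can only be consumed once, so we recreate it on every timed run
my_generator = (i for i in range(10000))
my_sq_lst = []
for i in my_generator: my_sq_lst.append(i**2)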

In [None]:
%%timeit

# use range in for loop
my_sq_lst = []
for i in range(10000): my_sq_lst.append(i**2)

In [None]:
%%timeit

# use list in list comprehension
my_sq_lst = [i**2 for i in my_lst]

In [None]:
%%timeit

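# use generator in list comprehension
# a generator can only be consumed once, so we recreate it on every timed run
my_generator = (i for i in range(10000))
my_sq_lst = [i**2 for i in my_generator]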

In [None]:
%%timeit

# use range in list comprehension
my_sq_lst = [i**2 for i in range(10000)]

In [None]:
%%timeit

# use list in generator expression
# note: this only times *creating* the generator; no squares are computed until it is iterated
my_sq_lst = (i**2 for i in my_lst)

In [None]:
%%timeit

# use generator in generator expression
# note: this only times *creating* the generator; no squares are computed until it is iterated
my_sq_lst = (i**2 for i in my_generator)

In [None]:
%%timeit

# use range in generator expression
# note: this only times *creating* the generator; no squares are computed until it is iterated
my_sq_lst = (i**2 for i in range(10000))

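Be careful when interpreting these timings! Two things can make generators look faster than they really are:

- A generator can only be consumed once. If it is created outside the timed code, it is empty after the first pass, and every later loop over it finishes almost instantly.
- Creating a generator expression does no work. The values are only computed when something iterates over it.

For a single full pass over the data, a generator is usually not faster than a list; its main advantage is that it never holds all of its values in memory at once. (There are always nuances with looping, so don't take this as gospel for all situations.) Also, **USE PROFILING!**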

### Dict comprehensions

The syntax for a dict comprehension is slightly more complicated than for a list comprehension because each entry in a dict has both a key and a value!

Here is the syntax: `{key: value for (key, value) in iterable}`

**HOWEVER**, you might be able to use the `dict` constructor directly for your use-case. Let's look at them all with a little help from [Carl](https://www.exquisiteartz.co.uk/ekmps/shops/exquisiteartz/images/carl-sagan-quote-the-pale-blue-dot.-space-print-poster-canvas.-sizes-a1-a2-a3-a4-3813-p[ekm]1150x812[ekm].jpg)!


In [None]:
# I have two lists and want to make a dictionary. We can do this with a for loop or a dict comprehension

my_keys = ["name", "age", "appearance"]
my_values = ["Earth", 4.54e9, ["pale", "blue", "dot"]]

# use for loop
my_dict = {}
for key, value in zip(my_keys, my_values):
    my_dict[key] = value
print(f"for loop:           {my_dict}")


# dict comprehension
my_dict = {key:value for (key, value) in zip(my_keys, my_values)}
print(f"dict comprehension: {my_dict}")


# dict constructor called directly
my_dict = dict(zip(my_keys, my_values))
print(f"dict constructor:   {my_dict}")


In [None]:
# and the speeds...

print("\nfor loop")
my_dict = {}
%timeit for key, value in zip(my_keys, my_values): my_dict[key] = value

print("\ndict comprehension")
my_dict = {}
%timeit my_dict = {key:value for (key, value) in zip(my_keys, my_values)}

print("\ndict constructor")
my_dict = {}
%timeit my_dict = dict(zip(my_keys, my_values))


Looks like the `for` loop and dict constructor are very fast.
This is because inserting into and accessing a dict is very fast in Python. With only three items the differences are tiny, so don't read too much into them.

The `for` loop and the dict comprehension add extra options. For example, you can manipulate the keys or values during dict creation.
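
As a small illustration of changing keys or values while the dict is being built, the sketch below reuses the example data from above. The names `upper_dict` and `no_age_dict` are made up for this example.

```python
my_keys = ["name", "age", "appearance"]
my_values = ["Earth", 4.54e9, ["pale", "blue", "dot"]]

# change the keys while building the dict
upper_dict = {key.upper(): value for key, value in zip(my_keys, my_values)}
print(upper_dict)

# keep only some entries by adding a filter at the end
no_age_dict = {key: value for key, value in zip(my_keys, my_values) if key != "age"}
print(no_age_dict)
```

Neither of these can be done with `dict(zip(...))` alone, which is where the comprehension earns its slightly more complicated syntax.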

## When not to use loops

Sometimes you don't need to use loops, or you might find that your code is slow because of many loops.

If you are working with numbers, arrays or matrices, you can use NumPy to do very fast array calculations.

**Example**

You have 5 images and want to find the sum of the pixel values in each image.

- First we will do this with a `for` loop
- Second we will do this purely with `numpy`


In [None]:
# example images
import cv2
import random

my_image = cv2.imread("../../data/channel_example.png", cv2.IMREAD_GRAYSCALE) + random.randint(10, 30)

images = []
for i in range(5):  # pretend we have 5 images
    images.append(my_image)

print(len(images))

In [None]:
# get the sum of the pixel values of each image

sum_pixel = []
for im in images:
    sum_pixel.append(im.sum())

print(sum_pixel)

In [None]:
# use numpy to do this without a loop
import numpy as np

images_arr = np.array(images)  # wouldn't be needed if we loaded as a stack
sum_pixel = images_arr.sum(axis=(1, 2))  # we have to "squish" the two image dimensions
print(sum_pixel)

Let's see how fast they are...

In [None]:
# we make a large list/array for more realistic timing

my_image = cv2.imread("../../data/channel_example.png", cv2.IMREAD_GRAYSCALE) + random.randint(10, 30)

images = []
for i in range(500):
    images.append(my_image)


# note: the built-in sum() adds up the rows of a 2D array, so it returns
# one value per column rather than a single number like np.sum()
print("For loop with sum():")
sum_pixel = []
%timeit for im in images: sum_pixel.append(sum(im))


print("\nFor loop with np.sum():")
sum_pixel = []
%timeit for im in images: sum_pixel.append(np.sum(im))


print("\nNumpy only:")
images_arr = np.array(images)
%timeit sum_pixel = np.sum(images_arr, axis=(1, 2))


print("\nList comprehension:")
%timeit sum_pixel = [sum(im) for im in images]


print("\nList comprehension with np.sum:")
%timeit sum_pixel = [im.sum() for im in images]


**Outcome**

We can see that the NumPy implementation is faster for 500 images (the exact margin depends on your machine), but that the list comprehension is about as fast!

This **doesn't mean that list comprehensions or numpy should always be used. It just depends on your use-case!**

If you need to do many matrix or high-dimensional calculations, numpy is often very easy to use and fast.
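
If the example image is not at hand, the same idea can be tried on synthetic data. The sketch below is an illustrative addition: the random array standing in for a stack of grayscale images, its shape, and the variable names are all assumptions, not part of the workshop data.

```python
import numpy as np

# synthetic stand-ins for 5 grayscale images of 40x60 pixels (an assumption)
rng = np.random.default_rng(seed=0)
images_arr = rng.integers(0, 256, size=(5, 40, 60), dtype=np.uint8)

# loop version: one reduction per image
sum_loop = [int(im.sum()) for im in images_arr]

# vectorised version: reduce over the row and column axes in one call
sum_numpy = images_arr.sum(axis=(1, 2))
max_numpy = images_arr.max(axis=(1, 2))  # the same pattern works for other reductions

print(sum_loop)
print(sum_numpy.tolist())
print(max_numpy.shape)  # (5,)
```

Both versions give the same numbers; which one is faster for your data is something to measure rather than assume.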


So... is there no definitive rule? Well, no, it involves some trial and error. This [answer on Stack Overflow is very good](https://stackoverflow.com/questions/41325427/numpy-ufuncs-speed-vs-for-loop-speed).

**"Don't blindly trust simplified statements and never optimize without profiling."**

-------------------------------------------------------------------------------------

## TQDM progress bar

You can add a nifty progress bar to any iterable in Python with the `tqdm` package. Just install it with `pip install tqdm` and use it as below...

Import with 
- `from tqdm import tqdm`
- `from tqdm.notebook import tqdm` when using a notebook

And just wrap the iterable with `tqdm`...

See documentation here: https://tqdm.github.io/

In [None]:
# Normally we do: from tqdm import tqdm
# but in notebooks we do
from tqdm.notebook import tqdm

In [None]:
# using tqdm with for loop

for i in tqdm(range(1_000_000)):
    _ = i**6


In [None]:
# use tqdm with list comprehensions

_ = [i**6 for i in tqdm(range(1_000_000))]

In [None]:
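# tqdm can't draw a full progress bar for generators because they don't know their length!
# A generator only hands out the "next" item...
# the calculation will be completed, but tqdm can only show a running count (no percentage).
# If you know the length, you can pass it yourself, e.g. tqdm(my_generator, total=1_000_000).

my_generator = (i for i in range(1_000_000))

_ = [i**6 for i in tqdm(my_generator)]  # runs, but the bar has no total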