# **List Comprehension - a ```for``` loop in a single line of code**

[**Colab Version**](https://colab.research.google.com/drive/1kitdQhpUloBdrA4gCkF0UtOFIr4LLdnc?usp=sharing)

### Aim
* Introduce list comprehensions - compare to ```for``` loops
* Describe advantages of **list comprehensions**
* Describe syntax of **list comprehensions**
* **Exercises**

### Contents:
1. **Introduction to ```for``` loops** - beginners' overview
2. **Introduction to ```list comprehensions```** - overview and syntax
3. **Exercises** - convert ```for``` loops into **list comprehensions**

# 1. **Introduction to ```for``` loops**

What can they do?
* ```for``` loops can be used to repeat a block of code $n$ times
* ```for``` loops can repeat an operation on each item in a collection (e.g. a ```list```, a ```dictionary``` or a ```dataframe column```)

For example: ```for``` each ```.xlsx``` file in a folder, call ```process_data``` and then ```plot_data```.
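A minimal sketch of the two uses listed above: repeating a block a fixed number of times with ```range(n)```, and repeating an operation on each item of a list. The names ```n```, ```greetings```, ```filenames``` and ```upper_names``` are made up for illustration.

```python
# Use 1: repeat a block of code n times
n = 3
greetings = []
for _ in range(n):  # _ is a conventional name for an unused loop variable
    greetings.append('hello')
print(greetings)  # ['hello', 'hello', 'hello']

# Use 2: repeat an operation on each item in a list
filenames = ['a.xlsx', 'b.xlsx', 'c.doc']
upper_names = []
for name in filenames:
    upper_names.append(name.upper())
print(upper_names)  # ['A.XLSX', 'B.XLSX', 'C.DOC']
```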

### Structure of a ```for``` loop:
```for``` loops have specific syntax and start like this:
```python
for i in x:
```
where ```x``` is an iterable (a collection of objects) and ```i``` is a temporary variable that is assigned the next item of ```x``` on each iteration. The colon at the end of the line is required. This statement is followed by an ***indented*** block of code that is repeated for each item ```i``` in ```x```. The example below iterates over a list of filenames, checks whether each one is an Excel file and, if so, calls the functions ```process_data``` and ```plot_data``` on ```i```.

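```python
# Hypothetical helper functions, defined only so that this example runs
def file_is_xlsx(filename):
    return filename.endswith('.xlsx')

def process_data(filename):
    print('processing', filename)

def plot_data(filename):
    print('plotting', filename)

iterable = ['file1.xlsx', 'file2.xlsx', 'file3.xlsx', 'file4.xlsx', 'file5.doc']

for i in iterable: # colon is essential syntax
    # i is overwritten in each iteration
    # any name can be used, i is just convention
    # code block to be repeated is indented (4 spaces by convention; don't mix tabs and spaces)
    if file_is_xlsx(i):
        # logic statements require further indentation
        process_data(i)
        plot_data(i)
    # code block ends when the indentation ends

# not in code block
print(i) # prints the last value of i: file5.doc
```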
Additional features can include:
* ```if``` and ```else``` statements - logical tests that control the flow of the program.
* ```break``` statements - end the loop early, usually placed inside an ```if``` statement.
* Nesting - loops within loops
* ```for i, j in zip(list1, list2):``` - loop over two iterables at once
* ```for idx, s in enumerate(list1):``` - keep a counter ```idx``` that increments on each loop
* ```for i in list1[::n]:``` - loop over every $n^{th}$ item in ```list1```
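A short sketch of several of these features in action. ```list1``` and ```list2``` are made-up example lists, and the slicing example uses $n = 2$.

```python
list1 = ['a', 'b', 'c', 'd', 'e', 'f']
list2 = [10, 20, 30, 40, 50, 60]

# break: stop the loop as soon as the condition is met
for s in list1:
    if s == 'c':
        break
stopped_at = s
print(stopped_at)  # prints c

# nesting: the inner loop runs in full for every pass of the outer loop
pairs = []
for i in [1, 2]:
    for j in ['x', 'y']:
        pairs.append((i, j))
print(pairs)  # [(1, 'x'), (1, 'y'), (2, 'x'), (2, 'y')]

# zip: take one item from each list per iteration
for i, j in zip(list1, list2):
    print(i, j)  # prints a 10, then b 20, and so on

# enumerate: idx counts up from 0 alongside each item
for idx, s in enumerate(list1):
    print(idx, s)  # prints 0 a, then 1 b, and so on

# slicing with [::2]: every 2nd item, starting from the first
every_second = []
for s in list1[::2]:
    every_second.append(s)
print(every_second)  # ['a', 'c', 'e']
```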

# 2. **Introduction to ```list comprehensions```** 

### **Purpose**
* List comprehensions create new lists by applying an operation to each item in an iterable using a ```for``` loop.

### Example
We'll compare a ```for``` loop and a **list comprehension** that achieve the same task: applying $y = x^2 + 2x + 5$ to the numbers 1 to 9.

```python
iterable = [1,2,3,4,5,6,7,8,9]

def f(x):
    return x**2 + 2*x + 5

# using a for loop
new_list = []
for x in iterable:
    new_list.append(f(x)) # add y to list

print(new_list)

# using list comprehension
new_list_2 = [f(x) for x in iterable] # more concise

print(new_list_2)
```
```
[8, 13, 20, 29, 40, 53, 68, 85, 104]
[8, 13, 20, 29, 40, 53, 68, 85, 104]
```


### Syntax of a **list comprehension**
```python
[ <expression> for <item> in <iterable> ]
```
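Where ```<expression>``` can be the result of an operation on ```<item>```.
List comprehensions can also contain logic statements:
```python
[ <expression> for <item> in <iterable> if <condition> ]
[ <expression> if <condition> else <expression_2> for <item> in <iterable> ]
```
With a trailing ```if```, an item's ```<expression>``` is included in the list only if ```<condition>``` evaluates to ```True```. With ```if ... else```, the conditional must come *before* ```for```, and every item contributes one value: ```<expression>``` when ```<condition>``` is ```True```, otherwise ```<expression_2>```.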
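The two ```if``` forms behave differently, which is easiest to see side by side. In this sketch (the list ```numbers``` is made up for illustration), a trailing ```if``` filters items out, while an ```if ... else``` placed before ```for``` keeps every item and chooses which expression to use.

```python
numbers = [1, 2, 3, 4, 5, 6]

# Filter: the condition comes after the for clause, so odd numbers are dropped
evens = [n for n in numbers if n % 2 == 0]
print(evens)  # [2, 4, 6]

# Conditional expression: if/else comes before the for clause,
# so every item produces exactly one output value
labels = ['even' if n % 2 == 0 else 'odd' for n in numbers]
print(labels)  # ['odd', 'even', 'odd', 'even', 'odd', 'even']
```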

```python
# some examples
# Extract enzyme kinetics curves
files = [ 'DataGenerator.ipynb',
         'Enzyme-vmax=3.28-km=35.29.xlsx',
         'Enzyme-vmax=4.08-km=31.71.xlsx',
         'FluoresenceRegulation.csv',
         'Enzyme-vmax=2.26-km=18.46.csv',
         'Enzyme-vmax=1.7-km=44.73.xlsx',
         'Enzyme-vmax=4.74-km=43.85.csv']

l1 = [i for i in files if 'Enzyme' in i] # <expression: i> <iterable: files> <condition: 'Enzyme' in i>
l2 = [i for i in files if 'Enzyme' in i and i.endswith('.csv')] # also check that the file type is .csv

print(l1)
print(l2)
```
```
['Enzyme-vmax=3.28-km=35.29.xlsx', 'Enzyme-vmax=4.08-km=31.71.xlsx', 'Enzyme-vmax=2.26-km=18.46.csv', 'Enzyme-vmax=1.7-km=44.73.xlsx', 'Enzyme-vmax=4.74-km=43.85.csv']
['Enzyme-vmax=2.26-km=18.46.csv', 'Enzyme-vmax=4.74-km=43.85.csv']
```
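The examples above only filter, returning each item unchanged. The ```<expression>``` can also transform each item. As a sketch, assuming every matching filename follows the ```Enzyme-vmax=<value>-km=<value>.<ext>``` pattern seen in ```files```, the kinetic parameters can be pulled out with plain string methods:

```python
files = ['DataGenerator.ipynb',
         'Enzyme-vmax=3.28-km=35.29.xlsx',
         'Enzyme-vmax=4.08-km=31.71.xlsx',
         'FluoresenceRegulation.csv',
         'Enzyme-vmax=2.26-km=18.46.csv',
         'Enzyme-vmax=1.7-km=44.73.xlsx',
         'Enzyme-vmax=4.74-km=43.85.csv']

# Assumes the naming pattern Enzyme-vmax=<value>-km=<value>.<ext>
# vmax: take the text after 'vmax=' and before the next '-'
vmax = [float(i.split('vmax=')[1].split('-')[0]) for i in files if 'Enzyme' in i]
# km: rsplit('.', 1) removes only the file extension, keeping the decimal point
km = [float(i.split('km=')[1].rsplit('.', 1)[0]) for i in files if 'Enzyme' in i]

print(vmax)  # [3.28, 4.08, 2.26, 1.7, 4.74]
print(km)    # [35.29, 31.71, 18.46, 44.73, 43.85]
```

Filenames that break the assumed pattern would raise an ```IndexError``` or ```ValueError```, so this is only suitable when the naming scheme is consistent.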


# Bonus: Dictionary Comprehensions
Just like list comprehensions, dictionary comprehensions are single line loops, but return a dictionary instead of a list. The syntax is very similar, only it uses ```{curly brackets}``` and requires an additional key for each expression: ```<key>:<expression>``` 

```python
{<key>:<expression> for <item> in <iterable>}
```
Like list comprehensions, dictionary comprehensions also support logical checks to control the contents of the output.
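For example, in this sketch with a made-up list of sample names, each key is a name and each value is its length, while the trailing ```if``` drops names of two characters or fewer:

```python
samples = ['WT', 'mutA', 'mutB', 'control']

# key: sample name, value: its length; the condition drops short names
name_lengths = {s: len(s) for s in samples if len(s) > 2}
print(name_lengths)  # {'mutA': 4, 'mutB': 4, 'control': 7}
```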

## Generating keys
Here are two simple ways of generating unique keys for a dictionary comprehension:
#### Enumeration
The ```enumerate(<iterable>)``` function yields a tuple ```(<index>, <item>)``` for each item in the iterable. The start index (0 by default) can be set with ```enumerate(<iterable>, <start_index>)```.
```python
{<index>:<expression> for <index>, <item> in enumerate(<iterable>)}
{idx:x**2 for idx,x in enumerate([1,2,3,4])}
```
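The same comprehension with and without an explicit start index; passing ```1``` as the second argument to ```enumerate``` makes the keys count from 1 instead of 0:

```python
squares_from_0 = {idx: x**2 for idx, x in enumerate([1, 2, 3, 4])}
squares_from_1 = {idx: x**2 for idx, x in enumerate([1, 2, 3, 4], 1)}

print(squares_from_0)  # {0: 1, 1: 4, 2: 9, 3: 16}
print(squares_from_1)  # {1: 1, 2: 4, 3: 9, 4: 16}
```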
#### ```zip()```
```zip(<iterable_1>, <iterable_2>)``` returns an iterator of **tuples**, where the $i^{th}$ **tuple** contains the $i^{th}$ element of each iterable.
Tuples are similar to lists: they contain comma-separated values, written ```(item_1, item_2, ..., item_n)```, but unlike lists they cannot be modified after they have been created. See how ```zip()``` behaves in the cell below, and notice that the loop stops as soon as the shorter list is exhausted.


```python
l3 = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
l4 = [1,2,3,4,5,6,7,8]

for i in zip(l3,l4):
    print(i)
```
```
('a', 1)
('b', 2)
('c', 3)
('d', 4)
('e', 5)
('f', 6)
('g', 7)
```


```python
# The tuples can be unpacked like this:
for i,j in zip(l3,l4):
    print(i,j)
```
```
a 1
b 2
c 3
d 4
e 5
f 6
g 7
```
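A quick check of how ```zip()``` and tuples behave: in Python 3, ```zip()``` returns a lazy iterator rather than a list (wrap it in ```list()``` to see its contents), and a tuple raises a ```TypeError``` if you try to change one of its items.

```python
l3 = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
l4 = [1, 2, 3, 4, 5, 6, 7, 8]

pairs = list(zip(l3, l4))  # convert the iterator into a list of tuples
print(len(pairs))  # 7: the extra 8 in l4 is dropped
print(pairs[0])    # ('a', 1)

try:
    pairs[0][1] = 100  # tuples cannot be modified after creation
except TypeError as err:
    print('TypeError:', err)
```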


Knowing how ```zip()``` behaves, we can use it to iterate over two iterables at once, unpack the variables, operate on them and store them in a dictionary.

```python
d = {i:j**2 for i,j in zip(l3,l4)}
d
```
```
{'a': 1, 'b': 4, 'c': 9, 'd': 16, 'e': 25, 'f': 36, 'g': 49}
```

A ```pandas.Series``` (a single labelled column) can be constructed from a dictionary, with the keys becoming the index. This is useful in data analysis.

```python
import pandas as pd
pd.Series(d)
```
```
a     1
b     4
c     9
d    16
e    25
f    36
g    49
dtype: int64
```

# **Exercises**

These are translation exercises: each cell below contains a ```for``` loop for you to translate into a comprehension. You can check your answer against the output of the ```for``` loop with an equality test:
```python
my_output == your_output
```
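As a sketch of that check, using a made-up loop that is not one of the exercises: build the list both ways and compare them. ```==``` returns ```True``` only when both lists contain the same items in the same order.

```python
# The "my_output" side: a for loop
my_output = []
for n in range(5):
    my_output.append(n * 10)

# The "your_output" side: the equivalent list comprehension
your_output = [n * 10 for n in range(5)]

print(my_output == your_output)  # True
```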

```python
# ex 1
out = []
for i in range(10):
    out.append(i)

# translate!
out
```
```
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

```python
# ex 2 - parabola
def f2(x):
    return x**2 - 5*x

out = []
for i in range(-10,10):
    out.append(f2(i))
print(out)
```
```
[150, 126, 104, 84, 66, 50, 36, 24, 14, 6, 0, -4, -6, -6, -4, 0, 6, 14, 24, 36]
```


```python
# ex 3 - get consonants using conditional
out = []
for i in 'abcdefghijklmnopqrstuvwxyz':
    if i not in 'aeiou':
        out.append(i)
print(out) # translate!
```
```
['b', 'c', 'd', 'f', 'g', 'h', 'j', 'k', 'l', 'm', 'n', 'p', 'q', 'r', 's', 't', 'v', 'w', 'x', 'y', 'z']
```


```python
# ex 4 - dictionary comprehension
import pandas as pd

out = {}
for i,j in enumerate('abcdefghijklmnopqrstuvwxyz', 1): # start counting at 1
    out[i] = j
out = pd.Series(out)
print(out.head()) # translate!
```
```
1    a
2    b
3    c
4    d
5    e
dtype: object
```
