## Comprehensions, Generators, Useful functions
### BIOINF 575 - Fall 2021

### For loop RECAP

### for: the repetitive control structure with a known number of steps

Looping through the elements of a sequence is called iterating.

```python
for var in sequence:
    statements
```

___ 

### Python Comprehension Statements
Courtesy of Marcurs Sherman - partly adapted

First, the **purpose** of comprehensions:
> "\[...\] comprehensions provide a more concise way to create \[iterables\] in situations where `map()` and `filter()` and/or nested loops would currently be used" - Barry Warsaw, [PEP 202](https://www.python.org/dev/peps/pep-0202/)

Comprehensions are what we call "_syntactic sugar_". 
This means that they do not do anything you could not have done already. But with them, you can write some operations more easily.
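
As a minimal sketch of what "syntactic sugar" means here, the loop and the comprehension below build the same list. The names `squares_loop` and `squares_comp` are illustrative and not part of the original notes.

```python
# Build a list of squares with an explicit for loop
squares_loop = []
for number in range(5):
    squares_loop.append(number ** 2)

# Build the same list with a list comprehension
squares_comp = [number ** 2 for number in range(5)]

print(squares_loop)                  # [0, 1, 4, 9, 16]
print(squares_comp == squares_loop)  # True
```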

<img src="venn_diagram2.png" width=400 />

---
### Comprehension Syntax

#### Legend

<img src="legendary.png" width=250 />

#### Examples
<img src="comprehensions.png" width=500 />

#### Alternate view of the comprehension syntax

<center><img src="http://python-3-patterns-idioms-test.readthedocs.io/en/latest/_images/listComprehensions.gif" width = "500"/></center>

---
#### The Comprehension Categories
1. `list` comprehensions - create a list
2. `dict`ionary comprehensions - create dictionaries
3. `set` comprehensions - create sets
4. `tuple`? comprehensions

In [2]:
sequences = ["ACTTG", "AAAGTC", "CCTAC", "AAACCT"]

In [3]:
sequences

['ACTTG', 'AAAGTC', 'CCTAC', 'AAACCT']

In [6]:
# list comprehensions

x = [len(seq) for seq in sequences]
type(x)

list

In [7]:
x

[5, 6, 5, 6]

In [8]:
# compute GC content (fraction of G and C bases in each sequence)

[(seq.count("G")+seq.count("C"))/len(seq) for seq in sequences]


[0.4, 0.3333333333333333, 0.6, 0.3333333333333333]

In [9]:
sequences

['ACTTG', 'AAAGTC', 'CCTAC', 'AAACCT']

In [10]:
# set comprehensions 

# get the first codon in each sequence
[seq[:3] for seq in sequences]


['ACT', 'AAA', 'CCT', 'AAA']

In [11]:
{seq[:3] for seq in sequences}

{'AAA', 'ACT', 'CCT'}

In [12]:
# dictionary comprehensions  
# sequence as key, GC content (fraction) as value

{seq:(seq.count("G")+seq.count("C"))/len(seq) for seq in sequences}


{'ACTTG': 0.4,
 'AAAGTC': 0.3333333333333333,
 'CCTAC': 0.6,
 'AAACCT': 0.3333333333333333}

In [13]:
[seq:(seq.count("G")+seq.count("C"))/len(seq) for seq in sequences]



SyntaxError: invalid syntax (4037520548.py, line 1)

In [14]:
{seq:(seq.count("G")+seq.count("C"))/len(seq) for seq in sequences if (seq.count("G")+seq.count("C"))/len(seq) >= 0.4}



{'ACTTG': 0.4, 'CCTAC': 0.6}
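
The filtered comprehension above computes the GC content twice per sequence. One possible way to avoid that, sketched here with a hypothetical helper `gc_content` that is not part of the original notes, is to compute the values once and filter the stored results.

```python
def gc_content(seq):
    """Return the fraction of G and C bases in seq (hypothetical helper)."""
    return (seq.count("G") + seq.count("C")) / len(seq)

sequences = ["ACTTG", "AAAGTC", "CCTAC", "AAACCT"]

# Compute the GC content once per sequence
gc_by_seq = {seq: gc_content(seq) for seq in sequences}

# Filter on the stored value instead of recomputing it
high_gc = {seq: gc for seq, gc in gc_by_seq.items() if gc >= 0.4}

print(high_gc)  # {'ACTTG': 0.4, 'CCTAC': 0.6}
```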

### Some pros of comprehensions
1. Concise - their use can easily distill multiple lines of code into a single, concise statement
1. Efficient (time and other resources) - _slightly_ more performant than regular loops
1. Flexible output - list, set, dictionary ...

### Some cons of comprehensions
1. The "declarative" syntax - the output expression is written before the `for` clause, the reverse of the order used in a regular loop
1. Readability - comprehension statements get more unreadable as complexity is added
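
To illustrate the readability trade-off, here is a sketch with made-up variable names (`kmers_dense`, `kmers_loop`) that extracts every 3-base window from the sequences containing a G. Both versions give the same result; which one reads better is a judgment call.

```python
sequences = ["ACTTG", "CCTAC"]

# Dense version: a filter and two for clauses on one line
kmers_dense = [seq[i:i + 3] for seq in sequences if "G" in seq for i in range(len(seq) - 2)]

# Same logic with explicit loops, which can be easier to read and debug
kmers_loop = []
for seq in sequences:
    if "G" not in seq:
        continue
    for i in range(len(seq) - 2):
        kmers_loop.append(seq[i:i + 3])

print(kmers_dense)                # ['ACT', 'CTT', 'TTG']
print(kmers_dense == kmers_loop)  # True
```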

### RESOURCES

https://www.tutorialspoint.com/python-list-comprehension  
https://python-3-patterns-idioms-test.readthedocs.io/en/latest/Comprehensions.html  
https://realpython.com/list-comprehension-python/  
http://scipy-lectures.org/advanced/advanced_python/index.html   

In [16]:
# Now, try to make a `tuple` comprehension
(number * 2 for number in range(10))

<generator object <genexpr> at 0x112a12270>

In [15]:
list(range(10))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

### Python Generators
Courtesy of Marcurs Sherman - partly adapted

#### The parenthesized form tried above is not a tuple comprehension: it is called a "generator expression".

<img src="http://nvie.com/img/relationships.png" width=600 align='middle'/>


"Iterable is an object, which one can iterate over. It generates an Iterator when passed to `iter()` method. Iterator is an object, which is used to iterate over an iterable object using `__next__()` method. Iterators have `__next__()` method, which returns the next item of the object.

Note that every iterator is also an iterable, but not every iterable is an iterator. For example, a list is iterable but a list is not an iterator. An iterator can be created from an iterable by using the function `iter()`. To make this possible, the class of an object needs either a method `__iter__`, which returns an iterator, or a `__getitem__` method with sequential indexes starting with 0."

https://www.geeksforgeeks.org/python-difference-iterable-iterator/
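
The quoted description can be sketched in code. The class `Codons` below is a hypothetical example, not part of the original notes: it defines only `__getitem__`, yet `iter()` can still build an iterator from it. The `while` loop mimics, under that assumption, what a `for` loop does behind the scenes.

```python
class Codons:
    """Hypothetical class: iterable only through __getitem__ with indexes from 0."""

    def __init__(self, seq):
        self.seq = seq

    def __getitem__(self, index):
        start = index * 3
        if start + 3 > len(self.seq):
            raise IndexError(index)  # signals the end of the iteration
        return self.seq[start:start + 3]

codons = Codons("ACTTGAAAG")

# Roughly what a for loop does: call iter() once, then next() until StopIteration
iterator = iter(codons)
collected = []
while True:
    try:
        collected.append(next(iterator))
    except StopIteration:
        break

print(collected)  # ['ACT', 'TGA', 'AAG']
```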



In [17]:
range(3)

range(0, 3)

In [18]:
dir(range)

['__bool__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'count',
 'index',
 'start',
 'step',
 'stop']

In [19]:
# is range an iterator?
next(range(3))

TypeError: 'range' object is not an iterator

In [21]:
next(iter(range(3)))

0

In [27]:
test_iter = iter(range(3))

In [28]:
next(test_iter)

0

#### and we can do next again and again ...

In [31]:
# and ...that's it ... 
# when we reach the end of the sequence 
# the iterator raises StopIteration on next
# we have to create it again to start from the beginning

next(test_iter)

StopIteration: 

In [32]:
test_gen = (number * 2 for number in range(10))

In [34]:
next(test_gen)

2

In [35]:
# retrieve all values - from where it left off
tuple(test_gen)

(4, 6, 8, 10, 12, 14, 16, 18)

___
#### Functions RECAP

```python

# DEFINITION - creating a function

def function_name(arg1, arg2, darg=None):
    # instructions to compute result
    return result

# CALL - running a function

function_result = function_name(val1, val2, dval)
```

___


A generator function is a special kind of function. The main difference is how it gives its output: it uses `yield` instead of `return`, and calling it returns a generator object.

How do you make a function give a result?

In [36]:
def number_one():
    number = 1
    return number

In [38]:
number_one()

1

In [39]:
# create a generator for an infinite sequence of numbers
# Note for generators we have yield instead of return

def infinite_sequence():
    number = 0
    while True:
        yield number
        number += 1

In [40]:
numbers_seq_gen = infinite_sequence()

In [41]:
numbers_seq_gen

<generator object infinite_sequence at 0x11262d660>

In [42]:
next(numbers_seq_gen)

0

#### and we can do next again and again ...

In [50]:
next(numbers_seq_gen)

8

In [51]:
next(numbers_seq_gen)

9

In [53]:
# a generator for a finite sequence of numbers
# this starts to look like range

def finite_sequence(limit):
    number = 0
    while number < limit:
        yield number
        number += 1

In [54]:
numbers_seq_gen = finite_sequence(3)

In [55]:
numbers_seq_gen

<generator object finite_sequence at 0x11262dac0>

In [56]:
next(numbers_seq_gen)

0

In [59]:
# and we can do next again and again ... and ...that's it

next(numbers_seq_gen)


StopIteration: 

In [61]:
# we can put all the results in a list

[i for i in finite_sequence(30)]


[0,
 1,
 2,
 3,
 4,
 5,
 6,
 7,
 8,
 9,
 10,
 11,
 12,
 13,
 14,
 15,
 16,
 17,
 18,
 19,
 20,
 21,
 22,
 23,
 24,
 25,
 26,
 27,
 28,
 29]

In [62]:
list(finite_sequence(10))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [63]:
# go through the elements of the generator

x = finite_sequence(10)
y = next(x)
while y < 5:
    print(y)
    y = next(x)

0
1
2
3
4


In [64]:
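# 5 was already consumed by the last next(x) call in the while loop above,
# so the for loop resumes at 6
for i in x:
    print(i)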

6
7
8
9


In [65]:
# regular function (not yet a generator) that pairs a list of keys with a list of values in a dictionary

def zip_2sequences(seq1, seq2):
    return {key:seq2[i] for i, key in enumerate(seq1)}

zip_2sequences([1,2,3], ["A", "B", "C"])

{1: 'A', 2: 'B', 3: 'C'}

In [66]:
zip_2sequences([1,2,3], ["A", "B", "C", "D"])

{1: 'A', 2: 'B', 3: 'C'}

In [67]:
zip_2sequences([1,2,3,4], ["A", "B", "C"])

IndexError: list index out of range

In [1]:
def zip_generator(seq_list1, seq_list2):
    i = 0
    n = min(len(seq_list1), len(seq_list2))
    while i < n:
        yield {seq_list1[i]:seq_list2[i]}
        i += 1

In [8]:
zipg = zip_generator([2,3,4], ("A","B","C"))
zipg

<generator object zip_generator at 0x114061270>

In [9]:
next(zipg)

{2: 'A'}

In [10]:
zipg = zip_generator([2,3,4,5], ("A","B","C"))
zipg

<generator object zip_generator at 0x1140614a0>

In [14]:
next(zipg)

StopIteration: 

In [26]:
zipg = zip_generator([2,3,4], ("A","B","C","D"))
zipg

<generator object zip_generator at 0x114061890>

In [27]:
i = next(zipg)
while(i):
    print(i)
    i = next(zipg)

{2: 'A'}
{3: 'B'}
{4: 'C'}


StopIteration: 
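
One way to avoid the `StopIteration` at the end of such a `while` loop is the two-argument form `next(iterator, default)`, which returns the default once the generator is exhausted. The sketch below repeats `zip_generator` so that it runs on its own; the list `pairs` is an illustrative name.

```python
def zip_generator(seq_list1, seq_list2):
    # same generator as above, repeated so the block is self-contained
    i = 0
    n = min(len(seq_list1), len(seq_list2))
    while i < n:
        yield {seq_list1[i]: seq_list2[i]}
        i += 1

zipg = zip_generator([2, 3, 4], ("A", "B", "C", "D"))

pairs = []
# next(iterator, default) returns the default instead of raising StopIteration
item = next(zipg, None)
while item is not None:
    pairs.append(item)
    item = next(zipg, None)

print(pairs)  # [{2: 'A'}, {3: 'B'}, {4: 'C'}]
```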

In [93]:
# get the element first and then move on,
# so you can get the element before reaching the end of the list

def zip_2sequences(seq1, seq2):
    i = 0
    while i < min(len(seq1),len(seq2)):
        item = (seq1[i],seq2[i])
        yield item
        i += 1

x = zip_2sequences([1,2,3], ["A","B","C"])
next(x)

(1, 'A')

In [94]:
list(x)

[(2, 'B'), (3, 'C')]

In [95]:
x = zip_2sequences([1,2,3], ["A","B","C"])
dict(x)

{1: 'A', 2: 'B', 3: 'C'}

In [96]:
# use a for loop instead of calling the next function (in a while loop for instance)
# Once the generator has gone through all its elements, it will not produce any more. 
# At that point, calling next on it raises StopIteration.
# A for loop handles this exception for you, whereas next raises it as soon as it occurs.

x = zip_2sequences([1,2,3], ["A","B","C"])

for i in x: 
    print(i)

(1, 'A')
(2, 'B')
(3, 'C')



In [97]:
# getting the element after incrementing i leads to an IndexError once i goes past the last index

def zip_2sequences(seq1, seq2):
    i = 0
    item = {seq1[i]:seq2[i]}
    while i < min(len(seq1),len(seq2)):
        yield item
        i += 1
        item = {seq1[i]:seq2[i]}

x = zip_2sequences([1,2,3], ["A","B","C"])
next(x)


{1: 'A'}

In [98]:
next(x)

{2: 'B'}

In [99]:
x = zip_2sequences([1,2,3], ["A","B","C"])
list(x)

IndexError: list index out of range

---
# Conclusion
Generators and generator expressions should be a standard tool in every bioinformaticist's tool belt. 

1. Generator expressions can compress simple for loops down to a single line
1. List comprehensions tend to be more efficient than standard for loops when the data is sufficiently large
1. The same syntax to make a list comprehension can be used to make dictionaries, sets, and generators
1. Generators are iterators that lazily evaluate the next value and `yield` it back
1. Once a generator (or any iterator) is consumed you need to recreate it
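
The last point can be checked with a short sketch (the name `doubled` is illustrative): a second pass over the same generator produces nothing, so the generator has to be created again.

```python
doubled = (number * 2 for number in range(4))

first_pass = list(doubled)   # consumes the generator
second_pass = list(doubled)  # nothing left to produce

print(first_pass)   # [0, 2, 4, 6]
print(second_pass)  # []

# Recreate the generator to iterate again
doubled = (number * 2 for number in range(4))
print(sum(doubled))  # 12
```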

### Some pros of generators
1. Lazy evaluation: does not produce all the data at one time
1. Maintains state between steps: does not forget where it left off
1. Easily handles data of any size
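
A small sketch of lazy evaluation, assuming the `infinite_sequence` generator defined earlier (repeated here so the block runs on its own). `itertools.islice` takes only the first few values, and `sys.getsizeof` gives a rough comparison between a full list and a generator object; exact sizes depend on the Python build, so only the comparison is shown.

```python
import sys
from itertools import islice

def infinite_sequence():
    number = 0
    while True:
        yield number
        number += 1

# Take only the first five values; the rest are never computed
first_five = list(islice(infinite_sequence(), 5))
print(first_five)  # [0, 1, 2, 3, 4]

# A list stores every element; a generator stores only its current state
squares_list = [n * n for n in range(100_000)]
squares_gen = (n * n for n in range(100_000))
print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))  # True
```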

### Some cons of generators
1. Hard to explain to someone that does not use Python
1. For sufficiently small data, the trade-off is not worth it

#### RESOURCES 
https://www.tutorialspoint.com/generators-in-python   
https://www.geeksforgeeks.org/generators-in-python/   
https://book.pythontips.com/en/latest/generators.html   


---
### Function Examples

___
##### <b>`*args`</b> - unknown number of positional arguments - unpack a collection of argument values
##### <b>`**kwargs`</b> - unknown number of keyword arguments - unpack a mapping of names and values 

In [28]:
x ,y ,z = [20,30,40]
print(x)
print(y)
print(z)

20
30
40


In [30]:
# what if the number of elements does not match the number of variables?
x ,*y ,z = [20,30, 50,40]


In [31]:
x ,*y ,z = [20,30,50, "A", 40]
print(x)
print(y)
print(z)

20
[30, 50, 'A']
40


In [33]:
x ,*y ,z = [20, 40]
print(x)
print(y)
print(z)

20
[]
40


In [34]:
# if we use * we can provide an unknown number of argument values

def test_arg(*args_list):
    for value in args_list:
        print("value = ", value)

In [35]:
test_arg(1,2,3, {"a":4}, [4,5])

value =  1
value =  2
value =  3
value =  {'a': 4}
value =  [4, 5]


In [36]:
test_arg(1,2,3, {"a":4}, [4,5], (5,6), "test")

value =  1
value =  2
value =  3
value =  {'a': 4}
value =  [4, 5]
value =  (5, 6)
value =  test


In [37]:
# no key=value arguments allowed
test_arg(args_list = 2)

TypeError: test_arg() got an unexpected keyword argument 'args_list'

In [39]:
# if we use * we can provide an unknown number of argument values
# if we use ** we can provide an unknown number of key = value arguments

def test_karg(**keys_args_dict):
    for name,value in keys_args_dict.items():
        print("name = ", name)
        print("value = ", value)

In [40]:
test_karg(**{"gene":"EGFR", "expression": 20,"transcript_no": 4})

name =  gene
value =  EGFR
name =  expression
value =  20
name =  transcript_no
value =  4


In [42]:
test_karg(gene = "EGFR", expression = 20, transcript_no = 4, snp_no = 5, genes_regualted = {"TP53", "EGR"}, x = 2)

name =  gene
value =  EGFR
name =  expression
value =  20
name =  transcript_no
value =  4
name =  snp_no
value =  5
name =  genes_regualted
value =  {'EGR', 'TP53'}
name =  x
value =  2


In [43]:
# we can check for the key and perform computations with the value for that key
# or retrieve the value for a specific key

def test_karg(**keys_args_dict):
    for name,value in keys_args_dict.items():
        print("name = ", name)
        print("value = ", value)
        if (name == "expression"):
            print("new value", 2*keys_args_dict[name])
        

In [44]:
test_karg(gene = "EGFR", expression = 20, transcript_no = 4, snp_no = 5, genes_regualted = {"TP53", "EGR"})

name =  gene
value =  EGFR
name =  expression
value =  20
new value 40
name =  transcript_no
value =  4
name =  snp_no
value =  5
name =  genes_regualted
value =  {'EGR', 'TP53'}


In [45]:
test_karg(gene = "EGFR", Expression = 20, transcript_no = 4, snp_no = 5, genes_regualted = {"TP53", "EGR"})

name =  gene
value =  EGFR
name =  Expression
value =  20
name =  transcript_no
value =  4
name =  snp_no
value =  5
name =  genes_regualted
value =  {'EGR', 'TP53'}


In [46]:
# if we provide a dictionary then all our key value pairs have to be in the dictionary we create
def test_karg(keys_args_dict):
    for name,value in keys_args_dict.items():
        print("name = ", name)
        print("value = ", value)

In [47]:
test_karg({"gene":"EGFR", "expression": 20,"transcript_no": 4})

name =  gene
value =  EGFR
name =  expression
value =  20
name =  transcript_no
value =  4


In [48]:
# we cannot provide the dictionary items as independent arguments
test_karg(gene = "EGFR", Expression = 20, transcript_no = 4, snp_no = 5, genes_regualted = {"TP53", "EGR"})

TypeError: test_karg() got an unexpected keyword argument 'gene'
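
The two mechanisms can also be combined in one signature. The function `describe_gene` below is a hypothetical sketch, not part of the original notes: it takes one required argument, then any number of positional and keyword arguments. The same `*` and `**` operators unpack a list and a dictionary at the call site.

```python
def describe_gene(name, *aliases, **annotations):
    """Hypothetical example: one required argument plus extra positional and keyword arguments."""
    return {"name": name, "aliases": aliases, "annotations": annotations}

# Pass the values directly
direct = describe_gene("EGFR", "ERBB1", "HER1", expression=20)

# Unpack a list with * and a dictionary with ** at the call site
alias_list = ["ERBB1", "HER1"]
annotation_dict = {"expression": 20}
unpacked = describe_gene("EGFR", *alias_list, **annotation_dict)

print(direct["aliases"])   # ('ERBB1', 'HER1')
print(direct == unpacked)  # True
```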

____
##### <b>`lambda` function</b> - anonymous function - it has no name
Should be used only with simple expressions

https://docs.python.org/3/reference/expressions.html#lambda<br>
https://www.geeksforgeeks.org/python-lambda-anonymous-functions-filter-map-reduce/<br>
https://realpython.com/python-lambda/<br>

`lambda arguments : expression`

A lambda function can take <b>any number of arguments</b>, but must always have <b>only one expression</b>.

In [49]:
help(compute_expression)

NameError: name 'compute_expression' is not defined

In [50]:
compute_expression = lambda x, y: x + y + x*y

In [51]:
help(compute_expression)

Help on function <lambda> in module __main__:

<lambda> lambda x, y



In [52]:
compute_expression(2, 3)

11
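
A common place for a lambda is the `key` argument of built-ins such as `sorted` and `max`, where a short one-off function is enough. The sketch below reuses the `sequences` list from the comprehension examples; the variable names are illustrative.

```python
sequences = ["ACTTG", "AAAGTC", "CCTAC", "AAACCT"]

# Sort the sequences by their number of C bases (ties keep their original order)
by_c_count = sorted(sequences, key=lambda seq: seq.count("C"))
print(by_c_count)  # ['ACTTG', 'AAAGTC', 'AAACCT', 'CCTAC']

# Pick the sequence with the most C bases
most_c = max(sequences, key=lambda seq: seq.count("C"))
print(most_c)  # CCTAC
```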

____
### Useful functions

#### Built-in functions
https://docs.python.org/3/library/functions.html

##### <b>`zip(*iterables)`</b> - make an iterator that aggregates respective elements from each of the iterables.   
https://docs.python.org/3/library/functions.html#zip

##### <b>`map(function, iterable, ...)`</b> - apply function to every element of an iterable - return an iterator over the results
https://docs.python.org/3/library/functions.html#map

##### <b>`filter(function, iterable)`</b> - apply function (bool result) to every element of an iterable - return the elements from the input iterable for which the function returns True
https://docs.python.org/3/library/functions.html#filter

##### <b>`functools.reduce(function, iterable[, initializer])`</b> - apply a two-argument function cumulatively to the elements of an iterable to reduce the iterable to a single value
https://docs.python.org/3/library/functools.html#functools.reduce

____



<b>`zip(*iterables)`</b> - make an iterator that aggregates respective elements from each of the iterables.  


In [53]:
combined_res = zip([10,20,30],["ACT","GGT","AACT"],[True,False,True])
combined_res

<zip at 0x1134a5fc0>

In [54]:
for element in combined_res:
    print(element)

(10, 'ACT', True)
(20, 'GGT', False)
(30, 'AACT', True)


In [None]:
# combined_res was consumed by the for loop above, so this returns an empty list
list(combined_res)

In [55]:
combined_res = zip([10,20,30],["ACT","GGT","AACT"],[True,False,True])
list(combined_res)

[(10, 'ACT', True), (20, 'GGT', False), (30, 'AACT', True)]

In [56]:
# if the sizes do not match, zip stops at the end of the shortest iterable

combined_res = zip([10,20,30,500],["ACT","GGT","AACT"],[True,False,True])
list(combined_res)

[(10, 'ACT', True), (20, 'GGT', False), (30, 'AACT', True)]

In [57]:
# unzip list - return each element from the tuple to the respective (position/index based) list
x, y, z = zip(*[(3,4,7), (12,15,19), (30,60,90)])
print(x, y, z)

(3, 12, 30) (4, 15, 60) (7, 19, 90)


In [58]:
x, y, z = zip(*[(3,4,7,8), (12,15,19), (30,60,90)])
print(x, y, z)

(3, 12, 30) (4, 15, 60) (7, 19, 90)


In [59]:
combined_res = zip(["ACT","GGT","AACT"], [10,20,30])
list(combined_res)

[('ACT', 10), ('GGT', 20), ('AACT', 30)]

In [61]:
combined_res = zip(["ACT","GGT","AACT"], [10,20,30])
dict(combined_res)

{'ACT': 10, 'GGT': 20, 'AACT': 30}

In [62]:
dict(zip(["ACT","GGT","AACT"], [10,20,30]))

{'ACT': 10, 'GGT': 20, 'AACT': 30}

In [63]:
combined_res = zip([10,20,30],["ACT","GGT","AACT"],[True,False,True])
dict(combined_res)

ValueError: dictionary update sequence element #0 has length 3; 2 is required
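
When truncating to the shortest iterable is not wanted, the standard library offers `itertools.zip_longest`, which pads the shorter inputs with a fill value. A minimal sketch, with illustrative variable names:

```python
from itertools import zip_longest

positions = [10, 20, 30, 500]
codons = ["ACT", "GGT", "AACT"]

# zip stops at the shortest input
truncated = list(zip(positions, codons))
print(truncated)  # [(10, 'ACT'), (20, 'GGT'), (30, 'AACT')]

# zip_longest continues to the longest input and fills the gaps
padded = list(zip_longest(positions, codons, fillvalue=None))
print(padded)  # [(10, 'ACT'), (20, 'GGT'), (30, 'AACT'), (500, None)]
```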

_____

<b>`map(function, iterable, ...)`</b> - apply function to every element of an iterable - return an iterator over the results

In [64]:
abs([1,2,3])

TypeError: bad operand type for abs(): 'list'

In [65]:
map(abs,[-2,0,-5,6,-7])

<map at 0x113bb2bb0>

In [66]:
list(map(abs,[-2,0,-5,6,-7]))

[2, 0, 5, 6, 7]

In [67]:
def compute_addition(x,y):
    return x + y


In [68]:
# stops at the end of the shortest iterable
list(map(compute_addition, [1,2,3,4], [50,60,70]))

[51, 62, 73]

In [69]:
def compute_addition(x,y = 10):
    return x + y

In [70]:
list(map(compute_addition, [1,2,3,4]))

[11, 12, 13, 14]

In [71]:
list(map(compute_addition, [1,2,3,4], [50,60,70]))

[51, 62, 73]

https://www.geeksforgeeks.org/python-map-function/

In [72]:
numbers1 = [1, 2, 3] 
numbers2 = [4, 5, 6] 
  
result = map(lambda x, y: x + y, numbers1, numbers2) 
list(result)

[5, 7, 9]

In [None]:
list(map(lambda x, y: x + y, [1,2,3,4], [50,60,70]) )

____
Use a lambda function and the map function to compute a result from the following 3 lists.<br>
If the element in the third list (z) is divisible by 3, return 3*x (x from the first list); otherwise return 2*y (y from the second list).

In [73]:
numbers1 = [1, 2, 3, 4, 5, 6] 
numbers2 = [7, 8, 9, 10, 11, 12] 
numbers3 = [13, 14, 15, 16, 17, 18] 

result = map(lambda x, y, z: 3*x if z % 3 == 0 else 2*y,
             numbers1, numbers2, numbers3) 
list(result)



[14, 16, 9, 20, 22, 18]

In [74]:
def compute_res(x,y,z):
    res = None
    if z%3 == 0:
        res = 3*x
    else:
        res = 2*y
    return res


result = map(compute_res, numbers1, numbers2, numbers3) 
list(result)

[14, 16, 9, 20, 22, 18]
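
For comparison, the same computation can be written as a list comprehension over `zip`, which ties back to the PEP 202 quote at the start of these notes. This is a sketch with illustrative names (`with_map`, `with_comp`); both forms give the same list.

```python
numbers1 = [1, 2, 3, 4, 5, 6]
numbers2 = [7, 8, 9, 10, 11, 12]
numbers3 = [13, 14, 15, 16, 17, 18]

# map with a three-argument lambda
with_map = list(map(lambda x, y, z: 3 * x if z % 3 == 0 else 2 * y,
                    numbers1, numbers2, numbers3))

# equivalent list comprehension over zip
with_comp = [3 * x if z % 3 == 0 else 2 * y
             for x, y, z in zip(numbers1, numbers2, numbers3)]

print(with_map)               # [14, 16, 9, 20, 22, 18]
print(with_map == with_comp)  # True
```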

____
<b>`filter(function, iterable)`</b> - apply function (bool result) to every element of an iterable - return the elements from the input iterable for which the function returns True

In [75]:
test_list = [3,4,5,6,7]
result = filter(lambda x: x > 4, test_list)
result

<filter at 0x113e96cd0>

In [76]:
list(result)

[5, 6, 7]

In [77]:
test_list = [3,4,5,6,7]
result = filter(lambda x: x % 2 == 0 , test_list)
result

<filter at 0x113e785b0>

In [78]:
list(result)

[4, 6]

In [79]:
# Filter to remove empty structures or 0
test_list = [3, 0, 5, None, 7, "", "AACG", [], {}, {1:"one"}]
result = filter(bool, test_list)
list(result)

[3, 5, 7, 'AACG', {1: 'one'}]

In [83]:
bool([1])

True
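
Two equivalent ways to write the "remove empty values" filter, as a sketch: passing `None` as the function makes `filter` keep the elements that are truthy, and a comprehension with an `if` clause does the same.

```python
test_list = [3, 0, 5, None, 7, "", "AACG", [], {}, {1: "one"}]

# filter(None, ...) keeps the elements whose truth value is True
with_filter = list(filter(None, test_list))

# equivalent list comprehension with a condition
with_comp = [item for item in test_list if item]

print(with_filter)               # [3, 5, 7, 'AACG', {1: 'one'}]
print(with_filter == with_comp)  # True
```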

____
<b>`functools.reduce(function, iterable[, initializer])`</b> - apply a two-argument function cumulatively to the elements of an iterable to reduce the iterable to a single value



In [84]:
help(reduce)

NameError: name 'reduce' is not defined

In [85]:
from functools import reduce

In [86]:
help(reduce)

Help on built-in function reduce in module _functools:

reduce(...)
    reduce(function, sequence[, initial]) -> value
    
    Apply a function of two arguments cumulatively to the items of a sequence,
    from left to right, so as to reduce the sequence to a single value.
    For example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates
    ((((1+2)+3)+4)+5).  If initial is present, it is placed before the items
    of the sequence in the calculation, and serves as a default when the
    sequence is empty.



In [87]:
reduce(lambda x,y: x+y, [47,11,42,13])

113

<img src = https://www.python-course.eu/images/reduce_diagram.png width=300/>

https://www.python-course.eu/lambda.php

https://www.geeksforgeeks.org/reduce-in-python/
https://www.tutorialsteacher.com/python/python-reduce-function

In [88]:
test_list = [1,2,3,4,5,6]
reduce(lambda x,y: x+y, test_list)

21

In [89]:
# compute factorial of n
n=5
reduce(lambda x, y: x*y, range(1, n+1))

120

In [90]:
list(range(n))

[0, 1, 2, 3, 4]

In [91]:
list(range(1, n+1))

[1, 2, 3, 4, 5]

In [92]:
reduce(lambda x,y: x+y, ["AACT", "AA", "C", "TTG"])

'AACTAACTTG'
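
The optional initializer mentioned in the help text can be sketched as follows. It is used as the starting value, which allows the accumulated result to have a different type from the elements, and it is returned as-is when the iterable is empty. Variable names are illustrative.

```python
from functools import reduce

# Start from 0 and add the length of each sequence
lengths_total = reduce(lambda total, seq: total + len(seq), ["AACT", "AA", "C", "TTG"], 0)
print(lengths_total)  # 10

# With an initializer, an empty iterable returns the initializer
empty_total = reduce(lambda x, y: x + y, [], 0)
print(empty_total)  # 0

# Without an initializer, an empty iterable raises TypeError
try:
    reduce(lambda x, y: x + y, [])
    raised = False
except TypeError:
    raised = True
print(raised)  # True
```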