## Comprehensions, Generators, NumPy, Pandas
### BIOINF 575 - Fall 2021

### For loop RECAP

### for: the repetitive control structure with a known number of steps

To loop through a sequence of elements is to iterate over it.

```python
for var in sequence:
    statements
```

___ 

### Python Comprehension Statements
Courtesy of Marcurs Sherman - partly adapted

First, the **purpose** of comprehensions:
> "\[...\] comprehensions provide a more concise way to create \[iterables\] in situations where `map()` and `filter()` and/or nested loops would currently be used" - Barry Warsaw, [PEP 202](https://www.python.org/dev/peps/pep-0202/)

Comprehensions are what we call "_syntactic sugar_". 
This means that they do not do anything you could not have done already. But with them, you can do some operations more easily.
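
As a small illustration of the "syntactic sugar" idea, the sketch below builds the same list twice, once with a plain `for` loop and once with a list comprehension. The variable names are made up for this example.

```python
# Build a list of squares with a regular for loop
squares_loop = []
for number in range(5):
    squares_loop.append(number ** 2)

# Build the same list with a list comprehension
squares_comp = [number ** 2 for number in range(5)]

print(squares_loop)
print(squares_comp)
```

Both statements produce the same list; the comprehension only saves typing.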

<img src="venn_diagram2.png" width=400 />

---
### Comprehension Syntax

#### Legend

<img src="legendary.png" width=250 />

#### Examples
<img src="comprehensions.png" width=500 />

#### Alternate syntax of a comprehension

<center><img src="http://python-3-patterns-idioms-test.readthedocs.io/en/latest/_images/listComprehensions.gif" width = "500"/></center>

---
#### The Comprehension Categories
1. `list` comprehensions - create a list
2. `dict`ionary comprehensions - create dictionaries
3. `set` comprehensions - create sets
4. `tuple`? comprehensions

In [None]:
sequences = ["ACTTG", "AAAGTC", "CCTAC", "AAACCT"]

In [None]:
sequences

In [None]:
# list comprehensions

[len(seq) for seq in sequences]


In [None]:
# compute GC count



In [None]:
# set comprehensions 

# get the first codon in each sequence



In [None]:
# dictionary comprehensions  
# sequence as key GC count as value
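
One possible way to fill in the three exercise cells above is sketched here. It assumes that "GC count" means the number of `G` plus `C` characters and that the "first codon" is simply the first three characters of each sequence; the result names are hypothetical.

```python
sequences = ["ACTTG", "AAAGTC", "CCTAC", "AAACCT"]

# list comprehension: GC count of each sequence
gc_counts = [seq.count("G") + seq.count("C") for seq in sequences]

# set comprehension: first codon (first 3 bases) of each sequence
# duplicates such as "AAA" are kept only once
first_codons = {seq[:3] for seq in sequences}

# dictionary comprehension: sequence as key, GC count as value
gc_by_sequence = {seq: seq.count("G") + seq.count("C") for seq in sequences}

print(gc_counts)
print(first_codons)
print(gc_by_sequence)
```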




### Some pros of comprehensions
1. Concise - their use can easily distill multiple lines of code into a single, concise statement
1. Efficient (time and other resources) - _slightly_ more performant than regular loops
1. Flexible output - list, set, dictionary ...

### Some cons of comprehensions
1. The "declarative" syntax - the order in which you write the parts (result expression first, loop second) differs from the equivalent for loop
1. Readability - comprehension statements get more unreadable as complexity is added
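
The readability point can be seen in a small sketch: a comprehension with one loop and one condition reads well, while one with two loops and a condition is arguably clearer as explicit loops. The variable names are made up for this example.

```python
sequences = ["ACTTG", "AAAGTC", "CCTAC", "AAACCT"]

# Still readable: one loop and one condition
long_sequences = [seq for seq in sequences if len(seq) > 5]

# Harder to read: two loops and a condition in a single statement
gc_bases = [base for seq in sequences for base in seq if base in "GC"]

# The same logic written as explicit nested loops
gc_bases_loop = []
for seq in sequences:
    for base in seq:
        if base in "GC":
            gc_bases_loop.append(base)

print(long_sequences)
print(gc_bases == gc_bases_loop)
```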

### RESOURCES

https://www.tutorialspoint.com/python-list-comprehension  
https://python-3-patterns-idioms-test.readthedocs.io/en/latest/Comprehensions.html  
https://realpython.com/list-comprehension-python/  
http://scipy-lectures.org/advanced/advanced_python/index.html   

In [None]:
# Now, try to make a `tuple` comprehension
(number * 2 for number in range(10))
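
The cell above does not produce a tuple. A minimal sketch of what it does produce, and of how to get a tuple instead (names are illustrative):

```python
# Parentheses around a comprehension create a generator object, not a tuple
doubled_gen = (number * 2 for number in range(10))
print(type(doubled_gen))

# Passing the same expression to tuple() builds the actual tuple
doubled_tuple = tuple(number * 2 for number in range(10))
print(doubled_tuple)
```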

### Python Generators
Courtesy of Marcurs Sherman - partly adapted

#### The "tuple comprehension" attempted above is actually called a "generator expression".

<img src="http://nvie.com/img/relationships.png" width=600 align='middle'/>


"Iterable is an object, which one can iterate over. It generates an Iterator when passed to iter() method. Iterator is an object, which is used to iterate over an iterable object using __next__() method. Iterators have __next__() method, which returns the next item of the object.

Note that every iterator is also an iterable, but not every iterable is an iterator. For example, a list is iterable but a list is not an iterator. An iterator can be created from an iterable by using the function iter(). To make this possible, the class of an object needs either a method __iter__, which returns an iterator, or a __getitem__ method with sequential indexes starting with 0."

https://www.geeksforgeeks.org/python-difference-iterable-iterator/
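
The distinction in the quote can be checked directly. The sketch below uses a made-up list of codons and only built-in functions:

```python
codons = ["ATG", "GGC", "TAA"]

# A list is iterable (has __iter__) but is not an iterator (no __next__)
list_is_iterable = hasattr(codons, "__iter__")
list_is_iterator = hasattr(codons, "__next__")

# iter() creates an iterator from the iterable
codon_iter = iter(codons)
iter_is_iterator = hasattr(codon_iter, "__next__")

# Every iterator is also iterable: iter() on an iterator returns itself
iter_returns_itself = iter(codon_iter) is codon_iter

first_codon = next(codon_iter)
print(list_is_iterable, list_is_iterator, iter_is_iterator, iter_returns_itself)
print(first_codon)
```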



In [None]:
range(3)

In [None]:
# dir(range)

In [None]:
# is range an iterator? No: next() on a range raises TypeError
next(range(3))

In [None]:
next(iter(range(3)))

In [None]:
test_iter = iter(range(3))

In [None]:
next(test_iter)

#### and we can do next again and again ...

In [None]:
# and ... that's it ...
# when we reach the end of the sequence
# the iterator raises StopIteration on next
# we have to create it again to start from the beginning

next(test_iter)
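
The error raised at the end of the sequence is `StopIteration`. A minimal sketch of two ways to deal with it, catching the exception or giving `next()` a default value:

```python
test_iter = iter(range(3))

values = []
while True:
    try:
        values.append(next(test_iter))
    except StopIteration:
        # raised when the iterator has no more values
        break

print(values)

# next() accepts a default that is returned instead of raising StopIteration
leftover = next(test_iter, None)
print(leftover)
```

This is essentially what a `for` loop does behind the scenes: it calls `next()` until `StopIteration` is raised.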

In [None]:
test_gen = (number * 2 for number in range(10))

In [None]:
next(test_gen)

In [None]:
# retrieve all values
tuple(test_gen)

___
#### Functions RECAP

```python

# DEFINITION - creating a function

def function_name(arg1, arg2, darg=None):
    # instructions to compute result
    return result

# CALL - running a function

function_result = function_name(val1, val2, dval)
```

___


A generator is just a special case of a function. The main difference is how it gives its output. 

How do you make a function give a result?

In [None]:
def number_one():
    number = 1
    return number

In [None]:
number_one()

In [None]:
# create a generator for an infinite sequence of numbers
# Note for generators we have yield instead of return

def infinite_sequence():
    number = 0
    while True:
        yield number
        number += 1

In [None]:
numbers_seq_gen = infinite_sequence()

In [None]:
numbers_seq_gen

In [None]:
next(numbers_seq_gen)

#### and we can do next again and again ...

In [None]:
next(numbers_seq_gen)

In [None]:
# a generator for a finite sequence of numbers
# this starts to look like range

def finite_sequence(limit):
    number = 0
    while number < limit:
        yield number
        number += 1

In [None]:
numbers_seq_gen = finite_sequence(3)

In [None]:
numbers_seq_gen

In [None]:
next(numbers_seq_gen)

In [None]:
# and we can do next again and again ... and ...that's it




In [None]:
# we can put all the results in a list
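
One way the two empty cells above might be filled in is sketched below; `finite_sequence` is repeated so the block runs on its own.

```python
def finite_sequence(limit):
    number = 0
    while number < limit:
        yield number
        number += 1

numbers_seq_gen = finite_sequence(3)

first = next(numbers_seq_gen)       # takes the first value
remaining = list(numbers_seq_gen)   # collects whatever is left
again = list(numbers_seq_gen)       # the generator is exhausted by now

print(first, remaining, again)
```

Note that `list()` only collects the values that have not been consumed yet.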



In [None]:
# go through the elements of the generator

x = finite_sequence(10)
y = next(x)
while y < 5:
    print(y)
    y = next(x)

In [None]:
for i in x:
    print(i)

In [None]:
# generator to pair up a list of keys and a list of values (e.g. to build a dictionary)

def zip_2sequences(seq1, seq2):
    pass
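
A hypothetical implementation of the stub above is sketched here. It assumes both inputs support `len()` and indexing (lists, tuples, strings) and stops at the shorter one; it is only one of several ways to write it.

```python
def zip_2sequences(seq1, seq2):
    # yield one (key, value) pair at a time, stopping at the shorter input
    for index in range(min(len(seq1), len(seq2))):
        yield seq1[index], seq2[index]

keys = ["ACT", "GGT", "AACT"]
values = [10, 20, 30]

# dict() consumes the generator and builds the dictionary from the pairs
pairs_dict = dict(zip_2sequences(keys, values))
print(pairs_dict)
```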

---
# Conclusion
Generators and generator expressions should be a standard tool in every bioinformaticist's tool belt. 

1. Generator expressions can compress simple for loops down to a single line
1. List comprehensions tend to be more efficient than standard for loops when the data is sufficiently large
1. The same syntax to make a list comprehension can be used to make dictionaries, sets, and generators
1. Generators are iterators that lazily evaluate the next value and `yield` it back
1. A generator (or any iterator) can be consumed only once; when it is exhausted, it must be recreated to iterate again

### Some pros of generators
1. Lazy evaluation: does not produce all the data at one time
1. Maintains state between steps: does not forget where it left off
1. Easily handles data of any size
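
Lazy evaluation can be sketched with the standard library's `itertools.islice`, which takes a fixed number of items from any iterator, including an infinite one. `infinite_sequence` is repeated here so the block is self-contained.

```python
from itertools import islice

def infinite_sequence():
    number = 0
    while True:
        yield number
        number += 1

# Only the first five values are ever computed
first_five = list(islice(infinite_sequence(), 5))
print(first_five)

# Sum one million squares without building a list of them first
total = sum(number ** 2 for number in range(1_000_000))
print(total)
```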

### Some cons of generators
1. Hard to explain to someone who does not use Python
1. When the data is sufficiently small, the trade-off is not worth it

#### RESOURCES 
https://www.tutorialspoint.com/generators-in-python   
https://www.geeksforgeeks.org/generators-in-python/   
https://book.pythontips.com/en/latest/generators.html   


---
### Function Examples

___
##### <b>`*args`</b> - unknown no. of arguments - unpack collection of argument values
##### <b>`**kargs`</b> - unknown no. of arguments - unpack mapping of names and values 

In [None]:
x ,y ,z = [20,30,40]
print(x)
print(y)
print(z)

In [None]:
# what if the number of elements does not match?
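
A minimal sketch of the mismatch case: unpacking four values into three names raises `ValueError`, while a starred name absorbs the extra values. The variable names are illustrative.

```python
values = [20, 30, 40, 50]

try:
    x, y, z = values  # four values, only three names
    unpack_ok = True
except ValueError as error:
    unpack_ok = False
    print("ValueError:", error)

# A starred name collects the remaining values in a list
first, *rest = values
print(first, rest)
```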



In [None]:
x ,*y ,z = [20,30,50, "A", 40]
print(x)
print(y)
print(z)

In [None]:
# if we use * we can provide an unknown number of argument values

def test_arg(*args_list):
    for value in args_list:
        print("value = ", value)

In [None]:
test_arg(1,2,3, {"a":4}, [4,5])

In [None]:
# no key=value arguments allowed: this raises TypeError
test_arg(args_list = 2)

In [None]:
# if we use * we can provide an unknown number of argument values
# if we use ** we can provide an unknown number of key = value arguments

def test_karg(**keys_args_dict):
    for name,value in keys_args_dict.items():
        print("name = ", name)
        print("value = ", value)

In [None]:
test_karg(**{"gene":"EGFR", "expression": 20,"transcript_no": 4})

In [None]:
test_karg(gene = "EGFR", expression = 20, transcript_no = 4, snp_no = 5, genes_regulated = {"TP53", "EGR"})

In [None]:
# we can check for the key and perform computations with the value for that key
# or retrieve the value for a specific key

def test_karg(**keys_args_dict):
    for name,value in keys_args_dict.items():
        print("name = ", name)
        print("value = ", value)
        if (name == "expression"):
            print("new value", 2*keys_args_dict[name])
        

In [None]:
test_karg(gene = "EGFR", expression = 20, transcript_no = 4, snp_no = 5, genes_regulated = {"TP53", "EGR"})

In [None]:
test_karg(gene = "EGFR", Expression = 20, transcript_no = 4, snp_no = 5, genes_regulated = {"TP53", "EGR"})

In [None]:
# if we provide a dictionary then all our key value pairs have to be in the dictionary we create
def test_karg(keys_args_dict):
    for name,value in keys_args_dict.items():
        print("name = ", name)
        print("value = ", value)

In [None]:
test_karg({"gene":"EGFR", "expression": 20,"transcript_no": 4})

In [None]:
# we cannot provide the dictionary items as independent arguments: this raises TypeError
test_karg(gene = "EGFR", Expression = 20, transcript_no = 4, snp_no = 5, genes_regulated = {"TP53", "EGR"})
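
The two mechanisms can be combined in one function. The sketch below uses a hypothetical `describe_call` helper to show that `*args` arrives as a tuple and `**kwargs` as a dictionary, and that `*` and `**` also unpack collections at the call site.

```python
def describe_call(*args, **kwargs):
    # args is a tuple of positional values, kwargs is a dict of named values
    return args, kwargs

positional, named = describe_call(1, 2, gene="EGFR", expression=20)
print(positional, named)

# * unpacks a sequence and ** unpacks a mapping when calling a function
values = [1, 2]
annotations = {"gene": "EGFR", "expression": 20}
positional2, named2 = describe_call(*values, **annotations)
print(positional2 == positional, named2 == named)
```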

____
##### <b>`lambda` function</b> - anonymous function - it has no name
Should be used only with simple expressions

https://docs.python.org/3/reference/expressions.html#lambda<br>
https://www.geeksforgeeks.org/python-lambda-anonymous-functions-filter-map-reduce/<br>
https://realpython.com/python-lambda/<br>

`lambda arguments : expression`

A lambda function can take <b>any number of arguments</b>, but must always have <b>only one expression</b>.

In [None]:
# compute_expression is not defined yet, so this raises NameError
help(compute_expression)

In [None]:
compute_expression = lambda x, y: x + y + x*y

In [None]:
help(compute_expression)

In [None]:
compute_expression(2, 3)
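
A lambda is equivalent to a small named function, and is often passed directly as an argument. The sketch below shows both, using the built-in `sorted` with a `key` function; the names are made up for this example.

```python
# A named function and a lambda that compute the same thing
def compute_expression_def(x, y):
    return x + y + x * y

compute_expression = lambda x, y: x + y + x * y

same_result = compute_expression_def(2, 3) == compute_expression(2, 3)
print(same_result)

# Typical use: a short throwaway function passed as the sort key
sequences = ["ACTTG", "AAAGTC", "CCTAC", "AAACCT"]
by_gc = sorted(sequences, key=lambda seq: seq.count("G") + seq.count("C"))
print(by_gc)
```

`sorted` is stable, so sequences with the same GC count keep their original order.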

____
### Useful functions

#### Built-in functions
https://docs.python.org/3/library/functions.html

##### <b>`zip(*iterables)`</b> - make an iterator that aggregates respective elements from each of the iterables.   
https://docs.python.org/3/library/functions.html#zip

##### <b>`map(function, iterable, ...)`</b> - apply function to every element of an iterable - return an iterator over the results
https://docs.python.org/3/library/functions.html#map

##### <b>`filter(function, iterable)`</b> - apply function (bool result) to every element of an iterable - return an iterator over the elements of the input iterable for which the function returns True
https://docs.python.org/3/library/functions.html#filter

##### <b>`functools.reduce(function, iterable[, initializer])`</b> - apply a two-argument function cumulatively to the elements of an iterable to reduce the iterable to a single value
https://docs.python.org/3/library/functools.html#functools.reduce

____



<b>`zip(*iterables)`</b> - make an iterator that aggregates respective elements from each of the iterables.  


In [None]:
combined_res = zip([10,20,30],["ACT","GGT","AACT"],[True,False,True])
combined_res

In [None]:
for element in combined_res:
    print(element)

In [None]:
# the for loop above consumed the iterator, so this gives an empty list
list(combined_res)

In [None]:
combined_res = zip([10,20,30],["ACT","GGT","AACT"],[True,False,True])
list(combined_res)

In [None]:
combined_res = zip([10,20,30,500],["ACT","GGT","AACT"],[True,False,True])
list(combined_res)

In [None]:
# unzip list
x, y, z = zip(*[(3,4,7), (12,15,19), (30,60,90)])
print(x, y, z)

In [None]:
x, y, z = zip(*[(3,4,7,8), (12,15,19), (30,60,90)])
print(x, y, z)

In [None]:
combined_res = zip(["ACT","GGT","AACT"], [10,20,30])
dict(combined_res)

In [None]:
dict(zip(["ACT","GGT","AACT"], [10,20,30]))
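
As the unequal-length example shows, `zip` stops at the shortest input. When the extra items should be kept, the standard library offers `itertools.zip_longest`; a minimal sketch with made-up variable names:

```python
from itertools import zip_longest

counts = [10, 20, 30, 500]
codons = ["ACT", "GGT", "AACT"]

# zip silently drops the unmatched 500
truncated = list(zip(counts, codons))

# zip_longest pads the shorter input with fillvalue
padded = list(zip_longest(counts, codons, fillvalue=None))

print(truncated)
print(padded)
```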

_____

<b>`map(function, iterable, ...)`</b> - apply function to every element of an iterable - return an iterator over the results

In [None]:
map(abs,[-2,0,-5,6,-7])

In [None]:
list(map(abs,[-2,0,-5,6,-7]))

In [None]:
def compute_addition(x,y):
    return x + y


In [None]:
list(map(compute_addition, [1,2,3,4], [50,60,70]))

In [None]:
def compute_addition(x,y = 10):
    return x + y

In [None]:
list(map(compute_addition, [1,2,3,4]))

In [None]:
list(map(compute_addition, [1,2,3,4], [50,60,70]))

https://www.geeksforgeeks.org/python-map-function/

In [None]:
numbers1 = [1, 2, 3] 
numbers2 = [4, 5, 6] 
  
result = map(lambda x, y: x + y, numbers1, numbers2) 
list(result)

In [None]:
list(map(lambda x, y: x + y, [1,2,3,4], [50,60,70]) )
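
For comparison, the same computation can be written as a list comprehension over `zip`. This is only a sketch of the equivalence; which form reads better is a matter of taste.

```python
numbers1 = [1, 2, 3, 4]
numbers2 = [50, 60, 70]

# map with a two-argument lambda stops at the shorter input
mapped = list(map(lambda x, y: x + y, numbers1, numbers2))

# equivalent list comprehension using zip
comprehended = [x + y for x, y in zip(numbers1, numbers2)]

print(mapped)
print(comprehended)
```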

____
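Use a lambda function and the map function to compute a result from the following 3 lists.<br>
For each position, if the element from the third list is divisible by 3, return 3*x (x is the element from the first list); otherwise return 2*y (y is the element from the second list).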

In [None]:
numbers1 = [1, 2, 3, 4, 5, 6] 
numbers2 = [7, 8, 9, 10, 11, 12] 
numbers3 = [13, 14, 15, 16, 17, 18] 

result = map(lambda x, y, z: 3*x if z % 3 == 0 else 2*y,
             numbers1, numbers2, numbers3) 
list(result)



In [None]:
def compute_res(x,y,z):
    res = None
    if z%3 == 0:
        res = 3*x
    else:
        res = 2*y
    return res


result = map(compute_res, numbers1, numbers2, numbers3) 
list(result)

____
<b>`filter(function, iterable)`</b> - apply function (bool result) to every element of an iterable - return an iterator over the elements of the input iterable for which the function returns True

In [None]:
test_list = [3,4,5,6,7]
result = filter(lambda x: x > 4, test_list)
result

In [None]:
list(result)

In [None]:
# Filter to remove empty structures or 0
test_list = [3, 0, 5, None, 7, "", "AACG", [], {}, {1:"one"}]
result = filter(bool, test_list)
list(result)
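
The same filtering can be expressed with a comprehension that has an `if` clause, and `filter` also accepts `None` in place of the function to keep only truthy elements. A small sketch of the three equivalent forms:

```python
test_list = [3, 0, 5, None, 7, "", "AACG", [], {}, {1: "one"}]

# filter with bool keeps only the truthy elements
filtered = list(filter(bool, test_list))

# passing None as the function has the same effect
filtered_none = list(filter(None, test_list))

# equivalent list comprehension with a condition
comprehended = [item for item in test_list if item]

print(filtered)
```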

____
<b>`functools.reduce(function, iterable[, initializer])`</b> - apply a two-argument function cumulatively to the elements of an iterable to reduce the iterable to a single value



In [None]:
# reduce is not a built-in function in Python 3, so this raises NameError until it is imported
help(reduce)

In [None]:
from functools import reduce

In [None]:
help(reduce)

In [None]:
reduce(lambda x,y: x+y, [47,11,42,13])

<img src="https://www.python-course.eu/images/reduce_diagram.png" width=300 />

https://www.python-course.eu/lambda.php

https://www.geeksforgeeks.org/reduce-in-python/
https://www.tutorialsteacher.com/python/python-reduce-function

In [None]:
test_list = [1,2,3,4,5,6]
reduce(lambda x,y: x+y, test_list)

In [None]:
# compute factorial of n
n=5
reduce(lambda x, y: x*y, range(1, n+1))

In [None]:
list(range(n))

In [None]:
list(range(1, n+1))

In [None]:
reduce(lambda x,y: x+y, ["AACT", "AA", "C", "TTG"])
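
To make the cumulative behavior concrete, the sketch below compares `reduce` with an explicit loop and shows the optional `initializer` argument, which is used as the starting value and is returned when the iterable is empty.

```python
from functools import reduce

numbers = [47, 11, 42, 13]

# reduce computes ((47 + 11) + 42) + 13
total = reduce(lambda x, y: x + y, numbers)

# the same accumulation written as a loop
running = numbers[0]
for value in numbers[1:]:
    running = running + value

# with an initializer, an empty iterable returns the initializer
empty_total = reduce(lambda x, y: x + y, [], 0)

print(total, running, empty_total)
```

Without an initializer, calling `reduce` on an empty iterable raises `TypeError`.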