## Comprehensions, Generators
### BIOINF 575 - Fall 2022

### For loop RECAP

### for: the repetitive control structure with a known number of steps

To loop through a sequence of elements is to iterate

```python
for var in sequence:
    statements
```

___ 

### Python Comprehension Statements
Courtesy of Marcurs Sherman - partly adapted

First, the **purpose** of comprehensions:
> "\[...\] comprehensions provide a more concise way to create \[iterables\] in situations where `map()` and `filter()` and/or nested loops would currently be used" - Barry Warsaw, [PEP 202](https://www.python.org/dev/peps/pep-0202/)

Comprehensions are what we call "_syntactic sugar_". 
This means that they do not do anything you could not have done already.     
But with them, some operations are easier to write.
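
To make the "syntactic sugar" point concrete, here is a minimal sketch that builds the same list three ways: with a `for` loop, with `map()` and `filter()`, and with a list comprehension. The `seqs` data and all variable names are made up for illustration.

```python
# Hypothetical example data (not from the lecture)
seqs = ["ACTG", "GG", "TTAGC"]

# 1. Explicit for loop
lengths_loop = []
for s in seqs:
    if len(s) > 2:
        lengths_loop.append(len(s))

# 2. map() + filter()
lengths_map = list(map(len, filter(lambda s: len(s) > 2, seqs)))

# 3. List comprehension
lengths_comp = [len(s) for s in seqs if len(s) > 2]

print(lengths_loop, lengths_map, lengths_comp)  # [4, 5] [4, 5] [4, 5]
```

All three produce the same result; the comprehension only changes how it is written.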

<img src="venn_diagram2.png" width=420 />

---
### Comprehension Syntax

#### Legend

<img src="legendary.png" width=250 />

#### Examples
<img src="comprehensions.png" width=500 />

#### Alternate syntax of a comprehension

<center><img src="http://python-3-patterns-idioms-test.readthedocs.io/en/latest/_images/listComprehensions.gif" width = "500"/></center>

---
#### The Comprehension Categories
1. `list` comprehensions - create a list
2. `dict`ionary comprehensions - create dictionaries
3. `set` comprehensions - create sets
4. `tuple`? comprehensions

In [2]:
sequences = ["ACTTGCCC", "AAAGTC", "CCTAC", "AAACCTA"]

In [3]:
sequences

['ACTTGCCC', 'AAAGTC', 'CCTAC', 'AAACCTA']

#### Basic list comprehension
* Compute simple expression for each element

In [4]:
[seq.count("A")  for seq in sequences]

[1, 3, 1, 4]

In [5]:
[len(seq) for seq in sequences]


[8, 6, 5, 7]

In [4]:
len_list = []
for seq in sequences:
    len_list.append(len(seq))
    
len_list

[8, 6, 5, 7]

#### List comprehension - use [ ]
* Compute complex expression for each element



In [6]:
# compute GC content (percent)

[100*(seq.count("C") + seq.count("G"))/len(seq) for seq in sequences]

[62.5, 33.333333333333336, 60.0, 28.571428571428573]

In [7]:
[len(seq) for seq in sequences]

[8, 6, 5, 7]

In [8]:
[seq.count("C")/len(seq) for seq in sequences]

[0.5, 0.16666666666666666, 0.6, 0.2857142857142857]

#### List comprehension with predicate
* Compute complex expression for specific elements
    * add a predicate - an if expression 
    * if expression - similar to the if statement but with no statements after the header line
        * e.g.: if "#" not in item


In [6]:
# compute GC content only for sequences that contain "AC"

sequences

['ACTTGCCC', 'AAAGTC', 'CCTAC', 'AAACCTA']

In [7]:
[100*(seq.count("C") + seq.count("G"))/len(seq) for seq in sequences]

[62.5, 33.333333333333336, 60.0, 28.571428571428573]

In [12]:
[100*(seq.count("C") + seq.count("G"))/len(seq) for seq in sequences if "AC" in seq]

[62.5, 60.0, 28.571428571428573]

#### If the comprehension becomes too complex - use a regular for loop

In [None]:
# compute GC content only for sequences that contain "AC"

sequences

In [13]:
GC_list = []
for seq in sequences:
    
    GC_content = 100*(seq.count("C") + seq.count("G"))/len(seq)
    if "AC" in seq:
        GC_list.append(GC_content)
    
GC_list

[62.5, 60.0, 28.571428571428573]

#### Set comprehensions - use { } 
* Use when you want unique elements and the order does not matter

In [14]:
# get the first codon in each sequence  
{seq[:3] for seq in sequences}


{'AAA', 'ACT', 'CCT'}

In [15]:
sequences

['ACTTGCCC', 'AAAGTC', 'CCTAC', 'AAACCTA']

#### Dictionary comprehensions - use { }
* must start with something like: key_expression:value_expression
* Use when you want key:value pairs and will look values up by key rather than by position

In [8]:
# sequence as key, GC content (percent) as value

[100*(seq.count("C") + seq.count("G"))/len(seq) for seq in sequences]


[62.5, 33.333333333333336, 60.0, 28.571428571428573]

In [9]:
{seq:100*(seq.count("C") + seq.count("G"))/len(seq) for seq in sequences}

{'ACTTGCCC': 62.5,
 'AAAGTC': 33.333333333333336,
 'CCTAC': 60.0,
 'AAACCTA': 28.571428571428573}

In [10]:
{s:len(s) for s in sequences}

{'ACTTGCCC': 8, 'AAAGTC': 6, 'CCTAC': 5, 'AAACCTA': 7}

#### <font color = "red">Exercise:</font>   

* Create a list comprehension where we store if the corresponding sequence can code for the amino acid Tyrosine (TAT and TAC codons code for this amino acid).
* Change this into a dictionary comprehension where the key is the "Seq pos", where pos is the position of the sequence on the `sequences` list.


In [11]:
sequences

['ACTTGCCC', 'AAAGTC', 'CCTAC', 'AAACCTA']

In [12]:
["TAT" in s for s in sequences]

[False, False, False, False]

In [13]:
[(("TAT" in s) or ("TAC" in s)) for s in sequences]

[False, False, True, False]

In [14]:
[i for i in range(len(sequences))]

[0, 1, 2, 3]

In [15]:
{i:("TAT" in sequences[i])  for i in range(len(sequences))}

{0: False, 1: False, 2: False, 3: False}

In [16]:
{i+1:("TAT" in sequences[i])  for i in range(len(sequences))}

{1: False, 2: False, 3: False, 4: False}

In [18]:
{i+1:(("TAT" in sequences[i]) or ("TAC" in sequences[i]))  for i in range(len(sequences))}

{1: False, 2: False, 3: True, 4: False}

In [20]:
{"Seq " + str(i+1):(("TAT" in sequences[i]) or ("TAC" in sequences[i]))  for i in range(len(sequences))}

{'Seq 1': False, 'Seq 2': False, 'Seq 3': True, 'Seq 4': False}

In [22]:
# using enumerate

{idx:s for idx,s in enumerate(sequences)}

{0: 'ACTTGCCC', 1: 'AAAGTC', 2: 'CCTAC', 3: 'AAACCTA'}

In [23]:
list(enumerate(sequences))

[(0, 'ACTTGCCC'), (1, 'AAAGTC'), (2, 'CCTAC'), (3, 'AAACCTA')]

In [24]:
{idx +1:(("TAC" in s) or ("TAT" in s)) for idx,s in enumerate(sequences)}

{1: False, 2: False, 3: True, 4: False}

In [25]:
{"Seq " + str(idx +1):(("TAC" in s) or ("TAT" in s)) for idx,s in enumerate(sequences)}

{'Seq 1': False, 'Seq 2': False, 'Seq 3': True, 'Seq 4': False}
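
As an alternative sketch of the exercise solution, the codon test can be written with `any()` so that more codons can be added without lengthening the expression, and `enumerate(..., start=1)` removes the `idx + 1` arithmetic. The name `tyr_codons` is chosen for this example. Like the solutions above, this is a plain substring search and ignores the reading frame.

```python
sequences = ["ACTTGCCC", "AAAGTC", "CCTAC", "AAACCTA"]
tyr_codons = ("TAT", "TAC")  # name chosen for this example

tyr_by_seq = {
    f"Seq {pos}": any(codon in s for codon in tyr_codons)
    for pos, s in enumerate(sequences, start=1)
}
print(tyr_by_seq)
# {'Seq 1': False, 'Seq 2': False, 'Seq 3': True, 'Seq 4': False}
```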

In [26]:
x = 2

(x == 2) * 5 + (x == 3) * 10

5

In [27]:
x = 3

(x == 2) * 5 + (x == 3) * 10

10
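
The boolean arithmetic above works because `True` counts as 1 and `False` as 0 in arithmetic. A minimal sketch comparing it with a conditional expression, which many readers find easier to follow, is shown below; the function names are hypothetical.

```python
def score_arithmetic(x):
    # True counts as 1 and False as 0, so only the matching term survives
    return (x == 2) * 5 + (x == 3) * 10

def score_conditional(x):
    # Same mapping written as a chained conditional expression
    return 5 if x == 2 else 10 if x == 3 else 0

print([score_arithmetic(x) for x in range(5)])   # [0, 0, 5, 10, 0]
print([score_conditional(x) for x in range(5)])  # [0, 0, 5, 10, 0]
```

Either form can be used as the expression part of a comprehension.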

### Some pros of comprehensions
1. Concise - their use can easily distill multiple lines of code into a single, concise statement
1. Efficient (time and other resources) - _slightly_ more performant than regular loops
1. Flexible output - list, set, dictionary ...

### Some cons of comprehensions
1. The "imperative" syntax - the order in which you type things to make one is different from the rest of Python
1. Readability - comprehension statements get more unreadable as complexity is added
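
The efficiency claim in the pros list can be checked locally. The sketch below times a loop and an equivalent list comprehension with `timeit`; the function names, data size and repeat count are arbitrary choices for illustration, and the measured times depend on the machine and the Python version.

```python
import timeit

data = list(range(10_000))  # arbitrary size chosen for illustration

def with_loop():
    out = []
    for n in data:
        out.append(n * 2)
    return out

def with_comprehension():
    return [n * 2 for n in data]

# Run each version 200 times and report the total time in seconds
loop_time = timeit.timeit(with_loop, number=200)
comp_time = timeit.timeit(with_comprehension, number=200)
print(f"loop: {loop_time:.4f} s, comprehension: {comp_time:.4f} s")
```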

### RESOURCES

https://www.tutorialspoint.com/python-list-comprehension  
https://python-3-patterns-idioms-test.readthedocs.io/en/latest/Comprehensions.html  
https://realpython.com/list-comprehension-python/  
http://scipy-lectures.org/advanced/advanced_python/index.html   

#### Did we miss the tuple comprehensions?

In [28]:
# Try to make a `tuple` comprehension
# this will not return a tuple

(number * 2 for number in range(10))

<generator object <genexpr> at 0x7f856062c430>

In [29]:
[number * 2 for number in range(10)]

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
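
If a tuple is what you actually want, one option is to pass the generator expression to `tuple()`. A minimal sketch (the name `doubled` is made up):

```python
# tuple() consumes the generator expression and stores every value
doubled = tuple(number * 2 for number in range(10))
print(type(doubled).__name__)  # tuple
print(doubled)                 # (0, 2, 4, 6, 8, 10, 12, 14, 16, 18)
```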

### Python Generators
Courtesy of Marcurs Sherman - partly adapted

#### The parenthesized form above is not a tuple comprehension; it is called a "generator expression".

<img src="http://nvie.com/img/relationships.png" width=600 align='middle'/>


* Iterable is an object, which one can iterate over.
    * It produces an Iterator when passed to the built-in `iter()` function.       
* Iterator is an object, which is used to iterate over an iterable object using `__next__()` method. 
    * Iterators have `__next__()` method, which returns the next item of the object.       

* Note that **every iterator** is also an **iterable**, but **_not every iterable is an iterator_**.    
    * For example, a list is iterable but a list is not an iterator.        
* An iterator can be created from an iterable by using the function `iter()`. 
    * To make this possible, the class of an object needs either a method `__iter__`, which returns an iterator, or a `__getitem__` method with sequential indexes starting with 0.           

https://www.geeksforgeeks.org/python-difference-iterable-iterator/
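
The distinctions in the list above can be checked with the abstract base classes in `collections.abc`. The sketch below is illustrative only: `codons` is example data and `Countdown` is a hypothetical class that defines just `__getitem__`, to show the fallback that `iter()` uses.

```python
from collections.abc import Iterable, Iterator

codons = ["ATG", "TAC", "TGA"]  # example data

# A list is iterable but is not itself an iterator
print(isinstance(codons, Iterable), isinstance(codons, Iterator))          # True False

# iter() returns an iterator, which is also iterable
codon_iter = iter(codons)
print(isinstance(codon_iter, Iterable), isinstance(codon_iter, Iterator))  # True True

class Countdown:
    """Hypothetical class that is iterable only through __getitem__."""
    def __init__(self, start):
        self.start = start

    def __getitem__(self, index):
        if index >= self.start:
            raise IndexError(index)  # signals the end of the sequence
        return self.start - index

# iter() falls back to calling __getitem__ with 0, 1, 2, ...
print(list(Countdown(3)))  # [3, 2, 1]
```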



In [30]:
range(3)

range(0, 3)

In [31]:
dir(range)

['__bool__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'count',
 'index',
 'start',
 'step',
 'stop']

In [32]:
# is range an iterator?
next(range(3))

TypeError: 'range' object is not an iterator

In [33]:
iter(range(3))

<range_iterator at 0x7f85415f1150>

In [36]:
next(iter(range(3)))

0

In [38]:
test_iter = iter(range(3))
test_iter

<range_iterator at 0x7f85605788d0>

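In [42]:
# this cell was run several times (the counter jumped from 38 to 42):
# the first three calls return 0, 1 and 2, the fourth raises StopIteration
next(test_iter)

StopIteration: 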

#### and we can do next again and again ...

In [43]:
# and ...that's it ... 
# when we reach the end of the sequence 
# the iterator raises StopIteration on next
# we have to create it again to start from the beginning

next(test_iter)

StopIteration: 
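
In everyday code `StopIteration` is rarely seen, because a `for` loop handles it for you and `next()` accepts a default value. A minimal sketch, reusing the name `test_iter` from above (the other names are made up):

```python
test_iter = iter(range(3))

# A for loop calls next() for us and stops cleanly on StopIteration
collected = []
for value in test_iter:
    collected.append(value)
print(collected)  # [0, 1, 2]

# next() accepts a default that is returned instead of raising
print(next(test_iter, "exhausted"))  # exhausted

# Explicit handling, for comparison
try:
    next(test_iter)
except StopIteration:
    print("StopIteration caught")
```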

In [44]:
test_gen = (number * 2 for number in range(10))

In [47]:
# third run of this cell (counter 45, 46, 47): the earlier calls returned 0 and 2
next(test_gen)

4

In [48]:
# retrieve all values
tuple(test_gen)

(6, 8, 10, 12, 14, 16, 18)

___
#### Functions RECAP

```python

# DEFINITION - creating a function

def function_name(arg1, arg2, darg=None):
    # instructions to compute result
    return result

# CALL - running a function

function_result = function_name(val1, val2, dval)
```

___


A generator is just a special case of a function. The main difference is how it gives its output. 

How do you make a function give a result?

In [49]:
def number_one():
    number = 1
    return number

In [50]:
number_one()

1

In [66]:
# create a generator for an infinite sequence of numbers
# Note for generators we have yield instead of return

def infinite_sequence():
    number = 0
    while True:
        yield "Seq " + str(number)
        number += 1

In [67]:
numbers_seq_gen = infinite_sequence()

In [68]:
numbers_seq_gen

<generator object infinite_sequence at 0x7f85416c3a50>

In [69]:
next(numbers_seq_gen)

'Seq 0'

#### and we can do next again and again ...

In [71]:
next(numbers_seq_gen)

'Seq 2'

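In [65]:
# stale output: the cell counter (65) shows this ran before infinite_sequence
# was redefined in In [66], when the generator still yielded plain integers
next(numbers_seq_gen)

11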
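
Calling `list()` on an infinite generator would never finish. One way to take a bounded number of items is `itertools.islice`; the sketch below repeats the `infinite_sequence` definition so that it runs on its own, and the name `first_five` is made up.

```python
from itertools import islice

def infinite_sequence():
    number = 0
    while True:
        yield "Seq " + str(number)
        number += 1

# Take the first five labels without looping forever
first_five = list(islice(infinite_sequence(), 5))
print(first_five)  # ['Seq 0', 'Seq 1', 'Seq 2', 'Seq 3', 'Seq 4']
```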

In [72]:
# a generator for a finite sequence of numbers
# this starts to look like range

def finite_sequence(limit):
    number = 0
    while number < limit:
        yield number
        number += 1

In [73]:
numbers_seq_gen = finite_sequence(3)

In [74]:
numbers_seq_gen

<generator object finite_sequence at 0x7f85416cb0b0>

In [75]:
next(numbers_seq_gen)

0

In [76]:
# and we can do next again and again ... and ...that's it
next(numbers_seq_gen)



1

In [77]:
# the last value before the generator is exhausted

next(numbers_seq_gen)

2

In [78]:
next(numbers_seq_gen)

StopIteration: 

In [79]:
# go through the elements of the generator

x = finite_sequence(10)
y = next(x)
while y < 5:
    print(y)
    y = next(x)

0
1
2
3
4


In [80]:
# 5 was already consumed by the last next(x) above, so the loop resumes at 6
for i in x:
    print(i)

6
7
8
9


In [81]:
x = finite_sequence(20)
list(x)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

In [82]:
# generator that pairs up the elements of two sequences (like zip); the pairs can be passed to dict()

def zip_2sequences(seq1, seq2):
    n1 = len(seq1)
    n2 = len(seq2)
    n = min(n1, n2)
    idx = 0
    while idx < n:
        yield (seq1[idx], seq2[idx])
        idx = idx + 1
    

In [83]:
result = zip_2sequences([1,2,3], "ABC")
result

<generator object zip_2sequences at 0x7f85416cb190>

In [84]:
list(result)

[(1, 'A'), (2, 'B'), (3, 'C')]

In [88]:
result = zip_2sequences([1,2,3], "ABC")
result

<generator object zip_2sequences at 0x7f85416cbba0>

In [89]:
dict(result)

{1: 'A', 2: 'B', 3: 'C'}

In [91]:
result = zip_2sequences([1,2,3], "ABC")
result

<generator object zip_2sequences at 0x7f85416cbb30>

In [92]:
set(result)

{(1, 'A'), (2, 'B'), (3, 'C')}

#### <font color = "red">Exercise:</font>   

* Create a generator of n nucleotides that keeps giving us a nucleotide in the order A,C,G,T and then starts again from A until it reaches n nucleotides. 


In [93]:
def get_nucleotide(n):
    nucleotides = ("A","C","G","T")
    i = 0
    yield nucleotides[i]

In [94]:
g_nc = get_nucleotide(10)

In [95]:
g_nc

<generator object get_nucleotide at 0x7f8560449040>

In [96]:
list(g_nc)

['A']

In [99]:
def get_nucleotide(n):
    nucleotides = ("A","C","G","T")
    i = 0
    while i < n:
        yield nucleotides[i]
        i = i + 1

In [102]:
g_nc = get_nucleotide(5)

In [103]:
list(g_nc)

IndexError: tuple index out of range

In [104]:
def get_nucleotide(n):
    nucleotides = ("A","C","G","T")
    i = 0
    while i < n:
        yield nucleotides[i % 4]
        i = i + 1

In [106]:
g_nc = get_nucleotide(15)
list(g_nc)

['A', 'C', 'G', 'T', 'A', 'C', 'G', 'T', 'A', 'C', 'G', 'T', 'A', 'C', 'G']
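
For comparison, the same behavior can be sketched with the standard library instead of index arithmetic. `get_nucleotide_cycle` is a hypothetical name; `cycle` and `islice` come from `itertools`.

```python
from itertools import cycle, islice

def get_nucleotide_cycle(n):
    """Hypothetical variant of get_nucleotide built on itertools."""
    # cycle() repeats A, C, G, T forever; islice() stops after n items
    yield from islice(cycle("ACGT"), n)

print(list(get_nucleotide_cycle(6)))  # ['A', 'C', 'G', 'T', 'A', 'C']
```

The modulo version above makes the wrap-around explicit, which is useful when learning; the `itertools` version is shorter once the idea is familiar.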

---
# Conclusion
Generators and generator expressions should be a standard tool in every bioinformaticist's tool belt. 

1. Generator expressions can compress simple for loops down to a single line
1. List comprehensions tend to be more efficient than standard for loops when the data is sufficiently large
1. The same syntax to make a list comprehension can be used to make dictionaries, sets, and generators
1. Generators are iterators that lazily evaluate the next value and `yield` it back
1. Once a generator (or any iterator) is consumed you need to recreate it to get the values again
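
The last point is easy to trip over, so here is a small sketch of it (variable names are made up): the second pass over a consumed generator silently sees no items.

```python
squares = (n * n for n in range(4))

first_pass = sum(squares)   # consumes every value: 0 + 1 + 4 + 9
second_pass = sum(squares)  # nothing left, so the sum of no items is 0
print(first_pass, second_pass)  # 14 0

# Recreate the generator expression to iterate again
squares = (n * n for n in range(4))
print(list(squares))  # [0, 1, 4, 9]
```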

### Some pros of generators
1. Lazy evaluation: does not produce all the data at one time
1. Maintains state between steps: does not forget where it left off
1. Easily handles data of any size
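
A rough way to see lazy evaluation is to compare the size of a list with the size of an equivalent generator expression. This is only a sketch: `sys.getsizeof` reports the size of the container object itself, the exact numbers vary between Python versions, and `n` is an arbitrary choice.

```python
import sys

n = 100_000  # arbitrary size
as_list = [i * 2 for i in range(n)]
as_gen = (i * 2 for i in range(n))

print(sys.getsizeof(as_list))  # grows with n
print(sys.getsizeof(as_gen))   # small, and independent of n

# Both still produce the same values
print(sum(as_gen) == sum(as_list))  # True
```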

### Some cons of generators
1. Hard to explain to someone that does not use Python
1. For sufficiently small data, the trade-off is not worth it

#### RESOURCES 
https://www.tutorialspoint.com/generators-in-python   
https://www.geeksforgeeks.org/generators-in-python/   
https://book.pythontips.com/en/latest/generators.html   


---
### Function Examples

___
##### <b>`*args`</b> - unknown number of arguments - unpack collection of argument values
##### <b>`**kargs`</b> - unknown number of arguments - unpack mapping of names and values 

In [107]:
x ,y ,z = [20,30,40]
print(x)
print(y)
print(z)

20
30
40


In [None]:
# what if the number of elements does not match?
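
A minimal sketch of what happens in that case: plain unpacking raises a `ValueError`, while a starred name absorbs the surplus. The names `values`, `caught` and `rest` are made up for this example.

```python
values = [20, 30, 50, "A", 40]

caught = False
try:
    x, y, z = values  # three names, five values
except ValueError as err:
    caught = True
    print("ValueError:", err)

# A starred name collects the remaining values into a list
x, *rest = values
print(x, rest)  # 20 [30, 50, 'A', 40]
```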



In [108]:
x ,*y ,z = [20,30,50, "A", 40]
print(x)
print(y)
print(z)

20
[30, 50, 'A']
40


In [109]:
x ,y ,*z = [20,30,50, "A", 40]
print(x)
print(y)
print(z)

20
30
[50, 'A', 40]


In [110]:
# if we use * we can provide an unknown number of positional arguments

def test_arg(*args_list):
    for value in args_list:
        print("value = ", value)

In [114]:
test_arg(1,2,3, {"a":4}, [4,5], (7,8), "ACGT")

value =  1
value =  2
value =  3
value =  {'a': 4}
value =  [4, 5]
value =  (7, 8)
value =  ACGT


In [115]:
# no key=value arguments allowed
test_arg(args_list = 2)

TypeError: test_arg() got an unexpected keyword argument 'args_list'

In [116]:
# if we use * we can provide an unknown number of positional arguments
# if we use ** we can provide an unknown number of key = value arguments

def test_karg(**keys_args_dict):
    for name,value in keys_args_dict.items():
        print("name = ", name)
        print("value = ", value)

In [117]:
test_karg(**{"gene":"EGFR", "expression": 20,"transcript_no": 4})

name =  gene
value =  EGFR
name =  expression
value =  20
name =  transcript_no
value =  4


In [121]:
test_karg(x = {"gene":"EGFR", "expression": 20,"transcript_no": 4})

name =  x
value =  {'gene': 'EGFR', 'expression': 20, 'transcript_no': 4}


In [119]:
test_karg(gene = "EGFR", expression = 20, transcript_no = 4, snp_no = 5, genes_regualted = {"TP53", "EGR"})

name =  gene
value =  EGFR
name =  expression
value =  20
name =  transcript_no
value =  4
name =  snp_no
value =  5
name =  genes_regualted
value =  {'EGR', 'TP53'}


In [122]:
# we can check for the key and perform computations with the value for that key
# or retrieve the value for a specific key

def test_karg(**keys_args_dict):
    for name,value in keys_args_dict.items():
        print("name = ", name)
        print("value = ", value)
        if (name == "expression"):
            print("new value", 2*keys_args_dict[name])
        

In [123]:
test_karg(gene = "EGFR", expression = 20, transcript_no = 4, snp_no = 5, genes_regualted = {"TP53", "EGR"})

name =  gene
value =  EGFR
name =  expression
value =  20
new value 40
name =  transcript_no
value =  4
name =  snp_no
value =  5
name =  genes_regualted
value =  {'EGR', 'TP53'}


In [124]:
test_karg(gene = "EGFR", Expression = 20, transcript_no = 4, snp_no = 5, genes_regualted = {"TP53", "EGR"})

name =  gene
value =  EGFR
name =  Expression
value =  20
name =  transcript_no
value =  4
name =  snp_no
value =  5
name =  genes_regualted
value =  {'EGR', 'TP53'}


In [125]:
# if we provide a dictionary then all our key value pairs have to be in the dictionary we create
def test_karg(keys_args_dict):
    for name,value in keys_args_dict.items():
        print("name = ", name)
        print("value = ", value)

In [126]:
test_karg({"gene":"EGFR", "expression": 20,"transcript_no": 4})

name =  gene
value =  EGFR
name =  expression
value =  20
name =  transcript_no
value =  4


In [127]:
# we cannot provide the dictionary items as independent arguments
test_karg(gene = "EGFR", Expression = 20, transcript_no = 4, snp_no = 5, genes_regualted = {"TP53", "EGR"})

TypeError: test_karg() got an unexpected keyword argument 'gene'
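
The two mechanisms can also be combined in one function. The sketch below uses a hypothetical `describe_gene` function and made-up data to show a regular parameter, `*` and `**` together, both in the definition (collecting) and in the call (unpacking).

```python
def describe_gene(name, *samples, **annotations):
    """Hypothetical function mixing the three kinds of parameters."""
    summary = {"name": name, "n_samples": len(samples)}
    summary.update(annotations)  # keyword arguments arrive as a dict
    return summary

expression_values = [20, 35, 18]
extra = {"chromosome": "7", "transcript_no": 4}

# * unpacks the list into positional arguments,
# ** unpacks the dict into keyword arguments
print(describe_gene("EGFR", *expression_values, **extra))
# {'name': 'EGFR', 'n_samples': 3, 'chromosome': '7', 'transcript_no': 4}
```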

____
##### <b>`lambda` function</b> - anonymous function - it has no name
Should be used only with simple expressions

https://docs.python.org/3/reference/expressions.html#lambda<br>
https://www.geeksforgeeks.org/python-lambda-anonymous-functions-filter-map-reduce/<br>
https://realpython.com/python-lambda/<br>

`lambda arguments : expression`

A lambda function can take <b>any number of arguments</b>, but must always have <b>only one expression</b>.

In [128]:
help(compute_expression)

NameError: name 'compute_expression' is not defined

In [129]:
compute_expression = lambda x, y: x + y + x*y

In [130]:
help(compute_expression)

Help on function <lambda> in module __main__:

<lambda> lambda x, y



In [131]:
compute_expression(2, 3)

11
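
Lambdas are most often used inline, as a short function passed to another function rather than assigned to a name. A small sketch using the `key` parameter of `sorted()`; the variable `by_gc` is made up and the `sequences` list is repeated so the block runs on its own.

```python
sequences = ["ACTTGCCC", "AAAGTC", "CCTAC", "AAACCTA"]

# Sort by GC fraction, lowest first; the lambda is used once as the key
by_gc = sorted(sequences, key=lambda s: (s.count("G") + s.count("C")) / len(s))
print(by_gc)  # ['AAACCTA', 'AAAGTC', 'CCTAC', 'ACTTGCCC']
```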

____
### Useful functions

#### Built-in functions
https://docs.python.org/3/library/functions.html

##### <b>`zip(*iterables)`</b> - make an iterator that aggregates respective elements from each of the iterables.   
https://docs.python.org/3/library/functions.html#zip

##### <b>`map(function, iterable, ...)`</b> - apply function to every element of an iterable - return an iterator over the results
https://docs.python.org/3/library/functions.html#map

##### <b>`filter(function, iterable)`</b> - apply function (bool result) to every element of an iterable - return the elements from the input iterable for which the function returns True
https://docs.python.org/3/library/functions.html#filter

##### <b>`functools.reduce(function, iterable[, initializer])`</b> - apply a two-argument function cumulatively to the elements of an iterable to reduce it to a single value
https://docs.python.org/3/library/functools.html#functools.reduce

____



<b>`zip(*iterables)`</b> - make an iterator that aggregates respective elements from each of the iterables.  


In [132]:
combined_res = zip([10,20,30],["ACT","GGT","AACT"],[True,False,True])
combined_res

<zip at 0x7f85604c5140>

In [133]:
for element in combined_res:
    print(element)

(10, 'ACT', True)
(20, 'GGT', False)
(30, 'AACT', True)


In [136]:
combined_res = zip([10,20,30],["ACT","GGT","AACT"],[True,False,True])
combined_res
list(combined_res)

[(10, 'ACT', True), (20, 'GGT', False), (30, 'AACT', True)]

In [137]:
combined_res = zip([10,20,30],["ACT","GGT","AACT"],[True,False,True, True])
list(combined_res)

[(10, 'ACT', True), (20, 'GGT', False), (30, 'AACT', True)]

In [138]:
combined_res = zip([10,20,30,500],["ACT","GGT","AACT"],[True,False,True])
list(combined_res)

[(10, 'ACT', True), (20, 'GGT', False), (30, 'AACT', True)]

In [139]:
# unzip list
x, y, z = zip(*[(3,4,7), (12,15,19), (30,60,90)])
print(x, y, z)

(3, 12, 30) (4, 15, 60) (7, 19, 90)


In [140]:
x, y, z = zip(*[(3,4,7,8), (12,15,19), (30,60,90)])
print(x, y, z)

(3, 12, 30) (4, 15, 60) (7, 19, 90)


In [141]:
combined_res = zip(["ACT","GGT","AACT"], [10,20,30])
dict(combined_res)

{'ACT': 10, 'GGT': 20, 'AACT': 30}

In [142]:
dict(zip(("ACT","GGT","AACT"), [10,20,30]))

{'ACT': 10, 'GGT': 20, 'AACT': 30}

In [143]:
dict(zip("ACTTA", [10,20,30]))

{'A': 10, 'C': 20, 'T': 30}
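
As the examples show, `zip` stops at the shortest input and silently drops the rest. When the extra items matter, one option is `itertools.zip_longest`, sketched below with made-up variable names. In Python 3.10 and later, `zip(..., strict=True)` is another option that raises an error on a length mismatch.

```python
from itertools import zip_longest

ids = [10, 20, 30, 500]
codons = ["ACT", "GGT", "AACT"]

short = list(zip(ids, codons))                          # 500 is dropped
full = list(zip_longest(ids, codons, fillvalue=None))   # 500 is kept
print(short)  # [(10, 'ACT'), (20, 'GGT'), (30, 'AACT')]
print(full)   # [(10, 'ACT'), (20, 'GGT'), (30, 'AACT'), (500, None)]
```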

_____

<b>`map(function, iterable, ...)`</b> - apply function to every element of an iterable - return an iterator over the results

In [144]:
map(abs,[-2,0,-5,6,-7])

<map at 0x7f85418ea190>

In [145]:
list(map(abs,[-2,0,-5,6,-7]))

[2, 0, 5, 6, 7]

In [146]:
[1,2,3] + [3,4,5]

[1, 2, 3, 3, 4, 5]

In [154]:
def compute_addition(x,y):
    return x + y


In [155]:
list(map(compute_addition, [1,2,3,4], [50,60,70]))

[51, 62, 73]

In [156]:
list(map(compute_addition, [1,2,3,4]))

TypeError: compute_addition() missing 1 required positional argument: 'y'

In [157]:
def compute_addition(x,y = 10):
    return x + y

In [158]:
list(map(compute_addition, [1,2,3,4]))

[11, 12, 13, 14]

In [159]:
list(map(compute_addition, [1,2,3,4], [50,60,70]))

[51, 62, 73]

https://www.geeksforgeeks.org/python-map-function/

In [160]:
numbers1 = [1, 2, 3] 
numbers2 = [4, 5, 6] 
  
result = map(lambda x, y: x + y, numbers1, numbers2) 
list(result)

[5, 7, 9]

In [161]:
list(map(lambda x, y: x + y, [1,2,3,4], [50,60,70]) )

[51, 62, 73]

____
Use a lambda function and the map function to compute a result from the following 3 lists.<br>
With x, y and z taken from the first, second and third list respectively: if z is divisible by 3 return 3*x, otherwise return 2*y.

In [162]:
numbers1 = [1, 2, 3, 4, 5, 6] 
numbers2 = [7, 8, 9, 10, 11, 12] 
numbers3 = [13, 14, 15, 16, 17, 18] 

result = map(lambda x, y, z: 3*x if z % 3 == 0 else 2*y,
             numbers1, numbers2, numbers3) 
list(result)



[14, 16, 9, 20, 22, 18]

In [163]:
def compute_res(x,y,z):
    res = None
    if z%3 == 0:
        res = 3*x
    else:
        res = 2*y
    return res


result = map(compute_res, numbers1, numbers2, numbers3) 
list(result)

[14, 16, 9, 20, 22, 18]

____
<b>`filter(function, iterable)`</b> - apply function (bool result) to every element of an iterable - return the elements from the input iterable for which the function returns True

In [164]:
test_list = [3,4,5,6,7]
result = filter(lambda x: x > 4, test_list)
result

<filter at 0x7f85418d8d30>

In [165]:
list(result)

[5, 6, 7]

In [169]:
# Filter to remove empty structures or 0
test_list = [3, 0, 5, None, 7, "", "AACG", [], {}, {1:"one"}]
result = filter(bool, test_list)
list(result)

[3, 5, 7, 'AACG', {1: 'one'}]

In [166]:
bool(None)

False

In [167]:
bool("")

False

In [168]:
bool(100)

True
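
`filter` with a named predicate does the same job as a comprehension with an `if` clause. The sketch below uses a hypothetical `is_valid_dna` predicate and made-up data containing one malformed and one empty entry.

```python
sequences = ["ACTTGCCC", "AAxGTC", "", "CCTAC"]  # made-up data with bad entries

def is_valid_dna(seq):
    """Hypothetical predicate: non-empty and made only of A, C, G, T."""
    return bool(seq) and set(seq) <= set("ACGT")

with_filter = list(filter(is_valid_dna, sequences))
with_comprehension = [s for s in sequences if is_valid_dna(s)]
print(with_filter)         # ['ACTTGCCC', 'CCTAC']
print(with_comprehension)  # ['ACTTGCCC', 'CCTAC']
```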

____
<b>`functools.reduce(function, iterable[, initializer])`</b> - apply a two-argument function cumulatively to the elements of an iterable to reduce it to a single value



In [None]:
help(reduce)

In [171]:
from functools import reduce

In [172]:
help(reduce)

Help on built-in function reduce in module _functools:

reduce(...)
    reduce(function, sequence[, initial]) -> value
    
    Apply a function of two arguments cumulatively to the items of a sequence,
    from left to right, so as to reduce the sequence to a single value.
    For example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates
    ((((1+2)+3)+4)+5).  If initial is present, it is placed before the items
    of the sequence in the calculation, and serves as a default when the
    sequence is empty.



In [None]:
reduce(lambda x,y: x+y, [47,11,42,13])

<img src = https://www.python-course.eu/images/reduce_diagram.png width=300/>

https://www.python-course.eu/lambda.php

https://www.geeksforgeeks.org/reduce-in-python/
https://www.tutorialsteacher.com/python/python-reduce-function

In [173]:
test_list = [1,2,3,4,5,6]
reduce(lambda x,y: x+y, test_list)

21

In [175]:
# compute factorial of n
n=4
reduce(lambda x, y: x*y, range(1, n+1))

24

In [176]:
list(range(n))

[0, 1, 2, 3]

In [177]:
list(range(1, n+1))

[1, 2, 3, 4]

In [178]:
reduce(lambda x,y: x+y, ["AACT", "AA", "C", "TTG"])

'AACTAACTTG'
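
The optional initializer matters for empty input: without it, `reduce` raises a `TypeError` on an empty iterable. A minimal sketch with made-up names, followed by the built-ins that cover the two examples above more directly:

```python
from functools import reduce

lengths = [4, 2, 1, 3]

total = reduce(lambda x, y: x + y, lengths, 0)   # 0 is the initializer
empty_total = reduce(lambda x, y: x + y, [], 0)  # initializer is returned for empty input
print(total, empty_total)  # 10 0

try:
    reduce(lambda x, y: x + y, [])  # empty input and no initializer
except TypeError as err:
    print("TypeError:", err)

# Built-ins for the common cases
print(sum(lengths))                         # 10
print("".join(["AACT", "AA", "C", "TTG"]))  # AACTAACTTG
```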

In [181]:
# Create a dictionary from the following sequence of sequences:

keys = ["AACGT", "GCTTA", "GGGGTTA"]
# values should be the length of the sequence


In [185]:
res = {}
for s in keys:
    res[s] = len(s)
res

{'AACGT': 5, 'GCTTA': 5, 'GGGGTTA': 7}

In [186]:
map(len, keys)

<map at 0x7f8541a8e4f0>

In [184]:
list(map(len,keys))

[5, 5, 7]

In [187]:
dict(zip(keys, map(len, keys)))

{'AACGT': 5, 'GCTTA': 5, 'GGGGTTA': 7}
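
Tying this back to the first half of the lecture, the same dictionary can also be sketched as a dictionary comprehension (the name `lengths_by_seq` is made up):

```python
keys = ["AACGT", "GCTTA", "GGGGTTA"]

# key_expression: value_expression for each element
lengths_by_seq = {s: len(s) for s in keys}
print(lengths_by_seq)  # {'AACGT': 5, 'GCTTA': 5, 'GGGGTTA': 7}
```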