## Comprehensions, Generators
### BIOINF 575 - Fall 2022

### For loop RECAP

### for: the repetitive control structure with a known number of steps

To loop through a sequence of elements is to iterate

```python
for var in sequence:
    statements
```

___ 

### Python Comprehension Statements
Courtesy of Marcus Sherman - partly adapted

First, the **purpose** of comprehensions:
> "\[...\] comprehensions provide a more concise way to create \[iterables\] in situations where `map()` and `filter()` and/or nested loops would currently be used" - Barry Warsaw, [PEP 202](https://www.python.org/dev/peps/pep-0202/)

Comprehensions are what we call "_syntactic sugar_".  
This means that they do not let you do anything you could not already do.  
With them, however, some operations become easier to write.

<img src="venn_diagram2.png" width=400 />

---
### Comprehension Syntax

#### Legend

<img src="legendary.png" width=250 />

#### Examples
<img src="comprehensions.png" width=500 />

#### Alternate view of the comprehension syntax

<center><img src="http://python-3-patterns-idioms-test.readthedocs.io/en/latest/_images/listComprehensions.gif" width = "500"/></center>

---
#### The Comprehension Categories
1. `list` comprehensions - create a list
2. `dict`ionary comprehensions - create dictionaries
3. `set` comprehensions - create sets
4. `tuple`? comprehensions

In [1]:
sequences = ["ACTTGCCC", "AAAGTC", "CCTAC", "AAACCTA"]

In [2]:
sequences

['ACTTGCCC', 'AAAGTC', 'CCTAC', 'AAACCTA']

#### Basic list comprehension
* Compute simple expression for each element

In [3]:
[len(seq) for seq in sequences]


[8, 6, 5, 7]

In [4]:
res_list = []

for s in sequences:
    res_list.append(len(s))
    
res_list

[8, 6, 5, 7]

#### List comprehension - use [ ]
* Compute complex expression for each element



In [5]:
# compute GC count

[s for s in sequences]

['ACTTGCCC', 'AAAGTC', 'CCTAC', 'AAACCTA']

In [6]:
[s.count("C") for s in sequences]

[4, 1, 3, 2]

In [7]:
[s.count("C") + s.count("G") for s in sequences]

[5, 2, 3, 2]

In [9]:
[100 * (s.count("C") + s.count("G"))/len(s) for s in sequences]

[62.5, 33.333333333333336, 60.0, 28.571428571428573]

#### List comprehension with predicate
* Compute complex expression for specific elements
    * add a predicate - an if expression 
    * if expression - similar to the `if` statement, but with no colon and no statements after the header line
        * e.g.: if "#" not in item


In [11]:
# compute GC content only for sequences that contain "AC"

sequences

['ACTTGCCC', 'AAAGTC', 'CCTAC', 'AAACCTA']

In [10]:
[100 * (s.count("C") + s.count("G"))/len(s) for s in sequences if "AC" in s]

[62.5, 60.0, 28.571428571428573]

#### If the comprehension becomes too complex - use a regular for loop

In [13]:
# compute GC content only for sequences that contain "AC"

sequences

['ACTTGCCC', 'AAAGTC', 'CCTAC', 'AAACCTA']

In [14]:
res_list = []
for s in sequences:
    if "AC" in s:
        gc_content = 100 * (s.count("C") + s.count("G"))/len(s)
        res_list.append(gc_content)

res_list

[62.5, 60.0, 28.571428571428573]

In [16]:
# square only the integer values

test_list = [1,4,3,"A",[6,7],("Hello", "there"), 10]
test_list

[1, 4, 3, 'A', [6, 7], ('Hello', 'there'), 10]

In [17]:
[n for n in test_list if type(n) == int]

[1, 4, 3, 10]

In [18]:
[n**2 for n in test_list if type(n) == int]

[1, 16, 9, 100]

In [19]:
{n**2 for n in test_list if type(n) == int}

{1, 9, 16, 100}

#### Set comprehensions - use { } 
* Use when you want unique elements and the order does not matter

In [20]:
# get the first codon in each sequence  

sequences

['ACTTGCCC', 'AAAGTC', 'CCTAC', 'AAACCTA']

In [21]:
[s for s in sequences]

['ACTTGCCC', 'AAAGTC', 'CCTAC', 'AAACCTA']

In [22]:
[s[:3] for s in sequences]

['ACT', 'AAA', 'CCT', 'AAA']

In [23]:
{s[:3] for s in sequences}

{'AAA', 'ACT', 'CCT'}

#### Dictionary comprehensions - use { }
* must start with something like: key_expression:value_expression
* Use when you want to look up values by key (key:value pairs); dictionaries keep insertion order since Python 3.7, but items are accessed by key, not by position

In [None]:
# sequence as key and length as value

In [24]:
[s for s in sequences]

['ACTTGCCC', 'AAAGTC', 'CCTAC', 'AAACCTA']

In [26]:
# create a key:value element
# make sure your keys are unique

{s:len(s) for s in sequences}

{'ACTTGCCC': 8, 'AAAGTC': 6, 'CCTAC': 5, 'AAACCTA': 7}

In [27]:

# keys must be unique: "AAA" occurs twice, so the later value (7) replaces the earlier one (6)
{s[:3]:len(s) for s in sequences}

{'ACT': 8, 'AAA': 7, 'CCT': 5}

In [28]:
# sequence as key, C count as value (a first step toward the GC count)

{s:s.count("C") for s in sequences}



{'ACTTGCCC': 4, 'AAAGTC': 1, 'CCTAC': 3, 'AAACCTA': 2}

In [29]:
{s:(100 * (s.count("G") + s.count("C"))/len(s)) for s in sequences}

{'ACTTGCCC': 62.5,
 'AAAGTC': 33.333333333333336,
 'CCTAC': 60.0,
 'AAACCTA': 28.571428571428573}

#### <font color = "red">Exercise:</font>   

* Create a list comprehension that stores whether each sequence can code for the amino acid Tyrosine (the TAT and TAC codons code for this amino acid).
* Change this into a dictionary comprehension where the key is the "Seq pos", where pos is the position of the sequence in the `sequences` list.


In [30]:
sequences

['ACTTGCCC', 'AAAGTC', 'CCTAC', 'AAACCTA']

In [31]:
s = 'ACTTGCCC'

In [32]:
"TAC" in s

False

In [34]:
list(range(4))

[0, 1, 2, 3]

In [38]:
for i in range(len(sequences)):
    print(i + 1)
    print(sequences[i])

1
ACTTGCCC
2
AAAGTC
3
CCTAC
4
AAACCTA


In [39]:
# Create a list comprehension where we store if the corresponding sequence can code 
# for the amino acid Tyrosine (TAT and TAC codons code for this amino acid).

[s for s in sequences]

['ACTTGCCC', 'AAAGTC', 'CCTAC', 'AAACCTA']

In [41]:
[("TAC" in s) for s in sequences]

[False, False, True, False]

In [42]:
[(("TAC" in s) or ("TAT" in s)) for s in sequences]

[False, False, True, False]

In [43]:
# Change this into a dictionary comprehension where the key is the "Seq pos", 
# where pos is the position of the sequence on the sequences list.

{s:(("TAC" in s) or ("TAT" in s)) for s in sequences}

{'ACTTGCCC': False, 'AAAGTC': False, 'CCTAC': True, 'AAACCTA': False}

In [45]:
[i for i in range(4)]

[0, 1, 2, 3]

In [47]:
[i for i in range(len(sequences))]

[0, 1, 2, 3]

In [48]:
{i:sequences[i] for i in range(len(sequences))}

{0: 'ACTTGCCC', 1: 'AAAGTC', 2: 'CCTAC', 3: 'AAACCTA'}

In [49]:
{i:("TAT" in sequences[i]) for i in range(len(sequences))}

{0: False, 1: False, 2: False, 3: False}

In [44]:
{i:(("TAC" in sequences[i]) or ("TAT" in sequences[i])) for i in range(len(sequences))}

{0: False, 1: False, 2: True, 3: False}

In [50]:
{i + 1:(("TAC" in sequences[i]) or ("TAT" in sequences[i])) for i in range(len(sequences))}

{1: False, 2: False, 3: True, 4: False}

In [51]:
{("Seq" + str(i + 1)):(("TAC" in sequences[i]) or ("TAT" in sequences[i])) for i in range(len(sequences))}

{'Seq1': False, 'Seq2': False, 'Seq3': True, 'Seq4': False}

In [52]:
res_dict = {}

for i in range(len(sequences)):
    key = "Seq" + str(i + 1)
    value = ("TAC" in sequences[i]) or ("TAT" in sequences[i])
    res_dict[key] = value
    
res_dict

{'Seq1': False, 'Seq2': False, 'Seq3': True, 'Seq4': False}

In [53]:
list(enumerate(sequences))

[(0, 'ACTTGCCC'), (1, 'AAAGTC'), (2, 'CCTAC'), (3, 'AAACCTA')]

In [54]:
{i:s for i, s in enumerate(sequences)}

{0: 'ACTTGCCC', 1: 'AAAGTC', 2: 'CCTAC', 3: 'AAACCTA'}

In [55]:
{"Seq" + str(i+1):s for i, s in enumerate(sequences)}

{'Seq1': 'ACTTGCCC', 'Seq2': 'AAAGTC', 'Seq3': 'CCTAC', 'Seq4': 'AAACCTA'}

In [57]:
{("Seq" + str(i+1)):(("TAC" in s) or ("TAT" in s)) for i, s in enumerate(sequences)}

{'Seq1': False, 'Seq2': False, 'Seq3': True, 'Seq4': False}

In [58]:
"Seq" + 1

TypeError: can only concatenate str (not "int") to str

In [59]:
"Seq" + "1"

'Seq1'

In [60]:
"Seq" + str(1)

'Seq1'

### Some pros of comprehensions
1. Concise - their use can easily distill multiple lines of code into a single, concise statement
1. Efficient (time and other resources) - _slightly_ more performant than regular loops
1. Flexible output - list, set, dictionary ...
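
The performance claim above can be checked with a rough timing sketch like the one below. The function names and the input size are chosen only for illustration, and the measured times depend on the machine and the Python version, so no particular numbers should be expected.

```python
import timeit

numbers = list(range(10_000))

def squares_with_loop(values):
    # regular for loop with append
    result = []
    for v in values:
        result.append(v * v)
    return result

def squares_with_comprehension(values):
    # same computation as a list comprehension
    return [v * v for v in values]

# both approaches give the same result
assert squares_with_loop(numbers) == squares_with_comprehension(numbers)

# total time in seconds for 200 calls of each function
loop_time = timeit.timeit(lambda: squares_with_loop(numbers), number=200)
comp_time = timeit.timeit(lambda: squares_with_comprehension(numbers), number=200)
print(f"for loop:      {loop_time:.3f} s")
print(f"comprehension: {comp_time:.3f} s")
```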

### Some cons of comprehensions
1. Unusual syntax - the order in which you write the parts (the expression first, then the `for` clause) is different from the rest of Python
1. Readability - comprehension statements get more unreadable as complexity is added
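
To make the readability point concrete, here is a sketch (an illustrative example, not part of the original notebook) that collects the complete codons of every sequence containing "AC". It is written once as a single comprehension with two `for` clauses and a predicate, and once as nested loops.

```python
sequences = ["ACTTGCCC", "AAAGTC", "CCTAC", "AAACCTA"]

# one-line version: a predicate plus a second for clause
codons = [s[i:i + 3] for s in sequences if "AC" in s for i in range(0, len(s) - 2, 3)]

# the same logic as nested loops: longer, but each step is visible
codons_loop = []
for s in sequences:
    if "AC" in s:
        # step through the sequence three letters at a time (complete codons only)
        for i in range(0, len(s) - 2, 3):
            codons_loop.append(s[i:i + 3])

print(codons)       # ['ACT', 'TGC', 'CCT', 'AAA', 'CCT']
print(codons_loop)  # ['ACT', 'TGC', 'CCT', 'AAA', 'CCT']
```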

### RESOURCES

https://www.tutorialspoint.com/python-list-comprehension  
https://python-3-patterns-idioms-test.readthedocs.io/en/latest/Comprehensions.html  
https://realpython.com/list-comprehension-python/  
http://scipy-lectures.org/advanced/advanced_python/index.html   

#### Did we miss the tuple comprehensions?

In [61]:
# Try to make a `tuple` comprehension
# this will not return a tuple

(number * 2 for number in range(10))

<generator object <genexpr> at 0x7fd588209f20>

In [62]:

[number * 2 for number in range(10)]

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

### Python Generators
Courtesy of Marcus Sherman - partly adapted

#### The parenthesized form shown above, `(expression for item in iterable)`, is called a "generator expression"; list, set and dictionary comprehensions share its syntax.

<img src="http://nvie.com/img/relationships.png" width=600 align='middle'/>


* An iterable is an object that one can iterate over.
    * It produces an iterator when passed to the `iter()` function.
* An iterator is an object used to step through an iterable, one item at a time.
    * Iterators have a `__next__()` method, which returns the next item; the `next()` function calls it.

* Note that **every iterator** is also an **iterable**, but **_not every iterable is an iterator_**.
    * For example, a list is iterable but a list is not an iterator.
* An iterator can be created from an iterable by using the function `iter()`.
    * To make this possible, the class of an object needs either an `__iter__` method, which returns an iterator, or a `__getitem__` method with sequential indexes starting with 0.
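
The `__getitem__` route described above is the less familiar one, so here is a minimal sketch. The `CodonView` class is hypothetical and exists only for this illustration: it defines no `__iter__`, yet `iter()` still works because `__getitem__` accepts indexes 0, 1, 2, ... and raises `IndexError` at the end.

```python
class CodonView:
    """Hypothetical iterable defined only through __getitem__ (no __iter__)."""

    def __init__(self, sequence):
        self.sequence = sequence

    def __getitem__(self, index):
        start = 3 * index
        codon = self.sequence[start:start + 3]
        if len(codon) < 3:
            # IndexError tells the iterator that there are no more items
            raise IndexError(index)
        return codon

codons = CodonView("ACTTGCCC")
codon_iter = iter(codons)   # works because of __getitem__ with indexes 0, 1, 2, ...

print(next(codon_iter))     # ACT
print(next(codon_iter))     # TGC
print(list(codons))         # ['ACT', 'TGC'] (a new iterator starts from the beginning)

# every iterator is also an iterable: iter() on an iterator returns the same object
print(iter(codon_iter) is codon_iter)  # True
```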

https://www.geeksforgeeks.org/python-difference-iterable-iterator/



In [63]:
range(3)

range(0, 3)

In [64]:
dir(range)

['__bool__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'count',
 'index',
 'start',
 'step',
 'stop']

In [65]:
# is range an iterator?
next(range(3))

TypeError: 'range' object is not an iterator

In [66]:
iter(range(3))

<range_iterator at 0x7fd5c9bcf690>

In [68]:
next(iter(range(3)))

0

In [69]:
test_iter = iter(range(3))

In [72]:
# this cell was run several times in a row; the output shown is the third value
next(test_iter)

2

#### and we can do next again and again ...

In [73]:
# and ... that's it ...
# when we reach the end of the sequence
# the iterator raises StopIteration on next
# we have to create it again to start from the beginning

next(test_iter)

StopIteration: 

In [74]:
test_gen = (number * 2 for number in range(10))

In [77]:
# this cell was run several times in a row; the output shown is the third value
next(test_gen)

4

In [78]:
# retrieve all values
tuple(test_gen)

(6, 8, 10, 12, 14, 16, 18)

In [80]:
tuple((number * 2 for number in range(10)))

(0, 2, 4, 6, 8, 10, 12, 14, 16, 18)

In [81]:
list((number * 2 for number in range(10)))

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

In [83]:
list((1,2,3))

[1, 2, 3]

In [84]:
[(1,2,3)]

[(1, 2, 3)]

In [86]:
[(number * 2 for number in range(10))]

[<generator object <genexpr> at 0x7fd5881c20b0>]

___
#### Functions RECAP

```python

# DEFINITION - creating a function

def function_name(arg1, arg2, darg=None):
    # instructions to compute result
    return result

# CALL - running a function

function_result = function_name(val1, val2, dval)
```

___


In [89]:
# define the function
def do_add(x, y):
    return x + y


In [90]:
# call the function
do_add(3,10)

13

A generator is just a special case of a function. The main difference is how it gives its output. 

How do you make a function give a result?

In [91]:
def number_one():
    number = 1
    return number

In [92]:
number_one()

1

In [93]:
# create a generator for an infinite sequence of numbers
# Note for generators we have yield instead of return

def infinite_sequence():
    number = 0
    while True:
        yield number
        number += 1 # number = number + 1

In [94]:
numbers_seq_gen = infinite_sequence()

In [95]:
numbers_seq_gen

<generator object infinite_sequence at 0x7fd5881db820>

In [98]:
# this cell was run several times in a row; the output shown is the third value
next(numbers_seq_gen)

2

#### and we can do next again and again ...

In [102]:
next(numbers_seq_gen)

6

In [103]:
# a generator for a finite sequence of numbers
# this starts to look like range

def finite_sequence(limit):
    number = 0
    while number < limit:
        yield number
        number += 1

In [104]:
numbers_seq_gen = finite_sequence(3)

In [105]:
numbers_seq_gen

<generator object finite_sequence at 0x7fd5a8342350>

In [106]:
next(numbers_seq_gen)

0

In [107]:
# and we can do next again and again ... and ...that's it

next(numbers_seq_gen)


1

In [108]:
# we can put all the results in a list

list(numbers_seq_gen)

[2]

In [109]:
# go through the elements of the generator

x = finite_sequence(10)
y = next(x)
while y < 5:
    print(y)
    y = next(x)

0
1
2
3
4


In [110]:
# 5 is missing below: it was already consumed by the last next(x) in the while loop above
for i in x:
    print(i)

6
7
8
9


In [111]:
# no output: the generator is exhausted
for i in x:
    print(i)

In [112]:
x = finite_sequence(10)
for i in x:
    print(i)


0
1
2
3
4
5
6
7
8
9


In [113]:
# generator that pairs up the elements of two sequences
# (e.g. a list of keys and a list of values that can then be turned into a dictionary)

def zip_2sequences(seq1, seq2):
    n1 = len(seq1)
    n2 = len(seq2)
    n = min(n1, n2)
    index = 0
    while index < n:
        yield (seq1[index], seq2[index])
        index = index + 1


In [115]:
s1 = [1,2,3,4,5]
s2 = "ABCDE"

gen_ex = zip_2sequences(s1,s2)
list(gen_ex)

[(1, 'A'), (2, 'B'), (3, 'C'), (4, 'D'), (5, 'E')]

In [117]:
gen_ex = zip_2sequences(s1,s2)


In [120]:
# this cell was run several times in a row; the output shown is the third pair
next(gen_ex)

(3, 'C')

In [122]:
dict([(1, 'A'), (2, 'B')])

{1: 'A', 2: 'B'}

In [121]:
gen_ex = zip_2sequences(s1,s2)
dict(gen_ex)



{1: 'A', 2: 'B', 3: 'C', 4: 'D', 5: 'E'}

In [123]:
s1 = [1,2,3,4,5,6]
s2 = "ABCDE"

gen_ex = zip_2sequences(s1,s2)
list(gen_ex)

[(1, 'A'), (2, 'B'), (3, 'C'), (4, 'D'), (5, 'E')]

In [124]:
s1 = [1,2,3,4,5,6]
s2 = "ABCDEFGH"

gen_ex = zip_2sequences(s1,s2)
list(gen_ex)

[(1, 'A'), (2, 'B'), (3, 'C'), (4, 'D'), (5, 'E'), (6, 'F')]

#### <font color = "red">Exercise:</font>   

* Create a generator of n nucleotides that keeps giving us a nucleotide in the order A,C,G,T and then starts again from A until it reaches n nucleotides. 


In [125]:
def gen_nucleotide(n):
    nucleotide = "A"
    yield nucleotide
    

In [126]:
list(gen_nucleotide(10))

['A']

In [127]:
def gen_nucleotide(n):
    nucleotide = "A"
    index = 0
    while index < n:
        yield nucleotide
        index = index + 1

In [128]:
list(gen_nucleotide(10))

['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A']

In [133]:
def gen_nucleotide(n):
    nucleotides = ("A", "C", "G", "T")
    index = 0
    while index < n:
        yield nucleotides[0]
        index = index + 1

In [134]:
list(gen_nucleotide(10))

['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A']

In [142]:
def gen_nucleotide(n):
    nucleotides = ("A", "C", "G", "T")
    index = 0
    while index < n:
        yield nucleotides[index % 4]
        index = index + 1

In [143]:
list(gen_nucleotide(10))

['A', 'C', 'G', 'T', 'A', 'C', 'G', 'T', 'A', 'C']
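
For comparison, the same exercise can be sketched with the standard library's `itertools` module. This is an alternative to the index-based solution above, not a replacement for it, and the name `gen_nucleotide_cycle` is made up for this example.

```python
from itertools import cycle, islice

def gen_nucleotide_cycle(n):
    # cycle() repeats A, C, G, T forever; islice() stops after n items
    yield from islice(cycle("ACGT"), n)

print(list(gen_nucleotide_cycle(10)))
# ['A', 'C', 'G', 'T', 'A', 'C', 'G', 'T', 'A', 'C']
```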

---
# Conclusion
Generators and generator expressions should be a standard tool in every bioinformaticist's tool belt. 

1. Generator expressions can compress simple for loops down to a single line
1. List comprehensions tend to be more efficient than standard for loops when the data is sufficiently large
1. The same syntax to make a list comprehension can be used to make dictionaries, sets, and generators
1. Generators are iterators that lazily evaluate the next value and `yield` it back
1. Once a generator (or any iterator) is consumed you need to recreate it to get the values again

### Some pros of generators
1. Lazy evaluation: does not produce all the data at one time
1. Maintains state between steps: does not forget where it left off
1. Easily handles data of any size
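
A rough sketch of the first two points: the list below stores all its values, while the generator expression only stores how to produce the next one. The exact sizes reported by `sys.getsizeof` vary between Python versions, so the comments give only orders of magnitude.

```python
import sys
from itertools import islice

n = 1_000_000
squares_list = [i * i for i in range(n)]   # all values are stored in memory
squares_gen = (i * i for i in range(n))    # values are produced one at a time

# size of the container objects themselves (not of the integers they refer to)
print(sys.getsizeof(squares_list))  # on the order of megabytes
print(sys.getsizeof(squares_gen))   # on the order of a few hundred bytes

# state is kept between steps: the second islice continues where the first stopped
first_three = list(islice(squares_gen, 3))
next_three = list(islice(squares_gen, 3))
print(first_three)  # [0, 1, 4]
print(next_three)   # [9, 16, 25]
```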

### Some cons of generators
1. Hard to explain to someone who does not use Python
1. Not worth the trade-off when the data is sufficiently small

#### RESOURCES 
https://www.tutorialspoint.com/generators-in-python   
https://www.geeksforgeeks.org/generators-in-python/   
https://book.pythontips.com/en/latest/generators.html   


---
### Function Examples

___
##### <b>`*args`</b> - unknown no. of arguments - unpack collection of argument values
##### <b>`**kargs`</b> - unknown no. of arguments - unpack mapping of names and values 

In [144]:
x ,y ,z = [20,30,40]
print(x)
print(y)
print(z)

20
30
40


In [145]:
# what if the number of elements does not match?

x ,y ,z = [20,30,50, "A", 40]

ValueError: too many values to unpack (expected 3)

In [146]:
x ,*y ,z = [20,30,50, "A", 40]
print(x)
print(y)
print(z)

20
[30, 50, 'A']
40


In [147]:
x ,y ,*z = [20,30,50, "A", 40]
print(x)
print(y)
print(z)

20
30
[50, 'A', 40]


In [148]:
# with * a function can accept any number of positional arguments

def test_arg(*args_list):
    for value in args_list:
        print("value = ", value)

In [149]:
test_arg(1,2,3, {"a":4}, [4,5])

value =  1
value =  2
value =  3
value =  {'a': 4}
value =  [4, 5]


In [150]:
# no key=value arguments allowed
test_arg(args_list = 2)

TypeError: test_arg() got an unexpected keyword argument 'args_list'

In [153]:
test_arg(x = "A")

TypeError: test_arg() got an unexpected keyword argument 'x'

In [156]:
test_arg("A",1, (3,4))

value =  A
value =  1
value =  (3, 4)


In [157]:
# with * a function can accept any number of positional arguments
# with ** a function can accept any number of key = value (keyword) arguments

def test_karg(**keys_args_dict):
    for name,value in keys_args_dict.items():
        print("name = ", name)
        print("value = ", value)

In [158]:
test_karg(**{"gene":"EGFR", "expression": 20,"transcript_no": 4})

name =  gene
value =  EGFR
name =  expression
value =  20
name =  transcript_no
value =  4


In [159]:
test_karg(gene = "EGFR", expression = 20, transcript_no = 4, snp_no = 5, genes_regulated = {"TP53", "EGR"})

name =  gene
value =  EGFR
name =  expression
value =  20
name =  transcript_no
value =  4
name =  snp_no
value =  5
name =  genes_regulated
value =  {'EGR', 'TP53'}


In [160]:
# we can check for the key and perform computations with the value for that key
# or retrieve the value for a specific key

def test_karg(**keys_args_dict):
    for name,value in keys_args_dict.items():
        print("name = ", name)
        print("value = ", value)
        if (name == "expression"):
            print("new value", 2*keys_args_dict[name])
        

In [161]:
test_karg(gene = "EGFR", expression = 20, transcript_no = 4, snp_no = 5, genes_regulated = {"TP53", "EGR"})

name =  gene
value =  EGFR
name =  expression
value =  20
new value 40
name =  transcript_no
value =  4
name =  snp_no
value =  5
name =  genes_regulated
value =  {'EGR', 'TP53'}


In [162]:
# names are case sensitive: "Expression" does not match "expression", so no new value is printed
test_karg(gene = "EGFR", Expression = 20, transcript_no = 4, snp_no = 5, genes_regulated = {"TP53", "EGR"})

name =  gene
value =  EGFR
name =  Expression
value =  20
name =  transcript_no
value =  4
name =  snp_no
value =  5
name =  genes_regulated
value =  {'EGR', 'TP53'}


In [163]:
# without **, the function takes a single argument: a dictionary that must already hold all the key:value pairs
def test_karg(keys_args_dict):
    for name,value in keys_args_dict.items():
        print("name = ", name)
        print("value = ", value)

In [164]:
test_karg({"gene":"EGFR", "expression": 20,"transcript_no": 4})

name =  gene
value =  EGFR
name =  expression
value =  20
name =  transcript_no
value =  4


In [165]:
# we cannot provide the dictionary items as independent arguments
test_karg(gene = "EGFR", Expression = 20, transcript_no = 4, snp_no = 5, genes_regulated = {"TP53", "EGR"})

TypeError: test_karg() got an unexpected keyword argument 'gene'
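
To tie the `*` and `**` examples above together, the sketch below combines a regular parameter, `*` and `**` in one function, and also uses `*` and `**` at the call site to unpack an existing list and dictionary. The function `describe_gene` and its parameter names are hypothetical.

```python
def describe_gene(name, *samples, **annotations):
    # name        -> the one required positional argument
    # samples     -> tuple holding any extra positional arguments
    # annotations -> dict holding any keyword arguments
    summary = {"name": name, "n_samples": len(samples)}
    summary.update(annotations)
    return summary

print(describe_gene("EGFR", 20, 35, 18, transcript_no=4))
# {'name': 'EGFR', 'n_samples': 3, 'transcript_no': 4}

# * and ** also unpack existing collections at the call site
values = [20, 35]
info = {"transcript_no": 4, "snp_no": 5}
print(describe_gene("EGFR", *values, **info))
# {'name': 'EGFR', 'n_samples': 2, 'transcript_no': 4, 'snp_no': 5}
```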

____
##### <b>`lambda` function</b> - anonymous function - it has no name
Should be used only with simple expressions

https://docs.python.org/3/reference/expressions.html#lambda<br>
https://www.geeksforgeeks.org/python-lambda-anonymous-functions-filter-map-reduce/<br>
https://realpython.com/python-lambda/<br>

`lambda arguments : expression`

A lambda function can take <b>any number of arguments</b>, but must always have <b>only one expression</b>.

In [1]:
help(compute_expression)

NameError: name 'compute_expression' is not defined

In [2]:
compute_expression = lambda x, y: x + y + x*y

In [3]:
help(compute_expression)

Help on function <lambda> in module __main__:

<lambda> lambda x, y



In [169]:
compute_expression(2, 3)

11
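
Lambdas are most often passed directly as an argument to another function rather than assigned to a name. As an illustrative sketch, here one is used as the `key` of the built-in `sorted()`; the `sequences` list is the one used earlier in these notes.

```python
sequences = ["ACTTGCCC", "AAAGTC", "CCTAC", "AAACCTA"]

# sort by GC count; the lambda computes the sort key for each sequence
by_gc = sorted(sequences, key=lambda s: s.count("G") + s.count("C"))

print(by_gc)  # ['AAAGTC', 'AAACCTA', 'CCTAC', 'ACTTGCCC']
```

`sorted()` is stable, so the two sequences with the same GC count (2) keep their original relative order.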

____
### Useful functions

#### Built-in functions
https://docs.python.org/3/library/functions.html

##### <b>`zip(*iterables)`</b> - make an iterator that aggregates respective elements from each of the iterables.   
https://docs.python.org/3/library/functions.html#zip

##### <b>`map(function, iterable, ...)`</b> - apply function to every element of an iterable - return an iterator over the results
https://docs.python.org/3/library/functions.html#map

##### <b>`filter(function, iterable)`</b> - apply function (bool result) to every element of an iterable - return the elements from the input iterable for which the function returns True
https://docs.python.org/3/library/functions.html#filter

##### <b>`functools.reduce(function, iterable[, initializer])`</b> - apply a function of two arguments cumulatively to the elements of an iterable to reduce the iterable to a single value
https://docs.python.org/3/library/functools.html#functools.reduce

____



<b>`zip(*iterables)`</b> - make an iterator that aggregates respective elements from each of the iterables.  


In [8]:
combined_res = zip([10,20,30],["ACT","GGT","AACT"],[True,False,True])
combined_res

<zip at 0x7f9bd0828880>

In [6]:
for element in combined_res:
    print(element)

(10, 'ACT', True)
(20, 'GGT', False)
(30, 'AACT', True)


In [7]:
# the zip iterator was already consumed by the for loop above, so nothing is left
list(combined_res)

[]

In [9]:
combined_res = zip([10,20,30],["ACT","GGT","AACT"],[True,False,True])
list(combined_res)

[(10, 'ACT', True), (20, 'GGT', False), (30, 'AACT', True)]

In [10]:
lst1 = [10,20,30]
lst2 = ["ACT","GGT","AACT"]
lst3 = [True, False, True]
result = []
for i in range(len(lst1)):
    result.append((lst1[i], lst2[i], lst3[i]))
    
result

[(10, 'ACT', True), (20, 'GGT', False), (30, 'AACT', True)]

In [11]:
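# zip stops at the shortest input: 500 is dropped; the string "CCT" contributes one character at a time
combined_res = zip([10,20,30,500],["ACT","GGT","AACT"],[True,False,True], "CCT")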
list(combined_res)

[(10, 'ACT', True, 'C'), (20, 'GGT', False, 'C'), (30, 'AACT', True, 'T')]
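
`zip` silently drops the extra elements of the longer inputs. When that is not wanted, one option is `itertools.zip_longest` from the standard library, which pads the shorter inputs instead. The variable names below are made up for this sketch.

```python
from itertools import zip_longest

lengths = [10, 20, 30, 500]
codons = ["ACT", "GGT", "AACT"]

# zip stops at the shortest input
print(list(zip(lengths, codons)))
# [(10, 'ACT'), (20, 'GGT'), (30, 'AACT')]

# zip_longest continues to the longest input and fills the gaps with fillvalue
print(list(zip_longest(lengths, codons, fillvalue=None)))
# [(10, 'ACT'), (20, 'GGT'), (30, 'AACT'), (500, None)]
```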

In [175]:
# unzip list
x, y, z = zip(*[(3,4,7), (12,15,19), (30,60,90)])
print(x, y, z)

(3, 12, 30) (4, 15, 60) (7, 19, 90)


In [13]:
list(zip(*[(10, 'ACT', True, 'C'), (20, 'GGT', False, 'C'), (30, 'AACT', True, 'T')]))


[(10, 20, 30), ('ACT', 'GGT', 'AACT'), (True, False, True), ('C', 'C', 'T')]

In [15]:
# the 8 has no partner in the other tuples, so it is dropped
x, y, z = zip(*[(3,4,7,8), (12,15,19), (30,60,90)])
print(x)
print(y)
print(z)

(3, 12, 30)
(4, 15, 60)
(7, 19, 90)


In [177]:
combined_res = zip(["ACT","GGT","AACT"], [10,20,30])
dict(combined_res)

{'ACT': 10, 'GGT': 20, 'AACT': 30}

In [178]:
dict(zip(["ACT","GGT","AACT"], [10,20,30]))

{'ACT': 10, 'GGT': 20, 'AACT': 30}

In [16]:
dict(zip(["ACT","GGT","AACT"], [10,20,30], [1,2,3]))

ValueError: dictionary update sequence element #0 has length 3; 2 is required

_____

<b>`map(function, iterable, ...)`</b> - apply function to every element of an iterable - return an iterator over the results

In [17]:
map(abs,[-2,0,-5,6,-7])

<map at 0x7f9bc0fe8250>

In [18]:
list(map(abs,[-2,0,-5,6,-7]))

[2, 0, 5, 6, 7]

In [20]:
def compute_addition(x,y):
    return x + y


In [21]:
list(map(compute_addition, [1,2,3,4], [50,60,70]))

[51, 62, 73]

In [22]:
[1,2,3,4] + [50,60,70]

[1, 2, 3, 4, 50, 60, 70]

In [23]:
list(map(compute_addition, [1,2,3,4]))

TypeError: compute_addition() missing 1 required positional argument: 'y'

In [24]:
def compute_addition(x,y = 10):
    return x + y

In [25]:
list(map(compute_addition, [1,2,3,4]))

[11, 12, 13, 14]

In [26]:
list(map(compute_addition, [1,2,3,4], [50,60,70]))

[51, 62, 73]

https://www.geeksforgeeks.org/python-map-function/

In [27]:
numbers1 = [1, 2, 3] 
numbers2 = [4, 5, 6] 
  
result = map(lambda x, y: x + y, numbers1, numbers2) 
list(result)

[5, 7, 9]

In [28]:
list(map(lambda x, y: x + y, [1,2,3,4], [50,60,70]) )

[51, 62, 73]

____
Use a lambda function and the map function to compute a result from the following 3 lists.<br>
If the element from the third list (z) is divisible by 3, return 3*x (x from the first list); otherwise return 2*y (y from the second list).

In [29]:
numbers1 = [1, 2, 3, 4, 5, 6] 
numbers2 = [7, 8, 9, 10, 11, 12] 
numbers3 = [13, 14, 15, 16, 17, 18] 

result = map(lambda x, y, z: 3*x if z % 3 == 0 else 2*y,
             numbers1, numbers2, numbers3) 
list(result)



[14, 16, 9, 20, 22, 18]

In [30]:
def compute_res(x,y,z):
    res = None
    if z%3 == 0:
        res = 3*x
    else:
        res = 2*y
    return res


result = map(compute_res, numbers1, numbers2, numbers3) 
list(result)

[14, 16, 9, 20, 22, 18]

____
<b>`filter(function, iterable)`</b> - apply function (bool result) to every element of an iterable - return the elements from the input iterable for which the function returns True

In [31]:
test_list = [3,4,5,6,7]
result = filter(lambda x: x > 4, test_list)
result

<filter at 0x7f9bc0fdbc40>

In [32]:
list(result)

[5, 6, 7]

In [35]:
# filter(bool, ...) removes falsy values: 0, None, empty strings and empty collections
test_list = [3, 0, 5, None, 7, "", "AACG", [], {}, {1:"one"}]
result = filter(bool, test_list)
list(result)

[3, 5, 7, 'AACG', {1: 'one'}]

In [33]:
bool(None)

False

In [34]:
bool({})

False
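
The PEP 202 quote near the top of these notes says that comprehensions are a more concise alternative to `map()` and `filter()`. As a small sketch of that equivalence, the same result is computed both ways below.

```python
test_list = [3, 4, 5, 6, 7]

# filter + map: keep the values greater than 4, then square them
with_functions = list(map(lambda x: x ** 2, filter(lambda x: x > 4, test_list)))

# the same result as a list comprehension with a predicate
with_comprehension = [x ** 2 for x in test_list if x > 4]

print(with_functions)      # [25, 36, 49]
print(with_comprehension)  # [25, 36, 49]
```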

____
<b>`functools.reduce(function, iterable[, initializer])`</b> - apply a function of two arguments cumulatively to the elements of an iterable to reduce the iterable to a single value



In [36]:
help(reduce)

NameError: name 'reduce' is not defined

In [37]:
from functools import reduce

In [38]:
help(reduce)

Help on built-in function reduce in module _functools:

reduce(...)
    reduce(function, sequence[, initial]) -> value
    
    Apply a function of two arguments cumulatively to the items of a sequence,
    from left to right, so as to reduce the sequence to a single value.
    For example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates
    ((((1+2)+3)+4)+5).  If initial is present, it is placed before the items
    of the sequence in the calculation, and serves as a default when the
    sequence is empty.



In [39]:
reduce(lambda x,y: x+y, [47,11,42,13])

113

<img src="https://www.python-course.eu/images/reduce_diagram.png" width=300 />

https://www.python-course.eu/lambda.php

https://www.geeksforgeeks.org/reduce-in-python/
https://www.tutorialsteacher.com/python/python-reduce-function

In [40]:
test_list = [1,2,3,4,5,6]
reduce(lambda x,y: x+y, test_list)

21

In [45]:
# compute factorial of n
# n=5 factorial will be 1*2*3*4*5
n=5
reduce(lambda x, y: x*y, range(1, n+1))

120

In [46]:
n=3
reduce(lambda x, y: x*y, range(1, n+1))

6

In [42]:
list(range(5))

[0, 1, 2, 3, 4]

In [43]:
list(range(1,5))

[1, 2, 3, 4]

In [44]:
list(range(1,5 + 1))

[1, 2, 3, 4, 5]

In [47]:
n

3

In [48]:
list(range(n))

[0, 1, 2]

In [49]:
list(range(1, n+1))

[1, 2, 3]

In [50]:
reduce(lambda x,y: x+y, ["AACT", "AA", "C", "TTG"])

'AACTAACTTG'
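
The optional third argument of `reduce` (the initializer described in the help text above) has not been used so far. A minimal sketch: the total length of a list of sequences, with 0 as the starting value. The name `total_length` is chosen only for this example.

```python
from functools import reduce

seq_list = ["AACT", "AA", "C", "TTG"]

# acc starts at the initializer (0); each step adds the length of one sequence
total_length = reduce(lambda acc, seq: acc + len(seq), seq_list, 0)
print(total_length)  # 10

# with an initializer, an empty input simply returns the initializer
print(reduce(lambda acc, seq: acc + len(seq), [], 0))  # 0

# without an initializer, an empty input raises a TypeError
try:
    reduce(lambda x, y: x + y, [])
except TypeError as err:
    print("TypeError:", err)
```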

In [52]:
# Exercise

seq_list = ["AACT", "AA", "C", "TTG"]

# use the above functions to create a dictionary 
# where the keys are the sequences in the list and 
# values are the length of the sequence



In [55]:
len("AACT")

4

In [53]:
map(len, seq_list)

<map at 0x7f9bc0f86850>

In [54]:
list(map(len, seq_list))

[4, 2, 1, 3]

In [57]:
list(zip([1,2,3], "ABC"))

[(1, 'A'), (2, 'B'), (3, 'C')]

In [58]:
dict(zip([1,2,3], "ABC"))

{1: 'A', 2: 'B', 3: 'C'}

In [59]:
list(zip([1,2,3], [["A","B"], "C", "D"]))

[(1, ['A', 'B']), (2, 'C'), (3, 'D')]

In [60]:
values = map(len, seq_list)
list(zip(seq_list, values))

[('AACT', 4), ('AA', 2), ('C', 1), ('TTG', 3)]

In [61]:
seq_list

['AACT', 'AA', 'C', 'TTG']

In [63]:
list(map(len, seq_list))

[4, 2, 1, 3]

In [64]:
dict([('AACT', 4), ('AA', 2), ('C', 1), ('TTG', 3)])

{'AACT': 4, 'AA': 2, 'C': 1, 'TTG': 3}

In [65]:
values = map(len, seq_list)
dict(zip(seq_list, values))

{'AACT': 4, 'AA': 2, 'C': 1, 'TTG': 3}

In [66]:
list(zip(seq_list, map(len, seq_list)))

[('AACT', 4), ('AA', 2), ('C', 1), ('TTG', 3)]

In [67]:
res = {}
for seq in seq_list:
    res[seq] = len(seq)
    
res

{'AACT': 4, 'AA': 2, 'C': 1, 'TTG': 3}