### Common data structures and algorithms involving data:
* Lists
* Sets
* Dictionaries

### Unpacking a variable into separate variables:

In [2]:
p = (3,2,5)
p

(3, 2, 5)

In [3]:
x,y,z = p
z

5

In [4]:
data = ['chocolate',5,10,(2021,12,31)]
data

['chocolate', 5, 10, (2021, 12, 31)]

In [6]:
articulo, stock, precio, fecha = data
print(articulo)
print(fecha)

chocolate
(2021, 12, 31)


In [7]:
articulo, stock, precio, (yy, mth, day) = data
yy

2021

**Unpacking works with any iterable object, not just tuples and lists**

In [10]:
s = 'hola'
a, b, c, d = s
a

'h'
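As a further illustration of the same rule, the sketch below unpacks a `range`, a generator expression, and the items view of a dict. The variable names and the sample values are made up for this example.

```python
# Any iterable can be unpacked, as long as the number of targets matches
# the number of items it yields.
start, middle, end = range(3)          # a range object

squares = (n * n for n in (2, 3))      # a generator expression
four, nine = squares

stock = {'chocolate': 5}               # hypothetical sample data
(item, units), = stock.items()         # a dict view yielding (key, value) pairs

print(start, middle, end)   # 0 1 2
print(four, nine)           # 4 9
print(item, units)          # chocolate 5

# A length mismatch raises ValueError instead of silently dropping items.
try:
    a, b = 'hola'
except ValueError as err:
    print('ValueError:', err)
```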

**Discarding certain values with a throwaway name such as `_`**

In [11]:
articulo, _ , precio, _ = data
precio

10

**Unpacking iterables of unknown or arbitrary length**

In [52]:
GeneID ='AT1G14690.1 protein_coding microtubule-associated protein 65-7     microtubule-associated protein 65-7'
GeneID

'AT1G14690.1 protein_coding microtubule-associated protein 65-7    microtubule-associated protein 65-7'

In [53]:
gene, isoform = GeneID.split(".")
gene

'AT1G14690'

In [56]:
isoform

'1 protein_coding microtubule-associated protein 65-7    microtubule-associated protein 65-7'

In [58]:
isoform, tipo = isoform.split("_")
isoform

'1 protein'

In [59]:
tipo

'coding microtubule-associated protein 65-7    microtubule-associated protein 65-7'
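Splitting on `"_"` above cuts `protein_coding` in half, and plain two-target unpacking only works when the separator occurs exactly once. A starred target is one way to handle a variable number of trailing fields. The following is a minimal sketch on the same string; the names `gene_id`, `biotype` and `description` are chosen here for illustration.

```python
GeneID = 'AT1G14690.1 protein_coding microtubule-associated protein 65-7     microtubule-associated protein 65-7'

# split() with no argument splits on any run of whitespace.
# The starred name collects every remaining field into a list.
gene_id, biotype, *description = GeneID.split()

# The ID itself contains exactly one dot, so two targets are enough here.
gene, isoform = gene_id.split('.')

print(gene)         # AT1G14690
print(isoform)      # 1
print(biotype)      # protein_coding
print(description)
```

The starred target is always a list, even when it ends up empty.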

**Unpacking head and tail from a list**

In [60]:
list_numbers = [1,2,3,55,56,87,20,384]
head, *tail = list_numbers
head

1

In [61]:
tail

[2, 3, 55, 56, 87, 20, 384]
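The starred name does not have to come last. As a small sketch on the same list, it can sit in the middle or at the front to split off the final element; the target names are illustrative.

```python
list_numbers = [1, 2, 3, 55, 56, 87, 20, 384]

# Starred target in the middle: keep both ends, collect the rest.
first, *middle, last = list_numbers
print(first, last)   # 1 384
print(middle)        # [2, 3, 55, 56, 87, 20]

# Starred target at the front: everything except the final element.
*init, final = list_numbers
print(init)          # [1, 2, 3, 55, 56, 87, 20]
print(final)         # 384
```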

**Finding the largest or smallest N items**

In [66]:
import heapq

# Build a list of the 3 largest elements in the list
print(heapq.nlargest(3,list_numbers))

# Build a list of the 2 smallest elements in the list
print(heapq.nsmallest(2,list_numbers))

[384, 87, 56]
[1, 2]
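Both functions also accept a `key` argument, which helps when the items are records rather than plain numbers. The sketch below assumes a list of dicts with a hypothetical `'length'` field; the gene IDs and numbers are reused from the dictionaries section of this notebook.

```python
import heapq

# Hypothetical records: the 'length' field name is an assumption for this example.
genes = [
    {'gen': 'AT1G75610', 'length': 520},
    {'gen': 'AT1G75530', 'length': 320},
    {'gen': 'AT1G75620', 'length': 480},
]

# key= tells heapq which value to compare for each record.
longest = heapq.nlargest(2, genes, key=lambda g: g['length'])
shortest = heapq.nsmallest(1, genes, key=lambda g: g['length'])

print([g['gen'] for g in longest])    # ['AT1G75610', 'AT1G75620']
print([g['gen'] for g in shortest])   # ['AT1G75530']
```

For N equal to 1, the built-in `min()` and `max()` are more efficient, and when N is close to the size of the collection, sorting and slicing is usually the better choice.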


### Dictionaries
A regular `dict` preserves the insertion order of its items (guaranteed since Python 3.7), so it is a good fit when that order matters.
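A short sketch of what insertion order means in practice, using gene IDs and values that appear later in this notebook: updating an existing key keeps its position, while deleting and re-inserting a key moves it to the end.

```python
d = {}
d['AT1G75610'] = 520
d['AT1G75530'] = 320
d['AT1G75620'] = 480
print(list(d))   # ['AT1G75610', 'AT1G75530', 'AT1G75620']

# Updating the value of an existing key does not change its position.
d['AT1G75610'] = 999
order_after_update = list(d)
print(order_after_update)   # ['AT1G75610', 'AT1G75530', 'AT1G75620']

# Deleting and re-inserting a key moves it to the end.
del d['AT1G75610']
d['AT1G75610'] = 520
order_after_reinsert = list(d)
print(order_after_reinsert)   # ['AT1G75530', 'AT1G75620', 'AT1G75610']
```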

In [73]:
dic1 = {
    'gen':['AT1G75610', 'AT1G75530', 'AT1G75620'],
    'desc':['pseudogene', 'protein_coding', 'protein_coding']
}

dic2 = {
    'gen':['AT1G33490', 'AT1G11840', 'AT1G34530'],
    'desc':['protein_coding', 'transposable_element_gene']
}
dic1

{'gen': ['AT1G75610', 'AT1G75530', 'AT1G75620'],
 'desc': ['pseudogene', 'protein_coding', 'protein_coding']}

In [76]:
# An easy way to build a dict whose values are lists is defaultdict:
# it automatically creates the initial (empty) value for a missing key.

from collections import defaultdict

dict3 = defaultdict(list)
dict3['gen'].append('AT1G53330')
dict3['desc'].append('protein_coding')
dict3['gen'].append('AT1G65940')
dict3['desc'].append('pseudogene')
dict3

# Alternative: dict3 = defaultdict(set) keeps only unique values (use .add() instead of .append())

defaultdict(list,
            {'gen': ['AT1G53330', 'AT1G65940'],
             'desc': ['protein_coding', 'pseudogene']})
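A typical use of `defaultdict(list)` is grouping a sequence of pairs. The sketch below groups hypothetical `(gene, description)` pairs by description and compares the result with the `setdefault()` approach on a plain dict; the `pairs` list is assembled for this example from values used elsewhere in this notebook.

```python
from collections import defaultdict

# Hypothetical (gene, description) pairs for illustration.
pairs = [
    ('AT1G53330', 'protein_coding'),
    ('AT1G65940', 'pseudogene'),
    ('AT1G75620', 'protein_coding'),
]

# defaultdict creates the empty list the first time a key is seen.
by_type = defaultdict(list)
for gene, desc in pairs:
    by_type[desc].append(gene)

# The same grouping with a plain dict needs setdefault() on every step.
plain = {}
for gene, desc in pairs:
    plain.setdefault(desc, []).append(gene)

print(dict(by_type))
# {'protein_coding': ['AT1G53330', 'AT1G75620'], 'pseudogene': ['AT1G65940']}

# Caveat: merely reading a missing key from a defaultdict inserts it.
had_key_before = 'rRNA' in by_type
by_type['rRNA']
has_key_after = 'rRNA' in by_type
print(had_key_before, has_key_after)   # False True
```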

In [84]:
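# Note: 'AT1G65940' appears twice in this literal, so the later value (289)
# silently overwrites the earlier one (560).
dict3 = {'AT1G53330':300,
        'AT1G65940':560,
        'AT1G53331':348,
        'AT1G65940':289
        }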


In [86]:
# Pair each value with its key so that max() compares the values first
max_len = max(zip(dict3.values(), dict3.keys()))
max_len

(348, 'AT1G53331')

In [87]:
# Same idea for the smallest value
min_len = min(zip(dict3.values(), dict3.keys()))
min_len

(289, 'AT1G65940')
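When only the key is needed, passing `key=` to `max()` or `min()` is an alternative to `zip()`. The sketch below uses the effective contents of the dict above (after the duplicate key is resolved); the name `lengths` is an assumption made for readability. It also shows that a `zip()` object is an iterator and can be consumed only once.

```python
# Effective contents of the dict above; the name 'lengths' is illustrative.
lengths = {'AT1G53330': 300, 'AT1G53331': 348, 'AT1G65940': 289}

# key=lengths.get compares the keys by their associated values.
longest = max(lengths, key=lengths.get)
shortest = min(lengths, key=lengths.get)
print(longest, lengths[longest])     # AT1G53331 348
print(shortest, lengths[shortest])   # AT1G65940 289

# zip() also makes it easy to rank all entries by value.
ranked = sorted(zip(lengths.values(), lengths.keys()))
print(ranked)
# [(289, 'AT1G65940'), (300, 'AT1G53330'), (348, 'AT1G53331')]

# A zip object is exhausted after one pass, so build a new one per call.
pairs = zip(lengths.values(), lengths.keys())
top = max(pairs)
try:
    min(pairs)
    exhausted = False
except ValueError:
    exhausted = True
print(top, exhausted)   # (348, 'AT1G53331') True
```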

**Finding commonalities in two dicts** <br>
Apply set operations such as intersection (`&`) and difference (`-`) to the views returned by the `keys()` or `items()` methods.


In [95]:
dic1 = {
    'AT1G75610':520,
    'AT1G75530':320,
    'AT1G75620':480
}

dic2 = {
    'AT1G33490':421,
    'AT1G11840':365,
    'AT1G34530':258,
    'AT1G75620':480
}


In [96]:
dic1.keys() & dic2.keys()

{'AT1G75620'}

In [98]:
# Keys in dic2 that are not in dic1
dic2.keys() - dic1.keys()

{'AT1G11840', 'AT1G33490', 'AT1G34530'}
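The `items()` view supports the same operators, which compares keys and values together. The sketch below repeats the two dicts so that it runs on its own and also uses a key difference inside a dict comprehension to build a filtered dict; `shared` and `only_dic2` are names chosen for this example.

```python
dic1 = {'AT1G75610': 520, 'AT1G75530': 320, 'AT1G75620': 480}
dic2 = {'AT1G33490': 421, 'AT1G11840': 365, 'AT1G34530': 258, 'AT1G75620': 480}

# (key, value) pairs present in both dicts: key AND value must match.
shared = dic1.items() & dic2.items()
print(shared)   # {('AT1G75620', 480)}

# Build a new dict holding only the entries of dic2 whose keys are not in dic1.
only_dic2 = {k: dic2[k] for k in dic2.keys() - dic1.keys()}
print(sorted(only_dic2.items()))
# [('AT1G11840', 365), ('AT1G33490', 421), ('AT1G34530', 258)]
```

The `values()` view does not support these operators, because values are not guaranteed to be unique or hashable; convert it with `set(d.values())` first if that is what is needed.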

**Naming a slice**

In [99]:
cadena = "ILITHYIA (ILA) is a HEAT repeat protein involved in plant immunity. The gene is also involved in systemic acquired resistance induced by P. syringae expressing avrRps4. Loss-of-function mutants of ILA caused pleiotropic defects in the mutant plants. The mutant plants are smaller in size and the leaves are serrated and yellow to light green in color. Required for bacterium-triggered stomatal closure.ILITYHIA;(source:Araport11)"

In [106]:
gene_func = cadena[20:40]

In [107]:
gene_func

'HEAT repeat protein '

In [120]:
items = [1,2,3,55,56,87,20,384]
a = slice(2,4)

In [125]:
# Equivalent to items[2:4]
items[a]

[3, 55]

In [126]:
# Slice assignment may change the list length: here 2 items are replaced by 4
items[a] = [333,555,7777,9999]

In [127]:
items

[1, 2, 333, 555, 7777, 9999, 56, 87, 20, 384]

In [128]:
items[a]

[333, 555]
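The point of naming a slice is to replace hard-coded indices such as `[20:40]` with a meaningful name. The following is a minimal sketch on a shortened copy of the description string above; the constant names `SYMBOL` and `GENE_FUNC` are invented for this example.

```python
# Shortened copy of the description string used above.
record = 'ILITHYIA (ILA) is a HEAT repeat protein involved in plant immunity.'

# Named slices document what each index range means.
SYMBOL = slice(10, 13)
GENE_FUNC = slice(20, 40)

print(record[SYMBOL])      # ILA
print(record[GENE_FUNC])   # 'HEAT repeat protein ' (with a trailing space)

# A slice object exposes its bounds as attributes.
print(GENE_FUNC.start, GENE_FUNC.stop, GENE_FUNC.step)   # 20 40 None

# indices(length) clips the slice to a sequence of the given length.
open_ended = slice(20, 1000)
clipped = open_ended.indices(len(record))
print(clipped == (20, len(record), 1))   # True
```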

**Most frequently occurring items in a sequence**