# Useful Pythonic Data Types

## Resources

**Lectures and level solutions**

https://github.com/sksuzuki/How-to-Learn-to-Code/

**Python Resources on today's topics**

- [Python 3 data structures tutorial](https://docs.python.org/3/tutorial/datastructures.html)
- [Dictionary tutorial and built-in functions and methods](https://www.tutorialspoint.com/python3/python_dictionary.htm)
- [Comprehensions tutorial](http://book.pythontips.com/en/latest/comprehensions.html)

# Dictionaries

A useful data type in Python is the ```dictionary```, which maps ```keys``` to ```values```. Unlike sequences such as lists, which are indexed by a range of numbers, dictionaries are indexed by their ```keys```. A ```key``` can be any immutable (more precisely, hashable) type, such as a ```str``` or an ```int```, and a ```value``` can be any type, such as a ```str``` or a ```list```.

## Why use a dictionary?

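Dictionaries are fast: they are implemented with a technique called hashing, which lets us look up the value for a given key in roughly constant time on average, regardless of how many entries the dictionary holds. Storing the same key/value pairs in a list of tuples is slower: to find the value associated with a key, we would have to iterate over the pairs one by one, checking the 0th element of each, which takes longer as the list grows.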
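
To make the difference concrete, here is a minimal sketch that stores the same pairs both ways. The data and the helper name `lookup_in_pairs` are hypothetical and chosen only for illustration.

```python
# Hypothetical data: the same codon -> amino acid pairs stored two ways.
pairs = [('GCT', 'A'), ('AGT', 'S'), ('ATG', 'M')]
codon_2_aa = dict(pairs)

def lookup_in_pairs(pairs, key):
    """Scan the list of (key, value) tuples until the key is found."""
    for k, v in pairs:
        if k == key:  # check the 0th element of each tuple
            return v
    return None  # key not present

print(lookup_in_pairs(pairs, 'ATG'))  # M (found after scanning three tuples)
print(codon_2_aa['ATG'])              # M (found directly via hashing)
```

Both lookups return the same value; the dictionary simply avoids the scan.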

## Examples

In [68]:
# syntax


<class 'dict'>


The syntax is: ```{key: value, ..., key: value}```

In [69]:
# initializing


{'GCT': 'A', 'AGT': 'S'}


In [70]:
# adding a dict. element


{'GCT': 'A', 'AGT': 'S', 'ATG': 'M'}


In [71]:
# deleting a key


{'GCT': 'A', 'ATG': 'M'}
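
The code for the cells above is not shown, so the following is only a sketch of statements that would produce the same outputs. The variable name `codon_2_aa` is an assumption.

```python
# Assumed variable name; the original cells are not shown.
codon_2_aa = {}
print(type(codon_2_aa))  # <class 'dict'>

# initializing with two key: value pairs
codon_2_aa = {'GCT': 'A', 'AGT': 'S'}
print(codon_2_aa)  # {'GCT': 'A', 'AGT': 'S'}

# adding an element: assign to a new key
codon_2_aa['ATG'] = 'M'
print(codon_2_aa)  # {'GCT': 'A', 'AGT': 'S', 'ATG': 'M'}

# deleting a key (and its value) with del
del codon_2_aa['AGT']
print(codon_2_aa)  # {'GCT': 'A', 'ATG': 'M'}
```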


In [72]:
# adding an element with an iterable value


{'M': 'ATG', 'S': ['TCT', 'TCA', 'TCG', 'TCC']}
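
One way to arrive at the output above, assuming the dictionary is named `aa_2_codon` (the name used in a later cell), is to assign a ```list``` as the value:

```python
aa_2_codon = {'M': 'ATG'}

# The value for a key can itself be an iterable, such as a list.
aa_2_codon['S'] = ['TCT', 'TCA', 'TCG', 'TCC']
print(aa_2_codon)  # {'M': 'ATG', 'S': ['TCT', 'TCA', 'TCG', 'TCC']}

# The stored list can be indexed like any other list.
print(aa_2_codon['S'][0])  # TCT
```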


**Key Characteristics**
- Only one entry per ```key``` is allowed, so duplicate ```keys``` cannot exist. If the same ```key``` is assigned more than once, the last assignment wins.
- ```Keys``` must be immutable. You can use strings, numbers, or tuples (as long as the tuple itself contains only immutable items) as dictionary keys, but something like ```['key']``` is not allowed.
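
Both characteristics can be checked with a short sketch. The values here are made up for illustration.

```python
# Duplicate keys: the last assignment wins.
codon_2_aa = {'ATG': 'X', 'ATG': 'M'}
print(codon_2_aa)  # {'ATG': 'M'}

# A tuple of immutable items is a valid key.
pair_counts = {('A', 'T'): 2}
print(pair_counts[('A', 'T')])  # 2

# A list is mutable, so using it as a key raises a TypeError.
try:
    bad = {['key']: 1}
    error_name = None
except TypeError as err:
    error_name = type(err).__name__
    print(error_name, '-', err)
```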

**Some Useful Methods**

In [73]:
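# type the dictionary name followed by . and press Tab to list its methods;
# running the incomplete expression itself is a SyntaxError, as shown below

SyntaxError: invalid syntax (<ipython-input-73-09da35794296>, line 1)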

In [74]:
# get values


ATG


In [75]:
# get all elements
print(aa_2_codon.items())

dict_items([('M', 'ATG'), ('S', ['TCT', 'TCA', 'TCG', 'TCC'])])


In [76]:
# get just keys


dict_keys(['M', 'S'])


In [83]:
# get keys in a list


['M', 'S']
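
Several of the cells above show only their output. The sketch below uses standard ```dict``` methods that would give the same results; which exact calls the original cells used is an assumption.

```python
aa_2_codon = {'M': 'ATG', 'S': ['TCT', 'TCA', 'TCG', 'TCC']}

# get a value: index by key, or use .get(), which returns None for a missing key
print(aa_2_codon['M'])      # ATG
print(aa_2_codon.get('M'))  # ATG
print(aa_2_codon.get('W'))  # None

# get just keys (a view object), then the same keys as a list
print(aa_2_codon.keys())        # dict_keys(['M', 'S'])
print(list(aa_2_codon.keys()))  # ['M', 'S']

# get just values as a list
print(list(aa_2_codon.values()))  # ['ATG', ['TCT', 'TCA', 'TCG', 'TCC']]
```

Indexing with a missing key, such as `aa_2_codon['W']`, raises a `KeyError`, which is why `.get()` can be handy.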

# List comprehensions

Comprehensions are constructs that allow one sequence (or any other iterable) to be built from another. Python has ```list```, ```dictionary```, and ```set``` comprehensions, as well as the closely related ```generator``` expressions. We're just going to introduce ```list``` comprehensions, but the same logic applies to the other types.

Here's one way to generate a list:

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
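
The cell that produced this list is not shown. A loop-based version consistent with the output, with the name `squares` assumed, might look like this:

```python
squares = []
for x in range(10):
    squares.append(x**2)  # grow the list one element at a time

print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```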

List comprehensions provide a concise way of creating a ```list```. They build a new list in which each element is the result of some operation applied to each member of another sequence. You can even apply conditionals while creating the list!

For a simple list comprehension, the syntax is as follows:

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
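
The general pattern is ```[expression for item in iterable]```. A comprehension that would give the list above, again assuming the name `squares`, is:

```python
# [expression for item in iterable]
squares = [x**2 for x in range(10)]

print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```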

## Benefits of using a list comprehension

- *cleaner code*: more compact
- *faster*: usually somewhat faster than the equivalent loop. A ```for``` loop has to look up and call the ```.append()``` method on every iteration, whereas in CPython a ```list``` comprehension adds each element with a dedicated, cheaper internal operation.

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [8]:
%timeit square(1000000)
%timeit [x**2 for x in range(1000000)]

4.92 s ± 109 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
4.3 s ± 85.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
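
The definition of `square` is not shown in the cells above. The sketch below assumes it builds the list with a ```for``` loop and ```.append()```, and uses the standard `timeit` module so the comparison also runs outside a notebook. Absolute timings depend on the machine, so none are quoted here.

```python
import timeit

def square(n):
    """Assumed definition: build the list of squares with a loop and .append()."""
    result = []
    for x in range(n):
        result.append(x**2)
    return result

n = 10_000  # smaller than the 1,000,000 used above so this runs quickly
loop_time = timeit.timeit(lambda: square(n), number=20)
comp_time = timeit.timeit(lambda: [x**2 for x in range(n)], number=20)

print(f"loop with append: {loop_time:.4f} s")
print(f"comprehension:    {comp_time:.4f} s")
```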


## Examples

**reading in a file**

In [46]:
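infile = 'data.txt'
# use a separate name for the file handle so the filename is not overwritten
with open(infile) as handle:
    # iterating over the handle yields one line at a time
    content = [line.strip() for line in handle]

print(content)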

['This', 'is', 'data']


**Simultaneously iterating through multiple lists**

In [43]:
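list1 = [1,2,3]
list2 = [4,5,6]

# zip() pairs up the elements at matching positions; add each pair
z = [x + y for x, y in zip(list1, list2)]

print(z)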

[5, 7, 9]


**Using conditionals in list comprehensions**

A good explanation can be found here: https://stackoverflow.com/questions/4406389/if-else-in-a-list-comprehension

In [65]:
gencode = {
    'ATA':'I', 'ATC':'I', 'ATT':'I', 'ATG':'M',
    'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACT':'T',
    'AAC':'N', 'AAT':'N', 'AAA':'K', 'AAG':'K',
    'AGC':'S', 'AGT':'S', 'AGA':'R', 'AGG':'R',
    'CTA':'L', 'CTC':'L', 'CTG':'L', 'CTT':'L',
    'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCT':'P',
    'CAC':'H', 'CAT':'H', 'CAA':'Q', 'CAG':'Q',
    'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGT':'R',
    'GTA':'V', 'GTC':'V', 'GTG':'V', 'GTT':'V',
    'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCT':'A',
    'GAC':'D', 'GAT':'D', 'GAA':'E', 'GAG':'E',
    'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGT':'G',
    'TCA':'S', 'TCC':'S', 'TCG':'S', 'TCT':'S',
    'TTC':'F', 'TTT':'F', 'TTA':'L', 'TTG':'L',
    'TAC':'Y', 'TAT':'Y', 'TAA':'_', 'TAG':'_',
    'TGC':'C', 'TGT':'C', 'TGA':'_', 'TGG':'W'}


['ATA', 'ATC', 'ATT', 'ATG', 'ACA', 'ACC', 'ACG', 'ACT', 'AAC', 'AAT', 'AAA', 'AAG', 'AGC', 'AGT', 'AGA', 'AGG', 'CTA', 'CTC', 'CTG', 'CTT', 'CCA', 'CCC', 'CCG', 'CCT', 'CAC', 'CAT', 'CAA', 'CAG', 'CGA', 'CGC', 'CGG', 'CGT', 'GTA', 'GTC', 'GTG', 'GTT', 'GCA', 'GCC', 'GCG', 'GCT', 'GAC', 'GAT', 'GAA', 'GAG', 'GGA', 'GGC', 'GGG', 'GGT', 'TCA', 'TCC', 'TCG', 'TCT', 'TTC', 'TTT', 'TTA', 'TTG', 'TAC', 'TAT', 'TAA', 'TAG', 'TGC', 'TGT', 'TGA', 'TGG']
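
The list above matches the keys of `gencode` in insertion order. One comprehension that would produce it is sketched here on a small, illustrative subset of the table so the example stays short. The name `codons` is an assumption.

```python
# Illustrative subset of the gencode table above.
gencode = {'ATA': 'I', 'ATT': 'I', 'CTT': 'L', 'GCA': 'A', 'TTT': 'F', 'TGA': '_'}

# Iterating over a dictionary yields its keys.
codons = [codon for codon in gencode]
print(codons)  # ['ATA', 'ATT', 'CTT', 'GCA', 'TTT', 'TGA']
```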


**Filtering**

['ATT', 'CTT', 'GTT', 'TCT', 'TTC', 'TTT', 'TTA', 'TTG', 'TAT', 'TGT']
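
The filter that produced this list is not shown, but the output is consistent with keeping only the codons that contain at least two ```'T'``` characters. Under that assumption, a trailing ```if``` clause does the filtering. The example again uses an illustrative subset of `gencode`, and the name `t_rich` is hypothetical.

```python
# Illustrative subset of the gencode table above.
gencode = {'ATA': 'I', 'ATT': 'I', 'CTT': 'L', 'GCA': 'A', 'TTT': 'F', 'TGA': '_'}

# [expression for item in iterable if condition]
# Elements that fail the condition are left out of the new list.
t_rich = [codon for codon in gencode if codon.count('T') >= 2]
print(t_rich)  # ['ATT', 'CTT', 'TTT']
```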


**Conditional**

['ATA', 'ATC', 'ATTATT', 'ATG', 'ACA', 'ACC', 'ACG', 'ACT', 'AAC', 'AAT', 'AAA', 'AAG', 'AGC', 'AGT', 'AGA', 'AGG', 'CTA', 'CTC', 'CTG', 'CTTCTT', 'CCA', 'CCC', 'CCG', 'CCT', 'CAC', 'CAT', 'CAA', 'CAG', 'CGA', 'CGC', 'CGG', 'CGT', 'GTA', 'GTC', 'GTG', 'GTTGTT', 'GCA', 'GCC', 'GCG', 'GCT', 'GAC', 'GAT', 'GAA', 'GAG', 'GGA', 'GGC', 'GGG', 'GGT', 'TCA', 'TCC', 'TCG', 'TCTTCT', 'TTCTTC', 'TTTTTT', 'TTATTA', 'TTGTTG', 'TAC', 'TATTAT', 'TAA', 'TAG', 'TGC', 'TGTTGT', 'TGA', 'TGG']
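
Here every codon is kept, but the ones with at least two ```'T'``` characters appear doubled. Assuming that is the rule, an ```if ... else``` placed before the ```for``` chooses between two expressions for each element. As before, the subset of `gencode` and the name `doubled` are only for illustration.

```python
# Illustrative subset of the gencode table above.
gencode = {'ATA': 'I', 'ATT': 'I', 'CTT': 'L', 'GCA': 'A', 'TTT': 'F', 'TGA': '_'}

# [value_if_true if condition else value_if_false for item in iterable]
# Every element is kept; the condition only decides how it is transformed.
doubled = [codon * 2 if codon.count('T') >= 2 else codon for codon in gencode]
print(doubled)  # ['ATA', 'ATTATT', 'CTTCTT', 'GCA', 'TTTTTT', 'TGA']
```

Note the difference in placement: a trailing ```if``` filters elements out, while an ```if ... else``` before the ```for``` keeps the list the same length.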
