In [1]:
import course;course.header()

# Advanced Python Course 
## Mobi Heidelberg WS 2021/22
### by Christian Fufezan 

christian@fufezan.net

https://fufezan.net

<img src="./images/cc.png" alt="drawing" width="200" style="float: left;"/>


# The csv module


There are several ways to interact with files that contain data in a "comma separated value" format.

We cover the [basic csv module](https://docs.python.org/3/library/csv.html) because it reads a file row by row. This makes it easy to keep only the fraction of a csv's information you actually need and to avoid running out of memory on large files.
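A minimal sketch of that idea. It assumes a small in-memory CSV (via `io.StringIO`) as a hypothetical stand-in for the course's data file. `DictReader` yields one row at a time, so only the two columns we keep end up in memory.

```python
import csv
import io

# Hypothetical stand-in for ../data/amino_acid_properties.csv (only a few columns)
raw = io.StringIO(
    "Name,3-letter code,1-letter code,Molecular Weight\n"
    "Alanine,Ala,A,89.1\n"
    "Arginine,Arg,R,174.2\n"
)

# Keep only the 1-letter code and the weight; each row dict is discarded after use
weights = {}
for row in csv.DictReader(raw):
    weights[row["1-letter code"]] = float(row["Molecular Weight"])

print(weights)  # {'A': 89.1, 'R': 174.2}
```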

In [2]:
import csv

with open("../data/amino_acid_properties.csv") as aap:
    aap_reader = csv.DictReader(aap, delimiter=",") 
    for line_dict in aap_reader:
        print(line_dict)
        break

{'Name': 'Alanine', '3-letter code': 'Ala', '1-letter code': 'A', 'Molecular Weight': '89.1', 'Molecular Formula': 'C3H7NO2', 'Residue Formula': 'C3H5NO', 'Residue Weight': '71.08', 'pka1': '2.34', 'pka2': '9.69', 'pkaX': '', 'pI': '6.0', 'hydropathy index (Kyte-Doolittle method)': '1.8', 'Accessible surface': '44.1', 'hp_type': 'neutral', 'hp_color': 'green', 'pk-state': 'neutral'}


`print` output is not always very readable; use pretty print instead! :)

In [3]:
import pprint
pprint.pprint(line_dict)

{'1-letter code': 'A',
 '3-letter code': 'Ala',
 'Accessible surface': '44.1',
 'Molecular Formula': 'C3H7NO2',
 'Molecular Weight': '89.1',
 'Name': 'Alanine',
 'Residue Formula': 'C3H5NO',
 'Residue Weight': '71.08',
 'hp_color': 'green',
 'hp_type': 'neutral',
 'hydropathy index (Kyte-Doolittle method)': '1.8',
 'pI': '6.0',
 'pk-state': 'neutral',
 'pka1': '2.34',
 'pka2': '9.69',
 'pkaX': ''}
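`pprint` sorts dictionary keys by default, which is why the order above differs from the CSV header. Since Python 3.8, `sort_dicts=False` keeps the insertion order. A short sketch with a three-key subset of the row (the `width=30` is only chosen to force one key per line):

```python
import pprint

row = {"Name": "Alanine", "3-letter code": "Ala", "1-letter code": "A"}

sorted_view = pprint.pformat(row, width=30)                        # keys sorted alphabetically
insertion_view = pprint.pformat(row, width=30, sort_dicts=False)   # keys in insertion order

print(sorted_view)
print(insertion_view)
```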


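The hydropathy index reflects the energy released or required to transfer the amino acid from water to a hydrophobic environment: the more positive the value, the more hydrophobic the residue. The two extremes of the Kyte-Doolittle scale are:

 - Ile: +4.5
 - Arg: -4.5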
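Note that the csv module returns every value as a string, so the hydropathy column has to be converted before it can be compared numerically. A sketch on a hypothetical three-row excerpt (Ala's 1.8 is from the file above; the other two are the extremes just listed):

```python
import csv
import io

# Hypothetical three-row excerpt with Kyte-Doolittle values
raw = io.StringIO(
    "Name,hydropathy index (Kyte-Doolittle method)\n"
    "Alanine,1.8\n"
    "Isoleucine,4.5\n"
    "Arginine,-4.5\n"
)
column = "hydropathy index (Kyte-Doolittle method)"

# Convert the string values to float before comparing
hydropathy = {row["Name"]: float(row[column]) for row in csv.DictReader(raw)}

most_hydrophobic = max(hydropathy, key=hydropathy.get)
most_hydrophilic = min(hydropathy, key=hydropathy.get)
print(most_hydrophobic, most_hydrophilic)  # Isoleucine Arginine
```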

We can also use the csv module to write csv files, or tab-separated value files if we change the delimiter to `"\t"`.
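A minimal sketch of writing tab-separated values. An `io.StringIO` buffer stands in for a real file so the result can be inspected; `delimiter` is passed through `DictWriter` to the underlying writer.

```python
import csv
import io

buffer = io.StringIO()  # stands in for a file opened with open(path, "w", newline="")
writer = csv.DictWriter(buffer, fieldnames=["Name", "3-letter code"], delimiter="\t")
writer.writeheader()
writer.writerow({"Name": "Alanine", "3-letter code": "Ala"})

tsv_text = buffer.getvalue()
print(repr(tsv_text))  # 'Name\t3-letter code\r\nAlanine\tAla\r\n'
```

The `\r\n` line endings come from the default `excel` dialect. When writing to a real file, the csv documentation recommends opening it with `newline=""`; otherwise extra blank lines can appear on Windows.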

In [4]:
with open("../data/test.csv", "w", newline="") as output:
    aap_writer = csv.DictWriter(output, fieldnames=["Name", "3-letter code"]) # '1-letter code' is not among the fieldnames
    aap_writer.writeheader()
    aap_writer.writerow({"Name": "Alanine", "3-letter code": "Ala", "1-letter code": "A"})

ValueError: dict contains fields not in fieldnames: '1-letter code'

In [5]:
!cat ../data/test.csv

The command "cat" is either misspelled or
could not be found.


## Fix it!

In [6]:
# fix it
with open("../data/test.csv", "w", newline="") as output:
    aap_writer = csv.DictWriter(output, fieldnames=["Name", "3-letter code"], extrasaction="ignore")
    aap_writer.writeheader()
    aap_writer.writerow({"Name": "Alanine", "3-letter code": "Ala", "1-letter code": "A"}) 
    # the entry for '1-letter code' is silently ignored
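The reverse problem is a row that lacks some of the fieldnames. `DictWriter` fills the gaps with `restval`, which defaults to an empty string. A sketch writing to an in-memory buffer (the `"NA"` placeholder and the Glycine row are illustrative choices):

```python
import csv
import io

buffer = io.StringIO()
writer = csv.DictWriter(
    buffer,
    fieldnames=["Name", "3-letter code", "1-letter code"],
    restval="NA",  # written for every fieldname missing from the row dict
)
writer.writeheader()
writer.writerow({"Name": "Glycine"})

print(buffer.getvalue())
```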

# Collections - high performance containers ... sorta

## [collections.Counter](https://docs.python.org/3.7/library/collections.html#counter-objects)
A counter tool is provided to support convenient and rapid tallies, for example of the amino acids in a protein sequence:

In [7]:
from collections import Counter
s = """
MQRLMMLLATSGACLGLLAVAAVAAAGANPAQRDTHSLLPTHRRQKRDWIWNQMHIDEEK
NTSLPHHVGKIKSSVSRKNAKYLLKGEYVGKVFRVDAETGDVFAIERLDRENISEYHLTA
VIVDKDTGENLETPSSFTIKVHDVNDNWPVFTHRLFNASVPESSAVGTSVISVTAVDADD
PTVGDHASVMYQILKGKEYFAIDNSGRIITITKSLDREKQARYEIVVEARDAQGLRGDSG
TATVLVTLQDINDNFPFFTQTKYTFVVPEDTRVGTSVGSLFVEDPDEPQNRMTKYSILRG
DYQDAFTIETNPAHNEGIIKPMKPLDYEYIQQYSFIVEATDPTIDL RYMSPPAGNRAQVI
"""
Counter(s)

Counter({'\n': 7,
         'M': 8,
         'Q': 14,
         'R': 20,
         'L': 24,
         'A': 29,
         'T': 28,
         'S': 23,
         'G': 20,
         'C': 1,
         'V': 31,
         'N': 16,
         'P': 17,
         'D': 28,
         'H': 10,
         'K': 18,
         'W': 3,
         'I': 23,
         'E': 21,
         'Y': 13,
         'F': 13,
         ' ': 1})
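The raw string above also counts the newlines (and a stray space), which are not residues. A sketch that strips all whitespace before counting and then uses `most_common` to rank residues (the short sequence is a prefix of the one above):

```python
from collections import Counter

sequence = """
MQRLMMLLAT
SGACLG
"""

# str.split() with no argument splits on any whitespace; joining drops it entirely
residues = Counter("".join(sequence.split()))

print(residues.most_common(2))  # [('L', 4), ('M', 3)]
```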

In [8]:
# Counter objects can be added together
Counter("AABB") + Counter("BBCC")

Counter({'A': 2, 'B': 4, 'C': 2})
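Addition is not the only arithmetic Counters support. A short sketch of subtraction, union and the in-place `subtract` method. Note that the `-` operator silently drops counts that end up zero or negative:

```python
from collections import Counter

a = Counter("AABB")
b = Counter("ABBB")

difference = a - b  # B: 2 - 3 = -1, so B is dropped; only positive counts are kept
union = a | b       # element-wise maximum of the counts
print(difference)   # Counter({'A': 1})
print(union)        # Counter({'B': 3, 'A': 2})

a.subtract(b)       # in-place variant that keeps zero and negative counts
print(a)            # Counter({'A': 1, 'B': -1})
```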

In [9]:
# Works with any hashable objects, e.g. tuples
Counter([(1, 1), (1, 2), (2, 1), (1, 1)])

Counter({(1, 1): 2, (1, 2): 1, (2, 1): 1})
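The requirement is hashability, because the counted elements become dictionary keys. Mutable lists therefore fail, while the same pairs as tuples work:

```python
from collections import Counter

raised = False
try:
    Counter([[1, 1], [1, 2]])  # lists are mutable and therefore unhashable
except TypeError:
    raised = True
print(raised)  # True

# Converting each inner list to a tuple makes the elements hashable
pairs = Counter(tuple(p) for p in [[1, 1], [1, 2], [1, 1]])
print(pairs)  # Counter({(1, 1): 2, (1, 2): 1})
```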

## [collections.deque](https://docs.python.org/3.7/library/collections.html#deque-objects)
A deque (pronounced "deck"), or double-ended queue, supports fast appends and pops at both ends and can be used for many tasks, e.g. building a sliding window.
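A quick sketch of the double-ended operations. With a `maxlen`, adding on one side discards an item from the opposite side, which is exactly what makes the sliding window work:

```python
from collections import deque

d = deque([1, 2, 3])
d.appendleft(0)      # deque([0, 1, 2, 3])
d.append(4)          # deque([0, 1, 2, 3, 4])
left = d.popleft()   # 0
right = d.pop()      # 4
d.rotate(1)          # move the rightmost item to the front: deque([3, 1, 2])

full = deque([1, 2, 3], maxlen=3)
full.appendleft(0)   # full, so 3 is discarded from the right: deque([0, 1, 2], maxlen=3)
```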

In [12]:
from collections import deque
s = """MQRLMMLLATSGACLGLLAVAAVAAAGANPAQRDTHSLLPTHRRQKRDWIWNQMHIDEEKNTSLPHHVGKIKSSVSRKNAKYLLKGEYVGKVFRVDAETGDVFAIERLDRENISEYHLTA"""
window = deque([], maxlen=5)

In [13]:
for pos, aa in enumerate(s): # enumerate yields (position, element) pairs
    window.append(aa)
    print(window)
    if pos > 7:
        break

deque(['M'], maxlen=5)
deque(['M', 'Q'], maxlen=5)
deque(['M', 'Q', 'R'], maxlen=5)
deque(['M', 'Q', 'R', 'L'], maxlen=5)
deque(['M', 'Q', 'R', 'L', 'M'], maxlen=5)
deque(['Q', 'R', 'L', 'M', 'M'], maxlen=5)
deque(['R', 'L', 'M', 'M', 'L'], maxlen=5)
deque(['L', 'M', 'M', 'L', 'L'], maxlen=5)
deque(['M', 'M', 'L', 'L', 'A'], maxlen=5)


In [15]:
Counter(window)

Counter({'M': 2, 'L': 2, 'A': 1})
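Combining the two ideas, here is a sketch of a generator that yields every full window of length `k`, so that each window can be tallied. The name `sliding_windows` is our own, not part of `collections`:

```python
from collections import Counter, deque

def sliding_windows(sequence, k):
    """Yield each full window of length k as a tuple (hypothetical helper)."""
    window = deque(maxlen=k)
    for residue in sequence:
        window.append(residue)  # once full, the oldest residue drops out automatically
        if len(window) == k:
            yield tuple(window)

windows = list(sliding_windows("MQRLMMLL", 5))
print(windows[0])            # ('M', 'Q', 'R', 'L', 'M')
print(Counter(windows[-1]))  # Counter({'L': 3, 'M': 2})
```

Yielding a `tuple` copy matters: yielding the deque itself would hand out the same object, which keeps changing as the loop continues.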

## [collections.defaultdict](https://docs.python.org/3.7/library/collections.html#defaultdict-objects)
Defaultdicts are like dicts, yet they do not raise an error for missing keys. Testing whether a key exists is therefore not necessary, which makes life easier :) Of course, one needs to define the default factory: a callable whose return value is used whenever a key does not exist yet.


I use it a lot for counting 
```python
counter["error"] += 1
```
or collecting elements in lists
```python
sorter["typeA"].append({"name": "John"})
```

No more "let's check whether I have the key and, if not, initialize it first".
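A runnable sketch of the two patterns from above, next to the plain-dict version they replace (the `events` list is made-up example data):

```python
from collections import defaultdict

events = ["error", "ok", "error"]

# Plain dict: every key must be initialised before it is incremented
plain = {}
for event in events:
    if event not in plain:
        plain[event] = 0
    plain[event] += 1

# defaultdict(int): missing keys start at int() == 0
counter = defaultdict(int)
for event in events:
    counter[event] += 1

# defaultdict(list): missing keys start as an empty list
sorter = defaultdict(list)
sorter["typeA"].append({"name": "John"})

print(dict(counter))  # {'error': 2, 'ok': 1}
print(dict(sorter))   # {'typeA': [{'name': 'John'}]}
```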

In [16]:
from collections import defaultdict

ddict_int = defaultdict(int) # sets the type of the default value
#                        ^---- default factory
ddict_list = defaultdict(list)

In [17]:
ddict_int[10] += 10
ddict_int

defaultdict(int, {10: 10})

In [21]:
ddict_int[0] # nothing was stored under key 0, so int() supplies the default 0

0
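One side effect worth knowing: merely reading a missing key inserts it with the default value. To test for a key without creating it, use `in` or `.get()`, neither of which calls the default factory:

```python
from collections import defaultdict

d = defaultdict(int)
d["seen"]               # reading a missing key inserts it with int() == 0
print(dict(d))          # {'seen': 0}

print("other" in d)     # False, membership tests do not insert
print(d.get("other"))   # None, .get() does not call the default factory
print(dict(d))          # still {'seen': 0}
```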

In [25]:
def default_factory_with_prefilled_dictionary():
    return {"__name": "our custom dict", "errors": 0}
ddict_custom = defaultdict(default_factory_with_prefilled_dictionary)

Does that work?

In [26]:
ddict_custom[10] += 10

TypeError: unsupported operand type(s) for +=: 'dict' and 'int'

In [27]:
ddict_custom["what_ever_key"]

{'__name': 'our custom dict', 'errors': 0}

In [28]:
ddict_custom[10]['errors'] += 10

In [29]:
ddict_custom

defaultdict(<function __main__.default_factory_with_prefilled_dictionary()>,
            {10: {'__name': 'our custom dict', 'errors': 10},
             'what_ever_key': {'__name': 'our custom dict', 'errors': 0}})
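Any zero-argument callable works as a default factory, so the named function above could also be written as a `lambda`. A related pattern (a sketch, not part of the course material) nests defaultdicts to count across two levels of keys without initialising either level:

```python
from collections import defaultdict

# Same behaviour as default_factory_with_prefilled_dictionary, written inline
ddict_custom = defaultdict(lambda: {"__name": "our custom dict", "errors": 0})
ddict_custom[10]["errors"] += 10

# Two-level counter: outer key -> inner key -> count
nested = defaultdict(lambda: defaultdict(int))
nested["sample1"]["A"] += 1
nested["sample1"]["A"] += 1
nested["sample2"]["R"] += 1

print(ddict_custom[10])                         # {'__name': 'our custom dict', 'errors': 10}
print({k: dict(v) for k, v in nested.items()})  # {'sample1': {'A': 2}, 'sample2': {'R': 1}}
```

One reason to prefer a named, module-level function as the factory: a defaultdict whose factory is a `lambda` cannot be pickled.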