# Advanced Python Course 
## Mobi Heidelberg 2019
### by Christian Fufezan 

christian@fufezan.net

https://fufezan.net

<img src="./imgs/cc.png" alt="drawing" width="200" style="float: left;"/>


In [32]:
# %load topics.py
import pandas as pd
import psutil

pd.set_option("display.max_colwidth" , 300)

df_high_level = pd.DataFrame(
    data=[
        {'day': 'Monday', 'Topic': 'Check-In, recaps and functions'},
        {'day': 'Tuesday', 'Topic': 'Coding philosophy, data flow and some more useful std modules'},
        {'day': 'Wednesday', 'Topic': 'Test driven development, python module, sphinx'},
        {'day': 'Thursday', 'Topic': 'OOP - Object oriented programming'},
        {'day': 'Friday', 'Topic': 'Q&A and code clean up'},
        {'day': '', 'Topic': ''},
        {'day': 'Monday', 'Topic': ''},
        {'day': 'Tuesday', 'Topic': ''},
        {'day': 'Wednesday', 'Topic': ''},
        {'day': 'Thursday', 'Topic': ''},
        {'day': 'Friday', 'Topic': 'Q&A and Tutorium'},


    ]
)

df_details = pd.DataFrame(
    data=[
        {'day': 1, 'Topic': 'Check-in'},
        {'day': 1, 'Topic': 'Procedural stuff'},
        {'day': 1, 'Topic': "python basic in 5'"},
        {'day': 1, 'Topic': 'lists and generators'},
        {'day': 1, 'Topic': 'bisect module'},
        # ----------------------------
        {'day': 2, 'Topic': 'functions'},
        {'day': 2, 'Topic': 'csv module'},
        {'day': 2, 'Topic': 'Exercises'},
        {'day': 2, 'Topic': 'Zen of Python and general coding philosophy'},
        {'day': 2, 'Topic': 'basic plotting with plotly'},
        {'day': 2, 'Topic': "String format"},
        {'day': 2, 'Topic': 'dicts'},
        {'day': 2, 'Topic': 'collections module'},
        {'day': 2, 'Topic': 'itertools'},
        {'day': 2, 'Topic': 'data flow'},
        # -----------------------------
        {'day': 3, 'Topic': "Basic Python package"},
        {'day': 3, 'Topic': "Test Driven development"},
        {'day': 3, 'Topic': "Auto documentation with Sphinx"},
        # -----------------------------
        {'day': 4, 'Topic': "OOP"},
    ]
)


def display_topics(day=1, df=None):
    if df is None:
        df = df_details
    return df[df['day'] == day][['day', 'Topic']].head(20)


# Advanced Python - day 4
by Christian Fufezan

# Overview

In [33]:
display_topics(day=4)

| | day | Topic |
|---|---|---|
| 18 | 4 | OOP |


In the first three days we wrote functions that encapsulate our code and wrangle the data passed into them.

This is termed [procedural programming](https://en.wikipedia.org/wiki/Procedural_programming)  

Python (like Java, C++, ...) is an object-oriented programming language. Object orientation bundles data together with the behaviour that belongs to it, which often gives programs a more natural structure.

For example, instead of storing x, y, z coordinates in lists and then using these lists to calculate distances between two points in space, which could look like:
``` python
def calc_difference(x_coordinates, y_coordinates, z_coordinates, index1=0, index2=1):
    ...
    return distance
```

it would be much more convenient to be able to write:
``` python
a = Point(x1, y1, z1)
b = Point(x2, y2, z2)
difference = a - b
```
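
To make this concrete, here is a minimal sketch of what such a `Point` class could look like. The class itself, its attribute names and the decision that `a - b` returns the Euclidean distance are assumptions made for illustration; the mechanics (`__init__`, magic functions) are explained further below.

```python
import math


class Point:
    """Hypothetical 3D point, for illustration only."""

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

    def __sub__(self, other):
        # Assumption: "a - b" should give the Euclidean distance between both points
        return math.sqrt(
            (self.x - other.x) ** 2
            + (self.y - other.y) ** 2
            + (self.z - other.z) ** 2
        )


a = Point(0, 0, 0)
b = Point(3, 4, 0)
difference = a - b
print(difference)  # 5.0
```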

Another view of object oriented programming is that we attach functions to a customized data container and define how this data container behaves.

**Classes** are the blueprints of such customized data containers, and **objects** are initialized instances of a class. One can spawn many objects from one class, each of which will be unique.

In procedural programming, we used the terms functions and variables. In order to avoid confusion, functions that are associated with classes/objects are called **methods**, and variables are called **attributes** or **properties**.

In [38]:
# classes are declared with the class keyword; class names use CapWords
# quick reminder - PEP8! https://www.python.org/dev/peps/pep-0008/#class-names
class Sequence(object):
    def aa_distribution(self):
        return "Not implemented yet"
        # should raise NotImplementedError('!')

s1 = Sequence()
s1.aa_distribution()


'Not implemented yet'

The *object* in the brackets refers to the parent class from which our Sequence class inherits its properties. Here object is not necessary, since in Python 3 all classes inherit from object anyway.

We defined a new method **aa_distribution** which takes one argument, **self**. This is always the case for functions (methods!) associated with objects. Think of it as passing the actual data container into our "function".
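
The role of **self** can be sketched as follows: calling a method on an object is equivalent to calling the function on the class and passing the object in by hand. The method name `describe` is a hypothetical one, chosen only for this illustration.

```python
class Sequence:
    def describe(self):
        # self is the object the method was called on
        return f"I am the object with id {id(self)}"


s1 = Sequence()

# Calling the method on the object ...
via_object = s1.describe()
# ... is equivalent to calling it on the class and passing the object as self
via_class = Sequence.describe(s1)

print(via_object == via_class)  # True
```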

## \__init__
One very important method is **__init__**, as it is called when a new instance is initialized.

Note: methods whose names start and end with two underscores have special meanings in Python. To avoid collisions, do not invent your own names of this form. They are called **magic functions** (or magic methods).

In [39]:
from collections import Counter

class Sequence(object):
    def __init__(self, sequence):
        self.sequence = sequence
        # we store the sequence that is used to 
        # initialize this object into self.sequence
    
    def aa_distribution(self):
        return Counter(self.sequence) 
  
s1 = Sequence("WHEREISELVIS")
s1.aa_distribution()

Counter({'W': 1, 'H': 1, 'E': 3, 'R': 1, 'I': 2, 'S': 2, 'L': 1, 'V': 1})

Classes and their objects can each have methods and attributes. 

Class attributes can be used like:

In [40]:
from collections import Counter

class Sequence(object):
    
    total_initialized_sequence = 0
    
    def __init__(self, sequence):
        self.sequence = sequence
        Sequence.total_initialized_sequence += 1
        # every time a Sequence object is initialized, we increase
        # the counter of the class attribute
    
    def aa_distribution(self):
        return Counter(self.sequence)

for _ in range(13):
    s1 = Sequence("AACCEE")

Sequence.total_initialized_sequence
# ^-- note: we are referring to the actual class Sequence and not the instance s1

13
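
Why does the counter have to be increased via `Sequence.` and not via `self.`? A small sketch of the pitfall: `self.x += 1` reads the class attribute but stores the result as a new attribute on the instance, so the class attribute itself never changes.

```python
class Sequence:
    total_initialized_sequence = 0

    def __init__(self, sequence):
        self.sequence = sequence
        # Reads the class attribute, adds 1 and stores the result
        # as a NEW attribute on this instance only
        self.total_initialized_sequence += 1


for _ in range(13):
    s1 = Sequence("AACCEE")

print(Sequence.total_initialized_sequence)  # 0, the class attribute never changed
print(s1.total_initialized_sequence)        # 1, an instance attribute shadows it
```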

Methods that belong to the class are assigned by using the `@classmethod` decorator:

In [41]:
from collections import Counter

class Sequence(object):
    
    total_initialized_sequence = 0
    
    def __init__(self, sequence):
        self.sequence = sequence
        Sequence.total_initialized_sequence += 1
        # every time a Sequence object is initialized, we increase
        # the counter of the class attribute
    
    def aa_distribution(self):
        return Counter(self.sequence)

    @classmethod
    def class_status(cls):
        print(f"We have initialized {cls.total_initialized_sequence} sequences")
        
for _ in range(3):
    s1 = Sequence("AACCEE") 

Sequence.class_status()

We have initialized 3 sequences


For the sake of readability, the first argument of a class method is named **cls**, not **self**.

In [7]:
# note: each instance has this method as well, which is why defining classmethods
#       is IMHO not so useful ..
s1.class_status()

We have initialized 3 sequences
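
One place where class methods are nevertheless commonly seen is as alternative constructors. The sketch below is only an illustration: the method `from_fasta_string`, the subclass `ProteinSequence` and the simplified FASTA handling are all assumptions, not part of the course code.

```python
class Sequence:
    def __init__(self, sequence):
        self.sequence = sequence

    @classmethod
    def from_fasta_string(cls, fasta):
        # hypothetical helper: drop header lines (starting with ">") and join the rest
        lines = [line.strip() for line in fasta.splitlines()]
        sequence = "".join(
            line for line in lines if line and not line.startswith(">")
        )
        # cls is the class the method was called on, so subclasses get their own type
        return cls(sequence)


class ProteinSequence(Sequence):
    pass


s1 = ProteinSequence.from_fasta_string(">sp|example\nELVIS\nLIVES\n")
print(type(s1).__name__, s1.sequence)  # ProteinSequence ELVISLIVES
```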


# more fun with class attributes

In [10]:
from collections import Counter

class Sequence(object):
    
    all_intialized_sequences = []
    
    def __init__(self, sequence):
        self.sequence = sequence
        # we collect all initialized sequences in the class
        self.all_intialized_sequences.append(self)
            
for _ in range(3):
    s1 = Sequence("AACCEE")

Sequence.all_intialized_sequences

[<__main__.Sequence at 0x118097d10>,
 <__main__.Sequence at 0x118097550>,
 <__main__.Sequence at 0x118097e50>]

## more magic functions

## \__str__
making the object more descriptive

In [42]:
class Sequence(object):
    def __init__(self, sequence):
        self.sequence = sequence

    def __str__(self):
        return "Sequence class mobi-HD, length {0}, id {1}".format(
            len(self.sequence),
            id(self)
        )

In [43]:
s1 = Sequence("ELVISLIVES")
print(s1)

Sequence class mobi-HD, length 10, id 4359224720
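
A closely related magic function is `__repr__`. As a minimal sketch (the exact strings returned are arbitrary choices): `__str__` is used by `print()` and `str()`, whereas `__repr__` is used by `repr()`, by the interactive prompt and when the object is shown inside a container such as a list.

```python
class Sequence:
    def __init__(self, sequence):
        self.sequence = sequence

    def __str__(self):
        # used by print() and str()
        return "Sequence of length {0}".format(len(self.sequence))

    def __repr__(self):
        # used by repr(), the interactive prompt and inside containers
        return "Sequence({0!r})".format(self.sequence)


s1 = Sequence("ELVISLIVES")
print(s1)    # Sequence of length 10
print([s1])  # [Sequence('ELVISLIVES')]
```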


## \__add__
allowing objects to be added

In [52]:
class Sequence(object):
    def __init__(self, sequence):
        self.sequence = sequence
        
    def __str__(self):
        return """Sequence class mobi-HD, length {0}, id {1}, {2}""".format(
            len(self.sequence),
            id(self),
            self.sequence
        )

    def add(self, other):
        return self + other  
    
    def __add__(self, other):
        new_sequence_obj = Sequence(self.sequence + other.sequence)
        return new_sequence_obj
  

In [71]:
s1 = Sequence("ELVIS")
s2 = Sequence("LIVES")
s3 = s1 + s2
print(s1)
print(s2)
print(s3)
s1 += s2
print(s1.add(s2))


Sequence class mobi-HD, length 5, id 4360087504, ELVIS
Sequence class mobi-HD, length 5, id 4360034448, LIVES
Sequence class mobi-HD, length 10, id 4360088528, ELVISLIVES
Sequence class mobi-HD, length 15, id 4360086864, ELVISLIVESLIVES
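
Two details of the example above can be sketched separately. First, since no `__iadd__` is defined, `s1 += s2` falls back to `__add__` and rebinds the name `s1` to a new object. Second, returning `NotImplemented` for unsupported operands (an addition to the class above, shown here as one possible design) lets Python raise a `TypeError`.

```python
class Sequence:
    def __init__(self, sequence):
        self.sequence = sequence

    def __add__(self, other):
        if not isinstance(other, Sequence):
            # tells Python that this combination is unsupported -> TypeError
            return NotImplemented
        return Sequence(self.sequence + other.sequence)


s1 = Sequence("ELVIS")
s2 = Sequence("LIVES")
original_s1 = s1

s1 += s2  # no __iadd__ defined, so Python evaluates s1 = s1 + s2
print(s1.sequence)           # ELVISLIVES
print(original_s1.sequence)  # ELVIS, the original object is unchanged
print(s1 is original_s1)     # False, s1 now names a new object

try:
    s1 + "LIVES"
except TypeError as error:
    print("TypeError:", error)
```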


# Comparisons
Often we want to sort objects stored in a list or check for equality.
But what does it mean that sequence_1 \< sequence_2 or sequence_1 == sequence_2 ?

In [72]:
# The answer is that we need to define the magic functions that Python calls internally
# in order to evaluate equality or to sort. The minimum is __eq__ and __lt__, respectively.

class Sequence(object):
    def __init__(self, sequence):
        self.sequence = sequence
        
    def __str__(self):
        return self.sequence
    
    def __eq__(self, other):
        return self.sequence == other.sequence
    
    def __lt__(self, other):
        # return True if self < other
        # I chose sequence length, but it could equally be anything
        # one can compute for both sequences ...
        return len(self.sequence) < len(other.sequence)

In [100]:
s1 = Sequence("ELVISLIVES")
s2 = Sequence("ELVISLIVES")
s3 = Sequence("ELVISISDEAD")
 
print("is s1 == s2 ?", s1 == s2)
print("is s1 != s3 ?", s1 != s3)

for sequence in sorted([s2, s3, s1], reverse=True):
    print(sequence)

is s1 == s2 ? True
is s1 != s3 ? True
ELVISISDEAD
ELVISLIVES
ELVISLIVES


Minimum set of magic functions that enables equality and sorting:  
\__eq__ for equality (and inequality)  
\__lt__ for sorting (less than)

For more possibilities see the [Python documentation](https://docs.python.org/3/reference/datamodel.html#object.__lt__) (no need to know all of those for the exam).
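
If the remaining comparison operators (`<=`, `>`, `>=`) are needed as well, the standard library can derive them from `__eq__` and `__lt__`. A minimal sketch using `functools.total_ordering`, with the same "compare by length" choice as above:

```python
from functools import total_ordering


@total_ordering
class Sequence:
    def __init__(self, sequence):
        self.sequence = sequence

    def __eq__(self, other):
        return self.sequence == other.sequence

    def __lt__(self, other):
        # same choice as above: compare by sequence length
        return len(self.sequence) < len(other.sequence)


s1 = Sequence("ELVISLIVES")
s3 = Sequence("ELVISISDEAD")

print(s1 < s3)   # True, defined by us
print(s1 <= s3)  # True, derived by total_ordering
print(s1 >= s3)  # False, derived by total_ordering
```

Note that a class which defines `__eq__` without also defining `__hash__` is no longer hashable, so its objects cannot be used as dict keys or set members.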

# Make our class iterable

In [90]:

class Sequence(object):
    def __init__(self, sequence):
        self.sequence = sequence
        self._current_iter_state = 0
        
    def __iter__(self):
        self._current_iter_state = 0
        return self
    
    def __next__(self):
        
        if self._current_iter_state < len(self.sequence):
            current_aa = self.sequence[self._current_iter_state]
            self._current_iter_state += 1
            return current_aa
        raise StopIteration


In [91]:
s3 = Sequence("ELVISISDEAD")
for aa in s3:
    print(aa)
# next(s3)

E
L
V
I
S
I
S
D
E
A
D


In [94]:
for aa in s3:
    print(aa)

E
L
V
I
S
I
S
D
E
A
D
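
One caveat of returning `self` from `__iter__` is that all loops over the same object share one iteration state: with the class above, a nested loop over `s3` stops after the first outer element, because the inner loop resets and then exhausts the shared counter. A possible alternative, sketched here, is to write `__iter__` as a generator so that every loop gets its own independent iterator.

```python
class Sequence:
    def __init__(self, sequence):
        self.sequence = sequence

    def __iter__(self):
        # a generator function: every call creates a fresh, independent iterator
        for aa in self.sequence:
            yield aa


s3 = Sequence("ELV")
pairs = [(first, second) for first in s3 for second in s3]
print(len(pairs))  # 9, all 3 x 3 combinations
print(pairs[:3])   # [('E', 'E'), ('E', 'L'), ('E', 'V')]
```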


Again, what does it mean to iterate over our object? Well, it is up to us to decide. Why not iterate over it using a sliding window ...

In [95]:
from collections import deque
class Sequence(object):
    def __init__(self, sequence):
        self.sequence = sequence   
        
    def __iter__(self):
        self._current_iter_state = 0
        self._sliding_window = deque([], maxlen=3)
        return self
    
    def __next__(self):
        if self._current_iter_state < len(self.sequence):
            current_aa = self.sequence[self._current_iter_state]
            self._sliding_window.append(current_aa)
            self._current_iter_state += 1
            return self._sliding_window
        raise StopIteration

In [97]:
s3 = Sequence("ELVISISDEAD")
for aa in s3:
    print(aa)

deque(['E'], maxlen=3)
deque(['E', 'L'], maxlen=3)
deque(['E', 'L', 'V'], maxlen=3)
deque(['L', 'V', 'I'], maxlen=3)
deque(['V', 'I', 'S'], maxlen=3)
deque(['I', 'S', 'I'], maxlen=3)
deque(['S', 'I', 'S'], maxlen=3)
deque(['I', 'S', 'D'], maxlen=3)
deque(['S', 'D', 'E'], maxlen=3)
deque(['D', 'E', 'A'], maxlen=3)
deque(['E', 'A', 'D'], maxlen=3)
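
Note that the iterator above hands out the same `deque` object on every step, so collecting the windows with `list(s3)` would give a list of references to one and the same (final) window. A sketch of one way around this, yielding an immutable tuple snapshot per step; the generator form is a design choice for brevity, not the only option.

```python
from collections import deque


class Sequence:
    def __init__(self, sequence):
        self.sequence = sequence

    def __iter__(self):
        sliding_window = deque([], maxlen=3)
        for aa in self.sequence:
            sliding_window.append(aa)
            # yield an immutable snapshot instead of the deque itself
            yield tuple(sliding_window)


s3 = Sequence("ELVIS")
windows = list(s3)
print(windows)
# [('E',), ('E', 'L'), ('E', 'L', 'V'), ('L', 'V', 'I'), ('V', 'I', 'S')]
```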


## on demand - lazy loading et al. 

In [102]:
from collections import Counter

class Sequence(object):
    def __init__(self, sequence):
        self.sequence = sequence
        
    def aa_distribution(self):
        return Counter(self.sequence)
        # Problem is that we calculate aa_distribution every time
        # this method is called 


In [103]:
s1 = Sequence("ACGHCNASOINDQIEODHASDJALSKDJASDJ" * 100)

In [104]:
%timeit s1.aa_distribution()

166 ns ± 0.772 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


To be fast, let's calculate the distribution on demand, and only if we have not done so before ...

In [18]:
from collections import Counter

class Sequence(object):
    def __init__(self, sequence):
        self.sequence = sequence
        self._aa_distribution = None
    
    def aa_distribution(self):
        if self._aa_distribution is None:
            self._aa_distribution = Counter(self.sequence)
        return self._aa_distribution


In [16]:
s1 = Sequence("ACGHCNASOINDQIEODHASDJALSKDJASDJ" * 100)

In [20]:
%timeit s1.aa_distribution()

156 ns ± 2.18 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


## accessing properties that do calculations on demand 

In [107]:
# Very slow: the Counter is built on every initialization, even if it is never needed ...

class Sequence(object):
    def __init__(self, sequence):
        self.sequence = sequence
        self.aa_distribution = Counter(self.sequence)

In [108]:
# on demand calculation, and if so then only once :)
class Sequence(object):
    def __init__(self, sequence):
        self.sequence = sequence
        self._aa_distribution = None
        
    @property
    def aa_distribution(self):
        if self._aa_distribution is None:
            self._aa_distribution = Counter(self.sequence)
        return self._aa_distribution

In [110]:
# s1 = Sequence("ACGHCNASOINDQIEODHASDJALSKDJASDJ" * 100)
# s1.aa_distribution

In [95]:
%timeit s1.aa_distribution

158 ns ± 8.68 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
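
The "compute once, on first access" pattern is also available ready-made in the standard library. A minimal sketch using `functools.cached_property`, assuming Python 3.8 or newer; the attribute `number_of_calculations` is a hypothetical counter added only to make the caching visible.

```python
from collections import Counter
from functools import cached_property  # assumption: Python 3.8 or newer


class Sequence:
    def __init__(self, sequence):
        self.sequence = sequence
        self.number_of_calculations = 0  # hypothetical counter, for illustration

    @cached_property
    def aa_distribution(self):
        self.number_of_calculations += 1
        return Counter(self.sequence)


s1 = Sequence("ACGHCNASOINDQIEODHASDJALSKDJASDJ" * 100)
s1.aa_distribution
s1.aa_distribution
print(s1.number_of_calculations)  # 1, the Counter was built only once
```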


# Object inheritance
Another major advantage of OOP is that blueprint properties can be inherited, thus reducing code duplication.

Using inheritance can, however, also lead to complex data / class structures. Follow the Zen of Python! Not every method needs to have its own subclass.

Note: If a subclass defines its own constructor (\__init__), the parent's constructor is not called automatically! Call it explicitly via super().\__init__(...).

In [23]:
class SequenceBaseClass:
    def __init__(self, sequence):
        self.sequence = sequence

    def __add__(self, other):
        # type(self) keeps the result in the subclass, e.g. Sequence + Sequence -> Sequence
        new_sequence_obj = type(self)(self.sequence + other.sequence)
        return new_sequence_obj        

    def __len__(self):
        raise NotImplementedError(
            "If you inherit from SequenceBaseClass, "
            "you must define len yourself"
            # Note: you can split strings to make it more readable
        )
    
class Sequence(SequenceBaseClass):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
    
    def __len__(self):
        return len(self.sequence)
    
    
s1 = Sequence("SIRIFINDELVIS")

In [24]:
print(s1.sequence)
print(len(s1))

SIRIFINDELVIS
13
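
What happens when the parent's constructor is not called can be sketched with a deliberately broken subclass. `ForgetfulSequence` is a hypothetical name used only here.

```python
class SequenceBaseClass:
    def __init__(self, sequence):
        self.sequence = sequence


class ForgetfulSequence(SequenceBaseClass):
    # hypothetical subclass that overrides __init__ without calling the parent
    def __init__(self, sequence):
        self.name = "forgetful"


s1 = ForgetfulSequence("SIRIFINDELVIS")
print(hasattr(s1, "name"))      # True
print(hasattr(s1, "sequence"))  # False, SequenceBaseClass.__init__ never ran
```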


# Finally,
We can check if an object is based on a given class by using:

In [25]:
isinstance(s1, Sequence)

True
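
`isinstance` also takes inheritance into account, which a direct comparison of types does not. A short sketch with empty stand-in classes:

```python
class SequenceBaseClass:
    pass


class Sequence(SequenceBaseClass):
    pass


s1 = Sequence()
print(isinstance(s1, Sequence))                 # True
print(isinstance(s1, SequenceBaseClass))        # True, parent classes count as well
print(type(s1) is SequenceBaseClass)            # False, type() gives the exact class
print(issubclass(Sequence, SequenceBaseClass))  # True
```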

# Exercise No. 5

Rewrite your code from the earlier days around a *Sequence* class that has a property called *pi*, which returns the pi of the sequence.

Create a list of objects, one for each sequence, and sort the list by the property pi.


# Exercise No. 6*
Can you think of a way to turn a relatively rigid property such as *average_hydropathy_index* into a method like calculate_property(type='hydropathy', sliding_window_size=5), where type can be e.g. accessible surface (as) or hydropathy?

# Help - how to sort a list of objects based on some property ?


In [111]:
class Sequence(object):
    def __init__(self, sequence):
        self.sequence = sequence
        self._aa_distribution = None

    def __repr__(self):
        return self.sequence
    
    @property
    def number_of_alanine(self):
        return self.sequence.count('A')

In [112]:
import random

random_seqs = []
for i in range(20):
    r_seq = ''.join(random.choices('A_.,', k=random.randint(10,50)))
    random_seqs.append(Sequence(r_seq))

In [113]:
random_seqs

[AAA.,A_.,_._.,.,,A,_.AAA_A_A,A,
 A__AAA.,.,._,A__,,,_AA_....,_A,.A,_.._A,A,.,
 ,.._A..,._,.A_,_A__...,..,.._,...A.A.A._,.,._,
 ,.,A._,,...,
 .,AA__,._,___A_,_..,,,,.,,_.,
 AAA,AAA,_.,A,._,_.___.__,A___AA.,AA,A_,._.A,.,,A,
 ,A__.,AA.A,,AAA.,A,___A__..__.A__.,
 AA,,._,_AA._.,_A.,,_,,,.,,
 AA_A...,.A,
 .,__.,._A,,A_,_.A__.,A...,,_,
 A,AA.___,A,___A_._.,___,,.,A,_.___,AA.,____,
 _A,,,,,._,,_,.A_,A,.,,..AA_.A,_,,,A.__.AA,A,A,,
 _,,,,__,,_,AA_A_,
 _,,_.,,A___.A_A.,,A_A._.,AA_.,
 A,_A_,A.__,_.,
 _,,,A,__.AA..A,
 ..A,._..A_A_.,
 ,_.AAAA_._..,A_,__,A,.A.,
 ..A,A,.____,_AA.A_...,,.AA__.,.A.,.,.,_.,
 A.,._AA,_AA,,..._,,__._]

In [114]:
sorted(random_seqs, key=lambda x: x.number_of_alanine)

[,.,A._,,...,
 .,AA__,._,___A_,_..,,,,.,,_.,
 _,,,,__,,_,AA_A_,
 A,_A_,A.__,_.,
 ..A,._..A_A_.,
 AA_A...,.A,
 .,__.,._A,,A_,_.A__.,A...,,_,
 _,,,A,__.AA..A,
 AA,,._,_AA._.,_A.,,_,,,.,,
 A.,._AA,_AA,,..._,,__._,
 ,.._A..,._,.A_,_A__...,..,.._,...A.A.A._,.,._,
 _,,_.,,A___.A_A.,,A_A._.,AA_.,
 ,_.AAAA_._..,A_,__,A,.A.,
 A,AA.___,A,___A_._.,___,,.,A,_.___,AA.,____,
 ..A,A,.____,_AA.A_...,,.AA__.,.A.,.,.,_.,
 ,A__.,AA.A,,AAA.,A,___A__..__.A__.,
 AAA.,A_.,_._.,.,,A,_.AAA_A_A,A,
 A__AAA.,.,._,A__,,,_AA_....,_A,.A,_.._A,A,.,
 _A,,,,,._,,_,.A_,A,.,,..AA_.A,_,,,A.__.AA,A,A,,
 AAA,AAA,_.,A,._,_.___.__,A___AA.,AA,A_,._.A,.,,A]
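
As an alternative to the lambda, the key can also be built with `operator.attrgetter`, and a list can be sorted in place with `list.sort`. A small sketch with three made-up sequences:

```python
from operator import attrgetter


class Sequence:
    def __init__(self, sequence):
        self.sequence = sequence

    def __repr__(self):
        return self.sequence

    @property
    def number_of_alanine(self):
        return self.sequence.count('A')


# made-up example sequences with 3, 0 and 4 alanines
seqs = [Sequence("AAGA"), Sequence("GGG"), Sequence("AGAGAGA")]

# sorted() returns a new list, the original list stays untouched
ascending = sorted(seqs, key=attrgetter("number_of_alanine"))
print(ascending)  # [GGG, AAGA, AGAGAGA]

# list.sort() sorts in place; reverse=True gives descending order
seqs.sort(key=attrgetter("number_of_alanine"), reverse=True)
print(seqs)  # [AGAGAGA, AAGA, GGG]
```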