# Introduction

## Recap

Four (rather packed) sessions:
* Python basics
* Numpy, matplotlib
* Pandas, images
* Editors, Q&A

## Feedback

Python sessions ($n=8$):
* Useful &ndash; very useful
* Level: a bit too advanced
* Intermediate to high applicability
* Open session was not very popular
* Great interest in ML / deep learning
* Future attendance (likely) depends on topics

--> Shorter sessions, classic format, more examples  
New: each session closes with a short "package of the week"

## Schedule (updated)

| Date | Topic
| :--- | :---
| **01.05.** | Object-oriented programming, exception handling & assertions
| **08.05.** | Code organization, package management & virtual environments
| 15.05. | Plotting revisited: plotnine, matplotlib, seaborn, dash/plotly, bokeh
| 22.05. | Recap & practice &ensp; *alternatively, continue already with*:
| ... | Machine learning with Python using scikit-learn <br> Graphs (igraph), single-cell data (scanpy, anndata)

**Shared sessions** (date tbd):
* Building pipelines using snakemake
* Containerization using Docker (tbd)


## This session

* String formatting primer
* Object-oriented programming
* Exception handling & assertions
* Package of the week

# String formatting primer

https://pyformat.info/

https://realpython.com/python-f-strings/

In [1]:
from datetime import datetime

now = datetime.now()
print("Now:", now)

Now: 2020-04-30 21:28:33.148773


In [2]:
# %-formatting
print("Now: %02d:%02d" % (now.hour, now.minute))  # zero-padded, e.g. 09:05

Now: 21:28


In [3]:
# format strings (Python >= 2.6)
print("Now: {n:%H}:{n:%M}".format(n=now))

Now: 21:28


In [4]:
# f-strings (Python >= 3.6)
print(f"Now: {now.hour:02d}:{now.minute:02d}")

Now: 21:28
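
All three styles support format specifications. The following is a minimal sketch, assuming a fixed, made-up timestamp (`fixed`) so that the output is reproducible; it shows zero-padding, which keeps hours and minutes below 10 from being printed as a single digit, plus two number formats:

```python
from datetime import datetime

# fixed, made-up timestamp so that the output is reproducible
fixed = datetime(2020, 5, 1, 9, 5)

percent_style = "Now: %02d:%02d" % (fixed.hour, fixed.minute)
format_style = "Now: {n:%H:%M}".format(n=fixed)
f_style = f"Now: {fixed.hour:02d}:{fixed.minute:02d}"

print(percent_style)  # Now: 09:05
print(format_style)   # Now: 09:05
print(f_style)        # Now: 09:05

# format specifications also work for numbers
ratio = 2 / 3
print(f"Ratio: {ratio:.2f}")    # Ratio: 0.67
print(f"Percent: {ratio:.1%}")  # Percent: 66.7%
```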


## Quick peek: regular expressions

Regex: a sequence of characters that defines a search pattern

https://regexr.com

In [5]:
import re

text = "My email address is jonas.windhager@uzh.ch, try to extract it!"

pattern = r"[\w\.-]+@[\w\.-]+\.\w+"
match = re.search(pattern, text)
print(match.group())

jonas.windhager@uzh.ch
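
A possible next step, sketched under the assumption that the text contains several addresses (the ones used here are made up): named groups split each match into its parts, and `re.finditer` yields one match object per occurrence.

```python
import re

# made-up example addresses
text = "Contact alice@example.org or bob.smith@example.com for details."

# the pattern from above, with named groups for the two parts of the address
pattern = r"(?P<user>[\w.-]+)@(?P<domain>[\w.-]+\.\w+)"

# finditer yields one match object per occurrence
addresses = [(m.group("user"), m.group("domain")) for m in re.finditer(pattern, text)]
print(addresses)  # [('alice', 'example.org'), ('bob.smith', 'example.com')]
```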


# Object-oriented programming

Wikipedia:
> Object-oriented programming (OOP) is a **programming paradigm based on the concept of "objects"**, which can contain data and code. In OOP, computer programs are designed by making them out of objects that interact with one another.

Q: *Is it used for data science?*  
A: *Yes, it is! In fact, you have been using it already.*

Q: *Is it useful for data science? Why do I need to know about it?*  
A: *Even if you don't use it explicitly, OOP is used by most third-party packages.*

## Objects in Python

In Python, **everything is an object.**  
OOP terminology: &nbsp; *classes* $\mathrel{\hat{=}}$ *types* &nbsp; *instances* $\mathrel{\hat{=}}$ *objects*

In [6]:
x = "Hello, World!" # x points to an object of type str
print(x.lower())  # call the object's lower() method

hello, world!


In [7]:
# even types themselves are objects!
t = type(x)
print("Type of x:", t)
print("Type of the type of x:", type(t))

Type of x: <class 'str'>
Type of the type of x: <class 'type'>


In [8]:
# also, functions are objects!
def hello(name):
    print(f"Hello, {name}!")

my_hello = hello
my_hello("Jonas")

Hello, Jonas!
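
Because functions are objects, they can be passed to other functions like any other value. A small sketch (the helper names `shout`, `whisper` and `greet` are made up for illustration):

```python
def shout(text):
    return text.upper() + "!"

def whisper(text):
    return text.lower() + "..."

def greet(style, name):
    # style is a function object passed in like any other value
    return style(f"Hello, {name}")

print(greet(shout, "Jonas"))    # HELLO, JONAS!
print(greet(whisper, "Jonas"))  # hello, jonas...

# methods can be passed around too, e.g. as a sort key
names = ["tabby", "Dixie", "norris"]
print(sorted(names, key=str.lower))  # ['Dixie', 'norris', 'tabby']
```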


### Inspecting objects

In [9]:
x = "Hello, World!"

In [10]:
print("ID of x:", id(x))
print("Type of x:", type(x))

ID of x: 139871589160560
Type of x: <class 'str'>


In [11]:
print("Attributes of x:", dir(x))

Attributes of x: ['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isascii', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
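
The full `dir()` listing is rather noisy. As a sketch, the names can be filtered, and attributes can also be looked up by name at run time using the built-ins `getattr` and `hasattr`:

```python
x = "Hello, World!"

# names without a leading underscore are the "public" attributes
public = [name for name in dir(x) if not name.startswith("_")]
print(len(public), public[:5])

# attributes can also be looked up by name at run time
method = getattr(x, "upper")
print(method())            # HELLO, WORLD!
print(hasattr(x, "meow"))  # False
print(callable(method))    # True
```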


## Introduction by example: cats

In [12]:
from datetime import datetime
from dateutil.relativedelta import relativedelta

def meow(name, birthday):
    age = relativedelta(datetime.today(), birthday)
    print(f'{name} is {age.years} years old and meows.')

tabby_name = 'Tabby'
tabby_birthday = datetime(2017, 10, 12)

dixie_name = 'Dixie'
dixie_birthday = datetime(2015, 12, 18)

meow(tabby_name, tabby_birthday)
meow(dixie_name, dixie_birthday)

Tabby is 2 years old and meows.
Dixie is 4 years old and meows.


In [13]:
from datetime import datetime
from dateutil.relativedelta import relativedelta

def meow(name, birthday):  # only makes sense for cats
    age = relativedelta(datetime.today(), birthday)
    print(f'{name} is {age.years} years old and meows.')

# variables related by name
tabby_name = 'Tabby'
tabby_birthday = datetime(2017, 10, 12)

# same data structure as before
dixie_name = 'Dixie'
dixie_birthday = datetime(2015, 12, 18)

meow(tabby_name, tabby_birthday)
meow(dixie_name, dixie_birthday)

Tabby is 2 years old and meows.
Dixie is 4 years old and meows.


### Problems
* Related variables of different kinds, linked only by their names (implicit, verbose)
* Function that only makes sense for a specific kind of data

## Introduction by example: dictionary cats

In [14]:
from datetime import datetime
from dateutil.relativedelta import relativedelta

def meow(cat):  # fewer parameters
    age = relativedelta(datetime.today(), cat['birthday'])
    print(f'{cat["name"]} is {age.years} years old and meows.')

# related information stored in same data structure
tabby = {'name': 'Tabby', 'birthday': datetime(2017, 10, 12)}
dixie = {'name': 'Dixie', 'birthday': datetime(2015, 12, 18)}

meow(tabby)
meow(dixie)

Tabby is 2 years old and meows.
Dixie is 4 years old and meows.


In [15]:
from datetime import datetime
from dateutil.relativedelta import relativedelta

def meow(cat):  # still only makes sense for cats
    age = relativedelta(datetime.today(), cat['birthday'])
    print(f'{cat["name"]} is {age.years} years old and meows.')

# data of different kinds is stored in dictionaries
tabby = {'name': 'Tabby', 'birthday': datetime(2017, 10, 12)}
dixie = {'name': 'Dixie', 'birthday': datetime(2015, 12, 18)}

meow(tabby)
meow(dixie)

Tabby is 2 years old and meows.
Dixie is 4 years old and meows.


### Problems

* Less readable, more clutter
* Hidden assumptions about the data structure (e.g. which keys exist)
* Container types such as dictionaries should preferably not store data of different kinds
* IDE features such as syntax error detection, code completion and refactoring don't play well with "object dictionaries" (dictionary items used like variables)
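
To illustrate the hidden assumptions, here is a minimal sketch with a hypothetical helper `describe` and a deliberately misspelled key. The typo is not flagged by the editor and only surfaces once the function runs:

```python
from datetime import datetime

def describe(cat):
    # silently assumes that the keys 'name' and 'birthday' exist
    return f"{cat['name']} was born in {cat['birthday'].year}."

tabby = {'name': 'Tabby', 'birthday': datetime(2017, 10, 12)}
print(describe(tabby))  # Tabby was born in 2017.

# a typo in a key is only noticed when the function runs
dixie = {'name': 'Dixie', 'birthdy': datetime(2015, 12, 18)}
try:
    describe(dixie)
except KeyError as e:
    error = f"Missing key: {e}"
    print(error)  # Missing key: 'birthday'
```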

## Introduction by example: object-oriented cats

In [16]:
class Cat:    
    def __init__(self, name, birthday):
        self.name = name
        self.birthday = birthday
        
    def meow(self):
        age = relativedelta(datetime.today(), self.birthday)
        print(f'{self.name} is {age.years} years old and meows.')
        
tabby = Cat('Tabby', datetime(2017, 10, 12))
dixie = Cat('Dixie', datetime(2015, 12, 18))
tabby.meow()
dixie.meow()

Tabby is 2 years old and meows.
Dixie is 4 years old and meows.


### Classes, attributes & methods

In [17]:
class Cat:  # class: "blueprint for instances"   
    def __init__(self, name, birthday):  # initializer
        self.name = name          # attributes: specific to object
        self.birthday = birthday  # (self points to the current object)
        
    def meow(self):  # method: function associated with object
        age = relativedelta(datetime.today(), self.birthday)
        print(f'{self.name} is {age.years} years old and meows.')

In [18]:
# create instances of class Cat
tabby = Cat('Tabby', datetime(2017, 10, 12))
dixie = Cat('Dixie', datetime(2015, 12, 18))
print('Types of Tabby and Dixie:', type(tabby), type(dixie))

Types of Tabby and Dixie: <class '__main__.Cat'> <class '__main__.Cat'>


In [19]:
# access instance attribute
print(tabby.name)

Tabby


In [20]:
# call instance method
dixie.meow()

Dixie is 4 years old and meows.


### Class attributes & static methods

In [21]:
class Cat:
    count = 0  # class attribute: same for all instances
    
    def __init__(self, name, birthday):
        self.name = name
        self.birthday = birthday
        Cat.count += 1
        
    def meow(self):
        age = relativedelta(datetime.today(), self.birthday)
        print(f'{self.name} is {age.years} years old and meows.')
        
    @staticmethod  # decorator (syntactic sugar)
    def kill_all():  # static method: function associated with class
        Cat.count = 0

In [22]:
tabby = Cat('Tabby', datetime(2017, 10, 12))
dixie = Cat('Dixie', datetime(2015, 12, 18))

In [23]:
print(Cat.count)
Cat.kill_all()
print(Cat.count)

2
0
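
One pitfall worth knowing, sketched with a hypothetical class `NoisyCat` (not part of the session code): changing a class attribute on the class affects all instances, whereas assigning to it via an instance creates a separate instance attribute that shadows the class attribute.

```python
class NoisyCat:  # hypothetical example class
    sound = 'meow'  # class attribute, shared by all instances

    def __init__(self, name):
        self.name = name

first = NoisyCat('Tabby')
second = NoisyCat('Dixie')

# changing the attribute on the class affects every instance
NoisyCat.sound = 'purr'
print(first.sound, second.sound)  # purr purr

# assigning via an instance creates a new instance attribute instead
first.sound = 'hiss'
print(first.sound, second.sound, NoisyCat.sound)  # hiss purr purr
```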


### Inheritance

In [24]:
class MovieCat(Cat):  # class Cat defined before
    def __init__(self, name, birthday, movie):
        super().__init__(name, birthday)  # call parent initializer
        self.movie = movie
        
    def act(self):
        print(f'{self.name} is playing in {self.movie}.')

In [25]:
norris = MovieCat('Mrs. Norris', datetime(1990, 11, 23), 'Harry Potter')
print('Type of Norris:', type(norris))
print('Is Norris a Cat?', isinstance(norris, Cat))

Type of Norris: <class '__main__.MovieCat'>
Is Norris a Cat? True


In [26]:
print(norris.name)
norris.meow()

Mrs. Norris
Mrs. Norris is 29 years old and meows.


In [27]:
print(norris.movie)
norris.act()

Harry Potter
Mrs. Norris is playing in Harry Potter.
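
Besides adding new methods, a subclass can also override inherited ones. A minimal sketch, using the simplified stand-in classes `BaseCat` and `GrumpyCat` (made up for illustration, without birthdays):

```python
class BaseCat:  # simplified stand-in for Cat (no birthday)
    def __init__(self, name):
        self.name = name

    def meow(self):
        return f'{self.name} meows.'

class GrumpyCat(BaseCat):
    def meow(self):  # overrides BaseCat.meow
        # reuse the parent implementation and extend it
        return super().meow() + ' Reluctantly.'

cats = [BaseCat('Tabby'), GrumpyCat('Tardar')]
for cat in cats:
    print(cat.meow())  # the method of the object's own class is used
```

Calling code does not need to know which subclass it is dealing with; each object brings its own `meow`.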


### Magic methods

Magic ("dunder") methods such as `__str__` or `__lt__` let classes hook into built-in syntax and functions, e.g. printing and comparisons.

See [Python 3 Data Model](https://docs.python.org/3/reference/datamodel.html) and [A Guide to Python's Magic Methods](https://rszalski.github.io/magicmethods)

In [28]:
class SizedCat(Cat):
    def __init__(self, name, birthday, size):
        super().__init__(name, birthday)
        self.size = size
    
    def __str__(self):
        return f'{self.name} (size: {self.size} cm)'
    
    def __lt__(self, other):
        return self.size < other.size

In [29]:
small_cat = SizedCat('Mrs. Norris', datetime(1990, 11, 23), 25)
large_cat = SizedCat('Mufasa', datetime(1980, 12, 13), 120)
print(f'Is {small_cat} larger than {large_cat}?', small_cat > large_cat)

Is Mrs. Norris (size: 25 cm) larger than Mufasa (size: 120 cm)? False
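
Magic methods also make objects usable with built-in functions. As a sketch with a hypothetical class `RankedCat`: defining `__lt__` is enough for `sorted()` and `min()`, `__eq__` enables `==`, and `__repr__` controls how objects are shown inside containers.

```python
class RankedCat:  # hypothetical minimal class, ordered by size
    def __init__(self, name, size):
        self.name = name
        self.size = size

    def __repr__(self):  # used in the REPL and when printing containers
        return f'RankedCat({self.name!r}, {self.size})'

    def __eq__(self, other):  # enables ==
        return self.size == other.size

    def __lt__(self, other):  # enables <, which sorted() and min() rely on
        return self.size < other.size

cats = [RankedCat('Mufasa', 120), RankedCat('Mrs. Norris', 25), RankedCat('Tabby', 60)]
print(sorted(cats))  # ordered by size: Mrs. Norris, Tabby, Mufasa
print(min(cats))     # RankedCat('Mrs. Norris', 25)
print(RankedCat('A', 25) == RankedCat('B', 25))  # True
```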


### Properties

Properties turn functions into dynamically computed attributes.

<font color="grey">Note: don't write getters/setters in Python, use properties instead.</font>

In [30]:
class LazyCat(Cat):
    def __init__(self, name, birthday):
        super().__init__(name, birthday)
        
    @property
    def age(self):
        return relativedelta(datetime.today(), self.birthday)
    
jonas = LazyCat('Jonas', datetime(1991, 1, 19))
print(jonas.age.years)

29
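
Properties can also validate values on assignment, which is what setters are typically written for. A minimal sketch with a hypothetical class `WeighedCat` (the attribute names are assumptions for illustration):

```python
class WeighedCat:  # hypothetical example class
    def __init__(self, name, weight_kg):
        self.name = name
        self.weight_kg = weight_kg  # goes through the setter below

    @property
    def weight_kg(self):
        return self._weight_kg

    @weight_kg.setter
    def weight_kg(self, value):
        if value <= 0:
            raise ValueError('weight must be positive')
        self._weight_kg = value

    @property
    def weight_g(self):  # read-only, computed on access
        return self._weight_kg * 1000

cat = WeighedCat('Tabby', 4.2)
cat.weight_kg = 4.5  # looks like plain attribute access
print(cat.weight_g)  # 4500.0
```

Assigning a non-positive weight raises a `ValueError`, and assigning to `weight_g` raises an `AttributeError` because no setter is defined for it.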


## Real-world examples

In previous sessions, we used the following classes, among others:
* Built-in types `int` `str` ...
* Numpy arrays `numpy.ndarray`
* Matplotlib [Figure](https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/figure.py#L219), [Axes](https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/axes/_axes.py#L71), ...
* Pandas [DataFrame](https://github.com/pandas-dev/pandas/blob/master/pandas/core/frame.py#L336) & [Series](https://github.com/pandas-dev/pandas/blob/master/pandas/core/series.py#L141)
* Scikit-learn's [GaussianMixture](https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/mixture/_gaussian_mixture.py#L434)
* ...

## Custom classes

Disclaimer: it's like French grammar &ndash; to every rule there is an exception.  
When to create custom classes is a design choice heavily based on experience.

### Some rules of thumb

<font color="red">Simplicity, readability and practicality count!</font> &nbsp; `import this`

> If it looks like a duck and quacks like a duck, it's a duck.

If you only need to encapsulate functionality (and not data/states), use a function.

Create classes only when built-in data types such as tuples are **not sufficient to represent your data**, e.g. do not create a custom `Point3D` class (use tuples instead).

Create classes for **related data of different kinds** and/or **functions that act on that data**.

**Reuse existing classes** (e.g. from other packages). When creating custom classes, ensure that the classes work well together with the main packages used in your project.
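
A rough sketch of how these rules of thumb might look in practice. `Point3D` and `Pet` are made-up names, and `dataclasses` (Python >= 3.7) is just one possible way to write a small class, not something covered in this session:

```python
from collections import namedtuple
from dataclasses import dataclass
from datetime import date

# a 3D point is well served by a plain tuple ...
point = (1.0, 2.0, 3.0)
x, y, z = point

# ... or by a named tuple if field names help readability
Point3D = namedtuple('Point3D', ['x', 'y', 'z'])
p = Point3D(1.0, 2.0, 3.0)
print(p.x + p.y + p.z)  # 6.0

# related data of different kinds plus behaviour: a small class
@dataclass
class Pet:  # hypothetical example
    name: str
    birthday: date

    def birth_year(self):
        return self.birthday.year

print(Pet('Tabby', date(2017, 10, 12)).birth_year())  # 2017
```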

### Example use cases

A selection of use cases for which I used custom classes in the past:
* Cats :-)
* File readers
* Neural network models
* Customizing third-party packages
* Implementing extensions, e.g. CellProfiler modules
* Working with third-party packages, e.g. deep learning
* Interactive notebooks & graphical user interfaces (GUIs)

## Remarks

My advice: keep OOP in mind, but only start using it when you have a use case.

Not covered in these Python sessions:
* Abstract base classes (ABC)
* Multiple inheritance & method resolution order (MRO)
* OOP design patterns, e.g. proposed by Gang of Four (GoF)
* ...

# Exception handling & assertions

https://docs.python.org/3/tutorial/errors.html

In [31]:
# syntax errors: Python won't execute the code
message = 'Don't'

SyntaxError: invalid syntax (<ipython-input-31-59c2c7fcf9f8>, line 2)

In [32]:
# exceptions: anomalous or exceptional conditions during execution
x = 12 / 0

ZeroDivisionError: division by zero

## Handling exceptions

In Python, *exceptions* are instances of a class that derives from `BaseException`.

In [33]:
try:
    x = 12 / 0
except ZeroDivisionError:
    print("Oops, dividing by zero!")

print("This line will be executed.")

Oops, dividing by zero!
This line will be executed.


Most commonly used for checking user input, e.g. to catch errors while reading files.

Built-in exception types: https://docs.python.org/3.7/library/exceptions.html
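
As a sketch of checking user input, the hypothetical helper `parse_age` below converts text to a non-negative integer. It also shows the optional `else` clause, which runs only if the `try` block raised no exception:

```python
def parse_age(text):
    """Convert user input to a non-negative integer, or return None."""
    try:
        age = int(text)
    except ValueError:
        # int() raises ValueError for input such as "four"
        return None
    else:
        # runs only if no exception was raised in the try block
        return age if age >= 0 else None

for user_input in ["4", "four", "-1"]:
    print(repr(user_input), "->", parse_age(user_input))
```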

## Raising exceptions

In [34]:
def real_sqrt(x):
    if x < 0:
        raise ValueError('x is negative')
    return x ** 0.5

try:
    r = real_sqrt(-3)
except ValueError as e:
    print(e)

x is negative


## Custom exceptions

In [35]:
class NegativeNumberException(Exception):
    pass

def real_sqrt(x):
    if x < 0:
        raise NegativeNumberException()
    return x ** 0.5

try:
    r = real_sqrt(-3)
except NegativeNumberException:
    print("The number is negative")

The number is negative
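
Custom exceptions can carry additional information and can be organized in a hierarchy. A minimal sketch, assuming a made-up base class `MathDomainError`; catching the base class also catches its subclasses:

```python
class MathDomainError(Exception):  # hypothetical base class
    """Base class for the errors raised in this example."""

class NegativeNumberError(MathDomainError):
    def __init__(self, value):
        super().__init__(f'{value} is negative')
        self.value = value  # keep the offending value for the handler

def real_sqrt(x):
    if x < 0:
        raise NegativeNumberError(x)
    return x ** 0.5

try:
    real_sqrt(-3)
except MathDomainError as e:  # also catches subclasses
    print(type(e).__name__, '-', e)  # NegativeNumberError - -3 is negative
    caught = e
```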


## Clean-up actions

In [36]:
try:
    x = 12 / 0
except ZeroDivisionError:
    print("Oops, dividing by zero!")
finally:
    print("This will be executed, no matter what.")

Oops, dividing by zero!
This will be executed, no matter what.


Useful e.g. for file operations:

In [37]:
f = open('../Data/iris.csv')
try:
    pass  # read file
except IOError as e:
    print("Error reading file:", e)
finally:
    f.close()

Note: if possible, use context managers (`with`) instead, see [session 3](https://github.com/BodenmillerGroup/IntroDataAnalysis/blob/master/python/03_pandas_images.ipynb).
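
For comparison, a sketch of the same idea using `with`. To keep the example self-contained, it writes a small throwaway file with made-up content instead of reading `iris.csv`:

```python
import os
import tempfile

# create a small throwaway file so that the example is self-contained
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    tmp.write('sepal_length,species\n5.1,setosa\n')
    path = tmp.name

try:
    with open(path) as f:  # f is closed automatically, even on errors
        header = f.readline().strip()
    print(header)    # sepal_length,species
    print(f.closed)  # True
except OSError as e:  # IOError is an alias of OSError in Python 3
    print('Error reading file:', e)
finally:
    os.remove(path)  # clean up the temporary file
```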

## Assertions

In [38]:
def real_sqrt(x):
    assert x >= 0, "x is negative"
    # equivalent to:
    # if __debug__ and not x >= 0:
    #     raise AssertionError("x is negative")
    return x ** 0.5

real_sqrt(-3)

AssertionError: x is negative

## Best practices

* Don't overuse exceptions
* Check user input using exceptions
* Use built-in exceptions unless you really need custom exception types
* Use `assert` for debugging (only for conditions that should never occur)
* If possible, use `with` instead of try-finally for file operations
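
A minimal sketch of how these practices can be combined, using a hypothetical helper `split_evenly`: invalid input raises a built-in exception, while `assert` guards an internal invariant that should always hold. Note that `assert` statements are skipped when Python runs with the `-O` option, so they must not be used for input validation.

```python
def split_evenly(n_items, n_groups):
    # invalid input is an expected situation: raise a built-in exception
    if n_groups <= 0:
        raise ValueError('n_groups must be positive')
    base, extra = divmod(n_items, n_groups)
    sizes = [base + 1] * extra + [base] * (n_groups - extra)
    # internal sanity check: only fails if the code above has a bug
    assert sum(sizes) == n_items, 'group sizes do not add up'
    return sizes

print(split_evenly(10, 3))  # [4, 3, 3]

try:
    split_evenly(10, 0)
except ValueError as e:
    print('Invalid input:', e)  # Invalid input: n_groups must be positive
```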

# Package of the week: tqdm

In [39]:
from time import sleep
# from tqdm import tqdm
from tqdm.notebook import tqdm

for i in tqdm(range(100)):
    sleep(.1)  # Sleep for 100 ms





# Next session

May 8, 2020, 10.00 am

Code organization, package management & virtual environments