# Introduction

Python 3 for data analysis

Weekly crash course on a new topic (see schedule)  
Practice at home & ask for individual help when needed

Slides, data & material: https://github.com/BodenmillerGroup/IntroDataAnalysis  
Mattermost: https://mattermost.dqbm.uzh.ch/department/channels/data-analysis-intro

BIO134 online exercises: https://mnf.openedx.uzh.ch  
Register on https://mnf.openedx.uzh.ch/register and send me your email address

## Prerequisites

None (almost :-), but your attention is required!

Have a working installation of Anaconda for Python 3.7

Target audience: complete beginners & Python beginners

## Weekly sessions

Every Friday, 10:00-12:00  
https://zoom.us/j/854293838

| Date | Topics |
| :--- | :---- |
| <font color='red'> 24.03. </font> | Introduction
| 27.03. | Programming with Python
| 03.04. | The `numpy` package, plotting with `matplotlib`
| 10.04. | Processing tabular data with `pandas`, image processing
| <font color='red'> 14.04. </font> | Version control using Git, <font color='gray'>development environments, package management</font>
| ... | COVID-19 modeling, ML with scikit-learn, deep learning, ...

## Practicing at home

Resources will be provided at the end of every session

How to get help:
1. Carefully read the error message & documentation
2. Break down the code into small units & test them separately
3. Look for help online - Google, StackOverflow, ...
4. Contact me (Jonas Windhager) on Mattermost

# Programming with Python

Session aims:
* Learn the Python language syntax
* Get familiar with the Python Standard Library

See https://docs.python.org/3.7/library/index.html

This is an interactive session - code along in Spyder!

## The Spyder IDE

<img src="img/spyder.png">

## The Spyder IDE

<img src="img/spyder_runsettings.png">

## Hello, World!

In [1]:
# This is a comment

message = "Hello, World!"
print(message)

Hello, World!


Variables:
* Name `message` (letters, digits and underscores; must not start with a digit; no [keywords](https://docs.python.org/3.7/reference/lexical_analysis.html#keywords))
* Value `"Hello, World!"`
* Type `str` (inferred from *literal*)

Exercise: modify the code such that it prints your name.

## Types
    
| Category | Types | Example literals
| :--- | :--- | :---
| Boolean | `bool` | `True` `False`
| Numeric | `int` `float` `complex` | `123` `1.23` `1+23j`
| <font style="color: gray">Sequences</font> | `list` `tuple` `range` | `[1, 2, 3]` `(4, 5)` `range(6)`
| <font style="color: gray">Character strings</font> | `str` | `"Hello, World!"` `'Bye!'`
| <font style="color: gray">Dictionaries</font> | `dict` | `{"name": "Jonas", "age": 29}`
| Null | `None` | `None`

Further types: iterators, sets, context managers, modules, functions, classes, methods, types, ...  
See https://docs.python.org/3.7/library/stdtypes.html

To get the type of a variable/literal: `type(x)`  
To *convert* from one type to another: e.g. `int(7.6)`

What is the value of `int(7.6)`?

In [2]:
int(7.6)  # truncates towards zero, no rounding!

7
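`int()` drops the fractional part, i.e. it truncates towards zero; this coincides with rounding down only for non-negative numbers. A small comparison with `math.floor` from the standard library and the built-in `round`:

```python
import math

print(int(7.6), int(-7.6))                # truncation towards zero: 7 -7
print(math.floor(7.6), math.floor(-7.6))  # rounding down: 7 -8
print(round(7.6), round(-7.6))            # rounding to the nearest integer: 8 -8
```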

What is the type of `'0.1'`? How about `0,1`?

In [3]:
type('0.1')  # watch the quotes!

str

In [4]:
x = 0,1  # tuples also work without parentheses
type(x)

tuple

Find non-boolean literals that convert to boolean `False`.

In [5]:
bool(0)  # similar for floats

False

In [6]:
bool([])  # similar for empty tuples, dictionaries and strings

False

In [7]:
bool(None)

False
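As a rule of thumb, zero and empty containers convert to `False`, while non-zero numbers and non-empty containers convert to `True`, even a string that reads `"False"`:

```python
print(bool(0.0), bool(""), bool(()), bool({}))  # False False False False
print(bool(-1), bool(" "), bool("False"))       # True True True
```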

Can lists/tuples/dictionaries contain elements of varying types?

In [8]:
[3, "abc", [1, 2, 3]]

[3, 'abc', [1, 2, 3]]

In [9]:
(7, 4.5, {'name': 'Jonas'})

(7, 4.5, {'name': 'Jonas'})

In [10]:
{'string key': True, 123: None}

{'string key': True, 123: None}

## User input

In [11]:
number = input('Please enter a number: ')
print(number)

Please enter a number: 123
123


In [12]:
type(number)  # always a character string!

str

In [13]:
n = int(number)
type(n)

int
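`int()` only accepts strings that look like whole numbers. For decimal input, convert with `float()` instead. In this sketch, a string literal stands in for the user input so that it runs without typing anything:

```python
number = "2.5"      # stands in for input('Please enter a number: ')
x = float(number)   # int(number) would raise a ValueError for "2.5"
print(x, type(x))   # 2.5 <class 'float'>
```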

## Operators

| Category | Operators | Example *expressions*
| :--- | :--- | :--- 
| Assignment | `=` | `x = 123`
| Arithmetic | `**` `*` `/` `//` `%` `+` `-` | `1 + 2` `2 ** 3` `7 // 2` `3 % 2`
| Comparison | `in` `not in` `is` `is not` <br> `<` `<=` `>` `>=` `!=` `==` | `3 in [1, 2, 3]` `x is None` <br> `3 < 7`
| Boolean | `not` `and` `or` | `3 < 7 and not 3 + 4 < 7`
| Concatenation | `+` | `"He" + 'llo'` `[1, 2] + [3]`

The operation depends on the operand types, e.g. `+` adds numbers but concatenates strings and lists

**Operator precedence**: *PEMDAS*, and left-to-right within the same level (exception: `**` groups right-to-left)  
**P**arentheses, **E**xponentiation, **M**ultiplication and **D**ivision, **A**ddition and **S**ubtraction

Arithmetic operators can be combined with the assignment operator  
e.g. `x += 2` is equivalent to `x = x + 2`
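A few cases where precedence and grouping matter, plus augmented assignment in action:

```python
print(2 ** 3 ** 2)    # 512: ** groups right-to-left, i.e. 2 ** (3 ** 2)
print((2 ** 3) ** 2)  # 64
print(-2 ** 2)        # -4: ** binds tighter than the unary minus, i.e. -(2 ** 2)

x = 5
x += 2    # same as x = x + 2
x *= 3    # same as x = x * 3
print(x)  # 21
```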

What is the value of `(1 + 2) ** 3 / 2 * 4 - 5`? <br>
First compute manually, then validate using Python.

In [14]:
(1 + 2) ** 3 / 2 * 4 - 5

49.0

What is the type of `4 / 2`?

In [15]:
4 / 2

2.0

In [16]:
type(4 / 2)

float

You look at the clock and it is exactly 2pm. You set an alarm to go off in 276 hours.  
At what time does the alarm go off? Hint: use the modulo operator `%`.

In [17]:
(14 + 276) % 24

2
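The integer quotient tells you how many days later the alarm goes off. The built-in `divmod` returns the quotient (`//`) and the remainder (`%`) in one go:

```python
days, hour = divmod(14 + 276, 24)  # same as ((14 + 276) // 24, (14 + 276) % 24)
print(days, hour)  # 12 2: twelve days later, at 2am
```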

What is the type of comparisons such as `3 < 7`?

In [18]:
3 < 7

True

In [19]:
type(3 < 7)

bool

Check for equality of `x = 3` and `y = 4`.

In [20]:
x = 3
y = 4

In [21]:
x == y

False

Check for equality of `x = 3` and `y = 4` without using the equality operator `==`.

In [22]:
x = 3
y = 4

In [23]:
x <= y and x >= y

False

In [24]:
not (x < y or x > y)

False

In [25]:
x in [y]

False

Ask the user for a number using `input(...)` and compute the cube of it.

In [26]:
number = input('Please enter a number: ')  # type: str
print(int(number) ** 3)

Please enter a number: 123
1860867


## Control flow

> Control flow is the order in which individual statements, instructions or function calls are executed or evaluated.

Generally, top to bottom, in the order the *statements* are written :-)

Python is an *imperative* (as opposed to *declarative*) programming language: a program is a sequence of statements, and their execution order can be altered explicitly using *control structures*:

* Conditional execution (if)
* Repetitive execution (loops)
* <font color='gray'>Functions, exception handling, context managers, ...</font>

https://docs.python.org/3.7/reference/compound_stmts.html

### Conditional execution

In [27]:
a = 32
b = 33

if a > b:  # condition (bool)
    print("a is greater than b")
elif a == b:  # alternative condition(s) [optional]
    message = "a and b are equal"
    print(message)
else:  # if no condition evaluates to True [optional]
    print("a is less than b")
    
print("Done")

a is less than b
Done


Note: at most one *code block* of an `if`/`elif`/`else` statement is executed (here, because of the `else` branch, exactly one)

Check (i.e., print) whether a number provided by the user is odd or even.

In [28]:
x = int(input('Please enter a number: '))

if x % 2 == 0:
    print(x, 'is even')
else:
    print(x, 'is odd')

Please enter a number: 123
123 is odd


Compute the square root of a user-provided number if it is non-negative;  
show an error message otherwise.

In [29]:
x = int(input('Please enter a number: '))

if x >= 0:
    print(x ** 0.5)
else:
    print('The number is negative')

Please enter a number: 5
2.23606797749979


Compute the square root `s` of a user-provided number `x` if it is non-negative;  
show an error message otherwise. 

Also, if `s` is smaller than 50, increment it by 1.

In [30]:
x = int(input('Please enter a number: '))

if x >= 0:
    s = x ** 0.5
    if s < 50:
        s += 1
    print(s)
else:
    print('The number is negative')

Please enter a number: 49
8.0


#### A note on floating point arithmetic

In [31]:
x = 2
y = x ** 0.5
z = y ** 2

print(x)
print(y)
print(z)

2
1.4142135623730951
2.0000000000000004
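Most decimal numbers have no exact binary representation, so floating point results can be off by a tiny rounding error. Avoid `==` on floats and compare with a tolerance instead, e.g. using `math.isclose` from the standard library:

```python
import math

z = (2 ** 0.5) ** 2
print(z == 2)              # False, due to rounding
print(math.isclose(z, 2))  # True: equal up to a relative tolerance (default 1e-09)
print(0.1 + 0.2 == 0.3)    # False as well!
```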


### Repetitive execution (while)

In [32]:
a = 0
b = 5

while a < b:
    print(a)
    a += 2
    
print("Done")

0
2
4
Done


Q: what is the value of `a` after executing the program?

**Desk check** (German: *Schreibtischtest*): for every single iteration, write down what the program does, e.g. the values of `a` and `a < b`

In [33]:
a = 0
b = 5

while a < b:
    a += 2

print(a)

6
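The desk check can also be mimicked by printing the variables in every iteration. This is a debugging aid added for illustration; the `iteration` counter and the f-strings (formatted string literals, Python 3.6+) are not part of the original program:

```python
a = 0
b = 5
iteration = 0
while a < b:
    iteration += 1
    print(f'iteration {iteration}: a = {a}, a < b is {a < b}')
    a += 2
print(f'after the loop: a = {a}, a < b is {a < b}')
```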


Print your name ten times.

In [34]:
i = 0
while i < 10:
    print(i, "Jonas")  # print takes multiple arguments
    i += 1

0 Jonas
1 Jonas
2 Jonas
3 Jonas
4 Jonas
5 Jonas
6 Jonas
7 Jonas
8 Jonas
9 Jonas


Sum up all numbers from 1 to 100 (inclusive).

In [35]:
total = 0
number = 1
while number <= 100:
    total += number
    number += 1
print(total)

5050
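Two shortcuts to validate the loop: the built-in `sum` over a `range`, and the closed-form (Gauss) formula $n(n+1)/2$:

```python
print(sum(range(1, 101)))  # 1 + 2 + ... + 100
n = 100
print(n * (n + 1) // 2)    # Gauss formula, using integer division
```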


**Challenge**: Count the number of digits in the number `123456`.

In [36]:
x = 123456

In [37]:
num_digits = 0
while x > 0:
    x = x // 10  # integer division drops the last digit (exact, unlike int(x / 10) for huge numbers)
    num_digits += 1
print(num_digits)

6
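For integers, a shortcut is to convert the number to a string and count its characters; `abs` keeps a minus sign from being counted as a digit:

```python
x = 123456
num_digits = len(str(abs(x)))  # str(123456) == '123456'
print(num_digits)  # 6
```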


**Challenge**: print all [Fibonacci numbers](https://en.wikipedia.org/wiki/Fibonacci_number) smaller than 60.

$0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55$

$f_0 = 0$  
$f_1 = 1$  
$f_i = f_{i-1} + f_{i-2} \, \forall i \in \mathbb{N}, i > 1$

In [38]:
a = 0  # f_0
b = 1  # f_1
print(a, end=' ')  # keyword argument: end with a space instead of a newline
while b < 60:
    print(b, end=' ')
    temp = a + b  # b is still needed for updating a!
    a = b
    b = temp

0 1 1 2 3 5 8 13 21 34 55 

In [39]:
a = 0
b = 1
print(a, end=' ')
while b < 60:
    print(b, end=' ')
    a, b = b, a + b  # read: (a, b) = (b, a + b)

0 1 1 2 3 5 8 13 21 34 55 

### Repetitive execution (for)

In [40]:
l = [0, 1, 2, 3, 4]

for x in l:  # any iterable: list, tuple, string, dictionary (keys), ...
    print(x)
    
print("Done")

0
1
2
3
4
Done


Q: what is the value of `x` after executing the program?

In [41]:
l = [0, 1, 2, 3, 4]

for x in l:  # the loop variable keeps its last value after the loop
    pass  # do nothing

print(x)

4


Print `"Please don't leave, Vito!"` five (or more?) times using a for-loop.

In [42]:
for _ in [1, 2, 3, 4, 5]:  # convention: use underscore if the variable isn't used
    print("Please don't leave, Vito!")

Please don't leave, Vito!
Please don't leave, Vito!
Please don't leave, Vito!
Please don't leave, Vito!
Please don't leave, Vito!


In [43]:
# range(stop) --> numbers in half-open interval [0, stop)
# range(start, stop[, step]) --> numbers in half-open interval [start, stop)

for _ in range(5):
    print("Please don't leave, Vito!")

Please don't leave, Vito!
Please don't leave, Vito!
Please don't leave, Vito!
Please don't leave, Vito!
Please don't leave, Vito!


Multiply all numbers of the sequence $2, 3, 7, 8, 12, 13, \dots$ that are less than 300.

In [44]:
# 2, 3, 7, 8, 12, 13, ... are the pairs (i, i + 1) for i = 2, 7, 12, ..., 297

total = 1
for i in range(2, 300, 5):
    total *= i * (i + 1)
print(total)

261769029544794935822212903666972385253081750867380613436814347925115735076922895135074210331437638301170754496779255943374772861215328500339266039134047083071752454766572775186756761229497775785845467921812002156734022426134037362884951907762176
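To double-check the loop, the numbers can be selected differently: every element of the sequence leaves a remainder of 2 or 3 when divided by 5. This sketch builds that list explicitly and multiplies it with `functools.reduce` and `operator.mul` (both in the standard library; Python 3.8+ also offers `math.prod`):

```python
from functools import reduce
from operator import mul

numbers = []
for n in range(300):
    if n % 5 == 2 or n % 5 == 3:  # 2, 3, 7, 8, 12, 13, ...
        numbers.append(n)

product = reduce(mul, numbers, 1)  # 1 * 2 * 3 * 7 * 8 * ...

total = 1  # the loop from above, for comparison
for i in range(2, 300, 5):
    total *= i * (i + 1)

print(numbers[:6], numbers[-2:])  # [2, 3, 7, 8, 12, 13] [297, 298]
print(product == total)           # True
```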


### Breaking & continuing loops

In [45]:
x = 0
while x < 5:
    print(x, end=' ')
    x += 1

0 1 2 3 4 

In [46]:
x = 0
while True:  # bad style!
    if x >= 5:
        break  # exit the loop
    print(x, end=' ')
    x += 1

0 1 2 3 4 

In [47]:
x = 0
while True:  # bad style!
    if x < 5:
        print(x, end=' ')
        x += 1
        continue  # skip the rest of the loop body, start the next iteration
    break

0 1 2 3 4 
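`break` and `continue` work in for-loops as well, which avoids the `while True` pattern:

```python
collected = []
for n in range(10):
    if n % 2 == 1:
        continue  # skip odd numbers, go straight to the next n
    if n > 6:
        break  # leave the loop entirely
    collected.append(n)
print(collected)  # [0, 2, 4, 6]
```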

## Sequences

In [48]:
list_ = ['ZH', 'BS']  # convention: trailing underscore avoids shadowing built-ins like list
print(list_)

['ZH', 'BS']


In [49]:
list_[0]  # access first element

'ZH'

In [50]:
list_[1] = 'BL'  # set second element
print(list_)

['ZH', 'BL']


In [51]:
list_[-1]  # access last element

'BL'

In [52]:
len(list_)  # get the length of the list

2

Using a for-loop, print every second element in the following list:  
`['H', 'W', 'e', 'o', 'l', 'r', 'l', 'l', 'o', 'd']`

Hint: use the `range` function.

In [53]:
l = ['H', 'W', 'e', 'o', 'l', 'r', 'l', 'l', 'o', 'd']

In [54]:
for i in range(0, len(l), 2):
    print(l[i], end='')

Hello

## Sequences (cont.)

In [55]:
l = ['GE', 'ZH']
l.insert(2, 'BS')  # insert at index 2
l.append('ZH')  # append to the end of the list
print(l)

['GE', 'ZH', 'BS', 'ZH']


In [56]:
del l[-2]  # remove second-to-last element
l.remove('ZH')  # remove FIRST occurrence from list
print(l)

['GE', 'ZH']


In [57]:
l + l  # attention: concatenation, not element-wise sum!

['GE', 'ZH', 'GE', 'ZH']

In [58]:
3 * l  # attention: concatenation, not element-wise multiplication!

['GE', 'ZH', 'GE', 'ZH', 'GE', 'ZH']

In [59]:
list(range(10))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Given the list `l`, create a new list `l2` that contains the squared elements of `l`.

`l = list(range(10))`

In [60]:
l = list(range(10))

In [61]:
l2 = []
for x in l:
    l2.append(x ** 2)
print(l2)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


In [62]:
l2 = [0] * len(l)
for i in range(len(l)):
    l2[i] = l[i] ** 2
print(l2)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
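A more compact alternative is a *list comprehension*, which builds the same new list in a single expression (equivalent to the first loop above):

```python
l = list(range(10))
l2 = [x ** 2 for x in l]  # read: "x squared, for every x in l"
print(l2)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```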


**Challenge**: Create a list of all prime numbers lower than 100. Hint: use nested for-loops.

Prime number: a natural number greater than 1 that is divisible only by itself and 1

In [63]:
primes = []
for candidate in range(2, 100):  # 0 and 1 are not prime numbers!
    is_prime = True
    for divisor in range(2, candidate):
        if candidate % divisor == 0:
            is_prime = False
            break  # we found an integer divisor, no need to find another one
    if is_prime:
        primes.append(candidate)
print(primes)

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
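The inner loop does more work than needed: if `candidate` has a divisor larger than its square root, it also has one smaller than its square root, so checking divisors `d` with `d * d <= candidate` suffices. A sketch of this optimization:

```python
primes = []
for candidate in range(2, 100):
    is_prime = True
    divisor = 2
    while divisor * divisor <= candidate:  # only divisors up to the square root
        if candidate % divisor == 0:
            is_prime = False
            break
        divisor += 1
    if is_prime:
        primes.append(candidate)
print(primes)
```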


### Lists versus tuples

Lists are *mutable*, tuples are *immutable*

In [64]:
t = (1, 2, 3)
t[1] = 4

TypeError: 'tuple' object does not support item assignment
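To "change" a tuple, create a new one instead, e.g. by converting it to a list, modifying the list and converting back:

```python
t = (1, 2, 3)
t_list = list(t)   # mutable copy
t_list[1] = 4
t = tuple(t_list)  # a new tuple object; the old one was never modified
print(t)  # (1, 4, 3)
```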

### Lists/tuples versus character strings

In many cases, character strings (`str`) can be used like "tuples of characters".

In [65]:
message = "Hello, World!"
print(len(message))
print(message[7])

13
W
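Like tuples, strings are immutable: `message[0] = 'h'` raises a `TypeError`. String methods therefore return *new* strings and leave the original unchanged:

```python
message = "Hello, World!"
shouted = message.upper()  # returns a new string
print(shouted)  # HELLO, WORLD!
print(message)  # Hello, World! (unchanged)
```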


## Strings

In [66]:
s = 'Hello World!'
words = s.split(' ')  # split the string at every occurrence of a separator
print(words)

['Hello', 'World!']


In [67]:
', '.join(words)  # concatenate a list of strings

'Hello, World!'

In [68]:
s[6:11]  # get a substring (note: also works for lists!)

'World'
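Slices follow the pattern `[start:stop:step]`: `start` is included, `stop` is excluded, and omitted values default to the beginning, the end and a step of 1. With a step, the earlier "every second element" exercise becomes a one-liner:

```python
s = 'Hello World!'
print(s[:5])    # Hello: from the start up to (not including) index 5
print(s[6:])    # World!: from index 6 to the end
print(s[-6:])   # World!: negative indices count from the end
print(s[::-1])  # !dlroW olleH: a negative step reverses

l = ['H', 'W', 'e', 'o', 'l', 'r', 'l', 'l', 'o', 'd']
print(''.join(l[::2]))  # Hello: every second element, starting at index 0
```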

In [69]:
s.find('World')  # index of the first occurrence of a substring (-1 if not found)

6

In [70]:
s.replace('World', 'Jonas')  # returns a new string; s itself is unchanged

'Hello Jonas!'

## Dictionaries

In [71]:
d = {'one': 'eins', 'two': 'zwei', 'three': 'drei'}
print(d)

{'one': 'eins', 'two': 'zwei', 'three': 'drei'}


In [72]:
d['two']  # access by (unique) key

'zwei'

In [73]:
d['two'] = 2  # set value associated with existing key
d['four'] = 'quattro'  # insert new key/value-mapping
del d['one']  # delete existing mapping by key
print(d)

{'two': 2, 'three': 'drei', 'four': 'quattro'}


In [74]:
len(d)  # get the length of the dictionary

3
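Accessing a missing key, e.g. `d['five']`, raises a `KeyError`. To avoid this, check with the `in` operator (which tests keys, not values), or use `dict.get`, which returns a default value instead:

```python
d = {'two': 2, 'three': 'drei', 'four': 'quattro'}
print('two' in d)               # True: membership is checked on the keys
print('drei' in d)              # False: 'drei' is a value, not a key
print(d.get('five'))            # None: missing key, but no error
print(d.get('five', 'unknown')) # unknown: custom default value
```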

## Dictionaries (cont.)

In [75]:
d = {'one': 'eins', 'two': 'zwei', 'three': 'drei'}
keys = d.keys()  # list-like
values = d.values()  # list-like
items = d.items()  # list-like
print(keys) 
print(values)
print(items)

dict_keys(['one', 'two', 'three'])
dict_values(['eins', 'zwei', 'drei'])
dict_items([('one', 'eins'), ('two', 'zwei'), ('three', 'drei')])


In [76]:
for key in d.keys():
    print(key, d[key])

one eins
two zwei
three drei


In [77]:
for key, value in d.items():
    print(key, value)

one eins
two zwei
three drei


Count the number of occurrences of each character in `"Hello World!"`.

In [78]:
occurrences = {}
for character in "Hello World!":
    if character not in occurrences:
        occurrences[character] = 0
    occurrences[character] += 1  # += reads the current count first, so the key must exist
print(occurrences)

{'H': 1, 'e': 1, 'l': 3, 'o': 2, ' ': 1, 'W': 1, 'r': 1, 'd': 1, '!': 1}
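The `if ... not in` check can be folded into `dict.get` with a default of 0. The standard library also offers `collections.Counter`, a dictionary subclass made for exactly this kind of counting:

```python
from collections import Counter

occurrences = {}
for character in "Hello World!":
    occurrences[character] = occurrences.get(character, 0) + 1

counts = Counter("Hello World!")
print(counts.most_common(2))        # [('l', 3), ('o', 2)]
print(dict(counts) == occurrences)  # True
```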


## Functions

In [79]:
print('Start')

def pythagoras(a, b):
    c2 = a ** 2 + b ** 2
    return c2 ** 0.5

c = pythagoras(3, 4)

print(c)

Start
5.0


Function definition:
* Function name `pythagoras`
* Function parameters `a, b` (a function may also have none)
* Function body, with an optional `return` statement
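A function without a `return` statement returns `None`. A quick check with a hypothetical `greet` function that only prints:

```python
def greet(name):
    print("Hello,", name)  # prints, but returns nothing explicitly

result = greet("Jonas")  # prints: Hello, Jonas
print(result)            # None
```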

Built-in functions: https://docs.python.org/3.7/library/functions.html  
`print()` `input()` `len()` `type()` and type (conversion) functions, incl. `range()`

Write a function that computes the area of a circle of a given radius $r$.

$A = \pi r^2 \approx 3.141593 r^2$

In [80]:
def circle_area(r):
    pi = 3.141593
    return r ** 2 * pi

circle_area(7)  # example call

153.938057
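Instead of hard-coding an approximation, the `math` module of the standard library provides $\pi$ at full floating point precision as `math.pi` (modules are introduced further below):

```python
import math

def circle_area(r):
    return math.pi * r ** 2

print(circle_area(7))  # approx. 153.938
```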

Write a function that computes the volume of a cylinder of a given radius $r$ and height $h$.  
Try to re-use the previously implemented function.

$V = \pi r^2 h \approx 3.141593 r^2 h$

In [81]:
def circle_area(r):
    pi = 3.141593
    return r ** 2 * pi

def cylinder_volume(r, h):
    return circle_area(r) * h

cylinder_volume(7, 3)  # example call

461.814171

Write a function that computes $n!$ for a given number $n$.  
Try to implement it as a *recursive function*, i.e. as a function calling itself.

$n! = n \cdot (n-1) \cdot (n-2) \cdot \ldots \cdot 1$

In [82]:
def fact(n):
    if n <= 1:
        return 1
    return n * fact(n - 1)

fact(10)  # example call

3628800
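An iterative version avoids deep recursion altogether; `math.factorial` from the standard library computes the same value:

```python
import math

def fact_iterative(n):
    result = 1
    for i in range(2, n + 1):  # multiply 2 * 3 * ... * n
        result *= i
    return result

print(fact_iterative(10))  # 3628800
print(math.factorial(10))  # 3628800
```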

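Note: Python limits the recursion depth (see `sys.getrecursionlimit()`, typically 1000), so this `fact` fails with a `RecursionError` for large `n`; try to avoid deep recursions like this one.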

## Packages & modules

Package: collection of modules that can be installed using a package manager

Formally, modules are files ending with ".py" containing Python code.  
They are a way of distributing code across files, supporting re-usability.

In [83]:
import math  # module from the Python Standard Library
from math import cos
from math import tan as tangent

print(math.sin(1))
print(cos(1))
print(tangent(1))

0.8414709848078965
0.5403023058681398
1.5574077246549023


### A selection of built-in modules

A default set of modules is provided with the [Python Standard Library](https://docs.python.org/3.7/library/).

| Module | Description
| :--- | :---
| `argparse` | Parser for command-line options and arguments
| `datetime` | Basic date and time types
| `math` | Mathematical functions
| `random` | Pseudo-random number generation
| `re` | Regular expression operations
| `os` `os.path` <br> `pathlib` `shutil` | File, directory and other operating system operations
| `turtle` | Turtle graphics, a robot turtle for learning to code
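As an example of the `datetime` module, the alarm clock exercise from above can be solved with actual dates and times. The start date below is arbitrary (a hypothetical 27 March 2020, 2pm):

```python
from datetime import datetime, timedelta

now = datetime(2020, 3, 27, 14, 0)  # hypothetical start time: 2pm
alarm = now + timedelta(hours=276)  # add 276 hours
print(alarm)       # 2020-04-08 02:00:00
print(alarm.hour)  # 2
```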

## Practice material

This session: slides & video recording

BIO134 Open edX course: Python for Biologists; weeks 1-7

Online Python challenge platforms, e.g. https://edabit.com/challenges/python3

Online book for reference: http://openbookproject.net/thinkcs/python/english3e

In [84]:
from turtle import *

color('red', 'yellow')
begin_fill()

while True:
    forward(200)
    left(170)
    if abs(pos()) < 1:
        break
        
end_fill()
done()