# Numerical data analysis and modeling with Python

Dr. Lars Bittrich (e-mail: bittrich-lars@ipfdd.de)
* from Leibniz-IPF in Dresden, Germany
* theoretical physicist

Experiences with Python (since 2004):
* 4 years of teaching seminars on Computational Physics for physicists
* Numerical modeling of systems in quantum chaos; HPC simulations
* Numerical modeling and optimization of composite structures
* Experimental data evaluation and measurement automation

# Overview

* basic python structures and differences from other languages
    * lists
    * tuples
    * strings
    * dictionaries
    * classes
* introduction to numpy
    * working with arrays
    * programming array code instead of loops
    

* matplotlib, scipy, pandas
* writing fast code
    * beyond numpy, numexpr, numba
    * extensions with cython
    * bindings with cython and pybind11
    * python multiprocessing
* requested topics

# Literature

Tutorials and documentation:
* [www.python-course.eu](https://www.python-course.eu)
* [scipy-cookbook.readthedocs.io](https://scipy-cookbook.readthedocs.io)

* [docs.python.org/3.7](https://docs.python.org/3.7/)
* [numpy.org/doc/1.18](https://numpy.org/doc/1.18/)
* [matplotlib.org/tutorials/index.html](https://matplotlib.org/tutorials/index.html)
* [matplotlib.org/gallery/index.html](https://matplotlib.org/gallery/index.html)


# History of Python

* initially developed by Guido van Rossum in the late 1980s
* name reference to Monty Python's Flying Circus
* a language for easy teaching
* version 1.0 was released in January 1994
* version 2.0 in October 2000
* Python 3.0 released in 2008

# Installation

Distributions with package management:
* [Anaconda](https://www.anaconda.com/distribution/): most essential packages are included in the default installation
* with Anaconda, install RISE (slideshows in Jupyter) via `conda install -c conda-forge rise`
* on Linux or with a manual installation: python 3.7, numpy, scipy, matplotlib, cython, jupyter, rise

# Typical for python

* interpreted and dynamically typed
* easy to learn
* intuitive syntax, often short and concise
* "batteries included": a powerful standard library, e.g.
    * email, urllib, zipfile, multiprocessing and many more
    * [docs.python.org/3/library](https://docs.python.org/3/library/)
* huge ecosystem of third-party libraries (mostly targeting CPython; currently more than 200,000 projects on PyPI)
    * search: python [what you need right now]

In [1]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


# Interactive shell

Basic types:

In [1]:
print(3 + 4)
print("This is a test")
print("3+4=", 3+4)
x = 5
y = 6.1
z = x + y
c = 4 + 5.1j
print("x, y, z:", x, y, z)
print("complex number:", c)
print("x*y=", x*y)
print("x/2=", x/2)

7
This is a test
3+4= 7
x, y, z: 5 6.1 11.1
complex number: (4+5.1j)
x*y= 30.5
x/2= 2.5


# Operators

* arithmetic operators: `+  -  *  /` as usual
* `**` for exponents; `%` for modulo
* comparison: `==`, `<`,  `>`,  `<=`,  `>=`,  `!=`
* bitwise operators on integers: `|` (or); `&` (and); `^` (xor); `~` (inversion); `<<`,  `>>` (shift)
* logical operators: `and`, `or`, `not`
* arithmetic and bitwise operators can be combined with assignment: `x -= 2`
* use `type(x)` to get the data type
* division of integers with `/` yields a float! Use `//` for integer (floor) division.
* integers have arbitrary precision, so very big numbers are possible
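As a quick illustration of the bitwise/logical distinction: `|`, `&`, `^`, `~`, `<<` and `>>` operate on the binary digits of integers, while the keywords `and`, `or` and `not` test truthiness, short-circuit, and (for `and`/`or`) return one of their operands. The values below are arbitrary; a minimal sketch:

```python
a, b = 2, 6                # binary 010 and 110

# bitwise operators act on the binary digits
print(a & b, a | b, a ^ b, ~a, a << 2)          # 2 6 4 -3 8

# logical operators test truthiness and return one of the operands
print(a and b, a or b, not a)                   # 6 2 False

# on bools both give the same result, but `and` stops early
# as soon as the left side is False
print((a > 1) and (b > 1), (a > 1) & (b > 1))   # True True
```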

In [3]:
print("type of x:", type(x))  
print("type of x == type of y:", type(x)==type(y))
print("is x integer:", isinstance(x, int))
print("is y integer:", isinstance(y, int))
print("x float:", isinstance(x, float))
print("5/2:", 5/2)
print("5//2:", 5//2)

type of x: <class 'int'>
type of x == type of y: False
is x integer: True
is y integer: False
x float: False
5/2: 2.5
5//2: 2


In [4]:
print("x^2:", x**2)
print("x^2000:", x**2000, type(x**2000))
print("x==y", x==y)
print("x!=y", x!=y)
print("x>y", x>y)
print("2 or 6:", 2 | 6)
print("2 and 6:", 2 & 6)
print("2 xor 6:", 2 ^ 6)
print("13 mod 5:", 13 % 5)
print("-3 mod 5:", -3 % 5)

x^2: 25
x^2000: 8709809816217216675576195494778872295859...90625 <class 'int'>
x==y False
x!=y True
x>y False
2 or 6: 6
2 and 6: 2
2 xor 6: 4
13 mod 5: 3
-3 mod 5: 2

In [5]:
for i in [1, 3, 7]:
    j = i**2
    print(f'The square of {i} is {j}')

The square of 1 is 1
The square of 3 is 9
The square of 7 is 49


In [6]:
print('range(10)')
for i in range(10):
    print(i)
    
print('range(2,12,2)')
for i in range(2, 12, 2):
    print(i)
    
print('range(10,-1,-1)')
for i in range(10,-1,-1):
    print(i)

range(10)
0
1
2
3
4
5
6
7
8
9
range(2,12,2)
2
4
6
8
10
range(10,-1,-1)
10
9
8
7
6
5
4
3
2
1
0
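`range(start, stop, step)` generates its values lazily and never includes `stop`; wrapping it in `list()` shows all values at once. Two built-ins that pair naturally with `for` loops are `enumerate` (index plus element) and `zip` (several sequences in parallel). A short sketch with made-up lists `names` and `values`:

```python
print(list(range(2, 12, 2)))    # [2, 4, 6, 8, 10]; the stop value 12 is excluded

names = ['a', 'b', 'c']
values = [10, 20, 30]

# enumerate yields (index, element) pairs
for idx, name in enumerate(names):
    print(idx, name)

# zip pairs up the elements of several sequences
pairs = list(zip(names, values))
print(pairs)                    # [('a', 10), ('b', 20), ('c', 30)]
```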


In [7]:
k, l = 1, 1           # tuple assignment: k = 1, l = 1
while k < 100:
    k, l = k+l, k     # the right-hand side is evaluated before assigning
    print(k, l, k/l)  # '/' always returns a float in Python 3

2 1 2.0
3 2 1.5
5 3 1.6666666666666667
8 5 1.6
13 8 1.625
21 13 1.6153846153846154
34 21 1.619047619047619
55 34 1.6176470588235294
89 55 1.6181818181818182
144 89 1.6179775280898876


In [8]:
# i is 0 here: it keeps its last value from the range(10,-1,-1) loop above
if i < 5:
    print(i)
    print(i**i)


if i < 5:
    print(i)
    print(i**i)
else:
    print("i is >=5")

if i < 5:
    print(i)
    print(i**i)
elif i < 10:
    print("5<=i<10")
elif i < 20:
    print("10<=i<20")
else:
    print("i>=20")

0
1
0
1
0
1
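When each branch only selects a value, the conditional expression `a if condition else b` is a compact alternative. The sketch below wraps a similar if/elif/else chain in a hypothetical helper `classify` that returns strings instead of printing, and uses a conditional expression for the parity:

```python
def classify(i):
    # hypothetical helper: same thresholds as the elif chain above
    if i < 5:
        return "i<5"
    elif i < 10:
        return "5<=i<10"
    elif i < 20:
        return "10<=i<20"
    return "i>=20"

for i in [0, 7, 15, 42]:
    parity = "even" if i % 2 == 0 else "odd"   # conditional expression
    print(i, classify(i), parity)
```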


In [9]:
def sum_prod(x, y):
    """Computes sum and product of x and y
    """
    return x + y, x * y


print("Help for sum_prod:")
help(sum_prod)

print("Doc string of sum_prod:")
print(sum_prod.__doc__)

# only in Ipython/Jupyter
sum_prod?

Help for sum_prod:
Help on function sum_prod in module __main__:

sum_prod(x, y)
    Computes sum and product of x and y

Doc string of sum_prod:
Computes sum and product of x and y
    


In [10]:
def some_function(a, b, name='default'):
    print(a, b, name)
    
some_function(1, 2)
some_function(b=2, a=3, name="test")

def acceptsAnything(*args, **kwargs):
    print(args)
    print(kwargs)
    
acceptsAnything(1,2,4, some=2, parameters=3)

1 2 default
3 2 test
(1, 2, 4)
{'some': 2, 'parameters': 3}
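The `*`/`**` syntax also works in the opposite direction at the call site: `*sequence` spreads a list or tuple into positional arguments and `**mapping` spreads a dictionary into keyword arguments. To keep the snippet self-contained it redefines a variant of `some_function` that returns its string instead of printing it:

```python
def some_function(a, b, name='default'):
    # variant of the function above that returns instead of printing
    return f'{a} {b} {name}'

args = [1, 2]
kwargs = {'name': 'unpacked'}

# equivalent to some_function(1, 2, name='unpacked')
print(some_function(*args, **kwargs))   # 1 2 unpacked
```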


In [12]:
print(sum_prod(4, 6))
sum_res, prod_res = sum_prod(4, 6)
print(sum_res, prod_res)
inline_sum_res = lambda x, y: (x+y, x*y)
print(inline_sum_res(4, 6))

(10, 24)
10 24
(10, 24)


In [13]:
a = None
b = True
c = False
if b:
    print("b is True")
if not a:
    print("None counts as False")
if 4 in [1,2,3,4,5]:
    print("quick and intuitive syntax")
l1 = [1,2]
l2 = [1,2]
l1ref = l1
if l1==l2:
    print("element comparison")
if l1 is not l2:
    print("object identity")
if l1 is l1ref:
    print("works with references")

b is True
None counts as False
quick and intuitive syntax
element comparison
object identity
works with references
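Not only `None` is treated as false in a condition: `0`, `0.0` and empty containers (`''`, `[]`, `()`, `{}`) are as well, and everything else counts as true. Since `if not a` cannot distinguish `None` from `0` or an empty list, the usual explicit test is `a is None`. A small table of examples, printed with `bool()`:

```python
values = [None, 0, 0.0, '', [], (), {}, 'text', [0], 42]

for v in values:
    # bool(v) shows how v behaves in an if statement
    print(repr(v), bool(v), v is None)
```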


# Tuples

Tuples are immutable: their elements cannot be changed after creation.

In [15]:
a, b, c = 3, 4, 5
tup = 4, 6, 7
print(tup[0])
a, b, c = tup
print(tup + 4*(a,b,c))
tup

4
(4, 6, 7, 4, 6, 7, 4, 6, 7, 4, 6, 7, 4, 6, 7)


(4, 6, 7)
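A minimal sketch of what immutability means in practice: item assignment on a tuple raises a `TypeError`, so a changed tuple has to be built anew. It also shows that a one-element tuple needs a trailing comma, because plain parentheses only group an expression:

```python
tup = (4, 6, 7)
try:
    tup[0] = 1                      # tuples do not support item assignment
except TypeError as err:
    print("TypeError:", err)

new_tup = (1,) + tup[1:]            # build a new tuple instead
print(new_tup)                      # (1, 6, 7)

single = (5,)                       # one-element tuple: trailing comma required
print(type(single), type((5)))      # <class 'tuple'> <class 'int'>
```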

# Lists

In [18]:
l = []
l += [1,2,3]
print(l)
l.append(7)
print(l)
print("Remove element:",l.pop())
print("Now l is:",l)
print("Remove first:",l.pop(0))
print("Now:",l)
print("3*l", 3*l)

[1, 2, 3]
[1, 2, 3, 7]
Remove element: 7
Now l is: [1, 2, 3]
Remove first: 1
Now: [2, 3]
3*l [2, 3, 2, 3, 2, 3]


In [20]:
l += list(range(10))
print(l)
l.sort()
print(l)
l.reverse()
print(l)
print("every second element:", l[::2])
print("element 0:", l[0])
print("element 0 to 3 (excluding 4)", l[0:4]) # or just l[:4]
print("backwards", l[::-1])
l2 = [i**2 for i in range(10)]
print(l2)
print([i**2 for i in range(10) if i!=5])

[2, 3, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 2, 3, 3, 4, 5, 6, 7, 8, 9]
[9, 8, 7, 6, 5, 4, 3, 3, 2, 2, 1, 0]
every second element: [9, 7, 5, 3, 2, 1]
element 0: 9
element 0 to 3 (excluding 4) [9, 8, 7, 6]
backwards [0, 1, 2, 2, 3, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
[0, 1, 4, 9, 16, 36, 49, 64, 81]
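Negative indices count from the end, and a slice always produces a new list, whereas plain assignment only adds another name for the same object (compare the `is` checks in the earlier cell). A short sketch; the list contents are arbitrary:

```python
l = [9, 8, 7, 6, 5]
print(l[-1], l[-2:])             # 5 [6, 5]

alias = l                        # second name for the same list, no copy
l_copy = l[:]                    # slice: a new list with the same elements
l.append(4)

print(alias)                     # [9, 8, 7, 6, 5, 4]  sees the change
print(l_copy)                    # [9, 8, 7, 6, 5]     unchanged
print(alias is l, l_copy is l)   # True False
```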


# Caution:

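Lambda functions (closures in general) look up variables of the enclosing scope when they are *called*, not when they are defined. A for loop does not create a new scope per iteration, so all lambdas below share the same `i`, which is 4 once the loop has finished.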

In [21]:
funclist = []
for i in range(5):
    funclist.append(lambda x: i*x)
    
for f in funclist:
    print(f(1), f(2))

4 8
4 8
4 8
4 8
4 8


In [22]:
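# the right way: each call of the factory function creates a new scope,
# so every lambda keeps its own value of locali
def getFunc(locali):
    return lambda x: locali*x

funclist = []
for i in range(5):
    funclist.append(getFunc(i))
    
for f in funclist:
    print(f(1), f(2))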

0 0
1 2
2 4
3 6
4 8
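Two other common idioms freeze the loop value as well. A default argument is evaluated once, when the lambda is defined, and `functools.partial` binds arguments explicitly (here together with `operator.mul` for multiplication). A sketch producing the same multipliers as above:

```python
from functools import partial
from operator import mul

# default argument: i=i stores the current value of i in each lambda
funclist = [lambda x, i=i: i*x for i in range(5)]
print([f(2) for f in funclist])      # [0, 2, 4, 6, 8]

# partial(mul, i) returns a callable computing mul(i, x)
partials = [partial(mul, i) for i in range(5)]
print([f(2) for f in partials])      # [0, 2, 4, 6, 8]
```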


# Strings

Strings are immutable as well: string methods return new strings instead of modifying the original.

In [26]:
s = 'Hello' + ' World' + 3*'!'
print(s)
print(s.endswith('!!!'))
print(s.startswith('!!!'))
fname = 'filename.txt'
print(fname.rsplit('.', maxsplit=1))
s = "some text  with\t various\n whitespaces"
print(s.split(' '))

a, b = 2.73, 2
print(f'a={a:4.1f}, 2*b={2*b:02}')
print("a=%4.1f"%(a))

Hello World!!!
True
False
['filename', 'txt']
['some', 'text', '', 'with\t', 'various\n', 'whitespaces']
a= 2.7, 2*b=04
a= 2.7
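Without an argument, `split()` splits on any run of whitespace (spaces, tabs, newlines) and drops empty strings, which is usually the desired result for the text above. `join` glues a list of strings together with a separator, and `strip` removes leading and trailing whitespace. A short sketch:

```python
s = "some text  with\t various\n whitespaces"
words = s.split()                  # split on any run of whitespace
print(words)      # ['some', 'text', 'with', 'various', 'whitespaces']

joined = '-'.join(words)           # glue the pieces with a separator
print(joined)     # some-text-with-various-whitespaces

print(repr('  padded line\n'.strip()))   # 'padded line'
```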


# Exercise

* generate a list of numbered filenames
* use `os.listdir` (import with `import os`) to display all txt files in the current directory

# Exercise

Write a function that prints the following pattern
for a variable size parameter n (here for n=2):

n=10:

# Exercise

* compute the sum of all palindromic 3-digit numbers (numbers that read the same forward and backward, e.g. 121 but not 123)

# Exercise

* compute the sum of all numbers between 1 and 1000 that are divisible by 3 or 5

# File access


In [33]:
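# mode 'w' creates the file or overwrites an existing one
with open('sometextfile.txt', 'w') as fp:
    fp.write('some line\r\n')
    # writelines does not add line endings itself
    fp.writelines(['some other line\n', 'yet another one\n'])
    
# '!' runs a shell command (IPython/Jupyter only)
!cat sometextfile.txt

# text mode translates '\r\n' line endings to '\n' when reading
with open('sometextfile.txt', 'r') as fp:
    cont = fp.read()
    
print(repr(cont))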

some line
some other line
yet another one
'some line\nsome other line\nyet another one\n'


In [34]:
with open('sometextfile.txt', 'r') as fp:
    lines = fp.readlines()
    
print(lines)

with open('sometextfile.txt', 'r') as fp:
    for line in fp:
        print(repr(line))

# binary mode 'rb' returns the raw bytes without newline translation
with open('sometextfile.txt', 'rb') as fp:
    cont = fp.read()
    
print(repr(cont))

['some line\n', 'some other line\n', 'yet another one\n']
'some line\n'
'some other line\n'
'yet another one\n'
b'some line\r\nsome other line\nyet another one\n'
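Text mode decodes the file with a default encoding that depends on the platform and locale settings. Passing `encoding` explicitly keeps files with non-ASCII characters portable. The sketch below writes into a temporary directory; the filename `umlauts.txt` and the text are arbitrary:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'umlauts.txt')   # arbitrary filename

    # encode the text explicitly as UTF-8 when writing
    with open(path, 'w', encoding='utf-8') as fp:
        fp.write('Grüße aus Dresden\n')

    # read it back once as text and once as raw bytes
    with open(path, 'r', encoding='utf-8') as fp:
        text = fp.read()
    with open(path, 'rb') as fp:
        raw = fp.read()

print(repr(text))
print(raw)      # 'ü' and 'ß' each occupy two bytes in UTF-8
```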


# Exercise

* open a file for writing and write random numbers (use `random.random`) with 4 columns and 10 lines, separated by `,` (a csv file)

# Exercise

* write a simple csv file parser to read this file again and generate a list of lists with numbers
* optional if you are done early: handle missing data and additional tabs and spaces

# Dictionaries

In [4]:
d = {'a': 1, 'b':2}
d['c'] = 5
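locals().update(d)   # only reliable at module level; inside functions, changes to locals() may have no effect
print(a, b, c)
for item in d:
    print(item, d[item])
print(list(d.keys()))
d.values()

1 2 5
a 1
b 2
c 5
['a', 'b', 'c']


dict_values([1, 2, 5])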
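A few more everyday dictionary operations: `get` returns a default for missing keys instead of raising a `KeyError`, `items` yields key/value pairs, `update` merges another dictionary, and dictionary comprehensions build new dictionaries. Since Python 3.7 dictionaries preserve insertion order. A sketch with the same data as above:

```python
d = {'a': 1, 'b': 2, 'c': 5}

print(d.get('z', 0))            # 0: key is missing, so the default is returned
d.update({'b': 3, 'd': 7})      # overwrite 'b', add 'd'

for key, value in d.items():    # key/value pairs in insertion order
    print(key, value)

squares = {k: v**2 for k, v in d.items()}
print(squares)                  # {'a': 1, 'b': 9, 'c': 25, 'd': 49}
```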

# Exercise

* set up a dictionary that maps the numbers 0..9 to letters of the alphabet

# Exercise

* use this dictionary to translate a list of random integers between 0 and 9 (`random.randint`)

# Classes


In [5]:
from math import sqrt

class PolyLine:
    def __init__(self, points):
        # points: list of at least two [x, y] pairs
        assert isinstance(points, list)
        assert len(points) > 1
        for p in points:
            assert isinstance(p, list)
            assert len(p) == 2
        self.points = points
            
    def length(self):
        # Euclidean distance between two points
        dp = lambda p1, p2: sqrt((p1[0]-p2[0])**2 + (p1[1]-p2[1])**2)
        # sum the distances of all consecutive point pairs
        return sum(dp(p1, p2) for p1, p2 in zip(self.points[:-1], self.points[1:]))
    
    def appendPoint(self, p):
        self.points.append(p)
    
line = PolyLine([[0,0],[1,1],[2,2]])
print(line.length())

2.8284271247461903


In [6]:
class ClosedPolyLine(PolyLine):
    def length(self):
        dp = lambda p1, p2: sqrt((p1[0]-p2[0])**2 + (p1[1]-p2[1])**2)
        return super().length() + dp(self.points[0], self.points[-1])
    
    def appendPoint(self, *args, **kwargs):
        raise NotImplementedError("makes no sense for closed polylines")
        
cpline = ClosedPolyLine([[0,0],[1,1],[0,2]])
print(cpline.length())
cpline.appendPoint([1,1])

4.82842712474619


NotImplementedError: makes no sense for closed polylines

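# Special methods

Methods with double-underscore names ("dunder" methods) connect a class to Python's syntax and built-in functions: `a + b` calls `a.__add__(b)`, `len(a)` calls `a.__len__()` and `a[i]` calls `a.__getitem__(i)`.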

In [None]:
class AdvancedPolyLine(PolyLine):
    def __add__(self, other):
        return AdvancedPolyLine(self.points + other.points)
    
    def __len__(self):
        return len(self.points)
    
    def __getitem__(self, index):
        return self.points[index]
    
aline = AdvancedPolyLine([[0,0],[1,1],[2,2]])
print(len(aline))
print(len(aline+aline))
print(aline[1])
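Two more special methods are useful for almost any class: `__repr__` controls how an object is displayed (by `print` or in the notebook) and `__eq__` defines `==`. The class `Point2D` is hypothetical and independent of the `PolyLine` hierarchy; it only illustrates the protocol:

```python
class Point2D:
    """Hypothetical 2D point, only used to illustrate special methods."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __add__(self, other):          # called for self + other
        return Point2D(self.x + other.x, self.y + other.y)

    def __eq__(self, other):           # called for self == other
        if not isinstance(other, Point2D):
            return NotImplemented      # let Python try the other operand
        return (self.x, self.y) == (other.x, other.y)

    def __repr__(self):                # used by print() and the notebook
        return f'Point2D({self.x}, {self.y})'

p = Point2D(1, 2) + Point2D(3, 4)
print(p)                      # Point2D(4, 6)
print(p == Point2D(4, 6))     # True
```

Note that defining `__eq__` without `__hash__` makes instances unhashable (Python sets `__hash__` to `None`), so they cannot serve as dictionary keys or set elements unless `__hash__` is defined too.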