# Machine Learning (Summer 2018)

## Practice Session 2: Numpy

April 17, 2018

## Today's Session

* Exercise Sheet 02
* Some remarks on Notebooks/Python
* Main topic: Numpy

# Some remarks on Notebooks/Python

## Notebook cells

Make sure that variables are properly initialized:
* Initialization may happen in another cell
  * make sure that cell was executed
* a cell that modifies a variable gives a different result each time it is executed; re-run the initializing cell first

In [None]:
a=1

In [None]:
a
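
The effect of executing a cell more than once can be reproduced in plain Python. The following is only a sketch: each line stands for one cell execution, and the cell `a = a + 1` is a made-up example that is not part of this notebook.

```python
a = 1          # "cell 1": initialization
a = a + 1      # "cell 2", executed for the first time
a = a + 1      # "cell 2", executed a second time
print(a)       # 3, although the cell itself only adds 1
```

Re-running the initializing cell before "cell 2" restores the expected value 2.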

## Tab completion in notebooks

Hitting <kbd> &#x21B9; </kbd> (the tab key) shows possible continuations of the current word:

In [None]:
len

<kbd> &#x21E7; </kbd>+<kbd> &#x21B9; </kbd> (shift+tab) shows a short help text for the current word.

### Lists

In [None]:
L = ['alligator', 'butterfly', 'caterpillar', 'dog']

In [None]:
for value in L:
    print(value)

In [None]:
len(L)

In [None]:
for index in range(len(L)):
    value = L[index]
    print(index,value)

In [None]:
for index,value in enumerate(L):
    print(index,value)

Some loops may be done more compactly by list comprehension:

In [None]:
L2 = []
for value in L:
    L2.append(len(value))
print(L2)

In [None]:
# shorter with list comprehension:
L2 = [len(value) for value in L]
print(L2)

List comprehensions can be combined:

In [None]:
Product = [(a,b) for a in L for b in L]
print(Product)

`zip` can be used to turn a pair of lists into a sequence of pairs:

In [None]:
for a in zip(L,L2):
    print(a)

### String formatting

In [None]:
name = 'Python'

Simple way:

In [None]:
'Hello ' + name + '!'

Better way: the `format()` method

In [None]:
'Hello {}!'.format(name)

In [None]:
hi = 'Hello'
'{} {}!'.format(hi,name)

In [None]:
'{1} says "{0}"'.format(hi,name)

In [None]:
'{b} says "{a}"'.format(a=hi,b=name)

In [None]:
'{b} says "{a}, {a}"'.format(a=hi,b=name)

In [None]:
'|{0:10}|{0:^10}|{0:>10}|'.format(hi)


In [None]:
'result = {:.2}'.format(.2312)
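
The precision in the last example deserves a closer look. The sketch below (with an arbitrarily chosen `value`) contrasts a precision without a type code, which counts significant digits, with the type code `f`, which counts digits after the decimal point.

```python
value = 12.312   # arbitrary example value

# Without a type code, the precision counts significant digits.
print('{:.2}'.format(0.2312))   # 0.23
print('{:.2}'.format(value))    # switches to exponent notation

# With the type code `f`, the precision counts digits after the decimal point.
print('{:.2f}'.format(0.2312))  # 0.23
print('{:.2f}'.format(value))   # 12.31
```

For output with a fixed number of decimal places, `{:.2f}` is therefore usually the safer choice.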

More information on string formatting is available at <https://pyformat.info/>.

## Numpy

### Background

Python:
* provides a small core of frequently used functions and types
  * high-level number objects like integers and floating point numbers
  * containers like lists and dictionaries
* can be extended for specific purposes
  * we will use some of these extensions for the practice session
  * we will provide some introduction to those extensions
  * today: Numpy

Numpy is an extension package for Python. It
* provides multidimensional arrays
* is closer to the hardware than plain Python (efficiency)
* is designed for scientific computation
* supports array-oriented computing

Numpy arrays can be used for:
* values of an experiment/simulation at discrete time steps
* signals recorded by a measurement device, e.g. sound wave
* pixels of an image, grey-level or colour
* 3D data measured at different X-Y-Z positions, e.g. MRI scan

#### The `import` statement

To use an extension in Python, it has to be imported first.
The recommended way to import numpy is: `import numpy as np`.
This does two things:
1. loads the extension
2. makes its data and functions available under the prefix `np.`

In [None]:
import numpy as np

### Numpy arrays

Numpy provides a datatype for $N$-dimensional arrays (`ndarray`). Such arrays can be initialized from Python lists using the `np.array` function:

In [None]:
a = np.array([0,1,2,3])

In [None]:
a

In [None]:
print(a)
print(type(a))
print(len(a))
print(a.dtype)
print(a.ndim)
print(a.shape)
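
The `dtype` printed above is inferred from the list elements. The following sketch (the names `ints`, `floats` and `explicit` are chosen only for illustration) shows how a single floating point value changes the type of the whole array, and how a type can be requested explicitly.

```python
import numpy as np

ints = np.array([0, 1, 2, 3])        # all elements are integers
floats = np.array([0, 1.5, 2, 3])    # one float makes the whole array float
print(ints.dtype)                    # a platform-dependent integer type, e.g. int64
print(floats.dtype)                  # float64

# The dtype can also be requested explicitly.
explicit = np.array([0, 1, 2, 3], dtype=float)
print(explicit)                      # [0. 1. 2. 3.]
```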

Multidimensional arrays are possible:

In [None]:
b = np.array([[0,1,2],[3,4,5]])
b

In [None]:
print(type(b))
print(b.ndim)
print(b.shape)
print(b.size)

In [None]:
c = np.array([[[1,2,3], [4,5,6]], [[7,8,9], [10,11,12]]])
c

In [None]:
print(c.ndim)
print(c.shape)
print(c.size)

There are also other ways to create arrays:

In [None]:
np.arange(10)

In [None]:
np.arange(1,9,2) # start, end (exclusive),  step

In [None]:
np.linspace(0, 1, 6) # start, end, numpoints

In [None]:
np.linspace(0, 1, 5, endpoint=False)
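
To see how `endpoint` affects the spacing, `np.linspace` can also return the step size via `retstep=True`. This is a small sketch; the variable names are chosen only for illustration.

```python
import numpy as np

points, step = np.linspace(0, 1, 6, retstep=True)
print(points)        # 6 points, both endpoints included
print(step)          # spacing (1 - 0) / (6 - 1)

points_open, step_open = np.linspace(0, 1, 5, endpoint=False, retstep=True)
print(points_open)   # 5 points, the value 1 is left out
print(step_open)     # spacing (1 - 0) / 5
```

Both calls yield the same spacing, but only the first one contains the end value.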

In [None]:
np.ones((2,2))

In [None]:
np.zeros((3,3,3))

In [None]:
np.eye(4)

In [None]:
np.diag([1,2,3,4])

## Reshaping and combining arrays

In [None]:
a = np.reshape(np.arange(12),(3,4))
a

In [None]:
a.flatten()

**Attention:** flattening happens automatically in some situations (we will see examples below)

### Combining two arrays:

In [None]:
a = np.array([[1,2],[3,4]])
b = np.array([[5,6],[7,8]])

print(a)
print(b)

In [None]:
print(np.append(a,b))

In [None]:
print(np.append(a,b,axis=0))

We can also use `np.hstack` and `np.vstack`:

In [None]:
print(np.hstack((a,b)))
print(np.vstack((a,b)))

`np.delete` removes elements from an array:

In [None]:
print(np.delete(a,1))
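
The automatic flattening mentioned earlier can be observed with both `np.append` and `np.delete`. The following sketch contrasts the calls with and without the `axis` argument, using the same `a` and `b` as above.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# Without `axis`, the inputs are flattened first.
print(np.append(a, b))           # [1 2 3 4 5 6 7 8]
print(np.delete(a, 1))           # [1 3 4]

# With `axis`, the two-dimensional structure is kept.
print(np.append(a, b, axis=1))   # same result as np.hstack((a, b))
print(np.delete(a, 1, axis=0))   # [[1 2]]  (second row removed)
```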

### Indexing and slicing

One-dimensional arrays behave similarly to lists:

In [None]:
a = np.arange(1,13)
a

Indices start at 0!

In [None]:
a[2]

In [None]:
a = np.reshape(np.arange(12),(3,4))
a

In [None]:
a[1,1]

A multidimensional array is basically an array of arrays:

In [None]:
a[1]

In [None]:
len(a)

### Flat (linear) indexing:

In [None]:
a.flat[8] = 66
print(a)

In [None]:
np.ravel_multi_index((2, 0), a.shape)

In [None]:
np.unravel_index(8,a.shape)
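
The relation between a flat index and a (row, column) pair can be written out by hand. The sketch below assumes the default row-major layout and starts from a fresh array `a`; the names `row`, `col` and `flat_index` are chosen only for illustration.

```python
import numpy as np

a = np.reshape(np.arange(12), (3, 4))
row, col = 2, 0

# Row-major layout: flat index = row * number_of_columns + col
flat_index = row * a.shape[1] + col
print(flat_index)                                   # 8
print(np.ravel_multi_index((row, col), a.shape))    # 8
print(a.flat[flat_index] == a[row, col])            # True
```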

### Logical indexing

A boolean matrix `b` can be used for indexing a matrix `a` *of the same shape*:

In [None]:
a = np.reshape(np.arange(25),(5,5))
b = np.eye(5, dtype=bool)  # the alias np.bool is not available in all Numpy versions
print(a)
print(b)
print(a[b])
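
In practice, the boolean matrix is often not built by hand but results from a comparison. The following sketch (with the illustrative names `mask` and `evens`) selects and modifies entries of `a` this way.

```python
import numpy as np

a = np.reshape(np.arange(25), (5, 5))

mask = a % 2 == 0        # boolean array of the same shape as a
evens = a[mask]          # the selected entries, as a one-dimensional array
print(mask.dtype)        # bool
print(evens)

# A boolean index can also be used to assign values in place.
a[a > 20] = -1
print(a[4])              # [20 -1 -1 -1 -1]
```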

### Exercise

Create the following array:

```
array([[ 0,  1,  3,  4],
       [10, 11, 13, 14],
       [20, 21, 23, 24]])
```

In [None]:
# YOUR CODE HERE

## Numerical operations on arrays

### Elementwise operations

In [None]:
a = np.arange(10)
print(a)

In [None]:
a + a

In [None]:
2 * a

In [None]:
a - a 

In [None]:
a * a

In [None]:
a ** 2

In [None]:
np.sqrt(a)

### Exercise
* Create an $n\times 3$ array that contains `l1` in its first column, `l2` in its second column and the sum of `l1` and `l2` in its third column

In [None]:
l1 = [2,5,8,4]
l2 = [7,1,1,3]

In [None]:
# YOUR CODE HERE

### Matrix operations

In [None]:
a = np.reshape(np.arange(1,10),(3,3))
e = np.eye(3)
print(a)
print(e)

In [None]:
a + e # sum of two matrices

In [None]:
2 * a # multiplication with a scalar

In [None]:
a * e # pointwise multiplication (not the matrix multiplication!)

In [None]:
a.dot(e) # matrix multiplication

In [None]:
a.T # matrix transposition
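
Besides the `dot` method, Python 3.5 and later provide the `@` operator for matrix multiplication. The sketch below compares both and also shows a matrix-vector product; the vector `v` is an arbitrary example.

```python
import numpy as np

a = np.reshape(np.arange(1, 10), (3, 3))
v = np.array([1, 0, -1])

print(a @ a)     # matrix product, same as a.dot(a)
print(a @ v)     # matrix-vector product: [-2 -2 -2]
print(a * a)     # for comparison: elementwise squares
```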

### Some more mathematical functions

In [None]:
data = np.random.randn(100) # standard normal distribution (mean 0, variance 1)
print(data)

In [None]:
max(data) # largest value in the dataset

In [None]:
np.argmax(data) # index of the largest value in the dataset

In [None]:
# equivalent to:
[d for d in data].index(max(data))

In [None]:
# also works with multi-dimensional data:
a = np.random.randn(5,5)
print(a)
print(np.argmax(a))
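
For multi-dimensional data, `np.argmax` returns an index into the flattened array. The following sketch, using a small hand-written array instead of random data, shows how to convert that index back and how the `axis` argument changes the result.

```python
import numpy as np

a = np.array([[1, 9, 3],
              [7, 2, 8]])

flat = np.argmax(a)                       # index into the flattened array
print(flat)                               # 1
print(np.unravel_index(flat, a.shape))    # row 0, column 1
print(np.argmax(a, axis=0))               # per column: [1 0 1]
print(np.argmax(a, axis=1))               # per row:    [1 2]
```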

In [None]:
np.mean(data) # the mean value of the data

In [None]:
np.std(data) # the standard deviation of the data

In [None]:
np.abs(data) # absolute value (remove sign)

In [None]:
np.sum(data) # sum of all data

In [None]:
np.prod(data) # product of all data

#### Exercise
* Compute the mean and variance of `data` (without using the built-in functions `mean` and `std`). Then compare your results with the values from `np.mean` and `np.std` (recall that the variance is the square of the standard deviation).

In [None]:
# YOUR CODE HERE

### Random arrays

The subpackage `numpy.random` provides functions for creating arrays filled with random numbers:

In [None]:
np.random.rand(20) # uniform distribution on [0,1)

Multidimensional arrays are possible, but unlike `zeros` and `ones`, `rand` expects the dimensions as separate arguments and not as a tuple:

In [None]:
r2 = np.random.rand(4,5)
print(r2)

In [None]:
r3 = np.random.randn(24) # normal distribution with mean 0 and variance 1
print(r3)

The random number generator can be seeded (reproducibility):

In [None]:
np.random.seed(42)
r3 = np.random.randn(24) # normal distribution with mean 0 and variance 1
print(r3)
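
That seeding makes the results reproducible can be checked directly. In this sketch, the names `first`, `second` and `third` are chosen only for illustration.

```python
import numpy as np

np.random.seed(42)
first = np.random.randn(5)

np.random.seed(42)             # reset the generator to the same state
second = np.random.randn(5)

third = np.random.randn(5)     # no reseeding: the sequence simply continues

print(np.array_equal(first, second))   # True
print(np.array_equal(first, third))    # False
```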

## Efficiency

In [None]:
L = range(2000)

In [None]:
%timeit [i**2 for i in L]

In [None]:
a = np.arange(2000)

In [None]:
%timeit a**2
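
`%timeit` is only available in notebooks and IPython. Outside of them, a similar comparison can be sketched with the standard library module `timeit`; the repetition count of 1000 is an arbitrary choice, and the measured times depend on the machine.

```python
import timeit
import numpy as np

L = range(2000)
a = np.arange(2000)

# Both approaches compute the same values ...
assert [i**2 for i in L] == (a**2).tolist()

# ... but need different amounts of time. Each statement is run 1000 times.
t_list = timeit.timeit(lambda: [i**2 for i in L], number=1000)
t_numpy = timeit.timeit(lambda: a**2, number=1000)
print('list comprehension: {:.4f} s'.format(t_list))
print('numpy array:        {:.4f} s'.format(t_numpy))
```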

## Some general hints

Help (the docstring) for a function or value can be displayed by appending a question mark to its name:

In [None]:
np.argmax?

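Searching the documentation for a keyword: `np.lookfor('create array')` (note: `np.lookfor` was removed in Numpy 2.0, so this only works with older versions)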

In [None]:
np.lookfor('create array')

### References

* on the web: http://docs.scipy.org/