# CV1 Lab0 Exercise - Python Numpy

In [1]:
import sys
if sys.version_info[0] < 3:
    raise Exception("Must be using Python 3")

# Introduction to Python and Numpy
$\newcommand{\v}[1]{{\mathbf #1}}$

In this assignment we won't do any actual machine learning yet, but we'll set up and get familiar with the tools we will be using for the rest of this course.

In machine learning and computer vision we deal with massive amounts of data, most often organised in tables. When all data elements in a table are of the same datatype (like an integer or a floating point number), the table can be represented with a homogeneous array.

Languages that are well suited for programming with data are therefore equipped with array data types that are an integral part of the language. Although arrays look a lot like Python lists, they are not, as is shown a little further down in this notebook.

## Jupyter Notebook cells

A notebook consists of a sequence of cells. A cell is a multi-line text input field, and its contents can be executed by pressing `Shift-Enter`, or by clicking the `Run` button in the toolbar. What exactly this does depends on the type of cell. There are four types of cells: *code cells*, *markdown cells*, *raw cells* and *heading cells*. We will only focus on the first two: code and markdown. Every cell starts off being a code cell, but its type can be changed by using a dropdown on the toolbar (which will be `Code`, initially).

In a code cell you can write *Python* code. When you run that cell (click on it and press `Shift-Enter`) the code in the cell will run, and the output of the cell will be displayed beneath the cell. Let's try out a very simple code cell below:

In [2]:
x = 5
x = x + 2
print(x)

7


This produces the output you might expect, exactly the same result as executing that bit of *Python* code in a terminal. You can modify the contents of the code cell and run it again with `Shift-Enter` to see how the output changes. Global variables are shared between cells. This means we can still use variables or functions from the first cell in a second cell. Notebooks are expected to be run top to bottom, starting with the first cell and ending with the last. **Failing to run some cells or running cells out of order is likely to result in errors.** For example, if we were to run the second cell before the first has been run, we would get an error saying `x` is not defined.

In [3]:
y = 2 * x
print(y)

14


### Markdown

*Markdown* is a simple way to format text using some extra symbols like asterisks (`*`) and underscores (`_`). You can do a simple [10 minute tutorial](http://www.markdowntutorial.com) or reference the [CheatSheet](http://commonmark.org/help/) for the available commands.

If you set a notebook cell as a *Markdown* cell, you can write *Markdown* directly in the cell. When you run this cell, the markdown will be rendered as *rich text*. If you **double-click** the *rich text*, you can go back to editing the markdown code. All these assignment texts are *Markdown* cells, and it will be convenient to write longer answers in them instead of using code comments.

## A note

Before you turn a problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

## Python packages: Matplotlib, Numpy

First things first, let's make sure to install all the right packages: numpy, opencv, Pillow and matplotlib.  

*Matplotlib* is a plotting library for *Python*. We can import the module with:

    import matplotlib.pyplot as plt

Here we rename the module to `plt` to make it a little less typing when we need to actually use it.

*NumPy* is a Python package which is great for working with N-dimensional arrays and operations on them; things like matrix multiplication and matrix inversion already come built in.
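
As a minimal sketch of the two built-in operations just mentioned, the cell below inverts a small matrix and multiplies it with its inverse. The matrix `M` and its values are made up purely for illustration; `np.linalg.inv` and the `@` operator are the standard NumPy tools for these operations.

```python
import numpy as np

# A small invertible 2 x 2 matrix (values chosen only for illustration)
M = np.array([[2.0, 1.0],
              [1.0, 3.0]])

M_inv = np.linalg.inv(M)   # matrix inversion
product = M @ M_inv        # matrix multiplication

print(product)                            # close to the 2 x 2 identity matrix
print(np.allclose(product, np.eye(2)))    # compare up to rounding errors
```

Because of floating point rounding, the product is compared with `np.allclose` rather than with `==`.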


<!-- Instead of explicitly importing these packages, one can also use the magic command '%pylab inline', which imports, amongst others, matplotlib and numpy. This imports all required modules, and your plots will appear inline.

For a discussion on magic commands (which you may skip), see
https://ipython.org/ipython-doc/dev/interactive/magics.html -->


In [5]:
# Import the packages (only numpy is used in the rest of this notebook)

import numpy as np


### Now we can start working with numpy arrays

In [6]:
a = np.array([1, 2, 3])
print(type(a))

<class 'numpy.ndarray'>


In [7]:
print(a)

[1 2 3]


In [8]:
b = [1, 2, 3]
print(b)
print(type(b))

[1, 2, 3]
<class 'list'>


In [9]:
print(a + a)

[2 4 6]


In [10]:
print(b+b)

[1, 2, 3, 1, 2, 3]


Both lists and arrays are so-called iterables in Python, so constructs like the following are possible:

In [11]:
for element in a:
    print(element)

1
2
3


In [12]:
for element in b:
    print(element)

1
2
3


The nice thing about Numpy arrays is that they allow you to manipulate the data in arrays without writing explicit loops. For instance, look at the addition of all elements in an array:

In [13]:
a = np.random.rand(65334)

In [14]:
print(a)
print(a.shape)

[0.99073658 0.53481467 0.66351593 ... 0.3678821  0.43865935 0.70718137]
(65334,)


In [15]:
def loopsum(a):
    # use the name 'total' to avoid shadowing the built-in sum()
    total = 0
    for v in a:
        total += v
    return total

%timeit loopsum(a)
%timeit np.sum(a)

7.92 ms ± 325 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
37.9 µs ± 1.39 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


So the explicit loop in Python takes about 8 ms versus about 40 µs for the numpy version: the explicit loop is roughly 200 times slower.

So in this course, make sure to use the built-in Numpy tools to manipulate and calculate with arrays.
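
The same idea carries over to elementwise calculations. The sketch below, on a small made-up array called `values`, computes squares once with an explicit loop and once with a vectorised expression, and checks that both give the same numbers.

```python
import numpy as np

values = np.arange(1.0, 6.0)      # the numbers 1.0 up to and including 5.0

# Explicit Python loop over the elements
squares_loop = []
for value in values:
    squares_loop.append(value ** 2)

# Vectorised equivalents: no explicit loop needed
squares_vec = values ** 2
total = np.sum(values)

print(squares_vec)
print(total)
print(np.allclose(squares_loop, squares_vec))
```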

There are many Python/Numpy tutorials available, like this one: http://cs231n.github.io/python-numpy-tutorial/.

<span style="color:red">**Please open up a numpy manual or tutorial and only then use the exercises below to test your knowledge on numpy**</span>


## Array Calculations and Indexing

First we define some arrays to work with. By explicitly setting the seed, the random number generator will always return the same 'random' numbers (so I know the answers).

In [16]:
np.random.seed(99283)

A = np.random.rand(8,5)
print("A = ", A)

B = np.random.rand(8,5)
print("B = ", B)

C = np.random.rand(128,)
print("C = ", C)

A =  [[0.41060654 0.97277616 0.34664829 0.30763314 0.19940409]
 [0.31149602 0.84859173 0.26497371 0.80822227 0.15994864]
 [0.79155146 0.29057342 0.77240998 0.24267702 0.40651574]
 [0.21670841 0.79808991 0.05263509 0.19251708 0.14182459]
 [0.44997373 0.15038151 0.96809554 0.40134072 0.08446855]
 [0.61042722 0.1482476  0.195716   0.18051837 0.66881511]
 [0.44317913 0.00886134 0.60781822 0.29707803 0.90621548]
 [0.28093655 0.48012234 0.36632596 0.73682998 0.42640697]]
B =  [[0.29252436 0.09877095 0.98209075 0.71200725 0.20980036]
 [0.24171315 0.95754295 0.52900538 0.03498925 0.43258694]
 [0.92933117 0.51974005 0.68980843 0.2509423  0.31617492]
 [0.36231013 0.48113418 0.28021683 0.70822839 0.3803101 ]
 [0.8446592  0.4228325  0.23960671 0.89653899 0.08423222]
 [0.47686616 0.71566631 0.37584664 0.38352951 0.47330812]
 [0.3048834  0.36415702 0.31859479 0.29453833 0.00683401]
 [0.41444028 0.34648831 0.31315222 0.1035041  0.50282715]]
C =  [0.28418358 0.84760497 0.96705416 0.76505097 0.5039723 
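
The effect of seeding can be checked with a small sketch: resetting the generator to the same seed reproduces exactly the same numbers. The seed value `12345` and the variable names are arbitrary choices for this illustration.

```python
import numpy as np

np.random.seed(12345)          # arbitrary seed, chosen for illustration
first = np.random.rand(3)

np.random.seed(12345)          # reset the generator to the same state
second = np.random.rand(3)

third = np.random.rand(3)      # drawn without resetting: new numbers

print(np.array_equal(first, second))   # True: same seed, same numbers
print(np.array_equal(first, third))
```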

In [17]:
# Some more examples - see what happens
# 1-D array (a vector with a single axis)
v1 = np.array([1, 2, 3, 4])
print('shape of 1d array v1:', v1.shape)
v2 = v1.transpose()    # transposing a 1-D array leaves its shape unchanged
print(v2.shape)

# 2-D array (a column vector)
v3 = np.array([1, 2, 3, 4]).reshape((4,1))
print('shape of 2d array v3:', v3.shape)
v4 = v3.transpose()
print(v4.shape)

shape of 1d array v1: (4,)
(4,)
shape of 2d array v3: (4, 1)
(1, 4)


In [25]:
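# Some more examples / exercises - see what is useful

# Create a 10 x 10 matrix filled with 3s by using the built-in function ones or zeros.
# (Hint: type help(np.ones) or help(np.zeros))
m1 = 3 * np.ones((10, 10))
print(m1)

[[3. 3. 3. 3. 3. 3. 3. 3. 3. 3.]
 [3. 3. 3. 3. 3. 3. 3. 3. 3. 3.]
 [3. 3. 3. 3. 3. 3. 3. 3. 3. 3.]
 [3. 3. 3. 3. 3. 3. 3. 3. 3. 3.]
 [3. 3. 3. 3. 3. 3. 3. 3. 3. 3.]
 [3. 3. 3. 3. 3. 3. 3. 3. 3. 3.]
 [3. 3. 3. 3. 3. 3. 3. 3. 3. 3.]
 [3. 3. 3. 3. 3. 3. 3. 3. 3. 3.]
 [3. 3. 3. 3. 3. 3. 3. 3. 3. 3.]
 [3. 3. 3. 3. 3. 3. 3. 3. 3. 3.]]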


In [26]:
# a 5 by 5 identity matrix
# (Hint: check the documentation of np.eye)
m2 = np.eye(5)

print(m2)


[[1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]]


In [29]:
x1 = np.arange(9.0).reshape((3, 3))
print(x1)
x2 = np.arange(9.0).reshape((3, 3))
print(x1 * x2)

[[0. 1. 2.]
 [3. 4. 5.]
 [6. 7. 8.]]
[[ 0.  1.  4.]
 [ 9. 16. 25.]
 [36. 49. 64.]]


In [30]:

# just an array with numbers
m3 = np.arange(10)
m4 = np.arange(10).reshape(2, 5)
print(m4)


# try element-wise division and element-wise multiplication
m5 = np.array([[10, 20, 30], [30, 40, 60]])
m6 = np.array([[2, 4, 6], [3, 5, 6]])
print(m5/m6)                # element wise
print(m5*m5)                # element wise


# Given the matrix F and G, observe the outputs
F = np.arange(1,10).reshape(3, 3)
print('F', F)
G = 10 * F
print('G', G)



[[0 1 2 3 4]
 [5 6 7 8 9]]
[[ 5.  5.  5.]
 [10.  8. 10.]]
[[ 100  400  900]
 [ 900 1600 3600]]
F [[1 2 3]
 [4 5 6]
 [7 8 9]]
G [[10 20 30]
 [40 50 60]
 [70 80 90]]


In [None]:

# concatenation of several arrays
i1 = np.concatenate([F, G])     # stacked vertically, along the rows: axis = 0 (default)
i2 = np.concatenate([G, F])
i3 = np.concatenate((i1, i2), axis = 1)    # placed side by side, along the columns: axis = 1
print(i3.shape)


# slicing - note: Python starts counting indices at 0
i4 = F[1:2, 1]
print('i4:', i4)
i5 = F[:, 1]
print('i5:', i5)
F[1, :] = G[2, :]
F[2, 1] = 33
# B[5, 5] = 55
i6 = G[1:2]
print('i6:', i6)
G[1:3, :] = G[1:3, :] + 100
i7 = G[-1]
print('i7:', i7)


### Exercise

Write two functions: one to calculate the elementwise sum of A and B and another one to calculate the *elementwise* product of A and B. You are not allowed to use loops over the elements in the array.

In [None]:
def sumArrays(a, b):
    # YOUR CODE HERE
    raise NotImplementedError()


In [None]:
# Check that functions are correct
assert np.all(sumArrays(A,B) == \
np.array([[17,  8,  6,  7, 14],
       [16, 14,  7,  6,  9],
       [16,  9,  6,  9, 11],
       [ 9, 12, 12, 11, 11],
       [12,  8,  8, 12,  8],
       [ 9, 14, 10,  7,  9],
       [18,  2, 13, 13,  9],
       [12, 14, 12, 13,  5]]))

assert np.all(sumArrays(A,-A) == np.zeros_like(A))


In [None]:
def mulArrays(a, b):
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
assert np.all(mulArrays(A,B) == \
np.array([[72, 15,  8,  6, 48],
       [63, 49,  6,  8, 18],
       [64, 20,  8, 20, 18],
       [14, 32, 32, 28, 30],
       [32, 12, 15, 35, 12],
       [ 8, 48, 16, 12,  8],
       [81,  1, 40, 36,  8],
       [27, 45, 35, 36,  6]]))
assert np.allclose(mulArrays(B, 1/B), np.ones_like(B))

### Exercise

Calculate the mean of all elements in an array *without* using the mean or average function from numpy.

In [None]:
def meanArray(a):
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
assert np.isclose(meanArray(A), np.mean(A)) # the mean function that you can't use
assert np.isclose(meanArray(B), np.mean(B))
assert np.allclose(meanArray(B/np.mean(B)), 1)

### Exercise

Calculate the standard deviation of all elements in an array *without* using the var or std functions from numpy.
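
For reference, a sketch of the definition that the check below compares against, assuming the population form that `np.std` uses by default (`ddof=0`). The symbols $N$ (number of elements), $a_i$ (the elements) and $\mu$ (their mean) are introduced here for illustration:

$$\mu = \frac{1}{N}\sum_{i=1}^{N} a_i, \qquad \sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left(a_i - \mu\right)^2}$$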

In [None]:
def stdArray(a):
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
assert np.allclose(stdArray(A), np.std(A))
assert np.allclose(stdArray(B), np.std(B))

### Exercise

From C select the elements C[0], C[2], C[4], ... and sum all these

In [None]:
def selectEven(a):
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
assert np.all(328 == selectEven(C))

### Exercise

Select the first 32 elements from array C:

In [None]:
def selectFirst32(a):
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
assert np.all(selectFirst32(C) == \
           np.array([6, 5, 4, 5, 9, 8, 8, 4,
                  2, 7, 8, 4, 1, 8, 9, 3,
                  1, 1, 3, 3, 3, 7, 9, 4,
                  8, 4, 3, 8, 1, 5, 3, 1]))


### Exercise

Select all elements from C that are not equal to 8. This can be done without explicit loops using the concept of logical indexing.
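
As a hedged illustration of logical indexing on a different problem: a comparison on an array produces a boolean array of the same shape, and indexing with that boolean array keeps only the elements where it is `True`. The array `data` and the conditions are made up for this sketch.

```python
import numpy as np

data = np.array([3, 7, 1, 9, 4, 7])    # hypothetical example data

mask = data > 4                        # boolean array, one entry per element
large = data[mask]                     # keeps the elements where mask is True
print(large)                           # [7 9 7]

# Conditions can be combined elementwise with & (and) and | (or)
middle = data[(data > 2) & (data < 8)]
print(middle)                          # [3 7 4 7]
```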

In [None]:
def isnot8(a):
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
correct_answer = np.array([6, 5, 4, 5, 9, 4, 2, 7, 4, 1, 9, 3, 1, 1, 3, 3, 3, 7, 9, 4, 4, 3,
       1, 5, 3, 1, 3, 9, 7, 7, 9, 3, 4, 4, 7, 6, 2, 9, 1, 2, 9, 2, 3, 9,
       1, 6, 5, 9, 5, 6, 7, 5, 5, 3, 9, 3, 9, 3, 3, 5, 6, 5, 6, 9, 5, 2,
       1, 5, 6, 4, 3, 3, 3, 6, 9, 3, 4, 2, 2, 3, 1, 3, 2, 9, 4, 2, 5, 4,
       3, 7, 5, 6, 6, 7, 5, 2, 2, 3, 3, 3, 9, 4, 9, 3, 7, 5, 6, 9, 6, 7,
       6, 9])
assert np.all(correct_answer == isnot8(C))

### Exercise

Now select all rows from A that do not start with an 8

In [None]:
def notstart8(a):
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
assert np.all(notstart8(A) == \
           np.array([[9, 7, 6, 2, 3],
                  [2, 4, 8, 4, 6],
                  [4, 6, 3, 7, 6],
                  [9, 1, 5, 9, 8],
                  [9, 9, 5, 9, 2]]))

### Exercise

Now select all rows that do not contain any 8

In [None]:
def notany8(a):
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
assert np.all(notany8(A) == np.array([[9, 7, 6, 2, 3],
       [4, 6, 3, 7, 6],
       [9, 9, 5, 9, 2]]))

### Exercise

Reverse the order of the columns in array B:

In [None]:
def reverse_colums(a):
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
assert np.all(reverse_colums(B) == np.array([[6, 6, 4, 3, 9],
       [6, 4, 1, 7, 7],
       [2, 4, 2, 5, 8],
       [5, 7, 4, 8, 7],
       [2, 5, 5, 2, 8],
       [1, 4, 2, 6, 1],
       [1, 4, 8, 1, 9],
       [3, 4, 7, 5, 3]]))

Array indexing is probably one of the most difficult subjects when programming with numpy in an efficient way. Let A be a numpy ndarray (n-dimensional array); then A[obj] is an indexing operation on array A. Which type of indexing is used depends on the value and type of obj. There are really three types of indexing…



## Views on Arrays

Most often when you need arrays in programming, those arrays tend to be very large. Think of images with millions of pixels in them. Thus when calculating with arrays you don't want to make unnecessary copies of arrays. Numpy (thinks it) is very clever in circumventing the need to make copies of arrays. But the cleverness of numpy can come back to bite you when you are not aware of what is going on.

In [None]:
AA = A.copy() # to be sure and not mess with A itself
              # we start by explicitly making a copy
print(AA)

In [None]:
AA2 = AA[::2,::2]
print(AA2)

In [None]:
AA2[:,:] = 999
print(AA2)

In [None]:
print(AA)

Evidently AA2 is still referring to the same data elements as AA. We say that AA2 provides a new **view on array AA**. The new view can be of a different shape (as it is here). But remember that a view still points to the same data as the array on which it is a view.

The rules Numpy uses when it doesn't and when it does make a copy of the data are not trivial.

Be sure to keep this phenomenon in the back of your mind when confronted with a nasty bug in your code.
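
When in doubt, `np.shares_memory` can tell you whether two arrays refer to the same data. The sketch below is not a full statement of the rules; it only contrasts three common cases on a small made-up array `base`: basic slicing (a view), indexing with a list of integers (a copy) and an explicit `.copy()`.

```python
import numpy as np

base = np.arange(12).reshape(3, 4)

sliced = base[::2, ::2]     # basic slicing: a view on base
picked = base[[0, 2]]       # indexing with a list of integers: a copy
explicit = base.copy()      # explicit copy

print(np.shares_memory(base, sliced))     # True
print(np.shares_memory(base, picked))     # False
print(np.shares_memory(base, explicit))   # False

sliced[:, :] = -1           # writes through the view into base
print(base[0, 0], picked[0, 0], explicit[0, 0])   # -1 0 0
```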

## Tricks with Arrays

### Exercise

Given the array C of shape (128,) make it into an array of shape (128,1). You can do that with the reshape method or with the use of the 'newaxis' index.
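
To illustrate the two mechanisms on a related but different case, the sketch below turns a made-up array `vec` of shape (4,) into a *row* of shape (1,4), once with `reshape` and once with `np.newaxis`.

```python
import numpy as np

vec = np.arange(4)              # shape (4,)

row_a = vec.reshape((1, 4))     # the reshape method
row_b = vec[np.newaxis, :]      # the newaxis index adds an axis of length 1

print(row_a.shape, row_b.shape)         # (1, 4) (1, 4)
print(np.array_equal(row_a, row_b))     # True
```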

In [None]:
def ncomma_to_ncomma1(a):
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
assert ncomma_to_ncomma1(C).shape == (128,1)
assert np.all(ncomma_to_ncomma1(C).flatten() == C)
n = np.random.randint(50,500)
D = np.random.rand(n)
assert ncomma_to_ncomma1(D).shape == (n,1)
assert np.all(ncomma_to_ncomma1(D).flatten() == D)

We make some new data to work on:

In [None]:
np.random.seed(38293804)
A35 = np.random.randint(1,10,size=(3,5))
print(A35)

v5 = np.array([1,2,3,4,5])
print(v5)

v3 = np.array([1,2,3])
print(v3)

### Exercise

Subtract the (5,) array from each of the rows of A35. **Note there is no need to first duplicate the v5 array to form a (3,5) shaped array**. Note that your function should work for all arrays of size (m,n) and rows of size (n,).
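
The mechanism behind this is called broadcasting. As a sketch using a different operation (addition) on made-up arrays `M` and `row`, the cell below shows that combining a (2,3) array with a (3,) array gives the same result as first duplicating the row with `np.tile`.

```python
import numpy as np

M = np.arange(6).reshape(2, 3)      # shape (2, 3)
row = np.array([10, 20, 30])        # shape (3,)

# Broadcasting: row is combined with every row of M, without duplication
with_broadcast = M + row

# The explicit (and unnecessary) alternative: duplicate row to shape (2, 3)
with_tile = M + np.tile(row, (2, 1))

print(with_broadcast)
print(np.array_equal(with_broadcast, with_tile))   # True
```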

In [None]:
def subtract_row(a, r):
    """Subtract row r from all rows in a"""
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
assert np.all(subtract_row(A35, v5) == np.array([[ 1,  7,  2,  3, -3],
       [ 3,  1,  1, -2,  2],
       [ 1, -1,  5, -2, -2]]))

### Exercise

Subtract the (3,) array v3 from each of the columns of A35.  Note that your function should work for all arrays of size (m,n) and columns of size (m,).

In [None]:
def subtract_col(a, c):
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
assert np.all( subtract_col(A35, v3) == \
np.array([[ 1,  8,  4,  6,  1],
       [ 2,  1,  2,  0,  5],
       [-1, -2,  5, -1,  0]]))


## Linear Algebra

Numpy supports many linear algebra functions, like calculating the norm of a vector or the determinant of a matrix.

In Python 3.5 the `@` operator for matrix multiplication was introduced. This means that `A @ B` denotes the matrix multiplication of a matrix (array) A of shape (m,n) with a matrix (array) B of shape (n,k).

Numpy also allows arrays of shape (n,) to be used in matrix multiplications. Depending on the context, Numpy interprets an array with shape (n,) as a matrix of shape (n,1) or (1,n).

Below some examples are given.

In [None]:
# creation of some random matrices and vectors. Note difference in size (3,) and (3,1)

np.random.seed(324893485)
A = np.random.rand(3,4)
B = np.random.rand(3,3)
x = np.random.rand(4,)
y = np.random.rand(3,)
v = x.reshape((4, 1))
z = np.random.rand(4,1)
w = y.reshape((3, 1))

In [None]:
# norm and determinant

print('norm y:', np.linalg.norm(y))
print('determinant B:', np.linalg.det(B))

In [None]:
# see difference in outcome

print(A @ x)
print(A @ v)

In [None]:
# alignment problem will occur: shapes (3,4) and (3,) do not match, so this raises a ValueError

A @ y

In [None]:
# This should work

print(y @ A)

In [None]:
# See difference in outcome. Understand difference

# v.T @ z gives the dot product, as an array of shape (1,1)
print(v.T@z)

# v @ z.T results in a matrix of shape (4,4)
print(v@z.T)

So there is nothing for you to do here except note that an array of shape (n,) can be used as either a row vector or a column vector, in a vector-matrix or matrix-vector multiplication respectively.
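
The shapes of the results summarise this behaviour. The sketch below uses made-up arrays of ones (`M`, `x1d`, `x2d` and `y1d` are names chosen for this illustration) so that only the shapes matter.

```python
import numpy as np

M = np.ones((3, 4))
x1d = np.ones(4)           # shape (4,)
x2d = np.ones((4, 1))      # shape (4, 1)
y1d = np.ones(3)           # shape (3,)

print((M @ x1d).shape)     # (3,)   x1d acts as a column, the result is 1-D again
print((M @ x2d).shape)     # (3, 1) an explicit column stays 2-D
print((y1d @ M).shape)     # (4,)   y1d acts as a row, the result is 1-D again
```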

## End of Notebook