 ## Introduction to numpy
 
As the title says, welcome to this introductory notebook where we will try to learn something about `numpy`.

Throughout this notebook you might be asked to provide some answers to our questions. These cells will be clearly marked as follows: 

<h5 style="color: #1B1BFF">Your input required:</h5>

------------------------------

Writing clean, vectorized `numpy` code is in many cases essential. Familiarity with this tool will help you write code that is both elegant and efficient, which leads to faster and more effective experiments. More importantly, it will become easier for you to translate mathematical equations into code, because your code **will read like equations**. So let's start with some basics.

In [None]:
import numpy as np
from utils import timeit, show_cut_image

First, the imports. While it is not strictly required, most of the time you will find `numpy` imported as `np`. We suggest you do the same in your own projects and scripts.

If you want to create a vector `a` in `numpy`, you can do so as follows:

In [None]:
a = np.array([2, 4, 5, 6])
print(a)

To create a matrix, you can use the same syntax as you would for a nested (2D) list in Python:

In [None]:
A = np.array([[1, 2, 3, 4], 
              [5, 6, 7, 8]])
print(A)

Throughout this tutorial, we will try to keep the naming consistent: vectors will have lower-case names and matrices will have upper-case names.

Now that we know how to create basic `numpy` objects, let's talk about them a bit.

All objects like these (vectors, matrices, ...) have a **shape** attribute which, as the name suggests, describes their shape. In other words, it gives the size of the object along each of its dimensions. This attribute can be accessed by typing `.shape` after the name of the object.

Here are some examples:

In [None]:
# create a vector
a = np.array([1, 2, 3, 4])
print("{}\n".format(a))

# create a 2D matrix
A = np.array([[1, 2, 3, 4],
              [1, 2, 3, 4]])
print("{}\n".format(A))

# create 3D matrix (literally a 3D Python array wrapped in np.array)
B = np.array([[[1, 2], [1, 2],
               [3, 4], [3, 4]],
              [[4, 5], [5, 6],
               [4, 5], [5, 6]]])
print("{}\n".format(B))

# print shapes of all created objects
print("Shape of vector a is {}".format(a.shape))
print("Shape of matrix A is {}".format(A.shape))
print("Shape of matrix B is {}".format(B.shape))

Throughout this course we will mostly work with high-dimensional objects.
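When the number of dimensions grows, two further attributes are handy next to `shape`. The following is a minimal sketch; the array `T` is only an illustrative name, not something used elsewhere in this notebook.

```python
import numpy as np

# a small 3D array: 2 blocks, each with 3 rows and 4 columns
T = np.zeros((2, 3, 4))

print(T.shape)  # the size along each dimension
print(T.ndim)   # the number of dimensions, same as len(T.shape)
print(T.size)   # the total number of elements, here 2 * 3 * 4
```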

In scientific computing, we often need to create a multidimensional object with certain values, such as an array with all elements equal to zero or a matrix initialized with values sampled from the normal distribution.

Luckily, `numpy` is awesome in this regard, as it provides readily available solutions to pretty much all of the "standard needs". If you feel like "someone must have done this before", chances are there is a `numpy` function for it.

Here are some examples of usage of functions for matrix and array creation that you may find useful:

In [None]:
# creates a vector with elements [0, 1, 2, 3, 4]
a = np.arange(5)
print("a = {}".format(a))

# we can also set its starting point. So to create vector [10, 11, 12, 13, 14]
# we would do
v = np.arange(10, 15)
print("v = {}".format(v))

# and also we can set step size. So to create a vector with elements spaced by 10
# with start at 10 and end at 40, that is [10, 20, 30, 40] we would do
u = np.arange(10, 50, 10)
print("u = {}".format(u))

# Create a matrix filled with zeros of shape (10, 5)
A = np.zeros((10, 5))
print("A =")
print(A)

# Create a matrix filled with ones of shape (5, 10)
B = np.ones((5, 10))
print("B =")
print(B)

In [None]:
# Create a 2D matrix with random values of shape (5, 5)
X = np.random.rand(5, 5)
print("X with shape {}".format(X.shape))
print(X)

# Create 3D matrix with values sampled from normal distribution of shape (2, 3, 2)
Y = np.random.randn(2, 3, 2)
print("\nY with shape {}".format(Y.shape))
print(Y)

Once we have these objects initialized, there are a ton of arithmetic operations we can do with them:

In [None]:
a = np.array([1, 2, 3, 4, 5])
print("a = {}".format(a))

# add a constant value to each element of a vector
print("a + 5 = {}".format(a + 5))

# multiply each element of a vector by a constant value
print("a * 5 = {}".format(a * 5))

# square each element of the vector a
print("a^2 = {}".format(a ** 2))

v = np.array([2, 2, 3, 2, 3])
print("v = {}".format(v))

# multiply vector a by vector v (element-wise)
print("a * v = {}".format(a * v))

To multiply matrices together (to get their **element-wise** or [Hadamard](https://en.wikipedia.org/wiki/Hadamard_product_(matrices)) product, not the **dot product**), we do not need to do anything special:

In [None]:
A = np.array([[1, 2, 3],
              [1, 2, 3]])
B = np.array([[2, 2, 2],
              [3, 3, 3]])
print("A =\n{}".format(A))
print("B =\n{}".format(B))
print("")
print("A * B =\n{}".format(A * B))


We can also do other standard matrix operations with `numpy`, such as the **dot product**.

In [None]:
A = np.random.rand(2, 3)
print("A with shape {}\n".format(A.shape))
print(A)
B = np.random.randn(3, 5)
print("B with shape {}\n".format(B.shape))
print(B)
print("\nA dot B =")
# dot product of 'A' and 'B' creates a matrix of shape (2, 5)
print(A.dot(B))
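The shape rule behind the dot product is easier to check with small integer matrices. The sketch below uses illustrative values; it also shows the `@` operator, which computes the same matrix product as `.dot` for 2D arrays.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])      # shape (2, 3)
B = np.array([[1, 0],
              [0, 1],
              [1, 1]])         # shape (3, 2)

# the inner dimensions (3 and 3) must match; the result has shape (2, 2)
C = A.dot(B)
print(C)

# the @ operator computes the same matrix product
print(np.array_equal(C, A @ B))
```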

Often your vectors/matrices will have a shape that is just not what you want.

For instance, your data can be (and often will be) stored in a common storage format like CSV. Such formats are easy to load with Python, but the loaded data is rarely usable as-is, since our models will almost certainly require vectors or matrices of a certain shape. To reshape our vectors/matrices, `numpy` provides a `reshape` function.

For example, suppose we found an interesting dataset that consists of 100 grayscale images, each `36 x 36` px, and loaded it into our Python environment. Note that since these are grayscale images, each pixel is represented as a single scalar value.

Long story short, our matrix would have shape `(100, 36, 36)`. Now, we would like to use this to train some model. But our model takes a vector for each example, not a matrix. We can easily preprocess this data accordingly with `numpy`.

In [None]:
# loaded matrix
X = np.random.randn(100, 36, 36)

# print shape of matrix X
print("Shape of a matrix X is {}\n".format(X.shape))

# reshape matrix X to contain 100 vectors
Y = np.reshape(X, (100, 36 * 36))

# print shape of new matrix Y
print("Shape of new reshaped matrix Y is {}".format(Y.shape))
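Two details of `reshape` are worth a quick look: one dimension can be given as `-1` so that `numpy` infers it, and the elements keep their order. This is a small sketch with illustrative names (`x`, `M`), separate from the image data above.

```python
import numpy as np

x = np.arange(12)            # [0, 1, ..., 11]

# -1 asks numpy to infer that dimension from the total number of elements
M = x.reshape(3, -1)         # inferred shape (3, 4)
print(M.shape)

# elements keep their order: the array is filled row by row
print(M[1])                  # second row: [4 5 6 7]

# the total number of elements must stay the same, so this would fail:
# x.reshape(5, 3)  -> ValueError
```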

-------------------------------

<h5 style="color: #1B1BFF">Your input required:</h5>

Reshape the vector `X` so that after the reshape operation it contains 100 examples, each with dimensions `(240, 240, 3)`.

In [None]:
X = np.random.randn(17280000)

# RESHAPE HERE

print("New reshaped array X with shape: {}".format(X.shape))

------------------------------------------

You will also often need to find the `sum` or `mean` of a vector or a matrix.

For operations like these, `numpy` already provides functions (`mean`, `sum`, ...). These functions also take an optional `axis` argument, which specifies along which axis the operation should be carried out.

For instance, to take the sum of a matrix `X` along its first axis, we would write `np.sum(X, axis=0)` (dimensions are indexed from zero).

In [None]:
# create some matrix X
X = np.array([[1, 3, 5, 7],
              [6, 7, 3, 1],
              [1, 4, 5, 1],
              [7, 8, 4, 3],
              [1, 2, 3, 4],
              [9, 5, 4, 3],
              [9, 8, 7, 6]])
print("Shape of X: {}\n".format(X.shape))

# take the sum along the first axis
s = np.sum(X, axis=0)
print("sum along the first dimension of matrix X: \ns={}\n".format(s))

print("Shape of s: {}".format(s.shape))
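To see what the `axis` argument does to the shape, it can help to compare the options side by side on a tiny matrix. A minimal sketch with illustrative names (`M`, `col_sums`, `row_sums`, `total`):

```python
import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6]])      # shape (2, 3)

col_sums = np.sum(M, axis=0)   # collapses the rows    -> shape (3,)
row_sums = np.sum(M, axis=1)   # collapses the columns -> shape (2,)
total = np.sum(M)              # no axis given: sums every element

print(col_sums, col_sums.shape)
print(row_sums, row_sums.shape)
print(total)
```

The axis passed to the function is the one that disappears from the shape of the result.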

-------------------------------

<h5 style="color: #1B1BFF">Your input required:</h5>

Find the mean of the matrix `Z` along the second dimension. Please also print the dimensions of the resulting matrix.

In [None]:
Z = np.random.rand(10, 10, 15)

## Mean of the Z matrix in the second dimension:

## Dimensions of the result

<h5 style="color: #1B1BFF">Your input required:</h5>

How would the dimensions change if we had taken the mean along the last (third) axis? Please also *briefly* explain what the output of the `mean` operation would represent if we changed the `axis` parameter.

*Write here **(double click on this text)** what would happen to the dimensions if we had taken the mean along the third axis, together with your brief explanation.*

---------------------------------------

`numpy` also supports an operation called **broadcasting**. This is best explained with an example.
Let us assume we have a 2D matrix with 10 training examples, each of them having 3 features (matrix `X` of shape `(10, 3)`). Further suppose we also have a vector `w` of shape `(3,)` holding one weight per feature.

Now, we would like to weigh each of our training examples with our weight vector `w`. In plain Python we would probably have to write two `for` loops. With `numpy`, since we can multiply vectors, it seems we could make do with just one.

Thankfully, all of this has been taken care of for us by **broadcasting**. In the end, all we need is a simple multiplication:

In [None]:
# create random weight vector w 
w = np.random.randn(3)
print("w with shape {}".format(w.shape))
print(w)

# create random matrix X
X = (np.random.rand(10, 3) * 10) + 3
print("X with shape {}".format(X.shape))
print(X)
print("")

# multiply each row of X (a vector of shape (3,))
# element-wise by the weight vector w
print("Multiplication result:")
print(X * w)
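To make the connection to the two `for` loops mentioned above concrete, here is a small sketch that computes the weighting both ways and checks that the results match. The values and the names `looped` and `broadcast` are illustrative only.

```python
import numpy as np

X = np.array([[1., 2., 3.],
              [4., 5., 6.]])     # 2 examples, 3 features
w = np.array([10., 0., -1.])     # one weight per feature

# explicit version: two nested loops over examples and features
looped = np.zeros_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        looped[i, j] = X[i, j] * w[j]

# broadcast version: w is stretched along the first dimension
broadcast = X * w

print(broadcast)
print(np.array_equal(looped, broadcast))
```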

If we want to set all elements in the first row of some matrix to zero (that is, every element along the second dimension at index 0 of the first dimension), we can do so by typing **:** instead of a number in that dimension. The scalar on the right-hand side is then broadcast to every selected element.

*Note*: the colon **:** here is the delimiter used in Python's syntax for "slicing" sequences (of any kind, really). For instance,

- `[1:5]` means "from index 1 up to, but not including, index 5"
- `[:5]` means "from the beginning up to, but not including, index 5"
- `[1:]` means "from index 1 to the end"

Taking all of this into consideration, `[:]` can be described as "from beginning to end".
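The slice forms listed above can be tried on a small array. This is a minimal sketch with illustrative names (`v`, `M`); note that the stop index of a slice is not included in the result.

```python
import numpy as np

v = np.arange(10, 20)    # [10, 11, ..., 19]

print(v[1:5])    # indices 1, 2, 3, 4 -> [11 12 13 14]
print(v[:5])     # indices 0 to 4     -> [10 11 12 13 14]
print(v[7:])     # index 7 to the end -> [17 18 19]
print(v[:])      # every element

M = np.arange(12).reshape(3, 4)
print(M[:, 0])   # every row, first column -> [0 4 8]
```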

In [None]:
# create random matrix X
X = np.random.randn(4, 3)
print("X =\n{}".format(X))
print("")

# set all elements of the first row to zero
X[0, :] = 0

print("X =\n{}".format(X))

`numpy` also supports indexing with a list or vector of indices passed to a "sliceable" object (i.e. a matrix or vector). You can also pass a boolean matrix/vector (in other words, a *mask*) to any object of this sort. Values at positions where the mask is `True` are selected, while values at `False` positions are ignored.

In [None]:
# create matrix X filled with ones
X = np.ones((4, 3))
print("X =\n{}\n".format(X))

# create a list of row indices for matrix X
idx = [0, 3]
X[idx] = 0
print("After running X[idx] = 0")
print("X =\n{}\n".format(X))

# create boolean matrix to select only certain values
bool_idx = np.zeros((4, 3)).astype('bool')
bool_idx[2, 2] = True
bool_idx[3, 2] = True
bool_idx[1, 0] = True
print("bool_idx =\n{}\n".format(bool_idx))

# set each value selected by bool_idx to 300
X[bool_idx] = 300
print("After running X[bool_idx] = 300")
print("X =\n{}".format(X))

These operations are usually heavily optimized in the background, so in most cases it is worth writing **vectorized** code with `numpy` rather than loops (even if it may take some time to figure out how to actually do that). 

We can also compare values in `numpy` arrays. Comparing an array or a matrix with some value returns an array or a matrix of the same shape but with boolean values. If the value at a certain position meets the specified condition, the result has `True` at that position; otherwise it has `False`.

In [None]:
# create matrix X
X = np.array([[1, 4, 3, 5], 
              [1, 2, 3, 7], 
              [5, 6, 2, 7]])
print("X =\n{}\n".format(X))

# get all values that are more than 3
print("X > 3 =\n{}".format(X > 3))

As we can see, this combines nicely with what we learned before about `numpy`'s support for indexing by a boolean array/matrix. So if we want to set all the values that are greater than `3` to, say, `300`, we can just do:

In [None]:
print("X =\n{}\n".format(X))

# get all values that are more than 3
bool_idx = X > 3
print("X > 3 =\n{}".format(bool_idx))

# set all those values to 300
X[bool_idx] = 300
print("X =\n{}\n".format(X))
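Several comparisons can also be combined into a single mask. The sketch below uses illustrative values and relies on the element-wise operators `&`, `|` and `~` that `numpy` defines for boolean arrays.

```python
import numpy as np

X = np.array([[1, 4, 3, 5],
              [1, 2, 3, 7]])

# combine conditions element-wise with & (and), | (or), ~ (not);
# the parentheses are required because & binds tighter than > and <
mask = (X > 2) & (X < 6)
print(mask)

# indexing with the mask returns the selected values as a 1D array
print(X[mask])    # [4 3 5 3]
```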

-------------------------------

<h5 style="color: #1B1BFF">Your input required:</h5>

Create a random array of shape `(10, 10, 3)` and set every second element (starting with index 1) along the second dimension to zero.

<h5 style="color: #1B1BFF">Your input required:</h5>

Create a function that will set all the elements of a given array/matrix under a specified threshold to a specific value (we prepared the function header for you).

In [None]:
def set_lower(X, thresh, value):
    pass


X = np.array([[1, 3, 5],
              [3, 5, 1],
              [1, 2, 5]])
set_lower(X, 3, 5000)
print("X =\n{}".format(X))

Y = np.random.randn(100, 100, 100)
set_lower(Y, 0.5, 1)

------------------------------------------

**And much more...** If you are interested, all operations can be found in the [**official documentation**](https://docs.scipy.org/doc/numpy/contents.html).

Furthermore, if you would like to familiarize yourself with `numpy` even more, we suggest you take a look at the [**Numpy Exercises**](https://github.com/Kyubyong/numpy_exercises).


Now for the fun part of this exercise. From this point on <span style='background:#b7b7b7;padding:5px;color: #1B1BFF'>***your input is required***</span> in every code cell :).


Rewrite this mathematical equation first using loops and then as vectorized `numpy` code:

$$y = \frac{1}{n}\sum_{i=1}^{n} X_i$$

In [None]:
X = np.arange(50)

def operate_loops_avg(X):
    y = 0
    return y

operate_loops_avg(X)

In [None]:
X = np.arange(50)

def operate_vectorized_avg(X):
    y = 0
    return y

operate_vectorized_avg(X)

Rewrite this mathematical equation with the use of `for` loops. Please use the prepared function header.

$$y = \sum_j \sum_n \Big(\big(\sum_m X_{j,m} \cdot W_{m,n}\big) + b_{n}\Big)$$

where $b$ is a vector, $X_j$ is the input vector for example $j$ and $W$ is a weight matrix.

In [None]:
A = np.random.rand(10, 100)
B = np.random.rand(100, 100)
c = np.random.rand(100)

@timeit
def operate_loops(X, W, b):
    y = 0
    return y

for x in range(10):
    operate_loops(A, B, c)

Now write the same equation as vectorized `numpy` code. The difference between the results of these two functions should be **zero** or **very close to zero**.

In [None]:
@timeit
def operate_vectorized(X, W, b):
    y = 0
    return y

for x in range(10):
    operate_vectorized(A, B, c)
    
# print("Difference: {}".format(np.sum(np.abs(operate_vectorized(A, B, c) - operate_loops(A, B, c)))))

As you can see, the `numpy` vectorized code runs much faster. This is because vectorized operations, broadcasting and native `numpy` functions are heavily optimized to run as efficiently as possible.
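The same effect can be observed on a much simpler operation. The sketch below is an illustration only: it uses the standard library's `time.perf_counter` instead of the course's `timeit` helper, an arbitrary array size, and a sum of squares that is unrelated to the exercises above. The measured times depend on your machine.

```python
import time
import numpy as np

x = np.arange(200_000, dtype=np.float64)

# loop version: add up the squares one element at a time
start = time.perf_counter()
loop_result = 0.0
for value in x:
    loop_result += value * value
loop_time = time.perf_counter() - start

# vectorized version: square and sum inside numpy
start = time.perf_counter()
vec_result = np.sum(x ** 2)
vec_time = time.perf_counter() - start

print("loop:       {:.4f} s".format(loop_time))
print("vectorized: {:.4f} s".format(vec_time))
print("results agree: {}".format(np.isclose(loop_result, vec_result)))
```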

------------------------------------------------------------------------------------------------------------------

### Bonus question for 3 points:

We loaded a picture for you in the cell below. Using just `reshape` and the function `moveaxis()` (check out the official docs), split the image into equal chunks. In other words, simulate a sliding-window cutter that cuts out a window of size `80 x 80` every **80 pixels**. Your final array should have shape `108 x 80 x 80 x 3`. This should not take more than a few lines of code. You can also pass the final array to the `show_cut_image(<your array here>)` function to see some of your cutouts.

In [None]:
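from scipy import datasets
import matplotlib.pyplot as plt
from PIL import Image

# scipy.misc.face and scipy.misc.imresize are gone from recent SciPy releases;
# scipy.datasets.face needs the optional `pooch` package to fetch the image
img = datasets.face()
# Pillow expects the target size as (width, height)
img = np.array(Image.fromarray(img).resize((960, 720)))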
plt.imshow(img)
plt.show()

print("Image shape: {}".format(img.shape))

In [None]:
# RESHAPE AND ROLL HERE
# show_cut_image(x)

### Submission details


The deadline for this assignment is the **19th of October, 23:55 CEST**.

*If you need any help with this assignment, please feel free to ask the course TAs during their office hours, which can be found at the [course website](http://compbio.fmph.uniba.sk/vyuka/ml/).*


## This is all for now. Thank you