# Exercises on numpy

These are 8 short exercises on variables.

Use this [notebook](https://github.com/dtaantwerp/dtaantwerp.github.io/blob/master/notebooks2021/1_Week1_Monday_Python_and_Variables.ipynb) for a complete explanation of numpy.

# 0. How to read a line?!
Lines are functions of the form $f(x) = mx + q$, with $m$ and $q$ being the `slope` and `intercept` respectively.

If this is still unfamiliar to you, play with [this](https://www.desmos.com/calculator/59qdbtnlzy) for a while. See how changing the slope and intercept affects the shape and behavior of a line. Note that for a line which meets the horizontal and vertical axes at $x_0$ and $y_0$ respectively, the slope is exactly $-\frac{y_0}{x_0}$ (and the intercept is $y_0$)!

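As a quick numerical check of that last remark, here is a minimal sketch. The function name `slope_from_intercepts` and the example values are made up for illustration.

```python
def slope_from_intercepts(x0, y0):
    """Slope of the line through (x0, 0) and (0, y0); x0 must be non-zero."""
    return -y0 / x0

# Example: the line y = -2x + 6 crosses the axes at x0 = 3 and y0 = 6.
m = slope_from_intercepts(3, 6)
q = 6  # the intercept is simply y0

print(m)          # -2.0
print(m * 3 + q)  # 0.0, so the line does pass through (3, 0)
```

Try it with the axis crossings of a line you built in the Desmos tool and compare the result with the slope you set there.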

# 1. A very very simple (yet crucial) introduction to Machine Learning!
One of the main tasks in Machine Learning is Classification: you have some data (x), you have some classes or categories or labels (y), you want the machine to learn which data point belongs to which class. An example in sentiment analysis is that you have some product reviews and you want to classify them into Positive or Negative sentiment. If you can visualize your data, this might be how it looks in 2D, where classes are identified with color; e.g. the blue points are positive reviews and the orange points are the negative ones: 

<img src="https://machinelearningmastery.com/wp-content/uploads/2020/03/Scatter-Plot-of-Binary-Classification-Dataset-With-2D-Feature-Space.png" width="400" /> 

Now imagine that for this data, we want to learn a `linear` classifier, i.e. a LINE that is able to separate the two classes in a good (enough) way. This might not always be possible, but looking at our data here, it seems like a pretty good choice. \
> Q: Which one of the following lines can be a good candidate for our classifier? 

* $y = 1.6x + 15$
* $y = -15x + 5$
* $y = -1.7x + 15$
* $y = 5x + 10$

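To make the idea of a separating line concrete, here is a minimal sketch of how a line $y = mx + q$ can act as a classifier: a point gets one class if it lies above the line and the other class if it does not. The points, the line and the function name `side_of_line` are invented for illustration; they are neither the data from the plot nor one of the candidates above.

```python
import numpy as np

def side_of_line(points, m, q):
    """Return 1 for points above the line y = m*x + q, and 0 otherwise."""
    x, y = points[:, 0], points[:, 1]
    return (y > m * x + q).astype(int)

# Hypothetical 2D data: one (x, y) pair per row.
points = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 4.0], [5.0, 2.0]])

# Hypothetical classifier line: y = -x + 4
predicted = side_of_line(points, m=-1.0, q=4.0)
print(predicted)  # [0 0 1 1]
```
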
# 2. (Gradient) descent from a mountain
In your class notebook you saw an image like this (ignore the "03"!):

<img src="https://blog.paperspace.com/content/images/size/w1750/2019/09/F1-03.large.jpg" width="400" /> 

This image represents the "error landscape" of a problem, i.e. how the error value varies when we change our parameters. Solving most (if not all) machine learning problems basically boils down to finding the minimum of an error function. Unfortunately we rarely have such a nice overall view of our function, because it usually has thousands (or millions or billions!) of parameters. In this situation, solving such a problem is kinda like descending a mountain with closed eyes! 
One logical way to do this is to feel the ground with your feet to see which way goes down (or goes down more steeply). If you want to turn this into an instruction, it would be something like this:
At any moment:
* Feel the ground around you with your feet and try to have an estimate of the slope in different directions.
* Move a bit in the direction which has the sharpest negative slope.
* Repeat. \
\
\
This is the main idea behind the Gradient Descent algorithm, which basically goes like this:
* Start somewhere on the error landscape. If you can, take a smart guess instead of a random choice!
* Calculate the derivatives along different directions (this is called the Gradient vector).
* Move in the opposite direction of this vector (because it points towards higher values), with a step which is proportional to the slope value (i.e. you take bigger steps when the slope is sharper and vice versa)  \
\
\
Let's see this in action:
* Go [here](https://uclaacm.github.io/gradient-descent-visualiser/#playground)
* Copy-paste this under `Function`: `.01x^4 - x^2 + 2x`. This is our error (or cost) function, which we want to descend; i.e. we are looking for its lowest value but, as mentioned before, we don't have this view of the function. We start blindfolded from somewhere and try to slowly move towards the lowest (or at least a lower) point.
> Q: Give an estimate for $min(f)$ and $argmin(f)$ where $f$ is our error function.
* Let's assume that we start from $x = 12$. So put 12 under `Starting Point`.
* Now click on `Set Up`. You should see the function and the tangent line at the starting point (red dot) on the chart. Adjust your view of the chart (if necessary) by scrolling on it.
* We are almost ready for our descent but before that, we should decide on a value for our step (or `Learning Rate`)! Are we going to -generally- take small steps or big ones? For now, just put 0.01 under it. (Note that we will multiply this by the slope value).\
> Q: Where do you think you will end up? In the right valley? Left valley? Middle hill? Somewhere else?
* Start the descent by clicking on `Next Iteration`. Watch how your location (red point on the chart) changes after each iteration (or step).
* Keep clicking! Note how your movements become smaller and smaller as you come closer to the valley. (Q: Why is that?)\
\
> Q: After a while, it seems that you are stuck in the right valley, which is not the lowest point of the landscape. Do you think more clicking will get you out? Why?
* Now let's descend again, but this time with a different starting point, say $x = -12$. Change the `Starting Point`, click on `Set Up` and repeat the experiment. See how it goes. \
As you witnessed, where you start your descent makes a big difference in where you end up. The bad news is that we are blindfolded: we don't have the overall view, so if we go down the mountain guided by the Gradient Descent algorithm, we (almost) always end up in a Local Minimum instead of the Global Minimum. The good news is that there are ways to improve the algorithm. Even better, by the power of God or some other supernatural being, local minima are often good enough to base our model on. (Note that for real error functions, the landscape is a super hilly one with millions or billions of valleys!) If this were not the case, Machine Learning as we know it today would not exist!

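The experiment can also be reproduced in a few lines of Python and compared with what you saw in the tool. This is only a sketch: it assumes the tool applies the plain update $x \leftarrow x - \text{learning rate} \cdot f'(x)$, and the names `gradient_descent` and `n_steps` are made up here. The derivative is worked out by hand from the function used in the experiment. Run it for both starting points, then try other values of `learning_rate`.

```python
def f(x):
    """The error function from the experiment: 0.01x^4 - x^2 + 2x."""
    return 0.01 * x**4 - x**2 + 2 * x

def df(x):
    """Its derivative (the slope at x): 0.04x^3 - 2x + 2."""
    return 0.04 * x**3 - 2 * x + 2

def gradient_descent(start, learning_rate=0.01, n_steps=2000):
    """Repeatedly step against the slope, with a step size proportional to it."""
    x = start
    for _ in range(n_steps):
        x = x - learning_rate * df(x)
    return x

# Same two starting points as in the experiment.
for start in (12, -12):
    x_end = gradient_descent(start)
    print(f"start = {start:>3}: x = {x_end:.3f}, f(x) = {f(x_end):.3f}")
```
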
> Q: Play with the tool. Try different starting points, learning rates and functions. Do you think there could be a general recipe for the value of `Learning Rate`? What is good (and bad) about picking a small learning rate?
  

# 3. The coordinates of your smile!
Any digital image is an array of pixels. For a black&white image (not grayscale!), this array is a binary one, with the values telling us if that pixel should be On (1) or Off (0). Here is the smiley symbol as a 20x20 array of B&W pixels: \
\
<img src="https://github.com/dtaantwerp/dtaantwerp.github.io/tree/master/exercises2021/img/smiley.png" width="150" /> 
\
\
Let's create this array in Python so that we can play with it! Here is a boring way to do so:

In [None]:
# Run the cell
import numpy as np

a0 = [0]*20
a1 = a0
a2 = [0]*7 + [1]*6 + [0]*7
a3 = [0]*6 + [1] + [0]*6 + [1] + [0]*6
a4 = [0]*4 + [1,1] + [0]*8 + [1,1] + [0]*4
a5 = [0]*4 + [1] + [0]*10 + [1] + [0]*4
a6 = [0]*3 + [1] + [0]*3 + [1,1] + [0,0] + [1,1] + [0]*3 + [1] + [0]*3
a7 = [0]*2 + [1] + [0]*4 + [1,1] + [0,0] + [1,1] + [0]*4 + [1] + [0]*2
a8 = [0]*2 + [1] + [0]*14 + [1] + [0]*2
a9 = [0]*2 + [1] + [0]*14 + [1] + [0]*2
a10 = [0]*2 + [1] + [0]*14 + [1] + [0]*2
a11 = [0]*2 + [1] + [0]*2 + [1] + [0]*8 + [1] + [0]*2 + [1] + [0]*2
a12 = [0]*2 + [1] + [0]*3 + [1] + [0]*6 + [1] + [0]*3 + [1] + [0]*2
a13 = [0]*3 + [1] + [0]*3 + [1]*6 + [0]*3 + [1] + [0]*3
a14 = [0]*4 + [1] + [0]*10 + [1] + [0]*4
a15 = [0]*4 + [1,1] + [0]*8 + [1,1] + [0]*4
a16 = [0]*6 + [1] + [0]*6 + [1] + [0]*6
a17 = a2
a18 = a0
a19 = a0

smiley = [a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15, a16, a17, a18, a19]
smiley = np.array(smiley)
smiley

We can actually display the array as an image using some Python libraries. Here is a way:

In [None]:
!pip install matplotlib

In [None]:
from matplotlib import pyplot as plt
plt.imshow(smiley, cmap='Greys')
plt.xticks(np.arange(0.5, 20.5, 1), labels=[])
plt.yticks(np.arange(0.5, 20.5, 1), labels=[])
plt.grid(True)
plt.show()

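A rectangular region of a 2D array is selected with `array[row_start:row_stop, col_start:col_stop]`, where the stop index is excluded. The sketch below uses a small made-up 5x5 array (`canvas` is a hypothetical name) rather than the smiley, so the questions that follow are still yours to answer.

```python
import numpy as np

# A hypothetical 5x5 "image", all pixels off.
canvas = np.zeros((5, 5), dtype=int)
canvas[1, 3] = 1          # a single pixel: row 1, column 3

# Rows 2 and 3, columns 0 to 2 (the stop index is not included).
block = canvas[2:4, 0:3]
print(block.shape)        # (2, 3)

# A basic slice is a view: writing to it changes the original array too.
block[0, 0] = 1
print(canvas[2, 0])       # 1
print(canvas.sum())       # 2
```
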
> Q: Identify the left and right eye of the image as a selection of this array. \
> Q: Imagine a (rectangular) nose for the image. Identify this hypothetical nose as a selection of the array. (you can choose the size or orientation!)

In [None]:
left_eye = smiley["""...your code here..."""]
right_eye = smiley["""...your code here..."""]
nose = smiley["""...your code here..."""]

Now we can use the array to modify the image. Let's start by removing the left eye, which is equivalent to setting the left-eye pixels to 0.
> Q: Do this by turning off the left-eye pixels one-by-one:

In [None]:
smiley[???,???] = 0
smiley[???,???] = 0
smiley[???,???] = 0
smiley[???,???] = 0

Let's see what it looks like now. Since we are going to use the display cell frequently, let's turn it into a function that receives an array and displays it as a B&W image. For simplicity you can just assume that the arrays are always 20x20 so that you don't need to change the grid lines.

In [None]:
def display_array(array):
    # your code here
    

Now we can just call it on our array to see how it looks:

In [None]:
display_array(smiley)

Poor smiley! Let's put the eye back but in a less boring way! \
We can do it all at once, by assigning the whole left-eye area (as a selection of the array) to an array of the same size with values of 1: 

In [None]:
# let's create the eye separately ...
eye = np.ones("""...your code here...""")

# and then assign it to the left-eye area:
smiley["""...your code here..."""] = eye


display_array(smiley)

Nice! Let's add the nose in the same way:

In [None]:
nose = np.ones("""...your code here...""")

smiley["""...your code here..."""] = nose

display_array(smiley)

Cool! Play more if you like! 

# 4. How accurate is your classifier?
Imagine that we have a classifier for binary classification (i.e. there are only two classes in our data, which we call 0 and 1). We have applied this classifier to some labeled data, got some predictions, and now we want to see how accurate they are. We define accuracy simply as the ratio of correct predictions, so: $Acc = \frac{\text{number of correct predictions}}{\text{total number of predictions}}$. \
Since the model is imaginary, we need to create hypothetical lists of predictions and labels.
> Q: In numpy there is an easy way to do this. Check out [this](https://numpy.org/doc/1.13/reference/generated/numpy.random.randint.html) page and write code that creates two random 1D binary arrays of size 10.

In [3]:
import numpy as np
labels = # your code here
predictions = # your code here

But what if we didn't know about this `randint`? Can we do the same using only the `np.random.random()` function? 
Let's try! We know that `np.random.random()` returns random float values from the [0,1) interval. Like this:

In [16]:
random_array = np.random.random(10)
random_array

array([0.29528067, 0.18261099, 0.57663449, 0.8677704 , 0.09347673,
       0.18896265, 0.35688493, 0.28501031, 0.8687269 , 0.43049414])

Now if we subtract 0.5 from such values, they will be mapped into the [-.5, .5) interval. And if we round them up, they will be either 0. or 1.! \
Cool! Let's do this. First we deduct 0.5. 
> Q: What do you think about the following code? Should it work? Run and see!

In [None]:
float_labels = random_array - .5
float_labels

If you're not surprised, see what happens when you have your data as a `list` (instead of an `array`).

In [None]:
random_list = list(random_array)
random_list - .5

You get a TypeError -*unsupported operand type(s) for -: 'list' and 'float'*- saying that the "-" operator can't operate between a list and a float. So why does it work with an array? \
This is called `Broadcasting` and is a super useful feature in NumPy (not exclusive to it!). You can read more about it [here](https://numpy.org/devdocs/user/basics.broadcasting.html) if you like, but what it basically does is try to `match` mismatched arrays involved in an operation, by stretching the smaller one through repetition! Naturally there are rules:
> *When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing (i.e. rightmost) dimensions and works its way left. Two dimensions are compatible when 1) they are equal, OR 2) one of them is 1.*   

In the second case, instead of throwing an error, NumPy behaves as if the smaller array were repeated along the dimension(s) of size 1 until it matches the bigger one. In our case, for example, NumPy treats 0.5 as `np.array([0.5]*10)` before applying the "-" operator.
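
The rule can be checked on a few small shapes. This is a minimal sketch with made-up arrays; `np.broadcast_to` is used only to show the stretched version that NumPy behaves as if it had built.

```python
import numpy as np

row = np.array([1.0, 2.0, 3.0])          # shape (3,)
column = np.array([[10.0], [20.0]])      # shape (2, 1)

# A scalar is stretched to the shape of the array.
print(row - 0.5)                          # [0.5 1.5 2.5]
print(np.broadcast_to(0.5, row.shape))    # [0.5 0.5 0.5]

# (2, 1) and (3,) are compatible: the result has shape (2, 3).
table = column + row
print(table.shape)                        # (2, 3)

# (3,) and (2,) are not compatible: the trailing dimensions are 3 and 2.
try:
    row + np.array([1.0, 2.0])
except ValueError as err:
    print("ValueError:", err)
```
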
Back to our classifier. Now we need to round up the numbers, which can easily be done using the `np.ceil()` function:

In [None]:
binary_labels = np.ceil(float_labels)
binary_labels

Finally we convert them to integers to have clean binaries:

In [None]:
binary_labels = binary_labels.astype(int)
binary_labels

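To see what the last two steps do, here is a small sketch on hand-picked values instead of random ones (the array `values` is made up for illustration). Note that `np.ceil` turns small negative numbers into `-0.`, which is one reason the conversion to integers gives cleaner output.

```python
import numpy as np

values = np.array([0.1, 0.5, 0.9, 0.49])   # stand-ins for random floats in [0, 1)

shifted = values - 0.5       # now in [-0.5, 0.5)
rounded = np.ceil(shifted)   # -0. or 0. for values up to 0.5, 1. above it
binary = rounded.astype(int)

print(rounded)   # [-0.  0.  1. -0.]
print(binary)    # [0 0 1 0]
```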
Putting them all together:

In [25]:
labels = np.ceil(np.random.random(10) - .5).astype(int)
predictions = np.ceil(np.random.random(10) - .5).astype(int)

Now we have our artificial arrays and we need to compare them to calculate the accuracy. If you were working with lists, you probably needed to loop over the elements and compare them one by one. But since we have arrays, we can do easier/faster/better! \
Let's see what happens when you check the equality of two arrays, as a whole!

In [None]:
labels == predictions

As you see, instead of comparing the whole arrays, NumPy compares them element by element, and returns the result as a Boolean array of `True` and `False` values. \
Now all we need to do is count the number of `True`s in this array and divide it by the array's length. One way is to convert the array to a list and use the list's count() method:

In [None]:
compare = list(labels == predictions)
accuracy = compare.count(True)/len(compare)
accuracy

This is the accuracy of our imaginary classifier! \
You can also do it all with arrays, like this :)

In [32]:
compare = (labels == predictions)
accuracy = compare.mean()
accuracy

0.7

Here NumPy treats the Boolean array as binary values (True = 1, False = 0) and calculates their mean. Quite convenient!

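For a deterministic check, the same calculation can be run on hand-written arrays. The values below are invented for illustration, and `np.count_nonzero` is shown as another way of counting the `True` entries.

```python
import numpy as np

example_labels = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 1])
example_predictions = np.array([1, 0, 0, 1, 0, 1, 1, 0, 0, 1])

correct = example_labels == example_predictions   # Boolean array, one entry per prediction
n_correct = np.count_nonzero(correct)             # number of True values

print(n_correct)                        # 7
print(n_correct / len(example_labels))  # 0.7
print(correct.mean())                   # 0.7
```
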
> Q: Go back to the smiley exercise. See if you can use `Broadcasting` to make the modifications even easier!