# Lab 1 -  Scientific computing in Python with Numpy

This tutorial provides a brief introduction to scientific computing in Python with [Numpy](www.numpy.org). This package contains (among other things):
 - a powerful N-dimensional array,
 - sophisticated array operations,
 - tools for integrating C/C++ and Fortran code,
 - useful linear algebra, Fourier transform, and random number capabilities.

One good reason to use Numpy is the computational efficiency. For example, suppose you want to compute the exponential of a list of numbers. An option would be to use the function `math.exp()`. Unfortunately, the latter doesn't work on lists!

In [3]:
import math

x = [1, 2, 3]

math.exp(x) # you will see this give an error when you run it, because x is a list.

TypeError: must be real number, not list

In machine learning, we always organize our data in vectors and matrices. That's why Numpy is more useful than the "math" library. In fact, if $x = (x_1, x_2, ..., x_N)$ is a vector, then the function `numpy.exp(x)` will apply the exponential to every element of `x`, and the output will be $e^x = (e^{x_1}, e^{x_2}, ..., e^{x_N})$. The advantage of doing so is that *for*-loops and *while*-loops can often be replaced with fast operations on vectors and matrices!

In [4]:
import numpy as np

x = np.array([1, 2, 3])

print(np.exp(x)) 

[ 2.71828183  7.3890561  20.08553692]


Any time you need more info on a numpy function, we encourage you to look at [the official documentation](https://docs.scipy.org/doc/numpy/reference/index.html). You can also create a new cell in the notebook and write `np.exp?` (for example) to get quick access to the documentation. Now, let's get started!

---
Before reading this tutorial you should know a bit of Python. If you would like to refresh your memory, take a look at the [Python tutorial](http://docs.python.org/tut/).

---

## 1. Tensors

A Numpy tensor is a table of elements (usually numbers), all of the same type, indexed by a tuple of integers. Its class is called `numpy.ndarray` (arrays are usually created with the function `numpy.array()`). The most important attributes of a Numpy array are:

 - `ndim` - The number of axes of the array.
 
 - `shape` - A tuple of integers indicating the size of the array in each axis.
 
 - `size` - The total number of elements of the array (equal to the product of the elements of `shape`).

 - `dtype` - The type of the elements in the array (e.g., `numpy.int32`, `numpy.float64`, etc).
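
As a quick illustration of the attributes listed above, the following sketch builds a small array and inspects it. The name `m` and its values are chosen only for this example.

```python
import numpy as np

# A hypothetical 2x3 array, used only to illustrate the attributes
m = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

print(m.ndim)   # number of axes: 2
print(m.shape)  # size along each axis: (2, 3)
print(m.size)   # total number of elements: 6
print(m.dtype)  # element type: float64
```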

### Scalars (0D tensors)

A tensor that contains only one number is called a scalar. In Numpy, a `float32` or `float64` number is a scalar tensor. Here’s a Numpy scalar:

In [5]:
x = np.array(12)
x

array(12)

A scalar has **no axis**.

In [6]:
x.ndim

0

In [7]:
x.shape

()

### Vectors (1D tensors)

An array of numbers is called a vector (or 1D tensor). Following is a Numpy vector:

In [8]:
a = np.array([0, 1, 8, 27, 64, 125, 216, 343, 512, 729])
a

array([  0,   1,   8,  27,  64, 125, 216, 343, 512, 729])

A vector has **one axis**. 

In [9]:
a.ndim

1

This particular vector has ten entries, so its size is 10 and its shape is (10,).

In [10]:
a.shape

(10,)

Vectors can be indexed, sliced and iterated over, much like lists and other Python sequences.

In [11]:
# Access to the element in position 2
a[2]

8

In [12]:
# Access to elements from position 2 to position 4
a[2:5]

array([ 8, 27, 64])

In [None]:
# Modify elements from start to position 5, every 2nd element (Syntax: [start:stop:step])
a[:6:2] = -100    
a

In [13]:
# Reverse a
a[::-1]

array([729, 512, 343, 216, 125,  64,  27,   8,   1,   0])

In [14]:
# Iterate over the elements of a
for i in a:
    print(i)

0
1
8
27
64
125
216
343
512
729


### Matrices (2D tensors)

An array of vectors is called a matrix (or 2D tensor).  This is a Numpy matrix:

In [15]:
b = np.array([[5, 78, 2, 34, 0],
              [6, 79, 3, 35, 1],
              [7, 80, 4, 36, 2],
              [8, 81, 5, 37, 3]])
b

array([[ 5, 78,  2, 34,  0],
       [ 6, 79,  3, 35,  1],
       [ 7, 80,  4, 36,  2],
       [ 8, 81,  5, 37,  3]])

A matrix has **two axes**. 

In [16]:
b.ndim

2

This specific matrix has four-by-five entries. Hence, its size is 20 and its shape is (4,5).

In [17]:
b.shape

(4, 5)

Note that:
 - You can visually interpret a matrix as a rectangular grid of numbers. 
 - The entries from the first axis (axis=0) are called **rows**
 - The entries from the second axis (axis=1) are called **columns**.
 - Matrix elements can be accessed with two indices, given in a tuple separated by commas.

In [18]:
# Access the element in position (2,3)
b[2,3]

36

In [19]:
# Access to the 2nd column
b[:,1]

array([78, 79, 80, 81])

In [20]:
# Access to 2nd and 3rd rows
b[1:3,:]

array([[ 6, 79,  3, 35,  1],
       [ 7, 80,  4, 36,  2]])

When fewer indices are provided than the number of axes, the missing indices are considered complete slices: e.g., `b[i]` is treated as `b[i,:]`.

In [21]:
# Access the last row
b[-1]

array([ 8, 81,  5, 37,  3])

The `None` object can be used in all slicing operations to create an axis of length one.

In [22]:
c = b[-1] # This is a vector
c.shape

(5,)

In [23]:
c = b[-1,None] # This is a matrix with one row
c.shape

(1, 5)

In [24]:
c = b[:,[-1]] # This is a matrix with one column
c.shape

(4, 1)

Iterating over matrices is done with respect to the rows:

In [25]:
for row in b:
    print(row)

[ 5 78  2 34  0]
[ 6 79  3 35  1]
[ 7 80  4 36  2]
[ 8 81  5 37  3]


### ND tensors

If you pack matrices in a new array, you obtain a 3D tensor. By packing 3D tensors in an array, you can create a 4D tensor, and so on. 

Following is a Numpy 3D tensor:

In [26]:
c = np.array([[[5, 78, 2, 34, 0],
               [6, 79, 3, 35, 1],
               [7, 80, 4, 36, 2],
               [8, 81, 5, 37, 3]],
              
              [[78, 2, 34, 0, 5],
               [79, 3, 35, 1, 6],
               [80, 4, 36, 2, 7],
               [81, 5, 37, 3, 8]],
              
              [[0, 5, 78, 2, 34],
               [1, 6, 79, 3, 35],
               [2, 7, 80, 4, 36],
               [3, 8, 81, 5, 37]]])
c

array([[[ 5, 78,  2, 34,  0],
        [ 6, 79,  3, 35,  1],
        [ 7, 80,  4, 36,  2],
        [ 8, 81,  5, 37,  3]],

       [[78,  2, 34,  0,  5],
        [79,  3, 35,  1,  6],
        [80,  4, 36,  2,  7],
        [81,  5, 37,  3,  8]],

       [[ 0,  5, 78,  2, 34],
        [ 1,  6, 79,  3, 35],
        [ 2,  7, 80,  4, 36],
        [ 3,  8, 81,  5, 37]]])

An ND tensor has **3 or more** axes.

In [27]:
c.ndim

3

This tensor has three-by-four-by-five entries. Hence, its size is 60 and its shape is (3,4,5).

In [28]:
c.shape

(3, 4, 5)

Tensor elements can be accessed by giving one index per axis. 

In [29]:
# Access the element in position (1,3,2)
c[1,3,2]

37

Iterating over multidimensional arrays is done with respect to the first axis:

In [30]:
for matrix in c:
    print(matrix)

[[ 5 78  2 34  0]
 [ 6 79  3 35  1]
 [ 7 80  4 36  2]
 [ 8 81  5 37  3]]
[[78  2 34  0  5]
 [79  3 35  1  6]
 [80  4 36  2  7]
 [81  5 37  3  8]]
[[ 0  5 78  2 34]
 [ 1  6 79  3 35]
 [ 2  7 80  4 36]
 [ 3  8 81  5 37]]


---
#### Exercise - Array creation and manipulation

This is a collection of short exercises that will help you understand the basics of Numpy arrays. Solve them!

1. Create a vector of size 10, with all zeros but the fifth value which is 1. *Hint:* `numpy.zeros()`.
```
Expected output: [0 0 0 0 1 0 0 0 0 0]
```

2. Create a vector with values ranging from 10 to 19. *Hint:* `numpy.arange()`.
```
Expected output: [10 11 12 13 14 15 16 17 18 19]
```

3. Create an 8x8 matrix and fill it with a checkerboard pattern.
```
Expected output: [[0 1 0 1 0 1 0 1]
                    [1 0 1 0 1 0 1 0]
                    [0 1 0 1 0 1 0 1]
                    [1 0 1 0 1 0 1 0]
                    [0 1 0 1 0 1 0 1]
                    [1 0 1 0 1 0 1 0]
                    [0 1 0 1 0 1 0 1]
                    [1 0 1 0 1 0 1 0]]
```

In [42]:
print('Exercise 1:')
a = np.zeros(10)
a[4] = 1
print(a)

print('Exercise 2:')
b = np.arange(10,20)
print(b)

print('Exercise 3:')
c = np.zeros((8,8))
c[1::2, 0::2] = 1
c[0::2, 1::2] = 1
print(c)


Exercise 1:
[0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
Exercise 2:
[10 11 12 13 14 15 16 17 18 19]
Exercise 3:
[[0. 1. 0. 1. 0. 1. 0. 1.]
 [1. 0. 1. 0. 1. 0. 1. 0.]
 [0. 1. 0. 1. 0. 1. 0. 1.]
 [1. 0. 1. 0. 1. 0. 1. 0.]
 [0. 1. 0. 1. 0. 1. 0. 1.]
 [1. 0. 1. 0. 1. 0. 1. 0.]
 [0. 1. 0. 1. 0. 1. 0. 1.]
 [1. 0. 1. 0. 1. 0. 1. 0.]]


### Reshaping

An array has a shape given by the number of elements along each axis:

In [43]:
a = np.floor(10*np.random.random((3,4)))

print("Shape: " + str(a.shape))
print()
print(a)

Shape: (3, 4)

[[0. 4. 7. 1.]
 [6. 4. 1. 2.]
 [5. 9. 2. 4.]]


The shape of an array can be changed with various commands, **without** changing the elements in the original array.

In [44]:
# Returns the array flattened as a vector
b = a.ravel()

print("Shape: " + str(b.shape))
print()
print(b)

Shape: (12,)

[0. 4. 7. 1. 6. 4. 1. 2. 5. 9. 2. 4.]


In [45]:
# Returns the array with a modified shape
b = a.reshape(6,2)

print("Shape: " + str(b.shape))
print()
print(b)

Shape: (6, 2)

[[0. 4.]
 [7. 1.]
 [6. 4.]
 [1. 2.]
 [5. 9.]
 [2. 4.]]


In [46]:
# Returns the array, transposed
b = a.T  

print("Shape: " + str(b.shape))
print()
print(b)

Shape: (4, 3)

[[0. 6. 5.]
 [4. 4. 9.]
 [7. 1. 2.]
 [1. 2. 4.]]


If a dimension is given as **-1** in a reshaping operation, the other dimensions are automatically calculated.

In [47]:
b = a.reshape(4,-1)

print("Shape: " + str(b.shape))
print()
print(b)

Shape: (4, 3)

[[0. 4. 7.]
 [1. 6. 4.]
 [1. 2. 5.]
 [9. 2. 4.]]


The `np.reshape` function returns its argument with a modified shape, whereas the `ndarray.resize` method modifies the array itself:

In [48]:
print("Shape: " + str(a.shape))
print()
print(a)

Shape: (3, 4)

[[0. 4. 7. 1.]
 [6. 4. 1. 2.]
 [5. 9. 2. 4.]]


In [49]:
a.resize((2,6))

print("Shape: " + str(a.shape))
print()
print(a)

Shape: (2, 6)

[[0. 4. 7. 1. 6. 4.]
 [1. 2. 5. 9. 2. 4.]]


Note that:

 - The order of the elements resulting from `ravel()` is “C-style”, in which the **rightmost index** changes the fastest: `a[0,0]`, then `a[0,1]`, and so on. 
 - If the array is reshaped (or resized) to some other shape, again the array is treated as “C-style”. 
 - The functions `ravel()` and `reshape()` can also be instructed, using the optional `order` argument, to use FORTRAN-style ordering, in which the **leftmost index** changes the fastest: `a[0,0]`, then `a[1,0]`, and so on.
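
To make the difference between the two orderings concrete, here is a minimal sketch. The array `m` is a hypothetical example, and `order='F'` is the optional argument that selects FORTRAN-style ordering.

```python
import numpy as np

# Hypothetical 2x3 array used only for this illustration
m = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

c_order = m.ravel()              # default order='C': rightmost index varies fastest
f_order = m.ravel(order='F')     # order='F': leftmost index varies fastest

print(c_order)   # [0 1 2 3 4 5]
print(f_order)   # [0 3 1 4 2 5]

# reshape accepts the same optional argument: the values fill the columns first
r = np.arange(6).reshape((2, 3), order='F')
print(r)         # [[0 2 4]
                 #  [1 3 5]]
```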

---
#### Exercise - Preparing the training data

In computer science, an image is represented by a 3D array of shape `(height, width, depth)`. However, when you use an image as the input of a machine learning algorithm, you usually convert it to a 4D tensor of shape `(1, depth, height, width)`.

Here is a pictorial view of a 4D tensor of shape (samples, color channels, height, width). A 4D object is hard to draw on a screen, so keep in mind that this is just a representation to help you understand what you are doing.

![4D tensor](https://images4.programmersought.com/753/dc/dc45c040e78f0252e8cd8645be259419.png)

**Instructions:** Implement a function `image2tensor()` that takes a tensor of shape `(height, width, depth)` and returns a tensor of shape `(1, depth, height, width)`. Please don't hardcode the dimensions of image as a constant. Instead look up the quantities you need with `image.shape`. 

**Hints:**
 - [`np.transpose()`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.transpose.html)
 - [`np.reshape()`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html)

In [62]:
def image2tensor(image):
    """
    Argument:
    image -- a numpy array of shape (height, width, depth)
    
    Returns:
    v -- a tensor of shape (1, depth, height, width)
    """
    
    ### START CODE HERE ### (≈ 2 lines of code)
    h, w, d = image.shape

    t_i = np.transpose(image, (2,0,1))    
    
    t_f = np.reshape(t_i, (1,d,h,w) )
    ### END CODE HERE ###
    
    return t_f

In [63]:
image = np.stack( [np.arange(12).reshape(3,4), np.arange(50,62).reshape(3,4)], axis=2 )

print("Shape: " + str(image.shape))
print()
print(image)

Shape: (3, 4, 2)

[[[ 0 50]
  [ 1 51]
  [ 2 52]
  [ 3 53]]

 [[ 4 54]
  [ 5 55]
  [ 6 56]
  [ 7 57]]

 [[ 8 58]
  [ 9 59]
  [10 60]
  [11 61]]]


In [64]:
v = image2tensor(image)

print("Shape: " + str(v.shape))
print()
print(v)

Shape: (1, 2, 3, 4)

[[[[ 0  1  2  3]
   [ 4  5  6  7]
   [ 8  9 10 11]]

  [[50 51 52 53]
   [54 55 56 57]
   [58 59 60 61]]]]


**Expected Output**: 

<table style = "width:40%">
    <tr>
    <td>** Shape **</td> 
        <td>(1,2,3,4)</td> 
    </tr>

</table>

In [65]:
assert v.shape == (1,2,3,4)
assert np.all(v[0,0] == image[:,:,0])
assert np.all(v[0,1] == image[:,:,1])

### Copies & Views

When operating and manipulating arrays, their data is sometimes copied into a new array and sometimes not. This is often a source of confusion for beginners. There are three cases:
 1. No copy at all
 2. Shallow copy (view)
 3. Deep copy

#### No copy at all

Simple assignments make no copy of array objects or of their data.

In [66]:
a = np.arange(12)

In [67]:
b = a # no new object is created

In [68]:
b is a

True

Python passes mutable objects as references, so function calls make no copy.

In [69]:
def f(x):
    print(id(x))

In [70]:
id(a)

1863666900528

In [71]:
f(a)

1863666900528
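
A practical consequence is that a function which modifies its argument in place also modifies the caller's array. The sketch below illustrates this with a hypothetical helper `set_first()`, which is not part of Numpy.

```python
import numpy as np

def set_first(x, value):
    # Hypothetical helper: writes into the array it receives, in place
    x[0] = value

v = np.arange(5)
set_first(v, -1)   # no copy is made, so v itself is modified
print(v)           # [-1  1  2  3  4]
```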


#### Shallow copy

Different array objects can share the same data. The `view()` method creates a new array object that looks at the same data.

In [72]:
a = np.array([[i for i in np.arange(j,j+4)] for j in np.arange(0,12,4)])
a

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [73]:
c = a.view()

In [74]:
c is a

False

In [75]:
c.base is a

True

In [76]:
c.resize(2,6)

In [77]:
a.shape

(3, 4)

In [78]:
c[1,2] = 100

In [79]:
print(a)

[[  0   1   2   3]
 [  4   5   6   7]
 [100   9  10  11]]


Slicing an array returns a view of it:

In [80]:
s = a[:,1:3]
s[:] = 10 

In [81]:
print(a)

[[  0  10  10   3]
 [  4  10  10   7]
 [100  10  10  11]]


#### Deep copy

The `copy()` method makes a complete copy of the array and its data.

In [82]:
d = a.copy()  # a new array object with new data is created

In [83]:
d is a

False

In [84]:
d.base is a   

False

In [85]:
d[0,0] = 9999

In [86]:
print(a)

[[  0  10  10   3]
 [  4  10  10   7]
 [100  10  10  11]]


In [87]:
print(d)

[[9999   10   10    3]
 [   4   10   10    7]
 [ 100   10   10   11]]
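
The three cases can be compared side by side. The following sketch uses `np.shares_memory()` to check whether two arrays look at the same data; the variable names are chosen only for this example.

```python
import numpy as np

base = np.arange(12).reshape(3, 4)

alias = base            # no copy: same object
view = base[:, 1:3]     # shallow copy: new object, shared data
deep = base.copy()      # deep copy: new object, new data

print(alias is base, np.shares_memory(alias, base))   # True True
print(view is base, np.shares_memory(view, base))     # False True
print(deep is base, np.shares_memory(deep, base))     # False False
```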


## 2. Tensor operations

### Element-wise operations

Arithmetic operations on arrays operate **element-by-element**. A new array is created and filled with the result.

In [88]:
a = np.array( [20,30,40,50] )
b = np.array( [1,2,3,4] )

# Arithmetic
c = a-b
c

array([19, 28, 37, 46])

Unlike in many matrix languages, the product operator `*` operates **element-wise** in NumPy arrays, and does not correspond to matrix multiplication. There are special functions for linear algebra that we will cover later.

In [90]:
A = np.array( [[1,1,1],
               [0,1,0]] )
B = np.array( [[2,0,5],
               [3,4,1]] )
C = A*B
C

array([[2, 0, 5],
       [0, 4, 0]])
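
For contrast, here is a small sketch that compares the element-wise product with a matrix product computed by `np.dot()`. Since `A` and `B` both have shape (2, 3), `B` is transposed here so that the matrix product is defined; this choice is only for illustration.

```python
import numpy as np

A = np.array([[1, 1, 1],
              [0, 1, 0]])
B = np.array([[2, 0, 5],
              [3, 4, 1]])

elementwise = A * B          # shape (2, 3): entry-by-entry product
matrix_prod = np.dot(A, B.T) # shape (2, 2): rows of A times rows of B

print(elementwise)   # [[2 0 5]
                     #  [0 4 0]]
print(matrix_prod)   # [[7 8]
                     #  [0 4]]
```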

In binary operations, the two arrays should have the same shape. An error is raised if the shapes do not match.

In [91]:
x = np.array([1,2,3])
y = np.array([4,5])

x + y  # you will see this give an error when you run it, because the shapes of x and y don't match.

ValueError: operands could not be broadcast together with shapes (3,) (2,) 

One exception to the above rule is given by operations between a tensor and a scalar.

In [92]:
# Exponentiation
c = b**2
c

array([ 1,  4,  9, 16])

In [93]:
# Logical
a < 35

array([ True,  True, False, False])

NumPy provides standard mathematical functions, such as `sin`, `cos`, and `exp`. They operate **element-wise** on an array, producing an array as output.

In [94]:
c = 10*np.sin(b)
c

array([ 8.41470985,  9.09297427,  1.41120008, -7.56802495])

---
#### Exercise - Sigmoid function

The *sigmoid* is a non-linear function that plays a central role in Logistic Regression and Neural Networks. Mathematically, it is defined as 

$${\rm sigmoid}(x) = \frac{1}{1+e^{-x}}.$$ 

![Sigmoid.png](attachment:Sigmoid.png)

**Instructions:** Build a function that returns the sigmoid of a Numpy array `x`. Note that the sigmoid must be computed on every element of `x`.

$${\rm sigmoid}({\rm x}) = \begin{bmatrix}
    {\rm sigmoid}(x_1)  \\
    {\rm sigmoid}(x_2)  \\
    \vdots  \\
    {\rm sigmoid}(x_N)  \\
\end{bmatrix}
= \begin{bmatrix}
    \dfrac{1}{1+e^{-x_1}}  \\
    \dfrac{1}{1+e^{-x_2}}  \\
    \vdots  \\
    \dfrac{1}{1+e^{-x_N}}  \\
\end{bmatrix} $$

In [101]:
def sigmoid(x):
    """
    Compute the sigmoid of x

    Arguments:
    x -- A scalar or numpy array of any size

    Return:
    s -- sigmoid(x)
    """
    
    ### START CODE HERE ### (≈ 1 line of code)
    s = 1 / (1 + np.exp(-x))
    ### END CODE HERE ###
    
    return s

In [102]:
x = np.array([1, 2, 3])
s = sigmoid(x)

print('Sigmoid([1,2,3]) = ' + str(s))

Sigmoid([1,2,3]) = [0.73105858 0.88079708 0.95257413]


**Expected Output**: 
<table>
    <tr> 
        <td> **sigmoid([1,2,3])**</td> 
        <td> [ 0.73105858,  0.88079708,  0.95257413] </td> 
    </tr>
</table> 

In [103]:
assert s.shape == (3,)
np.testing.assert_almost_equal(s, [0.73105858, 0.88079708, 0.95257413])

### Reduction

Some operations act on all the elements of a tensor, typically returning a scalar (cumulative operations such as `cumsum()` return an array instead).

In [104]:
a = np.floor(10*np.random.random((2,3)))
a

array([[4., 9., 9.],
       [1., 3., 7.]])

In [105]:
# sum of all the elements
a.sum()

33.0

In [106]:
# minimum of all the elements
a.min()

1.0

In [107]:
# cumulative sum of all the elements (in C-style order)
a.cumsum()

array([ 4., 13., 22., 23., 26., 33.])

By default, these operations apply to the array as though it were a list of numbers, regardless of its shape. However, by specifying the `axis` parameter, you can apply an operation along the specified axis of a multidimensional array (matrices and ND tensors), placing the results in a return array.

In [108]:
# sum of each column
a.sum(axis=0) 

array([ 5., 12., 16.])

In [109]:
# minimum of each row
a.min(axis=1)

array([4., 1.])

In [110]:
# cumulative sum along each row
a.cumsum(axis=1)

array([[ 4., 13., 22.],
       [ 1.,  4., 11.]])

Note that many reduction operations change the dimensions of the input array: e.g., a matrix becomes a vector. You can keep the dimensions of the input array by setting the `keepdims` parameter to `True`.

In [111]:
b = a.min(axis=1)

print(b)
print()
print("Shape: " + str(b.shape))

[4. 1.]

Shape: (2,)


In [112]:
c = a.min(axis=1, keepdims=True)

print(c)
print()
print("Shape: " + str(c.shape))

[[4.]
 [1.]]

Shape: (2, 1)


---

#### Exercise - Row normalization

A common technique we use in Machine Learning is to normalize the data. Here, by normalization we mean dividing each row vector of x by its norm. For example, if $$x = 
\begin{bmatrix}
    0 & 3 & 4 \\
    2 & 6 & 4 \\
\end{bmatrix}\tag{3}$$ then $${\sf row\_norms}(x) = \begin{bmatrix}
    5 \\
    \sqrt{56} \\
\end{bmatrix}\tag{4} $$and        $$ {\sf x\_normalized} = \frac{x}{{\sf row\_norms}(x)} = \begin{bmatrix}
    0 & \frac{3}{5} & \frac{4}{5} \\
    \frac{2}{\sqrt{56}} & \frac{6}{\sqrt{56}} & \frac{4}{\sqrt{56}} \\
\end{bmatrix}\tag{5}$$ 

Note that you can divide matrices of different sizes and it works fine: this is called broadcasting and you're going to learn about it in the next section.


**Instruction**: Implement `normalizeRows()` to normalize the rows of a matrix. After applying this function to an input matrix x, each row of x should be a vector of unit length (length = 1).

In [113]:
def normalizeRows(x):
    """
    Implement a function that normalizes each row of the matrix x (to have unit length).
    
    Argument:
    x -- A numpy matrix of shape (n, m)
    
    Returns:
    x -- The normalized (by row) numpy matrix. You are allowed to modify x.
    """
    
    ### START CODE HERE ### (≈ 2 lines of code)
    
    # Compute the norm of the rows of x. Use np.linalg.norm(..., ord = ..., axis = ..., keepdims = True)
    norms = np.linalg.norm(x, ord = 2, axis = 1, keepdims = True)
    
    # Divide x by its norms.
    x_norm = x / norms
    
    ### END CODE HERE ###
    
    return x_norm

In [114]:
x = np.array([[0, 3, 4],
              [1, 6, 4]])
x_norm = normalizeRows(x)

print("normalizeRows(x):\n" + str(x_norm))

normalizeRows(x):
[[0.         0.6        0.8       ]
 [0.13736056 0.82416338 0.54944226]]


**Expected Output**: 

<table style="width:60%">

     <tr> 
       <td> **normalizeRows(x)** </td> 
       <td> [[ 0.          0.6         0.8       ]
             [0.13736056  0.82416338  0.54944226]]
       </td> 
     </tr>
    
   
</table>

In [115]:
assert x.shape == x_norm.shape
np.testing.assert_almost_equal(x_norm, [[0., 0.6, 0.8],[0.13736, 0.82416, 0.54944]], decimal=4)

In `normalizeRows()`, you can try to print the shapes of `norms` and `x`, and then rerun the assessment. You'll find out that they have different shapes. This is expected, since `norms` holds the norm of each row of `x`: it has the same number of rows as `x` but only 1 column. So how did it work when you divided `x` by `norms`? This is called broadcasting and we'll talk about it now!

### Broadcasting

Numpy operations usually involve a pair of arrays, which are combined on an element-by-element basis to produce the result. In the simplest case, the two arrays must have exactly the same shape. However, this constraint can be relaxed when the smaller array can be "broadcast" across the larger array so that they have compatible shapes. The simplest broadcasting example occurs when an array and a scalar value are combined in an operation:

In [None]:
a = np.array([1.0, 2.0, 3.0])
b = 2.0
a * b

We can think of the scalar `b` being **stretched** during the arithmetic operation into an array with the same shape as the vector `a`. NumPy is smart enough to use the original scalar value without actually making copies, so that broadcasting operations are memory and computationally efficient.

![scalar_broadcast.gif](attachment:scalar_broadcast.gif)
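
The "stretching" can be inspected directly. This sketch uses `np.broadcast_to()`, which returns a read-only view of its input with the requested shape; the name `stretched` is chosen only for this example.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = 2.0

print(a * b)   # [2. 4. 6.], same as multiplying by np.array([2.0, 2.0, 2.0])

# A stride of 0 means the single value is reused for every position
# instead of being copied three times
stretched = np.broadcast_to(b, a.shape)
print(stretched, stretched.strides)   # [2. 2. 2.] (0,)
```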


More generally, when operating on two arrays, Numpy compares their shapes according to the following rules:

- **First rule.** If the input arrays do not have the same number of dimensions, a **1** is repeatedly prepended to the shape of the array with fewer dimensions until both arrays have the same number of dimensions.

- **Second rule.** Arrays with size **1** along a particular dimension are *stretched* to match the size of the array with the largest shape along that dimension.

After application of these rules, the sizes of all arrays must match, otherwise an exception is thrown (indicating that the arrays have incompatible shapes). 

![vector_broadcast.gif](attachment:vector_broadcast.gif)

Here are examples of shapes that broadcast:
```
A      (2d array):  5 x 4
B      (1d array):      1
Result (2d array):  5 x 4

A      (2d array):  5 x 4
B      (1d array):      4
Result (2d array):  5 x 4

A      (3d array):  15 x 3 x 5
B      (3d array):  15 x 1 x 5
Result (3d array):  15 x 3 x 5

A      (3d array):  15 x 3 x 5
B      (2d array):       3 x 5
Result (3d array):  15 x 3 x 5

A      (3d array):  15 x 3 x 5
B      (2d array):       3 x 1
Result (3d array):  15 x 3 x 5

A      (4d array):  8 x 1 x 6 x 1
B      (3d array):      7 x 1 x 5
Result (4d array):  8 x 7 x 6 x 5
```
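
The shapes listed above can be checked without allocating any data. The sketch below assumes NumPy 1.20 or later, where `np.broadcast_shapes()` is available.

```python
import numpy as np

# Pairs of shapes taken from the list above
pairs = [
    ((5, 4), (1,)),
    ((5, 4), (4,)),
    ((15, 3, 5), (15, 1, 5)),
    ((15, 3, 5), (3, 5)),
    ((15, 3, 5), (3, 1)),
    ((8, 1, 6, 1), (7, 1, 5)),
]

# Apply the broadcasting rules to each pair of shapes
results = [np.broadcast_shapes(sa, sb) for sa, sb in pairs]

for (sa, sb), res in zip(pairs, results):
    print(sa, sb, "->", res)
```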

The following example shows an outer addition operation of two 1-d arrays:

In [None]:
a = np.array([0.0, 10.0, 20.0, 30.0])
b = np.array([0.0, 1.0, 2.0])

a[:, None] + b

The `None` index operator inserts a new axis into `a`, making it a `4x1` matrix. Then, it is combined with `b`, which has shape `(3,)`, yielding a `4x3` array.


![outer_broadcast.gif](attachment:outer_broadcast.gif)

Here are examples of shapes that **do not** broadcast:
```
A      (1d array):  3
B      (1d array):  4

A      (2d array):      2 x 1
B      (3d array):  8 x 4 x 3
```

The following example shows an incompatible operation: 

In [None]:
a = np.arange(12).reshape(4,3)
b = np.ones(4)
a + b

When the trailing dimensions of the arrays are unequal, broadcasting fails because it is impossible to align the values in the rows of the 1st array with the elements of the 2nd array for element-by-element addition.

![bad_broadcast.gif](attachment:bad_broadcast.gif)
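
One possible way to make this operation valid, if the intent is to add one value of `b` to each row of `a`, is to give `b` a trailing axis of length one. The following sketch shows both the failure and this fix.

```python
import numpy as np

a = np.arange(12).reshape(4, 3)
b = np.ones(4)

try:
    a + b            # shapes (4, 3) and (4,) do not broadcast
    failed = False
except ValueError:
    failed = True
print(failed)        # True

# b[:, None] has shape (4, 1), which broadcasts with (4, 3)
c = a + b[:, None]
print(c.shape)       # (4, 3)
```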

---

#### Exercise - Softmax function

The *softmax* is a normalizing function used when your algorithm needs to classify two or more classes. Mathematically, it is defined as follows.

- For a vector:

$${\sf softmax}({\rm x}) = {\sf softmax}\left(\begin{bmatrix}
    x_1  &
    x_2 &
    ...  &
    x_N  
\end{bmatrix}\right) = \begin{bmatrix}
     \dfrac{e^{x_1}}{\sum_{j=1}^N e^{x_j}}  &&
    \dfrac{e^{x_2}}{\sum_{j=1}^N e^{x_j}}  &&
    \dots  &&
    \dfrac{e^{x_N}}{\sum_{j=1}^N e^{x_j}} 
\end{bmatrix} $$

- For a matrix:

$${\sf softmax}({\rm X}) = {\sf softmax}\left(\begin{bmatrix}
    x_{11} & x_{12} & x_{13} & \dots  & x_{1n} \\
    x_{21} & x_{22} & x_{23} & \dots  & x_{2n} \\
    \vdots & \vdots & \vdots & \ddots & \vdots \\
    x_{m1} & x_{m2} & x_{m3} & \dots  & x_{mn}
\end{bmatrix}\right) = \begin{bmatrix}
    \frac{e^{x_{11}}}{\sum_{j}e^{x_{1j}}} & \frac{e^{x_{12}}}{\sum_{j}e^{x_{1j}}} & \frac{e^{x_{13}}}{\sum_{j}e^{x_{1j}}} & \dots  & \frac{e^{x_{1n}}}{\sum_{j}e^{x_{1j}}} \\
    \frac{e^{x_{21}}}{\sum_{j}e^{x_{2j}}} & \frac{e^{x_{22}}}{\sum_{j}e^{x_{2j}}} & \frac{e^{x_{23}}}{\sum_{j}e^{x_{2j}}} & \dots  & \frac{e^{x_{2n}}}{\sum_{j}e^{x_{2j}}} \\
    \vdots & \vdots & \vdots & \ddots & \vdots \\
    \frac{e^{x_{m1}}}{\sum_{j}e^{x_{mj}}} & \frac{e^{x_{m2}}}{\sum_{j}e^{x_{mj}}} & \frac{e^{x_{m3}}}{\sum_{j}e^{x_{mj}}} & \dots  & \frac{e^{x_{mn}}}{\sum_{j}e^{x_{mj}}}
\end{bmatrix} = \begin{pmatrix}
    {\sf softmax}\text{(first row)}  \\
    {\sf softmax}\text{(second row)} \\
    ...  \\
    {\sf softmax}\text{(last row)} \\
\end{pmatrix} $$

**Instructions:** Implement the function `softmax()` in Numpy.

In [None]:
def softmax(x):
    """Calculates the softmax for each row of the input x.

    Your code should work for a row vector and also for matrices of shape (n, m).

    Argument:
    x -- A numpy matrix of shape (n,m)

    Returns:
    s -- A numpy matrix equal to the softmax of x, of shape (n,m)
    """
    
    ### START CODE HERE ### (≈ 3 lines of code)
    
    # Compute the exponential of every element in x. Use np.exp(...).
    x_exp = None

    # Create a vector x_sum that sums each row of x_exp. Use np.sum(..., axis = ..., keepdims = True).
    x_sum = None
    
    # Compute softmax by dividing x_exp by x_sum. It should automatically use numpy broadcasting.
    s = None

    ### END CODE HERE ###
    
    return s

In [None]:
x = np.array([
    [9, 2, 5, 0, 0],
    [7, 5, 0, 0 ,0]])
y = softmax(x)

print("softmax(x):\n" + str(y))

**Expected Output**:

<table style="width:60%">

     <tr> 
       <td> **softmax(x)** </td> 
       <td> [[  9.80897665e-01   8.94462891e-04   1.79657674e-02   1.21052389e-04
    1.21052389e-04]
 [  8.78679856e-01   1.18916387e-01   8.01252314e-04   8.01252314e-04
    8.01252314e-04]]</td> 
     </tr>
</table>


In [None]:
assert x.shape == y.shape
np.testing.assert_almost_equal(y, 
    [[9.80897665e-01, 8.94462891e-04, 1.79657674e-02, 1.21052389e-04, 1.21052389e-04],
     [8.78679856e-01, 1.18916387e-01, 8.01252314e-04, 8.01252314e-04, 8.01252314e-04]], 
    decimal=4)

**Note**: If you print the shapes of `x_exp`, `x_sum` and `s` above and rerun the assessment cell, you will see that `x_sum` is of shape (2,1) while `x_exp` and `s` are of shape (2,5). **`x_exp/x_sum`** works due to NumPy broadcasting.

## 3. Vectorization

In machine learning, you deal with very large datasets. Hence, a slow function can become a huge bottleneck in your algorithm and can result in a model that takes ages to run. To make sure that your code is computationally efficient, you will use vectorization. Vectorization problems can be roughly divided into two classes:
1. the problem you're trying to solve is inherently vectorizable and only requires a few numpy tricks to make it faster,
2. you fundamentally have to rethink your problem in order to make it vectorizable.

Luckily, most machine learning algorithms fall into the first category. In the following, you will see a couple of useful vectorization tricks.

### Scalar product

Let us consider the following problem:
 - Given two vectors `x` and `y` of the same size, we want to compute the sum of `x[i]*y[i]` for all indices `i`. 

One simple and obvious solution is to write the following function.

In [None]:
def compute_dot_python(x, y):
    result = 0
    
    for i in range(len(x)):
        result += x[i] * y[i]
        
    return result

However, this implementation requires a loop, which is known to be slow in Python.

In [None]:
import time

x = np.random.randn(1000)

tic = time.process_time()
compute_dot_python(x,x)
toc = time.process_time()

print ("Computation time (python code) = " + str(1000*(toc - tic)) + "ms")

In [None]:
import timeit

TEST_CODE = ''' 
x = np.random.randn(10000)
y = compute_dot_python(x,x)'''

SETUP_CODE = ''' 
from __main__ import compute_dot_python 
import numpy as np'''

sec = timeit.timeit(TEST_CODE, setup=SETUP_CODE, number=100) / 100

print ("Computation time (python code) =", 1000*sec, "ms")

How do we vectorize our problem? It's simple: by removing the **unnecessary loop.** 

If you remember your linear algebra course, you may have identified the expression `result += x[i] * y[i]` as a scalar product. Mathematically, the scalar product between two (column) vectors $x\in\mathbb{R}^N$ and $y\in\mathbb{R}^N$ is

$$
x^\top y = [x_1,\dots,x_N]\begin{bmatrix}y_1\\\vdots\\y_N\end{bmatrix} = x_1 y_1 + \dots + x_N y_N = \sum_{n=1}^N x_n y_n
$$

Numpy provides the function `np.dot()` to do exactly this. 

**Remark:** `np.dot()` is different from `np.multiply()` and the `*` operator, which perform an element-wise multiplication.

In [None]:
import timeit

TEST_CODE = ''' 
x = np.random.randn(10000)
y = np.dot(x, x)'''

SETUP_CODE = ''' 
from __main__ import compute_dot_python 
import numpy as np'''

sec = timeit.timeit(TEST_CODE, setup=SETUP_CODE, number=100) / 100

print ("Computation time (numpy function) =", 1000*sec, "ms")

As you have noticed, the vectorized implementation is much cleaner and more efficient. For bigger vectors, the difference in running time is even larger!
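
Besides being faster, the vectorized version gives the same result as the loop. The small check below uses hypothetical inputs and a helper `dot_with_loop()` that repeats the loop-based approach so the block runs on its own.

```python
import numpy as np

def dot_with_loop(x, y):
    # Same loop-based approach as compute_dot_python
    result = 0.0
    for i in range(len(x)):
        result += x[i] * y[i]
    return result

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

loop_result = dot_with_loop(x, y)
dot_result = np.dot(x, y)

print(loop_result)        # 32.0
print(dot_result)         # 32.0, a scalar
print(x * y)              # [ 4. 10. 18.], element-wise product, still a vector
print(np.sum(x * y))      # 32.0, summing the element-wise product
```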

---
#### Exercise - Logistic model

The logistic model is used in Machine Learning to perform binary classification. Mathematically, it is defined as 

$$
f_{\rm w}({\rm x}) = {\sf sigmoid}({\rm w}^\top {\rm x}) = {\sf sigmoid}(w_0 + w_1 x_1 + \dots + w_Q x_Q)
$$

where 

$$
{\rm w} = \begin{bmatrix}w_0\\w_1\\\vdots\\w_Q\end{bmatrix} \in \mathbb{R}^{Q+1}
\qquad\qquad
{\rm x} = \begin{bmatrix}1\\x_1\\\vdots\\x_Q\end{bmatrix} \in \mathbb{R}^{Q+1}
$$

with the convention $x_0=1$.

**Instructions**: Implement the logistic model using `np.dot()`. The function takes the vectors `w` and `x` as inputs. Note that these vectors don't have the same length: `w` has one more element than `x`. You can get around this by using one of the following tricks:
1. Insert **1** at the beginning of `x`, and multiply it with `w`.
2. Slice `w` by removing its first element `w[0]`, multiply the remaining elements `w[1], ..., w[Q]` with `x`, and then add `w[0]` to the result.
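
The two tricks above can be sketched as follows for the linear part $w_0 + w_1 x_1 + \dots + w_Q x_Q$ only. The values of `w` and `x` are hypothetical, and applying the sigmoid is left to the exercise.

```python
import numpy as np

# Hypothetical values, chosen only for this illustration
w = np.array([0.5, 1.0, -2.0])   # shape (Q+1,) with Q = 2
x = np.array([3.0, 4.0])         # shape (Q,)

# Trick 1: prepend 1 to x so that both vectors have length Q+1
z1 = np.dot(w, np.concatenate(([1.0], x)))

# Trick 2: use w[1:] for the product and add w[0] separately
z2 = w[0] + np.dot(w[1:], x)

print(z1, z2)   # -4.5 -4.5
```

Both tricks compute the same number; the second avoids building a new vector.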

In [None]:
def logistic_prediction_v0(w, x):
    """Calculates the logistic prediction of the input x.
     
    Arguments:
    w -- A numpy vector of shape (Q+1,)
    x -- A numpy vector of shape (Q  ,)

    Returns:
    s -- A scalar equal to the logistic prediction of x
    """
    
    ### START CODE HERE ### (≈ 2 lines of code)
    
    # Compute w0 + w1*x1 + ... + wQ*xQ. Use np.dot(...).
    w_dot_x = None
    
    # Evaluate the sigmoid function in w_dot_x. Use your function implemented before.
    s = None
    
    ### END CODE HERE ###
    
    return s

In [None]:
x = np.array([3, 2, 7])
w = np.array([-3, 2, 0.5, -1])

s = logistic_prediction_v0(w,x)

print("logistic_prediction_v0(w,x) = " + str(s))

**Expected Output**:

<table style="width:50%">
    <tr>
        <td>  ** logistic prediction (v0) **  </td>
      <td> 0.04742587 </td>
    </tr>
</table>

In [None]:
assert s.shape == ()
np.testing.assert_almost_equal(s, 0.04742587, decimal=4)

### Matrix-vector product

The function `np.dot()` is also capable of performing matrix-vector or vector-matrix products. Assume that `A` is a matrix of size $N\times K$, `b` is a vector of size $K$, and `c` is a vector of size $N$:

$$
A = 
\begin{bmatrix}
a_{1,1} & \dots & a_{1,K}\\
\vdots &  & \vdots\\
a_{N,1} & \dots & a_{N,K}\\
\end{bmatrix}
\qquad\qquad
b = \begin{bmatrix}b_1\\\vdots\\b_K\end{bmatrix}
\qquad\qquad
c = \begin{bmatrix}c_1\\\vdots\\c_N\end{bmatrix}
$$

Mathematically, the exact operation performed by `np.dot()` depends on the order of inputs:

- **`np.dot(A,b)`** multiplies `b` with the *rows* of `A`

$$
Ab = \left[ \sum_{k=1}^K a_{n,k} b_k \right]_{1\le n\le N}
$$

- **`np.dot(c,A)`** multiplies `c` with the *columns* of `A` 

$$
c^\top A = \left[ \sum_{n=1}^N a_{n,k} c_n \right]_{1\le k\le K}
$$

In both cases, the result is a vector (1-D array).
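As a quick check of the two formulas, here is a small sketch with $N=2$ and $K=3$. The arrays `A_demo`, `b_demo`, and `c_demo` are made-up values used only for illustration.

```python
import numpy as np

# Illustrative data: N = 2 rows, K = 3 columns
A_demo = np.array([[1, 2, 3],
                   [4, 5, 6]])      # shape (2, 3)
b_demo = np.array([1, 0, -1])       # shape (3,), one entry per column of A
c_demo = np.array([10, 1])          # shape (2,), one entry per row of A

# Matrix-vector product: scalar product of b with each row of A
Ab = np.dot(A_demo, b_demo)         # shape (2,)

# Vector-matrix product: scalar product of c with each column of A
cA = np.dot(c_demo, A_demo)         # shape (3,)

print("np.dot(A, b) =", Ab, "with shape", Ab.shape)
print("np.dot(c, A) =", cA, "with shape", cA.shape)
```

Swapping the arguments in either call would raise a `ValueError`, because the lengths being summed over would no longer match.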

---

#### Exercise - Prediction on batch

In logistic regression, you normally need to compute the prediction for many samples ${\rm x}^{(1)}, \dots, {\rm x}^{(N)}$. This computation can be easily vectorized.

- Stack the samples as rows of a matrix $X$ with size $N\times Q$:

$$
X = \begin{bmatrix}
\_\!\_\; {{\rm x}^{(1)}}^\top \_\!\_ \\
\vdots\\
\_\!\_\; {{\rm x}^{(N)}}^\top \_\!\_ \\
\end{bmatrix}.
$$

- Compute the products $z_n = {\rm w}^\top{\rm x}^{(n)}$ for every index $n$. This is equivalent to multiplying $X$ by ${\rm w}$:

$$ {\rm z} = X{\rm w} =
\begin{bmatrix}
{\rm w}^\top{\rm x}^{(1)}\\
\vdots\\
{\rm w}^\top{\rm x}^{(N)}
\end{bmatrix}.$$


- Compute the logistic function of $z_n$ for every index $n$:

$$
{\sf sigmoid}({\rm z}) =
\begin{bmatrix}
{\sf sigmoid}(z_1)\\
\vdots\\
{\sf sigmoid}(z_N)\\
\end{bmatrix}.
$$
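The equivalence between the per-sample products and the single product $X{\rm w}$ can be verified numerically. The following sketch makes two assumptions: the rows of `X_demo` already contain the leading 1, so that they have the same length as `w_demo`, and `sigmoid_demo()` is a stand-in for the sigmoid function you implemented earlier. All names and values are hypothetical.

```python
import numpy as np

def sigmoid_demo(z):
    """Stand-in for the sigmoid implemented earlier in this lab."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical samples that already include the leading 1 (x_0 = 1)
X_demo = np.array([[1.0,  2.0, 0.0],
                   [1.0, -1.0, 3.0]])
w_demo = np.array([0.5, 1.0, -1.0])

# One scalar product per sample, with an explicit loop
z_loop = np.array([np.dot(w_demo, x_n) for x_n in X_demo])

# The same values with a single matrix-vector product
z_vec = np.dot(X_demo, w_demo)

print("Loop:       ", z_loop)
print("Vectorized: ", z_vec)
print("Predictions:", sigmoid_demo(z_vec))
```

Because `np.exp()` works element-wise, the sigmoid is applied to the whole vector at once, with no loop over the samples.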



**Instructions**: Adapt your implementation of `logistic_prediction_v0()` into a new function `logistic_prediction()` that takes a vector `w` and a matrix `X` as inputs. However, pay attention to the fact that `X` is a matrix of size $N\times Q$, while `w` is a vector of length $Q+1$. As before, you can either slice `w`, or insert a column of **1** in front of `X`.



In [None]:
def logistic_prediction(w, x):
    """Calculates the logistic prediction for each row of the input x.
     
    Arguments:
    w -- A numpy vector of shape (Q+1,)
    x -- A numpy matrix of shape ( N, Q)

    Returns:
    s -- A numpy vector of shape (N,)
    """
    
    ### START CODE HERE ### (≈ 2 lines of code)
    
    # Compute w0 + w1*x1 + ... + wQ*xQ for every row of x. Use np.dot(...).
    w_dot_x = None

    # Evaluate the sigmoid function at w_dot_x. Use your function implemented before.
    s = None
    
    ### END CODE HERE ###
    
    return s

In [None]:
X = np.array([[4, -5,  2],
              [5,  3, -3],
              [5,  2,  1],
              [8,  6,  9],
              [5, -6,  2]])

y = logistic_prediction(w, X)

print("y.shape: " + str(y.shape))
print("logistic_prediction(w,X): " + str(y))

**Expected Output**:

<table style="width:50%">
    <tr>
      <td>  ** y.shape **  </td>
      <td> (5,) </td>
    </tr>
        <tr>
      <td>  ** logistic prediction **  </td>
      <td> [0.62245933 0.99998987 0.99908895 0.99908895 0.88079708] </td>
    </tr>
</table>

In [None]:
assert y.shape == (5,)
np.testing.assert_almost_equal(y, [0.62245933, 0.99998987, 0.99908895, 0.99908895, 0.88079708], decimal=4)

### Matrix multiplication

When both inputs are matrices (2-D arrays), the function `np.dot()` performs a matrix multiplication.

- Given two matrices $A\in\mathbb{R}^{M\times N}$ and $B\in\mathbb{R}^{N\times K}$, their product is the $M\times K$ matrix whose entry $(m,k)$ is the scalar product between the $m$-th row of $A$ and the $k$-th column of $B$:

$$
AB = 
\begin{bmatrix}
\_\!\_\; {\rm a}_1^\top \,\_\!\_ \\
\vdots\\
\_\!\_\; {\rm a}_M^\top \,\_\!\_ \\
\end{bmatrix}
\begin{bmatrix}
| & & |\\[-1em]
{\rm b}_1 & \dots & {\rm b}_K\\
| & & |\\
\end{bmatrix}
=
\begin{bmatrix}
{\rm a}_1^\top{\rm b}_1 & \dots & {\rm a}_1^\top{\rm b}_K\\
\vdots & & \vdots\\
{\rm a}_M^\top{\rm b}_1 & \dots & {\rm a}_M^\top{\rm b}_K
\end{bmatrix}.
$$


- In general, the inputs of `np.dot()` can be scalars (0-D arrays), vectors (1-D arrays), matrices (2-D arrays), or tensors (N-D arrays). 


- The exact meaning of `np.dot()` depends on the shape of its inputs.
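The sketch below shows how the result of `np.dot()` changes with the shape of its inputs. The arrays `vec` and `mat` are illustrative values only.

```python
import numpy as np

vec = np.array([1.0, 2.0])          # 1-D array, shape (2,)
mat = np.array([[1.0, 0.0],
                [0.0, 2.0]])        # 2-D array, shape (2, 2)

print(np.dot(3.0, vec))             # scalar and vector: plain scaling
print(np.dot(vec, vec))             # two vectors: scalar product
print(np.dot(mat, vec))             # matrix and vector: matrix-vector product
print(np.dot(mat, mat))             # two matrices: matrix multiplication

# The inner dimensions must agree: (M, N) times (N, K) gives (M, K)
P = np.dot(np.ones((2, 3)), np.ones((3, 4)))
print(P.shape)
```

For 2-D arrays, the `@` operator (or `np.matmul()`) computes the same product as `np.dot()`.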

#### Exercise - Multiclass prediction

In multiclass logistic regression, you have many parameter vectors ${\rm w}_1, \dots, {\rm w}_K$ (one for each class), and you need to compute the prediction for many samples ${\rm x}^{(1)}, \dots, {\rm x}^{(N)}$. This computation can also be vectorized.

- Store the parameter vectors as columns of a matrix $W$ with size $(Q+1)\times K$.

$$
W = \begin{bmatrix}
| & & |\\[-1em]
{\rm w}_1 & \dots & {\rm w}_K\\
| & & |\\
\end{bmatrix}.
$$


- Compute the products $z_{n,k} = {{\rm x}^{(n)}}^\top {\rm w}_k$ for every index $k$ and $n$. This is equivalent to multiplying $X$ by $W$:

$$ Z = XW = 
\begin{bmatrix}
{{\rm x}^{(1)}}^\top{\rm w}_1 & \dots & {{\rm x}^{(1)}}^\top{\rm w}_K\\
\vdots & & \vdots\\
{{\rm x}^{(N)}}^\top{\rm w}_1 & \dots & {{\rm x}^{(N)}}^\top{\rm w}_K
\end{bmatrix}.
$$


- Compute the softmax on every row ${\rm z}_n$ of $Z$:

$$
{\sf softmax}(Z) = 
\begin{bmatrix}
{\sf softmax}({\rm z}_1)\\
\vdots\\
{\sf softmax}({\rm z}_N)
\end{bmatrix}.
$$

**Instructions**: Implement the softmax prediction using the function `np.dot()`. However, pay attention to the fact that `X` is a matrix of size $N\times Q$, while `W` is a matrix of size $(Q+1)\times K$. As before, you can either insert a column of **1** in front of `X`, or slice `W`.

In [None]:
def softmax_prediction(w, x):
    """Calculates the softmax prediction for each row of the input x.
     
    Arguments:
    w -- A numpy matrix of shape (Q+1,K)
    x -- A numpy matrix of shape ( N, Q)

    Returns:
    s -- A numpy matrix of shape (N,K)
    """
    
    ### START CODE HERE ### (≈ 2 lines of code)
    
    # Compute w0 + w1*x1 + ... + wQ*xQ for every row of x and every column of w. Use np.dot(...).
    w_dot_x = None
    
    # Evaluate the softmax function at w_dot_x. Use your function implemented before.
    s = None
    
    ### END CODE HERE ###
    
    return s

In [None]:
W = np.array(
    [[-3,  2,  4,  5,  6,  7,  2], 
     [ 2, -1,  4, -2,  4,  2, -7], 
     [ 1,  2, -4,  6, -1,  5,  6], 
     [-1,  1,  6,  7,  3, -3,  5]
    ]
)

y = softmax_prediction(W, X)

print("y.shape: " + str(y.shape))
print("\nsoftmax_prediction(w,x):\n\n" + str(y))
print("\nargmax: " + str( np.argmax(y, axis=1) ))

**Expected Output**:

<table style="width:80%">
    <tr>
      <td>  ** y.shape **  </td>
      <td> (5,7) </td>
    </tr>
        <tr>
      <td>  ** softmax prediction **  </td>
      <td> [[  3.53262855e-24   1.18506486e-27   9.99999994e-01   1.46248622e-31
    5.60279641e-09   2.93748210e-30   2.74878499e-43]
 [  6.91440011e-13   1.56288219e-18   3.87399763e-21   5.24288566e-22
    1.87952882e-12   1.00000000e+00   1.46248623e-31]
 [  5.30303054e-09   1.31448985e-11   6.37744724e-03   2.13939521e-06
    9.46497093e-01   4.71233155e-02   2.00196538e-19]
 [  1.33361482e-34   1.97925988e-32   2.78946809e-10   1.00000000e+00
    2.54366565e-13   1.18506486e-27   3.22134029e-27]
 [  3.22134029e-27   1.97925988e-32   1.00000000e+00   1.64581143e-38
    2.78946809e-10   4.90609473e-35   2.08428284e-52]] </td>
    </tr>
    <tr>
      <td>  ** argmax **  </td>
      <td> [2 5 4 3 2] </td>
    </tr>
</table>

In [None]:
assert y.shape == (5,7)
np.testing.assert_almost_equal(y, 
    [[3.53262855e-24, 1.18506486e-27, 9.99999994e-01, 1.46248622e-31, 5.60279641e-09, 2.93748210e-30, 2.74878499e-43],
     [6.91440011e-13, 1.56288219e-18, 3.87399763e-21, 5.24288566e-22, 1.87952882e-12, 1.00000000e+00, 1.46248623e-31],
     [5.30303054e-09, 1.31448985e-11, 6.37744724e-03, 2.13939521e-06, 9.46497093e-01, 4.71233155e-02, 2.00196538e-19],
     [1.33361482e-34, 1.97925988e-32, 2.78946809e-10, 1.00000000e+00, 2.54366565e-13, 1.18506486e-27, 3.22134029e-27],
     [3.22134029e-27, 1.97925988e-32, 1.00000000e+00, 1.64581143e-38, 2.78946809e-10, 4.90609473e-35, 2.08428284e-52]],
    decimal=5)

---

#### Exercise - Linear regression

Linear regression is a machine learning technique that aims at predicting a continuous output by using a parametric linear model:

$$ f_{\rm w}({\rm x}) = {\rm w}^\top {\rm x} = w_0 + w_1 x_1 + \dots + w_Q x_Q. $$

Here, ${\rm w}$ is a parameter vector to be learned from a set of input-output pairs $\{({\rm x}^{(n)}, y^{(n)})\}$. Specifically, the learning consists of finding the vector ${\rm w}$ such that the prediction $f_{\rm w}({\rm x}^{(n)})$ is close to the true output $y^{(n)}$ for every index $n$. Mathematically, this is equivalent to minimizing the following loss function:

$$ J({\rm w}) = \sum_{n=1}^N \big(y^{(n)} - {\rm w}^\top {\rm x}^{(n)}\big)^2. $$

The loss is used to evaluate the performance of the model. The bigger the loss is, the more different the predictions $f_{\rm w}({\rm x}^{(n)})$ are from the true values $y^{(n)}$.
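To make the formula concrete, here is a deliberately loop-based sketch that follows the sum term by term. It is meant only as a reference for checking results on small inputs, not as the vectorized solution. The function `loss_reference_loop()` and the `_demo` values are hypothetical, and the rows of `X_ext_demo` are assumed to already start with 1.

```python
import numpy as np

def loss_reference_loop(w, X_ext, y):
    """Loop-based J(w). The rows of X_ext are assumed to start with 1."""
    total = 0.0
    for x_n, y_n in zip(X_ext, y):
        residual = y_n - np.dot(w, x_n)   # y^(n) - w^T x^(n)
        total += residual ** 2
    return total

# Hypothetical data: three samples with one feature each
w_demo = np.array([1.0, 2.0])
X_ext_demo = np.array([[1.0, 0.0],
                       [1.0, 1.0],
                       [1.0, 2.0]])
y_demo = np.array([1.0, 2.0, 7.0])

# Predictions are 1, 3, 5, so the residuals are 0, -1, 2
J_demo = loss_reference_loop(w_demo, X_ext_demo, y_demo)
print("Loss:", J_demo)
```

A vectorized implementation should return the same value on the same data.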


**Instruction**: Implement a function that computes the loss (a scalar) given three inputs: 
 - `w` - the parameter vector, 
 - `X` - the matrix storing all the inputs ${\rm x}^{(n)}$, 
 - `y` - the vector with all the outputs $y^{(n)}$.

Don't use loops. The implementation must be fully vectorized!

In [None]:
def linear_regression_loss(w, X, y):
    """Calculates the loss used in linear regression.
     
    Arguments:
    w -- Numpy vector of shape (Q+1,)
    X -- Numpy matrix of shape (N, Q)
    y -- Numpy vector of shape (N,)

    Returns:
    s -- Scalar
    """
    
    ### START CODE HERE ### (≈ 2 lines of code)
    
    # Compute w0 + w1*x1 + ... + wQ*xQ. Use np.dot(...).
    w_dot_x = None

    # Compute the sum of squared differences between y and w_dot_x. Use np.sum().
    J = None
    
    ### END CODE HERE ###
    
    return J

In [None]:
import numpy as np

In [None]:
w = np.array([-3, 2, 0.5, -1])
X = np.array([[4, -5,  2],
              [5,  3, -3],
              [5,  2,  1],
              [8,  6,  9],
              [5, -6,  2]])
y = np.array([6, -3, 0.5, 1, 0.])

J = linear_regression_loss(w, X, y)

print("Shape: " + str(J.shape))
print("linear_regression_loss(w,X,y): " + str(J))

**Expected Output**:

<table style="width:50%">
    <tr>
      <td>  ** Shape **  </td>
      <td> ( ) </td>
    </tr>
        <tr>
      <td>  ** Loss **  </td>
      <td> 322.75 </td>
    </tr>
</table>

In [None]:
assert J.shape == ()
np.testing.assert_almost_equal(J, 322.75, decimal=1)

## 4. Conclusion

Congratulations! You now have a pretty good understanding of Numpy, and have implemented a few functions that you will be using in deep learning. We hope this little warm-up tutorial will help you in future assignments, which will be more exciting and interesting!

**What you have learned:**

- How to use Numpy arrays and their efficient built-in functions
- The concept of Broadcasting
- How to vectorize code

For deeper insights, additional information can be found here:

- [Python](http://docs.python.org/tut/)

- [Numpy](https://docs.scipy.org/doc/numpy/user/quickstart.html)

- [Broadcasting](https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html)

- [Vectorization](http://www.labri.fr/perso/nrougier/from-python-to-numpy/)