In [1]:
import base64

# Derivatives

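A derivative is a rate of change: it tells you how much one quantity changes when another quantity changes. We usually speak of the derivative of one thing with respect to another. In machine learning, you will often hear derivatives referred to as gradients. Strictly, a gradient collects the partial derivatives with respect to several variables, but in this notebook the two words are used interchangeably.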

Let's say $x$ is a real number. Consider the following function $f$.

$f(x) = 2x$

As $x$ changes, $f(x)$ changes by twice that much, so we can say that the derivative of $f(x)$ with respect to $x$ is 2.
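If you'd like to check a claim like this numerically, here is a minimal sketch. It nudges $x$ a little in both directions and measures how much $f(x)$ moves. The helper name `finite_difference` and the step size `h` are my own choices for illustration, not part of the exercises.

```python
def f(x):
    """The function from the text: f(x) = 2x."""
    return 2 * x


def finite_difference(func, x, h=1e-6):
    """Approximate the derivative of func at x with a central difference."""
    return (func(x + h) - func(x - h)) / (2 * h)


# The estimated slope should be close to 2 no matter which x we pick.
slopes = [finite_difference(f, x) for x in (-3.0, 0.0, 1.5)]
print(slopes)
```

The estimates are only approximately 2 because of floating-point rounding.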

Consider this function $z$.

$z(x, w, b) = wx + b$

What is the derivative of $z(x, w, b)$ with respect to $w$? Fill in your answer as a string (`str`) in the following cell.

In [2]:
answer = "x"  # <- Put your answer here!
base64.b64encode(answer.encode()) == b'eA=='

True

Because people are lazy, they may just say $z$ instead of $z(x, w, b)$ to refer to the output of $z$.

I will do that here. What is the derivative of $z$ with respect to $x$?

In [3]:
answer = "w"
base64.b64encode(answer.encode()) == b'dw=='

True

There are many ways to write derivatives in math. All of these mean the same thing.
> the derivative of $z$ with respect to $b$

> $\dfrac{\partial z}{\partial b}$

> $D_b z$

> $\nabla_b z$

What is $\nabla_b z$?

In [7]:
answer = "1"
base64.b64encode(answer.encode()) == b'MQ=='

True

In [5]:
hint = True # <- Change this to True to see a hint!
if hint:
    print(base64.b64decode(b'SG93IG11Y2ggZG9lcyB6IGNoYW5nZSBpZiB5b3UgYWRkIDEgdG8gYj8=').decode())

How much does z change if you add 1 to b?


## The power rule

This trick might come in handy.
> The derivative of $x^p$ with respect to $x$ is $px^{p-1}$.

Using notation from before,
> $\nabla_x x^p = px^{p-1}$

Using different notation,
> $\dfrac{\partial x^p}{\partial x} = px^{p-1}$
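Here is a small sketch that compares the power rule against a numerical estimate, assuming $p = 3$ and $x = 1.5$ (both picked arbitrarily). The helper names `power_rule` and `finite_difference` are hypothetical, made up for this example.

```python
def power_rule(x, p):
    """Derivative of x**p with respect to x, according to the power rule."""
    return p * x ** (p - 1)


def finite_difference(func, x, h=1e-6):
    """Approximate the derivative of func at x with a central difference."""
    return (func(x + h) - func(x - h)) / (2 * h)


x, p = 1.5, 3
analytic = power_rule(x, p)  # 3 * 1.5**2 = 6.75
numeric = finite_difference(lambda t: t ** p, x)
print(analytic, numeric)
```

The two numbers should agree to several decimal places.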

What's $\nabla_x \left( 5x + 4x^2 + 3x^3 \right)$?

In [8]:
answer = "5 + 8x + 9x^2"  # <- Put your answer here!
base64.b64encode(answer.encode()) == b'NSArIDh4ICsgOXheMg=='

True

## The chain rule

The chain rule helps you determine derivatives of functions inside functions. The rule states that if you have two functions $f$ and $g$, and you compose them into a function $g(f(x))$, then

> $\nabla_x g = \nabla_{f} g \nabla_x f$

Here it is again using different notation.

> $\dfrac{\partial g}{\partial x} = \dfrac{\partial g}{\partial f} \dfrac{\partial f}{\partial x}$
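To see the chain rule at work, here is a sketch with two functions I made up for illustration: $f(x) = x^2 + 1$ and $g(f) = f^3$. It multiplies the two pieces the rule asks for and compares the product against a numerical estimate of the derivative of the composed function.

```python
def f(x):
    """Inner function (an assumption for this example): f(x) = x^2 + 1."""
    return x ** 2 + 1


def g(f_value):
    """Outer function (an assumption for this example): g(f) = f^3."""
    return f_value ** 3


def finite_difference(func, x, h=1e-6):
    """Approximate the derivative of func at x with a central difference."""
    return (func(x + h) - func(x - h)) / (2 * h)


x = 2.0
dg_df = 3 * f(x) ** 2      # power rule applied to g, evaluated at f(x) = 5
df_dx = 2 * x              # power rule applied to f
chain_rule = dg_df * df_dx  # 75 * 4 = 300

numeric = finite_difference(lambda t: g(f(t)), x)
print(chain_rule, numeric)
```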



Let's say we have these two functions:

$f (x) = 2x + 3$

$g (x) = 3 f(x) - 1$

What is $\nabla_{x} g$?

In [9]:
answer = "6"
base64.b64encode(answer.encode()) == b'Ng=='  # Encode the answer string "6" in base64 and compare it with b'Ng==', the base64 representation of "6"

True

## The average derivative

Consider again $z(x, w, b) = wx + b$. Let's evaluate $z$ using some $w$, some $b$ and a bunch of different $x$s, let's say $n$ of them, so that $x$ is the tuple $(x_1, x_2, x_3, \dots, x_n)$.

Then, what is the average of $\nabla_w z$ over all of $x$? Said differently, what is $\dfrac{\sum_{i=1}^{n} \nabla_w z(x_i, w, b)}{n}$?

In [10]:
answer = "the sum of x divided by n"  # <- Put your answer here!
base64.b64encode(answer.encode()) == b'dGhlIHN1bSBvZiB4IGRpdmlkZWQgYnkgbg=='

True

Run the following cell to see a hint.

In [11]:
print(base64.b64decode(b'SG93IG11Y2ggZG9lcyB6IGNoYW5nZSBpZiB5b3UgYWRkIDEgdG8gYj8=').decode())

How much does z change if you add 1 to b?


Another hint.

In [12]:
print(base64.b64decode(b'VGhlIGRlcml2YXRpdmUgb2YgYSBzdW0gaXMgdGhlIHN1bSBvZiB0aGUgZGVyaXZhdGl2ZXMgb2YgZWFjaCBpdGVtLg==').decode())

The derivative of a sum is the sum of the derivatives of each item.
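If you want to check your answer numerically, here is a rough sketch. The values of `w`, `b` and `xs` are arbitrary assumptions, and `dz_dw` is a hypothetical helper that estimates $\nabla_w z$ for one $x_i$ by nudging $w$.

```python
def z(x, w, b):
    """The function from the text: z = wx + b."""
    return w * x + b


def dz_dw(x, w, b, h=1e-6):
    """Estimate the derivative of z with respect to w by nudging w."""
    return (z(x, w + h, b) - z(x, w - h, b)) / (2 * h)


w, b = 0.5, -1.0       # arbitrary values
xs = (1.0, 2.0, 6.0)   # a bunch of different x's, here n = 3

# Average the per-example derivatives over all of x.
average_derivative = sum(dz_dw(x_i, w, b) for x_i in xs) / len(xs)
mean_x = sum(xs) / len(xs)
print(average_derivative, mean_x)
```

Both printed numbers should be close to 3.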


# Basic linear algebra operations

## Vector addition

Vectors are elements of a vector space. For something to count as a vector, you must be able to add two of them together and get another vector in the same space. Let's say we have a tuple of four real numbers, and let's name it $\mathbf{x}$.

$\mathbf{x} = \begin{bmatrix} 2 \\ 0 \\ 2 \\ 3 \end{bmatrix}$

All of the following statements mean the same thing.
> $\mathbf{x}$ is an element of the set of all tuples of four real numbers. 

> $\mathbf{x}$ is an element of the Cartesian product of four copies of the set of real numbers.

> $\mathbf{x} \in \mathbb{R}^4$.

Let's make another vector in $\mathbb{R}^4$ and call it $\mathbf{y}$.

$\mathbf{y} = \begin{bmatrix} -1 \\ 3 \\ 3 \\ 7 \end{bmatrix}$

We can add $\mathbf{x}$ and $\mathbf{y}$ together and get another vector in $\mathbb{R}^4$. To do so, we add corresponding elements.

$\mathbf{x} + \mathbf{y} = \begin{bmatrix} 2 \\ 0 \\ 2 \\ 3 \end{bmatrix} + \begin{bmatrix} -1 \\ 3 \\ 3 \\ 7 \end{bmatrix} = \begin{bmatrix} 2-1 \\ 0+3 \\ 2+3 \\ 3+7 \end{bmatrix} = \begin{bmatrix} 1 \\ 3 \\ 5 \\ 10 \end{bmatrix}$
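In code, the same addition might look like the sketch below. It stores vectors as plain Python tuples, the same way the answer cells in this notebook do. `add_vectors` is a hypothetical helper, not something the exercises require.

```python
def add_vectors(u, v):
    """Add two vectors of the same length by adding corresponding elements."""
    if len(u) != len(v):
        raise ValueError("vectors must live in the same space")
    return tuple(a + b for a, b in zip(u, v))


x = (2, 0, 2, 3)
y = (-1, 3, 3, 7)

total = add_vectors(x, y)
print(total)  # (1, 3, 5, 10)
```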

What's $\mathbf{y} + \mathbf{y}$?

In [13]:
answer = (-2,
          6,
          6,
          14)  # <- Put your answer here!
base64.b64encode(str(answer).encode()) == b'KC0yLCA2LCA2LCAxNCk='

True

## Vector multiplication

Another thing you can do with vectors is multiply them by a scalar, and still end up with a vector in the same space. To do this, just multiply each element of the vector by the scalar.
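Here is a minimal sketch of scalar multiplication using tuples. The scalar 3 is an arbitrary choice, and `scale_vector` is a hypothetical helper name.

```python
def scale_vector(s, v):
    """Multiply every element of vector v by the scalar s."""
    return tuple(s * element for element in v)


x = (2, 0, 2, 3)

scaled = scale_vector(3, x)
print(scaled)  # (6, 0, 6, 9)
```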

What's $-2 \mathbf{x}$?

In [15]:
answer = (-4,
          0,
          -4,
          -6)
base64.b64encode(str(answer).encode()) == b'KC00LCAwLCAtNCwgLTYp'

True

Vectors have other features.
> If you subtract a vector from itself (that is, multiply it by $-1$ and add the result to the original vector), you will get a zero vector $\mathbf{0}$ in the same space.

> If you add $\mathbf{0}$ to any vector, you get the same vector.

> If you multiply a vector by 1, you get the same vector.
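These three features can be checked on $\mathbf{x}$ from before. This is only a spot check on one vector, not a proof, and the helper names are hypothetical.

```python
def add_vectors(u, v):
    """Add corresponding elements of two equal-length vectors."""
    return tuple(a + b for a, b in zip(u, v))


def scale_vector(s, v):
    """Multiply every element of vector v by the scalar s."""
    return tuple(s * element for element in v)


x = (2, 0, 2, 3)

# Subtracting x from itself: multiply by -1, then add.
zero = add_vectors(x, scale_vector(-1, x))
print(zero)                       # (0, 0, 0, 0)
print(add_vectors(x, zero) == x)  # True
print(scale_vector(1, x) == x)    # True
```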

## Matrix addition

Matrices are tuples of vectors. Using column vectors, as we have so far, the vectors are shown next to each other horizontally. Let's put $\mathbf{x}$ and $\mathbf{y}$ from before in a matrix. We'll end up with


$\begin{bmatrix} 2 & -1 \\ 0 & 3 \\ 2 & 3 \\ 3 & 7 \end{bmatrix}$

Matrix addition is like vector addition: just add the corresponding elements to each other.
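As a sketch, here is matrix addition on two small matrices that I made up for illustration. Each matrix is a tuple of rows, matching the format of the answer cells. `add_matrices` is a hypothetical helper.

```python
def add_matrices(a, b):
    """Add two same-shaped matrices, each stored as a tuple of rows."""
    return tuple(
        tuple(p + q for p, q in zip(row_a, row_b))
        for row_a, row_b in zip(a, b)
    )


A = ((1, 2),
     (3, 4))
B = ((10, 20),
     (30, 40))

total = add_matrices(A, B)
print(total)  # ((11, 22), (33, 44))
```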

What's $\begin{bmatrix} 2 & -1 \\ 0 & 3 \end{bmatrix} + \begin{bmatrix} 2 & 3 \\ 3 & 7 \end{bmatrix}$ ?

In [16]:
answer = ((4, 2),
          (3, 10))
base64.b64encode(str(answer).encode()) == b'KCg0LCAyKSwgKDMsIDEwKSk='

True

## Matrix multiplication

There are at least these kinds of matrix multiplication:
* Multiplication by a matrix (what matrix multiplication usually means)
* Multiplication by a scalar
* Element-wise multiplication (also called Hadamard product)

### Multiplication by a scalar

Multiplication by a scalar is simple: just multiply all elements by that scalar.

What's $\; 2\begin{bmatrix} 2 & -1 \\ 0 & 3 \end{bmatrix} \;$?

In [17]:
answer = ((4, -2),
          (0, 6))
base64.b64encode(str(answer).encode()) == b'KCg0LCAtMiksICgwLCA2KSk='

True

### Multiplication by a matrix

Multiplication by a matrix is a bit more complex. For each row of the first matrix and each column of the second matrix, you multiply the corresponding elements, sum the products, and place the result at that row and column of the output matrix. This only works when the first matrix has as many columns as the second matrix has rows. Let's start with a simple case: Let matrix $\mathbf{A}$ have one row and matrix $\mathbf{B}$ have one column.

$\mathbf{A} = \begin{bmatrix} a_1 & a_2 & a_3 \end{bmatrix}, \quad \mathbf{B} = \begin{bmatrix} b_1 \\ b_2 \\ b_3\end{bmatrix}$

The product $\mathbf{AB}$ is a matrix whose only element is the sum of the element-wise product of this row and column.

$\mathbf{AB} = \begin{bmatrix} a_1 b_1 + a_2 b_2 + a_3 b_3 \end{bmatrix}$.

Now let's add another row to $\mathbf{A}$, such that

$\mathbf{A} = \begin{bmatrix} a_{1,1} & a_{1,2} & a_{1,3} \\
                              a_{2,1} & a_{2,2} & a_{2,3} \end{bmatrix}$

Now $\mathbf{AB}$ has two rows: one for each row of $\mathbf{A}$.

$\mathbf{AB} = \begin{bmatrix} a_{1,1} b_1 + a_{1,2} b_2 + a_{1,3} b_3 \\
                               a_{2,1} b_1 + a_{2,2} b_2 + a_{2,3} b_3 \end{bmatrix}$.

What's $\begin{bmatrix} 1 & 2 & 3 \\
                        4 & 5 & 6 \end{bmatrix} 
\begin{bmatrix} -3 \\
                -2 \\
                -1 \end{bmatrix}\;$?

In [21]:
answer = (-10,
          -28)
base64.b64encode(str(answer).encode()) == b'KC0xMCwgLTI4KQ=='

True

Now, let's add a column to $\mathbf{B}$.

$\mathbf{A} = \begin{bmatrix} a_{1,1} & a_{1,2} & a_{1,3} \\
                              a_{2,1} & a_{2,2} & a_{2,3} \end{bmatrix} 
\quad 
\mathbf{B} = \begin{bmatrix} b_{1,1} & b_{1,2} \\
                              b_{2,1} & b_{2,2} \\
                              b_{3,1} & b_{3,2} \end{bmatrix}$

Now that $\mathbf{B}$ has two columns, $\mathbf{AB}$ has two columns. The first column is the same as before. The second column is $\mathbf{A} \begin{bmatrix} b_{1,2} \\
                                      b_{2,2} \\
                                      b_{3,2} \end{bmatrix}$.
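The recipe so far can be sketched in plain Python. `matmul` is a hypothetical helper, and the two matrices are made up for illustration. Matrices are tuples of rows.

```python
def matmul(a, b):
    """Multiply matrix a by matrix b, each stored as a tuple of rows."""
    n_inner = len(b)  # number of rows of b
    if any(len(row) != n_inner for row in a):
        raise ValueError("columns of a must match rows of b")
    n_cols = len(b[0])
    # Entry (i, j) is row i of a times column j of b, summed up.
    return tuple(
        tuple(sum(row[k] * b[k][j] for k in range(n_inner)) for j in range(n_cols))
        for row in a
    )


A = ((1, 0, 2),
     (0, 1, 1))
B = ((1, 2),
     (3, 4),
     (5, 6))

product = matmul(A, B)
print(product)  # ((11, 14), (8, 10))
```

The result has one row for each row of `A` and one column for each column of `B`.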

Now what's $\begin{bmatrix} 1 & 2 & 3 \\
                        4 & 5 & 6 \end{bmatrix} 
\begin{bmatrix} -3 & 0 \\
                -2 & 1 \\
                -1 & 2 \end{bmatrix}\;$?

In [22]:
answer = ((-10, 8),
          (-28, 17))
base64.b64encode(str(answer).encode()) == b'KCgtMTAsIDgpLCAoLTI4LCAxNykp'

True

Notice that in general $\mathbf{AB} \neq \mathbf{BA}$.
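Here is a quick sketch showing one pair of square matrices for which the order matters. The matrices are arbitrary picks, and `matmul` is a hypothetical helper written for this example.

```python
def matmul(a, b):
    """Multiply matrix a by matrix b, each stored as a tuple of rows."""
    return tuple(
        tuple(sum(row[k] * b[k][j] for k in range(len(b))) for j in range(len(b[0])))
        for row in a
    )


A = ((1, 2),
     (3, 4))
B = ((0, 1),
     (1, 0))

AB = matmul(A, B)
BA = matmul(B, A)
print(AB)        # ((2, 1), (4, 3))
print(BA)        # ((3, 4), (1, 2))
print(AB == BA)  # False
```

With this particular `B`, multiplying on the right swaps the columns of `A`, while multiplying on the left swaps its rows.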

### Element-wise multiplication

To do this, multiply the corresponding elements of two matrices.

Be very careful when you talk about this; people can confuse this with matrix multiplication. I'll use the symbol $\odot$ to mean element-wise multiplication.

Let's say $\mathbf{A} = \begin{bmatrix} a_{1,1} & a_{1,2} \\
                                        a_{2,1} & a_{2,2}  \end{bmatrix}$
and
$\mathbf{B} = \begin{bmatrix} b_{1,1} & b_{1,2} \\
                              b_{2,1} & b_{2,2} \end{bmatrix}$.
                              
Then $\mathbf{A} \odot \mathbf{B} = 
\begin{bmatrix} a_{1,1} b_{1,1} & a_{1,2} b_{1,2} \\
                a_{2,1} b_{2,1} & a_{2,2} b_{2,2} \end{bmatrix}$.
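As a sketch, element-wise multiplication of two made-up matrices might look like this. `hadamard` is a hypothetical helper name.

```python
def hadamard(a, b):
    """Element-wise (Hadamard) product of two same-shaped matrices."""
    return tuple(
        tuple(p * q for p, q in zip(row_a, row_b))
        for row_a, row_b in zip(a, b)
    )


A = ((1, 2),
     (3, 4))
B = ((5, 6),
     (7, 8))

product = hadamard(A, B)
print(product)  # ((5, 12), (21, 32))
```

Unlike multiplication by a matrix, the order doesn't matter here: `hadamard(B, A)` gives the same result.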

## Broadcasting

Broadcasting means repeating the elements of a scalar, vector, or matrix as many times as needed to make normally incompatible shapes compatible. Strict linear algebra doesn't allow this, but broadcasting is common in machine learning, and is often implied in machine learning-related math notation.

I'll show you some things you can do with broadcasting. I'll use these things to demonstrate:
* scalar $s$
* vector $\mathbf{v} = \begin{bmatrix} v_1 \\
                                       v_2 \end{bmatrix}$
* matrix $\mathbf{M} = \begin{bmatrix} m_{1,1} & m_{1,2} \\
                              m_{2,1} & m_{2,2} \end{bmatrix}$


You can add a scalar to a matrix.

$\mathbf{M} + s = \begin{bmatrix} m_{1,1} + s & m_{1,2} + s \\
                                  m_{2,1} + s & m_{2,2} + s \end{bmatrix}$

You can add a vector to a differently-shaped matrix.

$\mathbf{M} + \mathbf{v} = \begin{bmatrix} m_{1,1} + v_1 & m_{1,2} + v_1 \\
                                           m_{2,1} + v_2 & m_{2,2} + v_2 \end{bmatrix}$
                                           
You can multiply a matrix element-wise with a differently-shaped vector. 

$\mathbf{M} \odot \mathbf{v} = \begin{bmatrix} m_{1,1} v_1 & m_{1,2} v_1 \\
                                               m_{2,1} v_2 & m_{2,2} v_2 \end{bmatrix}$
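The three cases above can be sketched in plain Python by doing the repetition explicitly. Array libraries such as NumPy do this repetition for you implicitly. The numbers and helper names here are assumptions made up for illustration.

```python
import operator


def broadcast_column(v, n_cols):
    """Repeat column vector v side by side until it has n_cols columns."""
    return tuple(tuple(v_i for _ in range(n_cols)) for v_i in v)


def broadcast_scalar(s, n_rows, n_cols):
    """Repeat scalar s to fill an n_rows by n_cols matrix."""
    return tuple(tuple(s for _ in range(n_cols)) for _ in range(n_rows))


def elementwise(op, a, b):
    """Apply op to corresponding elements of two same-shaped matrices."""
    return tuple(
        tuple(op(p, q) for p, q in zip(row_a, row_b))
        for row_a, row_b in zip(a, b)
    )


M = ((1, 2),
     (3, 4))
v = (10, 20)  # treated as a column vector
s = 5

S = broadcast_scalar(s, 2, 2)  # ((5, 5), (5, 5))
V = broadcast_column(v, 2)     # ((10, 10), (20, 20))

m_plus_s = elementwise(operator.add, M, S)
m_plus_v = elementwise(operator.add, M, V)
m_times_v = elementwise(operator.mul, M, V)
print(m_plus_s)   # ((6, 7), (8, 9))
print(m_plus_v)   # ((11, 12), (23, 24))
print(m_times_v)  # ((10, 20), (60, 80))
```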

What's 
$\begin{bmatrix} 1 & 3 \end{bmatrix}
\odot
\begin{bmatrix} 2 & 4 \\
                6 & 8 \end{bmatrix}$?

In [None]:
answer = ((_, _),
          (_, _))
if base64.b64encode(str(answer).encode()) in (b'KDIwLCAyOCk=', b'KCgyMCwpLCAoMjgsKSk='):
    print(base64.b64decode(b'V3Jvbmcga2luZCBvZiBtYXRyaXggbXVsdGlwbGljYXRpb24u').decode())
base64.b64encode(str(answer).encode()) == b'KCgyLCAxMiksICg2LCAyNCkp'