<a href="https://colab.research.google.com/github/MJMortensonWarwick/AI-DL/blob/main/0_4-fundamental_maths_for_deep_learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Fundamental Mathematics for Deep Learning 

In a change of direction from our previous Notebooks in this session (on Bayesian methods), here we will cover (in as gentle a fashion as possible) some of the mathematical concepts underpinning deep learning. (We'll also sneak in a bit of an intro to some concepts in TensorFlow, which will be the solution we use in the module.) Now I'm sure that has whetted your appetite enough, so let's begin!

## Scalars, Vectors, Matrices and Tensors
Although we've actually seen some of these concepts already in a programming sense, it's worth going over some of them from a more mathematical perspective (and also some of the operations we can apply to them). Let's start with the simplest of these, a scalar:

In [1]:
# import the packages we need in this tutorial
import tensorflow as tf
import numpy as np
import sympy as sym

my_scalar = tf.constant(13)
my_scalar

<tf.Tensor: shape=(), dtype=int32, numpy=13>

Really simple: a scalar is just a single numerical value we may want to use in our calculations or reporting. It can be an integer, a float or any other numerical type.

A matrix (plural matrices), is a little more nuanced (but not a lot):

In [2]:
basic_matrix = tf.constant([[1, 2, 3, 4], [5, 6, 7, 8]])
basic_matrix

<tf.Tensor: shape=(2, 4), dtype=int32, numpy=
array([[1, 2, 3, 4],
       [5, 6, 7, 8]], dtype=int32)>

With our programmer's hat on we may say we have built a list of 2x lists (with the square brackets). However, because each list is the same length, we have effectively built a two-dimensional table (as we would build a DataFrame), which in this case has two rows and four columns. Often you would hear this described as an $M \times N$ matrix (where $M$ is the number of rows and $N$ the number of columns). Confusingly, you'll sometimes see it described as an $N \times M$ matrix where $N$ is rows and $M$ columns ... TL;DR it's always rows by columns.

Also sometimes confusing: this TensorFlow (hereafter TF) object displays its values as a _numpy_ array, although it was declared as a _constant_ (which just means this is a value we expect to remain the same). An array is a more flexible object of which a matrix is a special case ... as is a vector:

In [3]:
eg_vector = tf.constant([1, 2, 3, 4])
eg_vector

<tf.Tensor: shape=(4,), dtype=int32, numpy=array([1, 2, 3, 4], dtype=int32)>

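Effectively a vector is an $M \times 1$ matrix (a single column). Again, this is slightly confusing when programmed, as we write it horizontally in code although conceptually we would consider it as a vertical slice. (TF itself records the shape as _(4,)_, a single dimension of length four, rather than _(4, 1)_.)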

Our final type for this Notebook is the tensor ... which has been popularised (in ML) by deep learning and tools like TensorFlow. Actually everything we have produced so far is a tensor, as we can see in the outputs:

In [4]:
print(my_scalar)
print(basic_matrix)
print(eg_vector)

tf.Tensor(13, shape=(), dtype=int32)
tf.Tensor(
[[1 2 3 4]
 [5 6 7 8]], shape=(2, 4), dtype=int32)
tf.Tensor([1 2 3 4], shape=(4,), dtype=int32)


As we can see, a tensor acts as basically a container for our other numeric types. We can see "my_scalar" returns a tensor with an empty shape (basically our single value - 13); "basic_matrix" contains our matrix which is _shape=(2, 4)_ (2x rows, 4x columns); and "eg_vector" contains the vector of _shape=(4,)_ (a single dimension holding four values, which we think of as 4x rows, 1x column).

We often refer to these different shapes as rank-$n$ tensors, such that:
* A rank-0 tensor stores a scalar (e.g. "my_scalar")
* A rank-1 tensor stores a vector
* A rank-2 tensor stores a 2D matrix (e.g, a standard DataFrame)
* A rank-3 tensor has three dimensions (e.g. a digital image stored in RGB format - rows, columns and a colour dimension)
* A rank-4 tensor adds a fourth dimension - e.g. a batch of rank-3 images.
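
As a rough illustration of what "rank" means, the sketch below walks a nested Python list and reports its shape; the rank is then just the number of entries in that shape. The helper name `shape_of` is made up for this example, and it assumes the nesting is regular (every sub-list at a given depth is the same length and none are empty).

```python
def shape_of(nested):
    """Return the shape of a regularly nested list as a tuple (hypothetical helper)."""
    shape = []
    current = nested
    # keep descending into the first element while we still have a list
    while isinstance(current, list):
        shape.append(len(current))
        current = current[0]
    return tuple(shape)

scalar = 13
vector = [1, 2, 3, 4]
matrix = [[1, 2, 3, 4], [5, 6, 7, 8]]
# a tiny 2x2 "image" with three colour values per pixel
image = [[[255, 0, 0], [0, 255, 0]],
         [[0, 0, 255], [255, 255, 255]]]

for name, obj in [("scalar", scalar), ("vector", vector),
                  ("matrix", matrix), ("image", image)]:
    shape = shape_of(obj)
    print(name, "shape:", shape, "rank:", len(shape))
```

The scalar comes back with an empty shape (rank 0), which mirrors the _shape=()_ we saw TF report for "my_scalar".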

So what do we gain from putting our objects in tensors? We could go into a long discussion into what a tensor really is, from a mathematical and/or physics sense, but in practice we just care about two things:
1. Tensors are slightly more efficient, and when we work at scale (and deep learning loves big datasets) small efficiencies can make a big difference. In particular, compared to something like _numpy_, tensors can be used more easily with GPUs;
2. Tensors can be connected into a larger system and can change their values when other values in the system change. In deep learning, this means keeping track of gradients and computational graphs (which we'll discuss in the module).

## Matrix Algebra
Matrix algebra is a big topic, and we don't need to go too far down the rabbit hole. However, there are some key topics that underpin a lot of deep learning (and ML for that matter). These are more backend operations than frontend ones (i.e. you don't typically need to do the calculations yourself), but knowing them helps us understand how these algorithms work.

Our first topic will be multiplying matrices. There are two main ways we can do this - _element\-wise_ and _matrix_ multiplication.<br><br>


### Element-wise Multiplication
Element-wise is probably the more obvious. It depends on both inputs being the same size. Let's look at some examples (using _tf.multiply_):

In [5]:
matrix_one = tf.constant([1, 2, 3, 4])
matrix_two = tf.constant([5, 6, 7, 8])
ew_matrix = tf.multiply(matrix_one, matrix_two)
ew_matrix

<tf.Tensor: shape=(4,), dtype=int32, numpy=array([ 5, 12, 21, 32], dtype=int32)>

In [6]:
matrix_three = tf.constant([[1, 2, 3, 4], [5, 6, 7, 8]])
matrix_four = tf.constant([[8, 7, 6, 5], [4, 3, 2, 1]])
ew_matrix_two = tf.multiply(matrix_three, matrix_four)
ew_matrix_two

<tf.Tensor: shape=(2, 4), dtype=int32, numpy=
array([[ 8, 14, 18, 20],
       [20, 18, 14,  8]], dtype=int32)>

As we can see, we effectively loop through each list and multiply each item by the item in the same position in the other list. So the 1st item of the 1st list (1) is multiplied with the 1st item of the 2nd list (5) and this produces the first item of the output (5). In the two-row version we multiply the top-left item of the first matrix with the top-left item of the second ($1 \times 8 = 8$), and so on.
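
To make the "loop and multiply" idea concrete, here is a minimal plain-Python sketch of the same calculation on lists of lists. The function name `elementwise_multiply` is invented for this example; it is not how _tf.multiply_ is implemented, just the same arithmetic written out.

```python
def elementwise_multiply(a, b):
    """Multiply two equally sized lists-of-lists item by item (illustrative helper)."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("element-wise multiplication needs matching shapes")
    # pair up the rows, then pair up the items inside each pair of rows
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

matrix_three = [[1, 2, 3, 4], [5, 6, 7, 8]]
matrix_four = [[8, 7, 6, 5], [4, 3, 2, 1]]
print(elementwise_multiply(matrix_three, matrix_four))
# [[8, 14, 18, 20], [20, 18, 14, 8]]
```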

### Matrix by Vector Multiplication
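Although the two inputs are different sizes, multiplying a matrix by a vector with _tf.multiply_ is still element-wise: the vector is applied to each row of the matrix in turn (so the vector needs as many items as the matrix has columns). The size of the output is the size of the matrix: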

In [7]:
matrix_one = tf.constant([1, 2, 3, 4])
matrix_four = tf.constant([[8, 7, 6, 5], [4, 3, 2, 1]])
ew_matrix_three = tf.multiply(matrix_one, matrix_four)
ew_matrix_three

<tf.Tensor: shape=(2, 4), dtype=int32, numpy=
array([[ 8, 14, 18, 20],
       [ 4,  6,  6,  4]], dtype=int32)>
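
What happens here is usually called broadcasting: the vector is reused for every row of the matrix. A minimal plain-Python sketch of that behaviour follows, assuming the vector has one item per matrix column; the helper name `multiply_rows_by_vector` is made up for illustration.

```python
def multiply_rows_by_vector(matrix, vector):
    """Multiply every row of `matrix` element-wise by `vector` (illustrative helper)."""
    if any(len(row) != len(vector) for row in matrix):
        raise ValueError("vector length must match the number of columns")
    # the same vector is paired with each row in turn
    return [[m * v for m, v in zip(row, vector)] for row in matrix]

print(multiply_rows_by_vector([[8, 7, 6, 5], [4, 3, 2, 1]], [1, 2, 3, 4]))
# [[8, 14, 18, 20], [4, 6, 6, 4]]
```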

### Matrix by Scalar Multiplication
Similarly, multiplying by a scalar is element-wise:

In [8]:
another_scalar = 10
matrix_four = tf.constant([[8, 7, 6, 5], [4, 3, 2, 1]])
ew_matrix_four = tf.multiply(another_scalar, matrix_four)
ew_matrix_four

<tf.Tensor: shape=(2, 4), dtype=int32, numpy=
array([[80, 70, 60, 50],
       [40, 30, 20, 10]], dtype=int32)>

### Matrix Multiplication
Matrix multiplication is more flexible than element-wise in that it doesn't require the matrices to be the same size; it only requires the number of columns in the first matrix to equal the number of rows in the second. However, it is slightly less obvious how it works. Again, we'll look at a couple of examples (using _tf.matmul_ ... as in __mat__rix __mul__tiplication):

In [9]:
matrix_five = tf.constant([[1, 2], [3, 4], [5, 6]])
matrix_six = tf.constant([[100], [200]])
matmul_matrix = tf.matmul(matrix_five, matrix_six)
matmul_matrix

<tf.Tensor: shape=(3, 1), dtype=int32, numpy=
array([[ 500],
       [1100],
       [1700]], dtype=int32)>

This may be a little less obvious so let's go through the math. Our output is a vector with three rows, so let's see the maths of each row:
* $1 \times 100 + 2 \times 200 = 100 + 400 = 500$
* $3 \times 100 + 4 \times 200 = 300 + 800 = 1100$
* $5 \times 100 + 6 \times 200 = 500 + 1200 = 1700$

In [10]:
matrix_seven = tf.constant([[1, 2, 3] , [4, 5, 6]])
matrix_eight = tf.constant([[100, 200], [300, 400], [500, 600]])
matmul_matrix_two = tf.matmul(matrix_seven, matrix_eight)
matmul_matrix_two

<tf.Tensor: shape=(2, 2), dtype=int32, numpy=
array([[2200, 2800],
       [4900, 6400]], dtype=int32)>

Here we have a $2\times3$ matrix and a $3\times2$ matrix. When we multiply the two together we take the first row of the first matrix and multiply it by the first column of the second (item by item, summing the results) to form the first value in our output; then the first row of the first matrix by the second column of the second to form the second output, and so on. Again, reading across our $2\times2$ output row by row, the maths is:<br><br>
$1\times100 + 2\times300 + 3\times500 = 100 + 600 + 1500 = 2200$
<br>
$1\times200 + 2\times400 + 3\times600 = 200 + 800 + 1800 = 2800$
<br>
$4\times100 + 5\times300 + 6\times500 = 400 + 1500 + 3000 = 4900$
<br>
$4\times200 + 5\times400 + 6\times600 = 800 + 2000 + 3600 = 6400$
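
The row-by-column recipe can be written as a short plain-Python function. This is only a sketch of the arithmetic (TF uses far more optimised routines), and the function name `matmul` here is our own list-based helper rather than _tf.matmul_.

```python
def matmul(a, b):
    """Plain-Python matrix multiplication of an (m x n) by an (n x p) list-of-lists."""
    n = len(b)
    if any(len(row) != n for row in a):
        raise ValueError("columns of the first matrix must equal rows of the second")
    p = len(b[0])
    result = []
    for row in a:                      # one output row per row of a
        out_row = []
        for j in range(p):             # one output value per column of b
            out_row.append(sum(row[k] * b[k][j] for k in range(n)))
        result.append(out_row)
    return result

print(matmul([[1, 2], [3, 4], [5, 6]], [[100], [200]]))
# [[500], [1100], [1700]]
print(matmul([[1, 2, 3], [4, 5, 6]], [[100, 200], [300, 400], [500, 600]]))
# [[2200, 2800], [4900, 6400]]
```

Note the shape rule in action: a $3\times2$ by a $2\times1$ gives a $3\times1$, and a $2\times3$ by a $3\times2$ gives a $2\times2$.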

### Vector Dot Product
The dot product of two vectors is an equivalent calculation to matrix multiplication via _matmul_. However, given that we are working with vectors we end up with a single number (a scalar). For example: 

In [11]:
vector_one = tf.constant([1, 2, 3, 4])
vector_two = tf.constant([8, 7, 6, 5])
# axes=1 to say we calculate each item with its corresponding item
vector_dot_product = tf.linalg.tensordot(vector_one, vector_two, axes=1)
vector_dot_product

<tf.Tensor: shape=(), dtype=int32, numpy=60>

Let's check the math again:<br><br>
$ 1 \times 8 + 2 \times 7 + 3 \times 6 + 4 \times 5 = 8 + 14 + 18 + 20 = 60$
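
For comparison, the same calculation in plain Python is a one-liner: multiply the paired items and add up the results. The helper `dot` below is an illustrative function of our own, not a TF call. The second print shows that a single cell of the earlier matrix multiplication is exactly this kind of dot product (a row against a column).

```python
def dot(u, v):
    """Dot product of two equal-length lists (illustrative helper, not a TF function)."""
    if len(u) != len(v):
        raise ValueError("vectors must be the same length")
    return sum(x * y for x, y in zip(u, v))

vector_one = [1, 2, 3, 4]
vector_two = [8, 7, 6, 5]
print(dot(vector_one, vector_two))  # 60

# first row of matrix_seven against the first column of matrix_eight
print(dot([1, 2, 3], [100, 300, 500]))  # 2200
```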

### Matrix Addition and Reduce-Sum
Matrix addition works as you may expect, but does rely on equal size matrices. Let's see an example again:

In [12]:
matrix_nine = tf.constant([1, 2, 3])
matrix_ten = tf.constant([6, 5, 4])
matrix_addition = tf.add(matrix_nine, matrix_ten)
matrix_addition

<tf.Tensor: shape=(3,), dtype=int32, numpy=array([7, 7, 7], dtype=int32)>

Ultimately this is just an element-wise addition. E.g.
<br><br>
$ 1 + 6 = 7$
<br>
$ 2 + 5 = 7$
<br>
$ 3 + 4 = 7$

Another related concept is _reduce-sum_, which is related to a MapReduce approach. Nothing like an example amiright?

In [13]:
matrix_eleven = tf.Variable([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
matrix_sr = tf.reduce_sum(matrix_eleven)
matrix_sr

<tf.Tensor: shape=(), dtype=int32, numpy=45>

Essentially the function has reduced our matrix to a single value - by simply summing up all of the nine values. 

As an additional note, through to this point we have used _tf.constant_ to store our data, whereas here we use _tf.Variable_. The difference between the two, as the name suggests, is that we expect constants to remain the same and variables to change. Although we create a new object here ("matrix_sr"), it is created as a transformation of our original variable rather than a combination of (constant) values. In machine learning and deep learning, where we are updating parameter values during the training process, _tf.Variable_ is a useful concept.

We can also sum by columns or rows: 

In [14]:
matrix_sr_cols = tf.reduce_sum(matrix_eleven, 0)
matrix_sr_cols

<tf.Tensor: shape=(3,), dtype=int32, numpy=array([12, 15, 18], dtype=int32)>

In [15]:
matrix_sr_rows = tf.reduce_sum(matrix_eleven, 1)
matrix_sr_rows

<tf.Tensor: shape=(3,), dtype=int32, numpy=array([ 6, 15, 24], dtype=int32)>

Now we have an output of _shape=(3,)_ in each case. In the first ("matrix_sr_cols") we sum down each column, so the calculation is:
<br><br>
$1+4+7=12 $
<br>
$2+5+8=15 $
<br>
$3+6+9=18 $
<br><br>

In the case of "matrix_sr_rows", we do each row (each list in the list of lists):
<br><br>
$1+2+3=6 $
<br>
$4+5+6=15 $
<br>
$7+8+9=24 $
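
The three flavours of reduce-sum can be mirrored in plain Python, which may help to see which direction each axis collapses. This is a sketch on ordinary lists rather than TF tensors.

```python
matrix_eleven = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# no axis: add up every value
total = sum(sum(row) for row in matrix_eleven)

# axis 0: collapse the rows, leaving one total per column
# zip(*matrix) regroups the values column by column
column_totals = [sum(column) for column in zip(*matrix_eleven)]

# axis 1: collapse the columns, leaving one total per row
row_totals = [sum(row) for row in matrix_eleven]

print(total)          # 45
print(column_totals)  # [12, 15, 18]
print(row_totals)     # [6, 15, 24]
```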

## Identity Matrices and Diagonal Matrices
An identity matrix is a matrix that, if multiplied by another matrix, will return that matrix unchanged. It's the equivalent of multiplying by 1 if we are dealing with scalars/single values, i.e. $x \times 1 = x$ irrespective of what value $x$ takes. In practice this means a square matrix filled with zeros except for ones on the diagonal. As an example:

In [16]:
identity_matrix = tf.eye(3, dtype=tf.dtypes.int32)
identity_matrix

<tf.Tensor: shape=(3, 3), dtype=int32, numpy=
array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]], dtype=int32)>

Let's confirm this is indeed an identity matrix:

In [17]:
test_matrix = tf.constant([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
output_matrix = tf.matmul(test_matrix, identity_matrix)
output_matrix

<tf.Tensor: shape=(3, 3), dtype=int32, numpy=
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]], dtype=int32)>

(Note, we needed to confirm a datatype in _tf.eye_ of "int32" for this to work. By default _tf.eye_ produces floats and _tf.matmul_ requires both matrices to be the same data type ... we are a bit more particular in TF than we often are in vanilla Python).

An identity matrix is a special case of diagonal matrices, which come up regularly in other settings as well. A diagonal matrix is any matrix that is all zeros except for on its diagonal (in an identity matrix recall the diagonal is filled with ones). As an example:

In [18]:
diagonal = np.array([1, 2, 3, 4])
diagonal_matrix = tf.linalg.diag(diagonal)
diagonal_matrix

<tf.Tensor: shape=(4, 4), dtype=int64, numpy=
array([[1, 0, 0, 0],
       [0, 2, 0, 0],
       [0, 0, 3, 0],
       [0, 0, 0, 4]])>

We can calculate the sum of the diagonal using TF's _trace_ function:

In [19]:
diagonal_matrix_trace = tf.linalg.trace(diagonal_matrix)
diagonal_matrix_trace

<tf.Tensor: shape=(), dtype=int64, numpy=10>

Quick math check:
<br><br>
$1+2+3+4=10$
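
If it helps to see the structure, identity matrices, diagonal matrices and the trace can all be built with a couple of lines of plain Python. The helper names below (`identity`, `diagonal_matrix`, `trace`) are our own and simply mimic what _tf.eye_, _tf.linalg.diag_ and _tf.linalg.trace_ returned above.

```python
def diagonal_matrix(values):
    """Square matrix with `values` on the diagonal and zeros elsewhere."""
    n = len(values)
    return [[values[i] if i == j else 0 for j in range(n)] for i in range(n)]

def identity(n):
    """An identity matrix is just a diagonal matrix of ones."""
    return diagonal_matrix([1] * n)

def trace(matrix):
    """Sum of the values on the diagonal of a square matrix."""
    return sum(matrix[i][i] for i in range(len(matrix)))

print(identity(3))                        # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(trace(diagonal_matrix([1, 2, 3, 4])))  # 10
```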

## Inverse Matrices
A matrix $b$ is the inverse of a matrix $a$ when multiplying $a$ by $b$ (in either order) results in an identity matrix. Let's see this in action:

In [20]:
a_matrix = tf.constant([[1, 2, 1], [4, 4, 5], [6, 7, 7]])
print("A matrix")
print(a_matrix.numpy())
print("\n")
b_matrix = tf.constant([[-7, -7, 6], [2, 1, -1], [4, 5, -4]])
print("B matrix")
print(b_matrix.numpy())
print("\n")
inverse_it = tf.matmul(a_matrix, b_matrix)
print("Inverse a -> b")
print(inverse_it.numpy())
print("\n")
inverse_it_again = tf.matmul(b_matrix, a_matrix)
print("Inverse b -> a")
print(inverse_it_again.numpy())
print("\n")

A matrix
[[1 2 1]
 [4 4 5]
 [6 7 7]]


B matrix
[[-7 -7  6]
 [ 2  1 -1]
 [ 4  5 -4]]


Inverse a -> b
[[1 0 0]
 [0 1 0]
 [0 0 1]]


Inverse b -> a
[[1 0 0]
 [0 1 0]
 [0 0 1]]
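
The check we just did by eye can be wrapped into a small test. The sketch below uses plain Python lists and helper names of our own (`matmul`, `is_identity`, `is_inverse_pair`); it only verifies a candidate pair and makes no attempt to compute an inverse.

```python
def matmul(a, b):
    """Row-by-column multiplication of two lists-of-lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def is_identity(m):
    """True if m is square with ones on the diagonal and zeros elsewhere."""
    n = len(m)
    return all(len(row) == n for row in m) and all(
        m[i][j] == (1 if i == j else 0) for i in range(n) for j in range(n)
    )

def is_inverse_pair(a, b):
    """True if a x b and b x a both give an identity matrix."""
    return is_identity(matmul(a, b)) and is_identity(matmul(b, a))

a_matrix = [[1, 2, 1], [4, 4, 5], [6, 7, 7]]
b_matrix = [[-7, -7, 6], [2, 1, -1], [4, 5, -4]]

print(is_inverse_pair(a_matrix, b_matrix))  # True
print(is_inverse_pair(a_matrix, a_matrix))  # False
```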




## Transpose, Symmetric and Orthogonal Matrices
We are familiar with the idea of transposing from our work with _pandas_ DataFrames. Basically a transpose flips a matrix so the columns become rows and vice versa. Let's visualise this:

In [21]:
pre_transpose_matrix = tf.Variable([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print("Original")
# uses .numpy() to print just the data
print(pre_transpose_matrix.numpy())
transpose_matrix = tf.transpose(pre_transpose_matrix)
print("\n")
print("Transposed!")
print(transpose_matrix.numpy())

Original
[[1 2 3]
 [4 5 6]
 [7 8 9]]


Transposed!
[[1 4 7]
 [2 5 8]
 [3 6 9]]


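A matrix which remains the same when transposed is called a _symmetric_ matrix. (An _orthogonal_ matrix is a different, stricter idea: its transpose is also its inverse, so multiplying the matrix by its transpose returns an identity matrix.) As an example of a symmetric matrix: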

In [22]:
pre_transpose_matrix_two = tf.Variable([[1, 2, 3], [2, 0, 2], [3, 2, 1]])
print("Original")
# uses .numpy() to print just the data
print(pre_transpose_matrix_two.numpy())
transpose_matrix_two = tf.transpose(pre_transpose_matrix_two)
print("\n")
print("Transposed!")
print(transpose_matrix_two.numpy())

Original
[[1 2 3]
 [2 0 2]
 [3 2 1]]


Transposed!
[[1 2 3]
 [2 0 2]
 [3 2 1]]
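
A matrix that equals its own transpose is usually called symmetric, while "orthogonal" is usually reserved for a matrix whose transpose is its inverse. The plain-Python sketch below checks both properties; the helper names are our own, and the small quarter-turn rotation matrix is an extra example that does not appear elsewhere in this Notebook.

```python
def transpose(m):
    """Flip rows and columns of a list-of-lists."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Row-by-column multiplication of two lists-of-lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def is_identity(m):
    n = len(m)
    return all(m[i][j] == (1 if i == j else 0) for i in range(n) for j in range(n))

def is_symmetric(m):
    """Symmetric: the matrix is unchanged by transposing."""
    return transpose(m) == m

def is_orthogonal(m):
    """Orthogonal: the transpose times the matrix gives an identity matrix."""
    return is_identity(matmul(transpose(m), m))

symmetric_example = [[1, 2, 3], [2, 0, 2], [3, 2, 1]]
rotation_example = [[0, 1], [-1, 0]]  # assumed extra example: a quarter-turn rotation

print(is_symmetric(symmetric_example), is_orthogonal(symmetric_example))  # True False
print(is_symmetric(rotation_example), is_orthogonal(rotation_example))    # False True
```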


## Argmax Operations
Argmax means to find the input (the _argument_) that produces the maximum value from a set of potential values, rather than the maximum value itself. This could be applied to the posterior distribution of a Bayesian function or the output of a function(s) applied to a specific dataset. E.g. we may have a function $y = 10x^2 - 2x^3$ (where $x$ is a positive integer). The value of $y$ increases while $x$ is three or less ($y = 8, 24, 36$ for $x = 1, 2, 3$), but decreases from four onwards ($y = 32$ at $x = 4$; check it out in Excel!). Therefore, argmax will tell us that we achieve the maximum value of $y$ when $x=3$.

In terms of vectors and matrices, we can use argmax to find the location of the maximum value:

In [23]:
print("Argmax of a vector")
another_vector = tf.Variable([4, 12, 42, 5])
vector_arg_max = tf.argmax(another_vector)
print(vector_arg_max)
print("\n")

print("Argmax of a matrix")
another_matrix = tf.Variable([[1, 12, 5], [6, 5, 42]])
matrix_arg_max = tf.argmax(another_matrix, axis=0)
print(matrix_arg_max)
print("\n")

Argmax of a vector
tf.Tensor(2, shape=(), dtype=int64)


Argmax of a matrix
tf.Tensor([1 0 1], shape=(3,), dtype=int64)




In the first case ("vector_arg_max") our algorithm finds the maximum value as 42 and returns the index of this item (remember we count from 0 ... so therefore it is 2).

In the second case we use _axis=0_ to compare the two rows column by column, and return the row index of whichever is highest in each column (0 if the top row and 1 if the bottom). In other words, our output is ["bottom row", "top row", "bottom row"].
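
Both uses of argmax can be sketched in plain Python. The first part evaluates $y = 10x^2 - 2x^3$ over the positive integers 1 to 10 (the range is an assumption made for this example) and reports which $x$ gives the largest $y$; the second part reproduces the vector and column-wise matrix results with a small helper of our own called `argmax`.

```python
def y(x):
    return 10 * x**2 - 2 * x**3

# argmax over a function: which input gives the biggest output?
candidates = range(1, 11)
best_x = max(candidates, key=y)
print(best_x, y(best_x))  # 3 36

def argmax(values):
    """Index of the largest item in a list (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])

# argmax over a vector: the position of the largest value
print(argmax([4, 12, 42, 5]))  # 2

# argmax down each column of a matrix (the axis=0 case)
another_matrix = [[1, 12, 5], [6, 5, 42]]
print([argmax(list(column)) for column in zip(*another_matrix)])  # [1, 0, 1]
```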

## Derivatives
The essence of a derivative (a key concept of differential calculus) is to evaluate a function at some given point, and calculate the current rate of change. In a purely linear function the rate of change will be the same at any point ... i.e. in the function $y = \beta x$ the rate of change associated with $x$ is $\beta$ at any point. However, in a non-linear function we need to work a bit harder to get this rate of change.

More formally we want to know the change in $y$ (written as $\Delta y$) as a ratio to the change in $x$ (again ... $\Delta x$) as that change in $x$ becomes very small. Fortunately there are lots of rules (it is mathematics after all) to calculate this.


### Power Rule
One key concept you have likely seen in a calculus class somewhere is the _power rule_ for calculating the derivative of a function applied to a single variable (e.g. $x$). The power rule states:<br><br>
$f(x) = x^n \rightarrow f'(x) = nx^{n-1}$
<br>(i.e. we calculate the derivative ($f'$) of the function ($x^n$) by calculating $ nx^{n-1}$).

Let's see an example using the Python package _sympy_ for pretty outputs (inspired by Dario Radečić's [Medium post](https://towardsdatascience.com/taking-derivatives-in-python-d6229ba72c64)):

In [24]:
x = sym.Symbol('x')

# differentiate a function that is x^n and n=4 ... i.e. the function is x^4
sym.diff(x**4)

4*x**3

Python tells us the answer is $4x^3$ but let's be sure by doing the math:
* $ f(x) = x^4 $
* $ f'(x) = 4x^{4-1} = 4x^{3}$

Well done Python. Sorry I doubted you.
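
Another way to sanity-check the power rule is numerically: pick a point, nudge $x$ a tiny amount either side, and measure the slope. The sketch below does this with a central difference; the helper name `numerical_derivative` and the step size `h` are choices made for this example, and the result is an approximation rather than an exact derivative.

```python
import math

def numerical_derivative(f, x, h=1e-5):
    """Central-difference estimate of f'(x): the slope between two nearby points."""
    return (f(x + h) - f(x - h)) / (2 * h)

def f(x):
    return x ** 4

def f_prime(x):
    # the power rule result: 4x^3
    return 4 * x ** 3

for x in [0.5, 1.0, 2.0, 3.0]:
    estimate = numerical_derivative(f, x)
    # the estimate should agree with the power rule to many decimal places
    print(x, f_prime(x), math.isclose(estimate, f_prime(x), rel_tol=1e-6))
```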

### The Product Rule
The product rule applies when we want to calculate the derivative of the product (multiplication) of two functions. For example we may have the following function:<br><br>
$ F(x) = f(x) \times g(x) $
<br><br>
In such cases we can use the product rule defined as:<br><br>
$ F'(x) = f(x) \times g'(x) + f'(x) \times g(x) $<br><br>
I.e. we multiply each function with the derivative of the other and add these together. 

We can see it in action. Given:<br><br>
$ F(x) = f(x) \times g(x) $<br>
$ f(x) = x^3 $<br>
$ g(x) = x^5 $
<br><br>
We can calculate the derivative of each as above:<br><br>
$ f'(x) = 3x^{3-1} = 3x^{2}$ <br>
$ g'(x) = 5x^{5-1} = 5x^{4} $
<br><br>
We then need to multiply each function by the derivative of the other and add them:<br><br>
$ F'(x) = f(x) \times g'(x) + f'(x) \times g(x) $<br>
$ F'(x) = x^3 \times 5x^{4} + 3x^2 \times x^5  = 5x^7 + 3x^7 = 8x^7$
<br><br>
Let's verify this in sympy:



In [25]:
sym.diff(x**3 * x**5)

8*x**7
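
We can also check the product rule with plain integer arithmetic, using $f(x) = x^3$ and $g(x) = x^5$ with their power-rule derivatives $3x^2$ and $5x^4$. The function names below are our own; the loop simply confirms that the two sides agree at a handful of whole-number points, which is a spot check rather than a proof.

```python
def f(x):
    return x ** 3

def g(x):
    return x ** 5

def f_prime(x):
    return 3 * x ** 2   # power rule applied to x^3

def g_prime(x):
    return 5 * x ** 4   # power rule applied to x^5

def product_rule(x):
    # each function times the derivative of the other, added together
    return f(x) * g_prime(x) + f_prime(x) * g(x)

def simplified(x):
    return 8 * x ** 7

for x in range(-3, 4):
    assert product_rule(x) == simplified(x)

print(product_rule(2), simplified(2))  # 1024 1024
```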

### The Chain Rule
So far we've seen single functions (via the power rule) and multiplicative functions (via the product rule) ... now we will look at functions inside functions (i.e. nested functions) via the _chain rule_. The chain rule gets super-relevant to deep learning as ultimately the very nature of having multiple hidden layers means we have functions inside functions. 

Consider the following function:<br><br>
$ F(x) = (x^3 - 2x + 4)^3 $
<br><br>
We have $(x^3 - 2x + 4)$ as an inner function and an outer function that raises the inner function to the power 3. The chain rule says:<br><br>
$ F(x) = f(g(x)) \rightarrow F'(x) = f'(g(x)) \times g'(x)$ 
<br><br>
In other words we reach the overall derivative by taking the derivative of the outer function, evaluated at the inner function (i.e. we leave the inner function untouched inside it), and multiplying by the derivative of the inner function. It is almost certainly clearer if we look at the math on our earlier example function.<br><br>
$ F(x) = (x^3 - 2x + 4)^3 $<br><br>
$ F'(x) = 3(x^3 - 2x + 4)^{3-1} \times (3x^2 - 2x^{1-1})$<br>
_A note on the calculation of the inner function here. We can consider $2x$ as effectively $2x^1$ in terms of doing our power rule calculations. This means we end at the power $1-1 = 0$, and since $x^0 = 1$ the $x$ drops out, leaving just $2$. Constants_ ($+4$) _have a derivative of zero, so they disappear. Note over, let's do some simplifications_ <br><br>
$F'(x) = 3(x^3 - 2x + 4)^{2} \times (3x^2 - 2)$
<br><br>
$F'(x) = (9x^2 -6) \times (x^3 - 2x + 4)^{2} $
<br><br>

In [26]:
sym.diff((x**3 - 2 * x + 4)**3)

(9*x**2 - 6)*(x**3 - 2*x + 4)**2
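
To see the chain rule as "outer derivative at the inner function, times inner derivative", the sketch below splits $F(x) = (x^3 - 2x + 4)^3$ into its two pieces and recombines them. All function names are our own, and the final line compares the result against a central-difference estimate (an approximation, with a step size chosen for this example).

```python
import math

def inner(x):
    return x ** 3 - 2 * x + 4

def inner_prime(x):
    return 3 * x ** 2 - 2      # the constant +4 contributes nothing

def outer(u):
    return u ** 3

def outer_prime(u):
    return 3 * u ** 2

def F(x):
    return outer(inner(x))

def F_prime(x):
    # chain rule: outer derivative evaluated at inner(x), times the inner derivative
    return outer_prime(inner(x)) * inner_prime(x)

def numerical_derivative(func, x, h=1e-6):
    return (func(x + h) - func(x - h)) / (2 * h)

print(F_prime(2))  # 1920
# matches the simplified form (9x^2 - 6)(x^3 - 2x + 4)^2
print(F_prime(2) == (9 * 2 ** 2 - 6) * (2 ** 3 - 2 * 2 + 4) ** 2)  # True
print(math.isclose(F_prime(2), numerical_derivative(F, 2.0), rel_tol=1e-6))
```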

### Partial Derivatives
The chain rule gets us a big chunk of the way towards what we need in terms of using derivatives to understand parameter optimisation in deep learning (which will be explained in the module don't worry), except for one thing. Everything we've looked at so far has looked at changes in $y$ with respect to a single $x$. We theoretically could have just one feature, but in ML/AI practice that is extremely unlikely. If we have multiple features ($x$'s), which we will, we need _partial derivatives_.

Partial derivatives deal with multi-variable functions by applying the usual rules we've just seen to a single variable in the function and effectively freezing the others (keeping them constant). We can follow this process for each of the variables in the function.

Again let's make up a function to work with:<br><br>
$ f(x_{1}, x_{2}, x_{3}) = {x_{1}}^2 \times x_{2} \times {x_{3}}^4 $
<br><br>
As above, the derivatives we seek will be partial ... i.e. we will find the derivative with respect to $x_1$ independently of the other $x$'s. In our earlier discussion we were finding the ratio between $\Delta y$ and $\Delta x$. Given this is a subset of the overall problem we use a different symbol to show the partial. The standard notation is $\frac{\partial f}{\partial x_{1}}$ (with the "curly d", $\partial$), but in this Notebook we will use the shorthand $\delta_{x_{1}}$ (double subscripts are a bit yucky in LaTeX ... sorry). Let's look at the partial derivative with respect to $x_{3}$:
<br><br>
$ \delta_{x_{3}} = 4 \times ({x_{1}}^2 \times x_{2} \times {x_{3}}^3)$
<br><br>
Essentially we are doing normal power rule stuff here. ${x_{3}}^4$ becomes $ 4 \times {x_{3}}^3 $. The only difference is we keep the rest of the formula in, as-is. For completeness, we can write out all three partial derivatives (but without discussion of the calculations - it's the same as we've already seen):<br><br>
$ f(x_{1}, x_{2}, x_{3}) = {x_{1}}^2 \times x_{2} \times {x_{3}}^4 $
<br>
$ \delta_{x_{1}} = 2 \times x_{1} \times x_{2} \times {x_{3}}^4 $
<br>
$ \delta_{x_{2}} = {x_{1}}^2 \times {x_{3}}^4$
<br>
$ \delta_{x_{3}} = 4 \times ({x_{1}}^2 \times x_{2} \times {x_{3}}^3)$
<br><br>
Let's verify in sympy, and to make things easier we'll also write out the function rather than typing it each time:


In [27]:
x1, x2, x3 = sym.symbols('x1 x2 x3')
f = x1**2 * x2 * x3**4

print("Delta for x1")
print(sym.diff(f, x1))
print("\n")

print("Delta for x2")
print(sym.diff(f, x2))
print("\n")

print("Delta for x3")
print(sym.diff(f, x3))

Delta for x1
2*x1*x2*x3**4


Delta for x2
x1**2*x3**4


Delta for x3
4*x1**2*x2*x3**3
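
The "freeze the others" idea is easy to see numerically: nudge one input while holding the rest fixed and measure the slope. The sketch below does this for our function at the point $(2, 3, 1)$, which is an arbitrary point chosen for this example; `numerical_partial` is a helper of our own and gives an approximation, not an exact derivative.

```python
import math

def f(x1, x2, x3):
    return x1 ** 2 * x2 * x3 ** 4

def numerical_partial(func, point, index, h=1e-6):
    """Central-difference slope with respect to one input; the others stay frozen."""
    up = list(point)
    down = list(point)
    up[index] += h
    down[index] -= h
    return (func(*up) - func(*down)) / (2 * h)

x1, x2, x3 = 2.0, 3.0, 1.0
point = (x1, x2, x3)

# the three partial derivatives worked out by hand above
analytic = [
    2 * x1 * x2 * x3 ** 4,       # with respect to x1
    x1 ** 2 * x3 ** 4,           # with respect to x2
    4 * x1 ** 2 * x2 * x3 ** 3,  # with respect to x3
]
print(analytic)  # [12.0, 4.0, 48.0]

for index, expected in enumerate(analytic):
    estimate = numerical_partial(f, point, index)
    print(index, math.isclose(estimate, expected, rel_tol=1e-6))
```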


Everything checks out! And the good news is now the math is over - we will touch on how these concepts fit into the deep learning process in class ... but these aren't calculations we'll have to make ourselves. However, it is useful to get some understanding of what's happening under the hood of tools like TF and Keras. 

See you on the module!