# Partial Derivatives

You should be familiar with differentiating a function of a single variable. Partial differentiation is concerned with functions of more than one variable, or of variables that are vectors or matrices. The basic rules carry over to partial differentiation, including the important and useful product rule.

## 1st Order Derivatives of functions of more than one variable

A function such as $f(x) = x^2 + 5x$ has a single independent variable, $x$. If we differentiate with respect to $x$, i.e. we take $\frac{df}{dx}$, we get $\frac{df}{dx} = 2x + 5$.

A function such as $f(x, y) = x^{3}y^{2} + x^{2} + y^{2}$ has two independent variables, $x$ and $y$. Since we have two independent variables, we can differentiate with respect to either one. When we differentiate with respect to one variable, we treat the other variables as though they are constants. For example:

$\frac{\partial f}{\partial x} = 3x^{2}y^{2} + 2x$ and $\frac{\partial f}{\partial y} = 2x^{3}y + 2y$.

Note that we can also write $\frac{\partial f}{\partial x}$ as $f_{x}(x, y)$ and $\frac{\partial f}{\partial y}$ as $f_{y}(x, y)$.
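One way to sanity-check a partial derivative is to compare it with a finite-difference estimate. The sketch below is illustrative only: the helper `partial`, the step size `h`, and the test point $(2, 3)$ are assumptions made for this example, not part of the course material. It nudges one argument of $f(x, y) = x^{3}y^{2} + x^{2} + y^{2}$ while holding the other fixed, which is exactly the "treat the other variables as constants" idea.

```python
def f(x, y):
    # The example function from the text: f(x, y) = x^3 y^2 + x^2 + y^2
    return x**3 * y**2 + x**2 + y**2


def partial(func, args, i, h=1e-6):
    """Central-difference estimate of the partial derivative of func
    with respect to its i-th argument, holding the others fixed."""
    up = list(args)
    down = list(args)
    up[i] += h
    down[i] -= h
    return (func(*up) - func(*down)) / (2 * h)


x0, y0 = 2.0, 3.0  # arbitrary test point (an assumption for this sketch)

fx_numeric = partial(f, (x0, y0), 0)
fy_numeric = partial(f, (x0, y0), 1)

# Analytic partial derivatives evaluated at the same point
fx_exact = 3 * x0**2 * y0**2 + 2 * x0  # 3*4*9 + 4 = 112
fy_exact = 2 * x0**3 * y0 + 2 * y0     # 2*8*3 + 6 = 54
```

The numeric and exact values should agree to several decimal places; a large gap usually points to an error in the hand-derived formula.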

#### Exercises
Find the following:
1. $\frac{ \partial f}{\partial x}$ and $\frac{ \partial f}{\partial y}$ of $2x + 3y^{2}$
2. $\frac{ \partial f}{\partial x}$ and $\frac{ \partial f}{\partial y}$ of $\operatorname{ln}(x) + 3y^{2}$
3. $\frac{ \partial f}{\partial x}$ and $\frac{ \partial f}{\partial y}$ of $\operatorname{sin}(xy)$
4. $\frac{ \partial f}{\partial x}$ and $\frac{ \partial f}{\partial y}$ of $e^{xy} + 2x + 4$
5. $\frac{ \partial f}{\partial x}$ and $\frac{ \partial f}{\partial y}$ and $\frac{ \partial f}{\partial z}$ of $x + 2y + zy$

### Vector Calculus

Vector calculus is closely tied to partial derivatives. Consider a function that takes a vector $x$ of size $n$ and computes its dot product with a constant vector $b$:
$$f(x) = b^{T}x$$

If we expand this out and label the vector components by their indices, we get
$$f(x) = b_{1}x_{1} + b_{2}x_{2} + \ldots + b_{n}x_{n}$$

Note that for any $x_{i}$, $\frac{\partial f}{\partial x_{i}} = b_{i}$

The derivative of $f$ with respect to the whole vector $x$, as opposed to a single component of $x$, is itself a vector, called the gradient:
$$\nabla_{x}f = \left[\frac{\partial f}{\partial x_{1}},\frac{\partial f}{\partial x_{2}}, \ldots ,\frac{\partial f}{\partial x_{n}} \right]$$
Since we know that $\frac{\partial f}{\partial x_{i}} = b_{i}$, this means that $\nabla_{x}f = \left[b_1, b_2, \ldots, b_n\right]$. Because $\left[b_1, b_2, \ldots, b_n\right] = b$, we conclude that $\nabla_{x}f = b$.
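The result $\nabla_{x}f = b$ can be checked numerically by perturbing each component of $x$ in turn. This is a minimal sketch using plain Python lists; the helper names `dot` and `numerical_gradient` and the particular values of `b` and `x` are made up for illustration.

```python
def dot(b, x):
    # f(x) = b^T x = b_1 x_1 + ... + b_n x_n
    return sum(b_i * x_i for b_i, x_i in zip(b, x))


def numerical_gradient(func, x, h=1e-6):
    """Central-difference estimate of the gradient of a scalar function
    func at the point x (a list of floats)."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += h
        down[i] -= h
        grad.append((func(up) - func(down)) / (2 * h))
    return grad


b = [2.0, -1.0, 0.5]   # constant vector (illustrative values)
x = [1.0, 4.0, -3.0]   # point at which the gradient is evaluated

grad = numerical_gradient(lambda v: dot(b, v), x)

# Each entry of grad should match the corresponding entry of b
max_error = max(abs(g_i - b_i) for g_i, b_i in zip(grad, b))
```

Because $f$ is linear in $x$, the estimate does not depend on the chosen point: changing `x` leaves `grad` equal to `b` up to rounding error.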

### Matrix Calculus

Matrix calculus is very similar to vector calculus: determining the gradient with respect to a matrix works much like determining the gradient with respect to a vector. Note, however, that if the output of the function is a vector, then the gradient with respect to a matrix is a rank 3 tensor. If the output of the function is a scalar, then the gradient is itself a matrix with the same shape as the input matrix. In this course, we will only consider the scalar case (we use matrix gradients with respect to loss functions that have scalar output), so we will look at such an example now.


Suppose $y = \operatorname{sum}(Ax)$ where $A \in \mathbb{R}^{m \times n} \text{ and } x \in \mathbb{R}^{n}$. We can compute $\nabla_{A} y$ by expanding out the expression $\operatorname{sum}(Ax)$:

$$Ax = \begin{bmatrix}
a_{11}x_{1} + a_{12}x_{2} + \ldots + a_{1n}x_{n} \\
a_{21}x_{1} + a_{22}x_{2} + \ldots + a_{2n}x_{n} \\
\vdots \\
a_{m1}x_{1} + a_{m2}x_{2} + \ldots + a_{mn}x_{n}
\end{bmatrix}$$

$$y = \operatorname{sum}(Ax) = a_{11}x_{1} + a_{12}x_{2} + \ldots + a_{1n}x_{n} + \ldots + a_{mn}x_{n}$$

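Each $a_{ij}$ appears in exactly one term of this sum, namely $a_{ij}x_{j}$, so $\frac{\partial y}{\partial a_{ij}} = x_{j}$. Writing this in matrix form, we have that $\frac{\partial y}{\partial A} = \nabla_{A}y = (x \otimes \mathbf{1})^{T}$, where $\mathbf{1} \in \mathbb{R}^{m}$ is the vector of ones and $\otimes$ denotes the outer product. Equivalently, $\nabla_{A}y = \mathbf{1}x^{T}$, an $m \times n$ matrix in which every row is $x^{T}$.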
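The claim that every row of $\nabla_{A}y$ equals $x$ can be verified with the same finite-difference idea, perturbing one entry $a_{ij}$ at a time. The sketch below uses nested Python lists for the matrix; the helper names `sum_Ax` and `gradient_wrt_matrix` and the sample values are assumptions chosen for this example.

```python
def sum_Ax(A, x):
    # y = sum(Ax): add up every entry of the matrix-vector product
    return sum(sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A)


def gradient_wrt_matrix(func, A, h=1e-6):
    """Central-difference estimate of the gradient of a scalar function
    func with respect to each entry of the matrix A (list of rows)."""
    m, n = len(A), len(A[0])
    grad = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            up = [row[:] for row in A]
            down = [row[:] for row in A]
            up[i][j] += h
            down[i][j] -= h
            grad[i][j] = (func(up) - func(down)) / (2 * h)
    return grad


A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]      # m = 2, n = 3 (illustrative values)
x = [0.5, -1.0, 2.0]

grad = gradient_wrt_matrix(lambda M: sum_Ax(M, x), A)

# Expected gradient: an m x n matrix whose rows are all copies of x
expected = [x[:] for _ in range(len(A))]
```

Note that the gradient has the same shape as $A$ and does not depend on the entries of $A$ at all, only on $x$.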

You can use the [Matrix Calculus Calculator](http://www.matrixcalculus.org) to get a better handle on this topic.
