## Matrix calculus in deep learning

Consider the activation of a single computation unit in a neural network. It is typically calculated using the dot product (from linear algebra) of an edge weight vector w with an input vector x, plus a scalar bias (threshold):
![](https://explained.ai/matrix-calculus/images/eqn-EEDCFA4252D0992243A283CE0EB777A6-depth003.31.svg)
The function $z(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b$ is called the unit's affine function. It is followed by a rectified linear unit (ReLU), which clips negative values to zero: $\max(0, z(\mathbf{x}))$. Such a computational unit is sometimes referred to as an “artificial neuron” and looks like this:
<div>
<img src="https://explained.ai/matrix-calculus/images/neuron.png" width="400">
</div>
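Here is a minimal pure-Python sketch of the neuron described above: an affine function followed by ReLU. The function names (`affine`, `relu`, `activation`) and the numeric values are illustrative only. They are not taken from any library.

```python
# A single "artificial neuron": affine function z(x) = w . x + b, followed by ReLU.
# Weights, bias, and inputs below are made-up illustration values.

def affine(w, x, b):
    """Dot product of weight vector w and input vector x, plus scalar bias b."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same length")
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def relu(z):
    """Rectified linear unit: clips negative values to zero."""
    return max(0.0, z)

def activation(w, x, b):
    """The neuron's output: ReLU applied to the affine function."""
    return relu(affine(w, x, b))

w = [0.5, -1.0, 2.0]
b = -0.5
print(activation(w, [1.0, 2.0, 3.0], b))  # z = 0.5 - 2 + 6 - 0.5 = 4.0, prints 4.0
print(activation(w, [1.0, 3.0, 0.0], b))  # z = 0.5 - 3 + 0 - 0.5 = -3.0, prints 0.0
```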

Neural networks consist of many of these units, organized into multiple collections of neurons called layers. The activations of one layer's units become the inputs to the next layer's units. The activation of the unit or units in the final layer is called the network output.
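The layer structure can be sketched in the same way. For simplicity, this assumes that every layer, including the last, uses ReLU. Real networks often use a different activation in the output layer. The layer sizes, weights, and helper names are hypothetical.

```python
# Hypothetical sketch: each layer is a (weights, biases) pair.
# Each row of `weights` is one neuron's weight vector.

def neuron(w, x, b):
    """One unit: ReLU(w . x + b)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(0.0, z)

def layer(weights, biases, x):
    """Apply every neuron in the layer to the same input vector x."""
    return [neuron(w, x, b) for w, b in zip(weights, biases)]

def network(layers, x):
    """Feed x through each layer in turn; the last layer's activations are the output."""
    for weights, biases in layers:
        x = layer(weights, biases, x)  # this layer's activations feed the next layer
    return x

# A 2-input -> 2-unit -> 1-unit network with made-up parameters
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 1.0]),
    ([[2.0, 1.0]], [-1.0]),
]
print(network(layers, [3.0, 1.0]))  # [6.0]
```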

Training this neuron means choosing weights w and bias b so that we get the desired output for all N inputs x. To do that, we minimize a loss function that compares the network's final $\text{activation}(\mathbf{x})$ with $\text{target}(\mathbf{x})$, the desired output for x, across all input vectors x. To minimize the loss, we use some variation on gradient descent, such as plain stochastic gradient descent (SGD), SGD with momentum, or Adam. All of these require the partial derivatives (the gradient) of $\text{activation}(\mathbf{x})$ with respect to the model parameters w and b. Our goal is to gradually tweak w and b so that the overall loss keeps getting smaller across all inputs x.
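To make the training step concrete, this sketch computes the gradient of a single ReLU neuron's activation with respect to w and b. It then uses that gradient in a plain SGD update on one example. The text above does not fix a particular loss. The squared-error loss (activation − target)², the learning rate, and all values here are assumptions chosen for illustration.

```python
# Gradient of a ReLU neuron's activation w.r.t. w and b, used in plain SGD
# on an ASSUMED squared-error loss: loss = (activation - target) ** 2.

def forward(w, b, x):
    """Return the affine value z and the activation max(0, z)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return z, max(0.0, z)

def activation_grad(w, b, x):
    """Return (d activation / d w, d activation / d b).

    For z > 0, ReLU passes z through, so d/dw = x and d/db = 1.
    For z <= 0, the output is clamped to 0, so both derivatives are 0.
    At exactly z == 0 the derivative is undefined; this sketch picks 0.
    """
    z, _ = forward(w, b, x)
    if z > 0:
        return list(x), 1.0
    return [0.0] * len(x), 0.0

def sgd_step(w, b, x, target, lr):
    """One gradient-descent step for a single (x, target) example."""
    _, a = forward(w, b, x)
    dw_act, db_act = activation_grad(w, b, x)
    dloss_da = 2.0 * (a - target)  # chain rule: dL/da, then multiply by da/dw, da/db
    new_w = [wi - lr * dloss_da * gi for wi, gi in zip(w, dw_act)]
    new_b = b - lr * dloss_da * db_act
    return new_w, new_b

w, b = [0.5, -0.25], 0.1
x, target = [1.0, 2.0], 1.0
for step in range(3):
    _, a = forward(w, b, x)
    print(f"step {step}: activation={a:.4f}, loss={(a - target) ** 2:.4f}")
    w, b = sgd_step(w, b, x, target, lr=0.05)
```

When z ≤ 0, the gradient is zero and the update leaves w and b unchanged. This is one reason ReLU units can stop learning if they get stuck in that region.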

But this is just one neuron, and neural networks must train the weights and biases of all neurons in all layers simultaneously. Because there are multiple inputs and (potentially) multiple network outputs, we need general rules for the derivative of a function with respect to a vector. We also need rules for the derivative of a vector-valued function with respect to a vector. This field is known as **matrix calculus**.

## Scalar derivative rules

![](https://miro.medium.com/max/1500/1*ZR50K2cDpl1um4S-aOeWQw.png)

Example:
![](https://explained.ai/matrix-calculus/images/blkeqn-A1EC7F214318E08949CC8BFCED138D94.svg)
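The rules in the table can be spot-checked numerically. This sketch compares a central-difference approximation, (f(x + h) − f(x − h)) / 2h, with the derivative each rule predicts. The test functions and the point x = 1.7 are arbitrary choices.

```python
import math

def derivative(f, x, h=1e-6):
    """Central-difference approximation of df/dx at x."""
    return (f(x + h) - f(x - h)) / (2 * h)

x = 1.7
checks = {
    # rule: (function, derivative predicted by the rule, evaluated at x)
    "power rule: d/dx x^3 = 3x^2":        (lambda t: t ** 3, 3 * x ** 2),
    "constant multiple: d/dx 5x^2 = 10x": (lambda t: 5 * t ** 2, 10 * x),
    "sum rule: d/dx (x + x^2) = 1 + 2x":  (lambda t: t + t ** 2, 1 + 2 * x),
    "product rule: d/dx x sin(x)":        (lambda t: t * math.sin(t),
                                           math.sin(x) + x * math.cos(x)),
    "chain rule: d/dx sin(x^2)":          (lambda t: math.sin(t ** 2),
                                           math.cos(x ** 2) * 2 * x),
}
for name, (f, predicted) in checks.items():
    approx = derivative(f, x)
    print(f"{name:38s} numeric={approx:.6f} rule={predicted:.6f}")
    assert abs(approx - predicted) < 1e-5
```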

## Vector calculus

Neural network layers are not single functions of a single parameter, f(x). So, let’s move on to
functions of multiple parameters such as f(x, y). 

For example, what is the derivative of xy (i.e.,
the multiplication of x and y)? 

We compute derivatives with respect to one variable (parameter) at a time. This gives us two different **partial derivatives** for this two-parameter function, one for x and one for y.

Instead of the operator d/dx, partial derivatives use the operator ∂/∂x.

So, ∂/∂x(xy) and ∂/∂y(xy) are the partial derivatives of xy. Often, these are just called the partials.

The partial derivative with respect to x is just the usual scalar derivative, treating every other variable in the equation as a constant. For xy, holding y constant gives ∂/∂x(xy) = y, and holding x constant gives ∂/∂y(xy) = x.
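The same central-difference idea gives partial derivatives: nudge one argument while holding the others fixed. This hypothetical `partial` helper confirms that ∂(xy)/∂x = y and ∂(xy)/∂y = x at a sample point.

```python
def partial(f, args, i, h=1e-6):
    """Central-difference partial derivative of f w.r.t. its i-th argument,
    holding all other arguments constant."""
    plus, minus = list(args), list(args)
    plus[i] += h
    minus[i] -= h
    return (f(*plus) - f(*minus)) / (2 * h)

def f(x, y):
    return x * y

x, y = 3.0, -2.0
print(partial(f, (x, y), 0))  # close to y = -2.0
print(partial(f, (x, y), 1))  # close to x = 3.0
```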

Consider a function ![](https://explained.ai/matrix-calculus/images/eqn-D6DEAE7E403381C2C425D4B40CCA936E-depth003.25.svg)

The partial derivative with respect to x is ![](https://explained.ai/matrix-calculus/images/eqn-D981BA4BD14AC44C43A4E4E0EC750B4A-depth004.67.svg)

The partial derivative with respect to y treats x like a constant: ![](https://explained.ai/matrix-calculus/images/eqn-55A3A400FAD3326FEF1BB9DDD2658383-depth006.34.svg)

Let's organize these partials into a horizontal vector. We call this vector the gradient of f(x, y) and write it as:

![](https://explained.ai/matrix-calculus/images/blkeqn-0C95BB61B2BFFB0C2A95A9DC5D8AF44E.svg)
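This is a numeric check of the gradient. It assumes that the example function pictured above is f(x, y) = 3x²y, so that ∂f/∂x = 6yx and ∂f/∂y = 3x². The helper names are hypothetical.

```python
def f(x, y):
    # Assumed example function: f(x, y) = 3 x^2 y
    return 3 * x ** 2 * y

def grad_analytic(x, y):
    # Gradient as a horizontal vector [df/dx, df/dy] = [6yx, 3x^2]
    return [6 * y * x, 3 * x ** 2]

def grad_numeric(f, x, y, h=1e-6):
    # Central differences, perturbing one variable at a time
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return [dfdx, dfdy]

x, y = 2.0, -1.5
print(grad_analytic(x, y))    # [-18.0, 12.0]
print(grad_numeric(f, x, y))  # approximately the same
```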

### Derivatives with vectors

#### Vector-by-scalar
The derivative of a vector 
![](https://wikimedia.org/api/rest_v1/media/math/render/svg/598107eef2ea088f6bf06ffced12d581ca330e43)
by a scalar x is written (in numerator layout notation) as
![](https://wikimedia.org/api/rest_v1/media/math/render/svg/94d7a5da91350e1178afb955a08a6317dc8de783)
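In numerator layout, the result has one entry per component of the vector, stacked as a column. This sketch approximates it with central differences, using a hypothetical three-component y(x) = [x², sin x, eˣ]. The result is represented as a nested list so that its column shape is visible.

```python
import math

def d_vector_by_scalar(y, x, h=1e-6):
    """Numerator layout: derivative of vector-valued y(x) w.r.t. scalar x,
    returned as an m x 1 column (one row per component of y)."""
    y_plus, y_minus = y(x + h), y(x - h)
    return [[(yp - ym) / (2 * h)] for yp, ym in zip(y_plus, y_minus)]

def y(x):
    # Hypothetical example with three components: [x^2, sin x, e^x]
    return [x ** 2, math.sin(x), math.exp(x)]

x = 0.5
print(d_vector_by_scalar(y, x))                   # numeric column
print([[2 * x], [math.cos(x)], [math.exp(x)]])   # analytic column for comparison
```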

#### Scalar-by-vector
The derivative of a scalar y by a vector ![](https://wikimedia.org/api/rest_v1/media/math/render/svg/a73d2b3df42b37c7d42eab9dd34070aba9feff1c)
is written (in numerator layout notation) as
![](https://wikimedia.org/api/rest_v1/media/math/render/svg/b02ac6c5072b46943d3406a440659f2b2d1e54cc)
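Here the result is a row with one entry per component of the input vector. This sketch uses a hypothetical f(x) = x₁² + 3x₁x₂ + x₃, whose analytic derivative is [2x₁ + 3x₂, 3x₁, 1]. Note that the code indexes components from 0.

```python
def d_scalar_by_vector(f, x, h=1e-6):
    """Numerator layout: derivative of scalar f(x) w.r.t. vector x,
    returned as a 1 x n row (one column per component of x)."""
    row = []
    for j in range(len(x)):
        plus, minus = list(x), list(x)
        plus[j] += h   # perturb only component j
        minus[j] -= h
        row.append((f(plus) - f(minus)) / (2 * h))
    return [row]

def f(x):
    # Hypothetical example: x1^2 + 3 x1 x2 + x3 (x[0] is x1)
    return x[0] ** 2 + 3 * x[0] * x[1] + x[2]

x = [1.0, 2.0, -4.0]
print(d_scalar_by_vector(f, x))  # approximately [[8.0, 3.0, 1.0]]
```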

#### Vector-by-vector
The derivative of a vector function ![](https://wikimedia.org/api/rest_v1/media/math/render/svg/598107eef2ea088f6bf06ffced12d581ca330e43)
with respect to an input vector,
![](https://wikimedia.org/api/rest_v1/media/math/render/svg/a73d2b3df42b37c7d42eab9dd34070aba9feff1c)
is called the Jacobian matrix and is written (in numerator layout notation) as
![](https://wikimedia.org/api/rest_v1/media/math/render/svg/43b8c31f61f8b2bbe6f0df4d62012d8ba86ba420)
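The Jacobian can be approximated one column at a time: perturb a single input component x_j and record how every output changes. This sketch uses a hypothetical map from two inputs to three outputs, f(x) = [x₁x₂, x₁ + x₂, x₁²].

```python
def jacobian(f, x, h=1e-6):
    """Numerator-layout Jacobian: J[i][j] = d f_i / d x_j.
    Shape is m x n: one row per output component, one column per input component."""
    m = len(f(x))
    J = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        plus, minus = list(x), list(x)
        plus[j] += h
        minus[j] -= h
        f_plus, f_minus = f(plus), f(minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J

def f(x):
    # Hypothetical example: [x1*x2, x1 + x2, x1^2] (x[0] is x1)
    return [x[0] * x[1], x[0] + x[1], x[0] ** 2]

x = [3.0, -1.0]
for row in jacobian(f, x):
    print([round(v, 6) for v in row])
# Analytic Jacobian: [[x2, x1], [1, 1], [2*x1, 0]] = [[-1, 3], [1, 1], [6, 0]]
```

Each row of J is the scalar-by-vector derivative (a row gradient) of one output component. Each column is the vector-by-scalar derivative with respect to one input component. This is why, in numerator layout, m outputs and n inputs give an m × n matrix.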