# Chapter 8: Gradients, Partial Derivatives, and the Chain Rule

### Partial Derivatives
- the derivatives we’ve solved thus far have been cases where there is only one independent variable ($x$)
- our neural network's loss function, however, has many independent variables 
- hence, we will be using **Partial Derivatives**
- the partial derivative measures how much impact a single variable has on a function’s output
- the method for calculating a partial derivative is the same as earlier, except that we treat every other variable as a constant; we simply have to repeat this process for each of the independent variables
---
- here are some examples: 
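- $f(x, y) = 2x + 3y^2$: $\frac{\partial f}{\partial x} = 2$, $\frac{\partial f}{\partial y} = 6y$
- $f(x, y) = 3x^3 - y^2 + 5x + 2$: $\frac{\partial f}{\partial x} = 9x^2 + 5$, $\frac{\partial f}{\partial y} = -2y$
- $f(x, y, z) = 3x^3z - y^2 + 5z + 2yz$: $\frac{\partial f}{\partial x} = 9x^2z$, $\frac{\partial f}{\partial y} = -2y + 2z$, $\frac{\partial f}{\partial z} = 3x^3 + 5 + 2y$
- $f(x, y) = x \cdot y$: $\frac{\partial f}{\partial x} = y$, $\frac{\partial f}{\partial y} = x$ (the input variables are multiplied instead of summed, so each partial derivative equals the other variable)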
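
The partial derivatives of the sum and product examples above can be sanity-checked numerically. The sketch below nudges one variable at a time while holding the others fixed. The helper `partial_derivative`, the function names, the step size, and the evaluation point are illustrative assumptions, not part of the original notes.

```python
def partial_derivative(func, point, index, step=1e-6):
    """Central-difference estimate of d(func)/d(point[index])."""
    forward = list(point)
    backward = list(point)
    forward[index] += step   # nudge only the chosen variable up
    backward[index] -= step  # and down; all other variables stay fixed
    return (func(*forward) - func(*backward)) / (2 * step)


def sum_example(x, y):
    # f(x, y) = 2x + 3y^2
    return 2 * x + 3 * y**2


def product_example(x, y):
    # f(x, y) = x * y
    return x * y


point = (3.0, 4.0)  # arbitrary evaluation point (x = 3, y = 4)
sum_partials = [partial_derivative(sum_example, point, i) for i in range(2)]
product_partials = [partial_derivative(product_example, point, i) for i in range(2)]

print(sum_partials)      # close to [2, 6 * y] = [2, 24]
print(product_partials)  # close to [y, x] = [4, 3]
```

The estimates only approximate the analytical results, so they should be compared with a tolerance rather than for exact equality.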

### Max Function
- the next type of derivative is the derivative of the `max()` function:
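- $f(x, y) = \max(x, y)$, $\frac{\partial}{\partial x} f(x, y) = 1(x > y)$
- the `max()` function returns the larger of its inputs; the notation $1(x > y)$ means 1 when the condition holds and 0 otherwise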
- one special case for the derivative of the `max()` function is when we have only one variable parameter while the other parameter is always 0
---
- this will be useful when we calculate the derivative of the ReLU activation function, which is defined as $f(x) = \max(x, 0)$:
- $\frac{d}{dx} f(x) = \frac{d}{dx} \max(x, 0) = 1(x > 0)$
- in this case, the partial derivative is 1 when $x$ is greater than 0 (otherwise it's 0)
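
As a minimal sketch of the special case described above, the snippet below implements ReLU and its derivative for a single scalar input. The function names `relu` and `relu_derivative` are hypothetical, and returning 0 at exactly $x = 0$ is a convention assumed here, since the slope is not defined at that point.

```python
def relu(x):
    # ReLU is max(x, 0): negative inputs are clipped to 0.
    return max(x, 0.0)


def relu_derivative(x):
    # 1 when x is greater than 0, otherwise 0
    # (assumed convention: the derivative at x == 0 is taken to be 0).
    return 1.0 if x > 0 else 0.0


inputs = [-2.0, 0.0, 3.5]
outputs = [relu(x) for x in inputs]
slopes = [relu_derivative(x) for x in inputs]

print(outputs)  # [0.0, 0.0, 3.5]
print(slopes)   # [0.0, 0.0, 1.0]
```

The derivative depends only on the sign of the input, not on its magnitude.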


### Building the Gradient
- when we solve these partial derivatives, we can build the **gradient**
- the gradient is a vector comprised of the partial derivatives calculated with respect to each input variable
---
- let’s return to the partial derivatives of the sum operation that we calculated earlier:
- $f(x, y, z) = 3x^3z - y^2 + 5z + 2yz$
- we can now denote the gradient as:
- $\nabla f(x, y, z) = \begin{bmatrix} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \\ \frac{\partial f}{\partial z} \end{bmatrix} = \begin{bmatrix} 9x^2z \\ -2y + 2z \\ 3x^3 + 5 + 2y \end{bmatrix}$
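
To make the idea of the gradient as a vector concrete, the sketch below collects the three analytical partial derivatives of $f(x, y, z) = 3x^3z - y^2 + 5z + 2yz$ into one list and evaluates it at a single point. The function name `gradient` and the evaluation point are assumptions chosen for illustration.

```python
def gradient(x, y, z):
    """Return [df/dx, df/dy, df/dz] for f(x, y, z) = 3x^3 z - y^2 + 5z + 2yz."""
    df_dx = 9 * x**2 * z         # only the 3x^3 z term depends on x
    df_dy = -2 * y + 2 * z       # from the -y^2 and 2yz terms
    df_dz = 3 * x**3 + 5 + 2 * y  # from the 3x^3 z, 5z and 2yz terms
    return [df_dx, df_dy, df_dz]


grad = gradient(1.0, 2.0, 3.0)  # arbitrary point (x = 1, y = 2, z = 3)
print(grad)  # [27.0, 2.0, 12.0]
```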
---
- recall that these partial derivatives tell us the impact of each variable on a function
- in our case, this is the impact of weights and biases on the loss function
- **our goal is to decrease the loss**, so we need to raise the weights and biases whose partial derivative is negative (increasing them lowers the loss) and lower the weights and biases whose partial derivative is positive (increasing them raises the loss)
- we now know that we can calculate each weight and bias’ impact on the loss function
---
- **the gradient points in the direction in which the function increases fastest, so the opposite direction is the one in which it decreases fastest**
- we want to adjust the model parameters by small increments at a time because the impact of inputs in these functions will change non-linearly as we change their respective values
---
- the **learning rate** is a value that scales the size of each parameter update, and therefore affects how quickly the parameters of a neural network are adjusted
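
The points above (step against the gradient, in small increments scaled by the learning rate) can be sketched with a toy example. The two-parameter `loss` function, its minimum at $w = 3$, $b = -1$, the starting values, and the learning rate of 0.1 are all hypothetical choices for illustration; a real network's loss has many more parameters.

```python
def loss(w, b):
    # Hypothetical bowl-shaped loss with its minimum at w = 3, b = -1.
    return (w - 3.0) ** 2 + (b + 1.0) ** 2


def loss_gradient(w, b):
    # Partial derivatives of the loss above with respect to w and b.
    return 2.0 * (w - 3.0), 2.0 * (b + 1.0)


learning_rate = 0.1  # assumed value; scales the size of every update
w, b = 0.0, 0.0      # arbitrary starting parameters
history = [loss(w, b)]

for _ in range(100):
    dw, db = loss_gradient(w, b)
    # Move against the gradient: a positive partial derivative lowers the
    # parameter, a negative one raises it.
    w -= learning_rate * dw
    b -= learning_rate * db
    history.append(loss(w, b))

print(w, b)                     # approaches 3 and -1
print(history[0], history[-1])  # the loss shrinks from 10.0 toward 0
```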
---
- we can solve the derivatives of more complex functions using the **chain rule**
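
As a brief illustration of the chain rule, the sketch below differentiates a composed function $h(x) = f(g(x))$ by multiplying the derivative of the outer function by the derivative of the inner one, then compares the result with a numerical estimate. The functions $g(x) = 2x^2$ and $f(u) = 3u^5$ and the helper `numeric_derivative` are examples assumed here, not taken from the notes.

```python
def g(x):
    return 2 * x**2  # inner function, g'(x) = 4x


def f(u):
    return 3 * u**5  # outer function, f'(u) = 15u^4


def h(x):
    return f(g(x))   # composition h(x) = f(g(x))


def h_prime(x):
    # Chain rule: h'(x) = f'(g(x)) * g'(x)
    return 15 * g(x) ** 4 * 4 * x


def numeric_derivative(func, x, step=1e-6):
    # Central-difference estimate used only as a cross-check.
    return (func(x + step) - func(x - step)) / (2 * step)


x = 1.0
analytic = h_prime(x)
numeric = numeric_derivative(h, x)

print(analytic)  # 960.0
print(numeric)   # close to 960
```

Here $h(x)$ simplifies to $96x^{10}$, whose derivative $960x^9$ agrees with the chain-rule result at $x = 1$.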