# Calculus

## Introduction

You need to know some basic calculus in order to understand how functions change over time (derivatives), and to calculate the total amount of a quantity that accumulates over a time period (integrals). The language of calculus will allow you to speak precisely about the properties of functions and better understand their behaviour.

Normally taking a calculus course involves doing lots of tedious calculations by hand, but having the power of computers on your side can make the process much more fun. This section describes the key ideas of calculus which you'll need to know to understand machine learning concepts.


## Derivatives

A derivative can be defined in two ways:

1. Instantaneous rate of change (Physics)
2. Slope of a line at a specific point (Geometry)

Both represent the same principle, but for our purposes it's easier to explain using the geometric definition.



### Geometric definition

In geometry, slope represents the steepness of a line. It answers the question: how much does $y$ or $f(x)$ change given a specific change in $x$?

![slope_formula.png](./assets/slopeFormula.png)

Using this definition we can easily calculate the slope between two points. But what if I asked you, instead of the slope between two points, what is the slope at a single point on the line? In this case there isn't any obvious "rise-over-run" to calculate. Derivatives help us answer this question.

A derivative outputs an expression we can use to calculate the _instantaneous rate of change_, or slope, at a single point on a line. After solving for the derivative you can use it to calculate the slope at every other point on the line.



### Taking the derivative

Consider the graph below, where $f(x) = x^2 + 3$.

![calculus_slope_intro.png](./assets/takingDerivative.png)

The slope between (1,4) and (3,12) would be:

\begin{equation*}
\text{slope} = \frac{y_2 - y_1}{x_2 - x_1} = \frac{12-4}{3-1} = 4
\end{equation*}
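
The same rise-over-run calculation can be sketched in a few lines of Python. The helper name `slope_between` is hypothetical, introduced here only for illustration.

```python
def slope_between(p1, p2):
    """Return the rise-over-run slope of the line through points p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    if x2 == x1:
        raise ValueError("slope is undefined for a vertical line")
    return (y2 - y1) / (x2 - x1)

print(slope_between((1, 4), (3, 12)))  # 4.0
```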


But how do we calculate the slope at the single point (1,4)?

One way would be to find the two nearest points, calculate their slopes relative to $x$ and take the average. But calculus provides an easier, more precise way: compute the derivative. Computing the derivative of a function is essentially the same as our original proposal, but instead of finding the two closest points, we make up an imaginary point an infinitesimally small distance away from $x$ and compute the slope between $x$ and the new point.

In this way, derivatives help us answer the question: how does $f(x)$ change if we make a very, very tiny increase to $x$? In other words, derivatives help _estimate_ the slope between two points that are an infinitesimally small distance away from each other: a very, very small distance, but large enough to calculate the slope.

In math language we represent this infinitesimally small increase using a limit. A limit is defined as the output value a function approaches as the input value approaches another value. In our case the target value is the specific point at which we want to calculate slope.
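
To make the idea of a limit concrete, the sketch below shrinks the step and watches the slope at $x = 1$ settle towards a single value. It assumes the curve is $f(x) = x^2 + 3$, which passes through the points (1,4) and (3,12) used above; the names `secant_slope` and `steps` are illustrative only.

```python
def f(x):
    # Assumed example curve: passes through (1, 4) and (3, 12).
    return x**2 + 3

def secant_slope(func, x, h):
    """Slope of the line through (x, func(x)) and (x + h, func(x + h))."""
    return (func(x + h) - func(x)) / h

steps = [2, 1, 0.1, 0.01, 0.001]
slopes = [secant_slope(f, 1, h) for h in steps]
for h, s in zip(steps, slopes):
    print(f"h = {h:<6} slope = {s:.4f}")
```

The first step, $h = 2$, reproduces the slope of 4 between (1,4) and (3,12). As the step shrinks, the slope approaches 2, which is the value the limit picks out.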



### Step-by-step

Calculating the derivative is the same as calculating normal slope, except that we calculate the slope between our point and a point infinitesimally close to it. We use the variable $h$ to represent this infinitesimally small distance. Here are the steps:

* Given the function:

\begin{equation*}
f(x) = x^2
\end{equation*}

* Increment $x$ by a very small value $h$ (where $h = \Delta x$)

\begin{equation*}
f(x+h) = (x+h)^2
\end{equation*}

* Apply the slope formula

\begin{equation*}
\frac{f(x+h)-f(x)}{h}
\end{equation*}

* Simplify the equation

\begin{equation*}
\frac{x^2 + 2xh + h^2 -x^2}{h}
\end{equation*}

\begin{equation*}
\frac{2xh + h^2}{h} = 2x+h
\end{equation*}

* Set h to 0 (the limit as h heads towards 0)

\begin{equation*}
2x+0 = 2x
\end{equation*}

So what does this mean? It means that for the function $f(x)=x^2$, the slope at any point equals $2x$.<br>
In general, the derivative is defined as:

\begin{equation*}
\lim_{h \to 0} \frac{f(x+h) - f(x)}{h}
\end{equation*}






**Code**<br>
Let's write code to calculate the derivative of any function $f(x)$. We test that our function works as expected on the input $f(x)=x^2$, producing a value close to the actual derivative $2x$.
In general it's preferable to use the math to obtain exact derivative formulas, but keep in mind you can always compute derivatives numerically by computing the rise-over-run for a "small step" $h$.

```python
def get_derivative(func, x, h=0.0001):
    """Approximate the derivative of `func` at the location `x`.

    `h` is the small step size used for the rise-over-run.
    """
    return (func(x + h) - func(x)) / h    # rise-over-run


def f(x):
    return x**2                           # some test function f(x)=x^2


x = 3                                     # the location of interest
computed = get_derivative(f, x)
actual = 2 * x                            # exact derivative 2x at x

print(computed, actual)                   # pretty close if you ask me...
```

Output:

```
6.000100000012054 6
```
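
The forward rise-over-run above carries an error that shrinks roughly in proportion to the step size. One common refinement, which is not part of the original code, is the central difference: step to both sides of $x$ and divide by the full width. A minimal sketch, with the hypothetical name `get_derivative_central`:

```python
def get_derivative_central(func, x, h=1e-4):
    """Approximate the derivative of `func` at `x` with a central difference."""
    return (func(x + h) - func(x - h)) / (2 * h)

def f(x):
    return x**2

forward = (f(3 + 1e-4) - f(3)) / 1e-4      # forward difference, as above
central = get_derivative_central(f, 3)     # central difference

# Compare how far each estimate is from the exact derivative 2x = 6.
print(abs(forward - 6), abs(central - 6))
```

For this test function the central estimate lands much closer to 6 than the forward one, at the cost of one extra function evaluation.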


### Machine learning use cases

Machine learning uses derivatives in optimization problems. Optimization algorithms like _gradient descent_ use derivatives to decide whether to increase or decrease weights in order to maximize or minimize some objective (e.g. a model's accuracy or error functions). Derivatives also help us approximate nonlinear functions as linear functions (tangent lines), which have constant slopes. With a constant slope we can decide whether to move up or down the slope (increase or decrease our weights) to get closer to the target value (class label).
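
As a rough illustration of how a derivative steers an optimizer, here is a minimal gradient descent sketch on a made-up one-weight error function, $error(w) = (w - 3)^2$. The function, the learning rate and the number of steps are all assumptions chosen for illustration, not values from any particular model.

```python
def error(w):
    # Hypothetical error function with its minimum at w = 3.
    return (w - 3) ** 2

def error_derivative(w):
    # Exact derivative of error(w), worked out by hand: 2(w - 3).
    return 2 * (w - 3)

def gradient_descent(w, learning_rate=0.1, steps=100):
    """Repeatedly move w against the slope to reduce the error."""
    for _ in range(steps):
        w = w - learning_rate * error_derivative(w)
    return w

w_final = gradient_descent(w=0.0)
print(w_final, error(w_final))  # w_final ends up very close to 3
```

When the slope is positive the update decreases the weight, and when it is negative the update increases it, so each step moves towards the minimum.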



## Chain rule

The chain rule is a formula for calculating the derivatives of composite functions. A composite function is built by nesting one function inside another, so that the output of the inner function becomes the input of the outer function.



### How It Works

Given a composite function $f(x)=A(B(x))$, the derivative of $f(x)$ equals the product of the derivative of $A$ with respect to $B(x)$ and the derivative of $B$ with respect to $x$.

\begin{equation*}
\text{composite function derivative} = \text{outer function derivative} \cdot \text{inner function derivative}
\end{equation*}

For example, given a composite function $f(x)$, where:

\begin{equation*}
f(x) = h(g(x))
\end{equation*}

The chain rule tells us that the derivative of $f(x)$ equals:

\begin{equation*}
\frac {df}{dx} = \frac {dh}{dg} \cdot \frac {dg}{dx}
\end{equation*}


**Step By Step**<br>

Say $f(x)$ is composed of two functions, $h(x)=x^3$ and $g(x)=x^2$, and that:

\begin{equation*}
f(x)=h(g(x))=(x^2)^3
\end{equation*}

The derivative of $f(x)$ would equal:

\begin{equation*}
\frac {df}{dx} = \frac {dh}{dg} \cdot \frac {dg}{dx} = \frac {dh}{d(x^2)} \cdot \frac {dg}{dx}
\end{equation*}

**Steps**
* Solve for the inner derivative of $g(x)=x^2$
\begin{equation*}
\frac {dg}{dx} = 2x
\end{equation*}
* Solve for the outer derivative of $h(x)=x^3$, using a placeholder $b$ to represent the inner function $x^2$
\begin{equation*}
\frac {dh}{db} = 3b^2
\end{equation*}
* Swap out the placeholder variable $b$ for the inner function $g(x)$
\begin{equation*}
3(x^2)^2 = 3x^4
\end{equation*}
* Return the product of the two derivatives
\begin{equation*}
3x^4 \cdot 2x = 6x^5
\end{equation*}
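
The steps above can be checked in code. This is a minimal sketch that reuses the document's $g$, $h$ and placeholder $b$; the function names `dg_dx`, `dh_db` and `chain_rule` are made up for illustration, and the finite-difference check at the end is an added sanity test rather than part of the chain rule itself.

```python
def g(x):
    return x**2          # inner function

def h(b):
    return b**3          # outer function, b is the placeholder

def dg_dx(x):
    return 2 * x         # inner derivative

def dh_db(b):
    return 3 * b**2      # outer derivative

def chain_rule(x):
    """df/dx = dh/dg * dg/dx, with the outer derivative evaluated at g(x)."""
    return dh_db(g(x)) * dg_dx(x)

x = 2
step = 1e-6
numeric = (h(g(x + step)) - h(g(x - step))) / (2 * step)  # finite-difference check
print(chain_rule(x), 6 * x**5, numeric)
```

At $x = 2$ both the chain-rule product and the simplified formula $6x^5$ give 192, and the numerical estimate agrees closely.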

**Multiple Functions**<br>
In the above example we assumed a composite function containing a single inner function. But the chain rule can also be applied to more deeply nested compositions like:<br>

\begin{equation*}
f(x) = A(B(C(x)))
\end{equation*}

The chain rule tells us that the derivative function equals:
\begin{equation*}
\frac {df}{dx} = \frac {dA}{dB} \cdot \frac {dB}{dC} \cdot \frac {dC}{dx}
\end{equation*}
We can also write this derivative equation using $f'$ notation:
\begin{equation*}
f'(x) = A'(B(C(x))) \cdot B'(C(x)) \cdot C'(x)
\end{equation*}

**Steps**<br>
Given the function $f(x) = A(B(C(x)))$, let's assume:

\begin{equation*}
\begin{aligned}
A(x) &= \sin(x) \\
B(x) &= x^2 \\
C(x) &= 4x
\end{aligned}
\end{equation*}

The derivatives of these functions would be:

\begin{equation*}
\begin{aligned}
A'(x) &= \cos(x) \\
B'(x) &= 2x \\
C'(x) &= 4
\end{aligned}
\end{equation*}

We can calculate the derivative of $f(x)$ using the following formula:

\begin{equation*}
f'(x) = A'((4x)^2) \cdot B'(4x) \cdot C'(x)
\end{equation*}

We then input the derivatives and simplify the expression:
\begin{equation*}
\begin{aligned}
f'(x) &= \cos((4x)^2) \cdot 2(4x) \cdot 4 \\
      &= \cos(16x^2) \cdot 8x \cdot 4 \\
      &= 32x \cos(16x^2)
\end{aligned}
\end{equation*}
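
As a sanity check on the three-function example, the sketch below evaluates the chain-rule product, the simplified expression $32x \cos(16x^2)$ and a finite-difference estimate at the same point. The test point `x = 0.5` and the names `f_prime`, `simplified` and `numeric` are assumptions chosen for illustration.

```python
import math

def A(x):
    return math.sin(x)

def B(x):
    return x**2

def C(x):
    return 4 * x

def A_prime(x):
    return math.cos(x)

def B_prime(x):
    return 2 * x

def C_prime(x):
    return 4

def f(x):
    return A(B(C(x)))

def f_prime(x):
    """Chain rule: A'(B(C(x))) * B'(C(x)) * C'(x)."""
    return A_prime(B(C(x))) * B_prime(C(x)) * C_prime(x)

x = 0.5
simplified = 32 * x * math.cos(16 * x**2)
step = 1e-6
numeric = (f(x + step) - f(x - step)) / (2 * step)  # finite-difference check
print(f_prime(x), simplified, numeric)
```

All three printed values should agree to several decimal places, which is a quick way to catch a slip in a hand derivation.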


### Partial derivatives

In functions with 2 or more variables, a partial derivative is the derivative of the function with respect to one variable, with the other variables held constant. If we change $x$, but hold all other variables constant, how does $f(x,z)$ change? That's one partial derivative. The next variable is $z$. If we change $z$ but hold $x$ constant, how does $f(x,z)$ change? We store partial derivatives in a gradient, which represents the full derivative of the multivariable function.<br>
**Step by Step**<br>
Here are the steps to calculate the gradient for a multivariable function:

* Given a multivariable function

\begin{equation*}
f(x,z) = 2z^3x^2
\end{equation*}

* Calculate the derivative with respect to $x$

\begin{equation*}
\frac {df}{dx}(x,z)
\end{equation*}

* Swap $2z^3$ with a constant value $b$

\begin{equation*}
f(x,z)=bx^2
\end{equation*}

* Calculate the derivative with $b$ constant

\begin{equation*}
\begin{aligned}
\frac {df}{dx} &= \lim_{h \to 0} \frac{f(x+h, z) - f(x, z)}{h} \\
               &= \lim_{h \to 0} \frac{b(x+h)^2 - b(x^2)}{h} \\
               &= \lim_{h \to 0} \frac{b(x^2 + 2xh + h^2) - b(x^2)}{h} \\
               &= \lim_{h \to 0} \frac{bx^2 + 2bxh + bh^2 - bx^2}{h} \\
               &= \lim_{h \to 0} \frac{2bxh + bh^2}{h} \\
               &= \lim_{h \to 0} (2bx + bh)
\end{aligned}
\end{equation*}


As $h \to 0$:
\begin{equation*}
       2bx + 0 = 2bx
\end{equation*}


* Swap $2z^3$ back into the equation, to find the derivative with respect to $x$.

\begin{equation*}
\frac {df}{dx}(x,z) = 2(2z^3)x
                    = 4z^3x
\end{equation*}

* Repeat the above steps to calculate the derivative with respect to $z$

\begin{equation*}
\frac {df}{dz}(x,z) = 6x^2z^2
\end{equation*}

* Store the partial derivatives in a gradient

\begin{equation*}
\nabla f(x,z) = \begin{bmatrix} \frac {df}{dx}  \\ \frac {df}{dz} \end{bmatrix} = \begin{bmatrix} 4z^3x \\ 6x^2z^2 \end{bmatrix}
\end{equation*}
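
The gradient can be checked numerically by nudging one variable at a time while holding the other fixed. This is a minimal sketch; the function names `gradient` and `numerical_gradient`, the step size and the test point $(1, 2)$ are assumptions made for illustration.

```python
def f(x, z):
    return 2 * z**3 * x**2

def gradient(x, z):
    """Analytic gradient [df/dx, df/dz] of f(x, z) = 2 z^3 x^2."""
    return [4 * z**3 * x, 6 * x**2 * z**2]

def numerical_gradient(func, x, z, h=1e-6):
    """Approximate each partial derivative by changing one variable at a time."""
    df_dx = (func(x + h, z) - func(x - h, z)) / (2 * h)  # z held constant
    df_dz = (func(x, z + h) - func(x, z - h)) / (2 * h)  # x held constant
    return [df_dx, df_dz]

print(gradient(1.0, 2.0))                 # [32.0, 24.0]
print(numerical_gradient(f, 1.0, 2.0))    # approximately the same values
```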



##### Examples

![derivatives.png](./assets/derivatives.png)
![derivatives1.png](./assets/derivatives1.png)


