# Derivatives

We use the SymPy library for Python to compute derivatives. Derivatives are an integral part of optimization algorithms, which help us find the optimal weights and bias for a model.

The formal definition of a derivative can be a bit daunting, with limits and values 'going to zero'. The idea is really much simpler.

The derivative of a function describes how the output of a function changes when there is a small change in an input variable.

Let's use the cost function $J(w)$ as an example. The cost $J$ is the output and $w$ is the input variable.  
Let's give this 'small change' a name: *epsilon*, or $\epsilon$. It is traditional in mathematics to use the Greek letters *epsilon* ($\epsilon$) or *delta* ($\Delta$) to represent a small value. You can think of it as representing 0.001 or some other small value.

$$
\begin{equation}
\text{if } w \uparrow \epsilon \text{ causes }J(w) \uparrow \text{by }k \times \epsilon \text{ then}  \\
\frac{\partial J(w)}{\partial w} = k \tag{1}
\end{equation}
$$

This just says if you change the input to the function $J(w)$ by a little bit and the output changes by $k$ times that little bit, then the derivative of $J(w)$ is equal to $k$.

In [3]:
import sympy

In [8]:
j=3**2
j_epsilon = 3.001**2 # J after increasing the input w by a small change of 0.001

# difference divided by epsilon
t = (j_epsilon - j)/0.001
print(f"j = {j} , j_epsilon = {j_epsilon} \nincrease in j by epsilon increase = {t:0.6f}")

j = 9 , j_epsilon = 9.006001 
increase in j by epsilon increase = 6.001000


We increased the input value a little bit (0.001), causing the output to change from 9 to 9.006001, an increase of about 6 times the input increase. Referencing (1) above, this says that $k \approx 6$, so $\frac{\partial J(w)}{\partial w} \approx 6$. Using calculus, the derivative can be written symbolically as $\frac{\partial J(w)}{\partial w} = 2 w$. With $w=3$ this is 6. Our calculation above is not exactly 6 because, to be exactly correct, $\epsilon$ would need to be [infinitesimally small](https://www.dictionary.com/browse/infinitesimally), that is, really, really small. That is why we use the symbols $\approx$ or ~= rather than =. Let's see what happens if we make $\epsilon$ smaller.
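
For this particular function, the size of the gap can be worked out by hand. Expanding the difference quotient for $J(w) = w^2$ (a short derivation added here for illustration) gives:

$$
\frac{J(w+\epsilon) - J(w)}{\epsilon} = \frac{(w+\epsilon)^2 - w^2}{\epsilon} = \frac{2w\epsilon + \epsilon^2}{\epsilon} = 2w + \epsilon
$$

With $w = 3$ and $\epsilon = 0.001$ this is $6.001$, which matches the value computed above. The leftover $\epsilon$ term is the error, and it shrinks as $\epsilon$ shrinks.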

In [9]:
J = (3)**2
J_epsilon = (3 + 0.000000001)**2
k = (J_epsilon - J)/0.000000001
print(f"J = {J}, J_epsilon = {J_epsilon}, dJ_dw ~= k = {k} ")

J = 9, J_epsilon = 9.000000006, dJ_dw ~= k = 6.000000496442226 


The value gets closer to 6 as we reduce the size of $\epsilon$. Feel free to try reducing the value further.
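
One way to explore this is to sweep over several values of $\epsilon$ at once. The following is a minimal sketch: the helper name `forward_difference` and the particular list of step sizes are choices made here for illustration, not part of SymPy or of the cells above.

```python
def forward_difference(f, w, epsilon):
    """Approximate the derivative of f at w as (f(w + epsilon) - f(w)) / epsilon."""
    return (f(w + epsilon) - f(w)) / epsilon


def J(w):
    """Example cost function J(w) = w**2."""
    return w**2


# Estimate dJ/dw at w = 3 for progressively smaller step sizes.
estimates = {}
for epsilon in (1e-1, 1e-3, 1e-5, 1e-7):
    estimates[epsilon] = forward_difference(J, 3.0, epsilon)
    print(f"epsilon = {epsilon:g}, dJ_dw ~= k = {estimates[epsilon]:.8f}")
```

Keep in mind that the improvement does not continue indefinitely. For very small $\epsilon$, floating-point rounding in the subtraction `f(w + epsilon) - f(w)` starts to dominate, so the estimate can drift away from 6 again.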

## Computing derivatives using SymPy

### $J = w^2$
Define the Python variables and their symbolic names.

In [11]:
J,w = sympy.symbols('J,w')


In [13]:
J = w**2
J

w**2

Use SymPy's `diff` to differentiate the expression for $J$ with respect to $w$. Note the result matches our earlier example.

In [16]:
dj_dw = sympy.diff(J,w)
dj_dw

2*w

Evaluate the derivative at a few points by 'substituting' numeric values for the symbolic values. In the first example, $w$ is replaced by $2$.

In [21]:
dj_dw.subs([(w,2)]) # derivative at the point w = 2

4

In [22]:
dj_dw.subs([(w,3)]) # derivative at the point w = 3

6
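
When the derivative is needed at many points, repeating `subs` gets tedious. One possible alternative, sketched here under the assumption that a plain Python callable is acceptable, is SymPy's `lambdify`, which turns a symbolic expression into a numeric function. The name `dj_dw_fn` and the sample points are illustrative choices.

```python
import sympy

w = sympy.symbols('w')
dj_dw = sympy.diff(w**2, w)          # symbolic derivative: 2*w

# Convert the symbolic derivative into an ordinary Python function of w.
dj_dw_fn = sympy.lambdify(w, dj_dw)

# Evaluate the derivative at several points in one pass.
values = [dj_dw_fn(x) for x in (2, 3, 10)]
print(values)
```

The first two values correspond to the `subs` results at $w=2$ and $w=3$ above.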

### $J = 2w$

In [23]:
J,w = sympy.symbols('J,w')

In [24]:
J = 2*w
J

2*w

In [26]:
dj_dw = sympy.diff(J,w)
dj_dw

2

In [27]:
dj_dw.subs([(w,2)]) # derivative at the point w = 2

2

In [28]:
dj_dw.subs([(w,3)]) # derivative at the point w = 3

2

In [29]:
J = 2*3
J_epsilon = 2*(3 + 0.001)
k = (J_epsilon - J)/0.001
print(f"J = {J}, J_epsilon = {J_epsilon}, dJ_dw ~= k = {k} ")

J = 6, J_epsilon = 6.002, dJ_dw ~= k = 1.9999999999997797 


For the function $J=2w$, it is easy to see that any change in $w$ will result in 2 times that amount of change in the output $J$, regardless of the starting value of $w$. Our SymPy and arithmetic results confirm this.
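
To illustrate the "regardless of the starting value" part, the short sketch below repeats the arithmetic check at several starting points. The specific values of `w0` are arbitrary choices made for this example.

```python
import sympy

w = sympy.symbols('w')
dj_dw = sympy.diff(2*w, w)           # symbolic derivative: the constant 2

epsilon = 0.001
numeric_slopes = []
for w0 in (-5.0, 0.0, 3.0, 100.0):
    # Difference quotient for J = 2w at the starting point w0.
    k = (2*(w0 + epsilon) - 2*w0) / epsilon
    numeric_slopes.append(k)
    print(f"w = {w0}, symbolic = {dj_dw.subs(w, w0)}, dJ_dw ~= k = {k:.6f}")
```

Every starting point gives a slope of 2, up to floating-point rounding.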

### $J = w^3$

In [32]:
J,w = sympy.symbols('J,w')
J = w**3
J

w**3

In [33]:
dj_dw = sympy.diff(J,w)
dj_dw

3*w**2

In [34]:
dj_dw.subs([(w,2)])

12

In [35]:
J = (2)**3
J_epsilon = (2+0.001)**3
k = (J_epsilon - J)/0.001
print(f"J = {J}, J_epsilon = {J_epsilon}, dJ_dw ~= k = {k} ")

J = 8, J_epsilon = 8.012006000999998, dJ_dw ~= k = 12.006000999997823 


### $J = \frac{1}{w}$

In [39]:
J,w = sympy.symbols('J,w')
J = 1/w
J

1/w

In [41]:
dj_dw = sympy.diff(J,w)
dj_dw

-1/w**2

In [43]:
dj_dw.subs([(w,2)])

-1/4

In [44]:
J = 1/2
J_epsilon = 1/(2+0.001)
k = (J_epsilon - J)/0.001
print(f"J = {J}, J_epsilon = {J_epsilon}, dJ_dw ~= k = {k} ")

J = 0.5, J_epsilon = 0.49975012493753124, dJ_dw ~= k = -0.2498750624687629 


### $J = \frac{1}{w^2}$

In [45]:
J,w = sympy.symbols('J,w')
J= 1/w**2
J

w**(-2)

In [47]:
dj_dw = sympy.diff(J,w)
dj_dw

-2/w**3

In [48]:
dj_dw.subs([(w,4)]) # derivative at the point w = 4, matching the arithmetic check below

-1/32

In [49]:
J = 1/4**2
J_epsilon = 1/(4+0.001)**2
k = (J_epsilon - J)/0.001
print(f"J = {J}, J_epsilon = {J_epsilon}, dJ_dw ~= k = {k} ")

J = 0.0625, J_epsilon = 0.06246876171484496, dJ_dw ~= k = -0.031238285155041345 
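
The pattern used throughout this section (differentiate symbolically, then sanity-check with a small arithmetic step) can be gathered into one loop. This is a minimal sketch that uses the same evaluation points as the arithmetic checks above. The `cases` list and the variable names are illustrative, and `lambdify` is used here only as a convenient way to evaluate each expression numerically.

```python
import sympy

w = sympy.symbols('w')

# (expression for J, point w0 at which to evaluate the derivative)
cases = [(w**2, 3), (2*w, 3), (w**3, 2), (1/w, 2), (1/w**2, 4)]

epsilon = 0.001
results = []
for expr, w0 in cases:
    # Exact derivative from SymPy, evaluated at w0.
    exact = float(sympy.diff(expr, w).subs(w, w0))
    # Arithmetic approximation using a small step epsilon.
    f = sympy.lambdify(w, expr)
    approx = (f(w0 + epsilon) - f(w0)) / epsilon
    results.append((exact, approx))
    print(f"J = {expr}, w = {w0}: exact = {exact:.6f}, approx = {approx:.6f}")
```

In each case the approximation lands close to the symbolic value, with the gap depending on how sharply the function curves near `w0`.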
