# What is a derivative? A gradient?
> CogWorks 2018 (Long Nguyen)

This is an informal treatment of basic calculus concepts such as the derivative, the partial derivative and the gradient, concepts that are foundational to understanding gradient descent. 

### Intuitive Definition of a Derivative

Intuitively, the slope of a line measures the steepness of a line. It is a measure of the rate of change of the output y value with respect to the input x value. 


The slope of the line $y=\frac{2}{3}x+1$ is $2/3$. If the input x value changes by 3
 units, the output y value changes by 2 units. 



#### The Ant Analogy: Suppose a small ant is on a line of slope $m$. The slope is a measure of how difficult it is for the ant to walk up the line. 

**Example:** Consider the picture below. The slope at A is 2. The slope at B is 1/2. For an ant at A to move 1 unit to the right, it has to move 2 units up. A 1 unit move to the right only requires an ant at B to move 0.5 unit up. 

Thus, it is 4 times as difficult for an ant to move up the green line as it
is to move up the blue line. 



![check out this diagram](pics/c1.png)

#### The derivative is a generalization of the slope of a line. The derivative of a curve measures the steepness or slope of the curve. The derivative of the function $f(x)$ is denoted by $f'(x)$. If a function has a derivative at a point, it is differentiable at that point.

**Example:** Consider the picture below. Using the ant analogy, an ant 
climbing the curve experiences a greater
steepness at point A than at point B. The
derivative at A, therefore, should be greater
than at B. 



![check out this diagram](pics/c1a.png)

#### The derivative of a function at a point is the slope of the tangent line to the graph at that point. Intuitively, the tangent line is the “line of sight”.

**Example:** An ant at point A has a "line of 
sight" given by the tangent line. The
derivative of the function at A is the 
slope of this line. The derivative or slope at A is 4. Thus, $f'(-1)=4$. See the picture below.


![check out this diagram](pics/c2.png)

If we zoom in to the point A, the function flattens out and looks more and more like the tangent line! **The tangent line is a linear approximation to the curve at the point A.** 
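The zooming-in idea can be tried numerically. The sketch below is only an illustration: the function `f` is a hypothetical stand-in (the curve in the picture is not given by a formula), chosen so that its derivative at $x=-1$ is 4, and `secant_slope` is a helper name introduced here. As the step `h` shrinks, the slope of the line through two nearby points on the curve approaches the slope of the tangent line.

```python
def f(x):
    # Hypothetical curve, chosen only so that f'(-1) = 4 as in the example.
    return -x**2 + 2 * x


def secant_slope(func, x, h):
    """Slope of the line through (x, func(x)) and (x + h, func(x + h))."""
    return (func(x + h) - func(x)) / h


# Zoom in on A (x = -1) by shrinking the horizontal step h.
steps = (1.0, 0.1, 0.01, 0.001)
slopes = [secant_slope(f, -1.0, h) for h in steps]

for h, s in zip(steps, slopes):
    print(f"h = {h:<6} secant slope = {s:.4f}")
```

For this particular `f`, the secant slope works out to $4-h$, so the printed values climb toward 4 as `h` gets smaller.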

**Example:**
A very small ant at point A does not see
the curvature of the function. It thinks it
lives on a line of slope 4. 

This is similar to an ant (or human) on the
spherical earth. It doesn’t see the curvature
of the earth and thinks it lives on a flat plane. See below.


![check out this diagram](pics/c3.png)

**Example:**
As the ant walks up the curve starting at A,
the derivative (slope of tangent line at each
point) decreases. The walk gets 
easier as the ant gets closer to the top. In the picture below, the derivatives at A, B, and C are 4, 2, 0, respectively.



![check out this diagram](pics/c4.png)

#### Given a function $f(x)$, we define $f'(x)$ or $\frac{df}{dx}$ to be the derivative function.

**Example:** Below, the purple graph is $f(x)$
and the black graph is the graph of 
its derivative $f'(x)$. 
The tangent line to $f(x)$ at A has 
slope -1. Thus, the derivative at x=0 is -1. 
Note that $f'(x)$ evaluated at x=0
is also -1, that is, $f'(0)=-1$.

![check out this diagram](pics/c5.png)

### Here are the basic rules of derivatives. 
![image.png](attachment:image.png)

You can think of the symbol $\frac{d}{dx}$ as a function or an operator.
It takes a function as an input and outputs its derivative: $\frac{d}{dx}\left[f(x)\right]=f'(x)$.

The two derivative rules 

![image.png](attachment:image.png)

show that $\frac{d}{dx}$ is a linear operator. 
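For reference, linearity of an operator is usually stated in one line. Here $f$ and $g$ are differentiable functions and $a$, $b$ are constants (symbols introduced only for this illustration):

$$\frac{d}{dx}\big[a\,f(x)+b\,g(x)\big]=a\,f'(x)+b\,g'(x).$$

Taking $a=b=1$ gives the sum rule, and taking $b=0$ gives the constant-multiple rule.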

In week 1, we saw that the Discrete Fourier Transform was a 
linear operator, that is, the DFT of a sum of two signals is 
the sum of the DFT of the individual signals. 


**Example:** Using the rules above, we can differentiate $f(x)=x^3-x$.


![image.png](attachment:image.png)
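As a sketch of how this computation goes, assuming the rules used are the power rule $\frac{d}{dx}x^n=nx^{n-1}$ and linearity:

$$\frac{d}{dx}\left(x^3-x\right)=\frac{d}{dx}x^3-\frac{d}{dx}x=3x^2-1.$$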

**Example:** Using the rules above, we can differentiate $f(x)=2x^3-5x^2+4x-1$.

![image.png](attachment:image.png)
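A quick numerical sanity check of this example is sketched below. It assumes the power rule and linearity, which give $f'(x)=6x^2-10x+4$, and compares that formula with a symmetric difference quotient. The helper name `central_difference`, the step `h`, and the sample points are choices made here for illustration.

```python
def f(x):
    return 2 * x**3 - 5 * x**2 + 4 * x - 1


def f_prime(x):
    # Term-by-term derivative from the power rule and linearity.
    return 6 * x**2 - 10 * x + 4


def central_difference(func, x, h=1e-5):
    """Approximate func'(x) by the symmetric difference quotient."""
    return (func(x + h) - func(x - h)) / (2 * h)


points = [-2.0, 0.0, 0.5, 3.0]
errors = [abs(central_difference(f, x) - f_prime(x)) for x in points]

# The largest disagreement between the formula and the numerical estimate.
print(max(errors))
```

If the formula were wrong, the disagreement would be far larger than the tiny error of the difference quotient itself.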

### Partial Derivatives

Now we consider derivatives of functions of several 
variables $f:\mathbb{R}^n\rightarrow\mathbb{R}$.

**The partial derivative of a function $f(x_1,x_2,\ldots,x_n)$ with respect to $x_i$ is denoted by $\dfrac{\partial f}{\partial x_i}(x_1,x_2,\ldots,x_n)$ or simply $\dfrac{\partial f}{\partial x_i}$.**

**This partial derivative is simply the derivative of $f$ with 
respect to $x_i$, treating the remaining variables $x_j$, $j\neq i$, as 
constants.** Computationally, a partial derivative is found exactly like
the derivative of a single-variable function, since all of the 
remaining variables are held constant. 


**Example:** Let $f(x,y)=x^2-3y^2+2x^4y^3$. Find $\dfrac{\partial f}{\partial x}$ and $\dfrac{\partial f}{\partial y}$.

![image.png](attachment:image.png)

Similarly, $\dfrac{\partial f}{\partial y}=-6y+6x^4y^2.$
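The "treat the other variables as constants" recipe can be checked numerically. In the sketch below, differentiating with respect to $x$ while holding $y$ fixed gives $\dfrac{\partial f}{\partial x}=2x+8x^3y^3$, and the $y$ partial is the one stated above. The helper `partial`, the step `h`, and the test point are hypothetical choices made for this illustration.

```python
def f(x, y):
    return x**2 - 3 * y**2 + 2 * x**4 * y**3


def df_dx(x, y):
    # Differentiate in x, treating y as a constant.
    return 2 * x + 8 * x**3 * y**3


def df_dy(x, y):
    # Differentiate in y, treating x as a constant.
    return -6 * y + 6 * x**4 * y**2


def partial(func, point, i, h=1e-6):
    """Central-difference estimate of the i-th partial derivative at point."""
    plus, minus = list(point), list(point)
    plus[i] += h   # nudge only the i-th variable; the others stay fixed
    minus[i] -= h
    return (func(*plus) - func(*minus)) / (2 * h)


x0, y0 = 1.5, -0.5
num_dx = partial(f, (x0, y0), 0)
num_dy = partial(f, (x0, y0), 1)

print(num_dx, df_dx(x0, y0))
print(num_dy, df_dy(x0, y0))
```

Each printed pair should agree to several decimal places.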



Let’s understand the intuition behind a partial derivative. Let $f(x,y)=-x^2-y^2+4$. Its graph is a surface, a paraboloid. Let $(x_0,y_0)=(1,1)$. Then $f(x_0,y_0)=f(1,1)=2.$ Denote this point on the graph $A=(1,1,2)$. Here's a picture of the graph.


![check out this diagram](pics/c8.png)

The partial derivative of $f$ with respect to $y$ is $\dfrac{\partial f}{\partial y}=-2y$. Evaluating at $x=1, y=1$ yields $\dfrac{\partial f}{\partial y}(1,1)=-2$. We would like to understand what this number means. 

By definition, $\dfrac{\partial f}{\partial y}(1,1)$ is the derivative of $f$ with respect to $y$ keeping $x$ constant at $x=1$. On the graph of $f(x,y)$, keeping $x$ constant at $x=1$ means we are looking at $$f(1,y)=-1-y^2+4=3-y^2,$$ which we abbreviate as $f(y)$. This is now a function of one variable, $y$, and its graph is a parabola. In the picture below, the parabola is the intersection of the plane $x=1$ and the surface given by $f(x,y)=-x^2-y^2+4$.  
![check out this diagram](pics/c9.png)

Thus, $\dfrac{\partial f}{\partial y}$ is geometrically simply the slope of the tangent line of this parabola. In fact, $$\dfrac{\partial f}{\partial y}=\dfrac{df}{dy}=f'(y)$$
and we can understand partial derivatives the same way we understood single-variable derivatives! See the picture below.
![check out this diagram](pics/c10.png)

Fixing x = 1 and letting y vary traces out the green curve below. Thus the slope of the green tangent line is $\dfrac{\partial f}{\partial y}(1,1)=-2$.

Fixing y = 1 and letting x vary traces out the blue curve, so that $\dfrac{\partial f}{\partial x}(1,1)=-2$ represents the slope of the blue tangent line.


![check out this diagram](pics/c11.png)

Using the ant analogy, if an ant is walking parallel to the y-z plane in the direction of increasing y and keeping $x=1$ constant, it is walking downhill on the green parabola. The steepness that it experiences at the point A is $\dfrac{\partial f}{\partial y}(1,1)=-2.$ 

Similarly, if the ant rotates $90$ degrees to face the positive x-direction and starts walking, it is also walking downhill, now on the blue parabola, and the steepness that it experiences is $\dfrac{\partial f}{\partial x}(1,1)=-2$.
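The two walks can be mimicked in a few lines. This is a minimal sketch: `green_curve` and `blue_curve` are names introduced here for the two slices of the paraboloid through A, and `slope` is an ordinary single-variable difference quotient, which is all a partial derivative needs once the other variable is frozen.

```python
def f(x, y):
    return -x**2 - y**2 + 4


def green_curve(y):
    # Slice of the surface with x held fixed at 1: f(1, y) = 3 - y**2.
    return f(1.0, y)


def blue_curve(x):
    # Slice of the surface with y held fixed at 1: f(x, 1) = 3 - x**2.
    return f(x, 1.0)


def slope(curve, t, h=1e-6):
    """Single-variable symmetric difference quotient at t."""
    return (curve(t + h) - curve(t - h)) / (2 * h)


slope_green = slope(green_curve, 1.0)  # estimates the partial in y at (1, 1)
slope_blue = slope(blue_curve, 1.0)    # estimates the partial in x at (1, 1)

print(slope_green, slope_blue)
```

Both estimates come out close to $-2$: the ant walks downhill with the same steepness in either direction.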

### The Gradient

Let $f:\mathbb{R}^n\rightarrow\mathbb{R}$. The gradient of $f$ is a vector-valued function $\nabla f:\mathbb{R}^n\rightarrow\mathbb{R}^n$ given by 
![image.png](attachment:image.png)
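In symbols, the standard definition collects all of the partial derivatives of $f$ into a single vector:

$$\nabla f(x_1,\ldots,x_n)=\left(\dfrac{\partial f}{\partial x_1},\dfrac{\partial f}{\partial x_2},\ldots,\dfrac{\partial f}{\partial x_n}\right).$$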

**Example:** Let $f(x,y)=2x^2y+y^3$. The gradient is ![image.png](attachment:image.png)
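Worked out one partial derivative at a time, treating the other variable as a constant in each case, the computation for this example can be sketched as:

$$\dfrac{\partial f}{\partial x}=4xy,\qquad \dfrac{\partial f}{\partial y}=2x^2+3y^2,\qquad\text{so}\qquad \nabla f(x,y)=\left(4xy,\;2x^2+3y^2\right).$$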

Intuitively, what is the gradient? It is a generalization of the derivative. The gradient is a vector. It has a magnitude and a direction. 

**Example:** Consider an ant at A.
The ant can move up the hill in
many directions. Which direction
will allow it to get to the top
the fastest? The direction given by 
the gradient, which points in the direction of steepest ascent. 
The magnitude of the gradient is a
measure of the steepness of going in that direction. In the picture below, if the ant walks in the direction of the gradient, its path will trace out the orange curve.  
![check out this diagram](pics/c12.png)
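A path like the orange curve can be approximated by repeatedly taking a small step in the direction of the gradient. The sketch below assumes the surface is the paraboloid $f(x,y)=-x^2-y^2+4$ from the partial-derivative discussion and that the ant starts above $(1,1)$. The step size `step` and the number of steps are hypothetical choices, not values from the text.

```python
def f(x, y):
    return -x**2 - y**2 + 4


def grad_f(x, y):
    # Partial derivatives of f with respect to x and y.
    return (-2 * x, -2 * y)


x, y = 1.0, 1.0   # the ant starts above the point (1, 1)
step = 0.1        # hypothetical step size
path = [(x, y)]

for _ in range(50):
    gx, gy = grad_f(x, y)
    # Move a little in the direction of the gradient (uphill).
    x, y = x + step * gx, y + step * gy
    path.append((x, y))

heights = [f(px, py) for px, py in path]
print(path[-1], heights[-1])
```

Each step raises the ant's height, and the path ends very near the top of the hill at $(0,0)$, where $f=4$. Stepping against the gradient instead would walk downhill, which is the idea behind gradient descent.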

The slope of the tangent line in the direction of the gradient is given by the magnitude of the gradient. 

For $f(x,y)=-x^2-y^2+4$, the gradient is $\nabla f(x,y)=(-2x,-2y)$, so $\nabla f(1,1)=(-2,-2)$. The gradient points towards the origin, and its magnitude $$|\nabla f(1,1)|=\sqrt{4+4}=\sqrt{8}$$ is the slope of the tangent line (black line below) in the direction of the gradient. 
![check out this diagram](pics/c13.png)
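Both claims about the gradient at $(1,1)$ can be checked numerically: that the slope in the gradient's direction equals its magnitude $\sqrt{8}$, and that no other direction is steeper. This is a sketch under stated assumptions: `directional_slope` is a helper introduced here, and sampling 360 evenly spaced directions is an arbitrary choice.

```python
import math


def f(x, y):
    return -x**2 - y**2 + 4


def grad_f(x, y):
    # Partial derivatives of f with respect to x and y.
    return (-2 * x, -2 * y)


def directional_slope(x, y, ux, uy, h=1e-6):
    """Slope of f at (x, y) when moving along the unit vector (ux, uy)."""
    return (f(x + h * ux, y + h * uy) - f(x - h * ux, y - h * uy)) / (2 * h)


gx, gy = grad_f(1.0, 1.0)
magnitude = math.hypot(gx, gy)  # sqrt(8)

# Slope along the unit vector that points in the gradient's direction.
slope_along_gradient = directional_slope(1.0, 1.0, gx / magnitude, gy / magnitude)

# Slopes in 360 evenly spaced directions around the point.
angles = [2 * math.pi * k / 360 for k in range(360)]
slopes = [directional_slope(1.0, 1.0, math.cos(t), math.sin(t)) for t in angles]

print(magnitude, slope_along_gradient, max(slopes))
```

The three printed numbers should agree closely: the steepest sampled direction is the gradient's direction, and its slope matches the gradient's magnitude.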