# Calculus

For a long time, how to calculate the area of a circle remained a mystery. Then, in Ancient Greece, the mathematician Archimedes came up with the clever idea to inscribe a series of polygons with increasing numbers of vertices on the inside of a circle.

For a polygon with $n$ vertices, we obtain $n$ triangles. The height of each triangle approaches the radius $r$ as we partition the circle more finely. At the same time, its base approaches $2\pi r/n$, since the ratio between arc and secant approaches 1 for a large number of vertices. Thus, the area of the polygon approaches $n \cdot r \cdot \frac{1}{2}\left(\frac{2\pi r}{n}\right) = \pi r^2$. This can also be described as finding the area of a circle by a limit procedure.
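The sketch below is a quick numerical sanity check of this limit. It is not part of Archimedes' original argument. It uses the exact area of a regular polygon with $n$ vertices inscribed in a circle of radius $r$. Each of its $n$ triangles has two sides of length $r$ that meet at the center with angle $2\pi/n$, so each triangle has area $\frac{1}{2} r^2 \sin(2\pi/n)$. The helper name `inscribed_polygon_area` is our own.

```python
import math

def inscribed_polygon_area(n, r=1.0):
    """Area of a regular n-gon inscribed in a circle of radius r.

    The polygon splits into n isosceles triangles whose equal sides have
    length r and meet at the circle's center with angle 2*pi/n.
    """
    return n * 0.5 * r ** 2 * math.sin(2 * math.pi / n)

r = 1.0
for n in [6, 12, 96, 1000, 10000]:
    area = inscribed_polygon_area(n, r)
    print(f'n={n:>5}, polygon area={area:.6f}, pi*r^2={math.pi * r ** 2:.6f}')
```

As $n$ grows, the polygon area creeps up toward $\pi r^2$ from below, because every inscribed polygon lies strictly inside the circle.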

This limit procedure is at the root of both **differential calculus** and **integral calculus**. The former can tell us how to increase or decrease a function's value by manipulating its arguments. This comes in handy for the *optimization problems* that we face in deep learning, where we repeatedly update our parameters in order to decrease the loss function. Optimization addresses how to fit our models to training data, and calculus is its key prerequisite. However, it's important not to forget that our ultimate goal is to perform well on **previously unseen** data. That problem is called **generalization**.

```python
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from matplotlib_inline import backend_inline
```

## Derivatives and Differentiation

Simply put, a **derivative** is the rate of change in a function with respect to changes in its arguments. Derivatives can tell us how rapidly a loss function would increase or decrease were we to increase or decrease each parameter by an infinitesimally small amount. Formally, for functions $f: \mathbb{R} \rightarrow \mathbb{R}$ that map from scalars to scalars, the derivative of $f$ at a point $x$ is defined as

$$f'(x) = \lim_{h \rightarrow 0} \frac{f(x+h) - f(x)}{h}.$$

The term on the right-hand side is called a **limit**, and it tells us what happens to the value of an expression as a specified variable approaches a particular value. Here, the limit tells us what the ratio between the change in the function value $f(x + h) - f(x)$ and the perturbation $h$ converges to as we shrink $h$ to zero.
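To make the limit concrete, the sketch below evaluates the ratio $(f(x+h) - f(x))/h$ for a shrinking sequence of perturbations. The helper `difference_quotient` and the example function $g(x) = x^2$ are hypothetical and chosen only for illustration. At $x = 3$ the quotient equals $6 + h$ exactly (up to rounding), so it should approach 6.

```python
def difference_quotient(f, x, h):
    """Return (f(x + h) - f(x)) / h, the slope of the secant through x and x + h."""
    return (f(x + h) - f(x)) / h

def g(x):
    return x ** 2

# For g(x) = x**2 at x = 3 the quotient is 6 + h, which tends to 6 as h -> 0.
for h in [1.0, 0.1, 0.01, 0.001]:
    print(f'h={h}, quotient={difference_quotient(g, 3.0, h):.6f}')
```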

When $f'(x)$ exists, $f$ is said to be **differentiable** at $x$, and when $f'(x)$ exists for all $x$ on a set, e.g., the interval $[a,b]$, we say that $f$ is differentiable on this set. Not all functions are differentiable. This includes many that we wish to optimize, such as accuracy and the area under the receiver operating characteristic curve (AUC). However, computing the derivative of the loss is a crucial step in nearly all algorithms for training deep neural networks, so we often optimize a **differentiable surrogate** instead.
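As a small illustration of a function that is not differentiable everywhere, consider $|x|$ at $x = 0$. The difference quotient equals $+1$ for every positive $h$ and $-1$ for every negative $h$, so no single limit exists as $h \to 0$. The sketch below checks this numerically. Splitting the check into one-sided perturbations is our own framing, and `abs_quotient` is a hypothetical helper.

```python
def abs_quotient(h):
    """Difference quotient of |x| at x = 0 for perturbation h (h may be negative)."""
    return (abs(0.0 + h) - abs(0.0)) / h

for h in [0.1, 0.001, 1e-6]:
    # Approaching from the right gives +1, from the left gives -1
    print(f'h=+{h}: {abs_quotient(h):+.1f}   h=-{h}: {abs_quotient(-h):+.1f}')
```

Because the two one-sided values never agree, however small $h$ gets, $|x|$ has no derivative at 0 even though it is differentiable everywhere else.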

We can interpret the derivative $f'(x)$ as the **instantaneous rate of change** of $f(x)$ with respect to $x$. Let's solidify our understanding with an example. Let's define $u = f(x) = 3x^2 - 4x$.

```python
def f(x):
    return 3 * x ** 2 - 4 * x
```

Setting $x = 1$ and letting $h$ approach $0$, we see that the difference quotient $\frac{f(x + h) - f(x)}{h}$ approaches $2$. While this experiment lacks the rigor of a mathematical proof, we can quickly see that indeed $f'(1) = 2$.

```python
for h in 10.0**np.arange(-1, -6, -1):
    print(f'h={h:.5f}, numerical limit={(f(1+h)-f(1))/h:.5f}')
```
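For this particular quadratic, a short algebraic step can back up the experiment. This worked derivation is our own addition for illustration. Expanding the difference quotient at $x = 1$, where $f(1) = 3 - 4 = -1$, gives

$$
\begin{aligned}
\frac{f(1+h) - f(1)}{h} &= \frac{3(1+h)^2 - 4(1+h) - (3 - 4)}{h} \\
&= \frac{3 + 6h + 3h^2 - 4 - 4h + 1}{h} \\
&= \frac{2h + 3h^2}{h} = 2 + 3h, \qquad h \neq 0.
\end{aligned}
$$

So the quotient tends to $2$ as $h \to 0$, and each printed value should exceed $2$ by $3h$, up to floating-point rounding.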

There are several equivalent notational conventions for derivatives. Given $y = f(x)$, the following expressions are equivalent:

$$f'(x) = y' = \frac{dy}{dx} = \frac{df}{dx} = \frac{d}{dx} f(x) = Df(x) = D_x f(x),$$

where the symbols $\frac{d}{dx}$ and $D$ are **differentiation operators**. Below, we present the derivatives of some common functions:

$$
\begin{aligned}
\frac{d}{dx} C & = 0 && \text{for any constant } C \\
\frac{d}{dx} x^n & = n x^{n-1} && \text{for } n \neq 0 \\
\frac{d}{dx} e^x & = e^x \\
\frac{d}{dx} \ln x & = x^{-1}
\end{aligned}
$$
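The sketch below spot-checks these derivatives against a forward-difference estimate at a single point. The helper `numerical_derivative`, the step `h=1e-6`, the test point $x = 2$, and the sample constant and exponent are all illustrative assumptions.

```python
import math

def numerical_derivative(fn, x, h=1e-6):
    """Forward-difference approximation (fn(x + h) - fn(x)) / h of fn'(x)."""
    return (fn(x + h) - fn(x)) / h

x = 2.0
# Each entry: (label, function, derivative given by the formulas above)
checks = [
    ('C = 5',  lambda t: 5.0,    lambda t: 0.0),
    ('x**3',   lambda t: t ** 3, lambda t: 3 * t ** 2),
    ('exp(x)', math.exp,         math.exp),
    ('ln(x)',  math.log,         lambda t: 1 / t),
]
for name, fn, dfn in checks:
    approx = numerical_derivative(fn, x)
    print(f'{name:8s} numerical={approx:.5f}  formula={dfn(x):.5f}')
```

A forward difference has an error roughly proportional to $h$, so the two columns agree to about five decimal places here rather than exactly.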

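Functions composed from differentiable functions are often themselves differentiable. The following rules come in handy for working with compositions of any differentiable functions $f$ and $g$, and constant $C$:

$$
\begin{aligned}
\frac{d}{dx} [C f(x)] & = C \frac{d}{dx} f(x) && \text{Constant multiple rule} \\
\frac{d}{dx} [f(x) + g(x)] & = \frac{d}{dx} f(x) + \frac{d}{dx} g(x) && \text{Sum rule} \\
\frac{d}{dx} [f(x) g(x)] & = f(x) \frac{d}{dx} g(x) + g(x) \frac{d}{dx} f(x) && \text{Product rule} \\
\frac{d}{dx} \frac{f(x)}{g(x)} & = \frac{g(x) \frac{d}{dx} f(x) - f(x) \frac{d}{dx} g(x)}{g(x)^2} && \text{Quotient rule}
\end{aligned}
$$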
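The product and quotient rules can be spot-checked numerically in the same way. The example functions $u(x) = x^2$ and $v(x) = e^x$, the point $x_0 = 1.5$, and the helper `numerical_derivative` are our own illustrative choices. The rules being checked are restated in the comments.

```python
import math

def numerical_derivative(fn, x, h=1e-6):
    """Forward-difference approximation (fn(x + h) - fn(x)) / h of fn'(x)."""
    return (fn(x + h) - fn(x)) / h

# Example functions (chosen for illustration) and their known derivatives
def u(x):
    return x ** 2

def du(x):
    return 2 * x          # power rule

v = dv = math.exp         # e^x is its own derivative

x0 = 1.5

# Product rule: (u v)' = u v' + v u'
product_rule = u(x0) * dv(x0) + v(x0) * du(x0)
product_numeric = numerical_derivative(lambda t: u(t) * v(t), x0)

# Quotient rule: (u / v)' = (v u' - u v') / v**2
quotient_rule = (v(x0) * du(x0) - u(x0) * dv(x0)) / v(x0) ** 2
quotient_numeric = numerical_derivative(lambda t: u(t) / v(t), x0)

print(f'product:  rule={product_rule:.5f}, numerical={product_numeric:.5f}')
print(f'quotient: rule={quotient_rule:.5f}, numerical={quotient_numeric:.5f}')
```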

Applying the constant multiple rule, the sum rule (treating the difference as a sum with a negated term), and the power rule, we can find the derivative of $3x^2 - 4x$:

$$\frac{d}{dx}[3x^2 - 4x] = 3\frac{d}{dx}x^2 - 4\frac{d}{dx}x = 6x - 4.$$

Plugging in $x = 1$ shows that, indeed, the derivative equals 2 at this location. Note that derivatives tell us the **slope** of a function at a particular location.
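To see that $6x - 4$ matches the numerical slope at more than one point, the sketch below vectorizes the forward difference over a few sample points with NumPy. The sample points and the step `h = 1e-5` are arbitrary illustrative choices. `f` is redefined with the same body as above so that the block runs on its own.

```python
import numpy as np

def f(x):
    return 3 * x ** 2 - 4 * x

def f_prime(x):
    # Derivative obtained above with the constant multiple, sum, and power rules
    return 6 * x - 4

h = 1e-5
xs = np.array([-1.0, 0.0, 1.0, 2.5])
numerical = (f(xs + h) - f(xs)) / h   # forward differences, one per sample point
for x0, num, exact in zip(xs, numerical, f_prime(xs)):
    print(f'x={x0:+.1f}: numerical={num:.4f}, 6x-4={exact:.4f}')
```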

## Visualization Utilities

We can visualize the slopes of functions using the `matplotlib` library. We need to define a few functions. As its name indicates, `use_svg_display` tells `matplotlib` to output graphics in SVG format for crisper images. 

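```python
def use_svg_display():
    """Use the SVG format to display plots in Jupyter."""
    backend_inline.set_matplotlib_formats('svg')

def set_figsize(figsize=(3.5, 2.5)):
    """Set the figure size for matplotlib."""
    use_svg_display()
    plt.rcParams['figure.figsize'] = figsize
```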

Conveniently, we can set figure sizes with `plt.rcParams['figure.figsize'] = figsize`.

```python
def set_axes(axes, xlabel, ylabel, xlim, ylim, xscale, yscale, legend):
    """Set the axes for matplotlib."""
    axes.set_xlabel(xlabel), axes.set_ylabel(ylabel)
    axes.set_xscale(xscale), axes.set_yscale(yscale)
    axes.set_xlim(xlim), axes.set_ylim(ylim)
    if legend:
        axes.legend(legend)
    axes.grid()
```

The `set_axes` function sets properties of the axes, including labels, ranges, and scales.

With these three functions, we can define a `plot` function to overlay multiple curves. Much of the code here just ensures that the sizes and shapes of the inputs match.

```python
def plot(X, Y=None, xlabel=None, ylabel=None, legend=None, xlim=None,
         ylim=None, xscale='linear', yscale='linear',
         fmts=('-', 'm--', 'g-.', 'r:'), figsize=(3.5, 2.5), axes=None):
    """Plot data points."""
    if legend is None:
        legend = []  # avoid a shared mutable default argument

    # Normalize X and Y to lists of 1-D series
    if isinstance(X, np.ndarray) and X.ndim == 1:
        X = [X]
    if Y is None:
        # Only one set of values given: plot each series against its index
        X, Y = [[]] * len(X), X
    elif isinstance(Y, np.ndarray) and Y.ndim == 1:
        Y = [Y]
    if len(X) != len(Y):
        # Reuse the same x values for every y series
        X = X * len(Y)

    # Only create a new figure when no axes were supplied
    if axes is None:
        plt.figure(figsize=figsize)
        axes = plt.gca()
    axes.cla()
    for x, y, fmt in zip(X, Y, fmts):
        if len(x):
            axes.plot(x, y, fmt)
        else:
            axes.plot(y, fmt)
    set_axes(axes, xlabel, ylabel, xlim, ylim, xscale, yscale, legend)
    plt.show()
```

```python
x = np.arange(0, 3, 0.1)
y = f(x)
# Tangent at x = 1: f(1) + f'(1)(x - 1) = -1 + 2(x - 1) = 2x - 3
tangent_line = 2 * x - 3

plot(x, [y, tangent_line], xlabel='x', ylabel='f(x)',
     legend=['f(x)', 'Tangent line (x=1)'])
```
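The tangent line above was written out by hand as $2x - 3$. More generally, the tangent to $f$ at $x_0$ is the line $f(x_0) + f'(x_0)(x - x_0)$. The sketch below computes it with a hypothetical helper `tangent_line_at`, using the derivative $6x - 4$ derived earlier. It then confirms that at $x_0 = 1$ it reproduces $2x - 3$. `f` and `f_prime` are redefined here only so that the block is self-contained.

```python
import numpy as np

def f(x):
    return 3 * x ** 2 - 4 * x

def f_prime(x):
    return 6 * x - 4

def tangent_line_at(x0, x):
    """Evaluate the tangent line of f at x0: f(x0) + f'(x0) * (x - x0)."""
    return f(x0) + f_prime(x0) * (x - x0)

xs = np.arange(0, 3, 0.1)
# At x0 = 1: f(1) = -1 and f'(1) = 2, so the line is -1 + 2(x - 1) = 2x - 3
print(np.allclose(tangent_line_at(1.0, xs), 2 * xs - 3))  # prints True
```

Passing `tangent_line_at(x0, x)` for another `x0` to the `plot` function would draw the tangent at a different location.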