# Derivatives and Their Applications in Machine Learning

Just as linear algebra forms the foundation of many machine learning algorithms, derivatives are crucial for understanding how these algorithms learn and optimize. In this notebook, we'll explore various aspects of derivatives and their applications in machine learning and deep learning.

## Topics Covered:
1. Basic derivative functions
2. Nested functions and their derivatives
3. Chain rule implementation
4. Derivatives with multiple inputs
5. Derivatives with multiple vector inputs

Let's dive in!

## 1. Basic Derivative Functions

Derivatives measure how quickly a function's output changes as its input changes. In machine learning, we use derivatives to measure how small changes in a model's parameters change its loss, which tells us how to adjust those parameters to improve the model.

Let's start with a basic quadratic function and its derivative:

$$f(x) = x^2$$
$$f'(x) = 2x$$

Here's a visualization of this function and its derivative:

In [None]:
import numpy as np
import matplotlib.pyplot as plt

def f(x):
    return x**2

def df(x):
    return 2*x

x = np.linspace(-5, 5, 100)
y = f(x)
dy = df(x)

plt.figure(figsize=(10, 6))
plt.plot(x, y, label='f(x) = x^2')
plt.plot(x, dy, label="f'(x) = 2x")
plt.legend()
plt.title('Function and its Derivative')
plt.xlabel('x')
plt.ylabel('y')
plt.grid(True)
plt.show()
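A quick way to check an analytic derivative is to compare it with a finite-difference approximation. The sketch below uses the central difference $(f(x+h) - f(x-h)) / (2h)$; the helper name `central_difference` and the step size `h = 1e-5` are illustrative choices, not a standard API. A much smaller step would amplify floating-point rounding error.

```python
import numpy as np

def f(x):
    return x**2

def df(x):
    return 2 * x

def central_difference(func, x, h=1e-5):
    """Approximate func'(x) with (func(x + h) - func(x - h)) / (2h)."""
    return (func(x + h) - func(x - h)) / (2 * h)

x = np.linspace(-5, 5, 11)
numeric = central_difference(f, x)
analytic = df(x)

# For a quadratic the central difference is exact up to floating-point rounding,
# so the largest discrepancy should be tiny.
print(np.max(np.abs(numeric - analytic)))
print(np.allclose(numeric, analytic))
```

The same helper works for any scalar function, which makes it a handy sanity check for the hand-derived derivatives in the rest of this notebook.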

## 2. Nested Functions and Their Derivatives

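In machine learning models, we often deal with nested functions: a neural network is a composition of layers, so its output is a deeply nested function of its inputs and parameters. Understanding how to differentiate these is key to implementing backpropagation in neural networks.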

Let's look at a nested function and its derivative:

$$g(x) = \sin(x^2)$$
$$g'(x) = 2x \cos(x^2)$$

Here's a visualization of this nested function and its derivative:

In [None]:
def g(x):
    return np.sin(x**2)

def dg(x):
    return 2*x * np.cos(x**2)

x = np.linspace(-3, 3, 200)
y = g(x)
dy = dg(x)

plt.figure(figsize=(10, 6))
plt.plot(x, y, label='g(x) = sin(x^2)')
plt.plot(x, dy, label="g'(x) = 2x * cos(x^2)")
plt.legend()
plt.title('Nested Function and its Derivative')
plt.xlabel('x')
plt.ylabel('y')
plt.grid(True)
plt.show()
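Backpropagation computes derivatives like $g'(x)$ in two passes: a forward pass that evaluates and caches each intermediate value, and a backward pass that multiplies the local derivatives from the output back to the input. The sketch below applies that idea by hand to $g(x) = \sin(x^2)$ with the intermediate $u = x^2$; the function names `g_forward` and `g_backward` are hypothetical.

```python
import numpy as np

def g_forward(x):
    """Forward pass for g(x) = sin(x^2), caching the intermediate u = x^2."""
    u = x**2          # inner function
    y = np.sin(u)     # outer function
    return y, u

def g_backward(x, u, upstream=1.0):
    """Backward pass: multiply local derivatives from the output back to the input."""
    dy_du = np.cos(u)  # derivative of sin evaluated at u
    du_dx = 2 * x      # derivative of x^2 evaluated at x
    return upstream * dy_du * du_dx

x = np.linspace(-3, 3, 7)
y, u = g_forward(x)
grad = g_backward(x, u)

# Agrees with the closed form g'(x) = 2x cos(x^2)
print(np.allclose(grad, 2 * x * np.cos(x**2)))  # True
```

The `upstream` argument stands in for the gradient arriving from later layers; for a standalone function it is simply 1.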

## 3. Chain Rule Implementation

The chain rule is a fundamental concept in calculus that allows us to compute the derivative of composite functions. In the context of neural networks, the chain rule is the key principle behind backpropagation.

The chain rule states that for two functions $f(x)$ and $g(x)$, the derivative of their composition is:

$$(f \circ g)'(x) = f'(g(x)) \cdot g'(x)$$

Let's implement the chain rule for a simple composite function:

$$h(x) = e^{x^3}$$
$$h'(x) = 3x^2 \cdot e^{x^3}$$
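As a quick numerical check of the formula, take $x = 1$ with inner function $g(x) = x^3$ and outer function $f(u) = e^u$:

$$g(1) = 1, \quad g'(1) = 3 \cdot 1^2 = 3, \quad f'(g(1)) = e^{1} = e$$

$$h'(1) = f'(g(1)) \cdot g'(1) = 3e \approx 8.155$$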

Here's a visualization of this composite function and its derivative:

In [None]:
def outer(u):
    return np.exp(u)

def d_outer(u):
    # d/du e^u = e^u
    return np.exp(u)

def inner(x):
    return x**3

def d_inner(x):
    # d/dx x^3 = 3x^2
    return 3 * x**2

def composite(x):
    return outer(inner(x))

def d_composite(x):
    # Chain rule: (outer ∘ inner)'(x) = outer'(inner(x)) * inner'(x)
    return d_outer(inner(x)) * d_inner(x)

x = np.linspace(0, 2, 100)
y = composite(x)
dy = d_composite(x)

plt.figure(figsize=(10, 6))
plt.plot(x, y, label='h(x) = exp(x^3)')
plt.plot(x, dy, label="h'(x) = 3x^2 * exp(x^3)")
plt.legend()
plt.title('Composite Function and its Derivative')
plt.xlabel('x')
plt.ylabel('y')
plt.yscale('log')
plt.grid(True)
plt.show()
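The chain rule extends to any depth: $(f_1 \circ f_2 \circ \cdots \circ f_k)'(x)$ is the product of each local derivative evaluated at that function's own input. The sketch below, built around a hypothetical helper `compose` that takes `(function, derivative)` pairs listed from outermost to innermost, applies this mechanically to $h(x) = e^{x^3}$ and to a three-level function $k(x) = \sin(e^{x^3})$.

```python
import numpy as np

def compose(*layers):
    """Compose (function, derivative) pairs listed from outermost to innermost.

    Returns a function computing (value, derivative) via the chain rule:
    (f1 ∘ f2 ∘ ... ∘ fk)'(x) = f1'(f2(...)) * f2'(...) * ... * fk'(x).
    """
    def evaluate(x):
        # Forward: apply the innermost function first, recording each function's input.
        inputs = []
        value = x
        for fn, _ in reversed(layers):
            inputs.append(value)
            value = fn(value)
        # Multiply each local derivative evaluated at that function's own input.
        derivative = 1.0
        for (_, dfn), inp in zip(reversed(layers), inputs):
            derivative = derivative * dfn(inp)
        return value, derivative
    return evaluate

cube = (lambda x: x**3, lambda x: 3 * x**2)

h = compose((np.exp, np.exp), cube)
value, deriv = h(1.0)
print(value, deriv)  # approximately e and 3e

k = compose((np.sin, np.cos), (np.exp, np.exp), cube)
x = 0.5
_, dk = k(x)
print(np.isclose(dk, np.cos(np.exp(x**3)) * np.exp(x**3) * 3 * x**2))  # True
```

Multiplying local derivatives from the innermost function outward is the idea behind forward-mode differentiation; backpropagation multiplies the same factors in the reverse order.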

## 4. Derivatives with Multiple Inputs

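In machine learning, we often work with functions of many inputs, such as a loss that depends on every model parameter. Collecting the partial derivatives into a vector gives the gradient, which points in the direction of steepest increase; optimization algorithms such as gradient descent step in the opposite direction to reduce the loss.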

For a function $f(x, y)$, the gradient is defined as:

$$\nabla f(x, y) = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right)$$

Let's look at a simple function with two inputs and its gradient:

$$f(x, y) = x^2 + y^2$$
$$\nabla f(x, y) = (2x, 2y)$$
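The analytic gradient can be checked by estimating each partial derivative with a central difference, nudging one coordinate at a time while holding the others fixed. In this sketch `numerical_gradient` is a hypothetical helper and `h = 1e-5` an illustrative step size.

```python
import numpy as np

def f(x, y):
    return x**2 + y**2

def grad_f(x, y):
    return np.array([2 * x, 2 * y])

def numerical_gradient(func, point, h=1e-5):
    """Estimate each partial derivative with a central difference along one axis."""
    point = np.asarray(point, dtype=float)
    grad = np.zeros_like(point)
    for i in range(point.size):
        step = np.zeros_like(point)
        step[i] = h  # perturb only coordinate i
        grad[i] = (func(*(point + step)) - func(*(point - step))) / (2 * h)
    return grad

p = np.array([1.0, -2.0])
print(numerical_gradient(f, p))  # close to (2, -4)
print(grad_f(*p))                # analytic gradient (2, -4)
```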

Here's a visualization of this function and its gradient:

In [None]:
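from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (registers the '3d' projection on Matplotlib < 3.2)

def f(x, y):
    return x**2 + y**2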

def grad_f(x, y):
    return np.array([2*x, 2*y])

x = np.linspace(-2, 2, 20)
y = np.linspace(-2, 2, 20)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)

fig = plt.figure(figsize=(12, 5))

ax1 = fig.add_subplot(121, projection='3d')
ax1.plot_surface(X, Y, Z, cmap='viridis')
ax1.set_title('f(x, y) = x^2 + y^2')

ax2 = fig.add_subplot(122)
U, V = grad_f(X, Y)
ax2.quiver(X, Y, U, V)
ax2.set_title('Gradient of f(x, y)')

plt.tight_layout()
plt.show()
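To connect the gradient to optimization, here is a minimal gradient descent sketch on $f(x, y) = x^2 + y^2$. The helper name `gradient_descent`, the learning rate of 0.1, and the 50 steps are arbitrary illustrative choices. For this function each update scales the point by $1 - 2 \cdot 0.1 = 0.8$, so it converges geometrically toward the minimum at $(0, 0)$; a learning rate above 1 would make it diverge.

```python
import numpy as np

def grad_f(x, y):
    return np.array([2 * x, 2 * y])

def gradient_descent(grad, start, learning_rate=0.1, steps=50):
    """Repeatedly step against the gradient: p <- p - learning_rate * grad(p)."""
    p = np.asarray(start, dtype=float)
    for _ in range(steps):
        p = p - learning_rate * grad(*p)
    return p

p_final = gradient_descent(grad_f, start=[1.5, -2.0])
print(p_final)  # very close to the minimum at (0, 0)
```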

## 5. Derivatives with Multiple Vector Inputs

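In deep learning, we often deal with vector-valued functions whose inputs are themselves vectors, or several vectors and matrices at once, such as a layer's input, weights, and bias. The Jacobian matrix collects the first-order partial derivatives of such a function with respect to one vector input.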

For a function $\mathbf{f}: \mathbb{R}^n \rightarrow \mathbb{R}^m$, the Jacobian matrix is defined as:

$$J = \begin{bmatrix}
    \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\
    \vdots & \ddots & \vdots \\
    \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n}
\end{bmatrix}$$
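For example, the function $\mathbf{f}(x_1, x_2) = (x_1 x_2,\; x_1 + x_2^2)$ maps $\mathbb{R}^2$ to $\mathbb{R}^2$. Each row of $J$ holds the partial derivatives of one output, and each column corresponds to one input:

$$J = \begin{bmatrix} x_2 & x_1 \\ 1 & 2x_2 \end{bmatrix}$$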

Let's implement a simple neural network layer and compute its Jacobian. We'll use the sigmoid activation function:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
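The sigmoid has a convenient derivative, which the layer's Jacobian relies on. Differentiating $(1 + e^{-x})^{-1}$ with the chain rule and using $1 - \sigma(x) = \frac{e^{-x}}{1 + e^{-x}}$:

$$\sigma'(x) = \frac{e^{-x}}{(1 + e^{-x})^2} = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}} = \sigma(x)\bigl(1 - \sigma(x)\bigr)$$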

The layer function is defined as:

$$\mathbf{y} = \sigma(W\mathbf{x} + \mathbf{b})$$

where $W \in \mathbb{R}^{m \times n}$ is the weight matrix, $\mathbf{b} \in \mathbb{R}^{m}$ is the bias vector, and $\sigma$ is applied element-wise.

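The Jacobian of this layer with respect to the input $\mathbf{x}$ follows from the chain rule. Writing $\mathbf{z} = W\mathbf{x} + \mathbf{b}$, the inner map has Jacobian $\partial \mathbf{z} / \partial \mathbf{x} = W$, and because $\sigma$ acts element-wise, its Jacobian is the diagonal matrix $\text{diag}(\sigma'(\mathbf{z}))$, where $\sigma'(z) = \sigma(z)(1 - \sigma(z))$. Multiplying the two gives:

$$J = \text{diag}(\sigma(W\mathbf{x} + \mathbf{b}) \odot (1 - \sigma(W\mathbf{x} + \mathbf{b}))) \cdot W$$

where $\odot$ denotes element-wise multiplication.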
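In backpropagation the full Jacobian is typically not formed explicitly. Instead, with $\mathbf{z} = W\mathbf{x} + \mathbf{b}$, an upstream gradient $\mathbf{v}$ (the derivative of a scalar loss with respect to the layer output) is multiplied by it: $\mathbf{v}^\top J = (\mathbf{v} \odot \sigma'(\mathbf{z}))^\top W$. The same intermediate $\boldsymbol{\delta} = \mathbf{v} \odot \sigma'(\mathbf{z})$ also yields the gradients with respect to the other inputs, $W$ and $\mathbf{b}$. This sketch assumes the loss is the sum of the outputs (so $\mathbf{v}$ is all ones); `layer_backward` is a hypothetical helper, not a library function.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def layer_backward(x, W, b, upstream):
    """Gradients of the scalar upstream @ sigmoid(W @ x + b) w.r.t. x, W and b."""
    z = W @ x + b
    s = sigmoid(z)
    delta = upstream * s * (1 - s)  # element-wise: upstream ⊙ σ'(z)
    grad_x = W.T @ delta            # equals upstream @ J, without building J
    grad_W = np.outer(delta, x)     # dL/dW_ij = delta_i * x_j
    grad_b = delta                  # dz/db is the identity
    return grad_x, grad_W, grad_b

x = np.array([1.0, 2.0, 3.0])
W = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
b = np.array([0.1, 0.2])
upstream = np.array([1.0, 1.0])  # assumed loss: sum of the layer outputs

grad_x, grad_W, grad_b = layer_backward(x, W, b, upstream)
print(grad_x.shape, grad_W.shape, grad_b.shape)  # (3,) (2, 3) (2,)
```

Each gradient has the same shape as the input it belongs to, which is what a parameter update such as gradient descent needs.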

Here's an implementation of the layer and its Jacobian:

In [None]:
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def layer(x, W, b):
    return sigmoid(np.dot(W, x) + b)

def jacobian(x, W, b):
    z = np.dot(W, x) + b
    s = sigmoid(z)
    return np.diag(s * (1 - s)) @ W

# Example usage
x = np.array([1, 2, 3])
W = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
b = np.array([0.1, 0.2])

print("Layer output:")
print(layer(x, W, b))

print("\nJacobian:")
print(jacobian(x, W, b))
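The analytic Jacobian can be verified column by column: column $j$ is the derivative of the layer output with respect to $x_j$, which a central difference estimates directly. The helper `numerical_jacobian` and the step `h = 1e-6` are illustrative assumptions; the layer definitions are repeated so the cell runs on its own.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def layer(x, W, b):
    return sigmoid(W @ x + b)

def jacobian(x, W, b):
    s = sigmoid(W @ x + b)
    return np.diag(s * (1 - s)) @ W

def numerical_jacobian(func, x, h=1e-6):
    """Column j approximates d(func)/d(x_j) with a central difference."""
    x = np.asarray(x, dtype=float)
    columns = []
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = h
        columns.append((func(x + step) - func(x - step)) / (2 * h))
    return np.stack(columns, axis=1)  # shape (m, n): rows = outputs, columns = inputs

x = np.array([1.0, 2.0, 3.0])
W = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
b = np.array([0.1, 0.2])

J_numeric = numerical_jacobian(lambda v: layer(v, W, b), x)
print(np.allclose(J_numeric, jacobian(x, W, b), atol=1e-8))  # True
```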

## Conclusion

In this notebook, we've explored various aspects of derivatives and their applications in machine learning. We've covered basic derivative functions, nested functions, the chain rule, derivatives with multiple inputs, and derivatives with multiple vector inputs.

Understanding these concepts is crucial for:
1. Implementing gradient descent and other optimization algorithms
2. Designing and training neural networks
3. Understanding backpropagation
4. Analyzing the sensitivity of models to their inputs and parameters

As you continue your journey in machine learning and deep learning, you'll find these concepts appearing repeatedly, forming the foundation of many advanced techniques and algorithms.