# Auto Diff

Goal: 
- How to implement automatic differentiation (autodiff) in Python

Resources:
1. [Automatic differentiation in machine learning: a survey](https://arxiv.org/pdf/1502.05767.pdf)
1. https://mostafa-samir.github.io/auto-diff-pt1/



## Dual Numbers
Numerical differentiation suffers from two sources of error: truncation error from the finite step size and round-off error from limited machine precision.
Symbolic differentiation is exact but hard to implement for general functions.
Automatic differentiation (autodiff) can address both issues.
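
To make the two error sources concrete, the sketch below approximates the derivative of $\sin$ at $x=1$ with a forward difference for several step sizes. The helper name `forward_difference` and the chosen step sizes are illustrative assumptions, not part of the original notes.

```python
import math

def forward_difference(f, x, h):
    """Approximate f'(x) with the forward difference (f(x + h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h

exact = math.cos(1.0)  # analytic derivative of sin at x = 1

# Large h: truncation error dominates. Tiny h: round-off error dominates.
for h in (1e-1, 1e-4, 1e-8, 1e-12, 1e-15):
    approx = forward_difference(math.sin, 1.0, h)
    print(f"h = {h:.0e}  error = {abs(approx - exact):.3e}")
```

The error first shrinks as `h` decreases and then grows again once `h` approaches machine precision, so no single step size removes both errors.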

Consider a general function $f(x)$ expanded around the point $x=a$ using its Taylor series:

$$
f(x) = f(a) + f'(a)(x-a) + \sum_{k=2}^{\infty} \frac{f^{(k)}(a)}{k!}(x-a)^k
$$

It would be nice if we could evaluate $f(x)$ while simultaneously preserving the term containing the derivative $f'(a)$ and making the rest of the infinite sum exactly zero.
Then there would be no truncation error, and we would get the exact derivative just by evaluating the function itself.

To make the truncation error zero, we need either 1) all higher-order derivatives to be zero, or 2) $(x-a)^k = 0$ for all $k \geq 2$, which for real numbers means $x=a$.
Case 1) is possible (e.g., when $f$ is a linear function) but does not hold for general functions.
Case 2), $x=a$, also makes the first-order term zero, which erases the derivative information that we want to preserve.

So it seems there is no solution to this problem within the real numbers.
However, researchers designed a clever trick to solve it, much like introducing a new quantity $i$ with $i^2=-1$ to build the complex numbers.
The trick is called **dual numbers**.

Say we introduce a new set of numbers in the form of $a + b\epsilon$ where $a, b \in \mathbb{R}$ and $\epsilon^2=0$ but $\epsilon \neq 0$.
$a$ is called the **real part** and $b$ is called the **dual part**.
Then we can rewrite the Taylor series expansion of $f(x)$ as: 

\begin{align}
f(a+b\epsilon) &= f(a) + f'(a)b\epsilon + \sum_{k=2}^{\infty} \frac{f^{(k)}(a)}{k!}b^k\epsilon^k \\
            &= f(a) + f'(a)b\epsilon
\end{align}


If we set $b=1$, then we have:

$$
f(a+\epsilon) = f(a) + f'(a)\epsilon
$$

which means that using dual numbers with a real part $a$ and a dual part $1$, we can get the exact first-order derivative of $f$ at $a$.
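
As a quick worked example (the choice of $f(x) = x^2$ and $a = 3$ is only for illustration):

$$
f(3+\epsilon) = (3+\epsilon)^2 = 9 + 6\epsilon + \epsilon^2 = 9 + 6\epsilon
$$

The real part is $f(3) = 9$ and the dual part is $f'(3) = 6$, with no step size and no truncation.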

### Operations on Dual Numbers

Addition:

$$ (a + b\epsilon) + (c + d\epsilon) = (a+c) + (b+d)\epsilon $$

Multiplication:

$$ (a + b\epsilon) \times (c + d\epsilon) = ac + (ad + bc)\epsilon $$
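
The multiplication rule follows from expanding the product and applying $\epsilon^2 = 0$:

\begin{align}
(a + b\epsilon)(c + d\epsilon) &= ac + ad\epsilon + bc\epsilon + bd\epsilon^2 \\
                               &= ac + (ad + bc)\epsilon
\end{align}

Division is not listed above, but under the same rules it can be derived for $c \neq 0$ by multiplying the numerator and denominator by $c - d\epsilon$:

\begin{align}
\frac{a + b\epsilon}{c + d\epsilon} &= \frac{(a + b\epsilon)(c - d\epsilon)}{(c + d\epsilon)(c - d\epsilon)} \\
                                    &= \frac{ac + (bc - ad)\epsilon}{c^2} \\
                                    &= \frac{a}{c} + \frac{bc - ad}{c^2}\epsilon
\end{align}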

These numbers will "automatically" provide us with the basic rules of differentiation. For example, for two general functions $f$ and $g$:

\begin{align}
f(a+\epsilon) + g(a+\epsilon) &= f(a) + f'(a)\epsilon + g(a) + g'(a)\epsilon \\
                            &= [f(a) + g(a)] + [f'(a) + g'(a)]\epsilon \\
f(a+\epsilon) \times g(a+\epsilon) &= [f(a) + f'(a)\epsilon] \times [g(a) + g'(a)\epsilon] \\
                                 &= [f(a)g(a)] + [f(a)g'(a) + f'(a)g(a)]\epsilon
\end{align}

which simultaneously recover the sum rule and the product rule of differentiation.
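
Composition works the same way. Substituting $g(a+\epsilon) = g(a) + g'(a)\epsilon$ into $f$ and using $f(a+b\epsilon) = f(a) + f'(a)b\epsilon$ with real part $g(a)$ and dual part $g'(a)$ gives the chain rule:

\begin{align}
f(g(a+\epsilon)) &= f(g(a) + g'(a)\epsilon) \\
                 &= f(g(a)) + f'(g(a))g'(a)\epsilon
\end{align}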

## AD: Forward Mode



In [2]:
from numbers import Number
from math import log

# Define a class for dual numbers
# Define operation overloads for the basic arithmetic operations
class DualNumber:

    def __init__(self, real, dual):
        """
        Constructs a dual number: z = real + dual * e: e^2 = 0
        Parameters:
        ----------
        real: Number
            The real component in the dual number
        dual: Number
            The coefficient of the dual component in the number
        """

        self.real = real
        self.dual = dual

    def _add(self, other):
        """
        Defines addition operation logic for dual numbers
        Parameters:
        ----------
        other: DualNumber | Number
            The other dual/real number to add to the current one
        Returns: DualNumber
            A new dual number containing the addition result
        """
        if isinstance(other, DualNumber):
            return DualNumber(self.real + other.real, self.dual + other.dual)
        elif isinstance(other, Number):
            return DualNumber(self.real + other, self.dual)
        else:
            raise TypeError("Unsupported Type for __add__")

    def _sub(self, other, self_first=True):
        """
        Defines subtraction operation logic for dual numbers
        Parameters:
        ----------
        other: DualNumber | Number
            The other dual/real number involved in the subtraction
        self_first: Boolean
            An indicator if the current dual is the first operand
            if True, then the operation is (self - other)
            otherwise, the operation is (other - self)
        Returns: DualNumber
            A new dual number containing the subtraction result
        """
        if self_first and isinstance(other, DualNumber):
            return DualNumber(self.real - other.real, self.dual - other.dual)
        elif self_first and isinstance(other, Number):
            return DualNumber(self.real - other, self.dual)
        elif not self_first and isinstance(other, Number):
            return DualNumber(other - self.real, -1 * self.dual)
        else:
            raise TypeError("Unsupported Type for __sub__")

    def _mul(self, other):
        """
        Defines multiplication operation logic for dual numbers
        Parameters:
        -----------
        other: DualNumber | Number
            The other dual/real number to multiply by
        Returns: DualNumber
            A new dual number containing the multiplication result
        """
        if isinstance(other, DualNumber):
            return DualNumber(self.real * other.real, self.real * other.dual + self.dual * other.real)
        elif isinstance(other, Number):
            return DualNumber(self.real * other, self.dual * other)
        else:
            raise TypeError("Unsupported Type for __mul__")

    def _div(self, other, self_numerator=True):
        """
        Defines division operation logic for dual numbers
        Parameters:
        -----------
        other: DualNumber | Number
            The other dual/real number involved in the division
        self_numerator: Boolean
            A flag determining if the current dual is the numerator in the operation
            if True then the operation is (self / other)
            otherwise, other is the numerator and the operation is (other / self)
        Returns: DualNumber
            A new dual number containing the division result
        """
        if self_numerator and isinstance(other, DualNumber):
            if other.real == 0:
                raise ZeroDivisionError("Attempting to divide by a zero")
            else:
                div_real = self.real / other.real
                # Quotient rule: (b*c - a*d) / c^2 for (a + b e) / (c + d e)
                div_dual = (self.dual * other.real - self.real * other.dual) / other.real ** 2
                return DualNumber(div_real, div_dual)
        elif self_numerator and isinstance(other, Number):
            if other == 0:
                raise ZeroDivisionError("Attempting to divide by a zero")
            else:
                return DualNumber(self.real / other, self.dual / other)
        elif not self_numerator and isinstance(other, Number):
            if self.real == 0:
                raise ZeroDivisionError("Attempting to divide by a zero")
            else:
                return DualNumber(other / self.real, -1 * (other * self.dual) / self.real ** 2)
        else:
            raise TypeError("Unsupported Type for __div__")

    def _pow(self, other, self_base=True):
        """
        Defines exponentiation logic for dual numbers
        Parameters:
        -----------
        other: DualNumber | Number
            The exponent of the operation (or the base if self_base is False)
        self_base: Boolean
            A flag determining if the current dual is the base in the operation
            if True then the operation is (self ^ other)
            otherwise, other is the base and the operation is (other ^ self)
        Returns: DualNumber
            A new dual number containing the exponentiation result
        """
        if self_base and isinstance(other, Number):
            return DualNumber(self.real ** other, self.dual * other * (self.real ** (other - 1)))
        elif self_base and isinstance(other, DualNumber):
            new_real = self.real ** other.real
            new_dual = (self.real ** (other.real - 1)) * (self.real * other.dual * log(self.real) + other.real * self.dual)
            return DualNumber(new_real, new_dual)
        elif not self_base and isinstance(other, Number):
            return DualNumber(other ** self.real, (other ** self.real) * self.dual * log(other))
        else:
            raise TypeError("Unsupported Type for __pow__")


    def __add__(self, other):
        """
        Overloads the + operator for dual numbers
        """
        return self._add(other)

    def __radd__(self, other):
        """
        Overloads the reverse + operator for dual numbers
        """
        return self._add(other)

    def __sub__(self, other):
        """
        Overloads the - operator for dual numbers
        """
        return self._sub(other)

    def __rsub__(self, other):
        """
        Overloads the reverse - operator for dual numbers
        """
        return self._sub(other, self_first=False)


    def __mul__(self, other):
        """
        Overloads the * operator for dual numbers
        """
        return self._mul(other)

    def __rmul__(self, other):
        """
        Overloads the reverse * operator for dual numbers
        """
        return self._mul(other)

    def __truediv__(self, other):
        """
        Overloads the / operator for dual numbers
        """
        return self._div(other)

    def __rtruediv__(self, other):
        """
        Overloads the reverse / operator for dual numbers
        """
        return self._div(other, self_numerator=False)

    def __div__(self, other):
        """
        Overloads the / operator for dual numbers (Python 2 only; Python 3 uses __truediv__)
        """
        return self._div(other)

    def __rdiv__(self, other):
        """
        Overloads the reverse / operator for dual numbers (Python 2 only; Python 3 uses __rtruediv__)
        """
        return self._div(other, self_numerator=False)

    def __pow__(self, other):
        """
        Overloads the ** operator for dual numbers
        """
        return self._pow(other)

    def __rpow__(self, other):
        """
        Overloads the reverse ** operator for dual numbers
        """
        return self._pow(other, self_base=False)

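    def _real_of(self, other):
        """
        Extracts the real part of the other operand for comparisons
        """
        if isinstance(other, DualNumber):
            return other.real
        elif isinstance(other, Number):
            return other
        else:
            raise TypeError("Unsupported Type for comparison")

    # Python 3 ignores __cmp__, so rich comparisons on the real parts are used instead
    def __lt__(self, other):
        return self.real < self._real_of(other)

    def __le__(self, other):
        return self.real <= self._real_of(other)

    def __gt__(self, other):
        return self.real > self._real_of(other)

    def __ge__(self, other):
        return self.real >= self._real_of(other)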

    def __repr__(self):
        """
        Provides the string representation of the dual number
        """
        # Use >= so that a zero dual part prints as "+ 0ɛ" rather than "- 0ɛ"
        return "%s %s %sɛ" % (self.real, '+' if self.dual >= 0 else '-', abs(self.dual))
    

See how it works on a simple example:

In [3]:
import math as rmath

def sin(x):
    if isinstance(x, DualNumber):
        return DualNumber(rmath.sin(x.real), rmath.cos(x.real)*x.dual)
    else:
        return rmath.sin(x)

x = DualNumber(1,1)
sin(x)

0.8414709848078965 + 0.5403023058681398ɛ

Now we can implement the forward mode of AD in Python:

In [8]:

def derivative(fx, x):
    return fx(DualNumber(x, 1)).dual

fx = lambda x: sin(x)
derivative(fx, rmath.pi/2)

6.123233995736766e-17
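
The value above is $\cos(\pi/2) = 0$ up to floating-point rounding. As a further cross-check on a composite function, the sketch below compares the dual-number derivative of $g(x) = x\,e^{\sin x}$ with the hand-derived expression. To keep the block runnable on its own, it uses a compact stand-in class `Dual` (a hypothetical, stripped-down substitute for `DualNumber`) and a lifted `exp` helper that is an assumption, not part of the original notebook.

```python
import math
from dataclasses import dataclass

@dataclass
class Dual:
    """Compact stand-in for DualNumber: real + dual * e, with e^2 = 0."""
    real: float
    dual: float = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.real + other.real, self.dual + other.dual)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # The product rule lives in the dual part: a*d + b*c
        return Dual(self.real * other.real,
                    self.real * other.dual + self.dual * other.real)
    __rmul__ = __mul__

def sin(x):
    # sin(a + b e) = sin(a) + cos(a) * b e
    return Dual(math.sin(x.real), math.cos(x.real) * x.dual)

def exp(x):
    # exp(a + b e) = exp(a) + exp(a) * b e
    return Dual(math.exp(x.real), math.exp(x.real) * x.dual)

def derivative(fx, x):
    # Seed the dual part with 1 and read the derivative off the result
    return fx(Dual(x, 1.0)).dual

g = lambda x: x * exp(sin(x))
# Hand-derived: g'(x) = exp(sin x) * (1 + x cos x)
g_prime = lambda x: math.exp(math.sin(x)) * (1 + x * math.cos(x))

print(derivative(g, 0.7), g_prime(0.7))
```

The two printed values should agree to within floating-point rounding, because each lifted function applies the chain rule to its own dual part.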

So far, we can only compute derivatives of single-variable functions, but deep learning also needs gradients, which require partial derivatives.
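
One common way to get partial derivatives out of forward mode is to run one pass per input, seeding the dual part of that input with 1 and all others with 0. The sketch below assumes that approach; `Dual` is again a hypothetical compact stand-in for `DualNumber`, and `gradient` is an illustrative helper name rather than anything defined in the original notebook.

```python
from dataclasses import dataclass

@dataclass
class Dual:
    """Compact stand-in for DualNumber: real + dual * e, with e^2 = 0."""
    real: float
    dual: float = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.real + other.real, self.dual + other.dual)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.real * other.real,
                    self.real * other.dual + self.dual * other.real)
    __rmul__ = __mul__

def gradient(f, point):
    """Forward-mode gradient: one function evaluation per input variable."""
    grads = []
    for i in range(len(point)):
        # Seed: dual part is 1 for variable i and 0 for every other variable
        args = [Dual(v, 1.0 if j == i else 0.0) for j, v in enumerate(point)]
        grads.append(f(*args).dual)
    return grads

# f(x, y) = x^2 * y + 3y, so df/dx = 2xy and df/dy = x^2 + 3
f = lambda x, y: x * x * y + 3 * y
print(gradient(f, [2.0, 5.0]))  # [20.0, 7.0]
```

The cost grows with the number of inputs, since each partial derivative needs its own pass.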
