In [36]:
from collections import defaultdict
from typing import Callable, Tuple

# Gradient descent and automatic differentiation

### Exercise 1:

Implement gradient descent by completing the scheme below, then try it on a few functions.

In [52]:
def gradient_descent_1dim(
    f: Callable[[float], float],
    grad_f: Callable[[float], float],
    num_steps=50,
    step_size=1e-4,
    initial_x: float = 0
) -> Tuple[float, float]:
    stop_at_gradient_below = 1e-3
    current_pos = initial_x
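    for n in range(num_steps):
        # TODO: compute the gradient at current_pos; if its absolute value is
        # below stop_at_gradient_below, return (current_pos, f(current_pos)),
        # otherwise move current_pos by a step of size step_size against the gradient.
        ...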

    print(f"Failed to converge after {num_steps} steps, returning current candidates for argmin and min")
    return current_pos, f(current_pos)

In [53]:
def f1(x: float):
    return (6*x -2)**4 - 2*x


def grad_f1(x: float):
    return 4 * ((6*x - 2)**3) * 6 - 2
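
Hand-derived gradients such as `grad_f1` are easy to get wrong. A quick sanity check, sketched below, compares them with a central finite-difference approximation $(f(x+h) - f(x-h)) / 2h$. The helper `central_difference` and the step `h = 1e-5` are our own illustrative choices, not part of the exercise.

```python
def f1(x: float) -> float:
    return (6 * x - 2) ** 4 - 2 * x


def grad_f1(x: float) -> float:
    return 4 * ((6 * x - 2) ** 3) * 6 - 2


def central_difference(f, x: float, h: float = 1e-5) -> float:
    """Hypothetical helper: approximate f'(x) by (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)


# Compare the hand-written derivative with the numerical approximation.
for x in [0.0, 0.25, 0.5, 1.0]:
    numeric = central_difference(f1, x)
    analytic = grad_f1(x)
    print(f"x={x}: analytic={analytic:.4f}, numeric={numeric:.4f}")
```

If the two columns disagree noticeably, the hand-written gradient is the first suspect.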

In [54]:
gradient_descent_1dim(f1, grad_f1, num_steps=1000, step_size=2e-3)

Failed to converge after 1000 steps, returning current candidates for argmin and min


(0, 16)
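
The output above comes from the unfinished template. One possible way to fill in the loop is sketched below, using the stopping rule the template already sets up (stop once $|f'(x)|$ drops below `stop_at_gradient_below`) and the update $x \leftarrow x - \eta f'(x)$ with $\eta$ = `step_size`. The name `gradient_descent_1dim_sketch` is our own, chosen so that it does not overwrite the exercise template.

```python
from typing import Callable, Tuple


def gradient_descent_1dim_sketch(
    f: Callable[[float], float],
    grad_f: Callable[[float], float],
    num_steps: int = 50,
    step_size: float = 1e-4,
    initial_x: float = 0.0,
) -> Tuple[float, float]:
    """Plain 1-D gradient descent: x <- x - step_size * f'(x)."""
    stop_at_gradient_below = 1e-3
    current_pos = initial_x
    for n in range(num_steps):
        gradient = grad_f(current_pos)
        # A (near-)zero derivative means we have reached a stationary point.
        if abs(gradient) < stop_at_gradient_below:
            print(f"Converged after {n} steps")
            return current_pos, f(current_pos)
        # Step against the gradient, i.e. downhill.
        current_pos = current_pos - step_size * gradient

    print(f"Failed to converge after {num_steps} steps, returning current candidates for argmin and min")
    return current_pos, f(current_pos)


def f1(x: float) -> float:
    return (6 * x - 2) ** 4 - 2 * x


def grad_f1(x: float) -> float:
    return 4 * ((6 * x - 2) ** 3) * 6 - 2


x_min, f_min = gradient_descent_1dim_sketch(f1, grad_f1, num_steps=1000, step_size=2e-3)
print(x_min, f_min)
```

With these settings the iterates should settle near the stationary point of `f1`, where $24(6x-2)^3 = 2$, i.e. $x = (2 + 12^{-1/3})/6 \approx 0.406$.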

Our main problem is that we have to supply the gradient ourselves. Instead, we would like to pass only the function
and have its gradient computed for us automatically. The technique we are going to use is at the very core of most
machine learning libraries: _automatic differentiation_, or _autodiff_. In essence, it builds complicated functions by
composing simple ones whose derivatives are known, and then uses the chain rule to combine those derivatives.
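
For a concrete instance, the hand-written `grad_f1` from Exercise 1 is exactly what the chain rule gives when $(6x-2)^4$ is viewed as the outer function $u \mapsto u^4$ applied to the inner function $u = 6x - 2$:

$$
\frac{d}{dx}\Big[(6x-2)^4 - 2x\Big]
= \underbrace{4(6x-2)^3}_{\text{outer derivative at } u} \cdot \underbrace{6}_{du/dx} - 2
= 24\,(6x-2)^3 - 2 .
$$

Autodiff automates this bookkeeping for arbitrary compositions of simple operations.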

Our presentation is largely based on [this post](https://sidsite.com/posts/autodiff/).

In [55]:
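class Variable:
    """A scalar node in the computation graph.

    `local_gradients` holds (child_variable, local_derivative) pairs: the derivative
    of this variable with respect to each variable it was directly computed from.
    """
    def __init__(self, value: float, local_gradients=()):
        self.value = value
        self.local_gradients = local_gradients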

def add(a: Variable, b: Variable):
    """Create the variable that results from adding two variables."""
    value = a.value + b.value
    local_gradients = (
        (a, 1),  # the local derivative with respect to a is 1
        (b, 1)   # the local derivative with respect to b is 1
    )
    return Variable(value, local_gradients)

def mul(a: Variable, b: Variable):
    """Create the variable that results from multiplying two variables."""
    value = a.value * b.value
    local_gradients = (
        (a, b.value), # the local derivative with respect to a is b.value
        (b, a.value)  # the local derivative with respect to b is a.value
    )
    return Variable(value, local_gradients)


def get_gradients(variable: Variable):
    """ Compute the first derivatives of `variable`
    with respect to child variables.
    """
    gradients = defaultdict(lambda: 0)

    def compute_gradients_recursively(var: Variable, path_value: float):
        for child_variable, local_gradient in var.local_gradients:
            # Multiply the edges of a path
            value_of_path_to_child = path_value * local_gradient
            # Add together the different paths
            gradients[child_variable] += value_of_path_to_child
            # recurse through graph
            compute_gradients_recursively(child_variable, value_of_path_to_child)

    compute_gradients_recursively(variable, path_value=1)
    # (path_value=1 is from `variable` differentiated w.r.t. itself)
    return dict(gradients)

## Note:

The code above may contain elements that are new to you: classes, `defaultdict`, and recursive functions. Make sure
you understand them before you proceed (or ask me for support).


In [56]:
a = Variable(4)
b = Variable(12)

d = mul(add(a, b), add(a, b))
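
Since `add(a, b)` is called twice, `d` is built from two separate intermediate variables, which we label $s_1$ and $s_2$ here; both have value $a + b = 16$. `get_gradients` multiplies the local derivatives along each path from `d` down to `a` and sums over the paths:

$$
\frac{\partial d}{\partial a}
= \frac{\partial d}{\partial s_1}\,\frac{\partial s_1}{\partial a}
+ \frac{\partial d}{\partial s_2}\,\frac{\partial s_2}{\partial a}
= s_2 \cdot 1 + s_1 \cdot 1 = 16 + 16 = 32 ,
$$

which agrees with differentiating $d = (a+b)^2$ directly: $2(a+b) = 32$. By symmetry, $\partial d / \partial b = 32$ as well.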

In [57]:
get_gradients(d)
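
`get_gradients` returns a dictionary keyed by `Variable` objects, which print as opaque object addresses. The self-contained restatement below (the definitions above with comments trimmed) shows how to read off the derivatives by indexing with the variables we kept references to.

```python
from collections import defaultdict


class Variable:
    def __init__(self, value: float, local_gradients=()):
        self.value = value
        self.local_gradients = local_gradients


def add(a: Variable, b: Variable) -> Variable:
    return Variable(a.value + b.value, ((a, 1), (b, 1)))


def mul(a: Variable, b: Variable) -> Variable:
    return Variable(a.value * b.value, ((a, b.value), (b, a.value)))


def get_gradients(variable: Variable) -> dict:
    gradients = defaultdict(lambda: 0)

    def compute_gradients_recursively(var: Variable, path_value: float):
        for child_variable, local_gradient in var.local_gradients:
            value_of_path_to_child = path_value * local_gradient
            gradients[child_variable] += value_of_path_to_child
            compute_gradients_recursively(child_variable, value_of_path_to_child)

    compute_gradients_recursively(variable, path_value=1)
    return dict(gradients)


a = Variable(4)
b = Variable(12)
d = mul(add(a, b), add(a, b))

gradients = get_gradients(d)
# Keys are Variable objects (hashed by identity), so look up the ones we kept.
print(gradients[a], gradients[b])  # 32 32
# The two intermediate add(a, b) nodes are keys too, giving 4 entries in total.
print(len(gradients))  # 4
```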

### Exercise 2:

Implement gradient descent with autodiff for simple functions. Add more operations, such as powers and multiplication
by a constant (not covered in the post linked above), and use Python's magic methods to overload operators like `+` and `*`.
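
One possible sketch for Exercise 2 is shown below. It assumes constant (non-`Variable`) exponents only and treats plain numbers as constant leaf `Variable`s. The helper `_wrap` and the function `gradient_descent_autodiff` are our own hypothetical names, not taken from the linked post. With the operators overloaded, `f1` can be written exactly as in Exercise 1 and differentiated automatically.

```python
from collections import defaultdict
from typing import Dict, Union

Number = Union[int, float]


class Variable:
    """Scalar graph node with overloaded operators (sketch).

    No __eq__ override: Variables hash by identity, which get_gradients relies on.
    """

    def __init__(self, value: float, local_gradients=()):
        self.value = value
        # Tuple of (child Variable, d(self)/d(child)) pairs
        self.local_gradients = local_gradients

    @staticmethod
    def _wrap(other) -> "Variable":
        # Plain numbers become constant leaves; we never ask for their gradients.
        return other if isinstance(other, Variable) else Variable(other)

    def __add__(self, other):
        other = Variable._wrap(other)
        return Variable(self.value + other.value, ((self, 1), (other, 1)))

    __radd__ = __add__  # used for number + Variable

    def __mul__(self, other):
        other = Variable._wrap(other)
        return Variable(self.value * other.value,
                        ((self, other.value), (other, self.value)))

    __rmul__ = __mul__  # used for number * Variable, e.g. 6 * x

    def __neg__(self):
        return self * -1

    def __sub__(self, other):
        return self + (-Variable._wrap(other))

    def __rsub__(self, other):
        return Variable._wrap(other) + (-self)

    def __pow__(self, exponent: Number):
        # Constant exponent only: d(x**n)/dx = n * x**(n - 1)
        return Variable(self.value ** exponent,
                        ((self, exponent * self.value ** (exponent - 1)),))


def get_gradients(variable: Variable) -> Dict[Variable, float]:
    gradients = defaultdict(lambda: 0)

    def compute_gradients_recursively(var: Variable, path_value: float):
        for child, local_gradient in var.local_gradients:
            value_of_path_to_child = path_value * local_gradient
            gradients[child] += value_of_path_to_child
            compute_gradients_recursively(child, value_of_path_to_child)

    compute_gradients_recursively(variable, path_value=1)
    return dict(gradients)


def gradient_descent_autodiff(f, num_steps=1000, step_size=2e-3, initial_x=0.0, tol=1e-3):
    """Hypothetical helper: 1-D gradient descent where f'(x) comes from autodiff."""
    x = initial_x
    for _ in range(num_steps):
        x_var = Variable(x)
        # Build the graph of f at x and read off d f / d x.
        grad = get_gradients(f(x_var)).get(x_var, 0)
        if abs(grad) < tol:
            break
        x = x - step_size * grad
    return x, f(Variable(x)).value


def f1(x):
    return (6 * x - 2) ** 4 - 2 * x


x_min, f_min = gradient_descent_autodiff(f1)
print(x_min, f_min)
```

Python falls back to `__radd__`, `__rmul__` and `__rsub__` when the left operand is a plain number, which is why expressions such as `6 * x - 2` work with `x` a `Variable`. Since the autodiff gradient matches `grad_f1` up to rounding, this should land near the same minimizer as in Exercise 1, $x \approx 0.406$.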
