<a href="https://colab.research.google.com/github/danieljackson007/learning-nlp/blob/main/micrograd_exercises.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# micrograd exercises

1. watch the [micrograd video](https://www.youtube.com/watch?v=VMj-3S1tku0) on YouTube
2. come back and complete these exercises to level up :)

credit for the solutions: spolivin on GitHub

## section 1: derivatives

In [20]:
# here is a mathematical expression that takes 3 inputs and produces one output
from math import sin, cos

def f(a, b, c):
  return -a**3 + sin(3*b) - 1.0/c + b**2.5 - a**0.5

print(f(2, 3, 4))

6.336362190988558


## Computing analytical gradient

we need to compute the derivatives with respect to all parameters of the following function:

$$f(a, b, c) = -a^{3} + \sin(3b) - \frac{1}{c} + b^{2.5} - a^{0.5}$$

just use partial differentiation to calculate analytical gradients of $f$ as follows:

$$\frac{df}{da} = -3a^{2} - 0.5a^{-0.5}$$

$$\frac{df}{db} = 3\cos(3b) + 2.5b^{1.5}$$

$$\frac{df}{dc} = c^{-2}$$
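
as a sanity check, here is how two of the less obvious terms work out (a sketch of the intermediate steps, using only the power rule and the chain rule):

$$\frac{d}{dc}\left(-\frac{1}{c}\right) = \frac{d}{dc}\left(-c^{-1}\right) = c^{-2}$$

$$\frac{d}{db}\sin(3b) = \cos(3b) \cdot \frac{d}{db}(3b) = 3\cos(3b)$$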

now we can use these formulas to compute all gradients required in code:

In [23]:
# write the function gradf that returns the analytical gradient of f
# i.e. use your skills from calculus to take the derivative, then implement the formula
# if you do not know calculus then feel free to ask wolframalpha, e.g.:
# https://www.wolframalpha.com/input?i=d%2Fda%28sin%283*a%29%29%29

def gradf(a, b, c):
  df_da = -3*a**2 - 0.5*a**-0.5
  df_db = 3*cos(3*b) + 2.5*b**1.5
  df_dc = c**-2
  return [df_da, df_db, df_dc] # [df/da, df/db, df/dc]

# expected answer is the list of
ans = [-12.353553390593273, 10.25699027111255, 0.0625]
yours = gradf(2, 3, 4)
for dim in range(3):
  ok = 'OK' if abs(yours[dim] - ans[dim]) < 1e-5 else 'WRONG!'
  print(f"{ok} for dim {dim}: expected {ans[dim]}, yours returns {yours[dim]}")


OK for dim 0: expected -12.353553390593273, yours returns -12.353553390593273
OK for dim 1: expected 10.25699027111255, yours returns 10.25699027111255
OK for dim 2: expected 0.0625, yours returns 0.0625


## Computing numerical gradient (Derivative definition)

next task is to compute the gradient numerically by using the definition of the derivative:

$$f'(a) = \lim_{h \rightarrow 0} \frac{f(a + h) - f(a)}{h}$$

this essentially means that by considering smaller and smaller values of $h$, we will get better and better approximations of the derivative at some point $a$.
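
to see this in practice, here is a small sketch that shrinks $h$ step by step and compares the forward-difference estimate of $df/da$ against the analytical value at $(2, 3, 4)$. the helper `forward_diff_da` and the list `errors` are names made up for this illustration; `f` is repeated so the snippet runs on its own.

```python
from math import sin

def f(a, b, c):
    return -a**3 + sin(3*b) - 1.0/c + b**2.5 - a**0.5

# hypothetical helper: forward-difference estimate of df/da
def forward_diff_da(a, b, c, h):
    return (f(a + h, b, c) - f(a, b, c)) / h

# analytical df/da at a = 2, from the formula derived earlier
exact = -3 * 2**2 - 0.5 * 2**-0.5

errors = []  # errors[k - 1] holds the absolute error for h = 10**-k
for k in range(1, 9):
    h = 10.0 ** -k
    err = abs(forward_diff_da(2, 3, 4, h) - exact)
    errors.append(err)
    print(f"h = 1e-{k}: absolute error = {err:.3e}")
```

one caveat: in floating-point arithmetic the error does not keep shrinking forever. once $h$ gets very small, round-off in `f(a + h) - f(a)` starts to dominate, so the choice of $h$ is a trade-off rather than "the smaller the better".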

In [28]:
# now estimate the gradient numerically without any calculus, using
# the approximation we used in the video.
# you should not call the function gradf from the last cell

h = 1*10**-8

a = 2; b = 3; c = 4

df_da = (f(a + h, b, c) - f(a, b, c)) / h
df_db = (f(a, b + h, c) - f(a, b, c)) / h
df_dc = (f(a, b, c + h) - f(a, b, c)) / h

# -----------
numerical_grad = [df_da, df_db, df_dc] # TODO
# -----------

for dim in range(3):
  ok = 'OK' if abs(numerical_grad[dim] - ans[dim]) < 1e-5 else 'WRONG!'
  print(f"{ok} for dim {dim}: expected {ans[dim]}, yours returns {numerical_grad[dim]}")


OK for dim 0: expected -12.353553390593273, yours returns -12.353553380251014
OK for dim 1: expected 10.25699027111255, yours returns 10.256990368162633
OK for dim 2: expected 0.0625, yours returns 0.0624999607623522


## section 2: support for softmax

In [30]:
# Value class starter code, with many functions taken out
from math import exp, log

class Value:

  def __init__(self, data, _children=(), _op='', label=''):
    # put a docstring for good luck
    '''initializes the object of Value class'''

    # attribute for storing values
    self.data = data

    # attribute for storing gradients
    self.grad = 0.0

    # attribute for storing backprop function
    self._backward = lambda: None

    # attribute for storing children nodes in the computational graph
    self._prev = set(_children)

    # attribute for storing a label for the operation
    self._op = _op

    # attribute for storing a label of the Value instance
    self.label = label

  def __repr__(self):
    '''returns a string representation of the object'''
    return f"Value(data={self.data})"

  def __add__(self, other): # exactly as in the video
    '''adds values of two instances together'''

    # converts a number to Value object if it isn't "value" already
    other = other if isinstance(other, Value) else Value(other)

    # does the addition
    out = Value(self.data + other.data, (self, other), '+')

    # computes the gradient for the addition operation
    def _backward():
      self.grad += 1.0 * out.grad
      other.grad += 1.0 * out.grad
    out._backward = _backward

    return out

  # ------
  # re-implement all the other functions needed for the exercises below
  # your code here
  # TODO
  # ------

  def __mul__(self, other):
    '''multiplies values of two instances together'''

    # converts a number to Value object if needed
    other = other if isinstance(other, Value) else Value(other)

    # does the multiplication
    out = Value(self.data * other.data, (self, other), '*')

    # computes gradients for the multiplication op
    def _backward():
      self.grad += other.data * out.grad
      other.grad += self.data * out.grad
    out._backward = _backward

    return out

  def __rmul__(self, other):
    return self * other

  def __neg__(self):
    return self * -1

  def __pow__(self, other):
    '''computes value of one instance to the power of another instance'''
    # makes sure that the values supplied are integers or floats
    assert isinstance(other, (int, float)), "must be int or float"

    # power operation: the exponent is a plain number, so self is the only child
    # node; the f-string records the op label, e.g. '**2'
    out = Value(self.data ** other, (self,), f'**{other}')

    # computes gradient for power op
    def _backward():
      self.grad += other * (self.data ** (other - 1)) * out.grad # this is the power rule * out.grad for the chain rule
    out._backward = _backward

    return out

  def __truediv__(self, other): # Python 3 calls __truediv__ for the / operator
    return self * other**-1

  def __sub__(self, other): # self - other, built from addition and negation
    return self + (-other)

  def __rsub__(self, other): # reverse subtraction --> self.__rsub__(other) = other - self
    return (-self) + other

  def exp(self):
    '''computes e**x for this value'''
    x = self.data
    out = Value(exp(x), (self, ), 'exp')

    # compute gradient
    def _backward():
      self.grad += out.data * out.grad # d/dx(e^x) = e^x, which is already stored in out.data
    out._backward = _backward

    return out

  def log(self):
    '''takes the natural log'''
    x = self.data
    out = Value(log(x), (self, ), 'log')

    # compute gradient for the natural log op
    def _backward():
      self.grad += (1/x) * out.grad # d/dx(ln x) = 1/x
    out._backward = _backward

    return out

  def backward(self): # exactly as in video
    topo = []
    visited = set()
    def build_topo(v):
      if v not in visited:
        visited.add(v)
        for child in v._prev:
          build_topo(child)
        topo.append(v)
    build_topo(self)

    self.grad = 1.0
    for node in reversed(topo):
      node._backward()
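
the `backward` method above depends on visiting nodes in reverse topological order, so that a node's gradient is complete before it is passed on to its inputs. here is a minimal standalone sketch of just that ordering step. the toy `graph` of strings and the function `topo_order` are made up for illustration; the dict plays the role of `Value._prev`.

```python
# hypothetical toy graph: each node maps to the nodes it was computed from
graph = {
    "a": [],
    "b": [],
    "c": ["a", "b"],  # e.g. c = a * b
    "d": ["c", "a"],  # e.g. d = c + a
}

def topo_order(root, graph):
    order, visited = [], set()
    def visit(node):
        if node not in visited:
            visited.add(node)
            for child in graph[node]:
                visit(child)
            order.append(node)  # appended only after all of its inputs
    visit(root)
    return order

order = topo_order("d", graph)
print(order)                  # inputs first, output last
print(list(reversed(order)))  # the order in which _backward would be called
```

note that `a` feeds into both `c` and `d`, which is why `_backward` accumulates with `+=` instead of assigning: each use of a node contributes its own share of the gradient.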