# *Iris* flower classification with micrograd
In this notebook, you will implement a 2-layer (4-16-3) fully connected
feed-forward neural network to classify species of *Iris* flowers, based on
four different measurements (sepal length, sepal width, petal length, and petal
width).

In [10]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import numpy as np
import random
import micrograd.nn as nn
from micrograd.engine import Value
import torch
%matplotlib inline

np.random.seed(42)
random.seed(42)

First, let's set up our dataset.

The "x" variables are the flower measurements (four features per flower), while
the "y" variables are the ground-truth species labels (0, 1, or 2).

In [11]:
iris = load_iris()
train_x, test_x, train_y, test_y = train_test_split(iris.data, iris.target, test_size=0.2)
print(f"{train_x.shape=}\n{train_y.shape=}\n{test_x.shape=}\n{test_y.shape=}")

train_x.shape=(120, 4)
train_y.shape=(120,)
test_x.shape=(30, 4)
test_y.shape=(30,)


Now that we have our data loaded, we can initialize our model, using the
abstractions we wrote in `micrograd/nn.py` (imported above as `nn`).

Remember that we are looking to create a multi-layer perceptron with an input
layer of dimension 4, one hidden layer of dimension 16, and an output layer of
dimension 3.

In [12]:
model = nn.MLP(4, [16, 3])
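
As a rough sanity check on the size of this network, the sketch below counts its trainable parameters. It assumes every neuron holds one weight per input plus one bias, which is how the reference micrograd `Neuron` is built; if your `micrograd/nn.py` differs, the count will too. The helper name `count_mlp_parameters` is made up for illustration.

```python
def count_mlp_parameters(n_inputs: int, layer_sizes: list[int]) -> int:
    """Count weights and biases of a fully connected network.

    Assumption: each neuron has one weight per input plus a single bias.
    """
    total = 0
    fan_in = n_inputs
    for n_neurons in layer_sizes:
        total += n_neurons * (fan_in + 1)  # weights + bias for every neuron
        fan_in = n_neurons  # this layer's outputs feed the next layer
    return total


print(count_mlp_parameters(4, [16, 3]))  # 16*(4+1) + 3*(16+1) = 131
```

Under that assumption, `len(model.parameters())` should report the same number.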

Now, let's try evaluating our model on the first flower in our training set.

In [13]:
model(train_x[0])

[Value(data=1.3497568231790034, grad=0),
 Value(data=-10.558013135005577, grad=0),
 Value(data=1.1832076310091515, grad=0)]

Now, of course, this returns total garbage, because the model's weights are
still just randomly initialized. First, let's make sure that our model
actually outputs a list of probabilities, one for each species. We will use the
softmax function to do this:
$$\sigma(z)_i=\frac{\exp(z_i)}{\sum_{c=1}^C\exp(z_c)}$$
Where $C$ is the number of classes (3). Implement this function below. Remember,
you are working with `Value`s, so you can only use the operations you defined
in `micrograd/engine.py`.

In [14]:
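def softmax(z: list[Value], C: int = 3) -> list[Value]:
  # C documents the number of classes; the sum simply runs over all of z
  exps = [zi.exp() for zi in z]  # exponentiate each score once
  denom = sum(exps)
  return [e / denom for e in exps]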

Now we can try evaluating the model again, with our new softmax function.

In [15]:
softmax(model(train_x[0]))

[Value(data=0.5415393414771973, grad=0),
 Value(data=3.6488098187818702e-06, grad=0),
 Value(data=0.458457009712984, grad=0)]
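
To double-check these numbers outside of micrograd, here is a minimal plain-float sketch of the same formula. It is not the notebook's `Value`-based implementation: the helper name `softmax_floats` is made up, and it subtracts the largest score before exponentiating, a common trick that leaves the result unchanged but avoids overflow for large scores.

```python
import math


def softmax_floats(logits: list[float]) -> list[float]:
    """Softmax on plain floats, shifted by the max score for numerical stability."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    denom = sum(exps)
    return [e / denom for e in exps]


# Raw scores printed by the untrained model for the first training flower.
logits = [1.3497568231790034, -10.558013135005577, 1.1832076310091515]
probs = softmax_floats(logits)
print(probs)
print(sum(probs))  # should be 1.0 up to floating-point rounding
```

The printed probabilities should agree with the `Value` outputs above to within rounding error.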

This looks a little better, doesn't it? Of course, it's still not an accurate
prediction, but at least it is a prediction. Now we need to worry about training
the model. 

In order to train the model, we first need a way to measure how wrong
its predictions are; that is, we need to define our cost (loss) function. We will
use cross-entropy loss, which is defined by the following equation:
$$\ell (H(x),y)=-\log\frac{\exp(H(x)_{y})}{\sum_{c=1}^C\exp(H(x)_c)}$$
Where $x$ is the input (the four measurements of one flower), $H$ is our model,
$y$ is the ground truth corresponding to $x$, and $C$ is the number of classes
(3). But note that this is the same as:
$$\ell (H(x),y)=-\log(\sigma(H(x))_y)$$
Where $\sigma$ is softmax from above.

Remember that since you are working with `Value`s, you only have access to the
operations defined in `micrograd/engine.py`.

In [16]:
def loss(pred: list[Value], truth: int) -> Value:
  return -(softmax(pred)[truth].log())
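
For intuition about the scale of this loss, the following plain-float sketch evaluates the same quantity as $\log\sum_c\exp(H(x)_c) - H(x)_y$, which is algebraically identical to $-\log(\sigma(H(x))_y)$. The helper name `cross_entropy_floats` is hypothetical, and the max-shift is a stability trick that is not part of the notebook's `Value`-based code.

```python
import math


def cross_entropy_floats(logits: list[float], truth: int) -> float:
    """-log(softmax(logits)[truth]), computed as logsumexp(logits) - logits[truth]."""
    shift = max(logits)
    log_sum_exp = shift + math.log(sum(math.exp(v - shift) for v in logits))
    return log_sum_exp - logits[truth]


# A model that scores all three classes equally is maximally unsure,
# so its loss is log(3) whichever class is correct.
print(cross_entropy_floats([0.0, 0.0, 0.0], truth=1))
print(math.log(3))
```

A loss well below $\log 3 \approx 1.10$ therefore means the model is doing better than an uninformed uniform guess, and a loss above it means the model is confidently wrong on some examples.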

Now that our loss function is defined, finally, we can write the code for
gradient descent.

In [17]:
def argmax(arr: list[Value]) -> int:
  return arr.index(max(arr, key=lambda x: x.data))

for epoch in range(100):
  # forward
  scores = list(map(lambda x: model(x), train_x))
  losses = list(map(lambda tup: loss(tup[1], train_y[tup[0]]), enumerate(scores)))
  total_loss = sum(losses) * (1.0 / len(losses))

  # backward
  model.zero_grad()
  total_loss.backward()

  # update (full-batch gradient descent with a linearly decaying learning rate)
  learning_rate = (1.0 - 0.9*epoch/100) / 10
  for p in model.parameters():
    p.data -= learning_rate * p.grad

  if epoch % 10 == 0:
    # accuracy is measured on the held-out test set
    accuracy = sum([argmax(softmax(model(x))) == y for x, y in zip(test_x, test_y)]) / len(test_y)
    print(f"Epoch {epoch}: Accuracy={accuracy}, Loss={total_loss.data}")

Epoch 0: Accuracy=0.36666666666666664, Loss=7.824446113869851
Epoch 10: Accuracy=0.7, Loss=0.5285752809681077
Epoch 20: Accuracy=0.7, Loss=0.43878918785606336
Epoch 30: Accuracy=0.8333333333333334, Loss=0.3698044685069986
Epoch 40: Accuracy=1.0, Loss=0.3311467328890145
Epoch 50: Accuracy=1.0, Loss=0.30756669967310735
Epoch 60: Accuracy=1.0, Loss=0.2897295185150001
Epoch 70: Accuracy=1.0, Loss=0.27605731900340535
Epoch 80: Accuracy=1.0, Loss=0.2659335484175744
Epoch 90: Accuracy=1.0, Loss=0.25890412924031925
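
The step size in the loop above is not constant: the expression `(1.0 - 0.9*epoch/100) / 10` shrinks linearly as training progresses. The sketch below isolates that schedule so its endpoints are easy to inspect; the function name `learning_rate_at` and the `total_epochs` parameter are introduced here for illustration only.

```python
def learning_rate_at(epoch: int, total_epochs: int = 100) -> float:
    """Linear decay from the training loop: (1.0 - 0.9 * epoch / total_epochs) / 10."""
    return (1.0 - 0.9 * epoch / total_epochs) / 10


# Start, midpoint, and final epoch of the 100-epoch run.
for epoch in (0, 50, 99):
    print(epoch, learning_rate_at(epoch))
```

The rate starts at 0.1 and ends near 0.011, so early updates move the weights quickly while later ones make smaller adjustments.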


In [18]:
print("Final accuracy:", sum([argmax(softmax(model(x))) == y for x, y in zip(test_x, test_y)])/len(test_y))

Final accuracy: 1.0
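
The accuracy expression above packs several steps into one line. As a hedged illustration of what it computes, here is a plain-float sketch with made-up scores; `argmax_floats`, `accuracy`, and the toy data are hypothetical and stand in for the model's outputs. Because softmax preserves the ordering of the scores, taking the argmax of the raw scores gives the same prediction as taking it after softmax.

```python
def argmax_floats(scores: list[float]) -> int:
    """Index of the largest score (the first one wins on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def accuracy(score_rows: list[list[float]], labels: list[int]) -> float:
    """Fraction of rows whose highest-scoring class matches the label."""
    correct = sum(argmax_floats(row) == y for row, y in zip(score_rows, labels))
    return correct / len(labels)


# Made-up scores for three flowers; the last one is misclassified.
toy_scores = [[2.0, 0.1, -1.0], [0.3, 1.5, 0.2], [0.9, 0.8, 0.1]]
toy_labels = [0, 1, 2]
print(accuracy(toy_scores, toy_labels))  # 2 of 3 predictions are correct
```

Keep in mind that the test set here holds only 30 flowers, so each misclassified flower changes the reported accuracy by about 3.3 percentage points.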
