# Micrograd - Arnie's version

I built this project as a way to solidify my understanding of the Stanford/DeepLearningAI Coursera course.

Before taking the class (June start - July end), I had watched Andrej Karpathy's Micrograd lecture (around May), but had not revisited it since.

To really test my understanding, I built this out *without* re-watching Karpathy's lecture. That is, I started from a rough understanding of autograd and reasoned from first principles about differentiation, partial derivatives, the chain rule, software design, backpropagation, linear regression, optimizers, and loss functions.

Note:
- I did have a working micrograd implementation; however, my loss was incorrect.
- After manually calculating the backprop and still not finding the cause, I skimmed through Karpathy's micrograd code.
- I took the opportunity to:
  - Add methods I hadn't thought of, like `__sub__`, `__rsub__`, ....
  - Clean up my implementation of Neuron (notably, passing in input_feature and a single x sample, rather than x as an array itself)

In the end, I realized the problem was that I forgot to call zero_grad(), which led to the loss bouncing because the grads were compounding across iterations.
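To make the effect of a missing gradient reset concrete, here is a minimal sketch in plain Python. It does not use this project's classes: `Param`, `backward`, and `train` are hypothetical names, and the single-sample squared-error problem is chosen only for illustration. The one assumption carried over is that gradients are accumulated with `+=`, as is typical for reverse-mode autograd.

```python
# Minimal stand-in for a parameter whose gradient accumulates with "+=",
# the way reverse-mode autograd engines typically do it.
class Param:
    def __init__(self, value):
        self.value = value
        self.grad = 0.0


def backward(w, x, y):
    """Accumulate d/dw of the squared error (w*x - y)**2 into w.grad."""
    w.grad += 2.0 * (w.value * x - y) * x


def train(steps, lr, zero_grad):
    """Run gradient descent on one sample (x=3, y=6) and return the loss history."""
    w = Param(0.0)
    losses = []
    for _ in range(steps):
        losses.append((w.value * 3.0 - 6.0) ** 2)
        if zero_grad:
            w.grad = 0.0        # reset before computing fresh gradients
        backward(w, 3.0, 6.0)
        w.value -= lr * w.grad  # plain gradient descent step
    return losses


with_reset = train(steps=20, lr=0.01, zero_grad=True)
without_reset = train(steps=20, lr=0.01, zero_grad=False)

print("with zero_grad:   ", [round(l, 3) for l in with_reset[:6]])
print("without zero_grad:", [round(l, 3) for l in without_reset[:6]])
```

With the reset, the loss shrinks on every step. Without it, each update uses the running sum of all past gradients, so the parameter overshoots the optimum and the loss goes up and down instead of settling.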

In principle, I don't take shortcuts, but given my urgency to progress as fast as possible in ML and since I already had a working implementation (with correct intuition), it did not make sense to spend more time debugging (I can always do that later).

Extra notes: Obviously I did not use Copilot xD.

## Operation Tests

In [1]:
from micrograd.engine.value import Value

a = Value(6.0)
b = Value(3.0)

c = a * b
print(c)

d = a / b
print(d)

e = Value(100.0)
f = e.log()
# print(f)

Val(value=18.0000, grad=0.0000, parents=(6.0 * 3.0))
Val(value=2.0000, grad=0.0000, parents=(6.0 * 0.3333333333333333))


In [2]:
f = f.log()
# print(f)
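
The `parents=(6.0 * 0.3333333333333333)` in the output above suggests that division is stored as multiplication by a reciprocal. The sketch below shows one way such a node could work, with the chain rule applied through small backward closures. `TinyValue` and its `reciprocal` method are hypothetical names for illustration; this is not the project's `Value` class, whose internals are not shown here.

```python
import math


class TinyValue:
    """Hypothetical, stripped-down scalar autograd node (not the project's Value)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # leaf nodes have nothing to propagate

    def __mul__(self, other):
        out = TinyValue(self.value * other.value, (self, other))

        def _backward():
            # Product rule: d(a*b)/da = b, d(a*b)/db = a
            self.grad += other.value * out.grad
            other.grad += self.value * out.grad

        out._backward = _backward
        return out

    def reciprocal(self):
        out = TinyValue(1.0 / self.value, (self,))

        def _backward():
            # d(1/b)/db = -1 / b**2
            self.grad += -1.0 / (self.value ** 2) * out.grad

        out._backward = _backward
        return out

    def __truediv__(self, other):
        # a / b is rewritten as a * (1 / b), so only mul and reciprocal need gradients
        return self * other.reciprocal()

    def log(self):
        out = TinyValue(math.log(self.value), (self,))

        def _backward():
            # d(ln a)/da = 1 / a
            self.grad += (1.0 / self.value) * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Topologically order the graph, then apply the chain rule from the output back
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


a, b = TinyValue(6.0), TinyValue(3.0)
d = a / b
d.backward()
print(d.value, a.grad, b.grad)
```

For `d = a / b`, the gradients should come out as `1/b` for `a` and `-a/b**2` for `b`, which is a quick way to sanity-check any division implementation by hand.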

## Simple Linear Regression Example

In [3]:
from micrograd.nets.SimpleLinearRegression import SimpleLinearRegression
from micrograd.loss_functions.MSE import MSE
from micrograd.optimizers.SimpleOptimizer import SimpleOptimizer

from micrograd.engine.value import Value

# Set-up model, optimizers and loss function
model = SimpleLinearRegression()
criterion = MSE()

num_epochs = 1000
lr = 0.01

optimizer = SimpleOptimizer(model.parameters(), lr)

# Dataset
x = [Value(1.0), Value(4.0), Value(9.0)]
y = [Value(1.0), Value(8.0), Value(18.0)]

for epoch in range(num_epochs):
    # Forward pass (plain linear regression, so no logistic activation is applied)
    y_pred = [model([xi]) for xi in x]

    loss = criterion(y_pred, y)

    optimizer.zero_grad()

    loss.backward()
    optimizer.step()

    if (epoch + 1) % 100 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.value:.4f}')

test_x = [Value(6.0)]
test_y_pred = model(test_x)

print(f"Prediction for x={test_x[0].value} is y_pred={test_y_pred.value} (Expected is 12)")

Epoch [100/1000], Loss: 0.0529
Epoch [200/1000], Loss: 0.0434
Epoch [300/1000], Loss: 0.0426
Epoch [400/1000], Loss: 0.0425
Epoch [500/1000], Loss: 0.0425
Epoch [600/1000], Loss: 0.0425
Epoch [700/1000], Loss: 0.0425
Epoch [800/1000], Loss: 0.0425
Epoch [900/1000], Loss: 0.0425
Epoch [1000/1000], Loss: 0.0425
Prediction for x=6.0 is y_pred=11.816327056431131 (Expected is 12)
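
The loss plateaus at 0.0425 rather than reaching zero because the three points do not lie on a single line: (1, 1) is off the y = 2x line that the other two points follow. As a cross-check, the closed-form least-squares fit can be computed directly. This sketch uses plain floats instead of `Value`, and it assumes the loss is scaled by 1/(2n), as in the Coursera course; the `MSE` class itself is not shown here, so that scaling is an inference from the numbers matching.

```python
# Closed-form simple linear regression on the same three points.
xs = [1.0, 4.0, 9.0]
ys = [1.0, 8.0, 18.0]
n = len(xs)

mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Slope = covariance(x, y) / variance(x); intercept follows from the means
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
s_xx = sum((x - mean_x) ** 2 for x in xs)
w = s_xy / s_xx
b = mean_y - w * mean_x

sse = sum((w * x + b - y) ** 2 for x, y in zip(xs, ys))
half_mse = sse / (2 * n)  # assumption: loss is scaled by 1 / (2n)

print(f"w={w:.4f}, b={b:.4f}")
print(f"half-MSE at optimum: {half_mse:.4f}")
print(f"prediction at x=6: {w * 6.0 + b:.4f}")
```

The optimum is w ≈ 2.1122 and b ≈ -0.8571, giving a half-MSE of 0.0425 and a prediction of about 11.8163 at x = 6. Both agree with the training run above, so gradient descent converged to the best possible line for this data, and 12 is only the expected value if the data were exactly y = 2x.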


## Hot Dog Classifier
Inspired by Jian-Yang from Silicon Valley, I built a simple hot dog classifier.

I downloaded a dataset from HuggingFace and used my Micrograd implementation to train and evaluate test images :)

In [4]:
!pip install Pillow
!pip install numpy



In [5]:
# Load data and pre-process it
import os
from PIL import Image
import numpy as np
from micrograd.engine.value import Value

IMAGE_SIZE = 225

def load_and_preprocess_images(folder_path, label, image_size):
    images = []
    labels = []
    for filename in os.listdir(folder_path):
        if filename.endswith(('.jpg', '.jpeg', '.png')):
            img_path = os.path.join(folder_path, filename)
            with Image.open(img_path) as img:
                img = img.resize((image_size, image_size))
                img = img.convert('L')  # Convert to grayscale
                img_array = np.array(img).flatten() / 255.0  # Normalize to [0, 1]
                images.append([Value(float(pixel)) for pixel in img_array])
                labels.append(Value(float(label)))
    return images, labels

# Load training data
train_hotdog_path = 'dataset/hotdog_nothotdog/train/hotdog'
train_nothotdog_path = 'dataset/hotdog_nothotdog/train/not_hotdog'

x_train_hotdog, y_train_hotdog = load_and_preprocess_images(train_hotdog_path, 1, IMAGE_SIZE)
x_train_nothotdog, y_train_nothotdog = load_and_preprocess_images(train_nothotdog_path, 0, IMAGE_SIZE)

x_train = x_train_hotdog + x_train_nothotdog
y_train = y_train_hotdog + y_train_nothotdog

# Load validation data
val_hotdog_path = 'dataset/hotdog_nothotdog/val/hotdog'
val_nothotdog_path = 'dataset/hotdog_nothotdog/val/not_hotdog'

x_valid_hotdog, y_valid_hotdog = load_and_preprocess_images(val_hotdog_path, 1, IMAGE_SIZE)
x_valid_nothotdog, y_valid_nothotdog = load_and_preprocess_images(val_nothotdog_path, 0, IMAGE_SIZE)

x_valid = x_valid_hotdog + x_valid_nothotdog
y_valid = y_valid_hotdog + y_valid_nothotdog

# Shuffle the training data
combined = list(zip(x_train, y_train))
np.random.shuffle(combined)
x_train, y_train = zip(*combined)

print(f"Training samples: {len(x_train)}")
print(f"Validation samples: {len(x_valid)}")

Training samples: 200
Validation samples: 50


In [6]:
from micrograd.nets.ComplexLogisticRegression import ComplexLogisticRegression

from micrograd.loss_functions.BinaryCrossEntropy import BinaryCrossEntropy
from micrograd.optimizers.SimpleOptimizer import SimpleOptimizer

import random


model = ComplexLogisticRegression(image_size=IMAGE_SIZE)
criterion = BinaryCrossEntropy()

num_epochs = 1000
lr = 0.02

optimizer = SimpleOptimizer(model.parameters(), lr)

# Train the model
for epoch in range(num_epochs):

    y_pred = [model(xi) for xi in x_train]

    # for prediction in y_pred:
        # print(f"Prediction: {prediction}")

    loss = criterion(y_pred, y_train)

    optimizer.zero_grad()

    loss.backward()

    # Dynamically adjust the learning rate.
    # The checks run in order, so the tightest threshold that matches wins.
    if loss.value < 20.0:
        optimizer.set_learn_rate(0.02)

    if loss.value < 10.0:
        optimizer.set_learn_rate(0.001)

    if loss.value < 2.0:
        optimizer.set_learn_rate(random.uniform(0.0001, 0.002))

    if loss.value < 0.85:
        optimizer.set_learn_rate(random.uniform(0.0000001, 0.00001))

    # Early stopping
    if (loss.value < 0.70):
        break

    optimizer.step()

    if (epoch + 1) % 1 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.value} - (step: {optimizer.get_learn_rate()})')

Epoch [1/1000], Loss: 16.74500741199737 - (step: 0.02)
Epoch [2/1000], Loss: 15.274396967145977 - (step: 0.02)
Epoch [3/1000], Loss: 0.913862302068128 - (step: 0.00040946396473271184)
Epoch [4/1000], Loss: 1.8847307975486143 - (step: 0.00043039983075195647)
Epoch [5/1000], Loss: 1.2451070174722156 - (step: 0.0002118502356626603)
Epoch [6/1000], Loss: 1.2386506537410762 - (step: 0.0016896293335791539)
Epoch [7/1000], Loss: 1.2331419018233445 - (step: 0.0019838251696586945)
Epoch [8/1000], Loss: 1.281563892455867 - (step: 0.0006519202676529401)
Epoch [9/1000], Loss: 1.176251056197677 - (step: 0.00012568049427623163)
Epoch [10/1000], Loss: 1.2122863239164492 - (step: 0.001604228288120476)
Epoch [11/1000], Loss: 1.4143988579222595 - (step: 0.000774691484114436)
Epoch [12/1000], Loss: 0.9827225319650521 - (step: 0.0008212440072663189)
Epoch [13/1000], Loss: 2.4318005171876194 - (step: 0.001)
Epoch [14/1000], Loss: 0.7396792858725348 - (step: 2.8068175173366256e-06)
Epoch [15/1000], Loss: 0.
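
The learning-rate rule in the training loop can also be written as a small pure function, which makes the thresholds easier to test in isolation. This is a sketch only: `pick_learn_rate` is a hypothetical helper that does not exist in the project, and it returns the rate instead of calling `optimizer.set_learn_rate`. It checks the tightest threshold first, which selects the same branch as the sequence of `if` statements in the loop (the loop draws an extra random number when the loss is below 0.85, which this version skips).

```python
import random


def pick_learn_rate(loss, current_lr, rng=random):
    """Return the learning rate for the given loss, mirroring the thresholds above.

    Checked from the tightest threshold outward, so the first match wins.
    """
    if loss < 0.85:
        return rng.uniform(0.0000001, 0.00001)
    if loss < 2.0:
        return rng.uniform(0.0001, 0.002)
    if loss < 10.0:
        return 0.001
    if loss < 20.0:
        return 0.02
    return current_lr  # loss >= 20: leave the rate unchanged


# Illustrative loss values in each band
for loss in (25.0, 16.7, 2.43, 1.2, 0.74):
    print(f"loss={loss:<5} -> lr={pick_learn_rate(loss, current_lr=0.02)}")
```

Because the two lowest bands draw the rate at random, the step size changes from epoch to epoch even when the loss barely moves, which is consistent with the varying `step` values in the log above.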

In [10]:
print(loss)
print(loss.value)
print(f'Loss: {loss.value:.4f}')
print(f'Epoch [], Loss: {loss.value:.4f}')

Val(value=0.7196, grad=1.0000, parents=(143.914047016878 * 0.005))
0.71957023508439
Loss: 0.7196
Epoch [], Loss: 0.7196


In [18]:
# Test the model
n = len(x_valid)
correct_guesses = 0
wrong_guesses = 0

y_pred = [model(xi) for xi in x_valid]

predictions = []

for pred in y_pred:
    if pred.value <= 0.5:
        predictions.append(0)
    else:
        predictions.append(1)
    
for i in range(n):
    print(f'Prediction: {predictions[i]} with confidence {y_pred[i].value:.2f} for actual: {y_valid[i].value}')

    if predictions[i] == y_valid[i].value:
        correct_guesses += 1
    else:
        wrong_guesses += 1

print(f"Correct guesses: {correct_guesses}, Wrong guesses: {wrong_guesses}, Total: {n} - Accuracy: {(correct_guesses/n)*100}%")
print(f"Loss: {loss}")


Prediction: 1 with confidence 0.61 for actual: 1.0
Prediction: 0 with confidence 0.40 for actual: 1.0
Prediction: 1 with confidence 0.51 for actual: 1.0
Prediction: 1 with confidence 0.51 for actual: 1.0
Prediction: 1 with confidence 0.54 for actual: 1.0
Prediction: 1 with confidence 0.65 for actual: 1.0
Prediction: 1 with confidence 0.51 for actual: 1.0
Prediction: 0 with confidence 0.47 for actual: 1.0
Prediction: 1 with confidence 0.62 for actual: 1.0
Prediction: 1 with confidence 0.55 for actual: 1.0
Prediction: 1 with confidence 0.52 for actual: 1.0
Prediction: 1 with confidence 0.52 for actual: 1.0
Prediction: 0 with confidence 0.49 for actual: 1.0
Prediction: 1 with confidence 0.51 for actual: 1.0
Prediction: 1 with confidence 0.51 for actual: 1.0
Prediction: 0 with confidence 0.49 for actual: 1.0
Prediction: 1 with confidence 0.60 for actual: 1.0
Prediction: 0 with confidence 0.49 for actual: 1.0
Prediction: 1 with confidence 0.52 for actual: 1.0
Prediction: 0 with confidence 0
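
Accuracy alone hides which kind of mistake the classifier makes, so a confusion count can be a useful companion to the loop above. The following is a minimal sketch on plain floats, using the same rule as the test cell (predict 1 only when the output is above 0.5). `confusion_counts` is a hypothetical helper, and the inputs are toy values for illustration, not results from the validation run.

```python
def confusion_counts(probabilities, labels, threshold=0.5):
    """Count TP/FP/TN/FN for binary labels, predicting 1 when p > threshold."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for p, label in zip(probabilities, labels):
        predicted = 1 if p > threshold else 0
        if predicted == 1 and label == 1:
            counts["tp"] += 1
        elif predicted == 1 and label == 0:
            counts["fp"] += 1
        elif predicted == 0 and label == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


# Toy inputs for illustration only, not taken from the validation run
probs = [0.61, 0.40, 0.51, 0.30, 0.55]
labels = [1, 1, 1, 0, 0]
counts = confusion_counts(probs, labels)
accuracy = (counts["tp"] + counts["tn"]) / len(labels)
print(counts, f"accuracy={accuracy:.2f}")
```

With `Value` objects, the same function could be fed `[p.value for p in y_pred]` and `[int(y.value) for y in y_valid]`.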

In [9]:
y_pred = Value(0.1)
y = Value(1.0)

loss_test = Value(0.0)
loss_test += (-(y * y_pred.log()) - ((Value(1.0)-y)*(Value(1.0)-y_pred).log()))
loss_test += (-(y * y_pred.log()) - ((Value(1.0)-y)*(Value(1.0)-y_pred).log()))


print(loss_test)

Val(value=4.6052, grad=0.0000, parents=(2.3025850929940455 + 2.3025850929940455))
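
The same number can be reproduced with plain floats, which is a handy way to check the `Value`-based loss by hand. For y = 1 and a prediction of 0.1, the per-sample binary cross-entropy is -ln(0.1), and the cell above adds it twice. In this sketch, `binary_cross_entropy` is a hypothetical helper, and the `eps` clipping is an added safeguard against log(0) that is not part of the cell above.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Per-sample BCE on plain floats; eps clipping is an added safeguard."""
    p = min(max(y_pred, eps), 1.0 - eps)  # keep log() away from 0
    return -(y_true * math.log(p) + (1.0 - y_true) * math.log(1.0 - p))


single = binary_cross_entropy(1.0, 0.1)
print(f"one sample:  {single:.4f}")
print(f"two samples: {2 * single:.4f}")
```

This prints 2.3026 for one sample and 4.6052 for two, matching the `Val(value=4.6052, ...)` output.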
