We will fit a linear regression on the same dataset in two ways: with backpropagation written from first principles, and with the backpropagation provided by micrograd.

In [26]:
# Imports
import linear_reg.src.first_principle as first_principle
import linear_reg.src.use_micrograd as use_micrograd
from linear_reg.optim.optimizer import AdagradOptimizer, SGDOptimizer, MomentumOptimizer, AdamOptimizer

In [4]:
# Data
X = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0], [5.0, 6.0]]
y = [3.0, 5.0, 7.0, 9.0, 11.0]

## Linear regression from first principles
Here each step trains on the whole dataset (`batch_size=5`), so the reported loss covers all five samples. Let's train the model for 11 steps (steps 0 through 10).

In [5]:
model = first_principle.LinearRegression(batch_size=5, num_features=2)
for index in range(11):
    pred, loss = model.train(features=X, labels=y, learning_rate=0.009)
    print(f'Step {index}. Total loss: {loss:.6f}')

# Make a prediction (this is the last training sample; its label is 11.0)
test_features = [[5.0, 6.0]]
prediction = model.forward(test_features)
print(f"Prediction for {test_features[-1]}: {prediction[-1]}")

Step 0. Total loss: 49.492646
Step 1. Total loss: 10.724582
Step 2. Total loss: 2.390859
Step 3. Total loss: 0.598926
Step 4. Total loss: 0.213139
Step 5. Total loss: 0.129605
Step 6. Total loss: 0.111043
Step 7. Total loss: 0.106451
Step 8. Total loss: 0.104867
Step 9. Total loss: 0.103933
Step 10. Total loss: 0.103143
Prediction for [5.0, 6.0]: 10.68999693146993
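
For reference, one full-batch step like the ones above can be sketched in plain Python. This is only a sketch, not the code in `first_principle.LinearRegression`: the mean squared error loss, the zero initialisation, and the helper names `predict` and `full_batch_step` are all assumptions, so the printed losses will not match the run above.

```python
# Minimal sketch of full-batch gradient descent for linear regression.
# Assumptions (not taken from first_principle.py): mean squared error loss,
# zero-initialised parameters, and the helper names used here.

def predict(weights, bias, row):
    """Linear model output w . x + b for one sample."""
    return sum(w * x for w, x in zip(weights, row)) + bias


def full_batch_step(weights, bias, features, labels, learning_rate):
    """Take one gradient step over the whole dataset.

    Returns the updated weights and bias, plus the loss measured before the update.
    """
    n = len(features)
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    loss = 0.0
    for row, target in zip(features, labels):
        error = predict(weights, bias, row) - target
        loss += error ** 2 / n
        # Gradient of the mean squared error: d/dw_j = 2 * error * x_j / n
        for j, x in enumerate(row):
            grad_w[j] += 2.0 * error * x / n
        grad_b += 2.0 * error / n
    new_weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
    new_bias = bias - learning_rate * grad_b
    return new_weights, new_bias, loss


X = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0], [5.0, 6.0]]
y = [3.0, 5.0, 7.0, 9.0, 11.0]

weights, bias = [0.0, 0.0], 0.0  # assumed starting point
losses = []
for step in range(11):
    weights, bias, loss = full_batch_step(weights, bias, X, y, learning_rate=0.009)
    losses.append(loss)
    print(f"Step {step}. Loss: {loss:.6f}")

print(f"Prediction for {X[-1]}: {predict(weights, bias, X[-1])}")
```

Because every step sees all five samples, the loss in this sketch falls smoothly from one step to the next, which is the same qualitative behaviour as the run above.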


## Linear regression with micrograd + SGD optimizer

In [6]:
model = use_micrograd.LinearRegression(num_features=2)
optimizer = SGDOptimizer(model.parameters(), learning_rate=0.001)
use_micrograd.train(model, optimizer, X, y, epochs=30, batch_size=2)
# Make a prediction
test_features = [5.0, 6.0]
prediction = model.forward(test_features)
print(f"Prediction for {test_features}: {prediction.value}")

Epoch [5/30], Total Loss per epoch: 46.3274, Loss per sample: 9.2655
Epoch [10/30], Total Loss per epoch: 8.0531, Loss per sample: 1.6106
Epoch [15/30], Total Loss per epoch: 1.2622, Loss per sample: 0.2524
Epoch [20/30], Total Loss per epoch: 0.2508, Loss per sample: 0.0502
Epoch [25/30], Total Loss per epoch: 0.0579, Loss per sample: 0.0116
Epoch [30/30], Total Loss per epoch: 0.0297, Loss per sample: 0.0059
Prediction for [5.0, 6.0]: 10.851056455851072
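
To make the `batch_size=2` setting concrete, here is a small sketch of how five samples could be split into mini-batches and how a plain SGD update moves the parameters. The helpers `iter_minibatches` and `sgd_update` are hypothetical and work on plain floats; whether `use_micrograd.train` batches in this order (or shuffles first) is an assumption.

```python
# Sketch of mini-batching and the vanilla SGD update rule.
# Helper names are hypothetical; they are not part of linear_reg or micrograd.

def iter_minibatches(features, labels, batch_size):
    """Yield consecutive (features, labels) slices of at most batch_size samples."""
    for start in range(0, len(features), batch_size):
        stop = start + batch_size
        yield features[start:stop], labels[start:stop]


def sgd_update(params, grads, learning_rate):
    """Vanilla SGD: move every parameter a small step against its gradient."""
    return [p - learning_rate * g for p, g in zip(params, grads)]


X = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0], [5.0, 6.0]]
y = [3.0, 5.0, 7.0, 9.0, 11.0]

batch_sizes = [len(batch_x) for batch_x, _ in iter_minibatches(X, y, batch_size=2)]
print(batch_sizes)  # [2, 2, 1]

# Example update with made-up gradients and the learning rate used above.
updated = sgd_update([0.5, -0.5], [10.0, -20.0], learning_rate=0.001)
print(updated)
```

With this split, one epoch over the five samples consists of three parameter updates, the last one on a single sample.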


## Linear regression with micrograd + Adagrad optimizer

In [7]:
model = use_micrograd.LinearRegression(num_features=2)
optimizer = AdagradOptimizer(model.parameters(), learning_rate=0.001)
use_micrograd.train(model, optimizer, X, y, epochs=30, batch_size=2)
# Make a prediction
test_features = [5.0, 6.0]
prediction = model.forward(test_features)
print(f"Prediction for {test_features}: {prediction.value}")

Epoch [5/30], Total Loss per epoch: 38.4283, Loss per sample: 7.6857
Epoch [10/30], Total Loss per epoch: 41.5411, Loss per sample: 8.3082
Epoch [15/30], Total Loss per epoch: 45.6037, Loss per sample: 9.1207
Epoch [20/30], Total Loss per epoch: 34.4996, Loss per sample: 6.8999
Epoch [25/30], Total Loss per epoch: 40.4363, Loss per sample: 8.0873
Epoch [30/30], Total Loss per epoch: 44.5673, Loss per sample: 8.9135
Prediction for [5.0, 6.0]: 5.844947463517881


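With `learning_rate=0.001` the loss does not trend downward (it fluctuates between roughly 34 and 46), so the learning rate is too low for AdagradOptimizer. Let's increase it and try again.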

In [8]:
model = use_micrograd.LinearRegression(num_features=2)
optimizer = AdagradOptimizer(model.parameters(), learning_rate=0.15)
use_micrograd.train(model, optimizer, X, y, epochs=30, batch_size=2)
# Make a prediction
test_features = [5.0, 6.0]
prediction = model.forward(test_features)
print(f"Prediction for {test_features}: {prediction.value}")

Epoch [5/30], Total Loss per epoch: 24.5341, Loss per sample: 4.9068
Epoch [10/30], Total Loss per epoch: 3.5693, Loss per sample: 0.7139
Epoch [15/30], Total Loss per epoch: 0.4398, Loss per sample: 0.0880
Epoch [20/30], Total Loss per epoch: 0.0787, Loss per sample: 0.0157
Epoch [25/30], Total Loss per epoch: 0.0149, Loss per sample: 0.0030
Epoch [30/30], Total Loss per epoch: 0.0049, Loss per sample: 0.0010
Prediction for [5.0, 6.0]: 11.005392753769122
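
One way to see why Adagrad needs a larger learning rate than SGD here: in the textbook Adagrad rule, each gradient is divided by the square root of the accumulated squared gradients, so a single update can never move a parameter by more than the learning rate. The sketch below uses that textbook rule; the function name `adagrad_step`, the constant gradient of -50.0, and the count of 90 updates (30 epochs with 3 mini-batches each) are assumptions for illustration, and `AdagradOptimizer` may differ in detail.

```python
import math

# Sketch of the textbook Adagrad update for a single scalar parameter.
# Hypothetical helper; not the AdagradOptimizer from linear_reg.optim.

def adagrad_step(param, grad, accum, learning_rate, eps=1e-8):
    """Accumulate grad^2 and scale the step by 1 / sqrt(accumulated sum)."""
    accum += grad ** 2
    param -= learning_rate * grad / (math.sqrt(accum) + eps)
    return param, accum


# Apply 90 updates with a constant (made-up) gradient and compare how far
# the parameter travels for the two learning rates tried above.
distance = {}
for lr in (0.001, 0.15):
    param, accum = 0.0, 0.0
    for _ in range(90):
        param, accum = adagrad_step(param, -50.0, accum, lr)
    distance[lr] = param
    print(f"learning_rate={lr}: parameter moved to {param:.4f}")
```

Under these assumptions the parameter barely moves with `learning_rate=0.001`, no matter how large the gradient is, which is consistent with the stalled loss in the first Adagrad run.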


## Linear regression with micrograd + Momentum optimizer
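With momentum, the model reaches a lower loss after 30 epochs (total loss 0.0020) than the SGD (0.0297) and Adagrad (0.0049) runs above. The learning rates differ between runs, so this is not a like-for-like comparison.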

In [9]:
model = use_micrograd.LinearRegression(num_features=2)
optimizer = MomentumOptimizer(model.parameters(), learning_rate=0.005, momentum=0.90)
use_micrograd.train(model, optimizer, X, y, epochs=30, batch_size=2)
# Make a prediction
test_features = [5.0, 6.0]
prediction = model.forward(test_features)
print(f"Prediction for {test_features}: {prediction.value}")

Epoch [5/30], Total Loss per epoch: 13.9431, Loss per sample: 2.7886
Epoch [10/30], Total Loss per epoch: 3.8992, Loss per sample: 0.7798
Epoch [15/30], Total Loss per epoch: 0.1253, Loss per sample: 0.0251
Epoch [20/30], Total Loss per epoch: 0.0231, Loss per sample: 0.0046
Epoch [25/30], Total Loss per epoch: 0.0056, Loss per sample: 0.0011
Epoch [30/30], Total Loss per epoch: 0.0020, Loss per sample: 0.0004
Prediction for [5.0, 6.0]: 10.967606595019898
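
As a rough illustration of what the `momentum` parameter does, the sketch below applies the classical momentum rule to a single scalar parameter. The function name `momentum_step` and the constant gradient are assumptions, and `MomentumOptimizer` may use a slightly different formulation (for example, applying the learning rate after accumulating the velocity).

```python
# Sketch of the classical momentum update for a single scalar parameter.
# Hypothetical helper; not the MomentumOptimizer from linear_reg.optim.

def momentum_step(param, grad, velocity, learning_rate, momentum):
    """Blend the previous velocity with the new gradient step, then move."""
    velocity = momentum * velocity - learning_rate * grad
    return param + velocity, velocity


param, velocity = 0.0, 0.0
for _ in range(200):
    # Constant made-up gradient of 1.0, hyperparameters from the cell above.
    param, velocity = momentum_step(param, 1.0, velocity,
                                    learning_rate=0.005, momentum=0.90)

# With a constant gradient the velocity settles at -lr * grad / (1 - momentum).
steady_state = -0.005 * 1.0 / (1.0 - 0.90)
print(f"velocity after 200 steps: {velocity:.6f}")
print(f"steady-state velocity:    {steady_state:.6f}")
```

With `momentum=0.90`, gradients that keep pointing the same way build up to an effective step about ten times larger than `learning_rate` alone would give.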


## Linear regression with micrograd + Adam optimizer
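Adam reduces the loss much faster than the Momentum optimizer in the early epochs (total loss 0.9220 vs. 13.9431 at epoch 5), but it then levels off. By epoch 30, the Momentum run has reached a lower loss (0.0020 vs. 0.0097).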

In [25]:
model = use_micrograd.LinearRegression(num_features=2)
optimizer = AdamOptimizer(model.parameters(), learning_rate=0.01, beta1=0.9, beta2=0.999)
use_micrograd.train(model, optimizer, X, y, epochs=30, batch_size=2)
# Make a prediction
test_features = [5.0, 6.0]
prediction = model.forward(test_features)
print(f"Prediction for {test_features}: {prediction.value}")

Epoch [5/30], Total Loss per epoch: 0.9220, Loss per sample: 0.1844
Epoch [10/30], Total Loss per epoch: 0.0660, Loss per sample: 0.0132
Epoch [15/30], Total Loss per epoch: 0.0664, Loss per sample: 0.0133
Epoch [20/30], Total Loss per epoch: 0.0153, Loss per sample: 0.0031
Epoch [25/30], Total Loss per epoch: 0.0109, Loss per sample: 0.0022
Epoch [30/30], Total Loss per epoch: 0.0097, Loss per sample: 0.0019
Prediction for [5.0, 6.0]: 10.961203645023467
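
For completeness, here is a sketch of the standard Adam update with bias correction for a single scalar parameter. The function name `adam_step` and the two example gradients are assumptions, and `AdamOptimizer` may differ in detail. The sketch shows one property that may help explain the fast early progress: on the first step, the bias-corrected update has a size of about `learning_rate` whatever the gradient's magnitude.

```python
import math

# Sketch of the standard Adam update for a single scalar parameter.
# Hypothetical helper; not the AdamOptimizer from linear_reg.optim.

def adam_step(param, grad, m, v, t, learning_rate,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step count used for bias correction."""
    m = beta1 * m + (1.0 - beta1) * grad        # running mean of gradients
    v = beta2 * v + (1.0 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1.0 - beta1 ** t)              # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** t)              # bias-corrected second moment
    param -= learning_rate * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# First update from param = 0.0 for a small and a large made-up gradient.
first_steps = {}
for grad in (0.5, 50.0):
    param, m, v = adam_step(0.0, grad, 0.0, 0.0, t=1, learning_rate=0.01)
    first_steps[grad] = param
    print(f"grad={grad}: parameter after first step = {param:.6f}")
```

Both gradients produce a first step of roughly 0.01 in size, so the step size is set by `learning_rate` rather than by the raw gradient scale.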
