We will try out linear regression on the same data set in two ways: with backpropagation written from first principles, and with micrograd's backpropagation.

In [24]:
# Imports
import first_principle
import use_micrograd
from micrograd import AdagradOptimizer, SGDOptimizer, MomentumOptimizer, AdamOptimizer

In [2]:
# Data
X = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0], [5.0, 6.0]]
y = [3.0, 5.0, 7.0, 9.0, 11.0]

## Linear regression from first principles
Here the model trains on the whole dataset in each step (batch_size=5), so the loss reported is the total loss over all five samples. Let's train the model for 11 steps (numbered 0 through 10).

In [3]:
model = first_principle.LinearRegression(batch_size=5, num_features=2)
for index in range(11):
    pred, loss = model.train(features=X, labels=y, learning_rate=0.009)
    print(f'Step {index}. Total loss: {loss:5f}')

test_features = [[5.0, 6.0]]
prediction = model.forward(test_features)
print(f"Prediction for {test_features[-1]}: {prediction[-1]}")

Step 0. Total loss: 51.468681
Step 1. Total loss: 11.123484
Step 2. Total loss: 2.450947
Step 3. Total loss: 0.586373
Step 4. Total loss: 0.185156
Step 5. Total loss: 0.098488
Step 6. Total loss: 0.079434
Step 7. Total loss: 0.074917
Step 8. Total loss: 0.073528
Step 9. Total loss: 0.072814
Step 10. Total loss: 0.072248
Prediction for [5.0, 6.0]: 10.740159881695753
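
For reference, one full-batch training step can be sketched in plain Python as below. This is a minimal sketch, not the code in `first_principle`: the helper names `predict`, `total_loss` and `train_step`, the zero initialization, and the choice to average the gradient over the batch are all assumptions made for illustration, so the loss values it produces will not match the output above.

```python
# Hypothetical stand-in for a first-principles linear regression step.
# Assumptions: squared-error loss, gradient averaged over the batch, zero init.
X = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0], [5.0, 6.0]]
y = [3.0, 5.0, 7.0, 9.0, 11.0]


def predict(weights, bias, row):
    """Linear model: dot(weights, row) + bias."""
    return sum(w * x for w, x in zip(weights, row)) + bias


def total_loss(weights, bias, features, labels):
    """Sum of squared errors over the whole dataset."""
    return sum((predict(weights, bias, row) - t) ** 2
               for row, t in zip(features, labels))


def train_step(weights, bias, features, labels, learning_rate):
    """One gradient-descent step using every sample (full batch)."""
    n = len(features)
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    for row, target in zip(features, labels):
        error = predict(weights, bias, row) - target
        # d/dw_j of error**2 is 2 * error * x_j; d/db is 2 * error.
        for j, x in enumerate(row):
            grad_w[j] += 2.0 * error * x / n
        grad_b += 2.0 * error / n
    new_weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
    new_bias = bias - learning_rate * grad_b
    return new_weights, new_bias


weights, bias = [0.0, 0.0], 0.0
losses = []
for step in range(11):
    losses.append(total_loss(weights, bias, X, y))
    weights, bias = train_step(weights, bias, X, y, learning_rate=0.009)
    print(f"Step {step}. Total loss: {losses[-1]:.6f}")
```

Every label in this dataset equals the sum of its two features, so an exact fit exists and the total loss can in principle be driven to zero.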


## Linear regression with micrograd + SGD optimizer

In [4]:
model = use_micrograd.LinearRegression(num_features=2)
optimizer = SGDOptimizer(model.parameters(), learning_rate=0.001)
use_micrograd.train(model, optimizer, X, y, epochs=30, batch_size=2)
# Make a prediction
test_features = [5.0, 6.0]
prediction = model.forward(test_features)
print(f"Prediction for {test_features}: {prediction.value}")

Epoch [5/30], Total Loss per epoch: 38.2867, Loss per sample: 7.6573
Epoch [10/30], Total Loss per epoch: 7.7243, Loss per sample: 1.5449
Epoch [15/30], Total Loss per epoch: 0.7200, Loss per sample: 0.1440
Epoch [20/30], Total Loss per epoch: 0.1939, Loss per sample: 0.0388
Epoch [25/30], Total Loss per epoch: 0.1092, Loss per sample: 0.0218
Epoch [30/30], Total Loss per epoch: 0.0654, Loss per sample: 0.0131
Prediction for [5.0, 6.0]: 10.802168781842427
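
The output above shows that each epoch covers all five samples (the loss per sample is the total loss divided by 5). With `batch_size=2`, one way to get there is three mini-batches of sizes 2, 2 and 1, each followed by a plain SGD update. The sketch below illustrates that idea only; `iterate_batches` and `sgd_step` are hypothetical helpers, and the internals of `use_micrograd.train` and `SGDOptimizer` may differ (for example, they may shuffle the data).

```python
# Hypothetical helpers, not the micrograd or use_micrograd API.
def iterate_batches(features, labels, batch_size):
    """Yield consecutive (features, labels) slices; the last may be shorter."""
    for start in range(0, len(features), batch_size):
        end = start + batch_size
        yield features[start:end], labels[start:end]


def sgd_step(params, grads, learning_rate):
    """Plain SGD: move each parameter against its gradient."""
    return [p - learning_rate * g for p, g in zip(params, grads)]


X = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0], [5.0, 6.0]]
y = [3.0, 5.0, 7.0, 9.0, 11.0]

# Five samples with batch_size=2 give three mini-batches per epoch.
batch_sizes = [len(batch_x) for batch_x, _ in iterate_batches(X, y, batch_size=2)]
print(batch_sizes)

# A single update on two made-up parameters and gradients.
updated = sgd_step([1.0, 2.0], [0.5, -1.0], learning_rate=0.1)
print(updated)
```

Under this batching, 30 epochs correspond to 90 parameter updates.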


## Linear regression with micrograd + Adagrad optimizer

In [11]:
model = use_micrograd.LinearRegression(num_features=2)
optimizer = AdagradOptimizer(model.parameters(), learning_rate=0.001)
use_micrograd.train(model, optimizer, X, y, epochs=30, batch_size=2)
# Make a prediction
test_features = [5.0, 6.0]
prediction = model.forward(test_features)
print(f"Prediction for {test_features}: {prediction.value}")

Epoch [5/30], Total Loss per epoch: 495.9855, Loss per sample: 99.1971
Epoch [10/30], Total Loss per epoch: 582.4815, Loss per sample: 116.4963
Epoch [15/30], Total Loss per epoch: 467.3942, Loss per sample: 93.4788
Epoch [20/30], Total Loss per epoch: 529.3330, Loss per sample: 105.8666
Epoch [25/30], Total Loss per epoch: 465.3122, Loss per sample: 93.0624
Epoch [30/30], Total Loss per epoch: 527.2458, Loss per sample: 105.4492
Prediction for [5.0, 6.0]: -8.483620290906742


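With a learning rate of 0.001 the loss never settles: the total loss per epoch swings between about 465 and 582 for all 30 epochs. The learning rate is too low for AdagradOptimizer. Let's increase it to 0.15 and try again.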

In [17]:
model = use_micrograd.LinearRegression(num_features=2)
optimizer = AdagradOptimizer(model.parameters(), learning_rate=0.15)
use_micrograd.train(model, optimizer, X, y, epochs=30, batch_size=2)
# Make a prediction
test_features = [5.0, 6.0]
prediction = model.forward(test_features)
print(f"Prediction for {test_features}: {prediction.value}")

Epoch [5/30], Total Loss per epoch: 30.9045, Loss per sample: 6.1809
Epoch [10/30], Total Loss per epoch: 7.3630, Loss per sample: 1.4726
Epoch [15/30], Total Loss per epoch: 1.8242, Loss per sample: 0.3648
Epoch [20/30], Total Loss per epoch: 0.6512, Loss per sample: 0.1302
Epoch [25/30], Total Loss per epoch: 0.2103, Loss per sample: 0.0421
Epoch [30/30], Total Loss per epoch: 0.1572, Loss per sample: 0.0314
Prediction for [5.0, 6.0]: 10.69637069075595
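
A sketch of the textbook Adagrad rule helps explain why the first Adagrad run, with `learning_rate=0.001`, barely moved. Adagrad divides each gradient by the square root of the running sum of squared gradients, so no single update can move a parameter by more than roughly `learning_rate`, however large the gradient is. The code below assumes that standard rule; `adagrad_step` is a hypothetical helper, and the notebook's `AdagradOptimizer` may be implemented differently.

```python
import math


# Hypothetical helper implementing the standard Adagrad update for one scalar.
def adagrad_step(param, grad, accum, learning_rate, eps=1e-8):
    """Return the updated parameter and the updated sum of squared gradients."""
    accum = accum + grad * grad
    param = param - learning_rate * grad / (math.sqrt(accum) + eps)
    return param, accum


param, accum = 0.0, 0.0
steps = []
for _ in range(3):
    # A large, constant gradient (made-up value) to show the step-size cap.
    new_param, accum = adagrad_step(param, grad=-50.0, accum=accum,
                                    learning_rate=0.001)
    steps.append(new_param - param)
    param = new_param

print(steps)  # each step is at most about learning_rate, and they shrink
```

If the optimizer takes three updates per epoch, as a batch size of 2 on five samples would suggest, 30 epochs at `learning_rate=0.001` can move each parameter by at most about 0.09 in total under this rule, which is consistent with the loss staying high in that run.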


## Linear regression with micrograd + Momentum optimizer
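With momentum, the model ends at a lower loss after 30 epochs (0.0250) than the SGD (0.0654) and Adagrad (0.1572) runs above, although its loss is still 35.5727 at epoch 10. The learning rates differ between runs, so this is not a like-for-like comparison.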

In [18]:
model = use_micrograd.LinearRegression(num_features=2)
optimizer = MomentumOptimizer(model.parameters(), learning_rate=0.005, momentum=0.90)
use_micrograd.train(model, optimizer, X, y, epochs=30, batch_size=2)
# Make a prediction
test_features = [5.0, 6.0]
prediction = model.forward(test_features)
print(f"Prediction for {test_features}: {prediction.value}")

Epoch [5/30], Total Loss per epoch: 39.4127, Loss per sample: 7.8825
Epoch [10/30], Total Loss per epoch: 35.5727, Loss per sample: 7.1145
Epoch [15/30], Total Loss per epoch: 3.1929, Loss per sample: 0.6386
Epoch [20/30], Total Loss per epoch: 1.0540, Loss per sample: 0.2108
Epoch [25/30], Total Loss per epoch: 0.6013, Loss per sample: 0.1203
Epoch [30/30], Total Loss per epoch: 0.0250, Loss per sample: 0.0050
Prediction for [5.0, 6.0]: 10.828494179869537
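
The momentum update can be sketched as follows. This uses one common formulation (velocity = momentum * velocity - learning_rate * gradient); `momentum_step` is a hypothetical helper, and the notebook's `MomentumOptimizer` may use a different but related form. With a constant gradient the velocity builds up toward `learning_rate * gradient / (1 - momentum)`, so `momentum=0.90` makes the eventual step about ten times larger than plain SGD at the same learning rate.

```python
# Hypothetical helper implementing a common momentum formulation for one scalar.
def momentum_step(param, grad, velocity, learning_rate, momentum):
    """Return the updated parameter and velocity."""
    velocity = momentum * velocity - learning_rate * grad
    param = param + velocity
    return param, velocity


param, velocity = 0.0, 0.0
first_velocity = None
for step in range(200):
    # Constant made-up gradient of 1.0 to show the velocity building up.
    param, velocity = momentum_step(param, grad=1.0, velocity=velocity,
                                    learning_rate=0.005, momentum=0.90)
    if step == 0:
        first_velocity = velocity

print(first_velocity)  # the first step equals a plain SGD step
print(velocity)        # approaches -0.005 / (1 - 0.90) = -0.05
```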


## Linear regression with micrograd + Adam optimizer
Adam converges faster than the Momentum optimizer in these runs: the total loss is already 2.5861 at epoch 5 and reaches 0.0055 by epoch 30.

In [23]:
model = use_micrograd.LinearRegression(num_features=2)
optimizer = AdamOptimizer(model.parameters(), learning_rate=0.05, beta1=0.9, beta2=0.999)
use_micrograd.train(model, optimizer, X, y, epochs=30, batch_size=2)
# Make a prediction
test_features = [5.0, 6.0]
prediction = model.forward(test_features)
print(f"Prediction for {test_features}: {prediction.value}")

Epoch [5/30], Total Loss per epoch: 2.5861, Loss per sample: 0.5172
Epoch [10/30], Total Loss per epoch: 0.3481, Loss per sample: 0.0696
Epoch [15/30], Total Loss per epoch: 0.0498, Loss per sample: 0.0100
Epoch [20/30], Total Loss per epoch: 0.0162, Loss per sample: 0.0032
Epoch [25/30], Total Loss per epoch: 0.0101, Loss per sample: 0.0020
Epoch [30/30], Total Loss per epoch: 0.0055, Loss per sample: 0.0011
Prediction for [5.0, 6.0]: 11.012254686097531
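
For comparison with the other optimizers, here is a sketch of the standard Adam update with bias correction. `adam_step` is a hypothetical helper written for this illustration; the parameter names `beta1` and `beta2` match the call above, but the `eps` default and the internals of the notebook's `AdamOptimizer` are assumptions.

```python
import math


# Hypothetical helper implementing the standard Adam update for one scalar.
def adam_step(param, grad, m, v, t, learning_rate,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """Return the updated parameter and moment estimates; t starts at 1."""
    m = beta1 * m + (1.0 - beta1) * grad         # running mean of gradients
    v = beta2 * v + (1.0 - beta2) * grad * grad  # running mean of squared gradients
    m_hat = m / (1.0 - beta1 ** t)               # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** t)               # bias-corrected second moment
    param = param - learning_rate * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# First update with a made-up gradient of 4.0.
param, m, v = adam_step(0.0, grad=4.0, m=0.0, v=0.0, t=1, learning_rate=0.05)
print(param)  # close to -0.05: the first step is about learning_rate in size
```

Thanks to the bias correction, the first step has magnitude close to `learning_rate` whatever the gradient's scale, and unlike Adagrad the squared-gradient average decays instead of growing without bound, so later steps are not forced to shrink.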
