This code uses the tinygrad library to build and train two simple models on the MNIST dataset, which consists of 28x28 pixel grayscale images of handwritten digits (0-9). It is somewhat more involved than the previous example and has more components to discuss.

In [None]:
import unittest
import numpy as np
from tinygrad.state import get_parameters
from tinygrad.tensor import Tensor, Device
from tinygrad.nn import optim, BatchNorm2d
from extra.training import train, evaluate
from datasets import fetch_mnist


In addition to the previously mentioned tinygrad components, the code uses unittest to structure the tests, numpy for numerical computations, the get_parameters function to collect a model's parameters, and BatchNorm2d for batch normalization, a technique that helps neural networks train faster and more stably. The train and evaluate functions are imported from the extra.training module, and the fetch_mnist function is imported from the datasets module to load the MNIST dataset.

In [None]:
X_train, Y_train, X_test, Y_test = fetch_mnist()

This line fetches the MNIST data and unpacks it into four arrays: the training images and labels (X_train, Y_train) and the test images and labels (X_test, Y_test).

In [None]:
class TinyBobNet:
  def __init__(self):
    self.l1 = Tensor.scaled_uniform(784, 128)
    self.l2 = Tensor.scaled_uniform(128, 10)

  def parameters(self):
    return get_parameters(self)

  def forward(self, x):
    return x.dot(self.l1).relu().dot(self.l2).log_softmax()

The TinyBobNet is similar to the one in the first code but has a new method, parameters, that uses the get_parameters function to collect the model's parameters, in this case, the tensors l1 and l2.
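As a rough illustration of what a parameter collector does, the sketch below walks an object's attributes and gathers everything that is a tensor. It is not tinygrad's implementation: FakeTensor, FakeNet, and collect_parameters are hypothetical names invented for this example, and the real get_parameters may differ in its details.

```python
class FakeTensor:
    """Hypothetical stand-in for tinygrad's Tensor."""
    def __init__(self, name):
        self.name = name

def collect_parameters(obj):
    # Recursively gather every FakeTensor reachable from obj.
    if isinstance(obj, FakeTensor):
        return [obj]
    if isinstance(obj, (list, tuple)):
        return [p for item in obj for p in collect_parameters(item)]
    if hasattr(obj, "__dict__"):
        return [p for value in vars(obj).values() for p in collect_parameters(value)]
    return []

class FakeNet:
    def __init__(self):
        self.l1 = FakeTensor("l1")
        self.l2 = FakeTensor("l2")
        self.identity = lambda x: x   # not a tensor, so it contributes nothing

names = [p.name for p in collect_parameters(FakeNet())]
print(names)   # ['l1', 'l2']
```

Handing this list to an optimizer is what lets it update every weight without the model naming each one by hand.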

In [None]:
# create a model with a conv layer
class TinyConvNet:
  def __init__(self, has_batchnorm=False):
    # https://keras.io/examples/vision/mnist_convnet/
    conv = 3
    #inter_chan, out_chan = 32, 64
    inter_chan, out_chan = 8, 16   # for speed
    self.c1 = Tensor.scaled_uniform(inter_chan,1,conv,conv)
    self.c2 = Tensor.scaled_uniform(out_chan,inter_chan,conv,conv)
    self.l1 = Tensor.scaled_uniform(out_chan*5*5, 10)
    if has_batchnorm:
      self.bn1 = BatchNorm2d(inter_chan)
      self.bn2 = BatchNorm2d(out_chan)
    else:
      self.bn1, self.bn2 = lambda x: x, lambda x: x

This class, TinyConvNet, represents a simple Convolutional Neural Network, which is particularly useful for image processing tasks.

In the __init__ method, we're setting up the layers for this network:

conv is the size of the convolution filter, set to 3, which means we'll use 3x3 filters.
inter_chan and out_chan represent the number of channels or 'depth' of the output from the first and second convolutional layers, respectively.
self.c1 and self.c2 hold the weights (filters) of the first and second convolutional layers, with shape (output channels, input channels, filter height, filter width). They are initialized from a scaled uniform distribution.
self.l1 is the weight matrix of a fully connected layer that comes after the convolutional layers and maps the out_chan*5*5 extracted features to 10 class scores.
If has_batchnorm is True, we create two batch normalization layers (self.bn1 and self.bn2), which can help improve the speed, performance, and stability of the network. If has_batchnorm is False, self.bn1 and self.bn2 will simply return their input unchanged (they're set as identity functions).
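The 5*5 in the shape of self.l1 follows from how each layer shrinks the 28x28 image. The sketch below redoes that arithmetic, assuming the convolutions use stride 1 with no padding and the pooling uses a 2x2 window; those defaults are assumptions here, although they are consistent with the 5*5 in the code. The helper names conv_out and pool_out are made up for the example.

```python
def conv_out(size, kernel, stride=1, padding=0):
    # Output length of a convolution along one spatial dimension.
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, window=2):
    # Non-overlapping pooling drops any leftover row or column.
    return size // window

conv, out_chan = 3, 16
size = 28
size = pool_out(conv_out(size, conv))   # 28 -> 26 -> 13
size = pool_out(conv_out(size, conv))   # 13 -> 11 -> 5
features = out_chan * size * size
print(size, features)   # 5 400
```

With the commented-out setting of 64 output channels, the same calculation would give 64*5*5 = 1600 input features for the fully connected layer.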

In [None]:
  def parameters(self):
    return get_parameters(self)

  def forward(self, x:Tensor):
    x = x.reshape(shape=(-1, 1, 28, 28)) # hacks
    x = self.bn1(x.conv2d(self.c1)).relu().max_pool2d()
    x = self.bn2(x.conv2d(self.c2)).relu().max_pool2d()
    x = x.reshape(shape=[x.shape[0], -1])
    return x.dot(self.l1).log_softmax()

These methods, parameters and forward, are crucial parts of our TinyConvNet class:

The parameters method returns a list of all the learnable parameters of the model, including the weights of the convolution and dense layers, and the parameters of the batch normalization layers (if they exist). It uses the get_parameters function from the tinygrad library, which retrieves these parameters from the instance (self).

The forward method is responsible for performing the actual computations of the model on the input data. The input tensor x passes through the following steps:

1. x is reshaped to a 4D shape, [Batch size, Channels, Height, Width]; here that is [Batch size, 1, 28, 28].
2. The reshaped x is passed through the first convolutional layer c1, followed by the first batch normalization layer bn1 (or the identity function if batch normalization is not used), a ReLU activation function, and a max pooling operation.
3. The output from step 2 is passed through the second convolutional layer c2, followed by the second batch normalization layer bn2, another ReLU activation function, and another max pooling operation.
4. The output is then flattened into a 2D tensor [Batch size, Features], preparing it for the fully connected layer.
5. The flattened tensor is passed through the fully connected layer l1.
6. Finally, a log softmax function is applied. It turns the logits into log-probabilities, a convenient form for classification losses.

The output of the forward method is the model's prediction given the input x: one row of 10 log-probabilities per image.
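To make the last step concrete: log softmax returns the logarithm of each class probability, so exponentiating its output recovers values that sum to 1. The following is a minimal pure-Python sketch of the function for a single row of logits, not tinygrad's implementation; the example logits are arbitrary.

```python
import math

def log_softmax(logits):
    # Shift by the maximum so exp() stays in a safe range.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_sum for v in logits]

logits = [2.0, 1.0, 0.1]
log_probs = log_softmax(logits)
probs = [math.exp(v) for v in log_probs]

print(all(v <= 0 for v in log_probs))      # True: log-probabilities are never positive
print(abs(sum(probs) - 1.0) < 1e-9)        # True: exponentiating gives probabilities
print(log_probs.index(max(log_probs)))     # 0: the predicted class is the argmax
```

Because the logarithm preserves ordering, the predicted digit is the index of the largest output, whether one looks at the logits, the probabilities, or the log-probabilities.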

In [None]:
class TestMNIST(unittest.TestCase):
  def test_sgd_onestep(self):
    np.random.seed(1337)
    model = TinyBobNet()
    optimizer = optim.SGD(model.parameters(), lr=0.001)
    train(model, X_train, Y_train, optimizer, BS=69, steps=1)
    for p in model.parameters(): p.realize()

  def test_sgd_threestep(self):
    np.random.seed(1337)
    model = TinyBobNet()
    optimizer = optim.SGD(model.parameters(), lr=0.001)
    train(model, X_train, Y_train, optimizer, BS=69, steps=3)

  def test_sgd_sixstep(self):
    np.random.seed(1337)
    model = TinyBobNet()
    optimizer = optim.SGD(model.parameters(), lr=0.001)
    train(model, X_train, Y_train, optimizer, BS=69, steps=6, noloss=True)

  def test_adam_onestep(self):
    np.random.seed(1337)
    model = TinyBobNet()
    optimizer = optim.Adam(model.parameters(), lr=0.001)
    train(model, X_train, Y_train, optimizer, BS=69, steps=1)
    for p in model.parameters(): p.realize()

  def test_adam_threestep(self):
    np.random.seed(1337)
    model = TinyBobNet()
    optimizer = optim.Adam(model.parameters(), lr=0.001)
    train(model, X_train, Y_train, optimizer, BS=69, steps=3)

  def test_conv_onestep(self):
    np.random.seed(1337)
    model = TinyConvNet()
    optimizer = optim.SGD(model.parameters(), lr=0.001)
    train(model, X_train, Y_train, optimizer, BS=69, steps=1, noloss=True)
    for p in model.parameters(): p.realize()

  def test_conv(self):
    np.random.seed(1337)
    model = TinyConvNet()
    optimizer = optim.Adam(model.parameters(), lr=0.001)
    train(model, X_train, Y_train, optimizer, steps=100)
    assert evaluate(model, X_test, Y_test) > 0.94   # torch gets 0.9415 sometimes

  def test_conv_with_bn(self):
    np.random.seed(1337)
    model = TinyConvNet(has_batchnorm=True)
    optimizer = optim.AdamW(model.parameters(), lr=0.003)
    train(model, X_train, Y_train, optimizer, steps=200)
    assert evaluate(model, X_test, Y_test) > 0.94

  def test_sgd(self):
    np.random.seed(1337)
    model = TinyBobNet()
    optimizer = optim.SGD(model.parameters(), lr=0.001)
    train(model, X_train, Y_train, optimizer, steps=600)
    assert evaluate(model, X_test, Y_test) > 0.94   # CPU gets 0.9494 sometimes

The TestMNIST class is a group of tests to validate the performance of the TinyBobNet and TinyConvNet models on the MNIST dataset. It uses the unittest framework in Python, which helps in automating and organizing tests for software. Each method in this class is an individual test case.

test_sgd_onestep, test_sgd_threestep, and test_sgd_sixstep: These methods train the TinyBobNet model using the Stochastic Gradient Descent (SGD) optimizer for one, three, and six training steps respectively, with a batch size (BS) of 69. The steps parameter defines how many times the model's weights will be updated during the training process. These tests contain no assertions, so they pass as long as training runs without raising an error.

test_adam_onestep and test_adam_threestep: These methods are similar to the previous ones but use the Adam optimizer instead of SGD. Adam keeps running averages of each parameter's gradient and squared gradient and uses them to scale every update, which often lets it converge in fewer steps than plain SGD.
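The difference between the two optimizers can be seen on a single scalar parameter. The sketch below implements the textbook update rules, with the commonly used Adam defaults (b1=0.9, b2=0.999, eps=1e-8) taken as assumptions; it is not the code inside tinygrad's optim module, and sgd_step and adam_step are names chosen for this example.

```python
import math

def sgd_step(p, grad, lr=0.001):
    # Plain SGD: move against the gradient, scaled by the learning rate.
    return p - lr * grad

def adam_step(p, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # Update running averages of the gradient and of its square.
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad * grad
    # Correct the bias that comes from starting both averages at zero.
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return p - lr * m_hat / (math.sqrt(v_hat) + eps), m, v

deltas = {}
for grad in (0.01, 100.0):
    sgd_delta = sgd_step(1.0, grad) - 1.0
    adam_p, _, _ = adam_step(1.0, grad, m=0.0, v=0.0, t=1)
    deltas[grad] = (sgd_delta, adam_p - 1.0)
    print(grad, deltas[grad])
```

The SGD step grows in proportion to the gradient, while the first Adam step is close to the learning rate in size for both the tiny and the huge gradient. That normalization is one reason Adam tends to be less sensitive to the scale of the gradients.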

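test_conv_onestep: This method tests the TinyConvNet model using the SGD optimizer with just one training step. It makes no assertion about accuracy; it only checks that a training step on the convolutional model runs without error.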

test_conv: This method tests the TinyConvNet model with the Adam optimizer, using a larger number of training steps (100). After training, it checks if the model's performance (accuracy) on the test data is above 0.94 (94%).

test_conv_with_bn: This method is similar to test_conv, but it uses a version of TinyConvNet with batch normalization enabled. It also uses a different optimizer, AdamW, with a larger learning rate (0.003 instead of 0.001) and more training steps (200 instead of 100), and it applies the same 0.94 accuracy check.

test_sgd: This method tests the TinyBobNet model with the SGD optimizer, using an even larger number of training steps (600). After training, it checks if the model's performance on the test data is above 0.94 (94%).

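The call np.random.seed(1337) is repeated in every test. It resets NumPy's global random number generator to a fixed state, so any random draws made through NumPy during a test follow the same sequence on every run. This makes the tests far more repeatable, although results can still differ slightly between backends, as the comments next to the accuracy checks suggest.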
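The effect of seeding can be checked directly with NumPy. In the sketch below, sample_batch_indices is a hypothetical helper that stands in for any code drawing random example indices from NumPy's global generator; whether the train function samples its batches exactly this way is an assumption.

```python
import numpy as np

def sample_batch_indices(n_examples, batch_size):
    # Hypothetical helper: draw random example indices from NumPy's global generator.
    return np.random.randint(0, n_examples, size=batch_size)

np.random.seed(1337)
first = sample_batch_indices(60000, 5)

np.random.seed(1337)
second = sample_batch_indices(60000, 5)
third = sample_batch_indices(60000, 5)   # no reseed: continues the same stream

print(np.array_equal(first, second))   # True: same seed, same draw
print(first, third)                    # the unseeded follow-up draw is a new batch
```

Reseeding at the top of every test therefore gives each test the same starting point, regardless of which tests ran before it.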

In [None]:
if __name__ == '__main__':
  unittest.main()

This idiom is common in Python scripts: it lets a file run certain code when it is executed directly but skip that code when the file is imported as a module.

The __name__ variable in Python holds the name of the current module. When you run a Python script directly (like python my_script.py), the __name__ variable is set to '__main__'.

On the other hand, if you import your Python file as a module into another script (like import my_script), the __name__ variable is then set to the name of that script/module (my_script).

In this context, if __name__ == '__main__': is used to check if this script is being run directly. If it is, it runs the code within this if-statement, which is unittest.main().

The unittest.main() function discovers the test classes defined in the script, here TestMNIST, and runs every method whose name starts with 'test'. By default the methods run one by one in alphabetical order of their names.

So, in summary, if __name__ == '__main__': unittest.main() is saying: if this script is the main script being run, then execute all the tests defined in the script.
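The discovery rule can be tried on a tiny test class. Because unittest.main() exits the interpreter once the tests finish, this sketch uses unittest's loader and runner directly instead; TestExample and its methods are invented for the illustration.

```python
import io
import unittest

class TestExample(unittest.TestCase):
    def test_addition(self):
        self.assertEqual(1 + 1, 2)

    def test_string(self):
        self.assertTrue("mnist".startswith("m"))

    def helper(self):
        # Not collected: the name does not start with "test".
        raise AssertionError("should never run")

loader = unittest.TestLoader()
names = list(loader.getTestCaseNames(TestExample))
suite = loader.loadTestsFromTestCase(TestExample)
# Send the report to a buffer instead of stderr to keep the example quiet.
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)

print(names)                                     # ['test_addition', 'test_string']
print(result.testsRun, result.wasSuccessful())   # 2 True
```

The helper method is ignored even though it would fail if called, which shows that only names beginning with 'test' are treated as test cases.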