# LeNet5 Analog Training with Tiki-Taka Optimizer Example
Training the LeNet5 neural network with the Tiki-Taka analog optimizer on the MNIST dataset, simulated on an analog resistive random-access memory (ReRAM) device with soft bounds.

<a href="https://colab.research.google.com/github/IBM/aihwkit/blob/master/notebooks/examples/analog_training_LeNet5_TT.ipynb" target="_parent">
    <img src="https://colab.research.google.com/assets/colab-badge.svg"/>
</a>

# IBM Analog Hardware Acceleration Kit

IBM Analog Hardware Acceleration Kit (AIHWKIT) is an open source Python toolkit for exploring and using the capabilities of in-memory computing devices in the context of artificial intelligence.
The PyTorch integration consists of a series of primitives and features that allow using the toolkit within PyTorch.
The GitHub repository can be found at: https://github.com/IBM/aihwkit

There are two possible scenarios for using Analog AI: one where the analog accelerator targets the training of DNNs, and one where it accelerates DNN inference.
Using an analog accelerator for training requires innovation in the algorithm used during backpropagation (BP), which we will explore in this notebook.
Using an analog accelerator for inference allows the training to run on a digital accelerator, after which the weights are transferred to the analog hardware for inference; we explore this scenario in the hardware-aware training notebook.

## Training with Analog AI

Hardware architectures based on resistive cross-point arrays can provide significant improvements in both speed and power consumption. These architectures use existing techniques such as stochastic gradient descent (SGD) and the backpropagation (BP) algorithm to train the neural network. However, the training accuracy is affected by non-idealities of the devices used in the cross-point array, which makes innovation at the algorithm level necessary as well.

IBM is developing new training algorithms that alleviate the non-idealities of these devices while achieving high network accuracy. In this notebook we will explore the Tiki-Taka algorithm, which eliminates the stringent symmetry requirement for increasing and decreasing the device conductance.
SGD and Tiki-Taka both use error backpropagation, but they process the gradient information very differently and are therefore fundamentally different algorithms. Tiki-Taka replaces each weight matrix W of SGD with two matrices, referred to as A and C, and creates a coupled dynamical system by exchanging information between the two. We showed that in the Tiki-Taka dynamics, non-symmetric behavior is a valuable and needed property of the device; the algorithm is therefore well suited to many non-symmetric device technologies.

<center><img src="imgs/tt.png" style="width:50%; height:50%"/></center> 

More details on Tiki-Taka can be found at:

https://www.frontiersin.org/articles/10.3389/fnins.2020.00103/full

https://www.frontiersin.org/articles/10.3389/frai.2021.699148/full
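
As a rough illustration of the coupled A/C dynamics described above, the following toy sketch minimizes a simple quadratic loss in plain Python. It is not the AIHWKIT implementation: the function name `tiki_taka_toy`, the `decay` term (a crude stand-in for the device's pull toward its symmetry point), and all hyperparameter values are assumptions made for this example. Gradient updates accumulate on A, one column of A is periodically transferred to C, and the weights are read from C only.

```python
def tiki_taka_toy(target, steps=400, lr=0.1, transfer_lr=0.1,
                  decay=0.1, transfer_every=1):
    """Toy two-matrix update loop (hypothetical, idealized devices).

    Minimizes 0.5 * (W - target)^2 element-wise, with W read from C.
    """
    rows, cols = len(target), len(target[0])
    A = [[0.0] * cols for _ in range(rows)]  # fast matrix, receives gradients
    C = [[0.0] * cols for _ in range(rows)]  # slow matrix, holds the weights
    col = 0                                  # next column to transfer

    for step in range(1, steps + 1):
        # Gradient step on A, followed by a decay toward zero
        # (assumed stand-in for the symmetry point of the device).
        for i in range(rows):
            for j in range(cols):
                grad = C[i][j] - target[i][j]
                A[i][j] = (1.0 - decay) * (A[i][j] - lr * grad)

        # Periodically transfer one column of A onto C.
        if step % transfer_every == 0:
            for i in range(rows):
                C[i][col] += transfer_lr * A[i][col]
            col = (col + 1) % cols

    return C


target = [[0.5, -0.3], [0.2, 0.8]]
weights = tiki_taka_toy(target)
print(weights)
```

In this sketch C never sees the gradient directly; it only integrates what A has accumulated, which is the information exchange between the two matrices that the text refers to.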

In this notebook we will use the AIHWKIT to train a LeNet5-inspired analog network using the Tiki-Taka algorithm.
The network will be trained using the MNIST dataset, a collection of images representing the digits 0 to 9.

The first thing to do is to install AIHWKIT and its dependencies in your environment. The preferred way to install the package is from the Python Package Index (uncomment the corresponding line below if the package is not already installed):

In [21]:
# To install the CPU-only kit, uncomment the line below
#pip install aihwkit

# To install the GPU-enabled wheel, use the commands below

!wget https://aihwkit-gpu-demo.s3.us-east.cloud-object-storage.appdomain.cloud/aihwkit-0.4.5-cp37-cp37m-manylinux2014_x86_64.whl
!pip install aihwkit-0.4.5-cp37-cp37m-manylinux2014_x86_64.whl

If the library was installed correctly, you can use the following snippet for creating an analog layer and predicting the output:

In [22]:
from torch import Tensor
from aihwkit.nn import AnalogLinear

model = AnalogLinear(2, 2)
model(Tensor([[0.1, 0.2], [0.3, 0.4]]))

tensor([[0.3765, 0.3294],
        [0.3294, 0.2824]], grad_fn=<AnalogFunctionBackward>)

Now that the package is installed and running, we can start working on creating the LeNet5 network.

AIHWKIT offers different analog layers that can be used to build a network, including AnalogLinear and AnalogConv2d, which are the main layers used to build the present network.
In addition to the standard inputs expected by the PyTorch layers (in_channels, out_channels, etc.), the analog layers also expect an rpu_config input, which defines various settings of the RPU tile. Through the rpu_config parameter the user can specify many of the hardware specifications, such as the device used in the cross-point array, the number of bits used by the ADC/DAC converters, noise values, and many others. Additional details on the RPU configuration can be found at https://aihwkit.readthedocs.io/en/latest/using_simulator.html#rpu-configurations
For this particular case we will use two devices per cross-point, which allows the weight transfer needed to implement the Tiki-Taka algorithm.

In [23]:
def create_rpu_config():

    from aihwkit.simulator.presets import TikiTakaReRamSBPreset

    rpu_config = TikiTakaReRamSBPreset()

    return rpu_config

We can now use this rpu_config as input of the network model:

In [24]:
from torch.nn import Tanh, MaxPool2d, LogSoftmax, Flatten
from aihwkit.nn import AnalogConv2d, AnalogLinear, AnalogSequential

def create_analog_network(rpu_config):
    
    channel = [16, 32, 512, 128]
    model = AnalogSequential(
        AnalogConv2d(in_channels=1, out_channels=channel[0], kernel_size=5, stride=1,
                        rpu_config=rpu_config),
        Tanh(),
        MaxPool2d(kernel_size=2),
        AnalogConv2d(in_channels=channel[0], out_channels=channel[1], kernel_size=5, stride=1,
                        rpu_config=rpu_config),
        Tanh(),
        MaxPool2d(kernel_size=2),
        Tanh(),
        Flatten(),
        AnalogLinear(in_features=channel[2], out_features=channel[3], rpu_config=rpu_config),
        Tanh(),
        AnalogLinear(in_features=channel[3], out_features=10, rpu_config=rpu_config),
        LogSoftmax(dim=1)
    )

    return model
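
The value 512 used for the first AnalogLinear layer follows from the shapes of the preceding layers. The short check below recomputes it, assuming 28x28 MNIST inputs, convolutions without padding, and max pooling with a stride equal to its kernel size (the PyTorch defaults); the helper names `conv_out` and `pool_out` are introduced only for this example.

```python
def conv_out(size, kernel, stride=1):
    """Spatial size after a convolution without padding."""
    return (size - kernel) // stride + 1


def pool_out(size, kernel):
    """Spatial size after max pooling with stride equal to the kernel size."""
    return size // kernel


size = 28                              # MNIST images are 28x28
size = pool_out(conv_out(size, 5), 2)  # conv 5x5 -> 24, pool 2 -> 12
size = pool_out(conv_out(size, 5), 2)  # conv 5x5 -> 8,  pool 2 -> 4
flat_features = 32 * size * size       # 32 output channels of the second conv
print(flat_features)                   # 512
```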

We will use the cross entropy to calculate the loss and the analog-aware Stochastic Gradient Descent optimizer (AnalogSGD) to update the weights:

In [25]:
from torch.nn import CrossEntropyLoss

criterion = CrossEntropyLoss()


from aihwkit.optim import AnalogSGD

def create_analog_optimizer(model):
    """Create the analog-aware optimizer.

    Args:
        model (nn.Module): model to be trained

    Returns:
        Optimizer: created analog optimizer
    """
    
    optimizer = AnalogSGD(model.parameters(), lr=0.01) # we will use a learning rate of 0.01 as in the paper
    optimizer.regroup_param_groups(model)

    return optimizer
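
Note that the network already ends in a LogSoftmax layer, while CrossEntropyLoss applies a log-softmax internally before the negative log-likelihood. The plain-Python check below, with hypothetical helper names and made-up logits, suggests why this combination is harmless: applying log-softmax to values that are already log-probabilities leaves them unchanged.

```python
import math


def log_softmax(values):
    """Numerically stable log-softmax of a list of floats."""
    m = max(values)
    log_sum = m + math.log(sum(math.exp(v - m) for v in values))
    return [v - log_sum for v in values]


logits = [2.0, -1.0, 0.5]   # made-up network outputs for one sample
once = log_softmax(logits)  # what the LogSoftmax layer produces
twice = log_softmax(once)   # what CrossEntropyLoss computes on top of it

label = 0
print(-once[label], -twice[label])  # the two loss values agree
```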

We can now write the train function, which optimizes the network over the MNIST training dataset. The train_step function takes as input the training data, the model to train, and the criterion and optimizer to train with:

In [26]:
from torch import device, cuda

DEVICE = device('cuda' if cuda.is_available() else 'cpu')
print('Running the simulation on: ', DEVICE)

def train_step(train_data, model, criterion, optimizer):
    """Train network.

    Args:
        train_data (DataLoader): Training set used to optimize the model
        model (nn.Module): Model to be trained
        criterion (nn.CrossEntropyLoss): criterion to compute loss
        optimizer (Optimizer): analog model optimizer

    Returns:
        train_dataset_loss: epoch loss of the train dataset
    """
    total_loss = 0

    model.train()

    for images, labels in train_data:
        images = images.to(DEVICE)
        labels = labels.to(DEVICE)
        optimizer.zero_grad()

        # Add training Tensor to the model (input).
        output = model(images)
        loss = criterion(output, labels)

        # Run training (backward propagation).
        loss.backward()

        # Optimize weights.
        optimizer.step()
        total_loss += loss.item() * images.size(0)
    train_dataset_loss = total_loss / len(train_data.dataset)

    return train_dataset_loss

Running the simulation on:  cuda
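
The train_step function multiplies each batch loss by the batch size before dividing by the dataset size. Since CrossEntropyLoss returns the mean over a batch by default, this yields the true per-sample mean even when the last batch is smaller. A small sketch with made-up numbers (the function name `epoch_loss` is hypothetical):

```python
def epoch_loss(batch_mean_losses, batch_sizes):
    """Per-sample mean loss from per-batch mean losses."""
    total = sum(loss * n for loss, n in zip(batch_mean_losses, batch_sizes))
    return total / sum(batch_sizes)


batch_mean_losses = [1.0, 2.0, 4.0]  # made-up mean loss of each batch
batch_sizes = [8, 8, 4]              # the last batch is smaller

weighted = epoch_loss(batch_mean_losses, batch_sizes)
naive = sum(batch_mean_losses) / len(batch_mean_losses)
print(weighted, naive)  # the naive average over-weights the small batch
```

With MNIST and a batch size of 8 every batch is full, so both averages coincide here, but the weighted form stays correct for other batch sizes.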


Since training can be quite time-consuming, it is useful to follow the evolution of the training process by testing the model on a set of images that it has not seen before (the test dataset). So we write a test_step function:

In [27]:
import torch  # needed for torch.max below

def test_step(validation_data, model, criterion):
    """Test trained network

    Args:
        validation_data (DataLoader): Validation set to perform the evaluation
        model (nn.Module): Trained model to be evaluated
        criterion (nn.CrossEntropyLoss): criterion to compute loss

    Returns: 
        test_dataset_loss: epoch loss of the test dataset
        test_dataset_error: error of the test dataset
        test_dataset_accuracy: accuracy of the test dataset
    """
    total_loss = 0
    predicted_ok = 0
    total_images = 0

    model.eval()

    for images, labels in validation_data:
        images = images.to(DEVICE)
        labels = labels.to(DEVICE)

        pred = model(images)
        loss = criterion(pred, labels)
        total_loss += loss.item() * images.size(0)

        _, predicted = torch.max(pred.data, 1)
        total_images += labels.size(0)
        predicted_ok += (predicted == labels).sum().item()

    # Compute the final metrics once, after all batches have been processed.
    test_dataset_accuracy = predicted_ok / total_images * 100
    test_dataset_error = (1 - predicted_ok / total_images) * 100

    test_dataset_loss = total_loss / len(validation_data.dataset)

    return test_dataset_loss, test_dataset_error, test_dataset_accuracy

To reach satisfactory accuracy levels, the train_step has to be repeated multiple times, so we implement a loop over a given number of epochs:

In [28]:
def training_loop(model, criterion, optimizer, train_data, validation_data, epochs=15, print_every=1):
    """Training loop.

    Args:
        model (nn.Module): Model to be trained and evaluated
        criterion (nn.CrossEntropyLoss): criterion to compute loss
        optimizer (Optimizer): analog model optimizer
        train_data (DataLoader): Training set used to optimize the model
        validation_data (DataLoader): Validation set to perform the evaluation
        epochs (int): number of training epochs
        print_every (int): evaluate and print progress every print_every epochs

    """
    train_losses = []
    valid_losses = []
    test_error = []

    # Train model
    for epoch in range(0, epochs):
        # Train_step
        train_loss = train_step(train_data, model, criterion, optimizer)
        train_losses.append(train_loss)

        if epoch % print_every == (print_every - 1):
            # Validate_step
            with torch.no_grad():
                valid_loss, error, accuracy = test_step(validation_data, model, criterion)
                valid_losses.append(valid_loss)
                test_error.append(error)

            print(f'Epoch: {epoch}\t'
                  f'Train loss: {train_loss:.4f}\t'
                  f'Valid loss: {valid_loss:.4f}\t'
                  f'Test error: {error:.2f}%\t'
                  f'Test accuracy: {accuracy:.2f}%\t')
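
The condition `epoch % print_every == (print_every - 1)` determines which epochs are evaluated and printed. The helper below (a hypothetical name, used only for illustration) lists them for a few settings:

```python
def evaluated_epochs(epochs, print_every):
    """Epoch indices at which the training loop evaluates and prints."""
    return [e for e in range(epochs) if e % print_every == (print_every - 1)]


print(evaluated_epochs(15, 1))  # every epoch, 0 to 14
print(evaluated_epochs(15, 5))  # [4, 9, 14]
print(evaluated_epochs(15, 4))  # [3, 7, 11]
```

When `epochs` is not a multiple of `print_every`, the last epochs are trained but not evaluated, as in the final case above.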

We will now download the MNIST dataset and prepare the images for the training and test:

In [29]:
import os

import torch  # needed for torch.utils.data.DataLoader below
from torchvision import datasets, transforms
PATH_DATASET = os.path.join('data', 'DATASET')
os.makedirs(PATH_DATASET, exist_ok=True)

def load_images():
    """Load the MNIST train and test images from torchvision datasets."""

    transform = transforms.Compose([transforms.ToTensor()])
    train_set = datasets.MNIST(PATH_DATASET, download=True, train=True, transform=transform)
    test_set = datasets.MNIST(PATH_DATASET, download=True, train=False, transform=transform)
    train_data = torch.utils.data.DataLoader(train_set, batch_size=8, shuffle=True)
    test_data = torch.utils.data.DataLoader(test_set, batch_size=8, shuffle=False)

    return train_data, test_data
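
With a batch size of 8, each epoch consists of many small weight updates. The following back-of-the-envelope check counts the batches per epoch, assuming the standard MNIST split of 60,000 training and 10,000 test images and a DataLoader that keeps the last partial batch (the PyTorch default); `batches_per_epoch` is a hypothetical helper.

```python
import math


def batches_per_epoch(num_samples, batch_size):
    """Number of batches when the last partial batch is kept."""
    return math.ceil(num_samples / batch_size)


train_batches = batches_per_epoch(60000, 8)
test_batches = batches_per_epoch(10000, 8)
print(train_batches, test_batches)  # 7500 1250
```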

Putting together all the code above, we can now train the model:

In [30]:
import torch

torch.manual_seed(1)

# Load the dataset
train_data, test_data = load_images()

# Create the rpu_config
rpu_config = create_rpu_config()

# Create the model
model = create_analog_network(rpu_config).to(DEVICE)

# Define the analog optimizer
optimizer = create_analog_optimizer(model)

training_loop(model, criterion, optimizer, train_data, test_data)

Epoch: 0	Train loss: 2.6727	Valid loss: 2.6809	Test error: 92.98%	Test accuracy: 7.02%	
Epoch: 1	Train loss: 2.7214	Valid loss: 2.4955	Test error: 84.50%	Test accuracy: 15.50%	
Epoch: 2	Train loss: 2.7623	Valid loss: 2.6031	Test error: 89.41%	Test accuracy: 10.59%	
Epoch: 3	Train loss: 2.7824	Valid loss: 2.9965	Test error: 89.66%	Test accuracy: 10.34%	
Epoch: 4	Train loss: 2.8161	Valid loss: 2.7675	Test error: 85.29%	Test accuracy: 14.71%	
Epoch: 5	Train loss: 2.7201	Valid loss: 2.6821	Test error: 85.80%	Test accuracy: 14.20%	
Epoch: 6	Train loss: 2.5905	Valid loss: 2.5884	Test error: 83.71%	Test accuracy: 16.29%	
Epoch: 7	Train loss: 2.3812	Valid loss: 1.9338	Test error: 66.76%	Test accuracy: 33.24%	
Epoch: 8	Train loss: 2.0749	Valid loss: 1.7923	Test error: 62.63%	Test accuracy: 37.37%	
Epoch: 9	Train loss: 1.7222	Valid loss: 1.4595	Test error: 50.33%	Test accuracy: 49.67%	
Epoch: 10	Train loss: 1.4847	Valid loss: 1.1521	Test error: 40.21%	Test accuracy: 59.79%	
Epoch: 11	Train loss: