# MemTorch Tutorial
## Introduction
In this tutorial, you will learn how to use MemTorch to convert Deep Neural Networks (DNNs) to Memristive Deep Neural Networks (MDNNs), and how to simulate non-ideal device characteristics and key peripheral circuitry. MemTorch is a simulation framework for memristive deep learning systems that integrates directly with the well-known PyTorch Machine Learning (ML) library. It is presented in *MemTorch: An Open-source Simulation Framework for Memristive Deep Learning Systems*, which is available [here](https://arxiv.org/abs/2004.10971).

![Overview](https://raw.githubusercontent.com/coreylammie/MemTorch/master/overview.svg)


## 1. Installation
To install MemTorch from source:

```
git clone --recursive https://github.com/coreylammie/MemTorch
cd MemTorch
python setup.py install
```

*If CUDA is True in setup.py, CUDA Toolkit 10.1 and Microsoft Visual C++ Build Tools are required. If CUDA is False in setup.py, Microsoft Visual C++ Build Tools are required.*

Alternatively, MemTorch can be installed using the *pip* package-management system:

```
pip install memtorch-cpu # Supports normal operation
pip install memtorch # Supports CUDA and normal operation
```

The complete API documentation is available [here](https://memtorch.readthedocs.io/).

## 2. Training and Benchmarking a Deep Neural Network Using CIFAR-10

MemTorch can be used to simulate the inference routines of MDNNs. Consequently, prior to conversion, DNNs must be either defined and trained using PyTorch or imported using PyTorch. 

In this tutorial, the VGG-16 DNN architecture is trained and benchmarked using the CIFAR-10 image classification data set. 
* The CIFAR-10 data set consists of 60,000 32x32 color images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images. 
* In the cell below, the VGG-16 DNN is trained for 50 epochs with a batch size of $\Im=256$. The initial learning rate is $\eta = 1e-2$, which is decayed by an order of magnitude every 20 training epochs. 
* Adam is used to optimize network parameters and Cross Entropy (CE) is used to determine network losses. 
* `memtorch.utils.LoadCIFAR10` is used to load the CIFAR-10 training and test sets. After each epoch, the model is benchmarked using the CIFAR-10 test set. 
* The model that achieves the highest test set accuracy is saved as *trained_model.pt*.

In [None]:
import torch
from torch.autograd import Variable
import memtorch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from memtorch.utils import LoadCIFAR10
import numpy as np


class Net(nn.Module):
    def __init__(self, inflation_ratio=1):
        super(Net, self).__init__()
        self.conv0 = nn.Conv2d(in_channels=3, out_channels=128*inflation_ratio, kernel_size=3, stride=1, padding=1)
        self.bn0 = nn.BatchNorm2d(num_features=128*inflation_ratio, affine=False)
        self.act0 = nn.ReLU()
        self.conv1 = nn.Conv2d(in_channels=128*inflation_ratio, out_channels=128*inflation_ratio, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(num_features=128*inflation_ratio, affine=False)
        self.act1 = nn.ReLU()
        self.conv2 = nn.Conv2d(in_channels=128*inflation_ratio, out_channels=256*inflation_ratio, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(num_features=256*inflation_ratio, affine=False)
        self.act2 = nn.ReLU()
        self.conv3 = nn.Conv2d(in_channels=256*inflation_ratio, out_channels=256*inflation_ratio, kernel_size=3, padding=1)
        self.bn3 = nn.BatchNorm2d(num_features=256*inflation_ratio, affine=False)
        self.act3 = nn.ReLU()
        self.conv4 = nn.Conv2d(in_channels=256*inflation_ratio, out_channels=512*inflation_ratio, kernel_size=3, padding=1)
        self.bn4 = nn.BatchNorm2d(num_features=512*inflation_ratio, affine=False)
        self.act4 = nn.ReLU()
        self.conv5 = nn.Conv2d(in_channels=512*inflation_ratio, out_channels=512, kernel_size=3, padding=1)
        self.bn5 = nn.BatchNorm2d(num_features=512, affine=False)
        self.act5 = nn.ReLU()
        self.fc6 = nn.Linear(in_features=512*4*4, out_features=1024)
        self.bn6 = nn.BatchNorm1d(num_features=1024, affine=False)
        self.act6 = nn.ReLU()
        self.fc7 = nn.Linear(in_features=1024, out_features=1024)
        self.bn7 = nn.BatchNorm1d(num_features=1024, affine=False)
        self.act7 = nn.ReLU()
        self.fc8 = nn.Linear(in_features=1024, out_features=10)

    def forward(self, input):
        x = self.act0(self.bn0(self.conv0(input)))
        x = self.act1(self.bn1(F.max_pool2d(self.conv1(x), 2)))
        x = self.act2(self.bn2(self.conv2(x)))
        x = self.act3(self.bn3(F.max_pool2d(self.conv3(x), 2)))
        x = self.act4(self.bn4(self.conv4(x)))
        x = self.act5(self.bn5(F.max_pool2d(self.conv5(x), 2)))
        x = x.view(x.size(0), -1)
        x = self.act6(self.bn6(self.fc6(x)))
        x = self.act7(self.bn7(self.fc7(x)))
        return self.fc8(x)

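def test(model, test_loader):
    model.eval()  # use running BatchNorm statistics while benchmarking
    correct = 0
    with torch.no_grad():  # gradients are not needed for evaluation
        for batch_idx, (data, target) in enumerate(test_loader):
            output = model(data.to(device))
            pred = output.max(1)[1]  # index of the highest-scoring class
            correct += pred.eq(target.to(device).view_as(pred)).cpu().sum()

    return 100. * float(correct) / float(len(test_loader.dataset))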

device = torch.device('cpu' if 'cpu' in memtorch.__version__ else 'cuda')
epochs = 50
train_loader, validation_loader, test_loader = LoadCIFAR10(batch_size=256, validation=False)
model = Net().to(device)
criterion = nn.CrossEntropyLoss()
learning_rate = 1e-2
optimizer = optim.Adam(model.parameters(), lr=learning_rate)
best_accuracy = 0
for epoch in range(0, epochs):
    print('Epoch: [%d]\t\t' % (epoch + 1), end='')
    if epoch > 0 and epoch % 20 == 0:  # keep the initial rate for the first 20 epochs
        learning_rate = learning_rate * 0.1
        for param_group in optimizer.param_groups:
            param_group['lr'] = learning_rate

    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        optimizer.zero_grad()
        output = model(data.to(device))
        loss = criterion(output, target.to(device))
        loss.backward()
        optimizer.step()

    accuracy = test(model, test_loader)
    print('%2.2f%%' % accuracy)
    if accuracy > best_accuracy:
        torch.save(model.state_dict(), 'trained_model.pt')
        best_accuracy = accuracy
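
The learning-rate schedule described at the start of this section (an initial rate of 1e-2, reduced by an order of magnitude every 20 epochs) can also be written as a pure function of the epoch index. The sketch below is illustrative only; `step_decay_lr` and its argument names are hypothetical and are not part of MemTorch or PyTorch.

```python
def step_decay_lr(epoch, initial_lr=1e-2, step_size=20, gamma=0.1):
    """Return the learning rate for a zero-based epoch index under step decay."""
    return initial_lr * gamma ** (epoch // step_size)


# Learning rates around the epochs where the schedule changes (50 epochs in total).
for epoch in (0, 19, 20, 39, 40, 49):
    print(epoch, step_decay_lr(epoch))
```

PyTorch's built-in `torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)` applies the same rule and could be used instead of updating `param_group['lr']` by hand.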

## 3. Conversion of a Deep Neural Network to a Memristive Deep Neural Network 

Within MemTorch, `memtorch.mn.Module.patch_model` can be used to convert a DNN to an MDNN. Prior to conversion, a memristive device model must be defined and partially characterized (before other non-ideal device characteristics are introduced).

In the cell below:
* A reference memristor from `memtorch.bh.memristor` is defined.
* Optional reference memristor keyword arguments are set.
* A `memtorch.bh.memristor.Memristor` object is instantiated.
* The hysteresis loop of the instantiated memristor object is generated/plotted.
* The bipolar switching behaviour of the instantiated memristor object is generated/plotted.

In [None]:
reference_memristor = memtorch.bh.memristor.VTEAM
reference_memristor_params = {'time_series_resolution': 1e-10}
memristor = reference_memristor(**reference_memristor_params)
memristor.plot_hysteresis_loop()
memristor.plot_bipolar_switching_behaviour()

In the cell below, the trained DNN from Section 2 is converted to an equivalent MDNN, where all linear layers are replaced with memristive-equivalent layers. Specifically:
* `memtorch.map.Parameter.naive_map` is used to convert the weights within all `torch.nn.Linear` layers to equivalent conductance values, to be programmed to the two memristive devices used to represent each weight (positive and negative, respectively). 
* `tile_shape` is set to (128, 128), so that modular crossbar tiles of size 128x128 are used to represent weights. 
* `ADC_resolution` is set to 8, which sets the bit width of all emulated Analogue-to-Digital Converters (ADCs).
* `ADC_overflow_rate` is used to set the initial overflow rate of each ADC. 
* `quant_method` is used to set the quantization method used (linear, by default).
* `transistor` is set to `True`, so a 1T1R arrangement is simulated. 
* `programming_routine` is set to `None` to skip device-level simulation of the programming routine. 



Note that if `transistor` is `False`, `programming_routine` must not be `None`. In that case, device-level simulation is performed for each device using `memtorch.bh.crossbar.gen_programming_signal` and `memtorch.bh.memristor.Memristor.simulate`, which use finite differences to model internal device dynamics. As `scheme` is not defined, a double-column parameter representation scheme is adopted. Finally, `max_input_voltage` is set to 0.3, so inputs to each layer are encoded between -0.3 V and +0.3 V.
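
To make the roles of `max_input_voltage` and `ADC_resolution` more concrete, the sketch below scales a set of layer inputs into the ±0.3 V range and, separately, applies a uniform (linear) quantizer with 2^8 levels to a set of example outputs. It is a simplified stand-in written in plain Python, not MemTorch's implementation: `encode_inputs` and `linear_quantize` are hypothetical helpers, the example values are made up, and the handling of `ADC_overflow_rate` is omitted.

```python
def encode_inputs(values, max_input_voltage=0.3):
    """Scale values linearly so the largest magnitude maps to max_input_voltage."""
    peak = max(abs(v) for v in values)
    if peak == 0:
        return [0.0 for _ in values]
    return [v / peak * max_input_voltage for v in values]


def linear_quantize(values, bits=8):
    """Round each value to the nearest of 2**bits evenly spaced levels."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)
    step = (hi - lo) / (2 ** bits - 1)
    return [lo + round((v - lo) / step) * step for v in values]


inputs = [-2.0, -0.5, 0.0, 1.0, 4.0]
voltages = encode_inputs(inputs)  # every entry lies within [-0.3, 0.3]
print(voltages)

column_outputs = [0.013, 0.42, 0.77, 1.5, 2.9]  # assumed analogue readouts
print(linear_quantize(column_outputs, bits=8))   # at most 256 distinct levels
```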

In [None]:
import copy
from memtorch.mn.Module import patch_model
from memtorch.map.Parameter import naive_map
from memtorch.bh.crossbar.Program import naive_program


model = Net().to(device)
model.load_state_dict(torch.load('trained_model.pt'), strict=False)
patched_model = patch_model(copy.deepcopy(model),
                          memristor_model=reference_memristor,
                          memristor_model_params=reference_memristor_params,
                          module_parameters_to_patch=[torch.nn.Linear],
                          mapping_routine=naive_map,
                          transistor=True,
                          programming_routine=None,
                          tile_shape=(128, 128),
                          max_input_voltage=0.3,
                          ADC_resolution=8,
                          ADC_overflow_rate=0.,
                          quant_method='linear')
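
The double-column representation mentioned above stores each weight as the difference between two conductances, one in a "positive" column and one in a "negative" column. The sketch below shows one simple linear mapping of this kind in plain Python. It is an illustration under stated assumptions rather than the behaviour of `naive_map`: the function names, the scaling rule, and the resistance values `r_on = 1e3` and `r_off = 1e4` are all hypothetical.

```python
def double_column_map(weights, r_on=1e3, r_off=1e4):
    """Map signed weights to (positive, negative) conductance pairs.

    Both devices rest at the minimum conductance g_off; a positive weight raises
    the positive device and a negative weight raises the negative device.
    """
    g_on, g_off = 1.0 / r_on, 1.0 / r_off
    w_max = max(abs(w) for w in weights) or 1.0  # guard against all-zero weights
    scale = (g_on - g_off) / w_max               # siemens per unit weight
    g_pos = [g_off + scale * max(w, 0.0) for w in weights]
    g_neg = [g_off + scale * max(-w, 0.0) for w in weights]
    return g_pos, g_neg, scale


def recover_weights(g_pos, g_neg, scale):
    """Invert the mapping: each weight is the scaled conductance difference."""
    return [(p - n) / scale for p, n in zip(g_pos, g_neg)]


weights = [0.5, -1.0, 0.0, 0.25]
g_pos, g_neg, scale = double_column_map(weights)
print(recover_weights(g_pos, g_neg, scale))
```

Because only the difference between the two columns carries information, the common offset `g_off` cancels when the weights are recovered.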

In the cell below, all patched `torch.nn.Linear` layers are tuned using linear regression. A randomly generated tensor of size (8, `self.in_features`) is propagated through each memristive layer and each legacy layer (accessible using `layer.forward_legacy`). `sklearn.linear_model.LinearRegression` is used to determine the coefficient and intercept of the linear relationship between each pair of outputs. These are used to define the `transform_output` lambda function, which maps the output of each memristive layer to its equivalent legacy representation.

In [None]:
patched_model.tune_()
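
The tuning step can be pictured as an ordinary least-squares fit between the outputs of a memristive layer and those of its legacy counterpart. The sketch below reproduces the idea for scalar outputs using only the standard library; in place of `sklearn.linear_model.LinearRegression` it uses the closed-form solution for a single coefficient and intercept. The synthetic `legacy` and `memristive` values and the helper `fit_linear` are assumptions made for illustration.

```python
def fit_linear(x, y):
    """Least-squares fit of y = coef * x + intercept for scalar samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    coef = s_xy / s_xx
    return coef, mean_y - coef * mean_x


# Hypothetical outputs: the memristive layer returns a scaled, offset copy
# of the legacy layer's outputs.
legacy = [-1.5, -0.2, 0.0, 0.7, 2.1, 3.3, -2.4, 1.0]
memristive = [3e-4 * v + 1e-5 for v in legacy]

coef, intercept = fit_linear(memristive, legacy)
transform_output = lambda out: coef * out + intercept

# After the transform, the memristive outputs line up with the legacy outputs.
print([round(transform_output(m), 6) for m in memristive])
```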

Finally, in the cell below, the converted and tuned MDNN is benchmarked using the CIFAR-10 data set.

In [None]:
print(test(patched_model, test_loader))

## 4. Modeling Non-Ideal Device Characteristics


Non-ideal device characteristics can either be encapsulated within device specific memristive models, or introduced to base (generic) models after conversion, using `memtorch.bh.nonideality.NonIdeality.apply_nonidealities`. Currently, the following non-ideal device characteristics are supported:
* `memtorch.bh.nonideality.DeviceFaults`
* `memtorch.bh.nonideality.Endurance` and `memtorch.bh.nonideality.Retention`
* `memtorch.bh.nonideality.FiniteConductanceStates`
* `memtorch.bh.nonideality.NonLinear`

Stochastic parameters, used to model process variances, can be defined using `memtorch.bh.StochasticParameter`. The introduction of each type of non-ideal device characteristic is demonstrated below.


### 4.1 Modeling Device Faults

Memristive devices are susceptible to failure, either by failing to electroform at a pristine state or by becoming stuck at high or low resistance states. MemTorch incorporates a specific function to account for device failure, `memtorch.bh.nonideality.DeviceFaults`. 

In the cell below:
* The original patched model is copied using `copy.deepcopy`.
* `lrs_proportion` is set to 0.25, so that 25% of devices are assumed to fail to a low resistance state.
* `hrs_proportion` is set to 0.10, so that 10% of devices are assumed to fail to a high resistance state.

It is assumed that the total proportion of devices set to a high resistance state is equal to the proportion of devices that fail to electroform at pristine states plus the proportion of devices stuck at a high resistance state.



In [None]:
from memtorch.bh.nonideality.NonIdeality import apply_nonidealities

patched_model_ = apply_nonidealities(copy.deepcopy(patched_model),
                                  non_idealities=[memtorch.bh.nonideality.NonIdeality.DeviceFaults],
                                  lrs_proportion=0.25,
                                  hrs_proportion=0.10,
                                  electroform_proportion=0)
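
The effect of stuck-at faults on a crossbar can be illustrated without MemTorch. In the sketch below, a chosen proportion of devices is forced to the low resistance state (maximum conductance) and another proportion to the high resistance state (minimum conductance). The helper `apply_stuck_faults`, the conductance values, and the random selection strategy are assumptions for illustration and are not taken from `DeviceFaults`.

```python
import random


def apply_stuck_faults(conductances, lrs_proportion, hrs_proportion,
                       g_on, g_off, seed=0):
    """Return a copy of conductances with randomly chosen devices stuck."""
    rng = random.Random(seed)
    n = len(conductances)
    n_lrs = round(n * lrs_proportion)
    n_hrs = round(n * hrs_proportion)
    # Draw disjoint device indices for the two fault types.
    faulty = rng.sample(range(n), n_lrs + n_hrs)
    out = list(conductances)
    for i in faulty[:n_lrs]:
        out[i] = g_on   # stuck at low resistance, i.e. maximum conductance
    for i in faulty[n_lrs:]:
        out[i] = g_off  # stuck at high resistance, i.e. minimum conductance
    return out


healthy = [5e-4] * 100  # assumed programmed conductances in siemens
faulty = apply_stuck_faults(healthy, lrs_proportion=0.25, hrs_proportion=0.10,
                            g_on=1e-3, g_off=1e-4)
print(faulty.count(1e-3), faulty.count(1e-4), faulty.count(5e-4))
```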

### 4.2 Modeling Device Endurance and Retention

Memristive devices possess non-ideal endurance and retention properties, which should be accounted for. MemTorch incorporates specific functions for accounting for device endurance and retention characteristics, `memtorch.bh.nonideality.Endurance`, and `memtorch.bh.nonideality.Retention`, respectively.

All endurance and retention models are defined in `memtorch.bh.nonideality.endurance_retention_models`.

In the cell below:
* The original patched model is copied using `copy.deepcopy`.
* `x`, the number of SET-RESET cycles, is set to 10,000.
* Endurance characteristics are accounted for using `memtorch.bh.nonideality.NonIdeality.Endurance` and `memtorch.bh.nonideality.endurance_retention_models.model_endurance_retention`.
* `operation_mode` within `endurance_model_kwargs` is set to `sudden`, so that sudden failure is modeled, and various other model arguments are set.


In [None]:
from memtorch.bh.nonideality.NonIdeality import apply_nonidealities

patched_model_ = apply_nonidealities(copy.deepcopy(patched_model),
                                  non_idealities=[memtorch.bh.nonideality.NonIdeality.Endurance],
                                  x=1e4,
                                  endurance_model=memtorch.bh.nonideality.endurance_retention_models.model_endurance_retention,
                                  endurance_model_kwargs={
                                        "operation_mode": memtorch.bh.nonideality.endurance_retention_models.OperationMode.sudden,
                                        "p_lrs": [1, 0, 0, 0],
                                        "stable_resistance_lrs": 100,
                                        "p_hrs": [1, 0, 0, 0],
                                        "stable_resistance_hrs": 1000,
                                        "cell_size": 10,
                                        "temperature": 350,
                                  })

In the cell below:
* The original patched model is copied using `copy.deepcopy`.
* `time`, the retention time, is set to be equal to 1,000s.
* Retention characteristics are accounted for using `memtorch.bh.nonideality.NonIdeality.Retention` and `memtorch.bh.nonideality.endurance_retention_models.model_conductance_drift`.
* `initial_time` within `retention_model_kwargs`, the initial time, is set to be equal to 1s.
* `drift_coefficient` within `retention_model_kwargs` is set to be equal to 0.1.

In [None]:
from memtorch.bh.nonideality.NonIdeality import apply_nonidealities

patched_model_ = apply_nonidealities(copy.deepcopy(patched_model),
                                  non_idealities=[memtorch.bh.nonideality.NonIdeality.Retention],
                                  time=1e3,
                                  retention_model=memtorch.bh.nonideality.endurance_retention_models.model_conductance_drift,
                                  retention_model_kwargs={
                                        "initial_time": 1,
                                        "drift_coefficient": 0.1,
                                  })
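
A common way to describe conductance drift is a power law in time. Assuming that form (the exact expression used by `model_conductance_drift` is not reproduced here and should be checked against the API documentation), the settings above correspond to the calculation sketched below. `drifted_conductance` is a hypothetical helper, and the initial conductance is an arbitrary example value.

```python
def drifted_conductance(initial_conductance, time, initial_time=1.0,
                        drift_coefficient=0.1):
    """Power-law drift: G(t) = G0 * (t / t0) ** (-drift_coefficient)."""
    return initial_conductance * (time / initial_time) ** (-drift_coefficient)


g0 = 1e-3  # assumed initial conductance in siemens
g_after = drifted_conductance(g0, time=1e3, initial_time=1, drift_coefficient=0.1)
print(g_after / g0)  # fraction of the initial conductance remaining after 1,000 s
```

Under this assumption, the conductance falls to 10^(-0.3) of its initial value, which is roughly half.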

### 4.3 Modeling a Finite Number of Conductance States

Realistic memristive devices are non-ideal and have a finite number of stable discrete electrically switchable conductance states, bounded by a low conductance semiconducting state, and a high-conductance metallic state. MemTorch incorporates a specific function for accounting for devices with a finite number of conductance states, `memtorch.bh.nonideality.FiniteConductanceStates`. 

In the cell below:
* The original patched model is copied using `copy.deepcopy`.
* A finite number of conductance states are accounted for using `memtorch.bh.nonideality.NonIdeality.FiniteConductanceStates`.
* `conductance_states` is set to be equal to 5, to model 5 evenly-distributed conductance states.

In [None]:
from memtorch.bh.nonideality.NonIdeality import apply_nonidealities

patched_model_ = apply_nonidealities(copy.deepcopy(patched_model),
                                  non_idealities=[memtorch.bh.nonideality.NonIdeality.FiniteConductanceStates],
                                  conductance_states=5)            
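
Restricting devices to a finite number of states amounts to rounding each programmed conductance to the nearest allowed level. The sketch below does this for evenly spaced levels in plain Python. The helper `snap_to_states` and the conductance range of 1e-4 S to 1e-3 S are assumptions for illustration, not MemTorch's implementation.

```python
def snap_to_states(conductances, g_min, g_max, n_states=5):
    """Replace each conductance with the nearest of n_states evenly spaced levels."""
    step = (g_max - g_min) / (n_states - 1)
    states = [g_min + i * step for i in range(n_states)]
    return [min(states, key=lambda s: abs(s - g)) for g in conductances]


programmed = [1.2e-4, 4.0e-4, 5.6e-4, 9.9e-4]  # assumed values in siemens
print(snap_to_states(programmed, g_min=1e-4, g_max=1e-3, n_states=5))
```

With only 5 states the rounding error can be as large as half the spacing between levels, which is why a small number of states tends to degrade accuracy.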

### 4.4 Modeling Non-Linear Device Characteristics

Non-ideal memristive devices have non-linear I/V device characteristics, especially at high voltages, which are difficult to accurately and efficiently model. The `memtorch.bh.nonideality.NonLinear.apply_non_linear` function can be used to efficiently model non-linear device I/V characteristics during inference for devices with an infinite number of discrete conductance states, and for devices with a finite number of conductance states. 

For cases where devices are not simulated using their internal dynamics, it is assumed that the change in conductance during read cycles is negligible.

Within MemTorch, `memtorch.bh.nonideality.NonLinear.apply_non_linear` uses two methods to effectively model non-linear device I/V characteristics:

1. During inference, each device is simulated for timesteps of duration `device.time_series_resolution` using `device.simulate`.
2. Post weight mapping and programming, the I/V characteristics of each device are determined using a single reset voltage sweep.

In the cell below:
* The original patched model is copied using `copy.deepcopy`.
* Non-linear device characteristics are accounted for using `memtorch.bh.nonideality.NonLinear`.
* `simulate` is set to be equal to `True`, so during inference each device is simulated.




In [None]:
from memtorch.bh.nonideality.NonIdeality import apply_nonidealities

patched_model_ = apply_nonidealities(copy.deepcopy(patched_model),
                                  non_idealities=[memtorch.bh.nonideality.NonIdeality.NonLinear],
                                  simulate=True)

In the cell below:
* The original patched model is copied using `copy.deepcopy`.
* Non-linear device characteristics are accounted for using `memtorch.bh.nonideality.NonLinear`.
* `simulate` is not set, so the I/V characteristics of each device are determined using a single reset voltage sweep.
* `sweep_duration` is set to be equal to 2s.
* `sweep_voltage_signal_amplitude` is set to be equal to 1V.
* `sweep_voltage_signal_frequency` is set to be equal to 0.5Hz.


In [None]:
from memtorch.bh.nonideality.NonIdeality import apply_nonidealities

patched_model_ = apply_nonidealities(copy.deepcopy(patched_model),
                                  non_idealities=[memtorch.bh.nonideality.NonIdeality.NonLinear],
                                  sweep_duration=2,
                                  sweep_voltage_signal_amplitude=1,
                                  sweep_voltage_signal_frequency=0.5)

### 4.5 Modeling Stochastic Parameters

MemTorch supports stochastic parameters, defined using `memtorch.bh.StochasticParameter.StochasticParameter`, which provide a simple and flexible way to account for process variances. Stochastic parameters can be used when defining device characteristics. 

In the cell below:
* A memristor object is characterised using stochastic parameters defining low and high resistance states.
* The memristor object is instantiated, and the hysteresis loop and bipolar switching behaviour of the instantiated memristor object are plotted.

Each time the memristor object is instantiated, stochastic parameters will be resampled.


In [None]:
import memtorch

reference_memristor = memtorch.bh.memristor.VTEAM
sigma = 200  # assumed standard deviation for r_on; `sigma` is not defined in the original
reference_memristor_params = {'time_series_resolution': 1e-10, 
                              'r_off': memtorch.bh.StochasticParameter(loc=1000, scale=200, min=2),
                              'r_on': memtorch.bh.StochasticParameter(loc=5000, scale=sigma, min=1)}

memristor = reference_memristor(**reference_memristor_params)
memristor.plot_hysteresis_loop()
memristor.plot_bipolar_switching_behaviour()
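
The resampling behaviour can be mimicked with a small closure that draws a new value on every call. The sketch below assumes a normal distribution with mean `loc` and standard deviation `scale`, clipped from below at `min_value`. These distributional details, and the helper name `make_stochastic_parameter`, are assumptions rather than a description of MemTorch's implementation.

```python
import random


def make_stochastic_parameter(loc, scale, min_value, rng=None):
    """Return a zero-argument function that draws a fresh sample on each call."""
    rng = rng or random.Random()

    def sample():
        # Clip at min_value so that, e.g., a resistance never becomes non-positive.
        return max(rng.gauss(loc, scale), min_value)

    return sample


r_off = make_stochastic_parameter(loc=1000, scale=200, min_value=2,
                                  rng=random.Random(0))
print([round(r_off(), 1) for _ in range(3)])
```

Passing a seeded `random.Random` instance makes the draws reproducible, which is convenient when comparing runs.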