# Use nn-Meter Benchmark Dataset
nn-Meter collects and generates 26k CNN models. The dataset is released, and the `nn_meter.dataset` interface gives users access to it. This notebook shows how to use the nn-Meter benchmark dataset for nn-Meter latency prediction and, as an extension, for GNN-based latency prediction.


In [1]:
import os
from nn_meter.dataset import bench_dataset

datasets = bench_dataset()
for data in datasets:
    print(f"Model group: {os.path.basename(data)}")

Model group: alexnets.jsonl
Model group: densenets.jsonl
Model group: googlenets.jsonl
Model group: mnasnets.jsonl
Model group: mobilenetv1s.jsonl
Model group: mobilenetv2s.jsonl
Model group: mobilenetv3s.jsonl
Model group: nasbench201s.jsonl
Model group: proxylessnass.jsonl
Model group: resnets.jsonl
Model group: shufflenetv2s.jsonl
Model group: squeezenets.jsonl
Model group: vggs.jsonl


There are 13 groups of models in the benchmark dataset. In each group, about 2000 models with different parameters were sampled.

Dataset schema: for each model, the dataset stores its:
- model id
- graph in nn-Meter IR graph format
- latency numbers on four devices

Here we print some fields of one model to show the schema of the dataset.

In [2]:
import jsonlines

test_data = datasets[0]
with jsonlines.open(test_data) as data_reader:
    # print the fields of the first record only
    for item in data_reader:
        print('dict keys:', list(item.keys()))
        print('model id', item['id'])
        print('cpu latency: ', item['cortexA76cpu_tflite21'])
        print('adreno640gpu latency: ', item['adreno640gpu_tflite21'])
        print('adreno630gpu latency: ', item['adreno630gpu_tflite21'])
        print('intelvpu latency: ', item['myriadvpu_openvino2019r2'])
        print('model graph is stored in nn-meter IR (shows only one node here):',
              item['graph']['conv1.conv/Conv2D'])
        break

dict keys: ['id', 'cortexA76cpu_tflite21', 'adreno640gpu_tflite21', 'adreno630gpu_tflite21', 'myriadvpu_openvino2019r2', 'graph']
model id alexnet_1356
cpu latency:  148.164
adreno640gpu latency:  24.4851
adreno630gpu latency:  31.932404999999996
intelvpu latency:  15.486
model graph is stored in nn-meter IR (shows only one node here): {'inbounds': ['input_im_0'], 'attr': {'name': 'conv1.conv/Conv2D', 'type': 'Conv2D', 'output_shape': [[1, 56, 56, 63]], 'attr': {'dilations': [1, 1], 'strides': [4, 4], 'data_format': 'NHWC', 'padding': 'VALID', 'kernel_shape': [7, 7], 'weight_shape': [7, 7, 3, 63], 'pads': [0, 0, 0, 0]}, 'input_shape': [[1, 224, 224, 3]]}, 'outbounds': ['conv1.relu.relu/Relu']}
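
Because each record is a plain JSON object, the graph can be inspected with ordinary dictionary operations. The sketch below counts operator types in a graph. The two-node `record` is a hypothetical, trimmed-down stand-in modeled on the output above (the `Relu` node's fields are assumed to follow the same layout as the `Conv2D` node), not a real dataset entry, and `count_op_types` is an illustrative helper rather than part of nn-Meter.

```python
from collections import Counter

# Hypothetical, trimmed-down record modeled on the printed schema above.
# Only the fields used by this sketch are included.
record = {
    "id": "alexnet_1356",
    "adreno640gpu_tflite21": 24.4851,
    "graph": {
        "conv1.conv/Conv2D": {
            "inbounds": ["input_im_0"],
            "attr": {"name": "conv1.conv/Conv2D", "type": "Conv2D"},
            "outbounds": ["conv1.relu.relu/Relu"],
        },
        # Assumed layout: same keys as the Conv2D node shown above.
        "conv1.relu.relu/Relu": {
            "inbounds": ["conv1.conv/Conv2D"],
            "attr": {"name": "conv1.relu.relu/Relu", "type": "Relu"},
            "outbounds": [],
        },
    },
}


def count_op_types(graph):
    """Count how many nodes of each operator type a graph contains."""
    return Counter(node["attr"]["type"] for node in graph.values())


op_counts = count_op_types(record["graph"])
print(dict(op_counts))
```

With a real record, passing `item['graph']` from the loop above to the same helper should give a quick per-model operator summary.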


## Use nn-Meter predictor with benchmark dataset 

In [3]:
import nn_meter

predictor_name = 'adreno640gpu_tflite21' # user can change text here to test other predictors

# load predictor
predictor = nn_meter.load_latency_predictor(predictor_name)



In [4]:
# view latency prediction demo in one model group of the dataset 
test_data = datasets[0]
with jsonlines.open(test_data) as data_reader:
    True_lat = []
    Pred_lat = []
    for i, item in enumerate(data_reader):
        if i >= 20: # only show the first 20 results to save space
            break
        graph = item["graph"]
        pred_lat = predictor.predict(graph, model_type="nnmeter-ir")
        real_lat = item[predictor_name]
        print(f'[RESULT] {os.path.basename(test_data)}[{i}]: predict: {pred_lat}, real: {real_lat}')

        if real_lat is not None:
            True_lat.append(real_lat)
            Pred_lat.append(pred_lat)

if len(True_lat) > 0:
    rmse, rmspe, error, acc5, acc10, _ = nn_meter.latency_metrics(Pred_lat, True_lat)
    print(
        f'[SUMMARY] The first 20 cases from {os.path.basename(test_data)} on {predictor_name}: rmse: {rmse}, 5%accuracy: {acc5}, 10%accuracy: {acc10}'
    )

[RESULT] alexnets.jsonl[0]: predict: 23.447085575244767, real: 24.4851
[RESULT] alexnets.jsonl[1]: predict: 23.885675776357132, real: 23.9185
[RESULT] alexnets.jsonl[2]: predict: 29.586297830632216, real: 30.3052
[RESULT] alexnets.jsonl[3]: predict: 51.12333226388625, real: 52.089
[RESULT] alexnets.jsonl[4]: predict: 4.937166470494071, real: 5.26442
[RESULT] alexnets.jsonl[5]: predict: 14.996201148770355, real: 15.2265
[RESULT] alexnets.jsonl[6]: predict: 9.262593840400983, real: 9.12046
[RESULT] alexnets.jsonl[7]: predict: 13.912859618198581, real: 14.2242
[RESULT] alexnets.jsonl[8]: predict: 15.02293612116675, real: 15.2457
[RESULT] alexnets.jsonl[9]: predict: 12.443609556620192, real: 12.5989
[RESULT] alexnets.jsonl[10]: predict: 15.971239887611217, real: 15.185
[RESULT] alexnets.jsonl[11]: predict: 19.469347190777857, real: 20.1434
[RESULT] alexnets.jsonl[12]: predict: 12.580476335563757, real: 14.4818
[RESULT] alexnets.jsonl[13]: predict: 18.514081238237033, real: 19.0136
[RESULT]
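
To make the summary metrics concrete, here is a minimal sketch of how RMSE and the "within 5% / 10%" accuracies can be computed by hand. It is an illustration under the assumption that accuracy means the fraction of predictions whose relative error does not exceed the threshold; `latency_errors` is a hypothetical helper and may differ in detail from `nn_meter.latency_metrics`. The sample values are the first three prediction/measurement pairs printed above.

```python
import math


def latency_errors(pred, true, thresholds=(0.05, 0.10)):
    """Return (rmse, acc) where acc maps each threshold to the fraction of
    predictions whose relative error is at most that threshold."""
    if len(pred) != len(true) or not pred:
        raise ValueError("pred and true must be non-empty and of equal length")
    rmse = math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))
    rel_err = [abs(p - t) / t for p, t in zip(pred, true)]
    acc = {th: sum(e <= th for e in rel_err) / len(rel_err) for th in thresholds}
    return rmse, acc


# First three (predicted, measured) pairs from the output above.
pred = [23.447085575244767, 23.885675776357132, 29.586297830632216]
true = [24.4851, 23.9185, 30.3052]

rmse, acc = latency_errors(pred, true)
print(f"rmse: {rmse}, 5%accuracy: {acc[0.05]}, 10%accuracy: {acc[0.10]}")
```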

## Use benchmark dataset for GNN

Because the dataset is encoded in a graph format, we also provide two interfaces, `GNNDataset` and `GNNDataloader`, for training a GNN to predict model latency from the benchmark dataset.

`GNNDataset` and `GNNDataloader`, defined in `nn_meter/dataset/gnn_dataloader.py`, convert the model structures stored in `.jsonl` format into the dataset and dataloader objects that GNN training requires. Each `GNNDataset` item includes the adjacency matrix and attributes of the graph, together with its latency value. The script depends on the packages `torch` and `dgl`.
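
To illustrate what an adjacency matrix for an nn-Meter IR graph looks like, the following sketch derives one from the `outbounds` lists. It is not the `GNNDataset` implementation, which also encodes node attributes into feature vectors; `build_adjacency` and `toy_graph` are hypothetical names, and the three-node graph is made up for the example.

```python
def build_adjacency(graph):
    """Return (node_names, adjacency), where adjacency[i][j] == 1 means
    there is an edge from node i to node j."""
    names = list(graph)
    index = {name: i for i, name in enumerate(names)}
    adj = [[0] * len(names) for _ in names]
    for name, node in graph.items():
        for succ in node["outbounds"]:
            # Skip references to nodes that are not keys of the graph dict.
            if succ in index:
                adj[index[name]][index[succ]] = 1
    return names, adj


# Hypothetical toy graph that follows the inbounds/attr/outbounds layout.
toy_graph = {
    "conv": {"inbounds": [], "attr": {"type": "Conv2D"}, "outbounds": ["relu"]},
    "relu": {"inbounds": ["conv"], "attr": {"type": "Relu"}, "outbounds": ["fc"]},
    "fc": {"inbounds": ["relu"], "attr": {"type": "FC"}, "outbounds": []},
}

names, adj = build_adjacency(toy_graph)
for name, row in zip(names, adj):
    print(f"{name:>5}: {row}")
```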

Here we build the datasets and dataloaders for GNN training:

In [5]:
import os
from nn_meter.dataset import gnn_dataloader

target_device = "cortexA76cpu_tflite21"

print("Processing Training Set.")
train_set = gnn_dataloader.GNNDataset(train=True, device=target_device) 
print("Processing Testing Set.")
test_set = gnn_dataloader.GNNDataset(train=False, device=target_device)

train_loader = gnn_dataloader.GNNDataloader(train_set, batchsize=1 , shuffle=True)
test_loader = gnn_dataloader.GNNDataloader(test_set, batchsize=1, shuffle=False)
print('Train Dataset Size:', len(train_set))
print('Testing Dataset Size:', len(test_set))
# fetch one batch and reuse its node-feature width for the printout and the model input size
ATTR_COUNT = next(train_loader)[1].ndata['h'].size(1)
print('Attribute tensor shape:', ATTR_COUNT)

Processing Training Set.
Processing Testing Set.


Using backend: pytorch


Train Dataset Size: 20732
Testing Dataset Size: 5173
Attribute tensor shape: 26


Then we build a GNN model based on GraphSAGE, with max pooling as the graph-level pooling (readout) method.

In [6]:
import torch
import torch.nn as nn
from torch.nn.modules.module import Module
import dgl.nn as dglnn
from dgl.nn.pytorch.glob import MaxPooling

class GNN(Module):
    def __init__(self, 
                num_features=0, 
                num_layers=2,
                num_hidden=32,
                dropout_ratio=0):

        super(GNN, self).__init__()
        self.nfeat = num_features
        self.nlayer = num_layers
        self.nhid = num_hidden
        self.dropout_ratio = dropout_ratio
        self.gc = nn.ModuleList([dglnn.SAGEConv(self.nfeat if i==0 else self.nhid, self.nhid, 'pool') for i in range(self.nlayer)])
        self.bn = nn.ModuleList([nn.LayerNorm(self.nhid) for i in range(self.nlayer)])
        self.relu = nn.ModuleList([nn.ReLU() for i in range(self.nlayer)])
        self.pooling = MaxPooling()
        self.fc = nn.Linear(self.nhid, 1)
        self.fc1 = nn.Linear(self.nhid, self.nhid)
        self.dropout = nn.ModuleList([nn.Dropout(self.dropout_ratio) for i in range(self.nlayer)])

    def forward_single_model(self, g, features):
        x = self.relu[0](self.bn[0](self.gc[0](g, features)))
        x = self.dropout[0](x)
        for i in range(1,self.nlayer):
            x = self.relu[i](self.bn[i](self.gc[i](g, x)))
            x = self.dropout[i](x)
        return x

    def forward(self, g, features):
        x = self.forward_single_model(g, features)
        with g.local_scope():
            g.ndata['h'] = x
            x = self.pooling(g, x)
            x = self.fc1(x)
            return self.fc(x)

Start GNN training:

In [7]:
from torch.optim.lr_scheduler import CosineAnnealingLR

if torch.cuda.is_available():
    print("Using CUDA.")
# device = "cpu"
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Start Training
model = GNN(ATTR_COUNT, 3, 400, 0.1).to(device)
opt = torch.optim.AdamW(model.parameters(), lr=4e-4)
EPOCHS=5
loss_func = nn.L1Loss()

lr_scheduler = CosineAnnealingLR(opt, T_max=EPOCHS)
for epoch in range(EPOCHS):
    train_length = len(train_set)
    train_acc_ten = 0  # number of predictions within 10% of the true latency
    loss_sum = 0
    # each batch yields (latency, graph)
    for batched_l, batched_g in train_loader:
        opt.zero_grad()
        batched_l = batched_l.to(device).float()
        batched_g = batched_g.to(device)
        batched_f = batched_g.ndata['h'].float()
        logits = model(batched_g, batched_f)
        for i in range(len(batched_l)):
            pred_latency = logits[i].item()
            true_latency = batched_l[i].item()
            if 0.9 * true_latency <= pred_latency <= 1.1 * true_latency:
                train_acc_ten += 1
        batched_l = torch.reshape(batched_l, (-1, 1))
        loss = loss_func(logits, batched_l)
        loss_sum += loss.item()  # accumulate a Python float so the autograd graph is not kept alive
        loss.backward()
        opt.step()
    lr_scheduler.step()
    print("[Epoch ", epoch, "]: ", "Training accuracy within 10%: ", train_acc_ten / train_length * 100, " %.")

[Epoch  0 ]:  Training accuracy within 10%:  22.486976654447233  %.
[Epoch  1 ]:  Training accuracy within 10%:  29.471348639783905  %.
[Epoch  2 ]:  Training accuracy within 10%:  32.60659849508007  %.
[Epoch  3 ]:  Training accuracy within 10%:  37.830407100135055  %.
[Epoch  4 ]:  Training accuracy within 10%:  43.32915300019294  %.
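
The training loop above pairs AdamW with `CosineAnnealingLR`. As a rough guide to how the learning rate evolves, the sketch below evaluates the closed-form cosine annealing schedule documented for PyTorch, assuming the default minimum learning rate of 0 and one scheduler step per epoch. `cosine_annealing_lr` is a hypothetical helper written for this illustration, not a PyTorch function.

```python
import math


def cosine_annealing_lr(base_lr, epoch, t_max, eta_min=0.0):
    """Closed-form cosine annealing: the learning rate used in `epoch`
    (0-indexed) when the scheduler is stepped once per epoch."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max))


EPOCHS = 5
BASE_LR = 4e-4  # same initial learning rate as the AdamW optimizer above

schedule = [cosine_annealing_lr(BASE_LR, e, EPOCHS) for e in range(EPOCHS)]
for e, lr in enumerate(schedule):
    print(f"epoch {e}: lr = {lr:.2e}")
```

Since `T_max` equals `EPOCHS`, the learning rate decays toward zero by the end of the run, so training for more epochs would require raising `T_max` accordingly.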
