Copyright (c) 2022 Habana Labs, Ltd. an Intel Company.
All rights reserved.

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

# Inference on Gaudi - Example2

This notebook demonstrates inference on the Gaudi accelerator using a simple model and the MNIST dataset.

It shows how to run inference on an MNIST model with an HPU Graph, using the HPU Graph APIs and Stream APIs.

Download the pretrained model checkpoint from the vault:

In [None]:
!wget https://vault.habana.ai/artifactory/misc/inference/mnist/mnist-epoch_20.pth

Import all necessary dependencies:

In [1]:
import os
import sys
import torch
import time
import habana_frameworks.torch as ht
import habana_frameworks.torch.core as htcore
from torch.utils.data import DataLoader
from torchvision import transforms, datasets
import torch.nn as nn
import torch.nn.functional as F

Define a simple Net model for MNIST, create an instance, and load the pretrained checkpoint.
Then put the model in evaluation mode and move it to the Gaudi device (hpu).

In [2]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1   = nn.Linear(784, 256)
        self.fc2   = nn.Linear(256, 64)
        self.fc3   = nn.Linear(64, 10)
    def forward(self, x):
        out = x.view(-1,28*28)
        out = F.relu(self.fc1(out))
        out = F.relu(self.fc2(out))
        out = self.fc3(out)
        out = F.log_softmax(out, dim=1)
        return out
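
As a quick sanity check on the architecture above, the sketch below counts the trainable parameters of the three fully connected layers in plain Python. It assumes each `nn.Linear` layer holds a weight matrix plus a bias vector; `linear_params` is a hypothetical helper used only for illustration.

```python
# Layer shapes copied from the Net definition: (in_features, out_features)
layers = [(784, 256), (256, 64), (64, 10)]


def linear_params(in_features, out_features):
    """Parameters of one fully connected layer: weights plus biases."""
    return in_features * out_features + out_features


per_layer = [linear_params(i, o) for i, o in layers]
total = sum(per_layer)

# Each layer's output width must match the next layer's input width
shapes_chain = all(layers[k][1] == layers[k + 1][0] for k in range(len(layers) - 1))

print(per_layer, total, shapes_chain)
```

The first layer dominates the parameter count because it maps the flattened 28x28 image (784 values) to 256 features.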

In [3]:
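model = Net()
# Load the pretrained weights downloaded above
checkpoint = torch.load('mnist-epoch_20.pth')
model.load_state_dict(checkpoint)
# Switch to evaluation mode for inference
model = model.eval()

# Move the model to the Gaudi device
model = model.to("hpu")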

 PT_HPU_LAZY_MODE = 1
 PT_HPU_LAZY_EAGER_OPTIM_CACHE = 1
 PT_HPU_ENABLE_COMPILE_THREAD = 0
 PT_HPU_ENABLE_EXECUTION_THREAD = 1
 PT_HPU_ENABLE_LAZY_EAGER_EXECUTION_THREAD = 1
 PT_ENABLE_INTER_HOST_CACHING = 0
 PT_ENABLE_INFERENCE_MODE = 1
 PT_ENABLE_HABANA_CACHING = 1
 PT_HPU_MAX_RECIPE_SUBMISSION_LIMIT = 0
 PT_HPU_MAX_COMPOUND_OP_SIZE = 9223372036854775807
 PT_HPU_MAX_COMPOUND_OP_SIZE_SS = 10
 PT_HPU_ENABLE_STAGE_SUBMISSION = 1
 PT_HPU_STAGE_SUBMISSION_MODE = 2
 PT_HPU_PGM_ENABLE_CACHE = 1
 PT_HPU_ENABLE_LAZY_COLLECTIVES = 0
 PT_HCCL_SLICE_SIZE_MB = 16
 PT_HCCL_MEMORY_ALLOWANCE_MB = 0
 PT_HPU_INITIAL_WORKSPACE_SIZE = 0
 PT_HABANA_POOL_SIZE = 24
 PT_HPU_POOL_STRATEGY = 5
 PT_HPU_POOL_LOG_FRAGMENTATION_INFO = 0
 PT_ENABLE_MEMORY_DEFRAGMENTATION = 1
 PT_ENABLE_DEFRAGMENTATION_INFO = 0
 PT_HPU_ENABLE_SYNAPSE_LAYOUT_HANDLING = 1
 PT_HPU_ENABLE_SYNAPSE_OUTPUT_PERMUTE = 1
 PT_HPU_ENABLE_VALID_DATA_RANGE_CHECK = 1
 PT_HPU_FORCE_USE_DEFAULT_STREAM = 0
 PT_RECIPE_CACHE_PATH = 
 PT_HPU_ENABLE_REF

Create an MNIST test dataset and data loader for evaluation.

In [4]:
transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))])

data_path = './data'
test_kwargs = {'batch_size': 32}
dataset1 = datasets.MNIST(data_path, train=False, download=True, transform=transform)
test_loader = torch.utils.data.DataLoader(dataset1,**test_kwargs)
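
To make the transform above concrete, here is a minimal plain-Python sketch of what it does to a single pixel. It assumes `ToTensor` scales 8-bit pixel values to the range [0, 1] and `Normalize` then applies `(value - mean) / std`; the helper names `to_unit_range` and `normalize` are hypothetical.

```python
MEAN, STD = 0.1307, 0.3081  # values passed to transforms.Normalize above


def to_unit_range(pixel):
    """Scale an 8-bit pixel value (0..255) to [0.0, 1.0]."""
    return pixel / 255.0


def normalize(value, mean=MEAN, std=STD):
    """Shift by the mean and divide by the standard deviation."""
    return (value - mean) / std


black = normalize(to_unit_range(0))    # darkest possible pixel
white = normalize(to_unit_range(255))  # brightest possible pixel

print(round(black, 4), round(white, 4))
```

After normalization the pixel values are no longer confined to [0, 1]: dark pixels become slightly negative and bright pixels land well above 1.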

Create HPUGraph and HPUStream

The HPUGraph API is a performance optimization that reduces PyTorch host overhead. It captures the PyTorch execution on a stream during the first iteration and replays it in subsequent iterations. Replay skips the host-side work of accumulating the model's ops, which makes execution device-bound.

For further details on Stream APIs and HPU Graph APIs, refer to HPU Graph APIs and Stream APIs in the documentation.

The example below shows the capture and replay of HPU Graphs using the following functions:

- capture_begin(): Begins capturing HPU work on the current stream.
- capture_end(): Ends capturing HPU work on the current stream.
- replay(): Replays the HPU work captured by this graph.
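
The capture and replay pattern can be illustrated without any accelerator. The sketch below is a toy analogy in plain Python, not the Habana API: `ToyGraph` and its `run` method are hypothetical names, and the "work" is an ordinary Python function. The point it shows is that the graph records work once against fixed buffers, so new data has to be copied into those buffers in place before each replay.

```python
class ToyGraph:
    """Toy stand-in for a capture/replay graph (hypothetical, not the Habana API)."""

    def __init__(self):
        self._ops = []            # recorded (function, input buffer, output buffer) steps
        self._capturing = False

    def capture_begin(self):
        self._ops.clear()
        self._capturing = True

    def capture_end(self):
        self._capturing = False

    def run(self, fn, src, dst):
        """Execute fn(src) into dst, recording the step while capturing."""
        if self._capturing:
            self._ops.append((fn, src, dst))
        dst[:] = fn(src)

    def replay(self):
        """Re-run the recorded steps on the same buffers."""
        for fn, src, dst in self._ops:
            dst[:] = fn(src)


def double(values):
    return [2 * v for v in values]


static_input = [0, 0, 0]    # fixed input placeholder
static_output = [0, 0, 0]   # fixed output placeholder

g = ToyGraph()
g.capture_begin()
g.run(double, static_input, static_output)  # warm-up pass, recorded once
g.capture_end()

results = []
for batch in ([1, 2, 3], [4, 5, 6]):
    static_input[:] = batch   # copy new data into the placeholder in place
    g.replay()                # recompute using the recorded steps
    results.append(list(static_output))

print(results)
```

Rebinding `static_input` to a new list instead of writing into it would leave the recorded step pointing at the old buffer, which is why the notebook copies data into its placeholder rather than reassigning it.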

In [5]:
g = ht.hpu.HPUGraph()
s = ht.hpu.Stream()

Do a warm-up run and capture the HPUGraph

The warm-up pass below runs the model once on stream s between capture_begin() and capture_end(), which records the HPU work into the graph g. The subsequent inference pass then replays this captured graph with replay().

In [6]:
static_input = None
output = None

# Warm-up run: capture one forward pass on a fixed-size input placeholder
static_input = torch.randn(32, 1, 28, 28, device='hpu')
with ht.hpu.stream(s):
    g.capture_begin()
    output = model(static_input)
    g.capture_end()

Run inference by replaying the graph.

For each batch, copy the input data into the input placeholder (static_input), then replay the graph. The result is written to the output placeholder (output).

In [7]:
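correct = 0
for batch_idx, (data, label) in enumerate(test_loader):
    # Copy the new batch into the placeholder captured by the graph
    static_input.copy_(data)
    # Replay the captured forward pass; the result lands in `output`
    g.replay()
    # The last batch can be smaller than 32: keep only the rows that have labels
    if output.shape[0] != label.shape[0]:
        output = output[:label.shape[0]]
    # The predicted class is the index of the largest log-probability
    correct += output.max(1)[1].eq(label).sum()

# Note: len(test_loader) * 32 counts the final partial batch as a full one
print('Accuracy: {:.2f}%'.format(100. * correct / (len(test_loader) * 32)))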

Accuracy: 94.36%
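
The accuracy cell divides by `len(test_loader) * 32`. The plain-Python sketch below works through that arithmetic, assuming the MNIST test split holds 10,000 images and the loader keeps the final partial batch (the `DataLoader` default). `batch_sizes` is a hypothetical helper, not part of PyTorch.

```python
def batch_sizes(num_samples, batch_size):
    """Sizes of the batches produced when the last partial batch is kept."""
    full, rest = divmod(num_samples, batch_size)
    return [batch_size] * full + ([rest] if rest else [])


BATCH_SIZE = 32
sizes = batch_sizes(10_000, BATCH_SIZE)

padded_total = len(sizes) * BATCH_SIZE  # denominator used in the accuracy cell
true_total = sum(sizes)                 # number of samples that have labels

print(len(sizes), sizes[-1], padded_total, true_total)
```

Under these assumptions the denominator is slightly larger than the number of labeled samples, so the printed accuracy slightly understates the true value. Dividing by `len(test_loader.dataset)` would use the exact sample count.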


In [8]:
quit()