Copyright (c) 2022 Habana Labs, Ltd. an Intel Company.
All rights reserved.

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

# Inference on Gaudi - Example3

This notebook shows an example of inference on the Gaudi accelerator.

The tutorial demonstrates inference with HPU Graphs through the built-in wrapper, using a simple model and the MNIST dataset.

Download the pretrained model checkpoint from the vault.

In [None]:
!wget https://vault.habana.ai/artifactory/misc/inference/mnist/mnist-epoch_20.pth

Import all necessary dependencies.

In [1]:
import os
import sys
import torch
import time
import habana_frameworks.torch as ht
import habana_frameworks.torch.core as htcore
from torch.utils.data import DataLoader
from torchvision import transforms, datasets
import torch.nn as nn
import torch.nn.functional as F

Define a simple Net model for MNIST.

In [2]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1   = nn.Linear(784, 256)
        self.fc2   = nn.Linear(256, 64)
        self.fc3   = nn.Linear(64, 10)
    def forward(self, x):
        out = x.view(-1,28*28)
        out = F.relu(self.fc1(out))
        out = F.relu(self.fc2(out))
        out = self.fc3(out)
        out = F.log_softmax(out, dim=1)
        return out
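As a quick sanity check on the size of this network, the parameter count can be worked out by hand from the three `nn.Linear` shapes above. The sketch below uses plain Python only; the helper name `linear_params` is made up for illustration and is not part of PyTorch.

```python
# Layer shapes copied from the Net definition above: (in_features, out_features).
LAYERS = [(784, 256), (256, 64), (64, 10)]


def linear_params(in_features, out_features):
    """Weights plus biases of one fully connected layer (hypothetical helper)."""
    return in_features * out_features + out_features


per_layer = [linear_params(i, o) for i, o in LAYERS]
total = sum(per_layer)
print(per_layer, total)
```

With PyTorch installed, `sum(p.numel() for p in Net().parameters())` should report the same total.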

Create the model and load the pre-trained checkpoint.
Put the model in evaluation mode; it is moved to the Gaudi accelerator (“hpu”) in the next step.

In [3]:
model = Net()
checkpoint = torch.load('mnist-epoch_20.pth')
model.load_state_dict(checkpoint)
model = model.eval()

Wrap the model with HPU Graphs and move it to the HPU.
Here we use "wrap_in_hpu_graph" to wrap the module's forward function with HPU Graphs. The wrapper captures, caches and replays the graph.

In [4]:
model = ht.hpu.wrap_in_hpu_graph(model)
model = model.to("hpu")

 PT_HPU_LAZY_MODE = 1
 PT_HPU_LAZY_EAGER_OPTIM_CACHE = 1
 PT_HPU_ENABLE_COMPILE_THREAD = 0
 PT_HPU_ENABLE_EXECUTION_THREAD = 1
 PT_HPU_ENABLE_LAZY_EAGER_EXECUTION_THREAD = 1
 PT_ENABLE_INTER_HOST_CACHING = 0
 PT_ENABLE_INFERENCE_MODE = 1
 PT_ENABLE_HABANA_CACHING = 1
 PT_HPU_MAX_RECIPE_SUBMISSION_LIMIT = 0
 PT_HPU_MAX_COMPOUND_OP_SIZE = 9223372036854775807
 PT_HPU_MAX_COMPOUND_OP_SIZE_SS = 10
 PT_HPU_ENABLE_STAGE_SUBMISSION = 1
 PT_HPU_STAGE_SUBMISSION_MODE = 2
 PT_HPU_PGM_ENABLE_CACHE = 1
 PT_HPU_ENABLE_LAZY_COLLECTIVES = 0
 PT_HCCL_SLICE_SIZE_MB = 16
 PT_HCCL_MEMORY_ALLOWANCE_MB = 0
 PT_HPU_INITIAL_WORKSPACE_SIZE = 0
 PT_HABANA_POOL_SIZE = 24
 PT_HPU_POOL_STRATEGY = 5
 PT_HPU_POOL_LOG_FRAGMENTATION_INFO = 0
 PT_ENABLE_MEMORY_DEFRAGMENTATION = 1
 PT_ENABLE_DEFRAGMENTATION_INFO = 0
 PT_HPU_ENABLE_SYNAPSE_LAYOUT_HANDLING = 1
 PT_HPU_ENABLE_SYNAPSE_OUTPUT_PERMUTE = 1
 PT_HPU_ENABLE_VALID_DATA_RANGE_CHECK = 1
 PT_HPU_FORCE_USE_DEFAULT_STREAM = 0
 PT_RECIPE_CACHE_PATH = 
 PT_HPU_ENABLE_REF
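The capture, cache and replay behaviour attributed to "wrap_in_hpu_graph" above can be pictured with a small pure-Python analogy. This is only a toy sketch and not the Habana implementation: the class `ToyGraphCache`, the helper `row_sums`, and the choice of the input shape as the cache key are all assumptions made for illustration.

```python
class ToyGraphCache:
    """Toy stand-in for capture/cache/replay; NOT the Habana implementation."""

    def __init__(self, fn):
        self.fn = fn
        self.cache = {}    # input shape -> "recorded graph" (here just the function)
        self.captures = 0
        self.replays = 0

    def __call__(self, batch):
        key = (len(batch), len(batch[0]))  # assumed cache key: the input shape
        if key not in self.cache:
            self.captures += 1             # first call with this shape: capture
            self.cache[key] = self.fn
        else:
            self.replays += 1              # shape seen before: replay cached entry
        return self.cache[key](batch)


def row_sums(batch):
    """Hypothetical 'model': sums each row of a nested list."""
    return [sum(row) for row in batch]


wrapped = ToyGraphCache(row_sums)
wrapped([[1, 2], [3, 4]])   # new shape (2, 2): capture
wrapped([[5, 6], [7, 8]])   # same shape: replay
wrapped([[1, 2, 3]])        # new shape (1, 3): capture again
print(wrapped.captures, wrapped.replays)
```

Under this simplified picture, an input whose shape differs from anything seen so far pays the capture cost again, which is one reason to run a warm-up pass with a representative input shape.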

Create the MNIST test dataset and data loader for evaluation.

In [5]:
transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))])

data_path = './data'
test_kwargs = {'batch_size': 32}
dataset1 = datasets.MNIST(data_path, train=False, download=True, transform=transform)
test_loader = torch.utils.data.DataLoader(dataset1,**test_kwargs)
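To make the effect of `transforms.Normalize((0.1307,), (0.3081,))` concrete, the sketch below applies the same per-pixel formula, `(x - mean) / std`, in plain Python. It assumes the pixel has already been scaled to the range [0, 1] by `ToTensor`; the function name `normalize` is a hypothetical helper and not a torchvision API.

```python
MEAN, STD = 0.1307, 0.3081  # values passed to transforms.Normalize above


def normalize(pixel):
    """Map a ToTensor-scaled pixel in [0, 1] to its normalized value."""
    return (pixel - MEAN) / STD


black, white = normalize(0.0), normalize(1.0)
print(round(black, 4), round(white, 4))
```

A fully black pixel ends up at about -0.42 and a fully white pixel at about 2.82, so the model sees inputs centered near zero rather than in [0, 1].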

Do a warm-up run: the HPU graph is captured and cached here.

In [6]:
warmup_input = torch.randn(32, 1, 28, 28, device='hpu')
warmup_output = model(warmup_input)

Run inference.

Here, the model is already wrapped with wrap_in_hpu_graph as shown above, so there is no need to copy and replay the stream manually; it is all done in the background. We also use asynchronous copies (a copy with "non_blocking=True" followed by mark_step) to further optimize inference. For more information, refer to the [asynchronous copies guideline](https://docs.habana.ai/en/latest/PyTorch/Inference_on_PyTorch/Inference_Optimization.html#using-asynchronous-copies). Adding mark_step after model() is not required with HPU Graphs, as it is handled implicitly.

In [7]:
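correct = 0
for batch_idx, (data, label) in enumerate(test_loader):
    # Asynchronous host-to-device copy, followed by mark_step
    data = data.to("hpu", non_blocking=True)
    htcore.mark_step()
    output = model(data)
    # Count predictions (argmax over the 10 class scores) that match the labels
    correct += output.max(1)[1].eq(label).sum()

# The denominator assumes every batch holds exactly 32 samples
print('Accuracy: {:.2f}%'.format(100. * correct / (len(test_loader) * 32)))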

Accuracy: 94.36%
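The accuracy above is divided by `len(test_loader) * 32`. The arithmetic sketch below shows what that denominator works out to, assuming the MNIST test split has 10,000 images and the loader keeps the final partial batch (the `DataLoader` default); the variable names are illustrative only.

```python
import math

NUM_SAMPLES = 10_000  # size of the MNIST test split
BATCH_SIZE = 32       # batch size used by test_loader

# Number of batches when the last, partial batch is kept
num_batches = math.ceil(NUM_SAMPLES / BATCH_SIZE)
# Samples in the final batch
last_batch = NUM_SAMPLES - (num_batches - 1) * BATCH_SIZE
# Denominator used by the accuracy cell
assumed_total = num_batches * BATCH_SIZE

print(num_batches, last_batch, assumed_total)
```

Because the final batch is not full, this denominator is slightly larger than the number of images actually evaluated, so the reported percentage is a slight underestimate. Dividing by `len(test_loader.dataset)` would use the exact sample count.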


In [8]:
quit()