### Step 1: Prepare the build/run environment
oneDNN ships in four configurations inside the Intel oneAPI toolkits. Each configuration lives in its own folder under the oneDNN installation path and supports a different compiler or threading library.

Set the installation path of your oneAPI toolkit.

In [34]:
# default path: /opt/intel/oneapi
%env ONEAPI_INSTALL=/opt/intel/oneapi

env: ONEAPI_INSTALL=/opt/intel/oneapi


In [35]:
import os
if not os.path.isdir(os.environ['ONEAPI_INSTALL']):
    print("ERROR! wrong oneAPI installation path")

In [36]:
!printf '%s\n'     $ONEAPI_INSTALL/dnnl/latest/cpu_*

/opt/intel/oneapi/dnnl/latest/cpu_dpcpp_gpu_dpcpp
/opt/intel/oneapi/dnnl/latest/cpu_gomp
/opt/intel/oneapi/dnnl/latest/cpu_iomp
/opt/intel/oneapi/dnnl/latest/cpu_tbb


The listing shows four folders under the oneDNN installation path, one per configuration, and each configuration supports different features. This tutorial uses the dpcpp configuration (`cpu_dpcpp_gpu_dpcpp`) to showcase the verbose log on both CPU and GPU.
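The folder listing above can also be reproduced from Python, which may be convenient when a later cell needs the path of one configuration. The sketch below is only an illustration: the helper name `list_onednn_configs` is hypothetical, and it assumes the `dnnl/latest/cpu_*` layout shown in the listing.

```python
import os
from pathlib import Path


def list_onednn_configs(oneapi_root):
    """Return the sorted names of the cpu_* configuration folders.

    Hypothetical helper: assumes the <root>/dnnl/latest/cpu_* layout
    shown in the directory listing above.
    """
    latest = Path(oneapi_root) / "dnnl" / "latest"
    return sorted(p.name for p in latest.glob("cpu_*") if p.is_dir())


if __name__ == "__main__":
    # Fall back to the default install path used earlier in this step.
    root = os.environ.get("ONEAPI_INSTALL", "/opt/intel/oneapi")
    for name in list_onednn_configs(root):
        print(name)
```

If the path does not exist, the helper returns an empty list rather than raising, so an empty result is itself a hint that `ONEAPI_INSTALL` is wrong.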

Create a lab folder for this exercise.

In [23]:
!mkdir -p lab

Install required python packages.

In [37]:
!python3 -m pip install -r requirements.txt
# !python -m pip install --user -r requirements.txt

Defaulting to user installation because normal site-packages is not writeable


In [7]:
!python3 -m pip install torch

Defaulting to user installation because normal site-packages is not writeable


In [41]:
!python3 -m timeit --setup="import torch; net = torch.nn.Linear(784,216); batch = torch.rand(10,784)" "net(batch)"


5000 loops, best of 5: 43.4 usec per loop
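The summary line above reports the best per-loop time across several repeats. A minimal sketch of the same measurement through the standard-library `timeit` module follows; `best_usec_per_loop` is a hypothetical helper name, and the fixed `number=5000` and `repeat=5` simply mirror the figures in the output, whereas the command-line tool picks the loop count automatically.

```python
import timeit


def best_usec_per_loop(stmt, setup="pass", number=5000, repeat=5):
    """Best-of-`repeat` time per loop, in microseconds.

    Hypothetical helper: runs `stmt` `number` times per repeat and
    keeps the fastest repeat, as the timeit summary line does.
    """
    totals = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    return min(totals) / number * 1e6


# Usage with the statement from the cell above (requires PyTorch):
# setup = "import torch; net = torch.nn.Linear(784,216); batch = torch.rand(10,784)"
# print(f"{best_usec_per_loop('net(batch)', setup):.1f} usec per loop")
```

Taking the minimum rather than the mean is the usual choice here, since slower repeats mostly reflect interference from other processes rather than the code being measured.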


In [11]:
!python3 -m pip install torchvision

Defaulting to user installation because normal site-packages is not writeable
Collecting torchvision
  Downloading torchvision-0.8.2-cp37-cp37m-manylinux1_x86_64.whl (12.8 MB)
Collecting pillow>=4.1.1
  Downloading Pillow-8.0.1-cp37-cp37m-manylinux1_x86_64.whl (2.2 MB)
Installing collected packages: pillow, torchvision
Successfully installed pillow-8.0.1 torchvision-0.8.2


Get current platform information for this exercise.

In [38]:
from profiling.profile_utils import PlatformUtils
plat_utils = PlatformUtils()
plat_utils.dump_platform_info()

Physical cores: 12
Total cores: 24
Max Frequency: 3700.0
Min Frequency: 1200.0
Socket Number: 2
Total:  188 GB
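`PlatformUtils` comes from the `profiling` package that accompanies this exercise, and its implementation is not shown here. As a rough stand-in when that package is unavailable, the sketch below gathers what the Python standard library alone can report. The function name `basic_platform_info` is hypothetical, and physical core counts, frequencies, socket counts, and memory size are deliberately left out because they need OS-specific or third-party tooling.

```python
import os
import platform


def basic_platform_info():
    """Collect platform facts available from the standard library alone."""
    return {
        "logical_cpus": os.cpu_count(),  # logical CPUs; may be None if unknown
        "machine": platform.machine(),
        "system": platform.system(),
        "python": platform.python_version(),
    }


if __name__ == "__main__":
    for key, value in basic_platform_info().items():
        print(f"{key}: {value}")
```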


In [39]:
import torch
from torchvision import datasets, transforms
import torchvision
from torch.autograd import Variable
import numpy as np
import matplotlib.pyplot as plt

#./data
# transform = transforms.Compose([transforms.ToTensor(),transforms.Lambda(lambda x:x.repeat(3,1,1)),transforms.Normalize(mean=[0.5,0.5,0.5],std=[0.5,0.5,0.5])])
data_train=datasets.MNIST(root="./mnist",  transform=transforms.ToTensor(), train=True,
                          download=True
                          )
data_test=datasets.MNIST(root="./mnist", transform=transforms.ToTensor(), train=False)
data_loader_train=torch.utils.data.DataLoader(dataset=data_train,
                                              batch_size=50,
                                              shuffle=True)
data_loader_test=torch.utils.data.DataLoader(dataset=data_test,
                                             batch_size=50,
                                             shuffle=True)
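With `batch_size=50`, the number of iterations per epoch follows directly from the dataset sizes. The sketch below works through that arithmetic; `batches_per_epoch` is a hypothetical helper, and it assumes the loader keeps a final partial batch unless `drop_last` is set, which is the `DataLoader` default.

```python
import math


def batches_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of batches a loader yields per pass over the dataset."""
    if drop_last:
        # A trailing partial batch is discarded.
        return num_samples // batch_size
    # A trailing partial batch is kept as a smaller final batch.
    return math.ceil(num_samples / batch_size)


# MNIST ships 60,000 training and 10,000 test images.
print(batches_per_epoch(60_000, 50))  # 1200
print(batches_per_epoch(10_000, 50))  # 200
```

These counts should match `len(data_loader_train)` and `len(data_loader_test)` for the loaders defined above.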


In [40]:
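images, labels = next(iter(data_loader_train))
img = torchvision.utils.make_grid(images)

# make_grid returns (C, H, W); matplotlib expects (H, W, C)
img = img.numpy().transpose(1, 2, 0)

# The loaders apply only ToTensor(), so pixel values are already in [0, 1]
# and no de-normalization (img*std+mean) is needed.

print([labels[i] for i in range(4)])

plt.imshow(img)
plt.show()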

[tensor(0), tensor(7), tensor(0), tensor(0)]
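The `transpose(1,2,0)` in the cell above is needed because the grid comes back channels-first. To make the expected array shape concrete, here is a small sketch of the grid-size arithmetic. It assumes `make_grid` is called with its defaults as above (eight images per row, two pixels of padding) and that single-channel images are expanded to three channels; `grid_shape` is a hypothetical helper, not part of torchvision.

```python
import math


def grid_shape(num_images, height, width, nrow=8, padding=2, channels=3):
    """Shape (C, H, W) of an image grid laid out row by row.

    Assumes each tile is padded by `padding` pixels and one extra
    `padding` border closes the grid on the bottom and right.
    """
    cols = min(nrow, num_images)
    rows = math.ceil(num_images / cols)
    grid_h = rows * (height + padding) + padding
    grid_w = cols * (width + padding) + padding
    return channels, grid_h, grid_w


# One batch of 50 MNIST images, each 28x28.
print(grid_shape(50, 28, 28))  # (3, 212, 242)
```

After the transpose, the array passed to matplotlib would therefore have shape `(212, 242, 3)` under these assumptions.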


In [41]:
import time
import datetime

# starttime = time.clock()
starttime= datetime.datetime.now()

class Model(torch.nn.Module):
    def __init__(self):
        super(Model,self).__init__()
        self.conv1 = torch.nn.Sequential(torch.nn.Conv2d(1,6,kernel_size=5,stride=1,padding=0),
                                        torch.nn.ReLU(),
                                        torch.nn.MaxPool2d(kernel_size=4))
        self.dense = torch.nn.Linear(6*6*6,10)
        
    def forward(self,x):
        x = self.conv1(x)
#         x = x.view(-1,14*14*128)
        x = x.view(x.size(0),-1)
        x = self.dense(x)
        return x
    
model = Model()
# if torch.cuda.is_available():
#     model.cuda()  # move all model parameters to the GPU
cost = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())
print(model)

n_epochs = 5

for epoch in range(n_epochs):
    running_loss = 0.0
    running_correct = 0
    print("Epoch{}/{}".format(epoch,n_epochs))
    print("-"*10)
    for data in data_loader_train:
        #print("train ing")
        X_train,y_train = data
        # Add the following line if a GPU is available; otherwise leave it out
#         X_train,y_train = X_train.cuda(),y_train.cuda()
        X_train,y_train = Variable(X_train),Variable(y_train)
        outputs = model(X_train)
        _,pred = torch.max(outputs.data,1)
        optimizer.zero_grad()
        loss = cost(outputs,y_train)

        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        running_correct += torch.sum(pred == y_train.data)
    testing_correct = 0
    for data in data_loader_test:
        X_test,y_test = data
        # Add the following line if a GPU is available; otherwise leave it out
#         X_test,y_test = X_test.cuda(),y_test.cuda()
        X_test,y_test = Variable(X_test),Variable(y_test)
        outputs = model(X_test)
        _,pred = torch.max(outputs,1)
        testing_correct += torch.sum(pred == y_test.data)
    print("Loss is :{:.4f},Train Accuracy is:{:.4f}%,Test Accuracy is:{:.4f}".format(running_loss/len(data_train),100*running_correct/len(data_train),100*testing_correct/len(data_test)))
    
# endtime = time.clock()
endtime = datetime.datetime.now()
print('total execution time is ', (endtime - starttime))

Model(
  (conv1): Sequential(
    (0): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=4, stride=4, padding=0, dilation=1, ceil_mode=False)
  )
  (dense): Linear(in_features=216, out_features=10, bias=True)
)
Epoch0/5
----------
Loss is :0.0085,Train Accuracy is:88.4717%,Test Accuracy is:94.6700
Epoch1/5
----------
Loss is :0.0030,Train Accuracy is:95.7000%,Test Accuracy is:96.4100
Epoch2/5
----------
Loss is :0.0023,Train Accuracy is:96.5367%,Test Accuracy is:97.0300
Epoch3/5
----------
Loss is :0.0020,Train Accuracy is:96.9817%,Test Accuracy is:97.3000
Epoch4/5
----------
Loss is :0.0018,Train Accuracy is:97.3467%,Test Accuracy is:97.5700
total execution time is  0:00:31.452932
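The `in_features=216` of the dense layer in the printout above follows from the spatial-size arithmetic of the convolution and pooling layers applied to 28x28 MNIST images. A minimal sketch, assuming the standard output-size formula with dilation 1 and floor rounding (`ceil_mode=False`, as printed); `conv_out` is a hypothetical helper name.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one spatial dimension of a conv or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


side = conv_out(28, kernel=5)              # Conv2d(1, 6, kernel_size=5): 28 -> 24
side = conv_out(side, kernel=4, stride=4)  # MaxPool2d(kernel_size=4):    24 -> 6
features = 6 * side * side                 # 6 output channels * 6 * 6
print(side, features)  # 6 216
```

This is why the model defines `torch.nn.Linear(6*6*6,10)`: changing the kernel size, padding, or pooling window would change the flattened feature count and the dense layer would need to be resized to match.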
