# Getting Started

# Few Line Changes to Run on GPU

Let's look at an example of how a few lines of code are enough to switch computations from the CPU to a GPU device.

Start with the original example.
We allocate two matrices on the host (CPU) device using the NumPy array function. All subsequent calculations are also performed on the host (CPU) device where the data is allocated.

In [3]:
# Original CPU script

# Call numpy library
import numpy as np

# Data allocated on the CPU device
x = np.array([[1, 1], [1, 1]])
y = np.array([[1, 1], [1, 1]])

# Compute performed on the CPU device, where data is allocated
res = np.matmul(x, y)

print ("res = ", res)

res =  [[2 2]
 [2 2]]


Now let's modify the code so that all calculations run on the GPU device.
To do this, you only need to switch to the dpnp library and look at the result.

In [1]:
# Modified XPU script

# Drop-in replacement via single line change
import dpnp as np

# Data allocated on the default SYCL device
x = np.array([[1, 1], [1, 1]])
y = np.array([[1, 1], [1, 1]])

# Compute performed on the device, where data is allocated
res = np.matmul(x, y)


print ("Array x is located on the device:", x.device)
print ("Array y is located on the device:", y.device)
print ("res is located on the device:", res.device)
print ("res = ", res)

Array x is located on the device: Device(level_zero:gpu:0)
Array y is located on the device: Device(level_zero:gpu:0)
res is located on the device: Device(level_zero:gpu:0)
res =  [[2 2]
 [2 2]]


As you can see, changing only one line of code lets us perform all calculations on the GPU device.
In this example, np.array() creates an array on the default SYCL* device, which is "gpu" on systems with an integrated or discrete GPU (it is "host" on systems that do not have a GPU). The queue associated with this array is carried with x and y. np.matmul(x, y) computes the matrix product of the two arrays x and y, and the pre-compiled kernel implementing np.matmul() is submitted to that queue. The result res is allocated on the device associated with that queue as well.

Now let's make a few improvements to the code and see how to specify the exact device on which to perform the calculations and which USM memory type to use.
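The following is a minimal sketch of that kind of explicit control, not a definitive recipe. It passes the `device` and `usm_type` keyword arguments to dpnp.array() and reads back the `device` and `usm_type` properties of the result. The helper name `array_on_device` is hypothetical and exists only for this illustration; the sketch returns None when dpnp or the requested device is missing so that it can run on any machine.

```python
# Sketch: place an array on an explicitly chosen device with a chosen USM type.
try:
    import dpnp
except ImportError:  # dpnp is not installed in this environment
    dpnp = None


def array_on_device(data, device="gpu", usm_type="device"):
    """Hypothetical helper: return a dpnp array on `device`, or None on failure."""
    if dpnp is None:
        return None
    try:
        # `device` takes a filter selector string such as "gpu" or
        # "level_zero:gpu:0"; `usm_type` is "device", "shared" or "host".
        return dpnp.array(data, device=device, usm_type=usm_type)
    except Exception:
        # Raised, for example, when no matching device is present.
        return None


x = array_on_device([[1, 1], [1, 1]], device="gpu", usm_type="device")
if x is None:
    print("dpnp or the requested device is not available")
else:
    print("x is located on the device:", x.device)
    print("x uses USM type:", x.usm_type)
```

Choosing "device" USM keeps the data in device memory, while "shared" and "host" trade some locality for accessibility from the host side.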

# dpnp simple examples with popular functions

1. Create an array with evenly spaced values within a given interval.

In [35]:
import dpnp as np

# Create an array of values from 3 up to (but not including) 30 with step 6
a = np.arange(3, 30, step = 6)

print ("Result a is located on the device:", a.device)
print ("a = ", a)

Result a is located on the device: Device(level_zero:gpu:0)
a =  [ 3  9 15 21 27]


In this example, np.arange() creates an array on the default SYCL* device, which is "gpu" on systems with an integrated or discrete GPU (it is "host" on systems that do not have a GPU).
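To see why the result above stops at 27 rather than 30, here is a small sketch that uses Python's built-in range() as a stand-in for integer arguments. It assumes only that np.arange() follows the same half-open interval convention, where the start value is included and the stop value is excluded.

```python
import math

# Same arguments as the arange() call above.
start, stop, step = 3, 30, 6

# range() uses the half-open interval [start, stop): 30 is never produced.
values = list(range(start, stop, step))
print(values)  # [3, 9, 15, 21, 27]

# The number of elements is ceil((stop - start) / step).
count = math.ceil((stop - start) / step)
print(count)  # 5
```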

2. Compute the sum of the array elements on the GPU.

In [3]:
import dpnp as np

x = np.empty(3)

try:
    # Using filter selector strings to specify root devices for a new array
    x = np.asarray ([1, 2, 3], device="gpu")
    print ("Result x is located on the device:", x.device)
except Exception:
    print ("GPU device is not available")

# Return the sum of the array elements
y = np.sum (x) # Expect 6

print ("Result y is located on the device:", y.device)
print ("The sum of the array elements is: ", y )

Result x is located on the device: Device(level_zero:gpu:0)
Result y is located on the device: Device(level_zero:gpu:0)
The sum of the array elements is:  6


In this example, np.asarray() creates an array on the default GPU device. The queue associated with this array is carried with x, and np.sum(x) derives it from x. The pre-compiled kernel implementing np.sum() is submitted to that queue. The result y is a 0-dimensional array allocated on the device associated with that queue as well.
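The queue propagation described above can be checked directly. The sketch below compares the `sycl_queue` property of the input and of the result; it assumes that both arrays expose this property and that queues compare equal when they are the same queue. The helper name `sum_shares_queue` is hypothetical, and it returns None when dpnp or a SYCL device is not available.

```python
# Sketch: check that the result of a reduction shares the queue of its input.
try:
    import dpnp
except ImportError:  # dpnp is not installed in this environment
    dpnp = None


def sum_shares_queue(values):
    """Hypothetical helper: compare the queues of an array and of its sum."""
    if dpnp is None:
        return None
    try:
        x = dpnp.asarray(values)  # allocated on the default SYCL device
        y = dpnp.sum(x)           # executed on the queue carried by x
    except Exception:
        # Raised, for example, when no SYCL device can be selected.
        return None
    return y.sycl_queue == x.sycl_queue


same_queue = sum_shares_queue([1, 2, 3])
if same_queue is None:
    print("dpnp or a SYCL device is not available")
else:
    print("Result shares the queue of its input:", same_queue)
```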

3. Compute the bitwise inversion (bitwise NOT) of an array, element by element.

In [1]:
import dpnp as np

try:
    
    # Using filter selector strings to specify root devices for an array
    a = np.array([[1, 1], [2, 1], [1, 0], [-1, 0]], device = "gpu")
    print ("Array a is located on the device:", a.device)  

    # Compute the bitwise NOT of each element of the array "a"
    x = np.invert(a)

    print ("Result x is located on the device:", x.device)
    print ("Array x is:", x) 

except Exception:
    print ("GPU device is not available")


Array a is located on the device: Device(level_zero:gpu:0)
Result x is located on the device: Device(level_zero:gpu:0)
Array x is: [[-2 -2]
 [-3 -2]
 [-2 -1]
 [ 0 -1]]


In this example, np.array() creates an array on the default GPU device. The queue associated with this array is carried with a, and np.invert(a) derives it from a. The pre-compiled kernel implementing np.invert() is submitted to that queue. The result x is allocated on the device associated with that queue as well.
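Note that np.invert() is an element-wise bitwise NOT, not a matrix inverse. For signed integers in two's complement this is the same as computing -v - 1, which the following plain-Python sketch reproduces for the input used above. It uses Python's built-in `~` operator as a stand-in for the device kernel.

```python
# Same input values as the array "a" above, as plain Python lists.
a = [[1, 1], [2, 1], [1, 0], [-1, 0]]

# Bitwise NOT of every element.
inverted = [[~v for v in row] for row in a]
print(inverted)  # [[-2, -2], [-3, -2], [-2, -1], [0, -1]]

# For signed integers, ~v equals -v - 1.
identity_holds = all(~v == -v - 1 for row in a for v in row)
print(identity_holds)  # True
```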

# dpctl simple examples

Here is a list of simple examples that show how to find out how many devices you have in the system and how to work with them.
Let's print the list of all available SYCL platforms.

In [5]:
# See the list of available SYCL platforms and extra metadata about each platform.
import dpctl

dpctl.lsplatform()  # Print platform information

Intel(R) OpenCL HD Graphics OpenCL 3.0 
Intel(R) FPGA Emulation Platform for OpenCL(TM) OpenCL 1.2 Intel(R) FPGA SDK for OpenCL(TM), Version 20.3
Intel(R) OpenCL OpenCL 3.0 WINDOWS
Intel(R) Level-Zero 1.3


Let's look at the output.
My system has an OpenCL GPU driver, the Intel(R) FPGA Emulation Device, an OpenCL CPU driver, and a Level Zero GPU driver.
Raising the verbosity parameter gives more information about the available devices.

In [6]:
# See the list of available SYCL platforms and extra metadata about each platform.
import dpctl

dpctl.lsplatform(2)  # Print platform information with verbosity level 2 (highest level)

Platform  0 ::
    Name        Intel(R) OpenCL HD Graphics
    Version     OpenCL 3.0 
    Vendor      Intel(R) Corporation
    Backend     opencl
    Num Devices 1
      # 0
        Name                Intel(R) Iris(R) Xe Graphics
        Version             31.0.101.3430
        Filter string       opencl:gpu:0
Platform  1 ::
    Name        Intel(R) FPGA Emulation Platform for OpenCL(TM)
    Version     OpenCL 1.2 Intel(R) FPGA SDK for OpenCL(TM), Version 20.3
    Vendor      Intel(R) Corporation
    Backend     opencl
    Num Devices 1
      # 0
        Name                Intel(R) FPGA Emulation Device
        Version             2022.15.11.0.18_160000
        Filter string       opencl:accelerator:0
Platform  2 ::
    Name        Intel(R) OpenCL
    Version     OpenCL 3.0 WINDOWS
    Vendor      Intel(R) Corporation
    Backend     opencl
    Num Devices 1
      # 0
        Name                11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
        Version             2022.15.11.0

With information about the available SYCL platforms, you can specify which type of device you want to work with.

In [3]:
# See the list of available gpu devices and their extra metadata.
import dpctl

if dpctl.has_gpu_devices():
    print (dpctl.get_devices(device_type='gpu'))
else:
    print("GPU device is not available")

[<dpctl.SyclDevice [backend_type.opencl, device_type.gpu,  Intel(R) Iris(R) Xe Graphics] at 0x1a1eddd72f0>, <dpctl.SyclDevice [backend_type.level_zero, device_type.gpu,  Intel(R) Iris(R) Xe Graphics] at 0x1a1eddd70f0>]


In [7]:
# See the list of available cpu devices and their extra metadata.
import dpctl

if dpctl.has_cpu_devices():
    print (dpctl.get_devices(device_type='cpu'))
else:
    print("CPU device is not available")

[<dpctl.SyclDevice [backend_type.opencl, device_type.cpu,  11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz] at 0x1779083bcf0>]


You can also select a specific device in your system using a default selector:

In [9]:
import dpctl

try:
    # Create a SyclDevice of type GPU based on whatever is returned
    # by the SYCL `gpu_selector` device selector class.
    gpu = dpctl.select_gpu_device()
    gpu.print_device_info() # print GPU device information

except Exception:
    print ("GPU device is not available")

    Name            Intel(R) Iris(R) Xe Graphics
    Driver version  1.3.23904
    Vendor          Intel(R) Corporation
    Filter string   level_zero:gpu:0



Or use the information in the device's filter string to create an explicit SyclDevice:

In [7]:
import dpctl

# Create a SyclDevice with an explicit filter string,
# in this case the first level_zero gpu device.
try:
    level_zero_gpu = dpctl.SyclDevice("level_zero:gpu:0")
    level_zero_gpu.print_device_info()
except Exception:
    print("The first level_zero GPU device is not available")    

    Name            Intel(R) Iris(R) Xe Graphics
    Driver version  1.3.23904
    Vendor          Intel(R) Corporation
    Profile         FULL_PROFILE
    Filter string   level_zero:gpu:0
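The filter strings shown in the outputs above have the form backend:device_type:index. The sketch below splits such a string into its three parts using only the standard library. The names `DeviceFilter` and `parse_filter_string` are hypothetical and are not part of dpctl, and the sketch handles only the full three-part form, not shorter selectors such as "gpu".

```python
from typing import NamedTuple


class DeviceFilter(NamedTuple):
    """Hypothetical container for the three parts of a full filter string."""
    backend: str
    device_type: str
    index: int


def parse_filter_string(text):
    """Split a full 'backend:device_type:index' filter string into its parts."""
    parts = text.split(":")
    if len(parts) != 3:
        raise ValueError("expected 'backend:device_type:index', got %r" % text)
    backend, device_type, index = parts
    return DeviceFilter(backend, device_type, int(index))


parsed = parse_filter_string("level_zero:gpu:0")
print(parsed.backend)      # level_zero
print(parsed.device_type)  # gpu
print(parsed.index)        # 0
```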



Let's check whether your GPU device supports double precision. To do this, select the GPU device and check its has_aspect_fp64 property:

In [16]:
import dpctl
# Select GPU device and check double precision support
try:
    gpu = dpctl.select_gpu_device()
    gpu.print_device_info()
    print("Double precision support is", gpu.has_aspect_fp64)
except Exception:
    print("The GPU device is not available")   

    Name            Intel(R) Iris(R) Xe Graphics
    Driver version  1.3.23904
    Vendor          Intel(R) Corporation
    Filter string   level_zero:gpu:0

Double precision support is False
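One way to act on this result is to pick the floating-point type from it. The sketch below is an illustration under assumptions, not a recommended policy: the helper `pick_float_dtype` is hypothetical, and falling back to single precision when dpctl or a GPU is missing is a choice made only so that the example runs anywhere.

```python
def pick_float_dtype(has_fp64):
    """Hypothetical helper: name of the widest float type the device supports."""
    return "float64" if has_fp64 else "float32"


try:
    import dpctl

    gpu = dpctl.select_gpu_device()
    has_fp64 = bool(gpu.has_aspect_fp64)
except Exception:
    # Assumption for this sketch: without dpctl or a GPU, use single precision.
    has_fp64 = False

dtype_name = pick_float_dtype(has_fp64)
print("Floating-point type to use on this device:", dtype_name)
```

The returned name can then be passed as the dtype argument when creating arrays, so that the same script runs on devices with and without double precision support.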
