diff --git a/docs/sources/_images/advisor_roofline_gen9.png b/docs/sources/_images/advisor_roofline_gen9.png
new file mode 100644
index 0000000..93dfa6c
Binary files /dev/null and b/docs/sources/_images/advisor_roofline_gen9.png differ
diff --git a/docs/sources/examples.rst b/docs/sources/examples.rst
index 7052dfd..5cd3e9b 100644
--- a/docs/sources/examples.rst
+++ b/docs/sources/examples.rst
@@ -7,21 +7,27 @@ List of examples
 .. literalinclude:: ../../examples/01-hello_dpnp.py
    :language: python
    :lines: 27-
-   :caption: Your first NumPy code running on GPU
+   :caption: **EXAMPLE 01:** Your first NumPy code running on GPU
    :name: examples_01_hello_dpnp
 
 .. literalinclude:: ../../examples/02-dpnp_device.py
    :language: python
    :lines: 27-
-   :caption: Select device type while creating array
+   :caption: **EXAMPLE 02:** Select device type while creating array
    :name: examples_02_dpnp_device
 
 .. literalinclude:: ../../examples/03-dpnp2numba-dpex.py
    :language: python
    :lines: 27-
-   :caption: Compile dpnp code with numba-dpex
+   :caption: **EXAMPLE 03:** Compile dpnp code with numba-dpex
    :name: examples_03_dpnp2numba_dpex
 
+.. literalinclude:: ../../examples/04-dpctl_device_query.py
+   :language: python
+   :lines: 27-
+   :caption: **EXAMPLE 04:** Get information about devices
+   :name: examples_04_dpctl_device_query
+
 Benchmarks
 **********
diff --git a/docs/sources/prerequisites_and_installation.rst b/docs/sources/prerequisites_and_installation.rst
index ae3f6ab..8a618c0 100644
--- a/docs/sources/prerequisites_and_installation.rst
+++ b/docs/sources/prerequisites_and_installation.rst
@@ -1,6 +1,10 @@
 .. _prerequisites_and_installation:
 .. include:: ./ext_links.txt
 
+.. |copy| unicode:: U+000A9
+
+.. |trade| unicode:: U+2122
+
 Prerequisites and installation
 ==============================
@@ -31,7 +35,8 @@ Extensions for Python manually.
 3. Data Parallel Extensions for Python
 ***************************************
-You can skip this step if you already installed Intel® Distribution for Python or Intel® AI Analytics Toolkit.
+You can skip this step if you already installed Intel |copy| Distribution for Python or Intel |copy| AI Analytics Toolkit.
+
 The easiest way to install Data Parallel Extensions for Python is to install numba-dpex:
 
 Conda: ``conda install numba-dpex``
@@ -40,3 +45,6 @@ Pip: ``pip install numba-dpex``
 
 The above commands will install ``numba-dpex`` along with its dependencies, including
 ``dpnp``, ``dpctl``, and required compiler runtimes and drivers.
+
+.. WARNING::
+   Before installing with ``conda`` or ``pip``, it is strongly advised to update ``conda`` and ``pip`` to their latest versions.
diff --git a/docs/sources/programming_dpep.rst b/docs/sources/programming_dpep.rst
index 3d70c66..68b67b8 100644
--- a/docs/sources/programming_dpep.rst
+++ b/docs/sources/programming_dpep.rst
@@ -39,12 +39,12 @@ to execute your `Numpy*`_ script on GPU usually requires changing just a few lines
 .. literalinclude:: ../../examples/01-hello_dpnp.py
    :language: python
    :lines: 27-
-   :caption: Your first NumPy code running on GPU
+   :caption: **EXAMPLE 01:** Your first NumPy code running on GPU
    :name: ex_01_hello_dpnp
 
 In this example ``np.asarray()`` creates an array on the default `SYCL*`_ device, which is ``"gpu"`` on systems
-with integrated or discrete GPU (it is ``"cpu"`` on systems that do not have GPU).
+with integrated or discrete GPU (it is ``"host"`` on systems that do not have GPU).
 The queue associated with this array is now carried with ``x``, and ``np.sum(x)`` will derive it from ``x``,
 and the respective pre-compiled kernel implementing ``np.sum()`` will be submitted to that queue. The result ``y``
 will be allocated on the device as a 0-dimensional array associated with that queue too.
@@ -57,7 +57,7 @@ In the following example we create the array ``x`` on the GPU device, and perform
 .. literalinclude:: ../../examples/02-dpnp_device.py
    :language: python
    :lines: 27-
-   :caption: Select device type while creating array
+   :caption: **EXAMPLE 02:** Select device type while creating array
    :name: ex_02_dpnp_device
@@ -73,7 +73,7 @@ It takes just a few lines to modify your CPU `Numba*`_ script to run on GPU.
 .. literalinclude:: ../../examples/03-dpnp2numba-dpex.py
    :language: python
    :lines: 27-
-   :caption: Compile dpnp code with numba-dpex
+   :caption: **EXAMPLE 03:** Compile dpnp code with numba-dpex
    :name: ex_03_dpnp2numba_dpex
 
 In this example we implement a custom function ``sum_it()`` that takes an array input. We compile it with
@@ -104,6 +104,12 @@ there are some situations when you will need to use dpctl advanced capabilities:
 Another frequent usage is the creation of additional queues for the purpose of profiling or choosing
 an out-of-order execution of offload kernels.
 
+.. literalinclude:: ../../examples/04-dpctl_device_query.py
+   :language: python
+   :lines: 27-
+   :caption: **EXAMPLE 04:** Get information about devices
+   :name: ex_04_dpctl_device_query
+
 2. **Cross-platform development using Python Array API standard.** If you’re a Python developer programming
 Numpy-like codes and targeting different hardware vendors and different tensor implementations, then going
 with `Python* Array API Standard`_ is a good choice for writing a portable Numpy-like code.
@@ -208,8 +214,10 @@ The next command generates the roof-line graph as an HTML file in the output directory
 
 > advisor --report=roofline --gpu --project-dir= --report-output=/roofline_gpu.html
 
-.. todo::
-   Insert high-resolution image illustrating Advisor html report
+.. image:: ./_images/advisor_roofline_gen9.png
+   :width: 800px
+   :align: center
+   :alt: Advisor roofline analysis example on Gen9 integrated GPU
 
 The above figure shows an example roof-line graph generated using Intel Advisor. The X-axis in the graph
 represents arithmetic intensity and the Y-axis represents performance in GFLOPS.
diff --git a/examples/01-hello_dpnp.py b/examples/01-hello_dpnp.py
index 8f3a7bd..932f64e 100644
--- a/examples/01-hello_dpnp.py
+++ b/examples/01-hello_dpnp.py
@@ -27,7 +27,10 @@ import dpnp as np
 
 x = np.asarray([1, 2, 3])
 
+print("Array x allocated on the device:", x.device)
+
 y = np.sum(x)
 
-print(y.shape)  # Must be 0-dimensional array
-print(y)  # Expect 6
+print("Result y is located on the device:", y.device)  # The same device as x
+print("Shape of y is:", y.shape)  # 0-dimensional array
+print("y=", y)  # Expect 6
diff --git a/examples/02-dpnp_device.py b/examples/02-dpnp_device.py
index 16fa2dd..895fb90 100644
--- a/examples/02-dpnp_device.py
+++ b/examples/02-dpnp_device.py
@@ -32,5 +32,10 @@ except:
     print("GPU device is not available")
 
+print("Array x allocated on the device:", x.device)
+
 y = np.sum(x)
-print(y)  # Expect 6
+
+print("Result y is located on the device:", y.device)  # The same device as x
+print("Shape of y is:", y.shape)  # 0-dimensional array
+print("y=", y)  # Expect 6
diff --git a/examples/03-dpnp2numba-dpex.py b/examples/03-dpnp2numba-dpex.py
index c147f63..3c2f6ac 100644
--- a/examples/03-dpnp2numba-dpex.py
+++ b/examples/03-dpnp2numba-dpex.py
@@ -25,20 +25,22 @@
 # *****************************************************************************
 
 import dpnp as np
-from numba_dpex import jit
+from numba_dpex import njit
 
 @njit(parallel=True, fastmath=True)
-def sum_it(x):
+def sum_it(x):  # Device queue is inferred from x. The kernel is submitted to that queue
     return np.sum(x)
-
-x = None
+x = np.empty(3)
 try:
     x = np.asarray([1, 2, 3], device="gpu")
 except:
     print("GPU device is not available")
 
-y = sum_it(x)
+print("Array x allocated on the device:", x.device)
+
+y = sum_it(x)
 
-print(y.shape)  # Must be 0-dimensional array
-print(y)  # Expect 6
+print("Result y is located on the device:", y.device)  # The same device as x
+print("Shape of y is:", y.shape)  # 0-dimensional array
+print("y=", y)  # Expect 6
diff --git a/examples/04-dpctl_device_query.py b/examples/04-dpctl_device_query.py
new file mode 100644
index 0000000..3a21dbb
--- /dev/null
+++ b/examples/04-dpctl_device_query.py
@@ -0,0 +1,33 @@
+# *****************************************************************************
+# Copyright (c) 2022, Intel Corporation All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are met:
+#
+# Redistributions of source code must retain the above copyright notice,
+# this list of conditions and the following disclaimer.
+#
+# Redistributions in binary form must reproduce the above copyright notice,
+# this list of conditions and the following disclaimer in the documentation
+# and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
+# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
+# OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+# WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
+# OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
+# EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+# *****************************************************************************
+
+import dpctl
+
+dpctl.lsplatform()  # Print platform information (lsplatform prints; it does not return a value)
+print("GPU devices:", dpctl.get_devices(device_type="gpu"))  # Get the list of all GPU devices
+print("Number of GPU devices:", dpctl.get_num_devices(device_type="gpu"))  # Get the number of GPU devices
+print("Has CPU devices?", dpctl.has_cpu_devices())  # Check if there are CPU devices
+print("Has host device?", dpctl.has_host_device())  # Check if there is a host device
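
The new ``04-dpctl_device_query.py`` example only reports device information. As a companion, the sketch below (illustrative, not part of the patch) shows how the same dpctl queries could drive device selection; the import guard and the fallback to ``dpctl.select_default_device()`` are assumptions added so the sketch degrades gracefully on machines without a GPU or without dpctl installed.

```python
# Illustrative only: pick a device using the dpctl queries from EXAMPLE 04,
# falling back to the default SYCL device when no GPU is found, and degrading
# gracefully when dpctl is not installed at all.
try:
    import dpctl

    gpu_devices = dpctl.get_devices(device_type="gpu")  # all GPU devices, as in EXAMPLE 04
    if gpu_devices:
        device = gpu_devices[0]  # take the first available GPU
    else:
        device = dpctl.select_default_device()  # let the SYCL runtime choose
    print("Selected device:", device.name)
except ImportError:
    device = None
    print("dpctl is not installed; installing numba-dpex pulls it in")
```

An array created with ``dpnp.asarray(..., device=device)`` would then be allocated on whichever device was selected, as in EXAMPLE 02.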