In [1]:
import cudnn
print(cudnn.__version__)
import torch

torch.manual_seed(42)
assert torch.cuda.is_available()

1.14.0




In [2]:
device: torch.device = torch.device("cuda")
print(f"cudnn.backend_version(): {cudnn.backend_version()}")
print(f"torch.version.cuda: {torch.version.cuda}")

cudnn.backend_version(): 91002
torch.version.cuda: 12.8


#### Initialise CUDNN Handle

Initialises the library's context, which acts as an identifier for the current session with cuDNN. In the back-end, the underlying handle is passed explicitly to every subsequent library function that operates on GPU data. This gives the user explicit control over how the library functions across multiple host threads, GPUs and CUDA streams. For example, `cudaSetDevice` can associate different physical GPUs with different host threads; with a separate handle initialised in each host thread, the work from each thread automatically runs on a different GPU device.

The handle determines the GPU on which kernels are launched. The context is associated with only a single physical GPU device; however, multiple handles can be initialised for the same physical GPU device.

Front-end docs: https://docs.nvidia.com/deeplearning/cudnn/frontend/latest/developer/core-concepts.html#cudnn-handle \
Back-end docs: https://docs.nvidia.com/deeplearning/cudnn/backend/latest/developer/core-concepts.html#cudnn-handle

In [None]:
handle = cudnn.create_handle()
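
A handle can also be tied to a specific CUDA stream, which is one way to get the per-stream control described above. The following is a minimal sketch rather than a definitive recipe: it assumes the front-end's `cudnn.set_stream` call and PyTorch's `torch.cuda.current_stream().cuda_stream` attribute, and the helper names `cudnn_stack_available` and `create_handle_on_current_stream` are hypothetical, introduced only for illustration. The guard lets the cell run, and do nothing, on a machine without cuDNN or a GPU.

```python
import importlib.util


def cudnn_stack_available() -> bool:
    """Return True only if cudnn and torch are installed and a CUDA device is visible."""
    if importlib.util.find_spec("cudnn") is None:
        return False
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    return torch.cuda.is_available()


def create_handle_on_current_stream():
    """Create a cuDNN handle and bind it to PyTorch's current CUDA stream (hypothetical helper)."""
    import cudnn
    import torch

    handle = cudnn.create_handle()
    # Raw stream pointer of the stream PyTorch is currently issuing work on.
    stream = torch.cuda.current_stream().cuda_stream
    cudnn.set_stream(handle=handle, stream=stream)
    return handle


if cudnn_stack_available():
    import cudnn

    extra_handle = create_handle_on_current_stream()
    # Release the handle explicitly once it is no longer needed.
    cudnn.destroy_handle(extra_handle)
```

Binding the handle to PyTorch's current stream keeps cuDNN kernels ordered with respect to the PyTorch operations that produce and consume the same tensors.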

#### Initialise CUDNN Graph

cuDNN provides a declarative programming model: computation is defined as a graph of operations on tensors, and the cuDNN back-end decides how it is executed on the GPU device. Graphs are built around three main concepts: Operations, Execution Engines and Heuristics.

*Operations* are a mathematical specification of the operations being executed.\
*Execution Engines*: TODO\
*Heuristics*: TODO


Front-end docs: https://docs.nvidia.com/deeplearning/cudnn/frontend/v1.14.1/developer/graph-api.html#graphs


In [None]:
graph = cudnn.pygraph(
    handle=handle,
    name="cudnn_graph_0",
    io_data_type=cudnn.data_type.HALF,
    compute_data_type=cudnn.data_type.FLOAT,
)

#### Define input tensors

In [None]:
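# Allocate the input in NHWC memory layout, then permute to an NCHW view.
# The data is not moved: the result has NCHW dims with channels-last strides.
X_gpu = torch.randn(8, 56, 56, 64, device=device, dtype=torch.float16).permute(
    0, 3, 1, 2
)
# Same idea for the filters: allocated as (out_channels, kH, kW, in_channels),
# then viewed as (out_channels, in_channels, kH, kW).
W_gpu = torch.randn(32, 3, 3, 64, device=device, dtype=torch.float16).permute(
    0, 3, 1, 2
)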
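
To make the effect of the `permute` calls concrete, the sketch below computes by hand the dims and strides that the permuted views should report. It uses plain Python only, so it runs without a GPU; the helper names `contiguous_strides` and `permuted_view` are hypothetical and not part of cuDNN or PyTorch.

```python
def contiguous_strides(shape):
    """Row-major strides, in elements, for a densely packed tensor of this shape."""
    strides = []
    step = 1
    for extent in reversed(shape):
        strides.append(step)
        step *= extent
    return tuple(reversed(strides))


def permuted_view(shape, perm):
    """Dims and strides seen after permuting a contiguous tensor (no data is moved)."""
    strides = contiguous_strides(shape)
    dims = tuple(shape[i] for i in perm)
    return dims, tuple(strides[i] for i in perm)


# Same shapes and permutation as X_gpu and W_gpu.
x_dims, x_strides = permuted_view((8, 56, 56, 64), (0, 3, 1, 2))
w_dims, w_strides = permuted_view((32, 3, 3, 64), (0, 3, 1, 2))

print(x_dims, x_strides)  # (8, 64, 56, 56) (200704, 1, 3584, 64)
print(w_dims, w_strides)  # (32, 64, 3, 3) (576, 1, 192, 64)
```

The channel dimension has stride 1 in both cases, which is what marks the tensors as channels-last. On a machine with a GPU, these values can be compared against `X_gpu.shape`, `X_gpu.stride()`, `W_gpu.shape` and `W_gpu.stride()`.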


In [7]:
cudnn.destroy_handle(handle)