# Hybrid CUDA on Python Tutorial

This notebook presents the API of Hybrid CUDA.

## Initializing the CUDA context in the kernel runner process

Hybrid CUDA uses the CUDA driver API, which must be initialized before use. The cell below registers the `hybpython.cuh` header and then initializes CUDA.

In [None]:
import os
import inspect
import hybridcuda
# hybpython.cuh is expected two directories above the current working directory
header_path = os.path.join(os.getcwd(), "..", "..", "hybpython.cuh")
hybridcuda.registerheader("hybpython.cuh", header_path)
cures = hybridcuda.initcuda()
cures

In [None]:
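class hybridkernel:
    # Default launch configuration: one block of one thread,
    # no shared memory, stream 0.
    gridDimX = 1
    blockDimX = 1
    shared = 0
    stream = 0

    def __init__(self, func):
        # Python function -> CUDA source -> PTX -> CUBIN
        self.hc = hybridcuda.processfunction(func)
        self.hc = hybridcuda.cudajitcode(self.hc)
        self.hc = hybridcuda.ptxlinkcode(self.hc)

    def __call__(self, *args):
        # Only the X dimensions are configurable; Y and Z are fixed to 1.
        self.hc = hybridcuda.launch(self.hc, self.gridDimX,1,1, self.blockDimX,1,1, self.shared,self.stream, *args)

    def __getitem__(self, args):
        # Supports kernel[grid] and kernel[grid, block, shared, stream].
        if not isinstance(args, tuple):
            self.gridDimX = args
            return self
        # args is a tuple: store the values in the attributes read by __call__
        if len(args) > 0:
            self.gridDimX = args[0]
        if len(args) > 1:
            self.blockDimX = args[1]
        if len(args) > 2:
            self.shared = args[2]
        if len(args) > 3:
            self.stream = args[3]
        return self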

#decorator definition
def hybridfunction(func):
    return hybridkernel(func)
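
The subscript syntax used by the wrapper relies on how Python passes arguments to `__getitem__`: `obj[4]` passes the bare value `4`, while `obj[4, 128]` passes the tuple `(4, 128)`. The sketch below isolates that mechanism so it can be run without a GPU. `LaunchConfig` and its attribute names are hypothetical stand-ins for illustration and are not part of hybridcuda.

```python
class LaunchConfig:
    """Hypothetical stand-in that records launch settings like hybridkernel does."""

    FIELDS = ("grid", "block", "shared", "stream")

    def __init__(self):
        # Same defaults as the wrapper class: 1 block, 1 thread, 0, 0
        self.grid, self.block, self.shared, self.stream = 1, 1, 0, 0

    def __getitem__(self, args):
        # A single subscript arrives as a bare value; normalize it to a tuple.
        if not isinstance(args, tuple):
            args = (args,)
        # Assign only the values that were supplied; the rest keep their defaults.
        for name, value in zip(self.FIELDS, args):
            setattr(self, name, value)
        return self

    def as_tuple(self):
        return (self.grid, self.block, self.shared, self.stream)


print(LaunchConfig()[4].as_tuple())       # (4, 1, 0, 0)
print(LaunchConfig()[4, 128].as_tuple())  # (4, 128, 0, 0)
```

Returning `self` from `__getitem__` is what allows the configuration and the call to be chained as `kernel[grid, block](...)`.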

## 1. Hello World sample

Kernel definition. The kernel loops over all `N` elements itself, so a single thread computes the whole sum.

In [None]:
@hybridfunction
def mykernel(N : int, a,b,c):
    for i in range(0,N):
        c[i] = a[i] + b[i]

Running the kernel on the GPU

In [None]:
# prepare some data
import numpy as np
N = 10
a = np.ones(N)
b = np.ones(N)
c = np.zeros(N)

# launch kernel with 1 block of 1 thread
mykernel[1,1](N,a,b,c)
c


## 2. Walkthrough of a simple example (without syntactic sugar)

### Function to be transpiled

In [None]:
def func(N : int, a,b,c):
    for i in range(0,N):
        c[i] = a[i] + b[i]
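
Because `func` is ordinary Python, it can be executed on the CPU first to obtain a reference result to compare against the GPU output later. The sketch below does this with plain lists to stay free of dependencies. The name `c_ref` is introduced here for illustration only.

```python
def func(N: int, a, b, c):
    # Same body as the function to be transpiled
    for i in range(0, N):
        c[i] = a[i] + b[i]


N = 10
a = [1.0] * N
b = [1.0] * N
c_ref = [0.0] * N

# Run on the CPU; c_ref is filled in place
func(N, a, b, c_ref)
print(c_ref)
```

Every element of `c_ref` is `2.0`, which is the value the GPU run on arrays of ones should reproduce.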

### 1. Generate CUDA source code

In [None]:
hc = hybridcuda.processfunction(func)
hc

`processfunction` returns a dictionary with the following entries:
* `version`: a version number
* `cuda`: a string containing the CUDA source code of the generated module
* `kernelname`: the kernel function name, *that is, the exported symbol of the kernel function*
* `argtypes`: the argument types in CUDA format

### 2. Generate PTX from CUDA source

In [None]:
hc = hybridcuda.cudajitcode(hc)
hc.keys()

`cudajitcode` adds two entries:
* `ptx`: a string with the PTX assembly code
* `nvrtclog`: the log from the compilation

### 3. Generate CUBIN from PTX

In [None]:
hc = hybridcuda.ptxlinkcode(hc)
hc.keys()

`ptxlinkcode` adds a `cubin` entry, which is a memory view.
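
Since each stage adds its own keys to the dictionary, the keys alone indicate how far a given dictionary has progressed through the pipeline. The helper below is a minimal sketch of that check. `completed_stages` is a hypothetical name and not part of hybridcuda, and the sample dictionary contains placeholder values only, not real compiler output.

```python
# Keys documented for each stage in this notebook
STAGE_KEYS = {
    "processfunction": {"version", "cuda", "kernelname", "argtypes"},
    "cudajitcode": {"ptx", "nvrtclog"},
    "ptxlinkcode": {"cubin"},
}
STAGE_ORDER = ["processfunction", "cudajitcode", "ptxlinkcode"]


def completed_stages(hc):
    """Return the stages whose keys are all present, stopping at the first gap."""
    done = []
    for stage in STAGE_ORDER:
        if not STAGE_KEYS[stage].issubset(hc.keys()):
            break
        done.append(stage)
    return done


# Placeholder dictionary shaped like the result of cudajitcode (values are dummies)
sample = {"version": 0, "cuda": "", "kernelname": "", "argtypes": "",
          "ptx": "", "nvrtclog": ""}
print(completed_stages(sample))  # ['processfunction', 'cudajitcode']
```

With the real dictionary, `completed_stages(hc)` would be expected to list all three stages once `ptxlinkcode` has run.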

### 4. Launching the kernel

In [None]:
# prepare some data
import numpy as np
N = 10
a = np.ones(N)
b = np.ones(N)
c = np.zeros(N)

# launch kernel: grid 1x1x1, block 1x1x1, shared 0, stream 0, then the kernel arguments
hc = hybridcuda.launch(hc, 1,1,1, 1,1,1, 0,0, N,a,b,c)
c
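
The long run of positional values in the `launch` call is easy to misorder. The sketch below builds that prefix from named parameters. It assumes the order grid X, Y, Z, then block X, Y, Z, then shared, then stream, as inferred from how the `hybridkernel` wrapper class calls `launch`; the authoritative description is the `launch` docstring. `launch_config` is a hypothetical helper and not part of hybridcuda.

```python
def launch_config(grid=(1, 1, 1), block=(1, 1, 1), shared=0, stream=0):
    """Flatten named launch settings into the assumed positional order."""
    if len(grid) != 3 or len(block) != 3:
        raise ValueError("grid and block must each have three dimensions")
    return (*grid, *block, shared, stream)


print(launch_config())                  # (1, 1, 1, 1, 1, 1, 0, 0)
print(launch_config(grid=(4, 1, 1)))    # (4, 1, 1, 1, 1, 1, 0, 0)

# Intended use with the real library (not executed here):
# hc = hybridcuda.launch(hc, *launch_config(), N, a, b, c)
```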


In [None]:
# Arguments of the launch function
print(hybridcuda.launch.__doc__)