<h1 style="color:#65AE11;">Kernel Launches in Non-Default Streams</h1>

In this section you will learn to launch kernels in non-default streams.

<h2 style="color:#65AE11;">Objectives</h2>

By the time you complete this section you will:

* Know how to create non-default streams
* Be able to launch kernels in non-default streams
* Know how to observe operations in non-default streams in Nsight Systems
* Know how to destroy non-default streams

<h2 style="color:#65AE11;">Non-Default Stream Creation</h2>

To create a new non-default stream, pass the address of a `cudaStream_t` variable to `cudaStreamCreate`:

```c
cudaStream_t stream;
cudaStreamCreate(&stream);
```
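`cudaStreamCreate` returns a `cudaError_t`, so stream creation can be checked like any other CUDA runtime call. The following is a minimal sketch; the helper name `create_stream_checked` is hypothetical and not part of the workshop code.

```c
#include <cuda_runtime.h>
#include <stdio.h>

// Hypothetical helper (an assumption, not part of the workshop code):
// creates a non-default stream and reports a failure on stderr.
// Returns 0 on success and 1 on failure.
int create_stream_checked(cudaStream_t *stream)
{
    cudaError_t err = cudaStreamCreate(stream);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaStreamCreate failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

A caller would declare `cudaStream_t stream;` and continue only if `create_stream_checked(&stream)` returns `0`.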

<h2 style="color:#65AE11;">Launching a Kernel in a Non-Default Stream</h2>

To launch a kernel in a non-default stream, pass the stream as the kernel's 4th launch configuration argument. The 3rd launch configuration argument sets the number of bytes of dynamically allocated shared memory, so it must be supplied explicitly in order to reach the 4th. Pass `0` (its default value) when the kernel does not use dynamically allocated shared memory:

```c
cudaStream_t stream;
cudaStreamCreate(&stream);

kernel<<<grid, blocks, 0, stream>>>();
```
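Because a kernel launch returns control to the host before the kernel finishes, the host usually has to wait on the stream before reading results. The sketch below shows one way this might look end to end; the kernel `write_index`, the helper `run_write_index_in_stream`, and the block size of 256 are all illustrative assumptions rather than workshop code.

```c
#include <cuda_runtime.h>
#include <stdlib.h>

// Hypothetical kernel for illustration: element i receives the value i.
__global__ void write_index(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = i;
}

// Hypothetical helper: runs write_index over n elements in a non-default stream.
// Returns the number of wrong elements (0 means success), or -1 on an error.
int run_write_index_in_stream(int n)
{
    size_t bytes = (size_t)n * sizeof(int);
    int *h_data = (int *)malloc(bytes);
    int *d_data = NULL;
    if (h_data == NULL || cudaMalloc(&d_data, bytes) != cudaSuccess) {
        free(h_data);
        return -1;
    }

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    int threads = 256;                        // assumed block size
    int blocks = (n + threads - 1) / threads; // enough blocks to cover n
    write_index<<<blocks, threads, 0, stream>>>(d_data, n);

    // Check the launch itself, then wait for the work queued in this stream.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaStreamSynchronize(stream);

    int wrong = 0;
    if (err == cudaSuccess) {
        err = cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);
        for (int i = 0; i < n; ++i)
            if (h_data[i] != i) ++wrong;
    }

    cudaStreamDestroy(stream);
    cudaFree(d_data);
    free(h_data);
    return err == cudaSuccess ? wrong : -1;
}
```

`cudaStreamSynchronize(stream)` waits only for work queued in `stream`, which is why it is used here instead of a device-wide synchronization.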

<h2 style="color:#65AE11;">Non-Default Stream Destruction</h2>

Destroy non-default streams when you are done with them by passing a non-default stream identifier to `cudaStreamDestroy`:

```c
cudaStream_t stream;
cudaStreamCreate(&stream);

kernel<<<grid, blocks, 0, stream>>>();

cudaStreamDestroy(stream);
```
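In the snippet above, `cudaStreamDestroy` is called right after the launch, while the kernel may still be running. According to the CUDA Runtime API documentation, in that case the call returns immediately and the stream's resources are released once the device has finished the work in the stream. The following sketch illustrates this under stated assumptions; the kernel `set_flag` and the helper `destroy_with_pending_work` are hypothetical names.

```c
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: sets a device-side flag to 1.
__global__ void set_flag(int *flag)
{
    *flag = 1;
}

// Hypothetical helper: destroys a stream while its kernel may still be pending.
// Returns the flag value read back from the device (1 if the kernel ran),
// or -1 on a CUDA error.
int destroy_with_pending_work(void)
{
    int *d_flag = NULL;
    if (cudaMalloc(&d_flag, sizeof(int)) != cudaSuccess) return -1;
    cudaMemset(d_flag, 0, sizeof(int));
    cudaDeviceSynchronize(); // make sure the flag is cleared before the launch

    cudaStream_t stream;
    cudaStreamCreate(&stream);
    set_flag<<<1, 1, 0, stream>>>(d_flag);

    // Does not wait for the kernel; the already queued work still completes.
    cudaError_t err = cudaStreamDestroy(stream);

    // Wait for all outstanding device work before reading the result.
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    int h_flag = 0;
    if (err == cudaSuccess)
        err = cudaMemcpy(&h_flag, d_flag, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_flag);
    return err == cudaSuccess ? h_flag : -1;
}
```

Destroying a stream therefore does not cancel queued work, but it also does not make results available to the host; some form of synchronization is still needed before reading them.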

<h2 style="color:#65AE11;">Exercise: Launch Kernel in Non-Default Stream</h2>

Open and refactor [*06_Kernels_in_Streams/baseline_cipher/baseline.cu*](baseline_cipher/baseline.cu) to launch the `decrypt_gpu` kernel (around line 65) in a non-default stream.

Generate a report file for the refactored application by opening a JupyterLab terminal and running `make profile` from within the *06_Kernels_in_Streams/baseline_cipher* directory (see the [*Makefile*](baseline_cipher/Makefile) there for details).

Open the report file in Nsight Systems. If you've closed the Nsight Systems tab, you can reopen it by following the instructions in [*Nsight Systems Setup*](../04_Nsight_Systems_Setup/Nsight_Systems_Setup.ipynb). As a reminder, the password is `nvidia`.

If you were successful, you should notice that the Nsight Systems visual timeline now presents information about streams, and that the kernel launch occurred in a non-default stream, as shown in the screenshot below.

If you get stuck, please refer to [06_Kernels_in_Streams/baseline_cipher/baseline_solution.cu](../06_Kernels_in_Streams/baseline_cipher/baseline_solution.cu).

In [3]:
%%bash
# Print the current directory to check the starting point
pwd

# Move into the directory that contains the code.
# Note: this uses a relative path. Adjust it if the directory is located elsewhere.
cd 06_Kernels_in_Streams/baseline_cipher || { 
    echo "Error: the directory 06_Kernels_in_Streams/baseline_cipher does not exist in $(pwd). Check the path."; 
    exit 1; 
}

# Build the code using the Makefile
make

# Run the executable, which should launch the 'decrypt_gpu' kernel
./decrypt_gpu



![kernel_in_stream](images/kernel_in_stream.png)

<h2 style="color:#65AE11;">Next</h2>

Now that you can launch kernels in non-default streams, the next section shows how to launch memory transfers in non-default streams.

Please continue to the next section: [*Memcpy in Streams*](../07_Memcpy_in_Streams/Memcpy_in_Streams.ipynb).

<h2 style="color:#65AE11;">Optional Further Study</h2>

The following are for students with time and interest to do additional study on topics related to this workshop.

* In scenarios where a single kernel is unable to saturate the device, you might consider using streams to [launch multiple kernels simultaneously](https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html#concurrent-kernel-execution).
* For full coverage of CUDA stream management functions, see [Stream Management](https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__STREAM.html) in the CUDA Runtime API docs.
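As a starting point for the first item above, here is a minimal sketch of launching one small kernel per stream. The kernel `fill`, the helper `fill_chunks_in_streams`, the stream count of 4, and the block size of 128 are illustrative assumptions. Whether the kernels actually overlap depends on the device and on the resources each kernel uses, which is something the Nsight Systems timeline can show.

```c
#include <cuda_runtime.h>
#include <stdlib.h>

#define NUM_STREAMS 4 // assumed stream count, chosen arbitrarily

// Hypothetical kernel for illustration: writes `value` into n elements.
__global__ void fill(int *data, int n, int value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;
}

// Hypothetical helper: each stream fills its own chunk with the stream's index.
// Returns the number of wrong elements (0 means success), or -1 on an error.
int fill_chunks_in_streams(int chunk)
{
    int total = NUM_STREAMS * chunk;
    size_t bytes = (size_t)total * sizeof(int);
    int *h_data = (int *)malloc(bytes);
    int *d_data = NULL;
    if (h_data == NULL || cudaMalloc(&d_data, bytes) != cudaSuccess) {
        free(h_data);
        return -1;
    }

    cudaStream_t streams[NUM_STREAMS];
    for (int s = 0; s < NUM_STREAMS; ++s)
        cudaStreamCreate(&streams[s]);

    int threads = 128; // assumed block size
    int blocks = (chunk + threads - 1) / threads;
    for (int s = 0; s < NUM_STREAMS; ++s)
        fill<<<blocks, threads, 0, streams[s]>>>(d_data + s * chunk, chunk, s);

    // Wait for the work in every stream before copying results back.
    cudaError_t err = cudaDeviceSynchronize();
    if (err == cudaSuccess)
        err = cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);

    int wrong = 0;
    if (err == cudaSuccess)
        for (int i = 0; i < total; ++i)
            if (h_data[i] != i / chunk) ++wrong;

    for (int s = 0; s < NUM_STREAMS; ++s)
        cudaStreamDestroy(streams[s]);
    cudaFree(d_data);
    free(h_data);
    return err == cudaSuccess ? wrong : -1;
}
```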