## N Ways to GPU Programming

## Learning objectives
Since the release of NVIDIA® CUDA® in 2007, several approaches to programming GPUs have evolved. Each approach has its own advantages and disadvantages. By the end of this bootcamp session, students will have a broader perspective on GPU programming approaches, which will help them select the programming model that best fits their applications' needs and constraints. The bootcamp teaches how to accelerate a widely used algorithm, the radial distribution function (RDF), using the following methods:
* Standard: C++ stdpar, Fortran Do-Concurrent
* Directives: OpenACC, OpenMP
<!--* Frameworks: Kokkos-->
* Programming Language Extension: CUDA C, CUDA Fortran

Let's start by checking the CUDA driver and the GPU that you will be running code on in this lab:

In [None]:
!nvidia-smi

<!--**IMPORTANT**: Before we start please download the input file needed for this application from the [Google drive](https://drive.google.com/drive/folders/1aQ_MFyrjBIDMhCczse0S2GQ36MlR6Q_s?usp=sharing) and upload it to the input folder. From the top menu, click on *File*, and *Open* and navigate to `C/source_code/input` directory and copy paste the downloaded input file (`alk.traj.dcd`).-->


### Bootcamp Outline

We will follow the Analysis - Parallelization - Optimization cycle throughout. To start with, let us understand the Nsight tool ecosystem:

- [Nsight Systems](jupyter_notebook/nsight_systems.ipynb)
    - Overview of Nsight profiler tools
    - Introduction to Nsight Systems
    - How to view the report
    - How to use NVTX APIs
    - Optimization Steps to parallel programming 
    
- [Nsight Compute](jupyter_notebook/nsight_compute.ipynb)
    - Introduction to Nsight Compute
    - Overview of sections
    - Roofline Charts
    - Memory Charts
    - Profiling a kernel using CLI
    - How to view the report
  
We will be working on porting a radial distribution function (RDF) application to GPUs. Note: The terminology used throughout the notebooks is explained in the [GPU Architecture Terminologies](jupyter_notebook/GPU_Architecture_Terminologies.ipynb) notebook.

Please read the [RDF Overview](jupyter_notebook/rdf_overview.ipynb) to get familiar with how this application works.
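
For orientation before opening that notebook, here is a minimal serial sketch of what an RDF calculation generally looks like. It is not the bootcamp's code: the `Atom` type, the `rdf_sketch` function, the cubic periodic box, and the normalization are all assumptions made for illustration, and the RDF Overview notebook remains the reference for how the actual application works.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical type for illustration; not taken from the bootcamp source.
struct Atom { double x, y, z; };

// Sketch of g(r) for atoms in a cubic periodic box of edge length `box`
// (an assumption). Distances are histogrammed up to box/2 in `nbins` bins.
std::vector<double> rdf_sketch(const std::vector<Atom>& atoms,
                               double box, int nbins) {
    assert(nbins > 0 && box > 0.0);
    const double pi = 3.14159265358979323846;
    const double rmax = 0.5 * box;
    const double dr = rmax / nbins;
    const std::size_t n = atoms.size();
    std::vector<double> g(nbins, 0.0);
    if (n == 0) return g;

    // O(N^2) loop over unique pairs: the compute-heavy part of the sketch.
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            double dx = atoms[i].x - atoms[j].x;
            double dy = atoms[i].y - atoms[j].y;
            double dz = atoms[i].z - atoms[j].z;
            // Minimum-image convention: use the nearest periodic copy.
            dx -= box * std::round(dx / box);
            dy -= box * std::round(dy / box);
            dz -= box * std::round(dz / box);
            const double r = std::sqrt(dx * dx + dy * dy + dz * dz);
            const int bin = static_cast<int>(r / dr);
            if (bin < nbins) g[bin] += 2.0;  // count the pair for both atoms
        }
    }

    // Normalize each bin by the count expected for a uniform density.
    const double density = static_cast<double>(n) / (box * box * box);
    for (int b = 0; b < nbins; ++b) {
        const double lo = b * dr;
        const double hi = lo + dr;
        const double shell = 4.0 / 3.0 * pi * (hi * hi * hi - lo * lo * lo);
        g[b] /= static_cast<double>(n) * density * shell;
    }
    return g;
}
```

Calling `rdf_sketch(atoms, 10.0, 5)` on a vector of `Atom` values returns five bins covering distances from 0 to 5. In a sketch like this, the double loop over atom pairs dominates the run time, which is why such a loop is a natural candidate for the parallelization approaches listed below.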

Below is the list of GPU programming approaches we will cover during this course. Click on a link to start exploring:
    
1. [ISO C++ and ISO Fortran](../iso/jupyter_notebook/nways_iso.ipynb)
2. [OpenACC](../openacc/jupyter_notebook/nways_openacc.ipynb)<!-- , [OpenACC Advanced](C/jupyter_notebook/openacc/nways_openacc_opt.ipynb)-->
<!--3. [Kokkos](C/jupyter_notebook/kokkos/nways_kokkos.ipynb)-->
3. [OpenMP](../openmp/jupyter_notebook/nways_openmp.ipynb) 
4. [CUDA](../cuda/jupyter_notebook/nways_cuda.ipynb) 
5. [Memory Coherent Architectures](../memory_coherent/jupyter_notebook/memory_coherent_architectures.ipynb)
   
To finish the lab, let us go through some [final remarks](jupyter_notebook/Final_Remarks.ipynb).



### Bootcamp Duration
The lab material will be presented in an 8-hour session. A link to download the material is available at the end of the lab.

### Content Level
Beginner, Intermediate

### Target Audience and Prerequisites
The target audience for this lab is researchers, graduate students, and developers interested in learning the various ways to program GPUs to accelerate their scientific applications.

Basic experience with C/C++ or Fortran programming is needed. No GPU programming knowledge is required.

-----
<!--
# <div style="text-align: center ;border:3px; border-style:solid; border-color:#FF0000  ; padding: 1em">[HOME](../../nways_start.ipynb)</div> 
-->
-----



## Licensing 

Copyright © 2022 OpenACC-Standard.org.  This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0). These materials may include references to hardware and software developed by other entities; all applicable licensing and copyrights apply.