
To execute OpenMP code in Colab, first install the prerequisites.

Ref_1: https://siddhigate.hashnode.dev/how-to-run-cuda-and-openmp-code-on-google-colaboratory

Ref_2: https://medium.com/@iphoenix179/running-cuda-c-c-in-jupyter-or-how-to-run-nvcc-in-google-colab-663d33f53772

**OpenMP (Open Multi-Processing):**

OpenMP is an API (Application Programming Interface) that provides a simple and flexible way to add parallelism to C, C++, and Fortran programs. It allows developers to express parallelism through compiler directives, making it easier to write multi-threaded programs for shared-memory architectures.

**Example Code:**

The first example, shown after the setup cells below, uses OpenMP in C to parallelize a loop that sums an array across multiple threads:

In [1]:
!apt update -qq;
!wget https://developer.nvidia.com/compute/cuda/8.0/Prod2/local_installers/cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64-deb;
!dpkg -i cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64-deb;
!apt-key add /var/cuda-repo-8-0-local-ga2/7fa2af80.pub;
!apt-get update -qq;
!apt-get install cuda gcc-5 g++-5 -y -qq;
!ln -s /usr/bin/gcc-5 /usr/local/cuda/bin/gcc;
!ln -s /usr/bin/g++-5 /usr/local/cuda/bin/g++;
!apt install cuda-8.0;

16 packages can be upgraded. Run 'apt list --upgradable' to see them.
--2023-08-28 01:54:23--  https://developer.nvidia.com/compute/cuda/8.0/Prod2/local_installers/cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64-deb
Resolving developer.nvidia.com (developer.nvidia.com)... 152.195.19.142
Connecting to developer.nvidia.com (developer.nvidia.com)|152.195.19.142|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://developer.nvidia.com/downloads/compute/cuda/8.0/Prod2/local_installers/cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64-deb [following]
--2023-08-28 01:54:23--  https://developer.nvidia.com/downloads/compute/cuda/8.0/Prod2/local_installers/cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64-deb
Reusing existing connection to developer.nvidia.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://developer.download.nvidia.com/compute/cuda/8.0/secure/Prod2/local_installers/cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61

In [2]:
!/usr/local/cuda/bin/nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

Note that nvcc reports release 11.8, not 8.0, which suggests the compiler on the path is a toolkit already present in the Colab image rather than the CUDA 8.0 package requested above. The OpenMP examples below are compiled with this nvcc, which forwards the -fopenmp flag to the host compiler.


In [3]:
!pip install git+https://github.com/andreinechaev/nvcc4jupyter.git

Collecting git+https://github.com/andreinechaev/nvcc4jupyter.git
  Cloning https://github.com/andreinechaev/nvcc4jupyter.git to /tmp/pip-req-build-cijrazyx
  Running command git clone --filter=blob:none --quiet https://github.com/andreinechaev/nvcc4jupyter.git /tmp/pip-req-build-cijrazyx
  Resolved https://github.com/andreinechaev/nvcc4jupyter.git to commit 0a71d56e5dce3ff1f0dd2c47c29367629262f527
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: NVCCPlugin
  Building wheel for NVCCPlugin (setup.py) ... done
  Created wheel for NVCCPlugin: filename=NVCCPlugin-0.0.2-py3-none-any.whl size=4295 sha256=28ba31af0c1cd6ae2c211bab7637edb7037189c28062b359c2c422821a1ff2bd
  Stored in directory: /tmp/pip-ephem-wheel-cache-rvk4e4te/wheels/a8/b9/18/23f8ef71ceb0f63297dd1903aedd067e6243a68ea756d6feea
Successfully built NVCCPlugin
Installing collected packages: NVCCPlugin
Successfully installed NVCCPlugin-0.0.2


In [5]:
%load_ext nvcc_plugin

created output directory at /content/src
Out bin /content/result.out


In the following example, the **#pragma omp parallel for** directive splits the loop iterations among the threads. The **omp_get_thread_num()** function returns the ID of the calling thread. The **reduction(+:sum)** clause gives each thread a private copy of **sum**, initialized to 0, and adds those copies into the original variable when the loop ends, so the threads never race on a shared variable.

When you compile and run this program, you'll see that the loop iterations are distributed among multiple threads, each calculating a part of the sum. The final sum is then computed by aggregating the partial sums from each thread.

OpenMP's power lies in its ability to parallelize loops, sections of code, and tasks through compiler directives. It's a convenient way to introduce parallelism to programs without extensive changes to the original codebase.

In [6]:
%%cuda --name openmp_arraysum.cu

#include <stdio.h>
#include <omp.h>

int main() {
    int array[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    int sum = 0;

    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < 10; i++) {
        printf("Thread %d working on index %d\n", omp_get_thread_num(), i);
        sum += array[i];
    }

    printf("Sum of array elements: %d\n", sum);

    return 0;
}


'File written in /content/src/openmp_arraysum.cu'

In [7]:
!nvcc -Xcompiler="-fopenmp" -arch=sm_75 -o /content/src/openmp_arraysum_output /content/src/openmp_arraysum.cu
!/content/src/openmp_arraysum_output

Thread 1 working on index 5
Thread 1 working on index 6
Thread 1 working on index 7
Thread 1 working on index 8
Thread 1 working on index 9
Thread 0 working on index 0
Thread 0 working on index 1
Thread 0 working on index 2
Thread 0 working on index 3
Thread 0 working on index 4
Sum of array elements: 55


**Example - Perform multiple tasks in parallel**

In this example, a single thread inside a parallel region creates five tasks, and the threads of the team execute them. The **depend** clauses order the tasks: D waits for B and C, and E waits for A and D. Tasks A, B, and C have no predecessors and can start immediately.

In [8]:
%%cuda --name openmp_tasks.cu

//  -*- mode: c++; eval: (c-set-offset (quote cpp-macro) 0)-*-
#include <cstdio>
#include <omp.h>
#include <unistd.h>

void do_task(const char* name, int duration)
{
    printf("task %s started By Thread - %d\n", name, omp_get_thread_num());
    sleep(duration);
    printf("task %s ended\n", name);
}

int main(int argc, char *argv[])
{
    double start = omp_get_wtime();
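    // A to D are never read or written; the depend clauses only use them
    // as handles for ordering the tasks. E is declared but not referenced.
    int A, B, C, D, E;
    omp_set_num_threads(3);  // request a team of three threads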

    #pragma omp parallel
    {
        #pragma omp single
        {
            #pragma omp task depend(out:A)
            do_task("A", 10);

            #pragma omp task depend(out:B)
            do_task("B", 2);

            #pragma omp task depend(out:C)
            do_task("C", 2);

            #pragma omp task depend(inout:B,C,D)
            do_task("D", 5);

            #pragma omp task depend(in:A,D)
            do_task("E", 1);
        }
    }

    double end = omp_get_wtime();
    printf("time: %lf\n", end-start);
    return 0;
}

'File written in /content/src/openmp_tasks.cu'

In [9]:
!nvcc -Xcompiler="-fopenmp" -arch=sm_75 -o /content/src/openmp_tasks_output /content/src/openmp_tasks.cu
!/content/src/openmp_tasks_output

task A started By Thread - 1
task B started By Thread - 0
task C started By Thread - 2
task B ended
task C ended
task D started By Thread - 2
task D ended
task A ended
task E started By Thread - 1
task E ended
time: 11.000360
