Skip to content

RohitBhandare/HPC-Practicals

Repository files navigation

Parallel Computing Overview

This repository contains examples and explanations of different parallel computing techniques and frameworks, including OpenMP, CUDA, Java Parallel Executor Fork/Join, and Python multiprocessing.

OpenMP

OpenMP is an API for parallel programming in shared-memory systems, primarily used with C, C++, and Fortran. It provides a set of compiler directives, library routines, and environment variables that enable parallelism in loops, sections, and tasks.

Running the "Hello, World!" Example

Here's a simple "Hello, World!" program using OpenMP in C++:

#include <iostream>
#include <omp.h>

int main() {
    // Enable OpenMP parallelism
    #pragma omp parallel
    {
        // Get the total number of threads
        int num_threads = omp_get_num_threads();
        
        // Get the thread ID
        int thread_id = omp_get_thread_num();
        
        // Print "Hello, World!" with thread information
        std::cout << "Hello, World! Thread ID: " << thread_id << " Total threads: " << num_threads << std::endl;
    }
    return 0;
}

You can compile and run this code with the following commands:

g++ -o hello_world hello_world.cpp -fopenmp
./hello_world

Output:

Hello, World! Thread ID: Hello, World! Thread ID: 8 Total threads: 12Hello, World! Thread ID: 11 Total threads: 12
Hello, World! Thread ID: 1 Total threads: 12
Hello, World! Thread ID: 2 Total threads: 12
Hello, World! Thread ID: 9 Total threads: 12

Hello, World! Thread ID: 5 Total threads: 12
Hello, World! Thread ID: 7 Total threads: 12
4 Total threads: 12
Hello, World! Thread ID: 3 Total threads: Hello, World! Thread ID: 6 Total threads: 12Hello, World! Thread ID: 0 Total threads: 12
12
Hello, World! Thread ID: 10 Total threads: 12

Use Cases:

  • Parallelizing computationally intensive loops such as matrix multiplication, image processing, or numerical simulations.
  • Exploiting multicore CPUs for accelerating applications that can be parallelized.

CUDA

CUDA is a parallel computing platform and programming model developed by NVIDIA for accelerating computations on NVIDIA GPUs. It allows developers to write C/C++ code and execute it on GPU devices, enabling massively parallel processing.

Use Cases:

  • Accelerating scientific simulations, deep learning training, and data analytics tasks by offloading compute-intensive portions to the GPU.
  • Real-time processing of large datasets, such as image and video processing.

Java Parallel Executor Fork/Join

Java Parallel Executor Fork/Join is a framework introduced in Java 7 for parallel programming. It provides a simple API for parallelizing tasks by recursively dividing them into smaller subtasks and combining their results.

Use Cases:

  • Recursive algorithms like quicksort, mergesort, and tree traversals can be parallelized effectively using Fork/Join.
  • Performing parallel processing of large collections or arrays in data-intensive applications.

Python Multiprocessing

Python Multiprocessing is a module in the Python Standard Library that supports parallel execution of tasks using processes. It allows Python programs to utilize multiple CPU cores for concurrent processing.

Use Cases:

  • Parallelizing CPU-bound tasks such as numerical computations, data processing, and machine learning training.
  • Improving the performance of I/O-bound tasks by offloading them to separate processes, such as web scraping and file processing.