# Introduction

## Overview

* Main aim: provide a *standardized* way to *parallelize* applications with *threads* using *directives* and *runtime functions*
* Standard is available online at [https://www.openmp.org/specifications](https://www.openmp.org/specifications)
    * as pdf, currently version 5.2 from Nov. 2021, 669 pages
    * as html, currently version 5.1 from Nov. 2020 ($\leftarrow$ course links to this)
* Supported languages: C, C++, Fortran
* Requires compiler support, and runtime
    * Each compiler implements the standard in its own way
    * Different compiler versions have varying support ([compiler support overview](https://www.openmp.org/resources/openmp-compilers-tools/))
* Targets shared memory architectures
    * Distributed memory requires additional parallelization, e.g. with MPI
* A glossary is included in case any technical terms are unclear ([OpenMP 5.1 - 1.2](https://www.openmp.org/spec-html/5.1/openmpse2.html))

## Code Workflow

Let's start with a simple (serial) hello world application in C++ as baseline.

We will use the custom magic function `cpp` which is part of the _instantiate, compile and execute (ICE)_ class.
When used, it performs the following steps for any code snippet passed to it:
* The code is wrapped with a main function and
* header includes are added.
* The resulting application is stored in an output file,
* compiled and
* executed.

The Python code can be reviewed in [ice_magic.py](./ice_magic.py).

In [None]:
%load_ext ice.magic

In [None]:
%%cpp -o code/introduction/hello-world.cpp

std::cout << "Hello world" << std::endl;

The (last) generated application code can be highlighted with markdown by using the `%display_cpp` magic.

In [None]:
%display_cpp

Adding the `--verbose` flag (or `-v` for short) displays additional information on each step performed.

In [None]:
%%cpp -o code/introduction/hello-world.cpp -v

std::cout << "Hello world" << std::endl;

Lastly, using the `--help` flag (or `-h` for short) lists the available arguments.

In [None]:
%%cpp --help

std::cout << "Hello world" << std::endl;

## First OpenMP application

At this point we can write our first OpenMP parallelized code.
We use the `parallel` construct ([OpenMP 5.1 - 2.6](https://www.openmp.org/spec-html/5.1/openmpse14.html)).

In [None]:
%%cpp -o code/introduction/hello-world-omp.cpp -v

#pragma omp parallel
    std::cout << "Hello world" << std::endl;

Without further changes, our code still runs serially, because the compiler ignores OpenMP directives unless OpenMP support is enabled.
To actually enable OpenMP parallelization, we have to pass an additional argument to our compiler.

For g++, this flag is simply `-fopenmp`.
We can specify the compile flags used when invoking our magic command (`%%cpp --def-flags O3 march=native std=c++17 Wall fopenmp` or `%%cpp --flags fopenmp`), or simply use the specialized magic `cpp_omp`.

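<div class="alert alert-block alert-info"> <b>Note:</b> we add the <code>-Wall</code> flag by default since it includes <code>-Wunknown-pragmas</code>, which makes the compiler warn when OpenMP directives are ignored because OpenMP support is not enabled. </div>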


In [None]:
%%cpp_omp -o code/introduction/hello-world-omp-ext.cpp -v

std::cout << "This is serial" << std::endl;

#pragma omp parallel
    std::cout << "This is parallel" << std::endl;
    
std::cout << "This is serial" << std::endl;

In this code, everything outside the parallel region is executed (serially) by the initial thread.
When a parallel region is encountered, the initial thread forks and multiple threads execute that region.
At the end of the region, the parallel threads wait at an implicit barrier before joining.

<img src="img/introduction/parallel.svg" alt="omp-parallel" width="50%"/>

## Different compilers

Different compilers require different flags:

| compiler              | flag       |
|-----------------------|------------|
| gcc, g++, gfortran    | `-fopenmp` |
| clang, clang++, flang | `-fopenmp` |
| icc, icpc, ifort      | `-qopenmp` |
| icx, icpx, ifx        | `-qopenmp` |

For example, with Intel's icpx:

In [None]:
%%cpp_omp -o code/introduction/hello-world-intel.cpp -v

#pragma omp parallel
    std::cout << "This is parallel" << std::endl;

In [None]:
%%cpp_omp -o code/introduction/hello-world-intel.cpp -v -c icpx -F O3 march=native std=c++17 Wall qopenmp

#pragma omp parallel
    std::cout << "This is parallel" << std::endl;

## Controlling Threads

Next, we want to tune the number of OpenMP threads executing our print statement, and make each thread print a unique message.
There are different mechanisms to do this (in decreasing order of priority):
* Adding the `num_threads` clause ([OpenMP 5.1 - 2.6](https://www.openmp.org/spec-html/5.1/openmpse14.html))
* Calling the `omp_set_num_threads` function ([OpenMP 5.1 - 3.2.1](https://www.openmp.org/spec-html/5.1/openmpsu120.html))
* Setting the `OMP_NUM_THREADS` environment variable ([OpenMP 5.1 - 6.2](https://www.openmp.org/spec-html/5.1/openmpse59.html))


To modify the message we use OpenMP API functions (note that this requires including the `omp.h` header):
* `omp_get_thread_num`, which returns the index of the calling thread within its team ([OpenMP 5.1 - 3.2.4](https://www.openmp.org/spec-html/5.1/openmpsu123.html))
* `omp_get_num_threads`, which returns the number of threads in the current team ([OpenMP 5.1 - 3.2.2](https://www.openmp.org/spec-html/5.1/openmpsu121.html))

In [None]:
%%cpp_omp -o code/introduction/hello-world-num-threads.cpp

#pragma omp parallel num_threads(4)
    std::cout << "Hello world" << std::endl;

In [None]:
%%cpp_omp -o code/introduction/get-thread-num-serial.cpp

// serial execution
std::cout << "Hello from thread " << omp_get_thread_num() << " of " << omp_get_num_threads() << std::endl;

In [None]:
%%cpp_omp -o code/introduction/num-threads.cpp

#pragma omp parallel num_threads(4)
    std::cout << "Hello from thread " << omp_get_thread_num() << " of " << omp_get_num_threads() << std::endl;

In [None]:
%%cpp_omp -o code/introduction/set-num-threads.cpp

omp_set_num_threads(4);
#pragma omp parallel
    std::cout << "Hello from thread " << omp_get_thread_num() << " of " << omp_get_num_threads() << std::endl;

In [None]:
%%cpp_omp --env OMP_NUM_THREADS=4 -v -o code/introduction/env-num-threads.cpp

#pragma omp parallel
    std::cout << "Hello from thread " << omp_get_thread_num() << " of " << omp_get_num_threads() << std::endl;

## OpenMP Timers

OpenMP provides an easy-to-use timer interface.
Code regions to be timed are enclosed between two `omp_get_wtime` API calls ([OpenMP 5.1 - 3.10.1](https://www.openmp.org/spec-html/5.1/openmpsu185.html)); the difference between the two returned values is the elapsed wall-clock time in seconds.
This can be done either outside of parallel regions, or inside them for per-thread timing.

In [None]:
%%cpp_omp -o code/introduction/time-app.cpp

auto start = omp_get_wtime();

#pragma omp parallel num_threads(4)
{
    std::this_thread::sleep_for(std::chrono::milliseconds(128));
}

auto end = omp_get_wtime();
std::cout << "Total time: " << 1e3 * (end - start) << " ms" << std::endl;

In [None]:
%%cpp_omp -o code/introduction/time-thread.cpp

#pragma omp parallel num_threads(4)
{
    auto start = omp_get_wtime();

    std::this_thread::sleep_for(std::chrono::milliseconds(128 + 32 * omp_get_thread_num()));

    auto end = omp_get_wtime();
    std::cout << "Time for thread " << omp_get_thread_num() << " : " << 1e3 * (end - start) << " ms" << std::endl;
}

For convenience, the provided Python magic can add the timing code automatically when the `--time` flag (or `-t` for short) is set.
The source code can be reviewed in [ice_magic.py](./ice_magic.py) (see line 75).

In [None]:
%%cpp_omp -o code/introduction/generated-time.cpp --time -d

#pragma omp parallel num_threads(4)
    std::this_thread::sleep_for(std::chrono::milliseconds(128));