# Synchronization

## Overview

OpenMP offers a range of synchronization constructs that work at different levels of granularity.

In [None]:
%load_ext ice_magic

## Barrier

Introduces an explicit barrier at which every thread of the team waits until all other threads have arrived ([OpenMP 5.1 - 2.19.2](https://www.openmp.org/spec-html/5.1/openmpsu100.html)).

In [None]:
%%cpp_omp -o code/sync/barrier.cpp

// outside a parallel region only the initial thread takes part
#pragma omp barrier

// inside a parallel region all threads of the team wait for each other
#pragma omp parallel num_threads(4)
{
    std::cout << omp_get_thread_num() << std::endl;
    #pragma omp barrier
    std::cout << 10 * omp_get_thread_num() << std::endl;
}
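
The output of the cell above does not show why the barrier matters, so here is a minimal sketch of a two-phase computation that depends on it. The function name `neighbor_sums` and the array layout are assumptions made for illustration. The `nowait` clause removes the barrier a worksharing loop would otherwise imply, which leaves the explicit barrier as the only thing separating the two phases.

```cpp
#include <vector>

// Hypothetical helper: returns b with b[i] = a[i] + a[(i + 1) % n], where a[i] = i.
std::vector<int> neighbor_sums(int n) {
    std::vector<int> a(n), b(n);
    #pragma omp parallel num_threads(4)
    {
        // Phase 1: every thread fills its share of a.
        // nowait drops the barrier at the end of this loop.
        #pragma omp for nowait
        for (int i = 0; i < n; ++i)
            a[i] = i;

        // Without this barrier a thread could read an element of a
        // before the thread responsible for it has written it.
        #pragma omp barrier

        // Phase 2: reads elements that other threads may have written.
        #pragma omp for
        for (int i = 0; i < n; ++i)
            b[i] = a[i] + a[(i + 1) % n];
    }
    return b;
}
```

Calling `neighbor_sums(8)` should give the same result as a serial run of both loops, independent of how the iterations are distributed over the threads.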

## Single

Introduces a code block only executed by a single thread ([OpenMP 5.1 - 2.10.2](https://www.openmp.org/spec-html/5.1/openmpsu43.html)).
\
Which thread executes the block is up to the implementation and may change between runs.
\
The single construct implies an implicit barrier at the end unless the `nowait` clause is added.

In [None]:
%%cpp_omp -o code/sync/single.cpp

#pragma omp parallel num_threads(4)
{
    #pragma omp single
        std::cout << omp_get_thread_num() << std::endl << std::endl;
    
    std::cout << omp_get_thread_num() << std::endl;
}
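
A typical use of the implicit barrier is one-time initialization of shared data. The following is a minimal sketch under that assumption; the helper name `sum_after_single` and the data it fills are made up for illustration.

```cpp
#include <vector>

// Hypothetical helper: one thread initializes shared data, all threads consume it.
long sum_after_single(int n) {
    std::vector<int> data;   // shared: declared outside the parallel region
    long total = 0;
    #pragma omp parallel num_threads(4)
    {
        // Executed by exactly one thread; the others wait at the
        // implicit barrier at the end of the single construct.
        #pragma omp single
        data.assign(n, 1);

        // Safe only because of that barrier: with nowait, threads could
        // reach this loop while data is still being filled.
        #pragma omp for reduction(+ : total)
        for (int i = 0; i < n; ++i)
            total += data[i];
    }
    return total;
}
```

Adding `nowait` to the `single` directive in this sketch would introduce a data race on `data`, so the clause is only appropriate when the following code does not depend on the block's results.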

The value a private variable holds at the end of the single block can be broadcast to the corresponding private variables of all other threads by adding the `copyprivate` clause.

In [None]:
%%cpp_omp -o code/sync/copyprivate.cpp

#pragma omp parallel num_threads(4)
{
    int tid;
    #pragma omp single copyprivate(tid)
        tid = omp_get_thread_num();
    
    std::cout << tid << std::endl;
}
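
As a further illustration, the sketch below broadcasts a value that only one thread computes and lets every thread use its own copy afterwards. The function name `broadcast_value` and the constant 42 are placeholders, not part of any API.

```cpp
#include <vector>

// Hypothetical helper: every entry is written from a thread-private copy of value.
std::vector<int> broadcast_value(int n) {
    std::vector<int> seen(n, -1);
    #pragma omp parallel num_threads(4)
    {
        int value = 0;   // private: declared inside the parallel region

        // Only one thread assigns value, copyprivate then copies it
        // into the private variable of every other thread.
        #pragma omp single copyprivate(value)
        value = 42;

        // Each thread writes its own copy into its share of the array.
        #pragma omp for
        for (int i = 0; i < n; ++i)
            seen[i] = value;
    }
    return seen;
}
```

Without the `copyprivate` clause, all threads except the one executing the `single` block would still see their initial value of 0.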

## Masked (replacing Master)

`master` restricts execution of a code block to the thread with id 0 (the primary thread), but is deprecated as of OpenMP 5.1.
\
Its replacement, `masked`, behaves the same by default and additionally accepts a `filter` clause that selects the executing thread by its id ([OpenMP 5.1 - 2.8](https://www.openmp.org/spec-html/5.1/openmpse16.html)).

<div class="alert alert-block alert-info"> <b>Note:</b> In contrast to single, there is <i>no</i> implicit barrier at the end of either, and copying out variables is <i>not</i> supported. </div>

In [None]:
%%cpp_omp -o code/sync/master.cpp

#pragma omp parallel num_threads(4)
{
    // this is deprecated
    #pragma omp master
        std::cout << omp_get_thread_num() << std::endl;

    // this is the replacement ...
    #pragma omp masked
        std::cout << omp_get_thread_num() << std::endl;

    // ... which allows using filter clauses
    #pragma omp masked filter(2)
        std::cout << omp_get_thread_num() << std::endl;
}
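
Because `masked` does not imply a barrier, results produced inside the block have to be published explicitly before other threads rely on them. The following is a minimal sketch of that pattern; the helper name `publish_with_masked` and the value 7 are assumptions for illustration, and a compiler with OpenMP 5.1 support is assumed for the `masked` directive.

```cpp
#include <vector>

// Hypothetical helper: thread 0 sets a shared value, all threads read it afterwards.
std::vector<int> publish_with_masked(int n) {
    std::vector<int> out(n, -1);
    int shared_value = 0;   // shared: declared outside the parallel region
    #pragma omp parallel num_threads(4)
    {
        // Executed by thread 0 only; the other threads skip the block
        // and continue immediately.
        #pragma omp masked
        shared_value = 7;

        // Explicit barrier: makes sure the write has happened and is
        // visible before any thread reads shared_value.
        #pragma omp barrier

        #pragma omp for
        for (int i = 0; i < n; ++i)
            out[i] = shared_value;
    }
    return out;
}
```

Replacing `masked` plus `barrier` with `single` would give the same result here, at the cost of no longer fixing which thread performs the write.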

## Critical

`critical` lets every thread execute a code block, but only one thread at a time.

In [None]:
%%cpp_omp -o code/sync/critical.cpp

#pragma omp parallel num_threads(4)
{
    #pragma omp critical
        std::cout << omp_get_thread_num() << std::endl;
}
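
Besides serializing output, a critical region is commonly used to protect a shared data structure that is not thread-safe. The sketch below assumes a hypothetical helper `collect_even` that gathers results in a shared `std::vector`.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: collects all even numbers in [0, n) in parallel.
std::vector<int> collect_even(int n) {
    std::vector<int> out;   // shared, and push_back is not thread-safe
    #pragma omp parallel for num_threads(4)
    for (int i = 0; i < n; ++i) {
        if (i % 2 == 0) {
            // Only one thread at a time may modify the vector.
            #pragma omp critical
            out.push_back(i);
        }
    }
    // The insertion order depends on thread timing, so sort for a stable result.
    std::sort(out.begin(), out.end());
    return out;
}
```

Only the `push_back` is inside the critical region; the test `i % 2 == 0` touches no shared state and can run concurrently.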

By default, all unnamed critical regions behave as if they shared one lock, i.e. two threads can never be inside _different_ unnamed critical regions at the same time (application-wide).

In [None]:
%%cpp_omp -o code/sync/multi-critical.cpp

#pragma omp parallel num_threads(4)
{
    #pragma omp critical
        std::cout << "A " << omp_get_thread_num() << std::endl;

    #pragma omp critical
        std::cout << "B " << omp_get_thread_num() << std::endl;
}

To avoid this, critical regions can be given names.
Regions with the same name exclude each other, while regions with different names may execute concurrently.

In [None]:
%%cpp_omp -o code/sync/named-critical.cpp

#pragma omp parallel num_threads(4)
{
    #pragma omp critical (A)
        std::cout << "A " << omp_get_thread_num() << std::endl;

    #pragma omp critical (B)
        std::cout << "B " << omp_get_thread_num() << std::endl;
}
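
Named regions pay off when independent pieces of shared state are updated. A minimal sketch, assuming two unrelated counters and the made-up region names `even_counter` and `odd_counter`:

```cpp
#include <utility>

// Hypothetical helper: counts even and odd numbers in [0, n).
std::pair<int, int> count_by_parity(int n) {
    int evens = 0, odds = 0;   // two independent shared counters
    #pragma omp parallel for num_threads(4)
    for (int i = 0; i < n; ++i) {
        if (i % 2 == 0) {
            // Excludes only other threads updating evens.
            #pragma omp critical (even_counter)
            ++evens;
        } else {
            // May run while another thread is inside even_counter.
            #pragma omp critical (odd_counter)
            ++odds;
        }
    }
    return std::make_pair(evens, odds);
}
```

With two unnamed critical regions the result would be the same, but a thread incrementing `odds` would also block threads that only want to increment `evens`.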

## Atomic

Critical regions can be very expensive.
In many cases, using atomic operations instead is beneficial ([OpenMP 5.1 - 2.19.7](https://www.openmp.org/spec-html/5.1/openmpsu105.html)).
\
Supported operations are `read`, `write`, `update` (the default) and `capture`.
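
A minimal sketch of the `capture` form of `atomic`, which updates a variable and reads its value in one indivisible step. The helper name `assign_tickets` is an assumption for illustration.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: hands out the unique ticket numbers 0 .. n-1.
std::vector<int> assign_tickets(int n) {
    std::vector<int> tickets(n, -1);
    int next = 0;   // shared counter
    #pragma omp parallel for num_threads(4)
    for (int i = 0; i < n; ++i) {
        int mine;
        // Read the old value and increment the counter atomically,
        // so no two iterations can obtain the same ticket.
        #pragma omp atomic capture
        mine = next++;

        tickets[i] = mine;   // each iteration writes its own element
    }
    // Which iteration gets which ticket depends on timing, so sort.
    std::sort(tickets.begin(), tickets.end());
    return tickets;
}
```

A plain `atomic update` followed by a separate read of `next` would not be sufficient here, since another thread could increment the counter in between.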

<div class="alert alert-block alert-info"> <b>Note:</b> Even though atomics are comparatively fast, reductions are usually a better choice if they are applicable. </div>

In [None]:
%%cpp_omp -o code/sync/atomic.cpp

int sum = 0;
#pragma omp parallel num_threads(128)
    #pragma omp atomic // default 'update'
    sum += 1;

std::cout << sum << std::endl;
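
For comparison, the sketch below computes the same kind of count once with `atomic` and once with a `reduction` clause. Both function names are hypothetical. The reduction variant lets every thread accumulate into a private copy and combines the copies once at the end, so it avoids contention on the shared variable.

```cpp
// Hypothetical helper: counts n iterations with one atomic update each.
int count_atomic(int n) {
    int sum = 0;
    #pragma omp parallel for num_threads(4)
    for (int i = 0; i < n; ++i) {
        // Every iteration synchronizes on the shared variable.
        #pragma omp atomic
        sum += 1;
    }
    return sum;
}

// Hypothetical helper: same count, expressed as a reduction.
int count_reduction(int n) {
    int sum = 0;
    // Each thread adds to its private copy of sum; the private copies
    // are combined into the shared variable at the end of the loop.
    #pragma omp parallel for num_threads(4) reduction(+ : sum)
    for (int i = 0; i < n; ++i)
        sum += 1;
    return sum;
}
```

Both functions return `n`; how large the performance difference is depends on the compiler, the hardware and the number of threads.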