# Multithreading

## Overview

* **What are threads?**

* **How many threads are there?**

* **Where are the threads running?**

* **How do I use the threads?**

## What are threads?
Threads are **execution units within a process** that can run simultaneously and **share memory** (heap).

<br>
<img src="./imgs/stack_heap_threads.svg" width=450px>
<br>

## How many threads are there?

By default, Julia starts with a single *user thread*. We must tell it explicitly to start multiple user threads.

* Environment variable: `JULIA_NUM_THREADS=4`
* Command line argument: `julia -t 4`

**It is currently not (really) possible to change the number of threads at runtime!**

In [None]:
Threads.nthreads()
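
Since the thread count is fixed at startup, one way to try a different count from within a session is to launch a fresh Julia process. The following is a minimal sketch, assuming the running Julia binary can be started again via `Base.julia_cmd()`; the helper name `nthreads_in_child` is made up for illustration.

```julia
# Hypothetical helper: start a fresh Julia process with `n` threads and
# ask it how many threads it sees.
function nthreads_in_child(n::Integer)
    cmd = `$(Base.julia_cmd()) --startup-file=no -t $n -e 'print(Threads.nthreads())'`
    return parse(Int, read(cmd, String))
end

println("this process:  ", Threads.nthreads(), " thread(s)")
println("child process: ", nthreads_in_child(2), " thread(s)")
```

The parent process keeps its original thread count; only the child sees the value passed via `-t`.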

## Where are the threads running?

[ThreadPinning.jl](https://github.com/carstenbauer/ThreadPinning.jl) is the tool for visualizing and controlling thread placement in Julia.

In [None]:
using ThreadPinning

threadinfo()

### Pinning threads (i.e. controlling where they are running)

* To avoid double occupancy of CPU cores.
* To reduce noise in benchmarks.
* To address the complexity of the system topology, e.g. to use specific/all memory domains (NUMA).
* ...

`pinthreads(strategy)`
* `:cputhreads`: pin to CPU threads (incl. "hyperthreads") one after another
* `:cores`: pin to CPU cores one after another
* `:numa`: alternate between NUMA domains (round-robin)
* `:sockets`: alternate between sockets (round-robin)
* `:affinitymask`: pin according to an external affinity mask (e.g. set by SLURM)

(More? See my talk at JuliaCon2023 @ MIT: https://youtu.be/6Whc9XtlCC0)

In [None]:
pinthreads(:cores) # also try :cputhreads, :sockets, :numa, :random, or :affinitymask
# threadinfo(; slurm=true)
threadinfo()

#### Memory domains (NUMA)

<img src="imgs/amd_milan_cpu_die.svg" width=800px>

**Image source:** [AMD, High Performance Computing (HPC) Tuning Guide for AMD EPYC™ 7003 Series Processors](https://www.amd.com/system/files/documents/high-performance-computing-tuning-guide-amd-epyc7003-series-processors.pdf)

In [None]:
pinthreads(:numa)
threadinfo(; groupby=:numa)

In [None]:
# using Hwloc
# topology_graphical()

## How do I use the threads?

### Task-based multithreading

In traditional HPC, one typically tells each thread what to do. **("One thinks about threads")**

Julia implements **task-based multithreading**. In this paradigm, a task (e.g. a computational piece of code) is marked for **parallel** execution on **any** of the available Julia threads. Julia's **dynamic scheduler** automatically assigns the task to one of the threads and triggers its execution there.

<br>
<img src="./imgs/tasks_threads_cores.svg" width=750px>
<br>

Generally speaking, the user should **think about tasks and not threads**.
* The scheduler controls which thread a task eventually runs on.
* It might even dynamically [migrate tasks](https://docs.julialang.org/en/v1/manual/multi-threading/#man-task-migration) between threads.

**Advantages:**
* high-level abstraction
* nestability / composability (especially important for libraries)

**Disadvantages:**
* scheduling overhead
* uncertain and potentially suboptimal task → thread assignment
  * **can get in the way when performance engineering** because
    * scheduler has limited information (e.g. about the system topology)
    * profiling tools often don't know anything about tasks but monitor threads (or even CPU-cores) instead (e.g. LIKWID).
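
To make the "think about tasks" idea and the nestability point concrete, here is a minimal sketch of a divide-and-conquer sum built only on `Threads.@spawn` and `fetch` from Base. The function name `psum` and the `cutoff` parameter are illustrative choices, not part of any library.

```julia
using Base.Threads: @spawn

# Hypothetical example: recursively split the index range and sum both halves.
# Every level spawns a new task; tasks nest freely inside other tasks.
function psum(xs, lo = firstindex(xs), hi = lastindex(xs); cutoff = 10_000)
    if hi - lo < cutoff
        return sum(@view xs[lo:hi])          # small enough: sum serially
    end
    mid = (lo + hi) >>> 1
    left = @spawn psum(xs, lo, mid; cutoff)  # may run on any Julia thread
    right = psum(xs, mid + 1, hi; cutoff)    # current task handles the other half
    return fetch(left) + right               # wait for the spawned task
end

xs = collect(1:100_000)
println(psum(xs) == sum(xs))
```

Nothing in `psum` refers to a thread; the scheduler decides where each spawned task runs.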

### `@threads`

* **Splits up the iteration space into `nthreads()` contiguous chunks**
* Creates a task for each of them and hands them off to the dynamic scheduler (essentially `@spawn`).

In [None]:
using Base.Threads: @threads, threadid, nthreads
using ThreadPinning: taskid # for pedagogical purposes only

In [None]:
# creates nthreads() many tasks

@threads for i in 1:2*nthreads()
    println("Task ", taskid(), " is running iteration ", i, " on thread ", threadid())
end
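
The chunk-and-spawn behaviour described for `@threads` can be approximated by hand. The sketch below is only a rough illustration under that description, not Julia's actual implementation (which may split remainders differently); the helper `chunked_foreach` and its `nchunks` keyword are hypothetical.

```julia
using Base.Threads: @spawn, nthreads

# Hypothetical helper: apply `f` to every index in `r`, using one task per
# contiguous chunk of the range.
function chunked_foreach(f, r::AbstractUnitRange; nchunks::Integer = nthreads())
    chunksize = max(1, cld(length(r), nchunks))
    @sync for chunk in Iterators.partition(r, chunksize)
        @spawn foreach(f, chunk)   # one task per chunk, scheduled dynamically
    end
    return nothing
end

out = zeros(Int, 10)
chunked_foreach(i -> (out[i] = 2i), 1:10)   # each index is written by exactly one task
println(out)
```

Each iteration writes to its own slot of `out`, so no synchronization beyond the final `@sync` is needed.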

#### Static scheduling: `@threads :static`

For `@threads`, the `:static` scheduling option lets you opt out of Julia's dynamic scheduling.

Syntax: `@threads :static for ...`

 * **statically** maps tasks/chunks to threads, specifically: task 1 → thread 1, task 2 → thread 2, and so on.
   * no task migration, i.e. **fixed task-thread mapping** 👍
   * very little overhead 👍
   * not composable / nestable 👎

In [None]:
@threads :static for i in 1:2*nthreads()
    println("Task ", taskid(), " is running iteration ", i, " on thread ", threadid())
end

With `@threads :static` and `2*nthreads()` iterations, every thread handles precisely two iterations!
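
This can be checked programmatically. A minimal sketch, assuming Julia 1.9 or newer (for `Threads.maxthreadid()`); the function name `iterations_per_thread` is made up for illustration.

```julia
using Base.Threads: @threads, threadid, nthreads

# Count how many of the `n` iterations each participating thread executes
# under `:static` scheduling.
function iterations_per_thread(n::Integer)
    owner = zeros(Int, n)
    @threads :static for i in 1:n
        owner[i] = threadid()   # stable under :static (no task migration)
    end
    # Thread ids may be offset by interactive threads, so size by maxthreadid().
    counts = zeros(Int, Threads.maxthreadid())
    for t in owner
        counts[t] += 1
    end
    return filter(>(0), counts)  # keep only threads that did some work
end

println(iterations_per_thread(2 * nthreads()))
```

Recording `threadid()` like this is only meaningful with `:static`; under dynamic scheduling a task may migrate between threads.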

## If time permits

### *User threads* vs other threads

Even in "single-threaded" mode, the Julia process spawns several additional threads, such as
* a thread for unix signal listening
* GC threads
* multiple **OpenBLAS threads** for BLAS/LAPACK operations

In [None]:
using LinearAlgebra
BLAS.get_num_threads()
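
The BLAS thread count is configured separately from the Julia thread count, via `BLAS.set_num_threads`. The sketch below temporarily changes it and restores the previous value afterwards; the helper `with_blas_threads` is hypothetical and not part of LinearAlgebra.

```julia
using LinearAlgebra

# Hypothetical helper: run `f` with `n` BLAS threads, then restore the old setting.
function with_blas_threads(f, n::Integer)
    old = BLAS.get_num_threads()
    BLAS.set_num_threads(n)
    try
        return f()
    finally
        BLAS.set_num_threads(old)   # restore even if `f` throws
    end
end

A = rand(200, 200)
C = with_blas_threads(() -> A * A, 1)   # matrix product using a single BLAS thread
println(size(C))
```

Limiting BLAS to one thread is one possible way to avoid oversubscribing cores when BLAS calls are issued from several Julia threads at once.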

### Garbage collection

When garbage collection is triggered, it stops the world (all threads) while it cleans up memory.

Hence, when using multithreading, it is very important to **avoid heap allocations!**

(If you can't avoid allocations, consider using multiprocessing instead.)
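
As an illustration of what avoiding allocations inside a threaded loop can look like, the sketch below compares a loop body that copies each matrix column with one that uses a view. The function names are illustrative, and the exact byte counts depend on the Julia version and thread count (the `@threads` machinery itself allocates a little for its tasks).

```julia
using Base.Threads: @threads

# Allocating variant: `A[:, j]` copies the column, one heap allocation per iteration.
function colsums_copy!(out, A)
    @threads for j in axes(A, 2)
        out[j] = sum(A[:, j])
    end
    return out
end

# View variant: `@view` reuses the memory of `A` instead of copying the column.
function colsums_view!(out, A)
    @threads for j in axes(A, 2)
        out[j] = sum(@view A[:, j])
    end
    return out
end

A = rand(1000, 1000)
out = zeros(1000)
colsums_copy!(out, A); colsums_view!(out, A)   # warm-up, so compilation is not measured
println("copy: ", @allocated(colsums_copy!(out, A)), " bytes")
println("view: ", @allocated(colsums_view!(out, A)), " bytes")
```

Fewer allocations per iteration means fewer opportunities for the garbage collector to pause all threads.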