# Multithreading

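To run Julia with several threads, set the `JULIA_NUM_THREADS` environment variable before starting it. On Linux/macOS: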

```bash
export JULIA_NUM_THREADS=4
```

On Windows (Command Prompt):

```bat
set JULIA_NUM_THREADS=4
```

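Afterwards, start Julia. On Julia 1.5 and later you can instead pass the thread count on the command line with `julia --threads 4`.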

You can also create a *Jupyter kernel* for multithreaded Julia:

```julia
using IJulia
installkernel("Julia (4 threads)", env=Dict("JULIA_NUM_THREADS"=>"4"))
```

In [None]:
# How many threads?
Threads.nthreads()

In [None]:
# How many processes?
using Distributed; nprocs()

### Fill an array in parallel

In [None]:
import Base.Threads: @threads, nthreads, threadid

a = zeros(nthreads()*10)
@threads for i in eachindex(a)
    a[i] = threadid()
end

In [None]:
a
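
To see how `@threads` distributed the loop iterations, one option is to count how many entries each thread wrote. The following is a minimal sketch that repeats the fill loop so it runs on its own; the name `counts` is introduced here for illustration, and the exact split depends on the Julia version and scheduler.

```julia
import Base.Threads: @threads, nthreads, threadid

a = zeros(Int, nthreads() * 10)
@threads for i in eachindex(a)
    a[i] = threadid()
end

# Count how many entries each thread id wrote
counts = Dict(t => count(==(t), a) for t in unique(a))
for t in sort(collect(keys(counts)))
    println("thread ", t, " wrote ", counts[t], " entries")
end
```

With a single thread, all entries are written by the same thread id.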

### Be careful: parallel summation (naive)

In [None]:
function mysum(xs)
    s = zero(eltype(xs))
    for x in xs
        s += x
    end
    return s
end

In [None]:
function mysum_threaded_naive(xs)
    s = zero(eltype(xs))
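    @threads for x in xs
        # Data race: every thread reads and writes the shared `s` without
        # synchronization, so updates can be lost when nthreads() > 1.
        s += x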
    end
    return s
end

In [None]:
xs = rand(100_000);

In [None]:
@show sum(xs);
@show mysum(xs);
@show mysum_threaded_naive(xs);

### Parallel summation (divide the work)

In [None]:
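function mysum_threaded(xs)
    # Chunk length such that xs is split into at most nthreads() pieces
    b = cld(length(xs), nthreads())
    # Sum every chunk in its own task, then add up the partial sums
    tasks = map(sub_xs -> Threads.@spawn(sum(sub_xs)), Iterators.partition(xs, b))
    return sum(fetch.(tasks))
end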

In [None]:
@show sum(xs);
@show mysum(xs);
@show mysum_threaded(xs);

In [None]:
using BenchmarkTools
@btime mysum($xs);
@btime mysum_threaded($xs);

### Parallel summation (atomics)

In [None]:
import Base.Threads: Atomic, atomic_add!

function mysum_threaded_atomics(xs)
    s = Atomic{eltype(xs)}(zero(eltype(xs)))
    @threads for x in xs
        atomic_add!(s, x)
    end
    return s[]
end

In [None]:
@show mysum(xs);
@show mysum_threaded_atomics(xs);

In [None]:
@btime mysum($xs);
@btime mysum_threaded_atomics($xs);
@btime mysum_threaded($xs);

See [Atomic Operations](https://docs.julialang.org/en/v1/manual/parallel-computing/#Atomic-Operations-1) in the Julia documentation for more information.