# The Simplest Kind of Threading

in Julia is of this type:

```
using Base.Threads: @threads

Z = zeros(N)
@threads for i = 1:N
    local y
    y = SomeFunction(i,W)
    Z[i] = SomeOtherFunction(i,X,y)     #may use i, X and y, but not Z
end
U = YetAnotherFunction(Z)
```

In this case, `Z` is only used for storing the results and is not used as an argument/input inside the loop. Variables that are created in the loop (like `y`) should be declared as `local` so that they are not shared across threads. This matters when a variable with the same name already exists in the enclosing function (see Case 3).
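A concrete instance of this pattern could look as follows. This is only a sketch: the values of `N`, `W` and `X` and the two formulas inside the loop are made up for illustration and stand in for `SomeFunction` and `SomeOtherFunction`.

```julia
using Base.Threads: @threads

N = 8
W = 2.0                      #hypothetical input
X = collect(1.0:N)           #hypothetical input

Z = zeros(N)                 #only used for storing the results
@threads for i = 1:N
    local y                  #y is private to each iteration
    y    = W*i               #stands in for SomeFunction(i,W)
    Z[i] = X[i] + y^2        #uses i, X and y, but not Z
end
U = sum(Z)                   #stands in for YetAnotherFunction(Z)
println(U)
```

Each iteration writes only to its own element `Z[i]`, so the threads never touch the same memory location.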

The `@threads` macro is a simple (but powerful) approach that works well when the iterations are similar (uniform). In contrast, if some iterations are much more costly or you want to use threads in a nested setting, the `@spawn`/`fetch`/`@sync` approach might be better. The focus of this notebook is on `@threads`.
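For comparison, a minimal sketch of the `@spawn`/`fetch` approach is shown here. The function `work` is hypothetical and only stands in for a task whose cost differs a lot across iterations.

```julia
using Base.Threads: @spawn

work(i) = sum(abs2, 1:i*100_000)     #hypothetical task: the cost grows with i

tasks = map(1:8) do i                #one task per iteration
    @spawn work(i)
end
results = fetch.(tasks)              #wait for each task and collect its result

println(results == [work(i) for i = 1:8])
```

Since each iteration becomes its own task, a thread that finishes a cheap task can pick up the next one instead of waiting for the others.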

Threading typically only pays off when the iterations involve heavy computations. Otherwise, the cost of setting up the threading might dominate.

In [1]:
using Printf
using Base.Threads: @threads

include("jlFiles/printmat.jl")

printyellow (generic function with 1 method)

# Comparing w/wo Threads 

We will calculate 

$\mu_{t}=\lambda \mu_{t-1} + (1-\lambda) x_{t-1}$, for $t=2,\ldots,T$

on each column of a large matrix ($t$ refers to rows).
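
As a small check of the recursion (the numbers are hypothetical, and the starting value $\mu_1=0$ is the one used in the code below), take $\lambda=0.5$ and $x=(2,4,6)$. Then

$\mu_2 = 0.5 \times 0 + 0.5 \times 2 = 1$ and $\mu_3 = 0.5 \times 1 + 0.5 \times 4 = 2.5$.

Notice that $x_3$ is not used, since $\mu_t$ depends on $x_{t-1}$.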

In [2]:
function ExpMA(x,λ)
    T = length(x)
    μ  = zeros(T)
    for t = 2:T
        μ[t]  = λ*μ[t-1]  + (1-λ)*x[t-1]
    end
    return μ
end

ExpMA (generic function with 1 method)

In [3]:
function ExpMA_1(X,λ)          #wrap ExpMA(X,λ) in loop over columns of X
    (T,N) = (size(X,1),size(X,2))
    result = zeros(T,N)
    for i = 1:N
        result[:,i] = ExpMA(X[:,i],λ)
    end
    return result    
end

function ExpMA_2(X,λ)         #wrap but with threaded loop
    (T,N) = (size(X,1),size(X,2))
    result = zeros(T,N)
    @threads for i = 1:N            #notice Threads.@threads
        result[:,i] = ExpMA(X[:,i],λ)
    end
    return result    
end

ExpMA_2 (generic function with 1 method)

In [4]:
λ = 0.94
T = 100_000

x = randn(T)
μ = ExpMA(x,λ)              #test the function
println()




In [5]:
N = 500
X = randn(T,N)

Y1 = ExpMA_1(X,λ)
Y2 = ExpMA_2(X,λ)

println("Test if the same results: ", isequal(Y1,Y2))

Test if the same results: true


In [6]:
println("Number of threads: ", Threads.nthreads())

Number of threads: 2


## Speed Comparison

In [7]:
using BenchmarkTools           #package for benchmarking

println("standard loop")                      
@btime ExpMA_1($X,$λ)          #use $X and $λ (interpolation) to get more accurate timing
                               #maybe use @benchmark instead to get more info   

println("threaded loop")
@btime ExpMA_2($X,$λ)
println()

standard loop
  775.974 ms (2002 allocations: 1.12 GiB)
threaded loop
  475.712 ms (2013 allocations: 1.12 GiB)



# Things that Could Go Wrong with Threading

are often related to data races (the threads writing to the same memory location).

## Case 1: Several Threads Changing the Same Elements of an Array

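Threads should write to different memory locations. Otherwise, the results become unpredictable: `x[1] = x[1] + 1` first reads and then writes, so two threads can read the same old value and one of the increments is lost.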

In [8]:
N = 10_000

x = [0]
@threads for i = 1:N
  x[1] = x[1] + 1
end

println("This ought to be $N, but it is ",x[])

This ought to be 10000, but it is 5409
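
One way to make this counter safe is an atomic variable, so that each increment is a single indivisible read-modify-write. The sketch below uses `Atomic` and `atomic_add!` from `Base.Threads`; the name `counter` is just for illustration.

```julia
using Base.Threads: @threads, Atomic, atomic_add!

N = 10_000

counter = Atomic{Int}(0)         #atomic integer, starts at 0
@threads for i = 1:N
    atomic_add!(counter, 1)      #indivisible update: no increment is lost
end

println("This ought to be $N, and it is ", counter[])
```

Atomic updates are serialized, so this fixes the correctness problem but does not make the loop faster. When possible, it is often better to let each iteration write to its own element (as in `Z[i]` above) and combine the results after the loop.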


## Case 2: Threads Writing to BitArrays

Code like this
```
Z = falses(N)                       #use fill(false,N) instead
@threads for i = 1:N
    Z[i] = SomeFunction()
end
```
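can also create unexpected results since the threads are trying to write to the same BitArray, which has a packed format: the elements are stored as single bits packed into 64-bit chunks, so changing `Z[i]` means reading and rewriting a chunk that is shared with neighbouring elements. This is solved by instead using `Z = fill(false,N)`, which is an ordinary array of Bools.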

In [9]:
N = 10
Z = falses(N)
#Z = fill(false,N)              #uncomment to solve the problem
@threads for i = 1:N
    Z[i] = true
    sleep(0.5)                   #give the thread something to do
end

println("This should always be `true`. Run a few times to check if that holds.\n")
println("All $N values in Z should be `true`, but only ", sum(Z)," are.")

This should always be `true`. Run a few times to check if that holds.

All 10 values in Z should be `true`, but only 8 are.
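
For reference, a sketch of the corrected version is given below. It assumes nothing beyond the change already suggested in the text: `fill(false,N)` creates a `Vector{Bool}` where each element has its own byte, so the threads no longer share storage.

```julia
using Base.Threads: @threads

N = 10
println(falses(N) isa BitVector)         #packed bits: elements share storage
println(fill(false,N) isa Vector{Bool})  #one byte per element

Z = fill(false,N)
@threads for i = 1:N
    Z[i] = true                          #each thread writes to its own element
end

println("All $N values in Z should be `true`, and ", sum(Z), " are.")
```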


## Case 3: @threads and Variable Scope

Code like this
```
v = 1:2
@threads for i = 1:N
    v = something 
    x = SomeFunction(v)
end
```
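can create unexpected results since the threads are sharing `v`. When this code sits inside a function (as in the example below), `v` already exists outside the loop, so the assignment inside the loop changes that shared variable instead of creating a new one for each iteration. This is solved by declaring `v` inside the loop to be `local`.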

In [10]:
using LinearAlgebra

function f2(N)
  v = falses(N+1)
  x = zeros(Int,N,N)
  @threads for i = 1:N
    #local v                   #uncomment to solve the problem
    v    = falses(N)
    v[i] = true
    x[v,i] .= i
  end
  return x
end

println("This should always be zero. Run a few times to check if that holds.\n")
M = 100
dev = zeros(M)
for i = 1:M
  dev[i] = maximum(abs,f2(i) - diagm(1:i))
end
println("All $M values should be 0, but only ", sum(dev.==0), " are.")

This should always be zero. Run a few times to check if that holds.

All 100 values should be 0, but only 62 are.
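
A sketch of the corrected function is shown below. The only change is the `local v` declaration; the name `f2_fixed` is introduced here just to keep it apart from `f2`.

```julia
using Base.Threads: @threads
using LinearAlgebra

function f2_fixed(N)
  v = falses(N+1)
  x = zeros(Int,N,N)
  @threads for i = 1:N
    local v                    #v is now private to each iteration
    v    = falses(N)
    v[i] = true
    x[v,i] .= i                #writes only to column i
  end
  return x
end

M = 100
dev = zeros(M)
for i = 1:M
  dev[i] = maximum(abs,f2_fixed(i) - diagm(1:i))
end
println("All $M values should be 0, and ", sum(dev.==0), " are.")
```

With `local v`, each iteration works on its own `v`, and each iteration writes only to its own column of `x`, so there is no shared state left to race on.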
