# Multi-threading in Julia

Multi-threading in Julia started out as an experimental feature, but even so you can get pretty good speedups with little to no effort.

First, load the threading module:

In [1]:
using Base.Threads

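`nthreads()` returns the number of threads Julia was started with. This number is fixed at startup: set the `JULIA_NUM_THREADS` environment variable, or (in Julia 1.5 and later) pass the `--threads`/`-t` command-line option, before launching Julia.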

In [None]:
nthreads()
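A quick way to see the threads at work is to record which thread runs each iteration of a loop. This is a minimal sketch that relies only on `Base.Threads`; the array name `owner` is illustrative. How iterations are assigned to threads is decided by the scheduler, so the printed thread IDs will vary between runs and with `nthreads()`.

```julia
using Base.Threads

# Record which thread executed each iteration (illustrative only).
owner = zeros(Int, 16)
@threads for i in eachindex(owner)
    owner[i] = threadid()   # each iteration writes its own slot, so there is no race
end

println("threads available: ", nthreads())
println("thread per iteration: ", owner)
println("distinct threads used: ", length(unique(owner)))
```

With a single thread, every entry of `owner` is the same ID.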

To demonstrate `@threads`, here's a quick example. Start with a random vector:

In [None]:
a = rand(10)

Let's modify this array in place by adding 1 to each element, first with an ordinary `for` loop:

In [None]:
for i = 1:size(a,1)
    a[i] = a[i] + 1
end

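Parallelizing the same loop only requires putting the `@threads` macro in front of it; the iterations are then divided among the available threads:

In [None]: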
@threads for i = 1:size(a, 1)
    a[i] = a[i] + 1
end

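Let's check whether it worked. Every element has now been incremented twice, once by the serial loop and once by the threaded one: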

In [None]:
a
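Because `a` was modified by both loops, a cleaner check is to apply each loop to its own copy of the same data and compare the results. A small sketch using only `Base.Threads`; the names `x`, `serial` and `threaded` are illustrative:

```julia
using Base.Threads

x = rand(10)

# Serial version on one copy
serial = copy(x)
for i in eachindex(serial)
    serial[i] += 1
end

# Threaded version on another copy
threaded = copy(x)
@threads for i in eachindex(threaded)
    threaded[i] += 1
end

# Each iteration touches a distinct element, so both versions must agree exactly.
println(serial == threaded)
println(threaded == x .+ 1)
```

Both lines should print `true`, regardless of how many threads are running.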

## An Example: Black-Scholes

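Now let's consider a bigger example: a simple Black-Scholes kernel, an option-pricing model widely used in finance. The code below computes the price of a European put option for a whole vector of strike prices. It relies on the error function `erf` from the SpecialFunctions.jl package: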

In [None]:
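using SpecialFunctions   # provides erf

# Standard normal CDF, Φ(x) = 0.5 + 0.5 * erf(x / √2), applied elementwise.
# 0.707106781 ≈ 1/√2; the kernels below inline the same expression for scalars.
@inline function cndf2(x::Vector{Float64})
    return 0.5 .+ 0.5 .* erf.(0.707106781 .* x)   # erf must be broadcast with `.`
end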
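For reference, these are the textbook Black-Scholes formulas for a European option on a non-dividend-paying asset, which the kernels below are meant to evaluate. Here $S$ is `sptprice`, $K$ is `strike[i]`, $r$ is `rate`, $\sigma$ is `volatility`, $T$ is `time`, and $\Phi$ is the standard normal CDF; the constant 0.707106781 in the code approximates $1/\sqrt{2}$. The put price follows from the call price by put-call parity.

$$
\begin{aligned}
d_1 &= \frac{\ln(S/K) + \left(r + \tfrac{1}{2}\sigma^2\right)T}{\sigma\sqrt{T}}, \qquad d_2 = d_1 - \sigma\sqrt{T},\\
C &= S\,\Phi(d_1) - K e^{-rT}\,\Phi(d_2),\\
P &= C - S + K e^{-rT},\\
\Phi(x) &= \tfrac{1}{2} + \tfrac{1}{2}\operatorname{erf}\!\left(x/\sqrt{2}\right).
\end{aligned}
$$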

In [None]:
function blackscholes(sptprice::Float64,
                      strike::Vector{Float64},
                      rate::Float64,
                      volatility::Float64,
                      time::Float64)
    sqt = sqrt(time)
    put = similar(strike)
    for i in eachindex(strike)
        logterm = log(sptprice / strike[i])          # natural log, as in the Black-Scholes formula
        powterm = 0.5 * volatility * volatility
        den = volatility * sqt
        d1 = (((rate + powterm) * time) + logterm) / den
        d2 = d1 - den
        NofXd1 = 0.5 + 0.5 * erf(0.707106781 * d1)   # Φ(d1)
        NofXd2 = 0.5 + 0.5 * erf(0.707106781 * d2)   # Φ(d2)
        futureValue = strike[i] * exp(-rate * time)  # discounted strike K * exp(-rT)
        c1 = futureValue * NofXd2
        call_ = sptprice * NofXd1 - c1               # call price
        put[i] = call_ - sptprice + futureValue      # put-call parity: P = C - S + K * exp(-rT)
    end
    return put
end

Every iteration of this loop is independent of the others, so there is abundant data parallelism to exploit. The only change needed is the `@threads` macro in front of the loop:

In [None]:
function blackscholes_parallel(sptprice::Float64,
                               strike::Vector{Float64},
                               rate::Float64,
                               volatility::Float64,
                               time::Float64)
    sqt = sqrt(time)
    put = similar(strike)
    @threads for i in eachindex(strike)               # iterations are split among threads
        logterm = log(sptprice / strike[i])
        powterm = 0.5 * volatility * volatility
        den = volatility * sqt
        d1 = (((rate + powterm) * time) + logterm) / den
        d2 = d1 - den
        NofXd1 = 0.5 + 0.5 * erf(0.707106781 * d1)
        NofXd2 = 0.5 + 0.5 * erf(0.707106781 * d2)
        futureValue = strike[i] * exp(-rate * time)
        c1 = futureValue * NofXd2
        call_ = sptprice * NofXd1 - c1
        put[i] = call_ - sptprice + futureValue
    end
    return put
end
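`@threads` is safe in this kernel because each iteration reads `strike[i]` and writes only `put[i]`, so no two threads ever write to the same location. A loop that accumulates into one shared variable (for example `total += x[i]`) would be a data race. A common workaround, sketched below assuming only `Base.Threads`, is to split the work into chunks, give each chunk its own slot for a partial result, and combine the partials serially afterwards. `threaded_sum` and `nchunks` are hypothetical names, not library functions.

```julia
using Base.Threads

# Hypothetical helper: sum a vector using one partial sum per chunk.
function threaded_sum(x::AbstractVector{Float64}; nchunks::Int = nthreads())
    n = length(x)
    nchunks = max(1, min(nchunks, n))      # never more chunks than elements
    partial = zeros(Float64, nchunks)      # one slot per chunk, so no shared writes
    @threads for k in 1:nchunks
        # Contiguous index range handled by chunk k
        lo = div((k - 1) * n, nchunks) + 1
        hi = div(k * n, nchunks)
        s = 0.0
        for i in lo:hi
            s += x[i]
        end
        partial[k] = s
    end
    return sum(partial)                    # combine the partial sums serially
end

x = rand(10^6)
println(threaded_sum(x) ≈ sum(x))
```

The result can differ from `sum(x)` in the last few bits because the additions happen in a different order, which is why the comparison uses `≈`.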

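Next, `run` sets up the input data, times the serial and threaded kernels, and prints a checksum for each so the two results can be compared: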

In [None]:
function run(iterations)
    sptprice   = 42.0
    initStrike = Float64[40.0 + (i / iterations) for i = 1:iterations]
    rate       = 0.5
    volatility = 0.2
    time       = 0.5   # shadows Base.time here, so time_ns() is used for timing below

    # Time the serial kernel
    t0 = time_ns()
    put1 = blackscholes(sptprice, initStrike, rate, volatility, time)
    t1 = (time_ns() - t0) / 1e9
    println("Serial checksum: ", sum(put1))

    # Time the threaded kernel; its checksum should match the serial one
    t0 = time_ns()
    put2 = blackscholes_parallel(sptprice, initStrike, rate, volatility, time)
    t2 = (time_ns() - t0) / 1e9
    println("Parallel checksum: ", sum(put2))
    return t1, t2
end
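A single timing can be noisy, and the first call after a function is defined also includes JIT compilation. One option, sketched here as a hypothetical helper `best_time` that is not part of the original code, is to warm up once, run the function several times, and keep the fastest wall-clock time measured with `time_ns()`:

```julia
# Hypothetical helper: minimum wall-clock time (in seconds) over several runs.
function best_time(f, args...; samples::Int = 5)
    f(args...)                        # warm-up call, so compilation is not timed
    best = Inf
    for _ in 1:samples
        t0 = time_ns()
        f(args...)
        best = min(best, (time_ns() - t0) / 1e9)
    end
    return best
end

# Example with a Base function:
v = rand(10^6)
println("best time for sort: ", best_time(sort, v), " s")
```

With the kernels above, `best_time(blackscholes_parallel, 42.0, initStrike, 0.5, 0.2, 0.5)` would time the threaded version, given some `initStrike` vector. For more careful measurements, the BenchmarkTools.jl package provides `@btime` and `@benchmark`.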

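Finally, a simple driver function calls both kernels once on empty inputs, so that JIT compilation is not counted in the benchmark, and then reports the timings and the speedup. Do we see any scaling? (If `nthreads()` is 1, no speedup is possible.)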

In [None]:
using Random

function driver()
    Random.seed!(0)
    iterations = 10^6
    # Warm-up: run both kernels on empty input so JIT compilation is excluded from the timings below
    t0 = time_ns()
    blackscholes(0., Float64[], 0., 0., 0.)
    blackscholes_parallel(0., Float64[], 0., 0., 0.)
    println("Warm-up (compilation) time = ", (time_ns() - t0) / 1e9, " s")
    tserial, tparallel = run(iterations)
    println("Time taken for serial = $tserial s")
    println("Time taken for parallel = $tparallel s")
    println("Speedup over $(nthreads()) threads = $(tserial/tparallel)")
    println("Serial rate = ", iterations / tserial, " opts/sec")
    println("Parallel rate = ", iterations / tparallel, " opts/sec")
end

driver()
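When reading the speedup printed by `driver()`, Amdahl's law gives a useful upper bound: if a fraction p of the runtime can be parallelized, the best possible speedup on n threads is 1/((1 - p) + p/n). Any work done outside the `@threads` loop lowers p. The helper below, `amdahl_speedup`, is a hypothetical name used only for illustration:

```julia
# Hypothetical helper: Amdahl's law upper bound on speedup.
# p = fraction of runtime that is parallelizable (0 ≤ p ≤ 1), n = number of threads.
amdahl_speedup(p::Real, n::Integer) = 1 / ((1 - p) + p / n)

for p in (0.5, 0.9, 0.99)
    println("p = $p: at most ", round(amdahl_speedup(p, 8), digits = 2), "x on 8 threads")
end
```

Even with 99% of the work parallelized, 8 threads cannot deliver a full 8x speedup.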