# Pleasingly parallel simulations

**Pleasingly parallel** simulations (also called "embarrassingly parallel") are those in which every process (worker) runs the same code independently, with no communication required between workers. This is the simplest type of parallelism, and probably also the most common.

In [1]:
addprocs(2)

2-element Array{Int64,1}:
 2
 3

In [2]:
@everywhere using DistributedArrays

In [3]:
@everywhere function walk(numsteps)
    pos = 0

    for j in 1:numsteps

        # rand(Bool) returns true or false with equal probability
        if rand(Bool)
            step = -1
        else
            step = +1
        end

        pos += step   # equivalent one-liner: pos += ifelse(rand() < 0.5, -1, +1)
    end
    
    return pos
end
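
The one-line `ifelse` form mentioned in the comment inside `walk` can be written out as a separate function. This is only a sketch: the name `walk_ifelse` is hypothetical and not part of the notebook. The two assertions check properties that any walk made of ±1 steps must satisfy.

```julia
# Hypothetical variant of `walk` (the name `walk_ifelse` is not from the notebook).
function walk_ifelse(numsteps)
    pos = 0
    for j in 1:numsteps
        # ifelse evaluates both alternatives and returns one of them
        pos += ifelse(rand() < 0.5, -1, +1)
    end
    return pos
end

final = walk_ifelse(1000)

# Each step changes the position by ±1, so after 1000 steps the position
# is even and lies between -1000 and 1000.
@assert iseven(final)
@assert abs(final) <= 1000
```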

In serial:

In [4]:
numwalkers = 1000
numsteps = 1000

1000

In [5]:
data = [walk(numsteps) for i in 1:numwalkers] 

1000-element Array{Int64,1}:
  -4
 -32
 -30
   2
  -2
 -10
 -20
   4
  70
  30
   4
  30
  24
   ⋮
   6
 -38
 -12
  22
 -38
 -66
  18
  12
 -32
 -16
   0
 -20
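
A quick way to sanity-check the serial data: for an unbiased walk, each step has mean 0 and variance 1, so the final position has mean 0 and variance equal to `numsteps`. The sketch below repeats a compact stand-in for `walk` so that it runs on its own, and computes the sample statistics by hand to avoid depending on a particular Julia version.

```julia
# Compact stand-in for the notebook's `walk`, repeated so the snippet is self-contained.
walk(numsteps) = sum(rand(Bool) ? -1 : +1 for j in 1:numsteps)

numwalkers = 1000
numsteps = 1000
data = [walk(numsteps) for i in 1:numwalkers]

m = sum(data) / length(data)                    # sample mean, expected near 0
v = sum(abs2, data .- m) / (length(data) - 1)   # sample variance, expected near numsteps

println("mean = ", m, "   variance = ", v, "   numsteps = ", numsteps)
```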

Let's make a distributed array that will divide up the walkers' indices between the available workers:

In [6]:
walkers = @DArray [i for i in 1:numwalkers]

1000-element DistributedArrays.DArray{Int64,1,Array{Int64,1}}:
    1
    2
    3
    4
    5
    6
    7
    8
    9
   10
   11
   12
   13
    ⋮
  989
  990
  991
  992
  993
  994
  995
  996
  997
  998
  999
 1000

In [7]:
walkers.indexes

2-element Array{Tuple{UnitRange{Int64}},1}:
 (1:500,)   
 (501:1000,)
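
The layout above is a block partition: each worker owns one contiguous range of indices. A minimal sketch of such a partition follows. The helper `split_ranges` is hypothetical and is not necessarily how DistributedArrays computes its layout; it only reproduces the ranges shown above for 1000 indices and 2 workers.

```julia
# Hypothetical helper: split 1:n into `nchunks` contiguous ranges of near-equal length.
function split_ranges(n, nchunks)
    q, r = divrem(n, nchunks)        # every chunk gets q items; the first r get one extra
    ranges = UnitRange{Int}[]
    start = 1
    for k in 1:nchunks
        len = q + (k <= r ? 1 : 0)
        push!(ranges, start:(start + len - 1))
        start += len
    end
    return ranges
end

println(split_ranges(1000, 2))   # the two ranges 1:500 and 501:1000
```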

Let's define the number of steps and the number of walkers on every process:

In [14]:
@everywhere begin
    numsteps   = 10000
    numwalkers = 100000 
end

We will now use a "trick": we wish to run the same function (`walk`) for each walker.
The function does not need to know *which* walker (index) it is running, but `map` still passes each index to the function it applies. So we can use a function that accepts one argument and simply ignores it. The next few cells build up to this:

A function that takes zero arguments and returns a constant:


In [14]:
f() = 3  

f (generic function with 1 method)

An anonymous function that takes one argument, that it ignores, and returns a constant:

In [15]:
_ -> 3

(::#19) (generic function with 1 method)

An anonymous function that returns something random:

In [16]:
g = _ -> walk(numsteps)   # _ means that the argument is ignored

(::#21) (generic function with 1 method)

In [9]:
g(1)

46

In [11]:
g(17)

-18

In [12]:
g("string")

-44

Using an anonymous function to generate a vector of random numbers:

In [13]:
h = _ -> rand()
map(h, 1:5)

5-element Array{Float64,1}:
 0.608136
 0.300646
 0.919476
 0.670608
 0.356738

In [17]:
@time positions = map( _ -> walk(numsteps), walkers)

  0.121649 seconds (84.77 k allocations: 4.563 MiB, 6.54% gc time)


1000-element DistributedArrays.DArray{Int64,1,Array{Int64,1}}:
 -18
  58
  10
 -34
  -6
  34
  32
  36
 -26
  20
 -18
   4
   6
   ⋮
 -26
 -46
   4
 -20
   4
 -20
 -76
  16
  24
 -20
 -22
  -8

In [21]:
function run_serial(numwalkers, numsteps)
    walkers = 1:numwalkers
    return map(_ -> walk(numsteps), walkers)
end

function run_distributed(numwalkers, numsteps)
    walkers = distribute(collect(1:numwalkers))   # split the indices across the workers
    return map(_ -> walk(numsteps), walkers)
end

run_distributed (generic function with 1 method)

In [23]:
run_serial(1, 1)
run_distributed(1, 1)

1-element DistributedArrays.DArray{Int64,1,Array{Int64,1}}:
 1

In [24]:
@time run_serial(numwalkers, numsteps)
@time run_distributed(numwalkers, numsteps)

  2.231796 seconds (6 allocations: 781.484 KiB)
  1.095464 seconds (2.93 k allocations: 1.699 MiB)


100000-element DistributedArrays.DArray{Int64,1,Array{Int64,1}}:
   34
 -124
   40
   48
   46
  160
    6
  -96
  -70
  112
  -84
   40
 -104
    ⋮
    2
 -124
 -150
  -28
  124
  -60
  -36
   48
   60
  144
   14
   42

We see a roughly 2x speedup (2.23 s versus 1.10 s) with 2 worker processes!
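
The same pattern can also be sketched with `pmap` from the `Distributed` standard library, without a distributed array. This is an alternative, not the approach used above, and it assumes Julia 1.x (where `Distributed` must be loaded explicitly), unlike the 0.6-era cells in this notebook. The problem sizes are kept small on purpose.

```julia
using Distributed   # assumption: Julia 1.x, where this is a standard library

# addprocs(2)       # uncomment to add workers; without them pmap runs on the master process

@everywhere function walk(numsteps)
    pos = 0
    for j in 1:numsteps
        pos += rand(Bool) ? -1 : +1
    end
    return pos
end

numwalkers, numsteps = 200, 100

# batch_size groups several walkers into one message to reduce communication overhead
positions = pmap(_ -> walk(numsteps), 1:numwalkers; batch_size = 50)

@assert length(positions) == numwalkers
@assert all(iseven, positions)   # an even number of ±1 steps ends on an even position
```

Since each walker is cheap, batching matters here: sending one walker per message would spend most of the time on communication.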

# Another example: random matrices

In [None]:
workers()

In [None]:
# addprocs(4)

In [None]:
@everywhere begin
    using DistributedArrays
    using StatsBase
    using Plots
end

In [None]:
@everywhere function stochastic(β = 2, n = 200)
    h = n ^ -(1/3)
    x = 0:h:10
    N = length(x)
    d = (-2 / h^2 .- x) + 2*sqrt(h*β) * randn(N) # diagonal
    e = ones(N - 1) / h^2                     # subdiagonal
  
    eigvals(SymTridiagonal(d, e))[N]        # largest eigenvalue (eigvals returns them in ascending order)
end
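
The last line of `stochastic` relies on `eigvals` returning the eigenvalues of a `SymTridiagonal` matrix in ascending order, so that index `N` picks the largest one. A small check on a matrix with known eigenvalues, assuming Julia 1.x (where `LinearAlgebra` must be loaded explicitly):

```julia
using LinearAlgebra   # assumption: Julia 1.x

# 3x3 matrix with 2 on the diagonal and -1 next to it;
# its eigenvalues are 2 - sqrt(2), 2 and 2 + sqrt(2).
d = [2.0, 2.0, 2.0]
e = [-1.0, -1.0]
λ = eigvals(SymTridiagonal(d, e))

@assert issorted(λ)              # ascending order
@assert λ[end] ≈ 2 + sqrt(2)     # the last entry is the largest eigenvalue
```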

Serial version:

In [None]:
println("Serial version")

t = 10000
p = plot()
for β = [1,2,4,10,20]
    
    z = fit(Histogram, [stochastic(β) for i = 1:t], -4:0.01:1).weights
    plot!(midpoints(-4:0.01:1), z / sum(z) / 0.01)
end
p
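
The expression `z / sum(z) / 0.01` in the loop turns bin counts into a probability density: dividing by `sum(z)` gives the fraction of samples in each bin, and dividing by the bin width makes the total area equal to 1. A sketch with hand-counted bins, using a wider bin and normally distributed samples as stand-ins (both are assumptions chosen only to keep the example small):

```julia
w = 0.5                        # bin width (the notebook uses 0.01)
lefts = -4:w:(4 - w)           # left edge of each bin, covering [-4, 4)
samples = randn(10_000)        # stand-in data

# z[k] counts the samples in the half-open bin [lefts[k], lefts[k] + w)
z = [count(s -> lo <= s < lo + w, samples) for lo in lefts]

density = z / sum(z) / w       # same normalization as in the plotting loop

# The bars of a density histogram have total area 1.
@assert isapprox(sum(density) * w, 1.0)
```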

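A related parallel construct is `@parallel` (renamed `@distributed` in Julia 1.0). Given a reduction operator such as `(+)`, it splits the loop iterations among the processes and combines the value of each iteration with that operator, i.e. it does a "reduce" operation.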

In [None]:
println("@parallel version")

@everywhere t = 10000

p = plot()

for β = [1,2,4,10,20]
    
    z = @parallel (+) for k = 1:nprocs()   # k is unused; named k to avoid clashing with the plot p
        fit(Histogram, [stochastic(β) for i = 1:t], -4:0.01:1).weights
    end
    
    plot!(midpoints(-4:0.01:1), z / sum(z) / 0.01)
end

p
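
The reduction on its own is easier to see in a smaller example. The sketch below assumes Julia 1.x, where the macro is called `@distributed` and lives in the `Distributed` standard library; it counts heads in a series of coin flips by summing the value of each iteration.

```julia
using Distributed   # assumption: Julia 1.x

nflips = 10_000

# The value of the last expression in the loop body is what gets reduced with (+).
nheads = @distributed (+) for i in 1:nflips
    Int(rand(Bool))
end

@assert 0 <= nheads <= nflips
println("fraction of heads: ", nheads / nflips)
```

With no workers added, the loop simply runs on the master process, so the snippet works with or without `addprocs`.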

In [None]:
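function dhist(x; closed=:left, nbins=10)
    # Build one histogram per worker from that worker's local part of `x`.
    # Caveat: StatsBase treats `nbins` as approximate and picks the bin edges from
    # the data, so the parts only add up if every worker ends up with the same bins.
    hist_parts = DArray(p -> fit(Histogram, localpart(x), closed=closed, nbins=nbins).weights,
                        (nbins * length(x.pids),))

    # Fetch each worker's counts and add them bin by bin.
    reduce(+, map(pid -> @fetchfrom(pid, localpart(hist_parts)), hist_parts.pids))
end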

In [None]:
a = randn(10000)
d = distribute(a)

dhist(d)
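
Adding up per-worker histograms is only meaningful when every part is counted over the same bin edges. The sketch below shows this in plain serial Julia, with two array slices standing in for the local parts held by two workers. The helper `bin_counts` and the fixed edges are hypothetical and are not part of `dhist`.

```julia
# Hypothetical helper: count the samples falling in each bin [edges[k], edges[k+1]).
function bin_counts(v, edges)
    return [count(s -> edges[k] <= s < edges[k+1], v) for k in 1:length(edges)-1]
end

edges = collect(-4:0.5:4)      # one set of edges shared by all chunks
a = randn(10_000)

# Two chunks stand in for the local parts held by two workers.
chunks = [a[1:5_000], a[5_001:end]]
partial = map(c -> bin_counts(c, edges), chunks)

# Because every chunk uses the same edges, adding the partial counts
# gives the histogram of the full data set.
@assert reduce(+, partial) == bin_counts(a, edges)
```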

## SharedArrays and threads

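Two alternative techniques for parallelism on a single machine are `SharedArray`s (arrays whose memory is shared between processes) and threads. See the Julia manual: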

https://docs.julialang.org/en/stable/manual/parallel-computing
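
As a rough illustration of the threaded approach, the random-walk example can be sketched with `Threads.@threads`. This assumes Julia 1.3 or later, where the default random number generator is safe to use from several threads; the function name `run_threaded` is hypothetical. Start Julia with more than one thread (for example via the `JULIA_NUM_THREADS` environment variable) to see any speedup.

```julia
function walk(numsteps)
    pos = 0
    for j in 1:numsteps
        pos += rand(Bool) ? -1 : +1
    end
    return pos
end

# Hypothetical threaded counterpart of run_serial: each iteration writes to its
# own slot of `positions`, so no locking is needed.
function run_threaded(numwalkers, numsteps)
    positions = Vector{Int}(undef, numwalkers)
    Threads.@threads for i in 1:numwalkers
        positions[i] = walk(numsteps)
    end
    return positions
end

positions = run_threaded(100, 100)

@assert length(positions) == 100
@assert all(iseven, positions)
println("threads available: ", Threads.nthreads())
```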