In [1]:
versioninfo()

Julia Version 1.3.1
Commit 2d5741174c* (2019-12-30 21:36 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)


In [2]:
using Distributed

# Computational Physics
## Parallel Programming
### Why?

1. Make things fast (not today)
2. Solve larger problems

### Terminology

- "Supercomputer": many nodes (10,000)
  - "Compute node": a workstation, with a fast network
    - "Memory" (e.g. 200 GByte)
    - "Cores" (e.g. 40 cores)

"Multi-threading" (easy): using one node
- one single data structure
- many "threads" ("workers") working simultaneously

"Distributed computing" (difficult): combining many nodes
- break data structures into pieces
- many "processes" ("workers") working simultaneously
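
"Breaking a data structure into pieces" often starts with splitting an index range among the workers. The following is a minimal sketch of one way to do that; `chunk_bounds` is a hypothetical helper introduced here for illustration, not something from Julia or from these notes.

```julia
# Split the index range 1:n into p contiguous chunks of nearly equal size.
# `chunk_bounds` is a hypothetical helper name, used only for illustration.
function chunk_bounds(n::Int, p::Int)
    # Chunk i covers div((i-1)*n, p) + 1 through div(i*n, p);
    # the chunk sizes differ by at most one element.
    [(div((i - 1) * n, p) + 1, div(i * n, p)) for i in 1:p]
end

chunks = chunk_bounds(10, 4)
println(chunks)
```

Each worker would then own one `(first, last)` pair and only touch that part of the data. This formula also works when `n` is not divisible by `p`.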

Example: Solve linear system

### State of the art

Fortran, C, C++: multi-threading is much easier than distributed computing

!$omp parallel do
do i = 1, 1000000
   ... do stuff ...
end do

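Python, Julia: multi-threading is not yet well supported
- except in external libraries, e.g. for linear algebra
- or for machine learning!
- Julia 1.3 does provide `Threads.@threads`, but its manual still labels multi-threading as experimental

I hope that something as simple as this is soon possible: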
@parallel for i in 1:1000000
    ... do stuff ...
end
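
For comparison, Julia already offers a loop macro in this spirit, `Threads.@threads`. The sketch below assumes Julia was started with several threads (e.g. via the `JULIA_NUM_THREADS` environment variable); with a single thread it still runs, just serially. The function name `threaded_sum` and the chunking scheme are illustrative choices, not part of these notes.

```julia
using Base.Threads

# Sum the integers 1..n as Float64, one contiguous chunk per thread.
# `threaded_sum` is a hypothetical name used only for this sketch.
function threaded_sum(n::Int)::Float64
    nchunks = nthreads()
    partial = zeros(Float64, nchunks)   # one result slot per chunk
    @threads for c in 1:nchunks
        lo = div((c - 1) * n, nchunks) + 1
        hi = div(c * n, nchunks)
        s = 0.0
        for i in lo:hi
            s += i
        end
        partial[c] = s                  # each iteration writes only its own slot
    end
    sum(partial)
end

println(threaded_sum(1000))
```

Each chunk accumulates into a local variable and writes to its own array slot, so no two threads update the same memory location.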



Fortran, C, C++: distributed computing uses the MPI standard (Message Passing Interface)

Julia: distributed computing is built in (today's topic)

## Distributed Computing in Julia

In [3]:
using Distributed

In [5]:
nworkers()

1

In [6]:
addprocs(4)

4-element Array{Int64,1}:
 2
 3
 4
 5

In [7]:
nworkers()

4

In [8]:
workers()

4-element Array{Int64,1}:
 2
 3
 4
 5

In [37]:
@everywhere function task(from::Int, to::Int)::Float64
    s = 0.0
    for i in from:to
        s += i
    end
    s
end

In [38]:
@time task(0, 999999999)

  1.533484 seconds (5.36 k allocations: 258.556 KiB)


4.99999999067109e17

In [39]:
r = @time remotecall(task, 2, 0, 999999999)

  0.000641 seconds (38 allocations: 1.375 KiB)


Future(2, 1, 18, nothing)

In [40]:
r[]

4.99999999067109e17
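
The pattern above has two steps: `remotecall` returns a `Future` immediately, and reading the future (`r[]`, which is the same as `fetch(r)`) blocks until the worker has finished. A minimal sketch of both steps follows. It uses `sum` from Base so that nothing needs to be defined on the workers, and it picks whatever worker is available (the main process, if `addprocs` has not been called).

```julia
using Distributed

# Pick any available worker; without `addprocs`, this is the main process (id 1).
w = first(workers())

# `remotecall` returns right away with a Future ...
f = remotecall(sum, w, 1:100)

# ... and `fetch` blocks until the result is available.
result = fetch(f)

# `remotecall_fetch` combines both steps into a single blocking call.
result2 = remotecall_fetch(sum, w, 1:100)

println(result, " ", result2)
```

Because `remotecall` does not wait, its timing above (well under a millisecond) only measures starting the remote task, not the computation itself.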

In [47]:
function count1(n::Int)::Float64
    p = 4
    np = div(n, p)
    @assert mod(n, p) == 0
    s = 0.0
    for i in 1:p
        s += task((i-1)*np+1, i*np)
    end
    s
end

count1 (generic function with 1 method)

In [54]:
function count(n::Int)::Float64
    p = nworkers()
    np = div(n, p)
    @assert mod(n, p) == 0
    futures = Future[]
    for i in 1:p
        push!(futures, remotecall(task, workers()[i], (i-1)*np+1, i*np))
    end
    s = 0.0
    for f in futures
        s += f[]
    end
    s
end

count (generic function with 1 method)
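
The `@assert` in `count` rejects any `n` that is not divisible by the number of workers. The following sketch shows one possible way to lift that restriction by computing slightly uneven chunk bounds. The names `partial_sum` and `count_any` are hypothetical and chosen so as not to clash with the functions defined above. The chunk formula assumes that `n` times the number of workers fits into an `Int`.

```julia
using Distributed

# Same summation kernel as `task` in the notes, under a different (hypothetical) name.
@everywhere function partial_sum(from::Int, to::Int)::Float64
    s = 0.0
    for i in from:to
        s += i
    end
    s
end

function count_any(n::Int)::Float64
    ws = workers()
    p = length(ws)
    # Start all remote calls first, so that the workers run concurrently ...
    futures = [remotecall(partial_sum, ws[i], div((i - 1) * n, p) + 1, div(i * n, p))
               for i in 1:p]
    # ... then wait for each result and add them up.
    sum(fetch(f) for f in futures)
end

println(count_any(1000))
```

As in `count`, it matters that all `remotecall`s are issued before the first `fetch`. Fetching inside the first loop would make the workers run one after another.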

In [56]:
@time count1(10000000000)
@time count(10000000000)

 14.960530 seconds (5 allocations: 176 bytes)
  3.789481 seconds (786 allocations: 29.266 KiB)


5.000000000007186e19

In [57]:
14.960530 / 3.789481

3.9479100172292725
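
This ratio is the speedup. Dividing it by the number of workers gives the parallel efficiency, which would be 1 for perfect scaling. A small sketch using the timings measured above (the variable names are illustrative):

```julia
t_serial = 14.960530     # seconds, from the `count1` timing above
t_parallel = 3.789481    # seconds, from the `count` timing above
p = 4                    # number of workers used

speedup = t_serial / t_parallel
efficiency = speedup / p

println("speedup = ", speedup, ", efficiency = ", efficiency)
```

With these numbers the efficiency comes out just below 0.99, i.e. the four workers are used almost perfectly for this problem.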

In [60]:
]add Hwloc

[32m[1m Resolving[22m[39m package versions...
[32m[1m Installed[22m[39m Loess ─────────── v0.5.1
[32m[1m Installed[22m[39m OpenSpecFun_jll ─ v0.5.3+3
[32m[1m Installed[22m[39m PDMats ────────── v0.9.12
[32m[1m Installed[22m[39m Hwloc ─────────── v1.0.3
[32m[1m Installed[22m[39m ArrayInterface ── v2.5.1
[32m[1m  Updating[22m[39m `~/.julia/environments/v1.3/Project.toml`
 [90m [0e44f5e4][39m[92m + Hwloc v1.0.3[39m
[32m[1m  Updating[22m[39m `~/.julia/environments/v1.3/Manifest.toml`
 [90m [4fba245c][39m[93m ↑ ArrayInterface v2.5.0 ⇒ v2.5.1[39m
 [90m [0e44f5e4][39m[92m + Hwloc v1.0.3[39m
 [90m [4345ca2d][39m[93m ↑ Loess v0.5.0 ⇒ v0.5.1[39m
 [90m [efe28fd5][39m[93m ↑ OpenSpecFun_jll v0.5.3+2 ⇒ v0.5.3+3[39m
 [90m [90014a1f][39m[93m ↑ PDMats v0.9.11 ⇒ v0.9.12[39m
[32m[1m  Building[22m[39m Hwloc → `~/.julia/packages/Hwloc/1kB0k/deps/build.log`


In [64]:
using Hwloc
topology = Hwloc.topology_load()
println("Machine topology:")
print(topology)

Machine topology:
D0: L0 P0 Machine  
    D1: L0 P0 Package  
        D2: L0 P-1 L3Cache  Cache{size=14417920,depth=3,linesize=64,associativity=11,type=Unified}
            D3: L0 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L0 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L0 P0 Core  
                        D6: L0 P0 PU  
                        D6: L1 P20 PU  
            D3: L1 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L1 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L1 P4 Core  
                        D6: L2 P2 PU  
                        D6: L3 P22 PU  
            D3: L2 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L2 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
 