# Parallel Computing

## Processors and threads

Every Julia session has some number of *processors* (separate Julia processes, which do not share memory), and each processor has some number of *threads* (which do share memory).

In [1]:
using Distributed

In [2]:
nprocs()

1

Processes are distinct from threads:

In [3]:
Threads.nthreads()

1
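Because threads share memory, every thread can write into the same array. The following is a minimal sketch, assuming Julia was started with one or more threads (for example via the `JULIA_NUM_THREADS` environment variable); the names `n` and `squares` are only illustrative.

```jl
using Base.Threads

n = 1000
squares = zeros(Int, n)      # a single array, visible to every thread

# Iterations are divided among the available threads.
@threads for i in 1:n
    squares[i] = i^2         # each iteration writes a distinct slot, so no lock is needed
end
```

With a single thread the loop simply runs serially, so the sketch gives the same result either way.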

The Julia process connected to your keyboard is processor 1 (sometimes called the "master" or "head node").

`addprocs(n)` starts additional processors on your local machine, as separate OS-level processes. Processors other than 1 are sometimes called "workers".

In [4]:
myid()

1

In [5]:
addprocs(2)

2-element Array{Int64,1}:
 2
 3

In [6]:
nprocs()

3

In [7]:
nworkers()

2

In [8]:
workers()

2-element Array{Int64,1}:
 2
 3

In [9]:
rmprocs(2:3)

Task (done) @0x00007f45caa55270

In [10]:
nprocs()

1

In [11]:
addprocs(2)

2-element Array{Int64,1}:
 4
 5
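To illustrate what "do not share memory" means in practice, here is a small sketch, assuming one extra local worker can be started. Arguments sent to a worker are serialized copies, so mutating them remotely leaves the original untouched. The names `local_vec` and `remote_vec` are illustrative.

```jl
using Distributed

w = addprocs(1)[1]           # start one fresh worker and remember its id

local_vec = [1, 2, 3]
# The worker receives a copy of local_vec, appends to that copy, and sends it back.
remote_vec = remotecall_fetch(v -> push!(v, 4), w, local_vec)

rmprocs(w)                   # remove only the worker started above
```

After this runs, `local_vec` still has three elements while `remote_vec` has four.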

## Tasks

Processors and threads represent compute resources. *Tasks* represent units of work performed by those resources.

A `Task` gives you a handle to a computation to be performed. It has a create-start-run-finish lifecycle:

* Create: `Task(thunk)` or `@task expr`. The latter is equivalent to `Task(()->expr)`.
* Start: `schedule(task)`. `@async expr` does create+start in one step (the older `@schedule` macro was removed in Julia 1.0).
* Run: `current_task()` is always running. A task cooperatively pauses itself using `wait` or `yield`.
* Finish: the task's `thunk` must exit via return or exception. You can `wait(task)`.

Technically, `Task` is a *concurrency* primitive (as opposed to a *parallelism* primitive).
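As a rough sketch of what concurrency without parallelism looks like, the two tasks below take turns by calling `yield()`; neither runs at the same instant as the other. The `events` vector is an illustrative name used to record the order of execution.

```jl
events = String[]

@sync for name in ("a", "b")
    @async for step in 1:3
        push!(events, "$name$step")
        yield()              # cooperatively hand control to the other task
    end
end

println(events)
```

The entries from the two tasks typically alternate, but the exact order is up to the scheduler and should not be relied on.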

In [12]:
work = @task sum(rand(1000))

Task (runnable) @0x00007f45caa6f9d0

In [13]:
schedule(work)

Task (done) @0x00007f45caa6f9d0

In [14]:
fetch(work)

498.29853463014587

- `wait`: wait for something to finish
- `fetch`: wait for finish and get value (might move data)
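A minimal sketch of the difference, using a task that sleeps briefly before returning a value (the variable names are illustrative):

```jl
t = @async begin
    sleep(0.1)
    42
end

wait(t)                # blocks until the task has finished; the value is not needed here
done = istaskdone(t)   # true once wait has returned
result = fetch(t)      # waits if necessary, then returns the task's value
```

Calling `fetch` on a task that has already finished returns immediately.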

In [19]:
@time work = @async sum(rand(100000000))

  0.000121 seconds (22 allocations: 2.048 KiB)


Task (done) @0x00007f45ca5197b0

In [20]:
fetch(work)

4.9996473433314025e7

In [18]:
@time sum(rand(100000000))

  0.609093 seconds (7 allocations: 762.940 MiB, 11.47% gc time)


5.000304858138728e7

In [21]:
using Dates
@sync for t = 'A':'F'
    @async for k = 1:10
        println("$t$k @ $(now())")
        sleep(rand())
    end
end

A1 @ 2018-10-26T13:12:06.656
B1 @ 2018-10-26T13:12:06.752
C1 @ 2018-10-26T13:12:06.752
D1 @ 2018-10-26T13:12:06.752
E1 @ 2018-10-26T13:12:06.752
F1 @ 2018-10-26T13:12:06.752
E2 @ 2018-10-26T13:12:07.239
B2 @ 2018-10-26T13:12:07.401
D2 @ 2018-10-26T13:12:07.417
E3 @ 2018-10-26T13:12:07.515
C2 @ 2018-10-26T13:12:07.545
F2 @ 2018-10-26T13:12:07.547
F3 @ 2018-10-26T13:12:07.566
D3 @ 2018-10-26T13:12:07.569
F4 @ 2018-10-26T13:12:07.613
A2 @ 2018-10-26T13:12:07.706
B3 @ 2018-10-26T13:12:07.805
C3 @ 2018-10-26T13:12:08.006
B4 @ 2018-10-26T13:12:08.079
D4 @ 2018-10-26T13:12:08.084
E4 @ 2018-10-26T13:12:08.28
A3 @ 2018-10-26T13:12:08.308
F5 @ 2018-10-26T13:12:08.438
D5 @ 2018-10-26T13:12:08.503
C4 @ 2018-10-26T13:12:08.646
B5 @ 2018-10-26T13:12:08.766
B6 @ 2018-10-26T13:12:08.936
A4 @ 2018-10-26T13:12:08.977
E5 @ 2018-10-26T13:12:08.985
F6 @ 2018-10-26T13:12:09.016
D6 @ 2018-10-26T13:12:09.104
A5 @ 2018-10-26T13:12:09.418
C5 @ 2018-10-26T13:12:09.419
B7 @ 2018-10-26T13:12:09.529
E6 @ 2018-10-26

## Communication: Conditions and Channels

Most of the time, Tasks are too low-level to use directly.

* A `Condition` represents an edge-triggered event that Tasks can `wait` for and `notify`.
* I/O operations use Conditions, wait, and notify internally.
* Parallel primitives start and manage Tasks internally.

In [22]:
event = Condition()

Condition(Any[])

In [23]:
work = @async begin
    println("I'm waiting for the event to happen.")
    wait(event)
    println("Ok, it happened.")
end

I'm waiting for the event to happen.


Task (runnable) @0x00007f45ca519f90

In [24]:
notify(event)

Ok, it happened.


1
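`notify` can also hand a value to the waiting task: `wait(event)` returns whatever was passed as the second argument of `notify`. A small sketch, in which the short `sleep` is only an assumption-laden way of giving the waiter time to block first:

```jl
event = Condition()

waiter = @async wait(event)    # the task's result is whatever wait returns
sleep(0.1)                     # give the waiter time to block on the condition
notify(event, "payload")       # wake the waiter and pass it a value

received = fetch(waiter)
```

Because a `Condition` is edge-triggered, a `notify` issued before any task is waiting is simply lost, which is why the waiter has to start first.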

A `Channel` is similar, but adds the ability to send data.

In [25]:
c = Channel(4)  # 4 is the queue size

Channel{Any}(sz_max:4,sz_curr:0)

In [26]:
printer = @async begin
    while true
        data = take!(c)
        println("from background: ", data)
    end
end

Task (runnable) @0x00007f45cb450010

In [27]:
put!(c, rand(2,2));

from background: [0.894978 0.904605; 0.807699 0.893968]


In [28]:
put!(c, 1);

from background: 1


In [29]:
put!(c, "hello");

from background: hello


A channel needs to be `close`d eventually.
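Closing is what lets a consumer know that no more data will arrive. In the sketch below (names are illustrative), iterating over the channel ends once it has been closed and drained, so the consumer does not need a `while true` loop:

```jl
ch = Channel{Int}(4)

producer = @async begin
    for i in 1:5
        put!(ch, i)      # blocks while the 4-slot buffer is full
    end
    close(ch)            # signal that no more values are coming
end

received = Int[]
for x in ch              # stops when the channel is closed and empty
    push!(received, x)
end
```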

## Putting it together: remote execution

`addprocs` sets up Tasks on all processors ready to run code for you.

A single remote function call can be done with `remotecall`, which returns a `Future`:

In [30]:
map([1,2,3]) do x
    y = 2x
    return y+1
end

3-element Array{Int64,1}:
 3
 5
 7

In [32]:
futuremat = remotecall(rand, 4, 3, 3)

Future(4, 1, 6, nothing)

In [33]:
fetch(futuremat)

3×3 Array{Float64,2}:
 0.934434  0.0265968  0.409198
 0.121118  0.529621   0.119145
 0.965964  0.705946   0.388846

In [34]:
isready(futuremat)

true

In [35]:
@async begin
    wait(futuremat)
    println("got it, callback goes here")
    # ......
end

got it, callback goes here


Task (done) @0x00007f45caa547f0

Avoid calling `fetch` on remote data when you can, because it copies the result to the calling process. On the process that owns the data, however, `fetch` is cheap: nothing has to move.
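One way to follow this advice is to run the reduction where the data lives and send back only the small result. The following is a sketch under the assumption that a local worker can be started; `big` and `total` are illustrative names.

```jl
using Distributed

w = addprocs(1)[1]                       # a fresh worker for this example

big = @spawnat w rand(1000, 1000)        # the matrix is created and kept on worker w

# fetch(r) runs on w, which owns the data, so the matrix itself never moves.
# Only the resulting Float64 is sent back to the caller.
total = remotecall_fetch(r -> sum(fetch(r)), w, big)

rmprocs(w)
```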

`@spawnat` expands to a `remotecall`, in the same way that `@async` expands to creating a `Task` and calling `schedule` on it.
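The two spellings can be compared side by side. This sketch assumes one local worker can be started; both calls return a `Future` for the same computation.

```jl
using Distributed

w = addprocs(1)[1]

f1 = remotecall(+, w, 1, 2)     # function and arguments passed explicitly
f2 = @spawnat w 1 + 2           # the expression is wrapped in a closure and sent to w

a, b = fetch(f1), fetch(f2)

rmprocs(w)
```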

In [36]:
futuremat_plus_1 = @spawnat 4 1 .+ fetch(futuremat)

Future(4, 1, 8, nothing)

In [37]:
fetch(futuremat_plus_1)

3×3 Array{Float64,2}:
 1.93443  1.0266   1.4092 
 1.12112  1.52962  1.11915
 1.96596  1.70595  1.38885

`RemoteChannel` works similarly to `Channel`, but can be used by any process.

In [38]:
rc = RemoteChannel()

RemoteChannel{Channel{Any}}(1, 1, 10)

In [39]:
@spawnat 4 begin
    @show rc
    while true
        val = take!(rc)
        println(val)
    end
end

Future(4, 1, 11, nothing)

      From worker 4:	rc = RemoteChannel{Channel{Any}}(1, 1, 10)


In [40]:
put!(rc, "test")

RemoteChannel{Channel{Any}}(1, 1, 10)

      From worker 4:	test


In [41]:
put!(rc, rand(2,2))

RemoteChannel{Channel{Any}}(1, 1, 10)

      From worker 4:	[0.684862 0.0216064; 0.28462 0.0379759]


In [42]:
close(rc)

In [43]:
put!(rc, 1)

InvalidStateException: InvalidStateException("Channel is closed.", :closed)

## Spawn and sync

* `@spawn` is like `@spawnat` but picks a processor for you (from Julia 1.3 on, `@spawnat :any` is the preferred spelling)
* `@sync` waits for all lexically-enclosed `@spawn`, `@spawnat`, and `@async`
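A small sketch of `@sync` in action, assuming two local workers can be started. Each `@async` task makes one remote call, and the `@sync` block returns only after all of them have finished. The `results` dictionary is an illustrative name.

```jl
using Distributed

pids = addprocs(2)

results = Dict{Int,Int}()
@sync for w in pids
    # Each task blocks on its own remote call; the calls overlap with each other.
    @async results[w] = remotecall_fetch(myid, w)
end

rmprocs(pids)
```

By the time the `@sync` block exits, `results` maps every worker id to the value that worker reported for `myid()`.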

#### Example

Recipe for parallel-loading many files from a shared file system:
```jl
dataset = [(@spawn load(file)) for file in files]
results = [(@spawnat x.where f(fetch(x))) for x in dataset]
```

In [44]:
futuremat_plus_1.where

4

In [45]:
@everywhere println(myid())

1
      From worker 4:	4
      From worker 5:	5


## Loading code

Use this recipe:

1. `using Thing1, Thing2` for packages used only on process 1
2. `addprocs()`
3. `using Thing3` for packages needed everywhere
4. Any time: `include(...)` is always local-only; `@everywhere include(...)` for global

**Note:** if you need to send an object of type T to another processor, the package defining T must be loaded there.
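The recipe can be sketched with a standard-library package standing in for `Thing3`. This version uses `@everywhere using` so that the package's names are also in scope on the workers; `centered` is a hypothetical helper defined only for illustration.

```jl
using Distributed

w = addprocs(1)[1]                 # step 2: add workers first

@everywhere using Statistics       # step 3: load the package on every process

# Definitions needed by remote calls must also exist on the workers.
@everywhere centered(v) = v .- mean(v)

out = remotecall_fetch(centered, w, [1.0, 2.0, 3.0])

rmprocs(w)
```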

## Demo: "crowdsourced" pi estimation

Server code:

```jl
using Sockets, Serialization

function estimate_pi(in_circle, N)
    4in_circle / N
end

function run_server(n=10^8)
    results = Channel(1)
    @async begin
        srvr = listen(IPv4(0), 8000)
        while true
            sock = accept(srvr)
            serialize(sock, n)
            @async begin
                put!(results, deserialize(sock))
            end
        end
    end

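    # Running totals: C counts points inside the circle, total_trials counts all samples.
    C, total_trials = 0, 0
    @async while true
        c, trials = take!(results)    # one client's (hits, samples) pair
        C, total_trials = C+c, total_trials+trials
        println("Total samples: ", total_trials)
        println("Current estimate: ", estimate_pi(C, total_trials))
    end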
end

run_server()
wait()
```

In [46]:
# Client code
using Sockets, Serialization

function trials(numsteps=1000)  # default value of the parameter
    pos = 0 
    for j in 1:numsteps
        pos += Int(rand()^2 + rand()^2 < 1)
    end
    return pos
end

trials (generic function with 2 methods)
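The estimator works because a point drawn uniformly from the unit square lands inside the quarter circle with probability π/4, so four times the observed fraction approximates π. The following is a network-free sketch of the same calculation, repeating a version of `trials` so that it runs on its own; the sample size `n` is an arbitrary choice.

```jl
function trials(numsteps)
    hits = 0
    for _ in 1:numsteps
        # A Bool adds as 0 or 1: count the point if it falls inside the quarter circle.
        hits += (rand()^2 + rand()^2 < 1)
    end
    return hits
end

n = 10^6
estimate = 4 * trials(n) / n
println("estimate = ", estimate, ", error = ", abs(estimate - pi))
```

The error shrinks roughly like 1/sqrt(n), which is why pooling samples from many clients improves the estimate.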

In [53]:
for i = 1:10
    @async begin
        c = connect("anubis.juliacomputing.io", 8000)
        n = deserialize(c)    # blocks until the server sends the number of samples
        serialize(c, (trials(n), n))
        close(c)
    end
end