# Parallel Computing

## Processors and threads

Every Julia session has some number of *processors* (Julia's API calls them processes, as in `nprocs`; they do not share memory), and each processor has some number of *threads* (which do share memory).

In [None]:
using Distributed

In [None]:
nprocs()

Processes are distinct from threads:

In [None]:
Threads.nthreads()
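
To make "threads share memory" concrete, here is a minimal sketch in which every thread writes directly into the same array. The array name `squares` and the loop bounds are illustrative choices, not part of the original notebook; the loop also runs correctly when Julia is started with a single thread.

```julia
# All threads see the same `squares` array, so no data is copied or sent.
squares = zeros(Int, 8)

Threads.@threads for i in 1:8
    squares[i] = i^2   # each iteration writes a distinct slot, so no lock is needed
end

println(squares)
```

Doing the same across processes would require sending data explicitly, which is what the rest of this notebook covers.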

The Julia process connected to your keyboard is processor 1 (sometimes called the "master" or "head node").

`addprocs(n)` starts additional processors on your local machine, as separate OS-level processes. Processors other than 1 are sometimes called "workers".

In [None]:
myid()

In [None]:
addprocs(2)

In [None]:
nprocs()

In [None]:
nworkers()

In [None]:
workers()

## Tasks

Processors and threads represent compute resources. *Tasks* represent units of work performed by those resources.

A `Task` gives you a handle to a computation to be performed. It has a create-start-run-finish lifecycle:

* Create: `Task(thunk)` or `@task expr`. The latter is equivalent to `Task(()->expr)`.
* Start: `schedule(task)`. `@async` does create+start in one step (the older `@schedule` macro was removed in Julia 1.0).
* Run: `current_task()` is always running. A task cooperatively pauses itself using `wait` or `yield`.
* Finish: the task's `thunk` must exit via return or exception. You can `wait(task)`.

Technically, `Task` is a *concurrency* primitive (as opposed to a *parallelism* primitive).

In [None]:
work = @task sum(rand(1000))

In [None]:
schedule(work)

In [None]:
fetch(work)

- `wait`: wait for something to finish
- `fetch`: wait for finish and get value (might move data)
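
The lifecycle described above can be observed step by step with `istaskstarted` and `istaskdone`. This is a small sketch; the variable names and the value `42` are made up for illustration.

```julia
# Create: the task exists but has not been scheduled yet.
t = @task begin
    sleep(0.1)
    42
end
started_before = istaskstarted(t)

# Start: hand the task to the scheduler.
schedule(t)

# Finish: wait blocks until the task's body has returned.
wait(t)
done_after = istaskdone(t)

# fetch waits as well, and additionally returns the task's value.
value = fetch(t)

println((started_before, done_after, value))
```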

In [None]:
@time work = @async sum(rand(100000000))

In [None]:
fetch(work)

In [None]:
@time sum(rand(100000000))

In [None]:
using Dates
@sync for t = 'A':'F'
    @async for k = 1:10
        println("$t$k @ $(now())")
        sleep(rand())
    end
end

## Communication: Conditions and Channels

Most of the time, Tasks are too low-level to use directly.

* A `Condition` represents an edge-triggered event that Tasks can `wait` for and `notify`.
* I/O operations use Conditions, wait, and notify internally.
* Parallel primitives start and manage Tasks internally.

In [None]:
event = Condition()

In [None]:
work = @async begin
    println("I'm waiting for the event to happen.")
    wait(event)
    println("Ok, it happened.")
end

In [None]:
notify(event)
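
"Edge-triggered" means that a notification only reaches tasks that are already waiting. A `notify` that arrives before anyone calls `wait` is simply lost. The sketch below illustrates this using the return value of `notify`, which is the number of tasks it woke up. The names `ev`, `waiter`, and the `:done` result are illustrative.

```julia
ev = Condition()

# Nobody is waiting yet, so this notification wakes no one and is not remembered.
woken_before = notify(ev)

waiter = @async begin
    wait(ev)      # blocks until a later notify
    :done
end

yield()           # let the waiter run until it blocks on the event

# Now one task is waiting, so this notification wakes it.
woken_after = notify(ev)
result = fetch(waiter)

println((woken_before, woken_after, result))
```

If you need notifications to be remembered, a `Channel` is usually the better fit.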

A `Channel` is similar, but adds the ability to send data.

In [None]:
c = Channel(4)  # 4 is the queue size

In [None]:
printer = @async begin
    while true
        data = take!(c)
        println("from background: ", data)
    end
end

In [None]:
put!(c, rand(2,2));

In [None]:
put!(c, 1);

In [None]:
put!(c, "hello");

A channel needs to be `close`d eventually.
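
One reason closing matters: a consumer written as a `for` loop over the channel ends cleanly once the channel is closed and drained, whereas a `while true` loop calling `take!` (like `printer` above) ends with an exception when the channel is closed. A minimal sketch, with illustrative names `ch`, `received`, and `consumer`:

```julia
ch = Channel{Int}(4)
received = Int[]

# Iterating a channel takes values until it is closed and empty.
consumer = @async for x in ch
    push!(received, x)
end

for i in 1:3
    put!(ch, i)
end

close(ch)        # no more values; lets the consumer loop finish
wait(consumer)

println(received)
```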

## Putting it together: remote execution

`addprocs` sets up Tasks on all processors ready to run code for you.

A single remote function call can be done with `remotecall`, which returns a `Future`:

In [None]:
futuremat = remotecall(rand, 2, 3, 3)

In [None]:
fetch(futuremat)

In [None]:
isready(futuremat)

In [None]:
@async begin
    wait(futuremat)
    println("got it, callback goes here")
    # ......
end

`fetch` should be avoided where possible, because it moves data between processes. It is cheap, however, when called on the process that owns the data.
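
One way to act on this advice is to do the reduction on the worker that owns the data and send back only the small result. The sketch below assumes at least one worker is available (it adds one if needed). The 1000×1000 size and the names `w`, `big`, and `total` are illustrative.

```julia
using Distributed

nprocs() == 1 && addprocs(1)   # make sure at least one worker exists
w = first(workers())

# The matrix is created on worker w and stays there.
big = remotecall(rand, w, 1000, 1000)

# The anonymous function runs on w, where fetch is local and cheap.
# Only the resulting Float64 travels back to process 1.
total = remotecall_fetch(f -> sum(fetch(f)), w, big)

println(total)
```

By contrast, `sum(fetch(big))` on process 1 would copy the whole matrix over before summing it.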

`@spawnat` rewrites to `remotecall`, similar to `@async` for `schedule` etc.

In [None]:
futuremat_plus_1 = @spawnat 2 1 .+ fetch(futuremat)

In [None]:
fetch(futuremat_plus_1)

`RemoteChannel` works similarly to `Channel`, but can be used by any process.

In [None]:
rc = RemoteChannel()

In [None]:
@spawnat 2 begin
    @show rc
    while true
        val = take!(rc)
        println(val)
    end
end

In [None]:
put!(rc, "test")

In [None]:
put!(rc, rand(2,2))
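
The data flow also works in the other direction: a worker can `put!` results into a `RemoteChannel` that process 1 reads from. The sketch below passes a constructor function to `RemoteChannel` to choose the element type and buffer size. The names `results`, `job`, and `squares` are illustrative, and the code assumes at least one worker (it adds one if needed).

```julia
using Distributed

nprocs() == 1 && addprocs(1)
w = first(workers())

# The underlying Channel{Int} with room for 8 items lives on process 1.
results = RemoteChannel(() -> Channel{Int}(8))

# The worker pushes three values into the remote channel.
job = @spawnat w foreach(i -> put!(results, i^2), 1:3)

# Process 1 takes them out in the order they were put.
squares = [take!(results) for _ in 1:3]
wait(job)

println(squares)
```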

## Spawn and sync

* `@spawn` is like `@spawnat` but picks a processor for you (since Julia 1.3 it is deprecated in favor of `@spawnat :any`)
* `@sync` waits for all lexically-enclosed `@spawn`, `@spawnat`, and `@async`
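
A small sketch of the two together: each remote call is started inside a `@sync` block, and the block only returns once all of them have finished. It uses `@spawnat :any`, which assumes Julia 1.3 or later; on older versions `@spawn` plays the same role. The names `futures` and `ids` are illustrative.

```julia
using Distributed

nprocs() == 1 && addprocs(2)   # make sure some workers exist

futures = Future[]
@sync for i in 1:4
    # Julia picks a worker; the body just reports which process ran it.
    push!(futures, @spawnat :any myid())
end

# After @sync returns, every future already holds its result.
ids = fetch.(futures)
println(ids)
```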

#### Example

Recipe for parallel-loading many files from a shared file system:
```jl
dataset = [(@spawn load(file)) for file in files]
results = [(@spawnat x.where f(fetch(x))) for x in dataset]
```

In [None]:
futuremat_plus_1.where

## Loading code

Use this recipe:

1. `using Thing1, Thing2` for packages used only on process 1
2. `addprocs()`
3. `using Thing3` for packages needed everywhere
4. Any time: `include(...)` is always local-only; `@everywhere include(...)` for global

**Note:** if you need to send an object of type T to another processor, the package defining T must be loaded there.
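
The same idea applies to your own definitions: a function must be defined on a worker before that worker can be asked to call it. A minimal sketch using `@everywhere`, where `square` is a hypothetical helper and the code assumes workers are available (it adds two if needed):

```julia
using Distributed

nprocs() == 1 && addprocs(2)

# Define the helper on process 1 and on every worker.
@everywhere square(x) = x^2

# Each worker squares its own process id and sends back the result.
vals = [remotecall_fetch(square, w, w) for w in workers()]

println(vals)
```

Without `@everywhere`, the `remotecall_fetch` would fail on the worker because `square` would be defined only on process 1.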

## Demo: "crowdsourced" pi estimation

Server code:

```jl
using Sockets, Serialization

# The fraction of random points that land inside the quarter circle
# approximates pi/4, hence the factor of 4.
function estimate_pi(in_circle, N)
    4in_circle / N
end

function run_server(n=10^8)
    results = Channel(1)
    @async begin
        srvr = listen(IPv4(0), 8000)
        while true
            sock = accept(srvr)
            serialize(sock, n)   # tell the client how many trials to run
            @async begin
                put!(results, deserialize(sock))
            end
        end
    end

    # Running tally: points inside the circle, and total trials so far
    C, total_trials = 0, 0
    @async while true
        c, trials = take!(results)
        C, total_trials = C+c, total_trials+trials
        println("Total samples: ", total_trials)
        println("Current estimate: ", estimate_pi(C, total_trials))
    end
end

run_server()
wait()
```

In [None]:
# Client code
using Sockets, Serialization

function trials(numsteps=1000)  # default value of the parameter
    pos = 0 
    for j in 1:numsteps
        pos += Int(rand()^2 + rand()^2 < 1)
    end
    return pos
end
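
As a local sanity check of the sampling idea, independent of the server: a uniformly random point in the unit square lands inside the quarter circle with probability π/4, so four times the hit fraction should be close to π. The helper name `count_hits`, the fixed seed, and the sample size below are illustrative choices.

```julia
using Random

Random.seed!(1)   # fixed seed only to make the run repeatable

# Count how many of n random points in the unit square fall inside the quarter circle.
count_hits(n) = count(_ -> rand()^2 + rand()^2 < 1, 1:n)

n = 10^6
estimate = 4 * count_hits(n) / n

println("estimate = ", estimate, ", error = ", abs(estimate - pi))
```

The error shrinks roughly like `1/sqrt(n)`, which is why pooling samples from many clients improves the estimate.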

In [None]:
@async begin
    c = connect("anubis.juliacomputing.io", 8000)
    n = deserialize(c)    # <--- Block wait for a request
    serialize(c, (trials(n), n))
    close(c)
end