# DistributedJets.jl
DistributedJets.jl extends Jets to work with parallel, distributed block operators. This gives us a consistent way to keep track of distributed memory and computation. It relies heavily on the community (public) DistributedArrays.jl package.

## Add 4 workers

In [1]:
using Distributed
addprocs(4)

4-element Array{Int64,1}:
 2
 3
 4
 5

## Add the packages we need on the workers
The `@everywhere` macro runs the `using` statement on every process, so the packages are loaded on the master and on all of the workers.

In [2]:
@everywhere using DistributedArrays, DistributedJets, Jets, JetPack

## Example
We use the same `@blockop` macro as in `Jets`, but now supply a distributed array (`DArray`), along with some additional information about how the work is distributed.

**See also** the help docs:
```julia
?DArray
?@blockop
```

In [3]:
A = @blockop DArray(I->[JopDiagonal(rand(2)) for irow in I[1], icol in I[2]], (4,4), workers(), [2,2])

"Jet linear operator, (8,) → (8,)"

#### Explanation for arguments to `DArray` used above:
* `[JopDiagonal(rand(2)) for irow in I[1], icol in I[2]]` is the *constructor* for each of the blocks in the distributed block operator. This runs *remotely* on the workers as specified below.
* `(4,4)` is the overall size of the block operator: 4 row-blocks and 4 column-blocks for a total of 16 blocks
* `workers()` supplies the process identifiers (pids) of the workers the operators will be constructed on: our 4 workers with pids `2,3,4,5`
* `[2,2]` describes how the block operator array is distributed in each dimension: it is split into 2 chunks along the rows and 2 chunks along the columns, so each worker holds 2 row-blocks and 2 column-blocks.
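
To make the role of the size and distribution arguments concrete, the following is a minimal sketch in plain Julia. The helper `even_chunks` is hypothetical and is not part of DistributedArrays; it assumes the array divides evenly, and that pids fill the chunk grid in column-major order, which matches the `procs(A)` output shown later in this README.

```julia
# Hypothetical helper (not a DistributedArrays function): split 1:n into
# `nchunks` contiguous ranges of equal length.
function even_chunks(n::Int, nchunks::Int)
    n % nchunks == 0 || throw(ArgumentError("n must be divisible by nchunks"))
    len = n ÷ nchunks
    return [((k - 1) * len + 1):(k * len) for k in 1:nchunks]
end

dims = (4, 4)          # overall size of the block operator
dist = [2, 2]          # number of chunks along each dimension
pids = [2, 3, 4, 5]    # worker pids, as returned by workers() in this README

row_chunks = even_chunks(dims[1], dist[1])
col_chunks = even_chunks(dims[2], dist[2])

# One (row-range, column-range) pair per worker.
layout = [(r, c) for r in row_chunks, c in col_chunks]

# Assumption: pids are laid out column-major over the chunk grid.
pid_grid = reshape(pids, dist...)
```

Here `layout[2, 1]` is `(3:4, 1:2)` and `pid_grid[2, 1]` is `3`, so under these assumptions pid 3 holds row-blocks 3:4 and column-blocks 1:2.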

#### Matrix representation of the block operator

We show the pid of the worker assigned to each cell of the 4x4 block operator. Recall that each *cell* of the block operator is itself a Jets operator (here a `JopDiagonal`).

$$
\begin{bmatrix}
    2 & 2 & 4 & 4 \\
    2 & 2 & 4 & 4 \\
    3 & 3 & 5 & 5 \\
    3 & 3 & 5 & 5
\end{bmatrix}
$$

#### Getting information about block operator layouts
We can use various methods to understand which processes store which blocks

* `procs(A)` shows how the workers are arranged over the block operator. Note that this is the layout described by the last argument to `DArray` above: `[2,2]`.

* `blockmap(A)` shows which blocks of the overall block operator are assigned to each worker. This information is also shown in the matrix representation above.
    * pid 2 has row-blocks 1:2, and column-blocks 1:2
    * pid 4 has row-blocks 1:2, and column-blocks 3:4
    * pid 3 has row-blocks 3:4, and column-blocks 1:2
    * pid 5 has row-blocks 3:4, and column-blocks 3:4


* `remotecall_fetch(localblockindices, i, A)` returns the entry of `blockmap(A)` that is assigned to pid `i`.

We exercise these methods in the next four cells.

In [4]:
procs(A)

2×2 Array{Int64,2}:
 2  4
 3  5

In [5]:
blockmap(A)

2×2 Array{Tuple{UnitRange{Int64},UnitRange{Int64}},2}:
 (1:2, 1:2)  (1:2, 3:4)
 (3:4, 1:2)  (3:4, 3:4)

In [6]:
remotecall_fetch(localblockindices, 2, A)

(1:2, 1:2)

In [7]:
remotecall_fetch(localblockindices, 3, A)

(3:4, 1:2)
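
The outputs of `procs(A)` and `blockmap(A)` shown above can be combined into a per-block owner matrix. The sketch below is plain Julia and does not call DistributedJets; it copies the printed outputs in as literals, and the name `owner` is only illustrative.

```julia
# Printed output of procs(A) and blockmap(A), entered as literals.
pids = [2 4; 3 5]
bmap = reshape([(1:2, 1:2), (3:4, 1:2), (1:2, 3:4), (3:4, 3:4)], 2, 2)

# owner[i, j] is the pid of the worker that stores block (i, j).
owner = zeros(Int, 4, 4)
for k in eachindex(pids)
    rows, cols = bmap[k]
    owner[rows, cols] .= pids[k]
end

owner
# 4×4 Matrix{Int64}:
#  2  2  4  4
#  2  2  4  4
#  3  3  5  5
#  3  3  5  5
```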

## Obtaining blocks from a distributed block operator
You can obtain the blocks of the operator in two ways.
* `getblock(A,1,1)` fetches block 1,1, passing a copy of it from pid 2 to the master.

* `remotecall(getblock, 2, A, 1, 1)` gets a `Future` for block 1,1 from pid 2. No copy is made.

In [8]:
getblock(A,1,1)

"Jet linear operator, (2,) → (2,)"

In [9]:
remotecall(getblock, 2, A, 1, 1)

Future(2, 1, 61, nothing)
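
The difference between the two calls comes from the `Distributed` standard library rather than from DistributedJets. The following minimal sketch uses only `Distributed`, with an anonymous function standing in for `getblock`, to show that `remotecall` returns a `Future` handle immediately and that data reaches the caller only when the `Future` is fetched.

```julia
using Distributed

pid = first(addprocs(1))   # one extra worker, just for this demonstration

# remotecall returns right away with a Future; the result stays on the worker.
fut = remotecall(() -> ones(2), pid)
is_future = fut isa Future

# fetch waits for the result and copies it to the calling process.
val = fetch(fut)

# remotecall_fetch performs the call and the fetch in a single step.
val2 = remotecall_fetch(() -> ones(2), pid)

rmprocs(pid)
```

Keeping a `Future` instead of fetching is useful when the result only needs to be consumed by later remote calls and never has to travel to the master.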

## Distributed block arrays (DBArray)
`DBArray` is used with distributed block operators, and handles the bookkeeping and storage of arrays on the workers associated with the distributed operators.

We show examples below of creating `DBArray`s for the domain and range, and of getting and setting blocks.

In [10]:
d = rand(range(A))

8-element DBArray{Float64,Jets.BlockArray{Float64,Array{Float64,1}},Array{Jets.BlockArray{Float64,Array{Float64,1}},1}}:
 0.352713938381378
 0.5864307202325443
 0.9597602606519091
 0.8496153461375726
 0.5976773400801554
 0.9353798109991533
 0.0864748058510234
 0.22178338034810063

In [11]:
procs(d)

2-element Array{Int64,1}:
 2
 3

In [12]:
blockmap(d)

2-element Array{UnitRange{Int64},1}:
 1:2
 3:4

In [13]:
m = rand(domain(A))

8-element DBArray{Float64,Jets.BlockArray{Float64,Array{Float64,1}},Array{Jets.BlockArray{Float64,Array{Float64,1}},1}}:
 0.9107943594323042
 0.5100905132429301
 0.8356021294898601
 0.12272474167846648
 0.13596777833350937
 0.08861593630231002
 0.2387613615639088
 0.6378719438369456

In [14]:
procs(m)

2-element Array{Int64,1}:
 2
 4

In [15]:
blockmap(m)

2-element Array{UnitRange{Int64},1}:
 1:2
 3:4

In [16]:
# fetches block 1, passing a copy of it from pid 2 to the master
getblock(d, 1)

2-element Array{Float64,1}:
 0.352713938381378
 0.5864307202325443

In [17]:
# passes a new array from the master to pid 2, and assigns it to block 1
setblock!(d, 1, ones(2))
d

8-element DBArray{Float64,Jets.BlockArray{Float64,Array{Float64,1}},Array{Jets.BlockArray{Float64,Array{Float64,1}},1}}:
 1.0
 1.0
 0.9597602606519091
 0.8496153461375726
 0.5976773400801554
 0.9353798109991533
 0.0864748058510234
 0.22178338034810063

In [18]:
# on pid=2 we get a reference to the block
remotecall_fetch(getblock, 2, d, 1) 

2-element Array{Float64,1}:
 1.0
 1.0

In [19]:
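@everywhere function remotegetblock_mutating(d, i)
    # when run on the pid that owns block i, getblock returns a reference to the locally stored block
    dᵢ = getblock(d, i)
    # in-place broadcast: overwrites the stored block rather than a copy of it
    dᵢ .= 2.0
    nothing
end
remotecall_fetch(remotegetblock_mutating, 2, d, 1)
d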

8-element DBArray{Float64,Jets.BlockArray{Float64,Array{Float64,1}},Array{Jets.BlockArray{Float64,Array{Float64,1}},1}}:
 2.0
 2.0
 0.9597602606519091
 0.8496153461375726
 0.5976773400801554
 0.9353798109991533
 0.0864748058510234
 0.22178338034810063
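
The contrast between a fetched copy and an in-place remote update can be reproduced with the `Distributed` standard library alone. In the sketch below, `BLOCKS`, `get_local_block` and `fill_local_block!` are hypothetical stand-ins for the storage behind a `DBArray`; they are not DistributedJets functions.

```julia
using Distributed

pid = first(addprocs(1))   # one extra worker, just for this demonstration

# Hypothetical per-process storage standing in for a block held on a worker.
@everywhere const BLOCKS = Dict(1 => zeros(2))
@everywhere get_local_block(i) = BLOCKS[i]
@everywhere function fill_local_block!(i, value)
    BLOCKS[i] .= value   # mutates the array stored on the process running this
    return nothing
end

# Fetching serializes the block, so the master receives a copy.
copy_on_master = remotecall_fetch(get_local_block, pid, 1)
copy_on_master .= 9.0                      # changes only the master's copy
after_local_edit = remotecall_fetch(get_local_block, pid, 1)

# Running the mutation on the worker changes the stored block itself.
remotecall_fetch(fill_local_block!, pid, 1, 2.0)
after_remote_edit = remotecall_fetch(get_local_block, pid, 1)

rmprocs(pid)
```

After the local edit the worker's block is still all zeros; only the function executed on the worker changes it.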

# Specialized distributed block operators

## Tall-and-skinny
Block operators with a single column-block. This specialization is often used in full-waveform inversion (FWI). The model is stored on the master.

In [20]:
A = @blockop DArray(I->[JopDiagonal(rand(2)) for irow in I[1], icol in I[2]], (4,1))

"Jet linear operator, (2,) → (8,)"

In [21]:
blockmap(A)

4×1 Array{Tuple{UnitRange{Int64},UnitRange{Int64}},2}:
 (1:1, 1:1)
 (2:2, 1:1)
 (3:3, 1:1)
 (4:4, 1:1)

In [22]:
d = rand(range(A))
blockmap(d)

4-element Array{UnitRange{Int64},1}:
 1:1
 2:2
 3:3
 4:4

In [23]:
m = rand(domain(A))

2-element Array{Float64,1}:
 0.3481928644910337
 0.7244407138265914
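
As a plain-Julia illustration of what a tall-and-skinny operator computes, the sketch below represents each diagonal block by its diagonal vector and runs everything on a single process. The names `forward` and `adjoint_apply` are illustrative and are not part of Jets; the point is only that the forward map applies every row-block to the same model and stacks the results, while the adjoint sums the per-block contributions back into model space.

```julia
using LinearAlgebra

# Four diagonal blocks, each stored as its diagonal (a stand-in for JopDiagonal(rand(2))).
diags = [rand(2) for _ in 1:4]

# Forward: every row-block acts on the same model m; the outputs are stacked.
forward(diags, m) = reduce(vcat, [a .* m for a in diags])

# Adjoint: each row-block maps its slice of d back to model space; the results are summed.
function adjoint_apply(diags, d)
    n = length(first(diags))
    return sum(a .* d[((i - 1) * n + 1):(i * n)] for (i, a) in enumerate(diags))
end

m = rand(2)   # model, length 2
d = rand(8)   # data, 4 blocks of length 2

# Dot-product test: <d, A m> should equal <A' d, m>.
lhs = dot(d, forward(diags, m))
rhs = dot(adjoint_apply(diags, d), m)
```

The sum in the adjoint is what makes a single model on the master convenient: each worker contributes one model-sized vector, and the contributions are reduced in one place.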

## Sparse block diagonal
This is the only sparse block operator that we support.  Supporting a larger variety of sparse layouts is possible, but would require an engineering effort to build a proper sparse distributed arrays package.

Below we build a sparse block diagonal with 4 rows and 4 columns, with operators along the diagonal. We use `JopZeroBlock` to specify that the off diagonals do not have operators. The distribution of pids is shown in the matrix below. 

$$
\begin{bmatrix}
    2 & 0 & 0 & 0 \\
    0 & 3 & 0 & 0 \\
    0 & 0 & 4 & 0 \\
    0 & 0 & 0 & 5
\end{bmatrix}
$$

In [24]:
A = @blockop DArray(
        I->[irow==icol ? JopDiagonal(rand(2)) : JopZeroBlock(JetSpace(Float64,2),JetSpace(Float64,2)) for irow in I[1], icol in I[2]],
        (4,4),
        workers()[1:4],
        [4,1]) isdiag=true

"Jet linear operator, (8,) → (8,)"

In [25]:
procs(A)

4×1 Array{Int64,2}:
 2
 3
 4
 5

In [26]:
blockmap(A)

4×1 Array{Tuple{UnitRange{Int64},UnitRange{Int64}},2}:
 (1:1, 1:4)
 (2:2, 1:4)
 (3:3, 1:4)
 (4:4, 1:4)

In [27]:
d = rand(range(A))
blockmap(d)

4-element Array{UnitRange{Int64},1}:
 1:1
 2:2
 3:3
 4:4

In [28]:
m = rand(domain(A))
blockmap(m)

4-element Array{UnitRange{Int64},1}:
 1:1
 2:2
 3:3
 4:4
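
Because `blockmap(d)` and `blockmap(m)` both assign one block per entry, block `i` of the output depends only on block `i` of the input. The single-process sketch below shows this blockwise structure in plain Julia, with each diagonal block represented by its diagonal vector; it does not use DistributedJets, and how the real package schedules this work across workers is not shown here.

```julia
# Diagonal blocks of the operator and the matching blocks of the model.
diags    = [rand(2) for _ in 1:4]
m_blocks = [rand(2) for _ in 1:4]

# Each output block is computed from the matching input block only.
d_blocks = map((a, mi) -> a .* mi, diags, m_blocks)

# The same result, computed on the concatenated length-8 vectors.
d_full = reduce(vcat, diags) .* reduce(vcat, m_blocks)
```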