# MPI and Elemental in Julia
## Parallel Workshop JuliaCon 2016
### `MPI.jl`
- `MPI.jl` provides
 - Julia wrappers for many MPI functions, though not yet all of the newer ones. Scripts are run in the usual way, e.g.
 ```
 mpirun -np 100000 julia mpiprogram.jl
 ```
 - An MPI cluster manager for interactive execution of MPI jobs (see below).
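
To make the script mode concrete, here is a minimal sketch of what a file such as `mpiprogram.jl` might contain. The contents are an assumption made for illustration (the workshop does not show the script); it relies only on `MPI.Init`, `MPI.Comm_rank`, `MPI.Comm_size`, and `MPI.Finalize`.

```julia
# mpiprogram.jl (hypothetical contents; not shown in the workshop)
using MPI

MPI.Init()                      # must be called before any other MPI function

comm   = MPI.COMM_WORLD         # communicator containing all launched processes
rank   = MPI.Comm_rank(comm)    # id of this process, counted from 0
nranks = MPI.Comm_size(comm)    # total number of processes

println("Hello from rank $rank of $nranks")

MPI.Finalize()                  # shut MPI down cleanly before exiting
```

Launched with, for example, `mpirun -np 4 julia mpiprogram.jl`, each of the four processes should print one line. The order of the lines is not deterministic.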

In [1]:
using MPI

INFO: Recompiling stale cache file /home/juser/.julia/lib/v0.4/MPI.ji for module MPI.


Create an `MPIManager` and use `addprocs` to launch the workers. This will automatically initialize MPI.

In [2]:
man = MPIManager(np = 8)
addprocs(man)

8-element Array{Int64,1}:
 2
 3
 4
 5
 6
 7
 8
 9

To run a command on the MPI workers, use the `@mpi_do` macro. Here, we store the MPI rank in a variable and `@show` it.

In [3]:
@mpi_do man @show myrank = MPI.Comm_rank(MPI.COMM_WORLD);

	From worker 9:	myrank = MPI.Comm_rank(MPI.COMM_WORLD) = 7
	From worker 8:	myrank = MPI.Comm_rank(MPI.COMM_WORLD) = 6
	From worker 2:	myrank = MPI.Comm_rank(MPI.COMM_WORLD) = 0
	From worker 3:	myrank = MPI.Comm_rank(MPI.COMM_WORLD) = 1
	From worker 7:	myrank = MPI.Comm_rank(MPI.COMM_WORLD) = 5
	From worker 4:	myrank = MPI.Comm_rank(MPI.COMM_WORLD) = 2
	From worker 6:	myrank = MPI.Comm_rank(MPI.COMM_WORLD) = 4
	From worker 5:	myrank = MPI.Comm_rank(MPI.COMM_WORLD) = 3


Allocate the vector `x` on all workers and show its values. Notice that the RNG is not synchronized across workers, so each worker draws different numbers.

In [4]:
@mpi_do man @show x = randn(2)

	From worker 4:	x = randn(2) = [-1.6765456730388448,1.2131824359477428]
	From worker 5:	x = randn(2) = [-1.1267065325470114,-0.29528738428502355]
	From worker 2:	x = randn(2) = [-0.19007999177859225,-1.064609349404248]
	From worker 8:	x = randn(2) = [0.9657671133584432,-1.7751556133408686]
	From worker 6:	x = randn(2) = [0.44935705433847023,1.6370970004054133]
	From worker 3:	x = randn(2) = [0.36752292128476544,1.0107201978778455]
	From worker 9:	x = randn(2) = [0.09719767718957066,1.0017534649683444]
	From worker 7:	x = randn(2) = [-1.8769314228769776,-0.6809231908781325]


Below, we show an example of `Bcast!`, the Julia wrapper for the collective MPI operation `MPI_Bcast`. The function is overloaded for several input types. The MPI datatype of the broadcast buffer is always inferred from the Julia element type, and if the argument is a Julia vector, the length is inferred as well. Alternatively, if the argument is a vector or a pointer, the length can be passed explicitly as an integer argument. Here, rank 0 is the root, so every worker ends up with the `x` that was generated on worker 2.

In [5]:
@mpi_do man MPI.Bcast!(x, 0, MPI.COMM_WORLD)
@mpi_do man @show x

	From worker 7:	x = [-0.19007999177859225,-1.064609349404248]
	From worker 4:	x = [-0.19007999177859225,-1.064609349404248]
	From worker 2:	x = [-0.19007999177859225,-1.064609349404248]
	From worker 9:	x = [-0.19007999177859225,-1.064609349404248]
	From worker 5:	x = [-0.19007999177859225,-1.064609349404248]
	From worker 8:	x = [-0.19007999177859225,-1.064609349404248]
	From worker 6:	x = [-0.19007999177859225,-1.064609349404248]
	From worker 3:	x = [-0.19007999177859225,-1.064609349404248]
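
The same broadcast can also be sketched as a standalone script to be run under `mpirun`. This is only an illustration and not part of the workshop: the variable names and the choice to fill `x` with the rank are assumptions. It uses the form of `Bcast!` in which the length is inferred from the vector.

```julia
using MPI

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)

# Every rank starts with a different vector: [rank, rank].
x = fill(Float64(rank), 2)

# Broadcast x from rank 0 (the root). The element type and the length are
# taken from the Julia vector, so no count argument is needed.
MPI.Bcast!(x, 0, comm)

# After the call, every rank holds the data of rank 0.
println("rank $rank: x = $x")

MPI.Finalize()
```

Because the root is rank 0, the vector on every rank is overwritten with the root's copy, whatever the number of processes.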


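Point-to-point communication uses `MPI.Send` and `MPI.Recv!`. Below, rank 3 sends the value `3.0` to rank 2 with tag 0, and rank 2 receives it into the first element of `x`.

In [9]:
@mpi_do man begin
    if myrank == 3
        # send the scalar 3.0 to rank 2 with tag 0
        MPI.Send(3.0, 2, 0, MPI.COMM_WORLD)
    elseif myrank == 2
        # receive one element from rank 3 (tag 0) into x
        MPI.Recv!(x, 1, 3, 0, MPI.COMM_WORLD)
    end
end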
@mpi_do man @show x

	From worker 4:	x = [3.0,-1.0162911416865608]
	From worker 6:	x = [-0.4078192531491544,-1.0162911416865608]
	From worker 7:	x = [-0.4078192531491544,-1.0162911416865608]
	From worker 3:	x = [-0.4078192531491544,-1.0162911416865608]
	From worker 5:	x = [-0.4078192531491544,-1.0162911416865608]
	From worker 8:	x = [-0.4078192531491544,-1.0162911416865608]
	From worker 9:	x = [-0.4078192531491544,-1.0162911416865608]
	From worker 2:	x = [-0.4078192531491544,-1.0162911416865608]


### `Elemental.jl`

- [Elemental](http://github.com/Elemental/elemental) is a C++ library for distributed dense linear algebra (and, more recently, some sparse linear algebra and optimization routines).
- `Elemental.jl` provides Julia wrappers for `Elemental`.
- The package is still at the alpha stage.
- It offers two APIs:
 - A thin layer on top of the C++ library
 - A higher-level API with `DArray` interoperability

In [6]:
using Elemental

In [8]:
@mpi_do man n = 2000
@mpi_do man A = Elemental.DistMatrix(Float64)
@mpi_do man Elemental.gaussian!(A, n, n)

In [9]:
@mpi_do man @show A[1,1]

	From worker 7:	A[1,1] = 0.1361529124304835
	From worker 4:	A[1,1] = 0.1361529124304835
	From worker 8:	A[1,1] = 0.1361529124304835
	From worker 2:	A[1,1] = 0.1361529124304835
	From worker 5:	A[1,1] = 0.1361529124304835
	From worker 3:	A[1,1] = 0.1361529124304835
	From worker 9:	A[1,1] = 0.1361529124304835
	From worker 6:	A[1,1] = 0.1361529124304835


In [10]:
@time @mpi_do man vals = svdvals(A)

  4.145919 seconds (5.23 k allocations: 396.407 KB)


In [12]:
@mpi_do man @show vals[1]

	From worker 2:	vals[1] = 89.14574736698128
	From worker 7:	vals[1] = 89.14574736698128
	From worker 6:	vals[1] = 89.14574736698128
	From worker 5:	vals[1] = 89.14574736698128
	From worker 9:	vals[1] = 89.14574736698128
	From worker 3:	vals[1] = 89.14574736698128
	From worker 4:	vals[1] = 89.14574736698128
	From worker 8:	vals[1] = 89.14574736698128


In [13]:
n = 2000
A = randn(n, n);
@time @show svdvals(A)[1];

(svdvals(A))[1] = 89.15005868298738
  3.364512 seconds (90.60 k allocations: 35.763 MB, 0.07% gc time)


In [25]:
# Pkg.clone("https://github.com/andreasnoack/TSVD.jl")
# Pkg.checkout("TSVD")

INFO: Checking out TSVD master...
INFO: Pulling TSVD latest master...
INFO: No packages to install, update or remove


In [14]:
using TSVD

In [23]:
@time @mpi_do man vals = TSVD.tsvd(A, 10)

  1.552865 seconds (4.93 k allocations: 380.784 KB)
