# Julia MPI First Example: pi computation

The first step was to install MPI on my Mac. MPICH and Open MPI both seem to be reasonable choices,
with probably no reason for a beginner to prefer one over the other. <br>

I first ran `brew install gcc` to get the gcc compiler, and ran into problems.
What told me how to fix them was `brew doctor`: it wanted me to run
`xcode-select --install`, and once I did, all was good. I then ran
`brew install mpich`, and MPI just worked.

My first example reproduces <a href="http://www.mcs.anl.gov/research/projects/mpi/tutorial/mpiexmpl/src/pi/C/solution.html">
the classic mypi </a> in the notebook.

In [1]:
using Distributed
using MPI

In [2]:
m = MPIManager(np=4)

MPI.MPIManager(np=4,launched=false,mode=MPI_ON_WORKERS)

In [3]:
addprocs(m)
#@mpi_do m comm = MPI.COMM_WORLD

4-element Array{Int64,1}:
 2
 3
 4
 5

In [4]:
@mpi_do m comm = MPI.COMM_WORLD
#
# Enter number of intervals, and tell every processor.
# Traditional MPI would do this with a BCAST.
# Here each worker draws its own rand(), so the values differ from worker to worker.
@mpi_do m n = rand()

In [5]:
# Let's see if the processors got it
@mpi_do m println(n)

      From worker 2:	0.2148202830106598
      From worker 3:	0.8744544203099425
      From worker 4:	0.2453806602381643
      From worker 5:	0.63878792561216


In [6]:
# my MPI id
@mpi_do m myid = MPI.Comm_rank(comm)
@mpi_do m println(myid)

      From worker 5:	3
      From worker 3:	1
      From worker 2:	0
      From worker 4:	2


In [7]:
# Get the number of processors
@mpi_do m np = MPI.Comm_size(comm)
@mpi_do m println(np)

      From worker 3:	4
      From worker 2:	4
      From worker 5:	4
      From worker 4:	4


Compute $\int_0^1 \frac{4}{1+x^2}\,dx = \left[4\arctan x\right]_0^1$, which evaluates to $\pi$.
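
The loops in the cells that follow approximate this integral with the midpoint rule. Written out, with $n$ subintervals and $P$ MPI processes (the symbols $x_i$, $P$, $r$ and $\text{mypi}_r$ are my notation for what the code does, not names from the original tutorial):

$$\pi \approx \frac{1}{n}\sum_{i=1}^{n}\frac{4}{1+x_i^2}, \qquad x_i = \frac{i-\tfrac{1}{2}}{n}.$$

Rank $r$ (for $r = 0,\dots,P-1$) takes only the indices $i = r+1,\ r+1+P,\ r+1+2P,\dots$, so

$$\text{mypi}_r = \frac{1}{n}\sum_{\substack{1 \le i \le n \\ i \equiv r+1 \ (\mathrm{mod}\ P)}}\frac{4}{1+x_i^2}, \qquad \sum_{r=0}^{P-1}\text{mypi}_r \approx \pi.$$

The final sum over $r$ is what the reduction with `MPI.SUM` computes.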

In [8]:
using Interact

In [9]:
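@time @mpi_do m mypi = let
    n = 50_000_000                      # number of subintervals
    comm = MPI.COMM_WORLD
    rank = MPI.Comm_rank(comm)
    s = 0.0
    # each rank takes every Comm_size-th midpoint, starting at rank + 1
    for i = rank + 1 : MPI.Comm_size(comm) : n
        x = (i - 0.5)/n
        s += 4/(1 + x^2)
    end
    mypi = s/n                          # this rank's partial sum
    our_π = MPI.Reduce(mypi, MPI.SUM, 0, comm)   # total arrives on rank 0
    if rank == 0
        println(our_π - π)
    end
    mypi
end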

      From worker 2:	1.1146639167236572e-13
  1.077969 seconds (111.48 k allocations: 5.463 MiB)


In [10]:
[( @fetchfrom i π-4*mypi, π  ) for i in workers()] 

4-element Array{Tuple{Float64,Irrational{:π}},1}:
 (-5.99999570027876e-8, π = 3.1415926535897...)  
 (-2.0000185063651088e-8, π = 3.1415926535897...)
 (1.9999810252357975e-8, π = 3.1415926535897...) 
 (5.999988594851402e-8, π = 3.1415926535897...)  
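
Each worker's `mypi` is close to $\pi/4$ because every rank samples the whole interval with a stride of 4. The partition can be checked without MPI. The following is a minimal serial sketch, assuming 4 "ranks" simulated in one process; `partial_pi` is a hypothetical helper name, not part of MPI.jl or the notebook.

```julia
# Serial stand-in for the rank-strided loop (hypothetical helper, no MPI needed):
# "rank" r handles the indices i = r+1, r+1+np, r+1+2np, ...
function partial_pi(rank, np, n)
    s = 0.0
    for i = rank + 1 : np : n
        x = (i - 0.5) / n          # midpoint of the i-th subinterval
        s += 4 / (1 + x^2)
    end
    return s / n                   # this rank's share of the midpoint sum
end

n, np = 1_000_000, 4
partials = [partial_pi(r, np, n) for r in 0:np-1]
total = sum(partials)              # plays the role of the MPI.SUM reduction

println(4 .* partials .- π)        # each rank alone is a rough estimate of π
println(total - π)                 # the combined sum is much more accurate
```

Summing the partial results in one process plays the same role as the `MPI.Reduce` call with `MPI.SUM`, which is why only the reduced value on rank 0 reaches full accuracy.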

In [11]:
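# Riemann sum on the grid 0:h:1 (both endpoints included)
function f_serial()
    n = 50_000_000
    h = 1/n
    our_π = 0.0
    for x = 0:h:1
        our_π += 4/(1 + x^2)
    end
    our_π*h
end

# Midpoint rule with n subintervals: the same sum the MPI version distributes
function f_serial2(n)
    our_π = 0.0
    for i = 1:n
        x = (i - 0.5)/n
        our_π += 4/(1 + x^2)
    end
    our_π/n
end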

f_serial2 (generic function with 1 method)

In [12]:
f_serial() #warmup
f_serial()
f_serial2(50_000_000) #warmup
@time f_serial2(50_000_000)

  0.102584 seconds (5 allocations: 176 bytes)


3.1415926535895617

In [13]:
function f_par(n)
    @mpi_do m begin
        comm = MPI.COMM_WORLD
        s = 0.0
        for i = MPI.Comm_rank(comm) + 1 : MPI.Comm_size(comm) : $n
            x = (i - 0.5)/$n
            global s += 4/(1 + x^2)   # s is a global on each worker
        end
        mypi = s/$n
        our_π = MPI.Reduce(mypi, MPI.SUM, 0, comm)
        #if myid==0
        #    println(our_π - π)
        #end
    end
    @fetchfrom 2 our_π   # worker 2 is MPI rank 0, which holds the reduced sum
end

f_par (generic function with 1 method)

In [14]:
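# Define the kernel as a function on every worker
@mpi_do m function _pi_sum_par(n)
    comm = MPI.COMM_WORLD

    s = 0.0
    for i = MPI.Comm_rank(comm) + 1 : MPI.Comm_size(comm) : n
        x = (i - 0.5)/n
        s += 4/(1 + x^2)
    end
    mypi = s/n
    our_π = MPI.Reduce(mypi, MPI.SUM, 0, comm)
    return our_π   # the reduced value is only meaningful on rank 0
end
# Run the kernel on all ranks, then fetch the result from worker 2 (MPI rank 0)
function f_par2(n)
    @mpi_do m tmp = _pi_sum_par($n)
    @fetchfrom 2 tmp
end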
f_par(50_000_000) #warmup
f_par(50_000_000)
f_par2(50_000_000) #warmup
@time f_par2(50_000_000)

  0.028434 seconds (452 allocations: 29.531 KiB)


3.1415926535899046

In [15]:
π

π = 3.1415926535897...

In [16]:
[f_par2(10^k) for k=3:9] .- π

7-element Array{Float64,1}:
  8.333333312293689e-8  
  8.333307377483834e-10 
  8.323119971009874e-12 
  1.1013412404281553e-13
 -1.0702549957386509e-13
  4.2366110619695974e-13
 -2.531308496145357e-14 
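
The first three errors drop by a factor of 100 for each factor of 10 in $n$, which is what the midpoint rule predicts. A short check, assuming the standard leading-order error estimate for the composite midpoint rule with $h = 1/n$ and $f(x) = 4/(1+x^2)$ (here $M_n$ is my notation for the midpoint sum):

$$M_n - \int_0^1 f(x)\,dx \approx -\frac{h^2}{24}\bigl(f'(1) - f'(0)\bigr), \qquad f'(x) = -\frac{8x}{(1+x^2)^2}, \quad f'(1) = -2, \quad f'(0) = 0,$$

$$\text{so}\qquad M_n - \pi \approx \frac{1}{12\,n^2}.$$

For $n = 10^3$ this gives about $8.33\times 10^{-8}$, and for $n = 10^4$ about $8.33\times 10^{-10}$, in line with the first two entries. From roughly $n = 10^6$ onward the predicted error falls below about $10^{-13}$, and floating-point roundoff in the sum appears to dominate instead.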

In [11]:
@mpi_do m using Elemental
@mpi_do m using LinearAlgebra
@mpi_do m A = Elemental.DistMatrix(Float64)
@mpi_do m Elemental.gaussian!(A, 1000, 1000)

In [39]:
n=4000
B = randn(n,n)
using LinearAlgebra
@time svdvals(B);

 13.969891 seconds (15 allocations: 124.390 MiB, 0.10% gc time)


In [38]:
n = 4000
@mpi_do m Elemental.gaussian!(A, $n,$n)
@time @mpi_do m s = svdvals(A)
#@mpi_do m println(s)
#@mpi_do m println(size(U))

 12.310227 seconds (862 allocations: 49.121 KiB)


UndefVarError: UndefVarError: A not defined