# RTM Example
We will perform RTM using the following steps:
1. Read the 10m resampled models output from the FWI notebook
    * true model
    * initial model
    * fwi model
2. Visualize the models 
3. Build a small local compute cluster (2 workers)
4. Create list of shot locations 
5. Define the `migrateshot`, `timemute!`, and `stack` functions
6. Run the migration and then stack the data we wrote to disk
7. Perform a little post-migration filtering
8. Visualize Results

#### Note on runtime
This notebook takes approximately 1/2 hour to run for 24 shots with two workers on an Intel 8168.

`lscpu` CPU information: `Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz`

## Add required packages

In [2]:
using Distributed, PyPlot, Jets, JetPack, JetPackDSP, Printf

## Record time for start of notebook

In [3]:
time_beg = time()

1.602866008874156e9

## 1. Read the models output from the FWI notebook

In [4]:
file1 = "../50_fwi/marmousi_resampled_10m_349x1701_vtrue.bin"

nz,nx = 349,1701
dz,dx = 10.0,10.0

v1_orig = read!(file1, Array{Float32}(undef,nz,nx));

LoadError: SystemError: opening file "../50_fwi/marmousi_resampled_10m_349x1701_vtrue.bin": No such file or directory

#### Apply a 7x7 convolutional smoother
We run the RTM in a slightly smoothed model. The smoothing is applied to slowness (`1 ./ v1_orig`), and the result is converted back to velocity.

In [None]:
ns = 21
P = JopPad(JetSpace(Float32,nz,nx), -ns:nz+ns, -ns:nx+ns, extend=true)
M = JopMix(range(P), (7,7))
R = JopPad(JetSpace(Float32,nz,nx), -ns:nz+ns, -ns:nx+ns, extend=false)

s1 = R' ∘ M ∘ P * (1 ./ v1_orig)

v1 = 1 ./(s1);
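
The pad, mix, and un-pad chain above can be pictured as a moving-average filter applied to slowness. The following is a minimal sketch in plain Julia, assuming the mix behaves like a box average with replicated edges; `boxsmooth` is a hypothetical helper written for this illustration and is not part of JetPack. Averaging slowness rather than velocity approximately preserves traveltime through the smoothed zone.

```julia
# Hypothetical helper (not from JetPack): box average with replicated edges.
function boxsmooth(a::AbstractMatrix{T}, halfwidth::Int) where {T}
    nz, nx = size(a)
    out = similar(a)
    for ix in 1:nx, iz in 1:nz
        acc = zero(T)
        for jx in ix-halfwidth:ix+halfwidth, jz in iz-halfwidth:iz+halfwidth
            # clamp indices so samples outside the model reuse the edge value
            acc += a[clamp(jz, 1, nz), clamp(jx, 1, nx)]
        end
        out[iz, ix] = acc / (2halfwidth + 1)^2
    end
    return out
end

v = fill(1500f0, 20, 30)               # toy model: water layer ...
v[11:end, :] .= 3000f0                 # ... over a faster half-space
vsmooth = 1 ./ boxsmooth(1 ./ v, 3)    # 7x7 window, averaged in slowness
```

Far from the interface the toy model is unchanged, while the rows near the velocity step take intermediate values.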

## 2. Visualize the model

In [None]:
# color limits for the plot (vmin and vmax were previously undefined)
vmin,vmax = extrema(v1)

figure(figsize=(9,12)); clf()

imshow(v1,aspect="auto",cmap="jet");
colorbar(orientation="vertical");clim(vmin,vmax);
title("True Velocity")

## 3. Build a small local compute cluster (2 workers) 

#### Setup OMP environment variables for the cluster
We need to do this because we run multiple workers on the same physical node; without thread affinity the modeling will be *incredibly* slow.

Because we set `ENV["OMP_DISPLAY_ENV"] = "true"`, each worker prints its OMP environment when the cluster spins up in the call to `addprocs()` below. By looking at the value of the `OMP_PLACES` variable, you can verify that half of the threads are assigned to the first half of the physical CPUs and the other half to the second half.

In [None]:
nthread = Sys.CPU_THREADS
ENV["OMP_DISPLAY_ENV"] = "true"
ENV["OMP_PROC_BIND"] = "close"
ENV["OMP_NUM_THREADS"] = "$(div(nthread,2))" 
addprocs(2)
@show workers()
for k in 1:nworkers()
    place1 = (k - 1) * div(nthread,nworkers())
    place2 = (k + 0) * div(nthread,nworkers()) - 1
    @show place1, place2, nthread
    @spawnat workers()[k] ENV["GOMP_CPU_AFFINITY"] = "$(place1)-$(place2)";
end
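
To make the affinity arithmetic in the loop above concrete, here is a small sketch that computes the same CPU ranges without starting any workers. The function name `affinity_ranges` and the thread count of 8 are assumptions chosen for illustration.

```julia
# Hypothetical helper: contiguous CPU ranges, one per worker (0-based, inclusive),
# using the same formulas as place1 and place2 in the loop above.
function affinity_ranges(nthread::Int, nwork::Int)
    per = div(nthread, nwork)
    return [((k - 1) * per, k * per - 1) for k in 1:nwork]
end

for (k, (place1, place2)) in enumerate(affinity_ranges(8, 2))
    println("worker $k: GOMP_CPU_AFFINITY = \"$(place1)-$(place2)\"")
end
```

With 8 threads and 2 workers, the first worker is pinned to CPUs 0-3 and the second to CPUs 4-7, so the two workers never compete for the same cores.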

In [None]:
@everywhere using DistributedArrays, DistributedJets, DistributedOperations, Jets, JetPack, WaveFD, JetPackWaveFD, Random, LinearAlgebra, Schedulers

## 4. Create list of shot locations 
We use 100 shot locations, many times more than in our FWI example, and run at a significantly higher frequency.

In [None]:
nshots = 100
sx = round.(Int,collect(range(0,stop=(nx-1)*dx,length=nshots)))
@show nshots;
@show sx;
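
As a quick check of the geometry, the sketch below repeats the shot-location formula with the model dimensions used in this notebook and inspects the resulting spacing. It only needs base Julia.

```julia
nx, dx, nshots = 1701, 10.0, 100

# evenly spaced source x-coordinates across the full model width, rounded to whole meters
sx = round.(Int, collect(range(0, stop=(nx-1)*dx, length=nshots)))

spacing = diff(sx)   # distance between neighboring shots in meters
@show first(sx) last(sx) extrema(spacing)
```

The shots span the full 17 km of the model, and because of the rounding, neighboring shots are separated by either 171 m or 172 m.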

## 5. Define the migrateshot and stack functions
### migrateshot writes the image and illumination files to our scratch disk
### stack reads the specified shots from disk and stacks them

#### Note on scratch space for temporary files
When dealing with serialized nonlinear wavefields as in this example, we need to specify the location where scratch files will be written.

You may need to change this to point to a temporary directory available on your system.

In [None]:
@everywhere scratch = "/mnt/scratch"
@assert isdir(scratch)

In [None]:
@everywhere begin
    ntrec = 3001
    dtrec = 0.002
    dtmod = 0.001
end

### Build the migrateshot function, which does the work for one shot

In [None]:
@everywhere function migrateshot(isrc,nz,nx,dz,dx,_vtrue,_v,sx)
    @info "migrating shot $(isrc) on $(gethostname()) with id $(myid())..."
    F = JopNlProp2DAcoIsoDenQ_DEO2_FDTD(;
        b = ones(Float32,nz,nx),
        nthreads = div(Sys.CPU_THREADS,2),
        ntrec = ntrec,
        dtrec = dtrec,
        dtmod = dtmod,
        dz = dz,
        dx = dx,
        wavelet = WaveletCausalRicker(f=10.0),
        sx = sx[isrc],
        sz = dz,
        rx = dx*[0:1:nx-1;],
        rz = 2*dz*ones(length(0:1:nx-1)),
        nbz_cache = nz,
        nbx_cache = 16,
        comptype = UInt32,
        srcfieldfile = joinpath(scratch, "field-$isrc-$(randstring()).bin"),
        reportinterval=12001)

    d = F*localpart(_vtrue) # here we model the data; usually you would just read the data
    timemute!(F,d,1500,2/16) # mute out the direct and diving waves
    J = jacobian!(F, localpart(_v))
    illum = srcillum(J)
    m = J'*d
    close(F) #delete scratch files that we don't need anymore
    @info "writing image and illumination for shot $(isrc)"
    write(joinpath(scratch,"image_$(isrc).bin"),m)
    write(joinpath(scratch,"illum_$(isrc).bin"),illum)
    @info "done migrating shot $(isrc) on $(gethostname()) with id $(myid())..."
end

### A simple mute function based on the Marmousi water velocity of 1500 m/s and the source/receiver coordinates stored in the operator state

In [None]:
@everywhere function timemute!(F, d, watervel, tmute)
    for i = 1:length(state(F, :rx)) 
        rx = state(F, :rx)
        rz = state(F, :rz)
        sx = state(F, :sx)
        sz = state(F, :sz)
        dist = sqrt((sx[1] - rx[i])^2 + (sz[1] - rz[i])^2)
        time = dist / watervel
        tbeg = 1
        tend = round(Int, (time + tmute) / state(F,:dtrec))
        tend = clamp(tend,1,size(d,1))
        d[tbeg:tend,i] .= 0
    end
    nothing
end
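
The mute arithmetic can be tried in isolation. The sketch below mirrors the per-trace computation in `timemute!` without needing a modeling operator; `mute_end_sample` is a hypothetical helper, and the mute delay of 0.1 s is an illustrative value rather than the one used in this notebook.

```julia
# Hypothetical helper mirroring the arithmetic inside timemute!:
# index of the last sample to zero for one trace.
function mute_end_sample(dist, watervel, tmute, dtrec, nt)
    tend = round(Int, (dist / watervel + tmute) / dtrec)
    return clamp(tend, 1, nt)   # never mute past the end of the trace
end

watervel, tmute, dtrec, nt = 1500.0, 0.1, 0.002, 3001   # illustrative values
for offset in (0.0, 1500.0, 6000.0)
    println("offset $(offset) m -> mute samples 1:",
            mute_end_sample(offset, watervel, tmute, dtrec, nt))
end
```

The muted window grows linearly with source-receiver distance, which removes arrivals traveling at or faster than the water velocity along the surface.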

### A function that reads the migrated images and illuminations from disk and stacks them

In [None]:
function stack(shots,nz,nx)
   img = zeros(Float32,nz,nx)
   ill = zeros(Float32,nz,nx)
   for isrc in shots
      img += read!(joinpath(scratch,"image_$(isrc).bin"), Array{Float32}(undef,nz,nx));
      ill += read!(joinpath(scratch,"illum_$(isrc).bin"), Array{Float32}(undef,nz,nx));
   end
   return img,ill
end
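
The write-then-stack pattern can be exercised on tiny arrays in a temporary directory. This is a sketch only: `stackfiles` is a hypothetical stand-in for `stack` that takes the directory as an argument instead of reading the global `scratch`, and the constant-valued "images" are made up for the test.

```julia
# Hypothetical stand-in for `stack` with the directory passed explicitly.
function stackfiles(dir, shots, nz, nx)
    img = zeros(Float32, nz, nx)
    for isrc in shots
        # read one raw Float32 image from disk and accumulate it in place
        img .+= read!(joinpath(dir, "image_$(isrc).bin"), Array{Float32}(undef, nz, nx))
    end
    return img
end

nz, nx = 4, 5
total = mktempdir() do dir
    for isrc in 1:3
        # fake per-shot image: every sample equals the shot index
        write(joinpath(dir, "image_$(isrc).bin"), fill(Float32(isrc), nz, nx))
    end
    stackfiles(dir, 1:3, nz, nx)
end
```

Because the files are raw binary with no header, the reader must be given the same `nz`, `nx`, and element type that were used when writing.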

## 6. Run the migration and then stack the data we wrote to disk
### Here we use epmap to schedule the work for the migration

In [None]:
# broadcast the model to the workers
_v1 = bcast(v1)

### True model migration/stack

In [None]:
t1 = @elapsed begin
    epmap(i->migrateshot(i, nz,nx,dz,dx,_v1,_v1,sx), 1:nshots)
end
@show t1;

In [None]:
@printf("Time for migrating the data %.2f minutes\n", t1 / 60)
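
For readers without the Schedulers package, the map-over-shots pattern can be sketched with `pmap` from Julia's `Distributed` standard library. This is a stand-in under stated assumptions, not a replacement for `epmap`: it is meant to be run in a fresh session without workers (where `pmap` simply executes on the master process), and `fakeshot` is a hypothetical task used in place of `migrateshot`.

```julia
using Distributed

# Hypothetical per-shot task: returns the shot index and a made-up result.
fakeshot(isrc) = (isrc, isrc^2)

# pmap applies the task to each shot index and returns results in input order;
# with no workers added, everything runs on the master process.
results = pmap(fakeshot, 1:4)
```

In the notebook the per-shot results are written to disk instead of being returned, which keeps the large image arrays out of inter-process communication.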

In [None]:
shots = collect(1:nshots)
m1, illum1 = stack(shots,nz,nx);

## 7. Perform a little post migration filtering

#### Laplacian filter to remove backscattered noise

In [None]:
L = JopLaplacian(JetSpace(Float32,nz,nx))
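
To see why a Laplacian suppresses the smooth, low-wavenumber backscatter while keeping sharp reflectors, consider this plain-Julia sketch of a 5-point stencil. It is an illustration only; `laplacian5` is a hypothetical helper, and its edge handling and scaling are not claimed to match the operator used above.

```julia
# Hypothetical 5-point finite-difference Laplacian; edge samples are left at zero.
function laplacian5(a::AbstractMatrix{T}) where {T}
    out = zeros(T, size(a))
    for ix in 2:size(a, 2)-1, iz in 2:size(a, 1)-1
        out[iz, ix] = a[iz-1, ix] + a[iz+1, ix] + a[iz, ix-1] + a[iz, ix+1] - 4a[iz, ix]
    end
    return out
end

ramp  = [Float32(iz + 2ix) for iz in 1:8, ix in 1:8]   # smooth trend: pure low wavenumber
spike = zeros(Float32, 8, 8); spike[4, 4] = 1          # sharp, reflector-like feature

lap_ramp  = laplacian5(ramp)    # the smooth trend is removed entirely
lap_spike = laplacian5(spike)   # the sharp feature survives
```

The linear ramp maps to zero everywhere, while the spike produces a strong response at and around its location.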

#### Apply low cut filter, illumination compensation, and gain

In [None]:
g = ([0:(nz-1);]*dz).^2 * ones(1,nx);

img1 = g .* (L * m1) ./ (illum1 .+ 1e-8 * maximum(abs, illum1));
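
The compensation and gain step can be checked on a toy example. In this sketch the arrays `m` and `illum` are made-up stand-ins for the filtered image and the stacked illumination; the gain and stabilizer expressions are the same as in the cell above.

```julia
nz, nx, dz = 4, 3, 10.0
m     = ones(Float32, nz, nx)                  # stand-in for the filtered image
illum = Float32[1 1 1; 0 2 2; 4 4 4; 8 8 8]    # stand-in illumination with one dead sample

g   = ([0:(nz-1);] * dz).^2 * ones(1, nx)      # depth-squared gain, zero at the surface
img = g .* m ./ (illum .+ 1e-8 * maximum(abs, illum))
```

The small term proportional to the maximum illumination keeps the division finite where the illumination is zero, and the depth-squared gain zeroes the first row and boosts deeper samples.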

@show extrema(img1)

#### Apply water bottom mute

In [None]:
img1[v1.==1500.0] .= 0;

## 8. Visualize Results

In [None]:
mrms1 = 2.5 * sqrt(norm(img1)^2 / length(img1))

figure(figsize=(9,12)); clf()

imshow(img1,aspect="auto",cmap="gray");
colorbar(orientation="vertical");clim(-mrms1,+mrms1);
title("Migration in True Velocity")

## Remove workers

In [None]:
rmprocs(workers())

In [None]:
time_end = time()
@printf("Time to run notebook: %.2f minutes\n", (time_end - time_beg) / 60)