# Estimating the correlation length 

* The correlation length $L$ determines whether two points separated by a given distance are correlated or not.
* `DIVAnd` includes several tools to estimate the correlation length.
* We will start with a 2D case and then consider the 3D case.
* This notebook can run on multiple CPU threads (see [01-notebooks-basics.ipynb](../1-Intro/01-notebooks-basics.ipynb)).
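
To make the role of $L$ concrete, here is a minimal sketch that assumes an exponential correlation model, `exp(-r/L)`. This model, the function name `expcorr` and the numerical values are chosen only for illustration; `DIVAnd` uses its own kernel.

```julia
# Illustrative correlation model (an assumption, not the DIVAnd kernel):
# the correlation decays exponentially with the distance r scaled by L.
expcorr(r, L) = exp(-r / L)

L = 50e3                      # correlation length in metres (hypothetical value)
r = [0., 10e3, 50e3, 200e3]   # separation distances in metres

c = expcorr.(r, L)
for (ri, ci) in zip(r, c)
    println("distance = $(ri / 1000) km, correlation = $(round(ci, digits = 3))")
end
```

Points much closer than $L$ are strongly correlated, while points several times $L$ apart are almost uncorrelated.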

In [None]:
using DIVAnd
using PyPlot
using Dates
using Statistics
using Random
using Printf

┌ Info: Precompiling DIVAnd [efc8151c-67de-5a8f-9a35-d8f54746ae9d]
└ @ Base loading.jl:1273
┌ Info: Precompiling PyPlot [d330b81b-6aea-500a-939a-2ce795aea3ee]
└ @ Base loading.jl:1273


# Data reading

The file `WOD-Salinity-Provencal.nc` contains salinity measurements obtained from the [World Ocean Database](https://www.nodc.noaa.gov/OC5/WOD/pr_wod.html) for the Provençal Basin (Mediterranean Sea). The profiles were not interpolated vertically.

The resulting correlation length can fluctuate a bit between runs, because the correlation is based on a collection of random pairs. 
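
The effect of random pair sampling can be sketched on synthetic data. The function `paircovar` below is a hypothetical, simplified stand-in (1D positions, a single distance bin) and not the `DIVAnd` implementation. It only illustrates why two runs with different random pairs give slightly different covariances.

```julia
using Random, Statistics

# Empirical covariance from `npairs` randomly drawn pairs of observations
# whose separation falls in the distance bin [rmin, rmax).
function paircovar(rng, xs, vs, rmin, rmax; npairs = 2000)
    va = vs .- mean(vs)          # anomalies
    s = 0.0
    n = 0
    for _ = 1:npairs
        i, j = rand(rng, 1:length(xs)), rand(rng, 1:length(xs))
        d = abs(xs[i] - xs[j])
        if i != j && rmin <= d < rmax
            s += va[i] * va[j]
            n += 1
        end
    end
    return n > 0 ? s / n : NaN
end

# synthetic 1D data: smooth signal plus noise (hypothetical example data)
rng = MersenneTwister(1)
xs = 100 .* rand(rng, 500)
vs = sin.(xs ./ 10) .+ 0.1 .* randn(rng, 500)

# same data, two different random selections of pairs
c1 = paircovar(MersenneTwister(10), xs, vs, 0, 2)
c2 = paircovar(MersenneTwister(20), xs, vs, 0, 2)
println("covariance at short distance: $c1 and $c2")
```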

## 2D case

* First, let's consider only the data on a 2D surface (longitude and latitude).
* Load the data file if it is not already present.

In [None]:
varname = "Salinity"
filename = "../data/WOD-Salinity-Provencal.nc"

if !isfile(filename)    
    download("https://dox.ulg.ac.be/index.php/s/PztJfSEnc8Cr3XN/download",filename)
else
    @info("Data file $filename already downloaded")
end


bathname = "../data/gebco_30sec_4.nc"
if !isfile(bathname)
    download("https://dox.ulg.ac.be/index.php/s/U0pqyXhcQrXjEUX/download",bathname)
else
    @info("Bathymetry file already downloaded")
end

lonr = 3.:0.1:11.8
latr = 42.:0.1:44.5
bathisglobal = true
# Extract the bathymetry for plotting
bx,by,b = extract_bath(bathname,bathisglobal,lonr,latr);




Load the data and print basic information about them that can be used for a quality check.

In [None]:
obsval,obslon,obslat,obsdepth,obstime,obsid = loadobs(Float64,filename,varname)
checkobs((obslon,obslat,obsdepth,obstime),obsval,obsid)

It is important to remove outliers before estimating the correlation length, as outliers can have a significant impact on the result.

In [None]:
hist(obsval,1000)
xlim(37,39)
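
As a rough illustration of this impact, the sketch below uses made-up salinity values (not taken from the data file) and an assumed plausible range. A single gross outlier dominates the variance, and therefore any covariance estimate.

```julia
using Statistics

# hypothetical salinity values with one gross outlier
sal = [38.1, 38.2, 38.15, 38.3, 38.25, 3.82]

# keep only the values inside a plausible range (the bounds are assumptions)
keep = 37.0 .< sal .< 39.0

println("variance with outlier:    ", var(sal))
println("variance without outlier: ", var(sal[keep]))
```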

For this example, we select only the first month (January) and remove some questionable data.

In [None]:
badid = ["wod_015600782O","wod_015602753O","wod_015604717O","wod_015606732O","wod_015607117O","wod_015607524O","wod_015607893O","wod_015924970O","wod_015926914O","wod_015928739O","wod_016650545O", "wod_008518725O","wod_007643915O","wod_015875365O","wod_006614816O","wod_006614929O","wod_006614937O","wod_007644875O","wod_009996947O","wod_010742471O","wod_010742472O","wod_006614931O","wod_006614934O","wod_006625408O","wod_006752127O","wod_006752129O"]
good = (37.6 .< obsval .< 38.75) .& map(id -> !(id in badid),obsid)


sel = (Dates.month.(obstime) .== 1) .& good
x = (obslon[sel],obslat[sel],obsdepth[sel]);
v = obsval[sel]

z = [0.,5., 10., 15., 20., 25., 30., 40., 50., 66, 
    75, 85, 100, 112, 125, 135, 150, 175, 200, 225, 250, 
    275, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 
    800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 
    1300, 1350, 1400, 1450, 1500, 1600, 1750, 1850, 2000]

#z = [0.,5., 10., 15., 20., 25., 30., 40., 50., 66, 
#    75, 85, 100, 112, 125, 135, 150, 175, 200, 225, 250, 
#    275, 300, 350, 400, 450, 500, 550, 600];
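
The selection above combines several element-wise tests with the broadcast operator `.&`. The same pattern is sketched here on a tiny made-up data set; all names and values are hypothetical.

```julia
using Dates

# hypothetical observations
times = [DateTime(2001,1,15), DateTime(2003,7,2), DateTime(2010,1,30)]
vals  = [38.0, 39.5, 38.4]
ids   = ["a", "b", "c"]
bad   = ["c"]

# value inside the accepted range and identifier not flagged as bad
isgood  = (37.6 .< vals .< 38.75) .& .!in.(ids, Ref(bad))
# observation made in January (of any year)
january = Dates.month.(times) .== 1

selected = january .& isgood
```

Only the first observation passes both tests: the second is out of range and not in January, and the third is flagged as bad.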


In [None]:
[0,1,2]

Plot the observations at a given level. What do you think will happen if you run `plotobs.(z)`?

In [None]:
function plotobs(z)
    figure()
    sel = (Dates.month.(obstime) .== 1) .& (abs.(obsdepth .- z) .< 50)  .& good # .& (obsval .< 38.3)
    scatter(obslon[sel],obslat[sel],10,obsval[sel], cmap="jet"); colorbar(orientation = "horizontal");
    #print(join(map(s -> '"' * s * '"',String.(unique(obsid[sel]))),","))
    contourf(bx,by,b', levels = [-1e5,0],colors = [[.5,.5,.5]])
    aspectratio = 1/cos(mean(latr) * pi/180)
    gca().set_aspect(aspectratio)
    title("Depth: $z m")
end

In [None]:
plotobs.(z);
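
The dot syntax calls a function once per element of its argument, so `plotobs.(z)` creates one figure per depth level. A minimal sketch with a hypothetical stand-in function (`describe`, which only records its argument instead of plotting):

```julia
# hypothetical stand-in for a plotting function: it only records its argument
called = Float64[]
describe(depth) = (push!(called, depth); "Depth: $depth m")

levels = [0., 5., 10.]
titles = describe.(levels)   # same as [describe(d) for d in levels]
```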

In [None]:
depthr = z

In [None]:
2 .* [1,2,3]

In [None]:
mask, pmn, xyi = DIVAnd.domain(bathname, bathisglobal, lonr, latr, depthr)
sz = size(mask);
# observation coordinates matching the selection
xsel = (obslon[sel],obslat[sel],obsdepth[sel])

# remove the mean value before computing the background
vm = mean(obsval[sel])
va = obsval[sel] .- vm
# average over longitude and latitude, but not over depth
toaverage = [true, true, false]
background_len = (zeros(sz),zeros(sz),fill(50.,sz))
background_epsilon2 = 1000.

fi, vaa = DIVAnd.DIVAnd_averaged_bg(
    mask,
    pmn,
    xyi,
    xsel,
    va,
    background_len,
    background_epsilon2,
    toaverage;
)

fbackground = fi .+ vm
@debug "fbackground: $(fbackground[1,1,:])"

# keep only the finite anomalies (relative to the averaged background)
v = vaa;
sel2 = isfinite.(v)
x = (xsel[1][sel2],xsel[2][sel2],xsel[3][sel2])
v = v[sel2];

In [None]:
size(mask)

In [None]:
size(background_len[1])

In [None]:
plot(fbackground[1,1,:],-depthr)
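
The background computed above varies only with depth, and the correlation length is then estimated from the anomalies relative to it. The idea can be sketched without `DIVAnd` by averaging made-up observations per depth level. This is a crude stand-in for `DIVAnd_averaged_bg`, with hypothetical names and values.

```julia
using Statistics

# hypothetical observations: depth (m) and value
d_obs = [2., 4., 48., 52., 98., 103.]
v_obs = [38.0, 38.2, 38.4, 38.6, 38.5, 38.7]
zlev  = [0., 50., 100.]

# index of the nearest level for every observation
nearest = [argmin(abs.(zlev .- d)) for d in d_obs]

# mean value per level (a crude horizontally averaged profile)
profile = [mean(v_obs[nearest .== k]) for k in 1:length(zlev)]

# anomalies relative to the profile
anom = v_obs .- profile[nearest]
```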

### Horizontal correlation length
Estimate the horizontal correlation length for different depth levels    
(can take a few minutes).

In [None]:
?fithorzlen

In [None]:
@time lenxy,infoxy = fithorzlen(x,v,z; distfun = DIVAnd.distfun_m, searchz = z -> (z/4+10));
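
Assuming, as the `fithorzlen` docstring suggests, that `searchz` gives the vertical search distance around each level, the function used above widens this distance with depth. The depths below are arbitrary sample values.

```julia
# same function as passed to `searchz` in the call above
searchz = z -> (z/4 + 10)

for depth in [0., 100., 1000., 2000.]
    println("depth = $depth m -> search distance = $(searchz(depth)) m")
end
```

A wider search distance at depth is a reasonable choice here because the depth levels are more widely spaced and the observations sparser there.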

Information about the fitting:


In [None]:
lenxy

In [None]:
infoxy[:fitinfos][1]

Extract the information for the level `k = 1`:
* `covar`: the empirical covariance
* `fitcovar`: the fitted covariance
* `distx`: the distance
* `range`: the part of `covar` used for the fitting
* `rqual`: the quality of the fit (1: excellent, 0: poor)

In [None]:
k = 1
covar = infoxy[:fitinfos][k][:covar]
fitcovar = infoxy[:fitinfos][k][:fitcovar]
distx = infoxy[:fitinfos][k][:distx]
range = infoxy[:fitinfos][k][:range]
rqual = infoxy[:fitinfos][k][:rqual]

plot(distx,covar,label="empirical covariance")
plot(distx[range],covar[range],"r",label="empirical covariance used for fitting")
ylabel("covariance [psu²]")
xlabel("distance [m]")
legend();

In [None]:
plot(distx[range],covar[range],"r",label="empirical covariance used for fitting")
plot(distx[range],fitcovar[range],"g",label="fitted covariance (rqual = $(@sprintf("%4.3f",rqual)))")
ylabel("covariance [psu²]")
xlabel("distance [m]")
legend();
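
The principle of fitting a correlation length to an empirical covariance can be sketched with a brute-force search. The exponential model, the synthetic covariance and all names below are assumptions made for illustration; `fithorzlen` fits the `DIVAnd` kernel with its own procedure.

```julia
# synthetic "empirical" covariance following an exponential model
dist = collect(0.:5e3:100e3)              # distances in metres
Ltrue, var0 = 30e3, 0.02                  # hypothetical length and variance
cov_emp = var0 .* exp.(-dist ./ Ltrue)

# sum of squared misfits for a candidate correlation length
misfit(L) = sum((cov_emp .- var0 .* exp.(-dist ./ L)) .^ 2)

# brute-force search over candidate lengths
candidates = 5e3:1e3:100e3
Lfit = candidates[argmin(misfit.(candidates))]
```

With noise-free synthetic data, the search recovers the length used to generate the covariance. With real data, the misfit at the minimum stays non-zero.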

🖋️ Try different values of `k` (the level index) and re-run the previous two cells.     
Note that at some levels the fit is quite poor. For this reason, the horizontal correlation length is additionally filtered in the vertical to smooth it.
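
The effect of such a vertical filter can be sketched with a simple 3-point weighted average. The function `smooth3` and the values are hypothetical; the filter applied by `DIVAnd` is not necessarily this one.

```julia
# one pass of a 3-point weighted average; the end points are kept unchanged
function smooth3(len)
    out = copy(len)
    for k = 2:length(len)-1
        out[k] = 0.25 * len[k-1] + 0.5 * len[k] + 0.25 * len[k+1]
    end
    return out
end

# hypothetical per-level estimates (m) with one poorly fitted level
lenraw = [40e3, 42e3, 150e3, 44e3, 45e3]
lensmooth = smooth3(lenraw)
```

The isolated value of 150 km is reduced and spread over the neighbouring levels.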

Make a plot of the horizontal correlation length with respect to the depth. 

In [None]:
rqual = [f[:rqual] for f in infoxy[:fitinfos]]
figure(figsize = (10,6))
subplot(1,2,1)
plot(lenxy/1000,-z, "ko-")
plot(infoxy[:len]/1000,-z,":");
xlabel("Horizontal correlation length (km)")
ylabel("Depth (m)")
#xlim(0,180)
subplot(1,2,2)
plot(rqual,-z)
xlabel("quality of the fit");

In [None]:
plotobs(1500);

It is useful to limit the acceptable range of the correlation length by providing a function `limitfun` with the arguments depth `z` and estimated correlation length `len`. It then returns the adjusted correlation length. This adjustment is done before the filtering.

In [None]:
@time lenxy2,infoxy2 = fithorzlen(x,v,z; distfun = DIVAnd.distfun_m, limitfun = (z,len) -> min(max(len,25e3),60e3));
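
The function passed to `limitfun` above simply clamps the estimate to the range from 25 km to 60 km, independently of the depth. A few sample calls (the input values are arbitrary):

```julia
# same function as passed to `limitfun` in the call above
limitfun = (z, len) -> min(max(len, 25e3), 60e3)

limitfun(100., 10e3)   # below the lower bound: returns 25e3
limitfun(100., 40e3)   # inside the range: returned unchanged
limitfun(100., 90e3)   # above the upper bound: returns 60e3

# equivalent formulation with Base.clamp
limitfun2 = (z, len) -> clamp(len, 25e3, 60e3)
```

Since the depth `z` is also passed to the function, the bounds could be made depth-dependent.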


In [None]:
plot(lenxy2,-z, "ko-")
#plot(infoxy2[:len],-z,":");
xlabel("Horizontal correlation length (m)")
ylabel("Depth (m)");

### Vertical correlation length 
The vertical correlation length is also estimated for different depth levels.

In [None]:
?fitvertlen

In [None]:
mean(v)

In [None]:
lenz,infoz = fitvertlen(x,v,z);

Plot the empirical covariance for the level `k = 45`:

In [None]:
k = 45
covar = infoz[:fitinfos][k][:covar]
fitcovar = infoz[:fitinfos][k][:fitcovar]
distx = infoz[:fitinfos][k][:distx]
range = infoz[:fitinfos][k][:range]
rqual = infoz[:fitinfos][k][:rqual]

plot(distx,covar,label="empirical covariance")
plot(distx[range],covar[range],"r",label="empirical covariance used for fitting")
ylabel("covariance [psu²]")
xlabel("distance [m]")
legend();

In [None]:
rqual = [f[:rqual] for f in infoz[:fitinfos]]
figure(figsize = (10,6))
subplot(1,2,1)
plot(lenz,-z, "ko-")
plot(infoz[:len],-z, "-")
xlabel("Vertical correlation length (m)")
ylabel("Depth (m)");

subplot(1,2,2)
plot(rqual,-z, "b-");

In [None]:
infoz[:fitinfos][end]

An alternative is to use the vertical resolution (the spacing between the depth levels) to obtain a reasonable guess of the vertical correlation length.

In [None]:
Dz = (z[3:end] - z[1:end-2])/2
lenz = 3 * [Dz[1], Dz..., Dz[end]]
lenzf = DIVAnd.smoothfilter(1:length(lenz),lenz,10)
plot(lenz,-z)
plot(lenzf,-z);
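
The first two lines of the cell above can be followed by hand on a short, made-up vector of depth levels (the names with the suffix `demo` are hypothetical):

```julia
zdemo = [0., 10., 30., 60.]                   # hypothetical depth levels (m)

# centred layer thickness at the interior levels: [15, 25]
Dzdemo = (zdemo[3:end] - zdemo[1:end-2]) / 2

# repeat the first and last value for the end levels, then scale by 3
lenzdemo = 3 * [Dzdemo[1], Dzdemo..., Dzdemo[end]]
```

The guessed vertical correlation length is thus three times the local layer thickness, here `[45, 45, 75, 75]` m.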