This notebook is designed to create the _benthos_ interpolated maps using `DIVAnd`.      
The data file was prepared by P. Herman (Deltares).

In [26]:
using DIVAnd
using PyPlot
using Proj4
using DelimitedFiles
using PyCall
using Dates
using NCDatasets
using Pkg
include("../scripts/BenthosInterp.jl")
Pkg.status("DIVAnd")

doplot = false       # set to 'true' to create the plots
doplotdata = false    # set to 'true' to plot the observations
usecartopy = true    # set to 'true' if plots are created using Cartopy
writenc = true;     # set to 'true' to write netCDF files with the results

[32m[1mStatus[22m[39m `~/.julia/environments/v1.5/Project.toml`
 [90m [efc8151c] [39m[37mDIVAnd v2.6.5[39m


## Files and directories

In [24]:
figdir = "../product/figures/1-UniformL/"
outputdir = "../product/netCDF/1-UniformL/2/"
datadir = "../data/raw_data/"
datafilesmall = joinpath(datadir, "specs4Diva.csv")
datafile = joinpath(datadir, "spe.csv")
specnamefile = joinpath(datadir, "specieslist.csv")
isdir(figdir) ? "Figure directory already exists" : mkpath(figdir)
isdir(outputdir) ? "Output directory already exists" : mkpath(outputdir)
isfile(datafilesmall) ? @info("Small data file already downloaded") : download("https://dox.ulg.ac.be/index.php/s/vNQcvqjW8RzdNBt/download", datafilesmall)
isfile(datafile) ? @info("Full data file already downloaded") : @warn("Full data file does not exist")

┌ Info: Small data file already downloaded
└ @ Main In[24]:9
┌ Info: Full data file already downloaded
└ @ Main In[24]:10


In [7]:
domain = [-16., 9., 45., 66.]; # [West East South North]
Δlon = 0.1
Δlat = 0.1

0.1

## Data reading
There are 2 possibilities for the selection of the species to process:
1. Take the *N* most frequent species
2. Specify a list of `AphiaID`s.     

The 2 solutions are described in the next cells.     

The dictionary `namesdict` is used to get the _scientific name_ corresponding to an _aphiaID_.

In [44]:
namesdict = read_specnames(specnamefile);

Dict{Any,Any} with 4747 entries:
  "111654"  => "Triticella pedicellata"
  "139486"  => "Roxania utriculus"
  "14635"   => "Solenoidea"
  "238202"  => "Myrianida rubropunctata"
  "152515"  => "Stichasteridae"
  "1423136" => "Regioscalpellum regium"
  "228"     => "Astartidae"
  "141168"  => "Alvania cimicoides"
  "139813"  => "Fuscapex cabiochi"
  "107582"  => "Acanthephyra purpurea"
  "129947"  => "Caulleriella viridis"
  "117809"  => "Nemertesia antennina"
  "152547"  => "Oestergrenia digitata"
  "130356"  => "Nephtys ciliata"
  "1296685" => "Pseudotanais falcicula"
  "110545"  => "Campylaspis horrida"
  "146116"  => "Thyone"
  "846027"  => "Palposyllis propeweismanni"
  "119046"  => "Idotea linearis"
  "110784"  => "Arachnidiidae"
  "130265"  => "Scoletoma tetraura"
  "107072"  => "Callianassa"
  "138118"  => "Arculus"
  "111535"  => "Schizoporella patula"
  "102604"  => "Lysianassa caesarea"
  ⋮         => ⋮

### 1. N most frequent

In [41]:
N = 20 # Set number of species to process
speciesnamelist = get_species_list(datafile)
speciesshortlist = sort(speciesnamelist[1:N])
@info("Working on $(length(speciesshortlist)) species")

┌ Info: Working on 20 species
└ @ Main In[41]:4


### 2. User-defined list of AphiaID
- do not run the next cell if you plan to use the N most frequent species
- adapt the variable `speciesnamelist` to your application

In [39]:
speciesnamelist = ["130118", "129370", "130359", "140767"]
@info("Working on $(length(speciesnamelist)) species")

┌ Info: Working on 4 species
└ @ Main In[39]:2


## Prepare domain grid, metrics and mask
### Interpolation grid

In [8]:
longrid = domain[1]:Δlon:domain[2]
latgrid = domain[3]:Δlat:domain[4]

45.0:0.1:66.0
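
As a quick sanity check, the number of grid points implied by the domain and the resolution can be computed directly from the ranges. The sketch below only repeats the definitions of `domain`, `Δlon` and `Δlat` given above; the names `nlon` and `nlat` are introduced here for illustration.

```julia
# Same domain and resolution as defined earlier in the notebook
domain = [-16., 9., 45., 66.]   # [West East South North]
Δlon = 0.1
Δlat = 0.1

longrid = domain[1]:Δlon:domain[2]
latgrid = domain[3]:Δlat:domain[4]

# Number of grid points in each direction (illustrative variable names)
nlon, nlat = length(longrid), length(latgrid)
@show nlon, nlat
```

Every field interpolated on this grid, such as the bathymetry read in the next cells, should have the size `(nlon, nlat)`.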

### Download bathymetry file

In [9]:
bathname = joinpath(datadir, "gebco_30sec_4.nc")
if !isfile(bathname)
    download("https://dox.ulg.ac.be/index.php/s/RSwm4HPHImdZoQP/download", bathname)
else
    @info("Bathymetry file already downloaded")
end

"../data/raw_data/gebco_30sec_4.nc"

### Read bathymetry

In [10]:
bx, by, b = load_bath(bathname, true, longrid, latgrid)
@show size(b)

if doplot
    fig = PyPlot.figure()
    ax = PyPlot.subplot(111)
    pcolor(bx,by,b', vmin=0., cmap=PyPlot.cm.gist_earth); 
    colorbar(orientation="vertical")
    title("Depth (m)")
    savefig(joinpath(figdir, "benthos_bathy.jpg"), dpi=300, bbox_inches="tight")
    show()
end

size(b) = (251, 211)


### Metrics

In [11]:
_, (pm, pn),(xi, yi) = DIVAnd.DIVAnd_rectdom(longrid, latgrid);
xi, yi, mask = DIVAnd.load_mask(bathname, true, longrid, latgrid, 0.0);
xx, yy = ndgrid(xi, yi);

## Interpolation
### Method

Loop over all the species:
1. read the data,
2. compute the heatmaps and
3. derive the probability field as:
```
d = npre * dens_pre / (npre * dens_pre + nabs * dens_abs)
```
where
* dens_pre is the heatmap obtained with the presence data only;
* dens_abs is the heatmap obtained with the absence data only;
* npre and nabs are the numbers of presence and absence observations.

The heatmaps are computed so that their integral over the domain is 1, whatever the number of observations. Each heatmap is therefore weighted by its number of observations before the ratio is taken.
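
The weighting can be illustrated with a minimal one-dimensional sketch. All the values below (`raw_pre`, `raw_abs`, `dx` and the observation counts) are hypothetical and only stand in for the heatmaps that `DIVAnd_heatmap` would return; they are not taken from the benthos data set.

```julia
# Hypothetical numbers of presence and absence observations
npre, nabs = 20, 80
dx = 1.0                                  # grid spacing of the toy domain

# Unnormalised toy fields standing in for the heatmaps
raw_pre = [0.0, 1.0, 3.0, 1.0, 0.0]
raw_abs = [2.0, 2.0, 2.0, 2.0, 2.0]

# Normalise so that each heatmap integrates to 1 over the domain
dens_pre = raw_pre ./ (sum(raw_pre) * dx)
dens_abs = raw_abs ./ (sum(raw_abs) * dx)

# Weight each heatmap by its number of observations, then take the ratio
d = npre .* dens_pre ./ (npre .* dens_pre .+ nabs .* dens_abs)
@show d
```

Without the weights `npre` and `nabs`, both heatmaps would carry the same total mass and the ratio would ignore how many observations support each of them. Note that where both weighted heatmaps are zero, the ratio is `0/0` and evaluates to `NaN`.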

### Analysis 1: Using a uniform correlation length

In [42]:
# Set correlation length
Lvalues = [0.01, 0.05, 0.1, 0.5, 1.]
L = 0.1
epsilon2 = 5.

5.0

Loop on the list of species.

In [33]:
for spec in speciesshortlist[1:2]
    aphiaID = replace(spec, "pa"=>"")
    species = namesdict[aphiaID]

    speciesslug = get_species_slug(String(species))
    
    @info(speciesslug)
    @info("Working on species $(String(species))");
    lon_pre, lat_pre, lon_abs, lat_abs = read_coords_species(datafile, spec);
    npre = length(lon_pre)
    nabs = length(lon_abs)

    @info("Number of presence: $(npre), number of absence: $(nabs)")
    

    # Plot the data locations
    if doplotdata
        make_plot_presence_absence(lon_pre, lat_pre, lon_abs, lat_abs, String(species),
            dlat=4., dlon=6.,
            figname=joinpath(figdir, "$(speciesslug)_data.jpg"), usecartopy=usecartopy)
    end
    
    
    
    @info("Computing heatmaps")
    dens_pre, LHM2, LCV2, LSCV2 = DIVAnd_heatmap(mask, (pm,pn), (xx, yy), 
        (lon_pre, lat_pre), ones(npre), L);
    dens_abs, LHM3, LCV3, LSCV3 = DIVAnd_heatmap(mask, (pm,pn), (xx, yy), 
        (lon_abs, lat_abs), ones(nabs), L);


    d = npre .* dens_pre ./ (npre .* dens_pre .+ nabs .* dens_abs);

    
    @info("Computing error field with CPME")
    lon = [lon_pre ; lon_abs]
    lat = [lat_pre ; lat_abs]

    cpme = DIVAnd_cpme(mask, (pm, pn), (xx, yy), (lon, lat), 
        ones(length(lon)), 5. * L, epsilon2);

    
    if doplot
        
        plot_heatmap(longrid, latgrid, d, lon_pre, lat_pre, lon_abs, lat_abs,
            "$(species): probability", figname=joinpath(figdir, "$(speciesslug)_density.jpg"), 
            usecartopy=usecartopy)
        # Error plot disabled; remove the block-comment delimiters to re-enable it
        #=
        plot_error(longrid, latgrid, cpme, "$(species): error", 
            figname=joinpath(figdir, "$(speciesslug)_error.png"),
            usecartopy=usecartopy)
        =#
        
    end   

    if writenc
        @info("Creating the netCDF file with results")
        create_nc_results(joinpath(outputdir, "$(speciesslug)_density.nc"), 
            longrid, latgrid, d, String(species), domain=domain);

        @info("Adding error field to netCDF file")
        write_nc_error(joinpath(outputdir, "$(speciesslug)_density.nc"), cpme);
    end


end

┌ Info: Asterias_rubens
└ @ Main In[33]:9
┌ Info: Working on species Asterias rubens
└ @ Main In[33]:10
┌ Info: Column index for pa123776: 24
└ @ Main /home/ctroupin/Projects/EMODnet/EMODnet-Biology-Benthos-Interpolated-Maps/scripts/BenthosInterp.jl:25
┌ Info: Number of presence: 0, number of absence: 0
└ @ Main In[33]:15
┌ Info: Computing heatmaps
└ @ Main In[33]:27
┌ Info: Computing error field with CPME
└ @ Main In[33]:37
┌ Info: Creating the netCDF file with results
└ @ Main In[33]:59
┌ Info: Adding error field to netCDF file
└ @ Main In[33]:63
┌ Info: Echinocyamus_pusillus
└ @ Main In[33]:9
┌ Info: Working on species Echinocyamus pusillus
└ @ Main In[33]:10
┌ Info: Column index for pa124273: 21
└ @ Main /home/ctroupin/Projects/EMODnet/EMODnet-Biology-Benthos-Interpolated-Maps/scripts/BenthosInterp.jl:25
┌ Info: Number of presence: 0, number of absence: 0
└ @ Main In[33]:15
┌ Info: Computing heatmaps
└ @ Main In[33]:27
┌ Info: Computing error field with CPME
└ @ Main In[33]:37
┌ In

## Analysis 2: Variable correlation length
The correlation length parameter now varies over the domain, to take into account the variability of the substrate.     
The information is read from the file `substrate_gini_impurity.nc`, output of the script [`benthos_substrate.jl`](./benthos_substrate.jl).

In [10]:
figdir2 = "../product/figures/2-VariableL/"
outputdir2 = "../product/netCDF/2-VariableL/"
isdir(figdir2) ? "Figure directory already exists" : mkpath(figdir2)
isdir(outputdir2) ? "Output directory already exists" : mkpath(outputdir2)

domain2 = [-16., 9., 47., 66.]; # [West East South North]
longrid2 = domain2[1]:Δlon:domain2[2]
latgrid2 = domain2[3]:Δlat:domain2[4]

47.0:0.1:66.0

### Read variable correlation length

In [16]:
Lfile = joinpath(datadir, "substrate_gini_impurity.nc")
if !isfile(Lfile)
    download("https://dox.ulg.ac.be/index.php/s/55IVEmn2xb8zHyR/download/download", Lfile)
else
    @info("Variable correlation length file already downloaded")
end

lonL, latL, g = read_substrate(Lfile);
@info(size(g));

maxlen = 0.5
minlen = 0.1
Lfield = minlen .+ (maxlen - minlen) * (1 .- g);

┌ Info: Variable correlation length file already downloaded
└ @ Main In[16]:5
┌ Info: (1479, 683)
└ @ Main In[16]:9
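
The linear mapping used above can be checked on a few values. This is a minimal sketch assuming that the impurity `g` lies between 0 and 1; the sample values of `g` are hypothetical and not read from `substrate_gini_impurity.nc`.

```julia
minlen = 0.1
maxlen = 0.5

# Hypothetical impurity values, assumed to lie within [0, 1]
g = [0.0, 0.25, 0.5, 1.0]

# Same formula as in the cell above: low impurity gives a long correlation length
Lfield = minlen .+ (maxlen - minlen) .* (1 .- g)
@show Lfield
```

With this choice, a homogeneous substrate (`g = 0`) gets the largest correlation length `maxlen`, and the most heterogeneous substrate (`g = 1`) gets the smallest one, `minlen`.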


Re-interpolate the L field on the interpolation grid

In [17]:
bx2, by2, b2 = load_bath(bathname, true, longrid2, latgrid2)
_, (pm2, pn2), (xi2, yi2) = DIVAnd.DIVAnd_rectdom(longrid2, latgrid2);
xi2, yi2, mask2 = DIVAnd.load_mask(bathname, true, longrid2, latgrid2, 0.0);
xx2, yy2 = ndgrid(xi2, yi2);
@time lon_interp, lat_interp, Linterp = interp_horiz(lonL, latL, Lfield, longrid2, latgrid2);

  0.000262 seconds (20 allocations: 397.500 KiB)


Create plot

In [18]:
llonL, llatL = ndgrid(lonL, latL)
clf();
PyPlot.figure(figsize=(10, 10))
if usecartopy
    ax = PyPlot.subplot(111, projection=myproj)
else
    ax = PyPlot.subplot(111)
end
pcm = ax.pcolor(lon_interp, lat_interp, Linterp')
decorate_map_domain(ax, domain=domain2)
PyPlot.title("Correlation length field")
colorbar(pcm, shrink=0.7)
PyPlot.savefig(joinpath(figdir, "../variableL4.jpg"), dpi=300, bbox_inches="tight")
PyPlot.close();

### Loop

In [19]:
doplot = false

false

In [23]:
speciesnamelist = get_species_list(datafile)
sort!(speciesnamelist)
for species in speciesnamelist

    speciesslug = get_species_slug(String(species))
    
    @info(speciesslug)
    @info("Working on species $(String(species))");
    lon_pre, lat_pre, lon_abs, lat_abs = read_coords_species(datafile, species);
    npre = length(lon_pre)
    nabs = length(lon_abs)

    @info("Number of presence: $(npre), number of absence: $(nabs)")
    
    
    @info("Computing heatmaps")
    dens_pre, LHM2, LCV2, LSCV2 = DIVAnd_heatmap(mask2, (pm2,pn2), (xx2, yy2), 
        (lon_pre, lat_pre), ones(npre), (Linterp, Linterp));
    dens_abs, LHM3, LCV3, LSCV3 = DIVAnd_heatmap(mask2, (pm2,pn2), (xx2, yy2), 
        (lon_abs, lat_abs), ones(nabs), (Linterp, Linterp));


    d = npre .* dens_pre ./ (npre .* dens_pre .+ nabs .* dens_abs);

    @info("Computing error field with CPME")
    lon = [lon_pre ; lon_abs]
    lat = [lat_pre ; lat_abs]

    cpme = DIVAnd_cpme(mask2, (pm2, pn2), (xx2, yy2), (lon, lat), 
        ones(length(lon)), (5. .* Linterp, 5. .* Linterp), epsilon2);

    
    if doplot
        plot_heatmap(longrid2, latgrid2, d, lon_pre, lat_pre, lon_abs, lat_abs,
            "$(species): probability", figname=joinpath(figdir2, "$(speciesslug)_density.jpg"), 
            usecartopy=usecartopy)            
        plot_error(longrid2, latgrid2, cpme, "$(species): error", 
            figname=joinpath(figdir2, "$(speciesslug)_error.png"),
            usecartopy=usecartopy)
    end   

    if writenc
        @info("Creating the netCDF file with results")
        create_nc_results(joinpath(outputdir2, "$(speciesslug)_density.nc"), 
            longrid2, latgrid2, d, String(species), domain=domain2);

        @info("Adding error field to netCDF file")
        write_nc_error(joinpath(outputdir2, "$(speciesslug)_density.nc"), cpme);
    end


end

┌ Info: Abludomelita_obtusata
└ @ Main In[23]:7
┌ Info: Working on species Abludomelita_obtusata
└ @ Main In[23]:8
┌ Info: Column index for Abludomelita_obtusata: 9
└ @ Main /home/ctroupin/Projects/EMODnet/EMODnet-Biology-Benthos-Interpolated-Maps/scripts/BenthosInterp.jl:25
┌ Info: Number of presence: 2097, number of absence: 66463
└ @ Main In[23]:13
┌ Info: Computing heatmaps
└ @ Main In[23]:16
└ @ DIVAnd /home/ctroupin/.julia/packages/DIVAnd/LkI0S/src/DIVAnd_heatmap.jl:48
┌ Info: Computing error field with CPME
└ @ Main In[23]:25
┌ Info: Creating the netCDF file with results
└ @ Main In[23]:43
┌ Info: Adding error field to netCDF file
└ @ Main In[23]:47
┌ Info: Abra_alba
└ @ Main In[23]:7
┌ Info: Working on species Abra_alba
└ @ Main In[23]:8
┌ Info: Column index for Abra_alba: 5
└ @ Main /home/ctroupin/Projects/EMODnet/EMODnet-Biology-Benthos-Interpolated-Maps/scripts/BenthosInterp.jl:25
┌ Info: Number of presence: 14068, number of absence: 65977
└ @ Main In[23]:13
┌ Info: Computin

┌ Info: Computing error field with CPME
└ @ Main In[23]:25
┌ Info: Creating the netCDF file with results
└ @ Main In[23]:43
┌ Info: Adding error field to netCDF file
└ @ Main In[23]:47
┌ Info: Diplocirrus_glaucus
└ @ Main In[23]:7
┌ Info: Working on species Diplocirrus_glaucus
└ @ Main In[23]:8
┌ Info: Column index for Diplocirrus_glaucus: 7
└ @ Main /home/ctroupin/Projects/EMODnet/EMODnet-Biology-Benthos-Interpolated-Maps/scripts/BenthosInterp.jl:25
┌ Info: Number of presence: 4087, number of absence: 64473
└ @ Main In[23]:13
┌ Info: Computing heatmaps
└ @ Main In[23]:16
└ @ DIVAnd /home/ctroupin/.julia/packages/DIVAnd/LkI0S/src/DIVAnd_heatmap.jl:48
┌ Info: Computing error field with CPME
└ @ Main In[23]:25
┌ Info: Creating the netCDF file with results
└ @ Main In[23]:43
┌ Info: Adding error field to netCDF file
└ @ Main In[23]:47
┌ Info: Diplocirrus_stopbowitzi
└ @ Main In[23]:7
┌ Info: Working on species Diplocirrus_stopbowitzi
└ @ Main In[23]:8
┌ Info: Column index for Diplocirrus_