# Large data set

* DINEOF analysis of Western Mediterranean sea surface temperature (SST).
* Download the data file and view the content of the NetCDF file with the following:

In [2]:
using PyPlot
using NCDatasets
using Missings

Helper function for plotting transposed arrays or arrays with missing data

In [3]:
using PyCall
using PyCall: PyObject

# allow for plotting with missing values
function PyObject(a::Array{Union{T,Missing},N}) where {T,N}
    numpy_ma = PyCall.pyimport("numpy").ma
    pycall(numpy_ma.array, Any, coalesce.(a,zero(T)), mask=ismissing.(a))
end


PyObject
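
To see what the helper above hands to NumPy, here is a minimal sketch in plain Julia (no Python needed). The array `a` is a made-up example, not data from the file. It shows the two pieces passed to `numpy.ma.array`: the data with missing values replaced by zero, and a Boolean mask marking where the values were missing.

```julia
# Hypothetical 2 × 2 array with one missing value
a = Union{Float64,Missing}[1.5 missing; 3.0 4.0]

# Data part: missing values are replaced by zero (a placeholder that the mask hides)
data = coalesce.(a, zero(Float64))

# Mask part: true where the original value was missing
is_masked = ismissing.(a)

println(data)
println(is_masked)
```

The zero written into `data` is never plotted, since matplotlib skips the entries flagged in the mask.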

Download data file

In [4]:
if !isfile("WesternMedSST.nc")
    download("https://dox.ulg.ac.be/index.php/s/XkNUzGGVtnSCdT3/download","WesternMedSST.nc")
end
    
ds = Dataset("WesternMedSST.nc")

NCDataset: WesternMedSST.nc
Group: /

Dimensions
   lon = 327
   lat = 217
   time = 384

Variables
  lon   (327)
    Datatype:    Float64
    Dimensions:  lon
    Attributes:
     standard_name        = longitude
     units                = degree_east

  lat   (217)
    Datatype:    Float64
    Dimensions:  lat
    Attributes:
     standard_name        = latitude
     units                = degree_north

  time   (384)
    Datatype:    Float64
    Dimensions:  time
    Attributes:
     standard_name        = latitude
     units                = days since 1900-01-01 00:00:00

  seviri_sst   (327 × 217 × 384)
    Datatype:    Float32
    Dimensions:  lon × lat × time
    Attributes:
     standard_name        = sea_water_temperature
     units                = degree_Celsius
     long_name            = sea surface temperature
  

In [5]:
close(ds);

# Useful functions

 * Display the content of a NetCDF file.
```julia
Dataset("WesternMedSST.nc")
```

 * Read a variable from a NetCDF file.
```julia
ds = Dataset("WesternMedSST.nc")
SST = ds["seviri_sst"][:]
close(ds)
```

More info at https://github.com/Alexander-Barth/NCDatasets.jl

The NetCDF variable `seviri_sst` is a 3D array with the sea surface temperature. Its dimensions are longitude, latitude and time. The corresponding coordinates are the variables `lon`, `lat` and `time`.
The variable `mask` is a 2D array (longitude, latitude). In this binary mask 1 corresponds to ocean and 0 to land.

In [6]:
fname = "WesternMedSST.nc";
ds = Dataset(fname)
lon = ds["lon"][:];
lat = ds["lat"][:];
times = nomissing(ds["time"][:]);
SST = ds["seviri_sst"][:];
mask = ds["mask"][:];
close(ds)


closed NetCDF NCDataset

### Exercises

See [JuliaPlotting.ipynb](JuliaPlotting.ipynb) for the plotting commands.

1. Plot the first time instance of the data set with `pcolor`.
1. Plot the percentage of grid points with valid data over time.
1. For all time instances, what is the percentage of sea grid points not covered by clouds?
1. Plot the time average of SST.
1. Plot the space average of SST (assuming that all pixels have the same area).
1. Make a time series of the number of pixels with a temperature above 25 degrees Celsius.
1. Make a time series of the area (in km²) with a temperature above 25 degrees Celsius.
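
As a starting point, the sketch below shows one possible way to handle missing values in such computations. It runs on a small synthetic array rather than on the real data. The grid, the values, and the Earth radius `R` are assumptions made for illustration. The cell area uses the spherical approximation $R^2 \cos(\mathrm{lat})\,\Delta\mathrm{lon}\,\Delta\mathrm{lat}$ with angles in radians, which assumes a regular longitude/latitude grid.

```julia
using Statistics

# Synthetic stand-in for SST (lon × lat × time): 4 × 3 grid, 2 time steps
SST = Array{Union{Float32,Missing}}(undef, 4, 3, 2)
SST .= 20f0
SST[1, 1, :] .= missing      # land point: never observed
SST[2, 2, 1] = missing       # cloud at the first time step
SST[3, 3, 2] = 26f0          # one warm pixel at the second time step

# Binary mask: 1 = ocean, 0 = land
mask = ones(Int, 4, 3)
mask[1, 1] = 0

# Hypothetical regular grid with 0.05 degree spacing
lon = collect(range(0.0, step=0.05, length=4))
lat = collect(range(36.0, step=0.05, length=3))

nt = size(SST, 3)
valid = .!ismissing.(SST)    # true where a value is present

# Percentage of all grid points with valid data, per time step
pct_valid = 100 .* vec(sum(valid, dims=(1, 2))) ./ (size(SST, 1) * size(SST, 2))

# Percentage of sea grid points with valid data, per time step
sea = mask .== 1
pct_sea_valid = 100 .* vec(sum(valid .& sea, dims=(1, 2))) ./ sum(sea)

# Space average per time step, ignoring missing values
space_mean = [mean(skipmissing(SST[:, :, n])) for n in 1:nt]

# Number of pixels warmer than 25 degrees Celsius, per time step
warm_count = [count(>(25), skipmissing(SST[:, :, n])) for n in 1:nt]

# Area (km²) of each grid cell as a function of latitude
R = 6371.0                                   # assumed mean Earth radius in km
dlon = deg2rad(lon[2] - lon[1])
dlat = deg2rad(lat[2] - lat[1])
cell_area = R^2 .* cosd.(lat) .* dlon .* dlat

# Area (km²) warmer than 25 degrees Celsius, per time step
warm = coalesce.(SST .> 25, false)           # missing counts as "not warm"
area_warm = [sum(warm[:, :, n] .* cell_area') for n in 1:nt]
```

With the real data, `SST`, `mask`, `lon` and `lat` would come from the NetCDF file instead of being constructed by hand.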

Other exercises
* Compute the standard deviation over time for every pixel
* Make a map with the minimum temperature
* Make a map with the time index at which the temperature is minimum
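
For per-pixel statistics over time, missing values need some care: a pixel may be missing at some time steps (clouds) or at all of them (land). The following is a minimal sketch on a synthetic array. The helper names `reduce_time` and `argmin_time` are hypothetical and are not part of NCDatasets or any other package.

```julia
using Statistics

# Synthetic 2 × 2 grid with 3 time steps
T = Array{Union{Float64,Missing}}(undef, 2, 2, 3)
T[:, :, 1] .= 18.0
T[:, :, 2] .= 16.0
T[:, :, 3] .= 20.0
T[1, 1, 2] = missing      # one cloudy observation
T[2, 2, :] .= missing     # land point: no data at all

# Apply the reduction f over time at every pixel, ignoring missing values.
# Pixels without any valid data stay missing.
function reduce_time(f, A)
    out = Array{Union{Float64,Missing}}(missing, size(A, 1), size(A, 2))
    for j in axes(A, 2), i in axes(A, 1)
        v = collect(skipmissing(A[i, j, :]))
        if !isempty(v)
            out[i, j] = f(v)
        end
    end
    return out
end

# Time index of the minimum at every pixel (missing where there is no data)
function argmin_time(A)
    out = Array{Union{Int,Missing}}(missing, size(A, 1), size(A, 2))
    for j in axes(A, 2), i in axes(A, 1)
        series = A[i, j, :]
        if !all(ismissing, series)
            # Replace missing by Inf so that it can never be the minimum
            out[i, j] = argmin(coalesce.(series, Inf))
        end
    end
    return out
end

sst_std = reduce_time(std, T)       # standard deviation over time
sst_min = reduce_time(minimum, T)   # minimum temperature
imin = argmin_time(T)               # time index of the minimum
```

The index returned by `argmin_time` refers to the original time axis, so it can be used directly to look up the corresponding entry in `times`.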
