Multiple files can also be aggregated over a given dimension (or the record dimension). In this example, 3 sea surface temperature fields from the 1992-01-01 to 1992-01-03 are aggregated using the OPeNDAP service from PODAAC.
using NCDatasets, Printf, Dates
function url(dt)
doy = @sprintf("%03d",Dates.dayofyear(dt))
y = @sprintf("%04d",Dates.year(dt))
yyyymmdd = Dates.format(dt,"yyyymmdd")
return "https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/CMC/CMC0.2deg/v2/$y/$doy/$(yyyymmdd)120000-CMC-L4_GHRSST-SSTfnd-CMC0.2deg-GLOB-v02.0-fv02.0.nc"
end
ds = NCDataset(url.(DateTime(1992,1,1):Dates.Day(1):DateTime(1992,1,3)),aggdim = "time");
SST2 = ds["analysed_sst"][:,:,:];
close(ds)
If there is a network or server issue, you will see an error message like NetCDF: I/O failure
.
The CF Conventions do not define how the different NetCDF variables are named, but the meaning of a variable is defined by the standard_name
attribute.
using NCDatasets, DataStructures
ds = NCDataset(tempname(),"c")
nclon = defVar(ds,"lon", 1:10, ("lon",),attrib = OrderedDict(
"standard_name" => "longitude",
))
nclat = defVar(ds,"lat", 1:11, ("lat",),attrib = OrderedDict(
"standard_name" => "latitude",
))
ncvar = defVar(ds,"bat", zeros(10,11), ("lon", "lat"), attrib = OrderedDict(
"standard_name" => "height",
))
ncbat = ds[CF"height"]
# the same as
# ncbat = varbyattrib(ds,standard_name = "height")[1]
name(ncbat)
# output
"bat"
If there are multiple variables with the standard_name
equal to height
, an error is returned because it is ambiguous which variable should be accessed.
All variables whose dimensions are also dimensions of ncbat
are considered as related and can also be accessed by sub-setting ncbat
with their variable names
of CF Standard name:
nclon_of_bat = ncbat[CF"longitude"]
# same as
# nclon_of_bat = ncbat["lon"]
name(nclon_of_bat)
# output
"lon"
The previous call to ncbat[CF"longitude"]
would also worked if there are multiple variables with a standard name longitude
defined in a dataset as long as they have different dimension names (which is commonly the case for model output on staggered grid such as Regional Ocean Modeling System).
In Julia, a view of an array is a subset of an array but whose elements still point to the original parent array. If one modifies an element of a view, the corresponding element in the parent array is modified too:
A = zeros(4,4)
subset = @view A[2:3,2:4]
# or
# subset = view(A,2:3,2:4)
subset[1,1] = 2
A[2,2]
# output
2.0
Views do not use copy of the array. The parent array and the indices of the view are obtained via the function parent
and parentindices
.
parent(subset) == A
# true, as both arrays are the same
parentindices(subset)
# output
(2:3, 2:4)
In NCDatasets, variables can also be sliced as a view:
using NCDatasets, DataStructures
ds = NCDataset(tempname(),"c")
nclon = defVar(ds,"lon", 1:10, ("lon",))
nclat = defVar(ds,"lat", 1:11, ("lat",))
ncvar = defVar(ds,"bat", zeros(10,11), ("lon", "lat"), attrib = OrderedDict(
"standard_name" => "height",
))
ncvar_subset = @view ncvar[2:4,2:3]
# or
# ncvar_subset = view(ncvar,2:4,2:3)
ncvar_subset[1,1] = 2
# ncvar[2,2] is now 2
ncvar_subset.attrib["standard_name"]
# output
"height"
This is useful for example when even the sliced array is too large to be loaded in RAM or when all attributes need to be preserved for the sliced array.
The variables lon
and lat
are related to bat
because all dimensions of the variables lon
and lat
are also dimensions of bat
(which is commonly the case for coordinate variables). Such related variables can be retrieved by indexing the NetCDF variables with the name of the corresponding variable:
lon_subset = ncvar_subset["lon"]
lon_subset[:] == [2, 3, 4]
# output
true
A view of a NetCDF variable also implements the function parent
and parentindices
with the same meaning as for julia Array
s.
A whole dataset can also be sliced using a view(ds, dim1=range1, dim2=range2...)
. For example:
ds_subset = view(ds, lon = 2:3, lat = 2:4)
# or
# ds_subset = @view ds[lon = 2:3, lat = 2:4]
ds_subset.dim["lon"]
# output
2
Such sliced datasets can for example be saved into a new NetCDF file using write
:
write("slice.nc",ds_subset)
Any dimension not mentioned in the @view
call is not sliced.
While @view
produces a slice based on indices, the NCDatasets.@select
macro produces a slice (of an NetCDF variable or dataset)
based on the values of other related variables (typically coordinates).
NCDatasets.@select
In the NetCDF CF conventions there are the attributes _FillValue
(single scalar) and missing_value
(single scalar or possibly a vector with multiple missing values). Value in the netCDF file matching, these attributes are replaced by the Missing
type in Julia per default. However, for some applications, it is more convenient to use another special value like the special floating point number NaN
.
For example, this is a netCDF file where the variable var
contains a missing
which is automatically replaced by the fill value 9999.
using NCDatasets
data = [1. 2. 3.; missing 20. 30.]
ds = NCDataset("example.nc","c")
defVar(ds,"var",data,("lon","lat"),fillvalue = 9999.)
The raw data as stored in the NetCDF file is available using the property .var
:
ds["var"].var[:,:]
# 2×3 Matrix{Float64}:
# 1.0 2.0 3.0
# 9999.0 20.0 30.0
Per default, the fill value is replaced by missing
when indexing ds["var"]
:
ds["var"][:,:]
# 2×3 Matrix{Union{Missing, Float64}}:
# 1.0 2.0 3.0
# missing 20.0 30.0
The function nomissing
allows to replace all missing value with a different value like NaN
:
var_nan = nomissing(ds["var"][:,:],NaN)
# 2×3 Matrix{Float64}:
# 1.0 2.0 3.0
# NaN 20.0 30.0
close(ds)
Such substitution can also be made more automatic using the experimental parameter maskingvalue
that can be user per variable:
ds = NCDataset("example.nc","r")
ncvar_nan = cfvariable(ds,"var",maskingvalue = NaN)
ncvar_nan[:,:]
# 2×3 Matrix{Float64}:
# 1.0 2.0 3.0
# NaN 20.0 30.0
close(ds)
Or per data-set:
ds = NCDataset("example.nc","r", maskingvalue = NaN)
ds["var"][:,:]
# 2×3 Matrix{Float64}:
# 1.0 2.0 3.0
# NaN 20.0 30.0
close(ds)
Note, choosing the maskingvalue
affects the element type of the NetCDF variable using julia type promotion rules, in particular note that following vector:
[1, NaN]
# 2-element Vector{Float64}:
# 1.0
# NaN
is a vector with the element type Float64
and not Union{Float64,Int}
. All integers
are thus promoted to floating point number as NaN
is a Float64
.
Since NaN is considered as a Float64
in Julia, we have also a promotion to Float64
in such cases:
[1f0, NaN]
# 2-element Vector{Float64}:
# 1.0
# NaN
where 1f0
is the Float32
number 1. Consider to use NaN32
to avoid this promotion (which is automatically converted to 64-bit NaN for a Float64
array):
using NCDatasets
data32 = [1f0 2f0 3f0; missing 20f0 30f0]
data64 = [1. 2. 3.; missing 20. 30.]
ds = NCDataset("example_float32_64.nc","c")
defVar(ds,"var32",data32,("lon","lat"),fillvalue = 9999f0)
defVar(ds,"var64",data64,("lon","lat"),fillvalue = 9999.)
close(ds)
ds = NCDataset("example_float32_64.nc","r", maskingvalue = NaN32)
ds["var32"][:,:]
# 2×3 Matrix{Float32}:
# 1.0 2.0 3.0
# NaN 20.0 30.0
ds["var64"][:,:]
# 2×3 Matrix{Float64}:
# 1.0 2.0 3.0
# NaN 20.0 30.0
Promoting an integer to a floating point number can lead to loss of precision. These are the smallest integers that cannot be represented as 32 and 64-bit floating numbers:
Float32(16_777_217) == 16_777_217 # false
Float64(9_007_199_254_740_993) == 9_007_199_254_740_993 # false
NaN
should not be used for an array of dates, character or strings as it will result in an array with the element type Any
following julia's promotion rules.
The use of missing
as fill value, is thus preferable in the general case.
NCDatasets.ancillaryvariables
NCDatasets.filter