# NetCDF
## 1. What is NetCDF?
The Network Common Data Form, or [NetCDF](https://www.unidata.ucar.edu/software/netcdf/docs/), is **a set of software libraries** 
and self-describing, machine-independent **data formats**. It supports the creation, access and sharing of scientific data. 

To make data "self-describing" and meaningful to both humans and machines, the names, units of measure and other metadata should be meaningful and conform to **Conventions**. 

## 2. How to install NetCDF?
Installing NetCDF means installing the base NetCDF libraries and the associated tools. 

A guide to setting up NetCDF on Ubuntu is available [here](https://skygiant.com.au/setting-up-netcdf-on-ubuntu/).

In [36]:
! sudo apt-get install libnetcdf-dev netcdf-bin
! sudo apt-get install ncview


To install Python's netCDF4 lib:

In [None]:
! pip install netCDF4

## 3. How to open a NetCDF file?
NetCDF files should have the file name extension "**.nc**". 

A classic netCDF format file should **not** be larger than **2GB**, except on platforms that have "Large File Support"(LFS). 
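Whether the size limit above applies depends on which on-disk format a file uses. The following is a minimal sketch, not part of the netCDF tools, that guesses the format from the first four bytes of a file. The helper name `sniff_netcdf_format` is hypothetical, and the sketch assumes the signature sits at the very start of the file (an HDF5-based netCDF-4 file can in principle carry it at a later offset).

```python
import os
import tempfile

# Leading bytes that identify each on-disk netCDF variant.
MAGIC_NUMBERS = {
    b"CDF\x01": "classic",
    b"CDF\x02": "64-bit offset",
    b"CDF\x05": "64-bit data (CDF-5)",
    b"\x89HDF": "netCDF-4 (HDF5)",
}


def sniff_netcdf_format(path):
    """Return a label for the file's format, or None if it is not recognized."""
    with open(path, "rb") as f:
        head = f.read(4)
    return MAGIC_NUMBERS.get(head)


# Demonstration with a stand-in file that holds only the classic magic bytes
# (not a valid dataset, just enough to exercise the check).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.nc")
    with open(demo_path, "wb") as f:
        f.write(b"CDF\x01")
    detected = sniff_netcdf_format(demo_path)
    print(detected)
```

On a real file, `ncdump -k example.nc` reports the format kind and is the more reliable check.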

### Command line Tools
* **[ncdump](https://www.unidata.ucar.edu/software/netcdf/docs/netcdf_utilities_guide.html#ncdump_guide)**
The ncdump command is used to show the contents of netCDF files. It reads a netCDF file and outputs text. The output text is in a format called Common Data Language ([CDL](https://www.unidata.ucar.edu/software/netcdf/docs/netcdf_utilities_guide.html#cdl_guide)), describing netCDF objects and data. 

In [None]:
! ncdump example.nc

In [None]:
! ncdump -h example.nc

* **[ncview](http://meteora.ucsd.edu/~pierce/ncview_home_page.html)** is a visual browser for netCDF format files. 

In [None]:
! ncview example.nc

*Tips: when showing a netCDF file, if the coordinate systems are correctly identified, the values for the x and y axes should change as the cursor moves over the plot.*

### Other Viewer Tools
* **[Panoply](https://www.giss.nasa.gov/tools/panoply/)** can output CDL descriptions as well as plot variable data. Download and installation [here](https://www.giss.nasa.gov/tools/panoply/download/). 


*Tips: if lon-lat coordinates are correctly identified, Panoply can plot data on a global or regional map using the lon/lat variable data.
With a projection system, if the grid mapping variable is correctly described, it can plot data on the global map even without lon/lat variable data (Panoply will do the unprojection for you).*

## 4. What can be seen in a NetCDF file (file structure and components)?
A netCDF file comprises two parts: a header part and a data part:
* **Header part**: contains all the information/metadata about **Dimensions**, **Attributes** and **Variables**. 
* **Data part**: fixed-size data (the data for variables without unlimited dimension); variable-size data (the data for variables with unlimited dimension)

An example netCDF file read by ```ncdump```.

In [None]:
netcdf sfc_pres_temp {
dimensions:
   latitude = 6 ;
   longitude = 12 ;
variables:
   float latitude(latitude) ;
       latitude:units = "degrees_north" ;
   float longitude(longitude) ;
       longitude:units = "degrees_east" ;
   float pressure(latitude, longitude) ;
       pressure:units = "hPa" ;
data:
 latitude = 25, 30, 35, 40, 45, 50 ;
 longitude = -125, -120, -115, -110, -105, -100, -95, -90, -85, -80, -75, -70 ;
 pressure =
  900, 906, 912, 918, 924, 930, 936, 942, 948, 954, 960, 966,
  901, 907, 913, 919, 925, 931, 937, 943, 949, 955, 961, 967,
  902, 908, 914, 920, 926, 932, 938, 944, 950, 956, 962, 968,
  903, 909, 915, 921, 927, 933, 939, 945, 951, 957, 963, 969,
  904, 910, 916, 922, 928, 934, 940, 946, 952, 958, 964, 970,
  905, 911, 917, 923, 929, 935, 941, 947, 953, 959, 965, 971 ;
}

**In fact, the data in a netCDF file is a group of functions (called variables) with zero or more dimensions. Each variable carries metadata that describes its meaning; this metadata is called attributes.**

#### **NAMING:** 
```alphanumeric_separated_by_underscores```. Names start with a letter; they may also contain period '.', plus '+', hyphen '-', or at sign '@'; they are case sensitive; names beginning with an underscore (```_reserved```) are reserved for system use. 
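As an illustration, the naming rules exactly as stated in this section can be written as a regular expression. This is a sketch of those stated rules only, not the full netCDF name grammar, and `is_user_name` is a hypothetical helper.

```python
import re

# A leading letter, then letters, digits, underscore, period, plus,
# at sign, or hyphen. Names that start with an underscore are left
# out on purpose because they are reserved for system use.
_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_.+@-]*")


def is_user_name(name):
    """Return True if `name` follows the rules listed in this section."""
    return _NAME_PATTERN.fullmatch(name) is not None


for candidate in ["sea_surface_temp", "_FillValue", "2m_temperature", "wind speed"]:
    print(candidate, is_user_name(candidate))
```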

**Import Python [netCDF4](http://unidata.github.io/netcdf4-python/) module** and **Create an example .nc file**

In [None]:
import netCDF4
path = "example.nc"
exp_nc = netCDF4.Dataset(path, 'w')

### Dimensions
A dimension represents a quantity. It can be a physical spatiotemporal dimension: date or time (T), height or depth (Z), latitude (Y), longitude (X); or other user-defined quantities. 

Each dimension has a **name** and a **length**. There are no standard dimension names, so dimensions should be given meaningful names. All dimensions must have different names.
Dimension length is an arbitrary positive integer or UNLIMITED. 

The length of a dimension is the number of points along it. UNLIMITED length means the dimension can grow to any length. It is usually used as an index for appending more records. For example, the time dimension is often defined as UNLIMITED. 

If any information, description, meaning of the dimension needs to be attached, a variable using the same name as the dimension with only this single dimension should be used. This variable will have the same size of the dimension. Then all the attributes of this variable can be used to describe the single-valued quantity. The actual values of the dimension are also stored as the data of this variable (i.e. a 1D array). 

A netCDF file allows any number of dimensions, and any number of UNLIMITED dimensions, but the old convention strongly recommends limiting the total number of dimensions to **four** and the number of UNLIMITED dimensions to at most **one**. 

Dimensions can be renamed after creation. 

**For example:** In the netCDF above: 

```latitude``` and ```longitude``` are both dimensions and variables. The dimension ```latitude``` has a name "latitude" and a length "6". The variable ```latitude``` has the same name as the dimension ```latitude``` and takes the single dimension ```latitude```. The variable ```latitude``` has an attribute: ```units```. The data of the variable ```latitude``` is a 1D array with size 6: ```latitude = 25, 30, 35, 40, 45, 50 ;```. 

In [None]:
lat = exp_nc.createDimension('latitude', 6)
lon = exp_nc.createDimension('longitude', 12)
time = exp_nc.createDimension('time', None)  # unlimited

In [None]:
print(exp_nc.dimensions.keys())
print(time.isunlimited())

There are several uses for netCDF dimensions[[cites]](http://www.bic.mni.mcgill.ca/users/sean/Docs/netcdf/guide.txn_12.html):
* Specifying the shapes and sizes of variables.
* Identifying and relating variables that are defined on a common grid.
* Providing a way to define coordinate systems. 

### (Global) Attributes

A list of attributes for different types of usages [here](https://www.star.nesdis.noaa.gov/sod/mecb/coastwatch/cwf/cw_cf_metadata.pdf).

Global attributes are optional. There are some standard global attribute names. A file can also have non-standard attributes. Application programs will ignore the attributes that they do not recognize. 

Usually, the global attributes describe the file contents and origin, such as where the data came from and what has been done to it. This information is mainly for the benefit of human readers. The attribute values are all strings. For readable ```ncdump``` output, it is recommended to embed ```'\n'``` in long strings. 

Standard global attribute names include: ```title```, ```institution```, ```source```, ```history```, ```comment```, ```Conventions```, ```external_variables```, ```references```. Some of these attribute names can also be assigned to individual variables: ```institution```, ```source```,```comment```, ```references```.
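The sketch below shows one way to build a multi-line ```history``` string with embedded ```'\n'``` characters. The timestamp layout and the oldest-first ordering are assumptions made for this example, and `append_history` is a hypothetical helper rather than part of any library.

```python
from datetime import datetime, timezone


def append_history(history, entry, when):
    """Return `history` with one more timestamped line appended."""
    line = when.strftime("%Y-%m-%dT%H:%M:%SZ") + ": " + entry
    return line if not history else history + "\n" + line


history = ""
history = append_history(
    history, "created example.nc", datetime(2020, 1, 1, tzinfo=timezone.utc)
)
history = append_history(
    history, "added latitude variable", datetime(2020, 1, 2, tzinfo=timezone.utc)
)
print(history)
```

The resulting string could then be assigned as a global attribute in the same way as ```title``` or ```institution```.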

In [None]:
exp_nc.Conventions = "CF-1.6"
exp_nc.title = 'An example netCDF dataset'
exp_nc.institution = 'University of Waterloo'

### Variables
A variable is a multidimensional object that has, among other characteristics, a *shape*. *Shape* is defined by the number, order and sizes of its dimensions. 

Each variable has **5 parts**: data type, variable name, dimensions, attributes and data.

Data type, variable name and dimension list must be specified when the variable is created. The dimension list fixes the shape of the data array/matrix, so the data type and dimension list cannot be modified after the variable is created. Variables can be renamed. Attributes of the variable can be created, deleted or modified at any time. 

Variables are related by the dimensions they share. For example, if two variables are defined with the same dimensions, they might represent observations or model output for the same set of points. 

* There are TWO different kinds of variables: physical value variables (non-coordinate data) -- **D variables** and coordinate related variables (variables containing coordinate data, i.e. coordinate variables and auxiliary coordinate variables) -- **C variables**.

In [None]:
lat = exp_nc.createVariable('latitude', 'i4', ('latitude', ))
lat.units = 'degrees_north'
lat.standard_name = 'latitude'
lat.axis = 'Y'

# rename variable
exp_nc.renameVariable('latitude', 'lat')

#### Data Types and Data
The data of a variable is an array (matrix) of values, all of the same data type.

Data can be stored in classical programming data types or user-defined types. The classical programming types include: byte, char, short, ushort, int, uint, int64, uint64, float or real, double, string. Integer types without the "u" prefix are treated as signed. A string can be represented as a 1D array of char data. 

In [None]:
lat[:] = [25, 30, 35, 40, 45, 50]

#### Dimensions of the Variable

A variable can have any number of dimensions, including zero. Some special types of variables may have no dimensions (e.g. grid mapping variable, scalar coordinate variable).

The dimensions of a variable with one or more dimensions should be chosen from the already created dimensions. If any or all of the dimensions of a variable are date or time (T), height or depth (Z), latitude (Y), longitude (X), then those dimensions are **recommended** to appear in the **relative order: T, Z, Y, X**. All other dimensions should be placed to the **left** of these spatiotemporal dimensions. 
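The recommended ordering can be sketched as a small sorting helper. The function name `recommended_order` and the input layout (a mapping from dimension name to axis letter, or `None` for non-spatiotemporal dimensions) are assumptions made for this illustration.

```python
# Rank of each spatiotemporal axis in the recommended order T, Z, Y, X.
AXIS_RANK = {"T": 0, "Z": 1, "Y": 2, "X": 3}


def recommended_order(dim_axes):
    """Order dimension names: other dimensions first, then T, Z, Y, X.

    `dim_axes` maps each dimension name to "T", "Z", "Y", "X", or None.
    """
    others = [name for name, axis in dim_axes.items() if axis not in AXIS_RANK]
    spatiotemporal = sorted(
        (name for name, axis in dim_axes.items() if axis in AXIS_RANK),
        key=lambda name: AXIS_RANK[dim_axes[name]],
    )
    return tuple(others + spatiotemporal)


dims = {"longitude": "X", "time": "T", "ensemble": None, "latitude": "Y"}
print(recommended_order(dims))  # ('ensemble', 'time', 'latitude', 'longitude')
```

The returned tuple has the shape expected for the dimension list passed when a variable is created.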

A variable should be given the same name as a dimension **ONLY** when it is to be used as a coordinate variable. It is not necessary to provide a coordinate variable for each dimension. If no such variable is defined, the coordinate values of the dimension are assumed to be indices (0, 1, 2... for C programs, or 1, 2, 3... for FORTRAN programs). 

#### Coordinate Variables
Coordinate variables are variables containing coordinate data. They are **single-dimension** arrays that have the same size and **same name** as the dimension they are assigned to. These arrays contain **distinct** (all values are different), **monotonically** increasing or decreasing values. **Missing values** are **not allowed** in coordinate variables. *That is to say, only the variables with the same name as the dimensions are called coordinate variables.*
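The value constraints above (no missing values, distinct, monotonic) can be checked with a few lines of plain Python. This is a sketch; `is_valid_coordinate` is a hypothetical helper, and treating `None` and NaN as "missing" is an assumption of the example.

```python
import math


def is_valid_coordinate(values):
    """Check for no missing values and strictly monotonic ordering."""
    values = list(values)
    # Treat None and NaN as missing.
    if any(v is None or (isinstance(v, float) and math.isnan(v)) for v in values):
        return False
    pairs = list(zip(values, values[1:]))
    increasing = all(a < b for a, b in pairs)
    decreasing = all(a > b for a, b in pairs)
    # Strict monotonicity also guarantees that all values are distinct.
    return increasing or decreasing


print(is_valid_coordinate([25, 30, 35, 40, 45, 50]))  # True
print(is_valid_coordinate([25, 30, 30, 40]))          # False
```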

Coordinate variables include variables for locating data in space and time, and variables related to other continuous geophysical quantities. Only the space and time variables receive special treatment by the conventions. 

Coordinate type (whether it is latitude, longitude, vertical, time, other geophysical or discrete quantities) of a coordinate variable can be identified by the values of some of its attributes: **axis**, **standard_name**, **units** and **positive**. 
* axis with values T, Z, X, Y. 
* standard_name such as "latitude", "longitude", "grid_latitude", "projection_x_coordinate". 
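A sketch of how an application might guess the axis of a coordinate variable from these attributes follows. The attribute dictionary, the helper name `guess_axis`, and the priority order (explicit ```axis``` first, then ```standard_name```, then ```units```, then ```positive```) are assumptions of this example; the unit spellings are the ones listed in the units note later in this document.

```python
LONGITUDE_UNITS = {"degrees_east", "degree_east", "degrees_E",
                   "degree_E", "degreesE", "degreeE"}
LATITUDE_UNITS = {"degrees_north", "degree_north", "degrees_N",
                  "degree_N", "degreesN", "degreeN"}
STANDARD_NAME_AXES = {
    "longitude": "X", "grid_longitude": "X", "projection_x_coordinate": "X",
    "latitude": "Y", "grid_latitude": "Y", "projection_y_coordinate": "Y",
    "time": "T",
}


def guess_axis(attrs):
    """Return "X", "Y", "Z", "T", or None for a dict of variable attributes."""
    if attrs.get("axis") in ("X", "Y", "Z", "T"):
        return attrs["axis"]
    if attrs.get("standard_name") in STANDARD_NAME_AXES:
        return STANDARD_NAME_AXES[attrs["standard_name"]]
    units = attrs.get("units", "")
    if units in LONGITUDE_UNITS:
        return "X"
    if units in LATITUDE_UNITS:
        return "Y"
    if " since " in units:
        return "T"
    # A "positive" attribute marks a vertical coordinate.
    if attrs.get("positive") in ("up", "down"):
        return "Z"
    return None


print(guess_axis({"units": "degrees_north"}))             # Y
print(guess_axis({"units": "hPa", "positive": "down"}))   # Z
```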

The methods of identifying coordinate types apply both to coordinate variables and to auxiliary coordinate variables.

There are also **scalar coordinate variables** (0-dimensional variables or single strings, which should have a name other than any of the dimension names) and **auxiliary coordinate variables**. They are associated with the **D variables** through the **coordinates** attribute of the data variables.

##### Auxiliary Coordinate Variables
An auxiliary coordinate variable is a variable that contains coordinate data, but is not a coordinate variable. It does not have the same name as any of its dimensions. It can have any subset of the dimensions and is not necessarily monotonic. 

An auxiliary coordinate variable should not be given the name of any of its dimensions. 

**Multidimensional Coordinate Variables** are auxiliary coordinate variables that are multidimensional. 

*An application that is trying to find the latitude coordinate of a variable should first search for a latitude coordinate variable. If there is no latitude coordinate variable, it then checks the auxiliary coordinate variables listed by the "coordinates" attribute. It can check for an "axis" attribute valued "Y" and a "standard_name" valued "latitude".*
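The search order described above can be sketched on a plain dictionary model of a dataset. The `variables` layout, the helper names, and the choice to recognize latitude by ```standard_name``` or ```units``` (rather than by ```axis```, which is also "Y" for projected y coordinates) are assumptions of this example. The sample data mirrors the Lambert conformal example later in this document.

```python
LATITUDE_UNITS = {"degrees_north", "degree_north", "degrees_N",
                  "degree_N", "degreesN", "degreeN"}


def looks_like_latitude(attrs):
    """Recognize true latitude by standard_name or by units."""
    return (attrs.get("standard_name") == "latitude"
            or attrs.get("units") in LATITUDE_UNITS)


def find_latitude(variables, data_name):
    """Return the name of the latitude coordinate of `data_name`, or None."""
    data_var = variables[data_name]
    # Step 1: coordinate variables share the name of one of the dimensions.
    for dim in data_var["dims"]:
        candidate = variables.get(dim)
        if (candidate is not None and candidate["dims"] == (dim,)
                and looks_like_latitude(candidate["attrs"])):
            return dim
    # Step 2: auxiliary coordinate variables listed in "coordinates".
    for name in data_var["attrs"].get("coordinates", "").split():
        candidate = variables.get(name)
        if candidate is not None and looks_like_latitude(candidate["attrs"]):
            return name
    return None


variables = {
    "y": {"dims": ("y",), "attrs": {"units": "km",
                                    "standard_name": "projection_y_coordinate"}},
    "x": {"dims": ("x",), "attrs": {"units": "km",
                                    "standard_name": "projection_x_coordinate"}},
    "lat": {"dims": ("y", "x"), "attrs": {"units": "degrees_north",
                                          "standard_name": "latitude"}},
    "lon": {"dims": ("y", "x"), "attrs": {"units": "degrees_east",
                                          "standard_name": "longitude"}},
    "Temperature": {"dims": ("time", "y", "x"),
                    "attrs": {"units": "K", "coordinates": "lat lon"}},
}
print(find_latitude(variables, "Temperature"))  # lat
```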

Among the coordinate variables and auxiliary coordinate variables of a data variable, no more than one may have the "axis" attribute valued "X" (and likewise for Y, Z and T). 

**In [CF-Conventions-1.7](https://github.com/cf-convention/cf-conventions/blob/master/ch05.adoc)([pdf version](http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/cf-conventions.pdf)) Chapter 5(page 31):**
If the coordinate variables for a horizontal grid are not longitude and latitude, it is recommended that they be **supplied in addition** to the required coordinates.

For example, the Cartesian coordinates of a **map projection** should be supplied as coordinate variables in addition to the required two-dimensional latitude and longitude variables that are identified via the "coordinates" attribute.

(Page 32 bottom) This facilitates processing of this data by generic applications that don't recognize the multidimensional latitude and longitude coordinates. 

(Page 34) When the coordinate variables for a horizontal grid are not longitude and latitude, it is required that the true latitude and longitude coordinates be supplied via the "coordinates" attribute. 

In [None]:
dimensions:
  xc = 128 ;             
  yc = 64 ;
  lev = 18 ;
variables:
  float T(lev,yc,xc) ;
    T:long_name = "temperature" ;
    T:units = "K" ;
    T:coordinates = "lon lat" ;   // identifies lon and lat as auxiliary coordinate variables
  float xc(xc) ;    // supplied in addition to lat lon variables
    xc:axis = "X" ;
    xc:long_name = "x-coordinate in Cartesian system" ;
    xc:units = "m" ;
  float yc(yc) ;    // supplied in addition to lat lon variables
    yc:axis = "Y" ;
    yc:long_name = "y-coordinate in Cartesian system" ;
    yc:units = "m" ;
  float lev(lev) ;
    lev:long_name = "pressure level" ;
    lev:units = "hPa" ;
  float lon(yc,xc) ; // multidimensional coordinate variable
    lon:long_name = "longitude" ;
    lon:units = "degrees_east" ;
  float lat(yc,xc) ;
    lat:long_name = "latitude" ;
    lat:units = "degrees_north" ;

#### Data in Projected Coordinate Systems (PCS) 
Reference: CF-Conventions-1.7 Chapter 5.6

The coordinate variables should be supplied. The true latitude and longitude auxiliary coordinate variables (e.g. named "lat" and "lon") should be supplied as well. Data variables should have a "coordinates" attribute with the value "lat lon" or "lon lat". 

There should be a **grid mapping variable** to describe the mapping/projection via a collection of attributes. Data type of grid mapping variable can be anything, since there is no data. **"grid_mapping_name"** attribute is **required**. The valid values of "grid_mapping_name" and the attribute names for the parameters describing map projections are listed in [CF-Conventions-1.7 Appendix F:Grid Mappings](http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/cf-conventions.pdf)

The name of the grid mapping variable should be given as the value of the "grid_mapping" attribute of the D variables.

In order to make use of a grid mapping to directly calculate latitude and longitude values, "standard_name" matching the grid mapping should be used in coordinate variables. e.g. "standard_name" values of "grid_longitude" and "grid_latitude" can be recognized as rotated longitude and latitude axes. 

"crs_wkt" is an optional attribute of grid mapping variables. It holds the coordinate reference system in [CRS well-known text format](http://www.opengeospatial.org/standards/wkt-crs) (CRS WKT or OGC WKT). There is a mapping from CF grid mapping attributes to CRS WKT elements [here](https://github.com/cf-convention/cf-conventions/wiki/Mapping-from-CF-Grid-Mapping-Attributes-to-CRS-WKT-Elements). It acts as a supplement to the other grid mapping attributes, but cannot replace them. Where crs_wkt and the other grid mapping attributes carry duplicate or inconsistent information, the other grid mapping attributes are used. 

In [None]:
dimensions:
  y = 228;
  x = 306;
  time = 41;
variables:
  int Lambert_Conformal;  // grid mapping variable indicates the lambert conformal conic projection
    Lambert_Conformal:grid_mapping_name = "lambert_conformal_conic";
    Lambert_Conformal:standard_parallel = 25.0;
    Lambert_Conformal:longitude_of_central_meridian = 265.0;
    Lambert_Conformal:latitude_of_projection_origin = 25.0;
  double y(y);
    y:units = "km";
    y:long_name = "y coordinate of projection";
    y:standard_name = "projection_y_coordinate"; // identifies the projection coordinate y
  double x(x);
    x:units = "km";
    x:long_name = "x coordinate of projection";
    x:standard_name = "projection_x_coordinate";
  double lat(y, x);
    lat:units = "degrees_north";
    lat:long_name = "latitude coordinate";
    lat:standard_name = "latitude"; // identifies latitude coordinate
  double lon(y, x);
    lon:units = "degrees_east";
    lon:long_name = "longitude coordinate";
    lon:standard_name = "longitude";
  int time(time);
    time:long_name = "forecast time";
    time:units = "hours since 2004-06-23T22:00:00Z";
  float Temperature(time, y, x);
    Temperature:units = "K";
    Temperature:long_name = "Temperature @ surface";
    Temperature:missing_value = 9999.0;
    Temperature:coordinates = "lat lon"; // identifies latitude and longitude auxiliary coordinate variables
    Temperature:grid_mapping = "Lambert_Conformal"; // identifies the grid mapping variable 

In [None]:
var_prj = exp_nc.createVariable('Lambert_Azimuthal_EA', 'i4')
var_prj.grid_mapping_name = 'lambert_azimuthal_equal_area'
var_prj.longitude_of_projection_origin = 0.
var_prj.latitude_of_projection_origin = 90.
var_prj.false_easting = 0.
var_prj.false_northing = 0.
var_prj.long_name = 'CRS definition'

##### Time Coordinate Variable
Time (year, month, day, hour, minute, second) is encoded with units of the form
 ```time_unit since reference_time```. The encoding depends on the calendar. 
The acceptable units consist of:
* a time unit string: "day" (days, d), "hour" (hours, hr, h), "minute" (minutes, min) or "second" (seconds, sec, s). "year" and "month" are also acceptable, but should be used with caution, because the Udunits package defines a year as a fixed 365.2421 days, not a calendar year, and a month as year/12. 
* the identifier "since"
* a date, optionally with time and time zone. The defaults for time and time zone are "00:00:00 UTC"
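To make the encoding concrete, here is a stdlib-only sketch of converting between datetimes and numeric values for a units string of this form. It assumes the reference time is written as `YYYY-MM-DD HH:MM:SS`, supports only the fixed-length units shown, and uses Python's proleptic Gregorian calendar; the helper names are hypothetical.

```python
from datetime import datetime, timedelta

SECONDS_PER_UNIT = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}


def parse_units(units):
    """Split 'hours since 1990-01-01 00:00:00' into (unit, reference datetime)."""
    unit, _, reference = units.partition(" since ")
    return unit, datetime.strptime(reference, "%Y-%m-%d %H:%M:%S")


def encode_time(when, units):
    """Convert a datetime into a number of time units after the reference."""
    unit, reference = parse_units(units)
    return (when - reference).total_seconds() / SECONDS_PER_UNIT[unit]


def decode_time(value, units):
    """Convert a numeric time value back into a datetime."""
    unit, reference = parse_units(units)
    return reference + timedelta(seconds=value * SECONDS_PER_UNIT[unit])


units = "hours since 1990-01-01 00:00:00"
encoded = encode_time(datetime(1990, 1, 2, 6, 0, 0), units)
print(encoded)                      # 30.0
print(decode_time(encoded, units))  # 1990-01-02 06:00:00
```

For real files, `date2num` and `num2date` from the netCDF4 module handle calendars and more unit spellings.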

In [None]:
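from datetime import datetime
from netCDF4 import date2num
var_time = exp_nc.createVariable('time', 'i4', ('time', ))
var_time.units = 'hours since 1990-01-01 00:00:00'
var_time.calendar = 'gregorian'
var_time.long_name = 'time'
var_time.standard_name = 'time'
var_time.axis = 'T'
# Example dates (illustrative values) to encode as hours since the reference time
dates = [datetime(1990, 1, 1, 0), datetime(1990, 1, 1, 6)]
# Add values to the time variable; the unlimited dimension grows to fit
var_time[:] = date2num(dates, units=var_time.units, calendar=var_time.calendar)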

In [None]:
# Attributes can also be set by name; the attribute name must be a string
setattr(var_time, 'test', 'a')

#### Attributes (Conform CF-Conventions)

[CF-Conventions](http://cfconventions.org/latest.html): Climate and Forecast (CF) metadata conventions. [Overview](http://cfconventions.org/Data/cf-documents/overview/viewgraphs.pdf)

*Some attributes differ between versions of the CF Conventions.*
##### CF attributes for both C variables and D variables
* **units**: **Required** for all variables (except two C variables: boundary variable and climatology variable). Variables without dimensions may optionally include units. The units values must be chosen from [udunits.dat](https://www.unidata.ucar.edu/software/udunits/udunits-1/udunits.txt), so that they are recognized by UNIDATA's [Udunits package](https://www.unidata.ucar.edu/software/udunits/). 

*If standard_name is assigned, the units should match the corresponding units of the standard name.*

        NOTE: 
          1. The acceptable units for longitude are "degrees_east", "degree_east", "degrees_E", "degree_E", "degreesE", and "degreeE". 
          2. Similarly, the acceptable units for latitude are "degrees_north", "degree_north", "degrees_N", "degree_N", "degreesN", and "degreeN". 
          3. Units for representing fractions or parts of a whole is "1". 
          4. Units for coordinates of latitude with respect to a Rotated Pole should be given units of "degrees", not "degrees_north" or equivalents.

* **long_name**: Optional, but it is **highly recommended** to include either this or "standard_name". A string for human readers.
* **standard_name**: identifies the quantity. It is a blank-separated list of strings. Each string is a case-sensitive standard name without whitespace. Standard names can be found in the [table](http://cfconventions.org/Data/cf-standard-names/50/build/cf-standard-name-table.html). The table is expanded on request. 
* valid_max, valid_min, valid_range: indicates valid values of a variable.
* add_offset
* scale_factor
* comment
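For ```scale_factor``` and ```add_offset```, the stored (packed) value is multiplied by the scale factor and then the offset is added to recover the physical value. The sketch below illustrates that arithmetic; the function names and the sample numbers are made up for this example.

```python
def unpack(packed, scale_factor=1.0, add_offset=0.0):
    """Recover the physical value from a packed value."""
    return packed * scale_factor + add_offset


def pack(value, scale_factor=1.0, add_offset=0.0):
    """Convert a physical value to the nearest packed integer."""
    return round((value - add_offset) / scale_factor)


# Store a temperature of 288.15 K as an integer count of 0.01 K
# steps above 273.15 K (illustrative numbers).
packed = pack(288.15, scale_factor=0.01, add_offset=273.15)
print(packed)  # 1500
print(unpack(packed, scale_factor=0.01, add_offset=273.15))
```

Packing trades precision for storage: values are only recoverable to within half a scale step.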

##### CF attributes for C variables
* **axis**: identifies space and time axes. Given one of the values X, Y, Z and T which stands for a longitude, latitude, vertical and time axis respectively.

for Time axes:
* **calendar**: calendar chosen from "gregorian" or "standard", "proleptic_gregorian", "noleap" or "365_day", "all_leap" or "366_day", "360_day", "julian", "none"
* leap_month, leap_year, month_lengths: for user defined calendar

for vertical axes:
* **positive**: direction of increasing vertical coordinate value, e.g. "up" or "down". The value should be consistent with the value of "standard_name".
* computed_standard_name: from the standard name table, for computed vertical coordinate values, computed according to the formula in the definition
* formula_terms: identifies variables that correspond to the terms in a formula

* bounds: for boundary variable
* climatology: for climatology variable
* cf_role: roles of variables, geometries
* compress: compressed dimensions

##### CF attributes for D variables
* **coordinates**: a blank separated list of the names of auxiliary coordinate variables. There is no restriction on the order of the appearance of these auxiliary coordinate variables. 
* **grid_mapping**: given by a grid mapping variable name
* _FillValue, missing_value, actual_range
* ancillary_variables: a pointer to variables providing metadata about the individual data values, represents having relationships with other variables. eg. standard error, data quality information.
* cell_measures, cell_methods
* flag_masks, flag_meanings, flag_values: mutually exclusive coded values
* institution, references, source
* standard_error_multiplier

In [None]:
exp_nc.close()

### Other useful links:
* An R example of reading and writing a projected netCDF file [here](http://geog.uoregon.edu/bartlein/courses/geog490/week04-netCDFprojected.html)
* Some contents of the conventions doc are not included here, such as using labels (scalar coordinate variable), cells, boundaries, climatological statistics, compression of data
* [Python for GeoScientists tutorial](https://github.com/koldunovn/python_for_geosciences)
* [Useful Python Tools](https://unidata.github.io/python-gallery/useful_tools.html)