# Reading metadata with netCDF4

**Link to documentation:** http://unidata.github.io/netcdf4-python/
**Link to netCDF4 plot tutorial:** https://github.com/M6ASP/getting-started/blob/master/Reading%20data%20with%20netCDF4.ipynb

### Environment installation

`conda create -n nc4 python=3.6 jupyter nb_conda_kernels numpy scipy matplotlib netcdf4`

Remember to select the newly created `nc4` environment as the notebook's kernel from the kernel dropdown menu in Jupyter.

As always, we first run `%pylab inline` to embed figures in the notebook. We also `import` the `netCDF4` package, which we use to read data from netCDF files.

In [1]:
%pylab inline
import netCDF4

Populating the interactive namespace from numpy and matplotlib


## Understanding the Dataset
We open the file with the `Dataset` class from netCDF4. With `scipy.io.netcdf` we had to loop through each variable's `._attributes` to see its metadata. netCDF4 instead shows each variable's metadata directly alongside the variable.

The `scipy.io.netcdf` approach, along with descriptions of the metadata, is covered here: https://github.com/M6ASP/getting-started/blob/master/Reading%20data%20with%20scipy.io.netcdf.ipynb

In [2]:
from netCDF4 import Dataset
# In read mode the file format is detected automatically; `format` only applies when creating a file
ncf = Dataset('example_data/WOA13_annual_SST_nc3_classic.nc', 'r')

### An Overview of the Dataset

Displaying `ncf` shows all of the file's global metadata up front. For example, the `Conventions` attribute records which metadata convention the file follows (here CF-1.6), which may matter when using netCDF4 together with other modules. Most of the attributes are strings (e.g. `summary`, `geospatial_lat_units`, `publisher_name`), but some are numeric (e.g. `geospatial_lat_min`, `geospatial_vertical_max`) and give the minimum and maximum latitude, longitude, and vertical extent of the data.

Especially when first exploring the data, netCDF4 makes it easy to get this high-level overview.

In [3]:
ncf

<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF3_CLASSIC data model, file format NETCDF3):
    Conventions: CF-1.6
    title: World Ocean Atlas 2013 version 2 : sea_surface_temperature Annual 1.00 degree
    summary: Climatological mean temperature for the global ocean from in situ profile data
    references: Locarnini, R. A., A. V. Mishonov, J. I. Antonov, T. P. Boyer, H. E. Garcia, O. K. Baranova, M. M. Zweng, C. R. Paver, J. R. Reagan,D. R. Johnson, M. Hamilton, and D. Seidov , 2013: World Ocean Atlas 2013, Volume 1: Temperature. S. Levitus, Ed., A. Mishonov technical editor, NOAA Atlas NESDIS 73.
    institution: National Oceanographic Data Center(NODC)
    comment: global climatology as part of the World Ocean Atlas project
    id: woa13_decav_t00_01.ncv2.0
    naming_authority: gov.noaa.nodc
    standard_name_vocabulary: NetCDF Climate and Forecast (CF) Metadata Convention Standard Name Table v29
    sea_name: World-Wide Distribution
    time_coverage_start: 0000-01-01
   

### Variables

The dataset summary above also lists the variables. For a more detailed view of what each variable represents, we can display `ncf.variables`, an ordered dictionary that maps each variable name to its `Variable` object.

There are three variables: `lat`, `lon`, and `tos`, representing latitude, longitude, and sea surface temperature, respectively. For each one we see its data type and dimensions, as well as the attributes we will use later to access more specific information and to make plots (https://github.com/M6ASP/getting-started/blob/master/Reading%20data%20with%20netCDF4.ipynb).

In [16]:
ncf.variables

OrderedDict([('lat', <class 'netCDF4._netCDF4.Variable'>
              float32 lat(lat)
                  standard_name: latitude
                  long_name: latitude
                  units: degrees_north
                  axis: Y
                  bounds: lat_bnds
              unlimited dimensions: 
              current shape = (180,)
              filling off), ('lon', <class 'netCDF4._netCDF4.Variable'>
              float32 lon(lon)
                  standard_name: longitude
                  long_name: longitude
                  units: degrees_east
                  axis: X
                  bounds: lon_bnds
              unlimited dimensions: 
              current shape = (360,)
              filling off), ('tos', <class 'netCDF4._netCDF4.Variable'>
              float32 tos(lat, lon)
                  standard_name: sea_surface_temperature
                  long_name: Objectively analyzed mean fields for sea_surface_temperature at standard depth levels.
                  c

We can also pull out individual variables and assign them to Python variables to make the data easier to access.

In [8]:
tos = ncf.variables['tos']
tos

<class 'netCDF4._netCDF4.Variable'>
float32 tos(lat, lon)
    standard_name: sea_surface_temperature
    long_name: Objectively analyzed mean fields for sea_surface_temperature at standard depth levels.
    coordinates: time lat lon depth
    cell_methods: area: mean depth: mean time: mean
    grid_mapping: crs
    units: degrees_celsius
    _FillValue: 9.96921e+36
unlimited dimensions: 
current shape = (180, 360)
filling off

### Accessing variable attributes

We can access each of these attributes directly. The `current shape` of `tos`, (180, 360), tells us that it is a two-dimensional array of latitude by longitude. To read its values, we slice the entire variable with `[:]`.

In [13]:
print("Standard name:", tos.standard_name)
print("Units:", tos.units)
print("Array data:", tos[:])

Standard name: sea_surface_temperature
Units: degrees_celsius
Array data: [[-- -- -- ... -- -- --]
 [-- -- -- ... -- -- --]
 [-- -- -- ... -- -- --]
 ...
 [-1.5491100549697876 -1.5532100200653076 -1.5553100109100342 ...
  -1.5446100234985352 -1.5458099842071533 -1.546910047531128]
 [-1.5583100318908691 -1.5600099563598633 -1.561110019683838 ...
  -1.5565099716186523 -1.5568000078201294 -1.5574100017547607]
 [-1.5742100477218628 -1.5742100477218628 -1.5742100477218628 ...
  -1.5742100477218628 -1.5742100477218628 -1.5742100477218628]]


Looking at `tos[:]` by itself also shows that netCDF4 returns a `masked_array` in which the `_FillValue` entries are already masked out.

In [14]:
tos[:]

masked_array(
  data=[[--, --, --, ..., --, --, --],
        [--, --, --, ..., --, --, --],
        [--, --, --, ..., --, --, --],
        ...,
        [-1.5491100549697876, -1.5532100200653076, -1.5553100109100342,
         ..., -1.5446100234985352, -1.5458099842071533,
         -1.546910047531128],
        [-1.5583100318908691, -1.5600099563598633, -1.561110019683838,
         ..., -1.5565099716186523, -1.5568000078201294,
         -1.5574100017547607],
        [-1.5742100477218628, -1.5742100477218628, -1.5742100477218628,
         ..., -1.5742100477218628, -1.5742100477218628,
         -1.5742100477218628]],
  mask=[[ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        ...,
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False]],
  fill_value=9.96921e+36