# Exporting data to NetCDF files <img align="right" src="../Supplementary_data/DE_Africa_Logo_Stacked_RGB_small.jpg">

* **Products used:** 
[ga_ls8c_gm_2_annual](https://explorer.digitalearth.africa/ga_ls8c_gm_2_annual)

## Background
NetCDF is a file format for storing multidimensional scientific data. 
This file format supports datasets containing multiple observation dates, as well as multiple bands. 
It is a native format for storing the `xarray` datasets that are produced by Open Data Cube, i.e. by `dc.load` commands. 

NetCDF files should follow [Climate and Forecast (CF) metadata conventions](http://cfconventions.org/) for the description of Earth sciences data.
By providing metadata such as geospatial coordinates and sensor information in the same file as the data, CF conventions allow NetCDF files to be "self-describing".
This makes CF-compliant NetCDFs a useful way to save multidimensional data loaded from Digital Earth Africa, as the data can later be loaded with all the information required for further analysis.

The `xarray` library, which underlies the Open Data Cube (and hence Digital Earth Africa), was specifically designed for representing NetCDF files in Python. 
However, some geospatial metadata is represented quite differently in the NetCDF-CF conventions than in the GDAL (or proj4) model that is common to most geospatial software (including ODC, e.g. for reprojecting raster data when necessary). 
The main difference between `to_netcdf` (native to `xarray`) and `write_dataset_to_netcdf` (provided by `datacube`) is that the latter can appropriately serialise the *coordinate reference system* object associated with the dataset.

## Description
In this notebook we will load some data from Digital Earth Africa and then write it to a (CF-compliant) NetCDF file using the `write_dataset_to_netcdf` function provided by `datacube`. 
We will then verify the file was saved correctly, and (optionally) clean up.

---

## Getting started

To run this analysis, run all the cells in the notebook, starting with the "Load packages" cell. 

### Load packages

In [13]:
%matplotlib inline

import sys
import datacube
import xarray as xr
from datacube.drivers.netcdf import write_dataset_to_netcdf

sys.path.append('../Scripts')
from deafrica_datahandling import mostcommon_crs

### Connect to the datacube

In [2]:
dc = datacube.Datacube(app='Exporting_NetCDFs')

## Load data from the datacube
Here we load a sample dataset from the DE Africa Landsat-8 Annual Geomedian product (`ga_ls8c_gm_2_annual`).
The loaded data is multidimensional, and contains one time-step (2018) and six satellite bands (`blue`, `green`, `red`, `nir`, `swir_1`, `swir_2`).

In [15]:
lat, lon = 13.94, -16.54
buffer = 0.1

# Create a reusable query
query = {
    'x': (lon-buffer, lon+buffer),
    'y': (lat+buffer, lat-buffer),
    'time': '2018',
    'resolution': (-30, 30),
}

# Identify the most common projection system in the input query
output_crs = mostcommon_crs(dc=dc, product='ga_ls8c_gm_2_annual', query=query)

# Load data from the datacube
ds = dc.load(product='ga_ls8c_gm_2_annual',
             output_crs=output_crs,
             **query)

# Print output data
print(ds)

<xarray.Dataset>
Dimensions:  (time: 1, x: 644, y: 827)
Coordinates:
  * time     (time) datetime64[ns] 2018-07-02T11:59:59.999500
  * y        (y) float64 1.774e+06 1.774e+06 1.774e+06 ... 1.749e+06 1.749e+06
  * x        (x) float64 -1.606e+06 -1.605e+06 ... -1.586e+06 -1.586e+06
Data variables:
    red      (time, y, x) uint16 9828 9869 9865 10042 ... 9431 9038 9385 10361
    green    (time, y, x) uint16 9791 9841 9841 9993 ... 9428 9014 9219 9948
    blue     (time, y, x) uint16 8564 8585 8600 8698 ... 8296 8155 8420 8910
    nir      (time, y, x) uint16 15239 15527 14783 15285 ... 10802 10059 11351
    swir_1   (time, y, x) uint16 11271 11332 11162 11340 ... 9346 9046 9538
    swir_2   (time, y, x) uint16 9601 9635 9519 9680 ... 9363 8588 8439 8710
Attributes:
    crs:      epsg:6933
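
As a quick sanity check on the output above, the dimensions can be converted into a ground footprint and an approximate in-memory size. This is a minimal sketch in plain Python: the pixel counts, the 30 m resolution, the six bands and the `uint16` data type are read off the printed dataset and the query, while the variable names are purely illustrative.

```python
# Values read off the printed dataset and the query above
n_time, n_y, n_x = 1, 827, 644   # dimension lengths
resolution_m = 30                # pixel size requested in the query, in metres
n_bands = 6                      # red, green, blue, nir, swir_1, swir_2
bytes_per_value = 2              # uint16 stores each value in 2 bytes

# Ground footprint of the loaded area
width_km = n_x * resolution_m / 1000
height_km = n_y * resolution_m / 1000

# Uncompressed size of all band arrays (coordinates and metadata excluded)
size_mb = n_time * n_y * n_x * n_bands * bytes_per_value / 1e6

print(f"Footprint: {width_km:.2f} km x {height_km:.2f} km")
print(f"Approximate uncompressed size: {size_mb:.2f} MB")
```

The size of the exported file may differ from this estimate, since the file also stores coordinates and metadata and may be compressed.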


## Export to a NetCDF file
To export a CF-compliant NetCDF file, we use the `write_dataset_to_netcdf` function:

In [47]:
write_dataset_to_netcdf(ds, 'output_netcdf.nc')

RuntimeError: NetCDF: HDF error

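That's all.
When the call completes without raising an error, the file is written to the current working directory.
The `RuntimeError: NetCDF: HDF error` output shown above comes from a run in which the write failed; the remaining steps assume a successful export.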
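
A failed or interrupted export can leave an incomplete `output_netcdf.nc` behind, and, to the best of our knowledge, `write_dataset_to_netcdf` will not overwrite an existing file (worth checking against your installed `datacube` version). One defensive pattern, sketched below under those assumptions, is to delete any leftover file before exporting. `clear_stale_output` is a hypothetical helper, not a `datacube` function, and nothing here confirms that a stale file caused the error recorded above.

```python
from pathlib import Path


def clear_stale_output(filename):
    """Delete `filename` if it already exists and return it as a Path.

    Hypothetical helper for illustration; it is not part of `datacube`.
    """
    path = Path(filename)
    if path.exists():
        path.unlink()  # remove the leftover file from an earlier run
    return path


output_path = clear_stale_output('output_netcdf.nc')

# With a datacube environment and a loaded `ds` available,
# the export would then be re-run as:
# write_dataset_to_netcdf(ds, str(output_path))
```

Note that this deletes the file without asking, so it is only appropriate when the output can be regenerated.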

## Reading back from saved NetCDF

Let's start by confirming that the file now exists.
We can use the special `!` command to run command line tools directly within a Jupyter notebook. 
In the example below, `! ls *.nc` runs the `ls` shell command, which will give us a list of any files in the NetCDF file format (i.e. with file names ending with `.nc`).

> For an introduction to using shell commands in Jupyter, [see the guide here](https://jakevdp.github.io/PythonDataScienceHandbook/01.05-ipython-and-shell-commands.html).

In [48]:
! ls *.nc

output_netcdf.nc
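
The `!` syntax relies on a Unix-like shell being available. A portable alternative, shown here as a minimal sketch using only the Python standard library, is to list the matching files from Python itself. `list_netcdf_files` is an illustrative name, not part of any library used in this notebook.

```python
from pathlib import Path


def list_netcdf_files(directory='.'):
    """Return the sorted names of all `.nc` files directly inside `directory`."""
    return sorted(p.name for p in Path(directory).glob('*.nc'))


# List NetCDF files in the current working directory
print(list_netcdf_files())
```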


We could inspect this file using external utilities such as `gdalinfo` or `ncdump`, or open it for visualisation e.g. in `QGIS`.
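
If none of these tools is to hand, a very rough check is possible from Python by reading the first few bytes of the file. NetCDF-4 files are HDF5 files and normally begin with the 8-byte HDF5 signature, while classic-format NetCDF files begin with the characters `CDF`. The sketch below assumes the signature sits at the very start of the file; `netcdf_flavour` is a hypothetical helper, and a match says nothing about whether the rest of the file is intact.

```python
HDF5_SIGNATURE = b'\x89HDF\r\n\x1a\n'   # start of an HDF5 file, as used by NetCDF-4
CLASSIC_PREFIX = b'CDF'                 # start of a classic-format NetCDF file


def netcdf_flavour(filename):
    """Guess the on-disk format of `filename` from its first 8 bytes."""
    with open(filename, 'rb') as f:
        header = f.read(8)
    if header == HDF5_SIGNATURE:
        return 'netcdf4/hdf5'
    if header.startswith(CLASSIC_PREFIX):
        return 'netcdf-classic'
    return 'unrecognised'


# Example call once the export has succeeded:
# print(netcdf_flavour('output_netcdf.nc'))
```

An empty or truncated file, which a failed write may leave behind, is reported as `'unrecognised'`.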

We can also load the file back into Python using `xarray`:

In [49]:
# Load the NetCDF from file
reloaded_ds = xr.open_dataset('output_netcdf.nc')

# Print loaded data
print(reloaded_ds)

OSError: [Errno -101] NetCDF: HDF error: b'/home/jovyan/dev/deafrica-sandbox-notebooks/Frequently_used_code/output_netcdf.nc'

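Once the file has loaded successfully, we can use the reloaded dataset just like the original dataset, for example by plotting one of its spectral bands (the `NameError` shown below is a knock-on effect of the failed load above):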

In [50]:
reloaded_ds.red.plot(col='time')

NameError: name 'reloaded_ds' is not defined

### Clean-up
To remove the saved NetCDF file that we created, run the cell below. This is optional.

In [51]:
! rm output_netcdf.nc

***

## Additional information

**License:** The code in this notebook is licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0). 
Digital Earth Africa data is licensed under the [Creative Commons Attribution 4.0](https://creativecommons.org/licenses/by/4.0/) license.

**Contact:** If you need assistance, please post a question on the [Open Data Cube Slack channel](http://slack.opendatacube.org/) or on the [GIS Stack Exchange](https://gis.stackexchange.com/questions/ask?tags=open-data-cube) using the `open-data-cube` tag (you can view previously asked questions [here](https://gis.stackexchange.com/questions/tagged/open-data-cube)).
If you would like to report an issue with this notebook, you can file one on [GitHub](https://github.com/digitalearthafrica/deafrica-sandbox-notebooks).

**Last modified:** February 2020

**Compatible datacube version:** 

In [52]:
print(datacube.__version__)

1.7+253.ga031f3f4.dirty


## Tags
Browse all available tags on the DE Africa User Guide's Tags Index (placeholder, as this index does not exist yet).