# Saving `Datasets` and `DataArrays` to NetCDF

## Objectives

Introduce an easy method for saving `Dataset` and `DataArray` objects to NetCDF.

## Introduction

Saving your `Datasets` and `DataArrays` objects to NetCDF couldn't be simpler.  The `xarray` module that we've been using to load NetCDF files from disk provides methods for direct writing to NetCDF.

Here is the manual page on the subject: http://xarray.pydata.org/en/stable/generated/xarray.Dataset.to_netcdf.html

The method is called `.to_netcdf()` and it is available to both `Dataset` and `DataArray` objects.

### Syntax
```
your_dataset.to_netcdf('/your_filepath/your_netcdf_filename.nc')
```
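To try the syntax without any ECCO files, here is a minimal sketch that uses a small synthetic `Dataset`. The variable name `toy`, its contents, and the temporary file name are assumptions made up for illustration; only `to_netcdf` itself comes from the text above.

```python
import os
import tempfile

import numpy as np
import xarray as xr

# Hypothetical toy Dataset standing in for `your_dataset`
toy = xr.Dataset(
    {"SSH": (("j", "i"), np.arange(6.0).reshape(2, 3))},
    coords={"j": [0.0, 1.0], "i": [0.0, 1.0, 2.0]},
)

# Write to a temporary directory so nothing is left behind on disk
with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, "toy.nc")
    toy.to_netcdf(toy_path)
    file_was_written = os.path.exists(toy_path)

print(file_was_written)
```

Writing requires that at least one NetCDF backend (for example `netCDF4`, `h5netcdf`, or `scipy`) is installed alongside `xarray`.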

## Saving a `Dataset`

First, let's load a `Dataset`:

In [1]:
import numpy as np
import xarray as xr
import ecco_v4_py as ecco

ECCO_dir = '/Users/ifenty/ECCOv4/R3'

# Load all tiles of the LLC90 Grid    
data_dir= ECCO_dir + '/nctiles_grid/'    
var = 'GRID'
var_type = 'grid'

grid_all_tiles = ecco.load_all_tiles_from_netcdf(data_dir, 
                                                 var, var_type,
                                                 less_output=True)

# Load all tiles of SSH
data_dir= ECCO_dir + '/nctiles_monthly/SSH/'    
var = 'SSH'
var_type = 'c'
ssh_all_tiles = ecco.load_all_tiles_from_netcdf(data_dir, 
                                                var, var_type,
                                                less_output=True)

# Merge SSH and the grid parameters into a single Dataset
data = xr.merge([ssh_all_tiles, grid_all_tiles])

Finished loading all 13 tiles of GRID
Finished loading all 13 tiles of SSH


### Saving a `Dataset`

Now that we've loaded *ssh_all_tiles* and merged it with the grid into *data*, let's save *data* in the *SSH* file directory.

In [2]:
new_filename = data_dir + 'data_all_tiles.nc'
print('saving to ', new_filename)

data.to_netcdf(path=new_filename)
print('finished saving')

saving to  /Users/ifenty/ECCOv4/R3/nctiles_monthly/SSH/data_all_tiles.nc
finished saving


Now let's create a new `Dataset` that includes only *SSH* and some grid parameter variables that are on the same $c$ grid points as *SSH*.

First, make three new `Datasets`, one for each variable.

In [3]:
# Extract these three DataArrays from the data object
# and simultaneously convert them to one-variable Dataset objects
SSH = data.SSH.to_dataset(name = 'SSH')
XC  = data.XC.to_dataset(name = 'XC')
YC  = data.YC.to_dataset(name = 'YC')

# Merge these Datasets
data_subset = xr.merge([SSH, XC , YC])

# Give data_subset the same metadata attributes as data
data_subset.attrs = data.attrs

# Examine the results
data_subset

<xarray.Dataset>
Dimensions:   (i: 90, j: 90, tile: 13, time: 288)
Coordinates:
  * time      (time) float64 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0 ...
  * j         (j) float64 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0 12.0 ...
  * i         (i) float64 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0 12.0 ...
    tim       (time) datetime64[ns] 1992-01-16 1992-02-16 1992-03-16 ...
    timestep  (time) float64 732.0 1.428e+03 2.172e+03 2.892e+03 3.636e+03 ...
    lon_c     (tile, j, i) float64 -111.6 -111.3 -110.9 -110.5 -110.0 -109.3 ...
    lat_c     (tile, j, i) float64 -88.24 -88.38 -88.52 -88.66 -88.8 -88.94 ...
  * tile      (tile) int64 1 2 3 4 5 6 7 8 9 10 11 12 13
Data variables:
    SSH       (time, tile, j, i) float64 nan nan nan nan nan nan nan nan nan ...
    XC        (tile, j, i) float64 -111.6 -111.3 -110.9 -110.5 -110.0 -109.3 ...
    YC        (tile, j, i) float64 -88.24 -88.38 -88.52 -88.66 -88.8 -88.94 ...

And now we can easily save it:

In [4]:
new_filename = data_dir + 'data_subset_all_tiles.nc'
print('saving to ', new_filename)

data_subset.to_netcdf(path=new_filename)
print('finished saving')

saving to  /Users/ifenty/ECCOv4/R3/nctiles_monthly/SSH/data_subset_all_tiles.nc
finished saving


### Loading our saved  `Dataset`

To verify that our save worked, let's load the file and compare it with *data_subset*.

In [5]:
new_filename = data_dir + 'data_subset_all_tiles.nc'

data_subset_loaded = xr.open_dataset(new_filename)

print('finished loading')

finished loading


In [6]:
data_subset_loaded

<xarray.Dataset>
Dimensions:   (i: 90, j: 90, tile: 13, time: 288)
Coordinates:
  * time      (time) float64 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0 ...
  * j         (j) float64 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0 12.0 ...
  * i         (i) float64 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0 12.0 ...
    tim       (time) datetime64[ns] ...
    timestep  (time) float64 ...
    lon_c     (tile, j, i) float64 ...
    lat_c     (tile, j, i) float64 ...
  * tile      (tile) int64 1 2 3 4 5 6 7 8 9 10 11 12 13
Data variables:
    SSH       (time, tile, j, i) float64 ...
    XC        (tile, j, i) float64 ...
    YC        (tile, j, i) float64 ...

The `equals` method does element-by-element comparison of `Dataset` and `DataArray` objects.

In [7]:
data_subset.equals(data_subset_loaded)

True
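As a hedged illustration of what `equals` does and does not check, the sketch below round-trips a small synthetic `Dataset` (the names `original`, `reloaded`, and `relabeled` are hypothetical). It also calls `identical`, a stricter xarray comparison that additionally checks names and attributes.

```python
import os
import tempfile

import numpy as np
import xarray as xr

# Hypothetical stand-in for data_subset, including a NaN like the real SSH
original = xr.Dataset(
    {"SSH": (("j", "i"), np.array([[np.nan, 1.0], [2.0, 3.0]]))},
    coords={"j": [1.0, 2.0], "i": [1.0, 2.0]},
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "round_trip.nc")
    original.to_netcdf(path)
    # load() pulls the values into memory before the file is closed
    with xr.open_dataset(path) as ds:
        reloaded = ds.load()

# equals compares dimensions, coordinates, and values;
# NaNs in matching positions count as equal
values_match = original.equals(reloaded)

# Adding a (hypothetical) attribute does not affect equals,
# but it does make identical return False
relabeled = reloaded.copy()
relabeled.attrs["note"] = "hypothetical attribute"
still_equal = original.equals(relabeled)
still_identical = original.identical(relabeled)

print(values_match, still_equal, still_identical)
```

In other words, `equals` is the right check for "did my numbers survive the save", while `identical` is the one to reach for when the metadata matters too.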

So nice to hear!

### Saving a `DataArray`

Let's save the *SSH* `DataArray` object inside *ssh_all_tiles*.

In [8]:
new_filename = data_dir + 'ssh_dataArray.nc'
print('saving to ', new_filename)

# Select the SSH DataArray from the Dataset before saving
ssh_all_tiles.SSH.to_netcdf(path=new_filename)
print('finished saving')

saving to  /Users/ifenty/ECCOv4/R3/nctiles_monthly/SSH/ssh_dataArray.nc
finished saving
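A file written from a single `DataArray` can be read back as a `DataArray` with `xr.open_dataarray`. The following is a minimal sketch on a synthetic array; the name `ssh`, its values, and the file name are assumptions for illustration, not the ECCO data.

```python
import os
import tempfile

import numpy as np
import xarray as xr

# Hypothetical small DataArray standing in for the SSH field
ssh = xr.DataArray(
    np.array([[0.1, 0.2], [0.3, 0.4]]),
    dims=("j", "i"),
    coords={"j": [1.0, 2.0], "i": [1.0, 2.0]},
    name="SSH",
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "ssh_dataarray.nc")
    ssh.to_netcdf(path)
    # open_dataarray expects a file holding exactly one data variable
    with xr.open_dataarray(path) as da:
        ssh_loaded = da.load()

print(type(ssh_loaded).__name__, ssh_loaded.name)
```

If the file holds more than one data variable, use `xr.open_dataset` instead and select the variable you need.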


## Saving the results of calculations

### Calculations in the form of `DataArrays`
Often we would like to store the results of our calculations to disk.  If your operations are made at the level of `DataArray` objects (and not the lower `ndarray` level) then you can use these same methods to save your results.  All of the coordinates will be preserved (although attributes may be lost).  Let's demonstrate by making a dummy calculation on SSH:

$$SSH_{sq}(i) = SSH(i)^2$$

In [9]:
SSH_sq = data.SSH * data.SSH

SSH_sq

<xarray.DataArray 'SSH' (time: 288, tile: 13, j: 90, i: 90)>
array([[[[     nan, ...,      nan],
         ...,
         [2.200343, ..., 1.833477]],

        ...,

        [[0.508914, ...,      nan],
         ...,
         [2.147456, ...,      nan]]],


       ...,


       [[[     nan, ...,      nan],
         ...,
         [1.984065, ..., 1.741136]],

        ...,

        [[0.336955, ...,      nan],
         ...,
         [1.933404, ...,      nan]]]])
Coordinates:
  * time      (time) float64 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0 ...
  * j         (j) float64 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0 12.0 ...
  * i         (i) float64 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0 12.0 ...
    tim       (time) datetime64[ns] 1992-01-16 1992-02-16 1992-03-16 ...
    timestep  (time) float64 732.0 1.428e+03 2.172e+03 2.892e+03 3.636e+03 ...
    lon_c     (tile, j, i) float64 -111.6 -111.3 -110.9 -110.5 -110.0 -109.3 ...
    lat_c     (tile, j, i) float64 -88.24 -88.38 -88.52

*SSH_sq* is itself a `DataArray`.  If we had instead operated directly on the `numpy` arrays stored in `data.SSH`, the result would have been an `ndarray`.

Before saving, let's give our new *SSH_sq* variable a better name and descriptive attributes. 

In [10]:
SSH_sq.name = 'SSH^2'
SSH_sq.attrs['long_name'] = 'Square of Surface Height Anomaly'
SSH_sq.attrs['units'] = 'm^2'
SSH_sq.attrs['grid_layout'] = 'original llc'

# Let's see the result
SSH_sq

<xarray.DataArray 'SSH^2' (time: 288, tile: 13, j: 90, i: 90)>
array([[[[     nan, ...,      nan],
         ...,
         [2.200343, ..., 1.833477]],

        ...,

        [[0.508914, ...,      nan],
         ...,
         [2.147456, ...,      nan]]],


       ...,


       [[[     nan, ...,      nan],
         ...,
         [1.984065, ..., 1.741136]],

        ...,

        [[0.336955, ...,      nan],
         ...,
         [1.933404, ...,      nan]]]])
Coordinates:
  * time      (time) float64 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0 ...
  * j         (j) float64 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0 12.0 ...
  * i         (i) float64 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0 12.0 ...
    tim       (time) datetime64[ns] 1992-01-16 1992-02-16 1992-03-16 ...
    timestep  (time) float64 732.0 1.428e+03 2.172e+03 2.892e+03 3.636e+03 ...
    lon_c     (tile, j, i) float64 -111.6 -111.3 -110.9 -110.5 -110.0 -109.3 ...
    lat_c     (tile, j, i) float64 -88.24 -88.38 -88.

Much better!  Now we'll save.

In [11]:
new_filename = data_dir + 'ssh_sq_DataArray.nc'
print('saving to ', new_filename)

SSH_sq.to_netcdf(path=new_filename)
print('finished saving')

saving to  /Users/ifenty/ECCOv4/R3/nctiles_monthly/SSH/ssh_sq_DataArray.nc
finished saving



### Calculations in the form of `numpy ndarrays`

If calculations are made at the `ndarray` level then the results will also be `ndarrays`.

In [12]:
SSH_dummy_ndarray = data.SSH.values *  data.SSH.values

type(SSH_dummy_ndarray)

numpy.ndarray

You'll need to find different methods to save these results to NetCDF files, one of which is described here: http://pyhogs.github.io/intro_netcdf4.html
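Another option, different from the `netCDF4` approach in the link above, is to wrap the `ndarray` back into a `DataArray` and then use `to_netcdf` as before. This is only a sketch under assumptions: the array `template` is a hypothetical stand-in for `data.SSH`, and the name `SSH_sq` and the `units` attribute are chosen for illustration.

```python
import os
import tempfile

import numpy as np
import xarray as xr

# Hypothetical stand-in for data.SSH
template = xr.DataArray(
    np.array([[1.0, -2.0], [3.0, np.nan]]),
    dims=("j", "i"),
    coords={"j": [1.0, 2.0], "i": [1.0, 2.0]},
    name="SSH",
)

# A calculation at the ndarray level returns a plain ndarray
squared_values = template.values * template.values

# Re-attach the dimensions and coordinates of the array we started from
wrapped = xr.DataArray(
    squared_values,
    dims=template.dims,
    coords=template.coords,
    name="SSH_sq",
    attrs={"units": "m^2"},
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "ssh_sq_from_ndarray.nc")
    wrapped.to_netcdf(path)
    with xr.open_dataarray(path) as da:
        wrapped_loaded = da.load()

print(type(squared_values).__name__, wrapped_loaded.attrs["units"])
```

This only works when the `ndarray` still has the same shape and axis order as the `DataArray` whose dimensions and coordinates you borrow.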

## Summary

Now you know how to save `Datasets` and `DataArrays` to disk as NetCDF!