# Saving `Datasets` and `DataArrays` to NetCDF

## Objectives

Introduce an easy method for saving `Dataset` and `DataArray` objects to NetCDF files.

## Introduction

Saving your `Datasets` and `DataArrays` objects to NetCDF files couldn't be simpler.  The `xarray` module that we've been using to load NetCDF files provides methods for saving your `Datasets` and `DataArrays` as NetCDF files.

Here is the manual page on the subject: http://xarray.pydata.org/en/stable/generated/xarray.Dataset.to_netcdf.html

The method `.to_netcdf()` is available to **both** `Dataset` and `DataArray` objects.  So useful!

### Syntax
```
your_dataset.to_netcdf('/your_filepath/your_netcdf_filename.nc')
```
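As a minimal sketch of the syntax above, the following builds a tiny synthetic `Dataset` and `DataArray` and saves each one. The variable names, values, and file names are hypothetical stand-ins (not ECCO output), and the files go to a temporary directory. It assumes a NetCDF backend such as `netCDF4` or `scipy` is installed alongside `xarray`.

```python
import os
import tempfile

import numpy as np
import xarray as xr

# Synthetic stand-in for a real field (all names here are hypothetical)
temperature = xr.DataArray(
    np.arange(6.0).reshape(2, 3),
    dims=("j", "i"),
    coords={"j": [0, 1], "i": [0, 1, 2]},
    name="temperature",
)
dataset = xr.Dataset({"temperature": temperature})

# Write into a fresh temporary directory
out_dir = tempfile.mkdtemp()
dataset_path = os.path.join(out_dir, "example_dataset.nc")
dataarray_path = os.path.join(out_dir, "example_dataarray.nc")

# The same method name works on both object types
dataset.to_netcdf(dataset_path)
temperature.to_netcdf(dataarray_path)

files_written = sorted(os.listdir(out_dir))
print(files_written)
```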

## Saving an existing `Dataset` to NetCDF

First, let's set up the environment and load a `Dataset`:

In [1]:
## SET UP ENVIRONMENT
import numpy as np
import xarray as xr
import sys
sys.path.append('/Users/ifenty/ECCOv4-py')
import ecco_v4_py as ecco
import matplotlib.pyplot as plt
import json
%matplotlib inline

Load monthly averaged *THETA* for March 2010 for a single model tile, tile 2.

In [2]:
## LOAD NETCDF FILE

# define a high-level directory for ECCO fields
ECCO_dir = '/Users/ifenty/eccov4r3_native_grid_netcdf_tiles/'
# directory of the file
data_dir= ECCO_dir + '/mon_mean/THETA/2010/03/'
# filename
fname = 'THETA_2010_03_tile_02.nc'
# load the dataset file
theta_dataset = xr.open_dataset(data_dir + fname).load()

Now that we've loaded *theta_dataset*, let's save it in the **/tmp** file directory with a new name.

In [3]:
new_filename_1 = '/tmp/test_output.nc'
print('saving to ', new_filename_1)
theta_dataset.to_netcdf(path=new_filename_1)
print('finished saving')

saving to  /tmp/test_output.nc
finished saving


*It's really that simple!*

## Saving a new custom ``Dataset`` to NetCDF


Now let's create a new custom `Dataset` with *THETA*, *SSH*, and model grid parameter variables for a few tiles and depth level 10.

In [4]:
data_dir= ECCO_dir + '/mon_mean/'

SSH_THETA_201003 = \
    ecco.recursive_load_ecco_var_from_tiles_nc(data_dir, \
                                              ['SSH', 'THETA'], \
                                              tiles_to_load = [0,1,2],
                                              years_to_load = 2010)
    
grid = ecco.load_ecco_grid_from_tiles_nc(ECCO_dir + '/grid')    
grid.close()

custom_dataset = xr.merge([SSH_THETA_201003, grid])

searching EXFqnet for variables 
searching SALT for variables 
searching SSH for variables 
searching THETA for variables 
searching UVEL for variables 
searching VVEL for variables 
located directories with SSH 
located directories with THETA 


and now we can easily save it:

In [5]:
new_filename_2 = '/tmp/test_output_2.nc'
print('saving to ', new_filename_2)
custom_dataset.to_netcdf(path=new_filename_2)
custom_dataset.close()
print('finished saving')

saving to  /tmp/test_output_2.nc


finished saving


In [6]:
custom_dataset

<xarray.Dataset>
Dimensions:    (i: 90, i_g: 90, j: 90, j_g: 90, k: 50, k_l: 50, k_p1: 51, k_u: 50, nv: 2, tile: 13, time: 12)
Coordinates:
  * tile       (tile) int64 0 1 2 3 4 5 6 7 8 9 10 11 12
  * j          (j) int32 0 1 2 3 4 5 6 7 8 9 ... 80 81 82 83 84 85 86 87 88 89
  * i          (i) int32 0 1 2 3 4 5 6 7 8 9 ... 80 81 82 83 84 85 86 87 88 89
  * k          (k) int32 0 1 2 3 4 5 6 7 8 9 ... 40 41 42 43 44 45 46 47 48 49
    Z          (k) float32 -5.0 -15.0 -25.0 -35.0 ... -5039.25 -5461.25 -5906.25
    PHrefC     (k) float32 49.05 147.15 245.25 ... 49435.043 53574.863 57940.312
    drF        (k) float32 10.0 10.0 10.0 10.0 10.0 ... 387.5 410.5 433.5 456.5
    XC         (tile, j, i) float32 dask.array<shape=(13, 90, 90), chunksize=(2, 90, 90)>
    YC         (tile, j, i) float32 dask.array<shape=(13, 90, 90), chunksize=(2, 90, 90)>
    rA         (tile, j, i) float32 dask.array<shape=(13, 90, 90), chunksize=(2, 90, 90)>
    hFacC      (tile, k, j, i) float32 dask.array<shap

## Verifying our new NetCDF files

To verify that ``to_netcdf()`` worked, load them and compare with the originals.

### Compare *theta_dataset* with *dataset_1*

In [7]:
# the first test dataset
dataset_1 = xr.open_dataset(new_filename_1)
# release the file handle (not necessary but generally a good idea)
dataset_1.close()

The `np.allclose` function compares two arrays element by element and returns `True` only if every pair of values agrees within a small tolerance.

In [8]:
# loop through the data variables in dataset_1
for key in dataset_1.keys():
    print ('checking %s ' % key)
    print ('-- identical in dataset_1 and theta_dataset : %s' % \
           np.allclose(dataset_1[key], theta_dataset[key], equal_nan=True))
    
# note: ``equal_nan`` means nan==nan (default nan != nan)

checking THETA 
-- identical in dataset_1 and theta_dataset : True


*THETA* is the same in both datasets.
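To see why `equal_nan=True` matters in the check above, here is a small sketch using made-up NumPy arrays (not ECCO data). It also shows that `np.allclose` is a tolerance test rather than a test for bit-for-bit identity.

```python
import numpy as np

# Made-up values with one NaN, standing in for a field with missing points
original = np.array([1.0, np.nan, 3.0])
reloaded = original.copy()

# Default behaviour: NaN is never equal to NaN, so the comparison fails
strict_result = np.allclose(original, reloaded)

# equal_nan=True treats NaNs in matching positions as equal
nan_aware_result = np.allclose(original, reloaded, equal_nan=True)

# A tiny perturbation still passes because allclose uses tolerances
nearly_equal = np.allclose(original, reloaded + 1e-9, equal_nan=True)

print(strict_result, nan_aware_result, nearly_equal)
```

If exact equality is what you need, `np.array_equal` is the stricter alternative.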

### Compare *custom_dataset* with *dataset_2*

In [9]:
# our custom dataset
dataset_2 = xr.open_dataset(new_filename_2)
dataset_2.close()
print('finished loading')

finished loading


In [10]:
for key in dataset_2.keys():
    print ('checking %s ' % key)
    print ('-- identical in dataset_2 and custom_dataset : %s'\
           % np.allclose(dataset_2[key], custom_dataset[key], equal_nan=True))

checking THETA 
-- identical in dataset_2 and custom_dataset : True
checking SSH 
-- identical in dataset_2 and custom_dataset : True


*THETA* and *SSH* are the same in both datasets.

So nice to hear!

## Saving the results of calculations

### Calculations in the form of `DataArrays`
Often we would like to store the results of our calculations to disk.  If your operations are made at the level of `DataArray` objects (and not the lower `ndarray` level) then you can use these same methods to save your results.  All of the coordinates will be preserved (although attributes may be lost).  Let's demonstrate by making a dummy calculation on SSH:

$$SSH_{sq}(i) = SSH(i)^2$$

In [11]:
SSH_sq = custom_dataset.SSH * custom_dataset.SSH

SSH_sq

<xarray.DataArray 'SSH' (time: 12, tile: 13, j: 90, i: 90)>
dask.array<shape=(12, 13, 90, 90), dtype=float32, chunksize=(1, 1, 90, 90)>
Coordinates:
  * tile     (tile) int64 0 1 2 3 4 5 6 7 8 9 10 11 12
  * j        (j) int32 0 1 2 3 4 5 6 7 8 9 10 ... 80 81 82 83 84 85 86 87 88 89
  * i        (i) int32 0 1 2 3 4 5 6 7 8 9 10 ... 80 81 82 83 84 85 86 87 88 89
    XC       (tile, j, i) float32 dask.array<shape=(13, 90, 90), chunksize=(2, 90, 90)>
    YC       (tile, j, i) float32 dask.array<shape=(13, 90, 90), chunksize=(2, 90, 90)>
    rA       (tile, j, i) float32 dask.array<shape=(13, 90, 90), chunksize=(2, 90, 90)>
    iter     (time) int32 158532 159204 159948 160668 ... 165084 165804 166548
  * time     (time) datetime64[ns] 2010-01-16T12:00:00 ... 2010-12-16T12:00:00
    Depth    (tile, j, i) float32 dask.array<shape=(13, 90, 90), chunksize=(2, 90, 90)>

*SSH_sq* is itself a `DataArray`.
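The same behaviour can be checked on a small synthetic array. This is a sketch with made-up values and coordinates, not ECCO output. Whether attributes such as `units` survive the multiplication depends on the `xarray` version and its `keep_attrs` option, so the sketch prints them rather than assuming either outcome.

```python
import numpy as np
import xarray as xr

# Small synthetic stand-in for SSH (values and coordinates are made up)
ssh = xr.DataArray(
    np.array([[0.1, -0.2], [0.3, 0.4]]),
    dims=("j", "i"),
    coords={"j": [0, 1], "i": [0, 1]},
    name="SSH",
    attrs={"units": "m"},
)

# Arithmetic on DataArray objects returns another DataArray
ssh_sq = ssh * ssh

print(type(ssh_sq).__name__)  # type of the result
print(list(ssh_sq.coords))    # coordinates carried over from the inputs
print(ssh_sq.attrs)           # may be empty, depending on version and options
```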

Before saving, let's give our new *SSH_sq* variable a better name and descriptive attributes. 

In [12]:
SSH_sq.name = 'SSH^2'
SSH_sq.attrs['long_name'] = 'Square of Surface Height Anomaly'
SSH_sq.attrs['units'] = 'm^2'

# Let's see the result
SSH_sq

<xarray.DataArray 'SSH^2' (time: 12, tile: 13, j: 90, i: 90)>
dask.array<shape=(12, 13, 90, 90), dtype=float32, chunksize=(1, 1, 90, 90)>
Coordinates:
  * tile     (tile) int64 0 1 2 3 4 5 6 7 8 9 10 11 12
  * j        (j) int32 0 1 2 3 4 5 6 7 8 9 10 ... 80 81 82 83 84 85 86 87 88 89
  * i        (i) int32 0 1 2 3 4 5 6 7 8 9 10 ... 80 81 82 83 84 85 86 87 88 89
    XC       (tile, j, i) float32 dask.array<shape=(13, 90, 90), chunksize=(2, 90, 90)>
    YC       (tile, j, i) float32 dask.array<shape=(13, 90, 90), chunksize=(2, 90, 90)>
    rA       (tile, j, i) float32 dask.array<shape=(13, 90, 90), chunksize=(2, 90, 90)>
    iter     (time) int32 158532 159204 159948 160668 ... 165084 165804 166548
  * time     (time) datetime64[ns] 2010-01-16T12:00:00 ... 2010-12-16T12:00:00
    Depth    (tile, j, i) float32 dask.array<shape=(13, 90, 90), chunksize=(2, 90, 90)>
Attributes:
    long_name:  Square of Surface Height Anomaly
    units:      m^2

Much better!  Now we'll save.

In [13]:
new_filename_3 = '/tmp/ssh_sq_DataArray.nc'
print('saving to ', new_filename_3)

SSH_sq.to_netcdf(path=new_filename_3)
print('finished saving')

saving to  /tmp/ssh_sq_DataArray.nc
finished saving
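A file written from a single `DataArray` can be read straight back as a `DataArray`. The sketch below uses a synthetic array with a hypothetical name and file name, and assumes your `xarray` version provides `xr.load_dataarray` (which reads the data into memory and closes the file).

```python
import os
import tempfile

import numpy as np
import xarray as xr

# Synthetic array with a name and descriptive attributes (all made up)
squared = xr.DataArray(
    np.array([[1.0, 4.0], [9.0, 16.0]]),
    dims=("j", "i"),
    coords={"j": [0, 1], "i": [0, 1]},
    name="example_sq",
    attrs={"long_name": "Square of an example field", "units": "m^2"},
)

path = os.path.join(tempfile.mkdtemp(), "example_sq.nc")
squared.to_netcdf(path)

# load_dataarray expects a file that holds exactly one data variable
reloaded = xr.load_dataarray(path)

print(reloaded.name, reloaded.attrs)
```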



### Calculations in the form of `numpy ndarrays`

If calculations are made at the `ndarray` level then the results will also be `ndarrays`.

In [14]:
SSH_dummy_ndarray = custom_dataset.SSH.values *  custom_dataset.SSH.values

type(SSH_dummy_ndarray)

numpy.ndarray

You'll need to use different methods to save these results to NetCDF files, one of which is described here: http://pyhogs.github.io/intro_netcdf4.html
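Another option, offered here as a sketch rather than as the method from the linked page, is to wrap the `ndarray` back into a `DataArray` and then call `to_netcdf()`. This assumes you still know which dimensions and coordinates the values belong to. The array contents, the name `SSH_sq`, and the output file name are hypothetical.

```python
import os
import tempfile

import numpy as np
import xarray as xr

# Stand-in for a DataArray with known dims and coords (synthetic values)
ssh = xr.DataArray(
    np.array([[0.1, -0.2], [0.3, 0.4]]),
    dims=("j", "i"),
    coords={"j": [0, 1], "i": [0, 1]},
    name="SSH",
)

# A calculation done at the ndarray level carries no labels
ssh_sq_ndarray = ssh.values * ssh.values

# Re-attach the dims and coords of the array the values came from
ssh_sq_wrapped = xr.DataArray(
    ssh_sq_ndarray,
    dims=ssh.dims,
    coords=ssh.coords,
    name="SSH_sq",
)

path = os.path.join(tempfile.mkdtemp(), "ssh_sq_wrapped.nc")
ssh_sq_wrapped.to_netcdf(path)
```

This only makes sense when the shape of the `ndarray` still matches the original dimensions. If the calculation changed the shape, the dimensions and coordinates have to be supplied by hand.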

## Summary

Saving `Datasets` and `DataArrays` to disk as NetCDF files is fun and easy with ``xarray``!