
# Using NetCDF4 Compression with CDMS<a id="top"></a>


CDMS2 writes out data using the [NetCDF library](https://www.unidata.ucar.edu/software/netcdf/).

NetCDF4 allows for file compression. A good blog post about NetCDF4 and compression can be found [here](http://www.unidata.ucar.edu/blogs/developer/entry/netcdf_compression).

From this blog:

*"The netCDF-4 libraries inherit the capability for data compression from the HDF5 storage layer underneath the netCDF-4 interface. Linking a program that uses netCDF to a netCDF-4 library allows the program to read compressed data without changing a single line of the program source code."*

and

*"Also, we're only dealing with lossless compression"*

This Notebook shows how to control NetCDF4 compression (shuffling/deflating) capabilities via cdms2.

You can download the Notebook [here](NetCDF4_Compression.ipynb).

# Table Of Contents

- [Preparing The Notebook](#prepare)
- [Default Settings](#defaults)
- [Turning Off Compression](#nocompress)
- [Pure NetCDF3](#netcdf3)
- [NetCDF4 non classic](#nc4_no_classic)
- [Using Shuffling](#shuffle)
- [Controlling Deflate Level](#deflate)
- [Summarizing All Options](#summary)



[Back To Top](#top)

# Preparing The Notebook<a id="prepare"></a>

The easiest way to inspect the contents of a NetCDF file is [ncdump](https://www.unidata.ucar.edu/software/netcdf/netcdf-4/newdocs/netcdf/ncdump.html). The helpers below wrap the command-line call in Python so that its output is displayed in the Notebook.

We also prepare some random data.

[Back To Top](#top)

In [1]:
from __future__ import print_function
import subprocess
import shlex
import numpy
import os
import io
import time

# Get file size
def size_it(filename):
    statinfo = os.stat(filename)
    return statinfo.st_size

# Write and return time
def dump(data,filename="example.nc"):
    start = time.time()
    f = cdms2.open(filename,"w")
    f.write(data,id="data")
    f.close()
    return time.time()-start,size_it(filename)

class HTML(object):
    def __init__(self,html):
        self.html = html
    def _repr_html_(self):
        return self.html


# Nice html output for ncdump
class NCINFO(object):
    def __init__(self, filename, variable=None, options=""):
        self.filename = filename
        self.variable = variable
        self.options = options
    def _repr_html_(self):
        out = self.nc_info()
        lines = []
        for l in out.split("\n"):
            for kw in ["chunk","deflate","classic","netcdf4","netcdf-4"]:
                if l.lower().find(kw)>-1:
                    l = "<b>{0}</b>".format(l)
            lines.append(l.replace("\t","&emsp;&emsp;"))
        return "{0}".format("<br>".join(lines))
    def nc_info(self):
        """calls ncdump on file
        Can pass a variable or optional ncdump arguments.
        Default call: `ncdump -hs filename`"""
        ncdumpOptions = "-hs {options}".format(options=self.options)
        if self.variable is not None:
            # Leading space keeps "-v" separate from any preceding option
            ncdumpOptions += " -v {variable}".format(variable=self.variable)
        cmd = "ncdump {options} {file}".format(options=ncdumpOptions, file=self.filename)
        # universal_newlines=True returns ncdump's output as text rather than bytes
        p = subprocess.Popen(shlex.split(cmd), stdout=subprocess.PIPE,
                             stderr=subprocess.PIPE, universal_newlines=True)
        o, e = p.communicate()
        # Collect the report as text lines so it works on Python 2 and 3
        out = ["Running {0}".format(cmd),
               "-------",
               o,
               "-------",
               "File Size {0} bytes".format(size_it(self.filename))]
        return "\n".join(out) + "\n"
        
import requests
def download(fnm):
    r = requests.get("https://uvcdat.llnl.gov/cdat/sample_data/%s" % fnm,stream=True)
    with open(fnm,"wb") as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:  # filter out keep-alive chunks
                f.write(chunk)

download("clt.nc")
data = numpy.random.random((120,180,360))
# Random data does not compress well at all, so switch to 0/1 values
# (the numpy.float alias has been removed from recent NumPy; use float)
data = numpy.greater(data,.5).astype(float)
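
To see why the switch to 0/1 values matters, the following sketch compares how well deflate handles uniformly random doubles versus binary-valued doubles. It uses Python's standard `zlib` module as a stand-in for the deflate filter that NetCDF4 applies through HDF5; the helper name `deflated_fraction` and the sample size are illustrative assumptions and not part of cdms2.

```python
import random
import struct
import zlib

random.seed(0)
n = 10000  # illustrative sample size, much smaller than the notebook's array

uniform = [random.random() for _ in range(n)]
binary = [1.0 if x > 0.5 else 0.0 for x in uniform]


def deflated_fraction(values, level=1):
    """Return compressed size / raw size for a list of doubles (hypothetical helper)."""
    raw = struct.pack("<%dd" % len(values), *values)
    return len(zlib.compress(raw, level)) / float(len(raw))


frac_uniform = deflated_fraction(uniform)
frac_binary = deflated_fraction(binary)
print("uniform random doubles: {0:.2f} of raw size".format(frac_uniform))
print("0/1 doubles           : {0:.2f} of raw size".format(frac_binary))
```

The random mantissa bits of the uniform values leave deflate little redundancy to exploit, while the 0/1 values repeat the same two byte patterns and shrink considerably.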

# Default Settings<a id="defaults"></a>

By default cdms2 writes data in the NetCDF4 ***classic*** format with no ***shuffling*** and a ***deflate*** level of 1.

[Back To Top](#top)

To see the NetCDF settings that will be used when writing data, use the following commands:

In [2]:
import cdms2
print("NetCDF4? ",cdms2.getNetcdf4Flag())
print("NetCDF Classic?",cdms2.getNetcdfClassicFlag())
print("NetCDF4 Shuffling",cdms2.getNetcdfShuffleFlag())
print("NetCDF4 Deflate?",cdms2.getNetcdfDeflateFlag())
print("NetCDF4 Deflate Level?",cdms2.getNetcdfDeflateLevelFlag())

NetCDF4?  1
NetCDF Classic? 1
NetCDF4 Shuffling 0
NetCDF4 Deflate? 1
NetCDF4 Deflate Level? 1


These values are read at the time you **open** the file for writing.

In the `ncdump` output, note the lines in **bold**.

In [3]:
dump(data)
NCINFO("example.nc")

You can query different values of compression using the functions:
cdms2.getNetcdfShuffleFlag() returning 1 if shuffling is enabled, 0 otherwise
cdms2.getNetcdfDeflateFlag() returning 1 if deflate is used, 0 otherwise
cdms2.getNetcdfDeflateLevelFlag() returning the level of compression for the deflate method

If you want to turn that off or set different values of compression use the functions:
value = 0
cdms2.setNetcdfShuffleFlag(value) ## where value is either 0 or 1
cdms2.setNetcdfDeflateFlag(value) ## where value is either 0 or 1
cdms2.setNetcdfDeflateLevelFlag(value) ## where value is a integer between 0 and 9 included

To produce NetCDF3 Classic files use:
cdms2.useNetCDF3()
To Force NetCDF4 output with classic format and no compressing use:
cdms2.setNetcdf4Flag(1)
NetCDF4 file with no shuffling or deflate and noclassic will be open for parallel i/o


# Turning Off Compression<a id="nocompress"></a>

[Back To Top](#top)

Compression can be turned off by running:

In [4]:
value = 0
cdms2.setNetcdfShuffleFlag(value) ## where value is either 0 or 1
cdms2.setNetcdfDeflateFlag(value) ## where value is either 0 or 1
cdms2.setNetcdfDeflateLevelFlag(value) ## where value is a integer between 0 and 9 included
dump(data)
NCINFO("example.nc")
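
The setters used above change module-wide state, so the values stay in effect for every file opened afterwards. As a minimal sketch, a context manager can apply settings temporarily and restore the previous ones on exit. The name `netcdf_flags` is hypothetical and not part of cdms2; the sketch only relies on the getter and setter names shown in this notebook, and it assumes the setters do not alter one another's values.

```python
from contextlib import contextmanager

# Flag names as they appear inside the cdms2 getter/setter names used in this notebook,
# e.g. "NetcdfShuffle" for getNetcdfShuffleFlag / setNetcdfShuffleFlag
FLAG_NAMES = ("Netcdf4", "NetcdfClassic", "NetcdfShuffle",
              "NetcdfDeflate", "NetcdfDeflateLevel")


@contextmanager
def netcdf_flags(module, **overrides):
    """Temporarily apply flag overrides on `module` (hypothetical helper)."""
    for name in overrides:
        if name not in FLAG_NAMES:
            raise KeyError("unknown flag: {0}".format(name))
    # Remember the current value of every flag before touching anything
    saved = {name: getattr(module, "get{0}Flag".format(name))() for name in FLAG_NAMES}
    try:
        for name, value in overrides.items():
            getattr(module, "set{0}Flag".format(name))(value)
        yield saved
    finally:
        # Put every flag back, even if writing the file raised an error
        for name in FLAG_NAMES:
            getattr(module, "set{0}Flag".format(name))(saved[name])


# Intended use (requires cdms2 and the dump() helper defined earlier):
# with netcdf_flags(cdms2, NetcdfShuffle=0, NetcdfDeflate=0, NetcdfDeflateLevel=0):
#     dump(data)
```

Since the flags are read when the file is opened for writing, the `cdms2.open` call has to happen inside the `with` block.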

# Pure NetCDF3<a id="netcdf3"></a>

[Back To Top](#top)

Setting all of these flags to 0 enables NetCDF3 output (as the warning above shows). One can also use a single command:

In [5]:
cdms2.useNetcdf3()
# or for versions earlier than 2.12.2017.10.25
value = 0
cdms2.setNetcdfShuffleFlag(value) ## where value is either 0 or 1
cdms2.setNetcdfDeflateFlag(value) ## where value is either 0 or 1
cdms2.setNetcdfDeflateLevelFlag(value) ## where value is a integer between 0 and 9 included
cdms2.setNetcdf4Flag(0)
dump(data)
NCINFO("example.nc")
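
A quick way to confirm which on-disk format was produced, without `ncdump`, is to look at the first bytes of the file. The sketch below relies on two known signatures: NetCDF classic files start with `CDF` followed by a version byte, and NetCDF-4 files are HDF5 files that normally start with the HDF5 signature. The function names are illustrative, and the check cannot distinguish the NetCDF-4 classic model from non-classic NetCDF-4, since both are stored as HDF5.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"

# Version byte that follows the "CDF" magic in classic-family files
CLASSIC_VARIANTS = {
    b"\x01": "classic",
    b"\x02": "64-bit offset",
    b"\x05": "CDF-5",
}


def kind_from_header(head):
    """Classify a file from its first 8 bytes (illustrative helper)."""
    if head.startswith(HDF5_SIGNATURE):
        return "NetCDF-4 (HDF5)"
    if head[:3] == b"CDF":
        return "NetCDF-3 ({0})".format(CLASSIC_VARIANTS.get(head[3:4], "unknown variant"))
    return "not NetCDF"


def netcdf_kind(filename):
    """Read the header of `filename` and classify it."""
    with open(filename, "rb") as f:
        return kind_from_header(f.read(8))


# Intended use after the cell above: netcdf_kind("example.nc")
print(kind_from_header(b"CDF\x01\x00\x00\x00\x00"))
print(kind_from_header(HDF5_SIGNATURE))
```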

# NetCDF4 non classic<a id="nc4_no_classic"></a>

[Back To Top](#top)

We can also turn off the classic option for NetCDF4:

In [6]:
cdms2.setNetcdf4Flag(1)
cdms2.setNetcdfClassicFlag(0)
dump(data)
NCINFO("example.nc")

# Using Shuffling<a id="shuffle"></a>

[Back To Top](#top)

Shuffling can be turned on or off:

In [7]:
cdms2.setNetcdf4Flag(1)
cdms2.setNetcdfClassicFlag(0)
cdms2.setNetcdfShuffleFlag(1)
dump(data)
NCINFO("example.nc")
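
Shuffling is a byte-reordering step applied before deflate: the first bytes of all values are stored together, then the second bytes, and so on. The sketch below imitates that idea in pure Python, using `zlib` as a stand-in for the HDF5 deflate filter; the `shuffle` and `unshuffle` helpers and the sample values are illustrative and are not what cdms2 calls internally.

```python
import struct
import zlib


def shuffle(raw, itemsize):
    """Group byte 0 of every element, then byte 1, and so on."""
    return b"".join(raw[i::itemsize] for i in range(itemsize))


def unshuffle(shuffled, itemsize):
    """Invert shuffle(): interleave the byte planes back into elements."""
    n = len(shuffled) // itemsize
    out = bytearray(len(shuffled))
    for i in range(itemsize):
        out[i::itemsize] = shuffled[i * n:(i + 1) * n]
    return bytes(out)


# Slowly varying 4-byte integers as sample data (an arbitrary choice)
values = [1000 + i for i in range(5000)]
raw = struct.pack("<%di" % len(values), *values)
mixed = shuffle(raw, 4)

print("deflate only      :", len(zlib.compress(raw, 1)), "bytes")
print("shuffle + deflate :", len(zlib.compress(mixed, 1)), "bytes")
```

Whether shuffling helps depends on the data: in the summary table at the end of this notebook, the shuffled `clt` files are slightly larger than the unshuffled ones.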

# Controlling Deflate Level<a id="deflate"></a>

[Back To Top](#top)

We can choose the deflate level, an integer from 0 to 9; higher levels generally cost more write time.

In [8]:
cdms2.setNetcdfShuffleFlag(0)
cdms2.setNetcdfDeflateFlag(1)
cdms2.setNetcdfDeflateLevelFlag(5)
dump(data)
NCINFO("example.nc")
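
The trade-off between deflate level, size, and time can be explored without writing any NetCDF file. This sketch runs Python's `zlib` (the same deflate algorithm, used here as a stand-in for the HDF5 filter) over a synthetic payload at levels 0 to 9; the payload and the `results` dictionary are assumptions made for illustration, so the numbers will not match those of a cdms2 write.

```python
import random
import time
import zlib

random.seed(0)
# Synthetic, fairly redundant payload: bytes drawn from a small set of values
raw = bytes(bytearray(random.choice((0, 0, 0, 1, 2)) for _ in range(200000)))

results = {}  # level -> (seconds, compressed size in bytes)
for level in range(10):
    start = time.time()
    packed = zlib.compress(raw, level)
    results[level] = (time.time() - start, len(packed))
    print("level {0}: {1:.4f} s, {2} bytes".format(level, *results[level]))
```

Level 0 stores the data without compressing it, so its output is slightly larger than the input. Beyond level 1 the gain in size is often modest compared with the extra time.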

# Summarizing All Options<a id="summary"></a>

[Back To Top](#top)

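Let's try a real-life example: write the `clt` variable from `clt.nc` under each combination of settings and record the write time and file size.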

In [9]:
f=cdms2.open("clt.nc")
clt = f("clt")

html = "<table border='2'><tr><th>Deflate Level</th><th>NC3</th><th>NC4 Classic no shuffle</th><th>NC4 Classic shuffled</th><th>NC4 no shuffle</th><th>NC4 shuffled</th></tr>"

def addCell():
    t,s = dump(clt)
    return "<td align='center'>{:.2f}/{:d}</td>".format(t,s)

def nc4s():
    out = ""
    for classic in [1,0]:
        cdms2.setNetcdfClassicFlag(classic)
        for shuffle in [0,1]:
            cdms2.setNetcdfShuffleFlag(shuffle)
            out+=addCell()
    out+="</tr>"
    return out

# NetCDF3
html+="<tr><td align='center'>0</td>"
cdms2.useNetcdf3()
cdms2.setNetcdf4Flag(0)
html+=addCell()
cdms2.setNetcdf4Flag(1)
html+=nc4s()
cdms2.setNetcdfDeflateFlag(1)
for i in range(1,10):
    cdms2.setNetcdfDeflateLevelFlag(i)
    html += "<tr><td align='center'>{0}</td><td align='center'>N/A</td>".format(i)
    html += nc4s()
html+="<caption>Time To Write NetCDF File and size for various NC4 settings</caption></table>"
HTML(html)

Time to write the NetCDF file (seconds) / file size (bytes) for various NC4 settings:

| Deflate Level | NC3 | NC4 Classic no shuffle | NC4 Classic shuffled | NC4 no shuffle | NC4 shuffled |
|:---:|:---:|:---:|:---:|:---:|:---:|
| 0 | 0.02/1625482 | 0.01/1625323 | 0.01/1633052 | 0.01/1625482 | 0.01/1633197 |
| 1 | N/A | 0.12/1201105 | 0.09/1227739 | 0.12/1201250 | 0.09/1227943 |
| 2 | N/A | 0.12/1200471 | 0.09/1223895 | 0.12/1200616 | 0.09/1224099 |
| 3 | N/A | 0.12/1200371 | 0.09/1220275 | 0.12/1200516 | 0.09/1220479 |
| 4 | N/A | 0.13/1206352 | 0.10/1218159 | 0.13/1206497 | 0.10/1218363 |
| 5 | N/A | 0.13/1206092 | 0.11/1215330 | 0.13/1206237 | 0.11/1215534 |
| 6 | N/A | 0.14/1205961 | 0.11/1213353 | 0.14/1206106 | 0.11/1213557 |
| 7 | N/A | 0.14/1205905 | 0.12/1212713 | 0.14/1206050 | 0.12/1212917 |
| 8 | N/A | 0.14/1205888 | 0.16/1211808 | 0.14/1206033 | 0.16/1212012 |
| 9 | N/A | 0.14/1205888 | 0.20/1211449 | 0.14/1206033 | 0.19/1211653 |
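
To make the sizes above easier to compare, the following sketch expresses the "NC4 Classic no shuffle" column as a saving relative to the NC3 file. The numbers are copied from the table; the variable names are illustrative.

```python
nc3_size = 1625482  # bytes, NC3 column of the table above

# "NC4 Classic no shuffle" column, deflate levels 1 to 9 (bytes)
nc4_classic_sizes = {
    1: 1201105, 2: 1200471, 3: 1200371,
    4: 1206352, 5: 1206092, 6: 1205961,
    7: 1205905, 8: 1205888, 9: 1205888,
}

for level in sorted(nc4_classic_sizes):
    size = nc4_classic_sizes[level]
    saving = 100.0 * (1.0 - float(size) / nc3_size)
    print("level {0}: {1} bytes, {2:.1f}% smaller than NC3".format(level, size, saving))

# Level whose file is smallest in this particular run
best_level = min(nc4_classic_sizes, key=nc4_classic_sizes.get)
print("smallest file at deflate level", best_level)
```

For this dataset, level 1 already gives a reduction of about 26%, and the higher levels change the file size by less than 1% while the write time creeps up.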
