# Table of Contents
* [Learning Objectives:](#Learning-Objectives:)
	* [NetCDF](#NetCDF)
	* [Exercise (export to scientific formats)](#Exercise-%28export-to-scientific-formats%29)
	* [Review of HDF5 and NetCDF](#Review-of-HDF5-and-NetCDF)


# Learning Objectives:

* Work with data stored in fast, hierarchical scientific data formats:
  * NetCDF

## NetCDF

More details at http://unidata.github.io/netcdf4-python/

If netCDF4 is not installed in your conda environment, run:
```
% conda install -y netcdf4
```

In [1]:
import pandas as pd
import netCDF4
f = netCDF4.Dataset('../notebooks/data/sresa1b_ncar_ccsm3-example.nc')
f

<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF3_CLASSIC data model, file format NETCDF3):
    CVS_Id: $Id$
    creation_date: 
    prg_ID: Source file unknown Version unknown Date unknown
    cmd_ln: bds -x 256 -y 128 -m 23 -o /data/zender/data/dst_T85.nc
    history: Tue Oct 25 15:08:51 2005: ncks -O -x -v va -m sresa1b_ncar_ccsm3_0_run1_200001.nc sresa1b_ncar_ccsm3_0_run1_200001.nc
Tue Oct 25 15:07:21 2005: ncks -d time,0 sresa1b_ncar_ccsm3_0_run1_200001_201912.nc sresa1b_ncar_ccsm3_0_run1_200001.nc
Tue Oct 25 13:29:43 2005: ncks -d time,0,239 sresa1b_ncar_ccsm3_0_run1_200001_209912.nc /var/www/html/tmp/sresa1b_ncar_ccsm3_0_run1_200001_201912.nc
Thu Oct 20 10:47:50 2005: ncks -A -v va /data/brownmc/sresa1b/atm/mo/va/ncar_ccsm3_0/run1/sresa1b_ncar_ccsm3_0_run1_va_200001_209912.nc /data/brownmc/sresa1b/atm/mo/tas/ncar_ccsm3_0/run1/sresa1b_ncar_ccsm3_0_run1_200001_209912.nc
Wed Oct 19 14:55:04 2005: ncks -F -d time,01,1200 /data/brownmc/sresa1b/atm/mo/va/ncar_ccsm3_0/run1/sresa1b

In [2]:
f.variables.keys()

KeysView(OrderedDict([('area', <class 'netCDF4._netCDF4.Variable'>
float32 area(lat, lon)
    long_name: Surface area
    units: meter2
unlimited dimensions: 
current shape = (128, 256)
filling off
), ('lat', <class 'netCDF4._netCDF4.Variable'>
float32 lat(lat)
    long_name: latitude
    units: degrees_north
    axis: Y
    standard_name: latitude
    bounds: lat_bnds
unlimited dimensions: 
current shape = (128,)
filling off
), ('lat_bnds', <class 'netCDF4._netCDF4.Variable'>
float64 lat_bnds(lat, bnds)
unlimited dimensions: 
current shape = (128, 2)
filling off
), ('lon', <class 'netCDF4._netCDF4.Variable'>
float32 lon(lon)
    long_name: longitude
    units: degrees_east
    axis: X
    standard_name: longitude
    bounds: lon_bnds
unlimited dimensions: 
current shape = (256,)
filling off
), ('lon_bnds', <class 'netCDF4._netCDF4.Variable'>
float64 lon_bnds(lon, bnds)
unlimited dimensions: 
current shape = (256, 2)
filling off
), ('msk_rgn', <class 'netCDF4._netCDF4.Variable'>
int32 

In [3]:
f['pr']

<class 'netCDF4._netCDF4.Variable'>
float32 pr(time, lat, lon)
    comment: Created using NCL code CCSM_atmm_2cf.ncl on
 machine eagle163s
    missing_value: 1e+20
    _FillValue: 1e+20
    cell_methods: time: mean (interval: 1 month)
    history: (PRECC+PRECL)*r[h2o]
    original_units: m-1 s-1
    original_name: PRECC, PRECL
    standard_name: precipitation_flux
    units: kg m-2 s-1
    long_name: precipitation_flux
    cell_method: time: mean
unlimited dimensions: time
current shape = (1, 128, 256)
filling off

In [None]:
f['pr'][:].squeeze()

In [4]:
f['pr'].dimensions

('time', 'lat', 'lon')

In [5]:
precip_flux = pd.DataFrame(f['pr'][:].squeeze())
precip_flux.columns = f['lon']
precip_flux.index = f['lat']
precip_flux

lat \ lon,0.0,1.40625,2.8125,4.21875,5.625,7.03125,8.4375,9.84375,11.25,12.65625,...,345.9375,347.34375,348.75,350.15625,351.5625,352.96875,354.375,355.78125,357.1875,358.59375
-88.927734,1.091546e-06,1.054535e-06,1.078923e-06,1.016285e-06,1.000412e-06,1.016951e-06,1.066608e-06,1.060027e-06,1.044903e-06,1.051413e-06,...,1.202293e-06,1.184171e-06,1.114668e-06,1.061854e-06,1.101390e-06,1.127960e-06,1.154505e-06,1.199903e-06,1.182608e-06,1.114067e-06
-87.538704,8.509192e-07,8.014720e-07,7.743964e-07,7.742306e-07,7.471818e-07,7.214264e-07,7.192943e-07,6.997910e-07,6.833975e-07,6.746832e-07,...,1.043371e-06,1.005386e-06,1.014945e-06,1.016438e-06,9.585875e-07,9.211429e-07,9.270179e-07,8.938150e-07,8.467285e-07,8.410706e-07
-86.141472,7.783782e-07,7.642489e-07,7.554475e-07,7.512014e-07,7.494529e-07,7.440023e-07,7.534388e-07,7.484928e-07,7.397288e-07,7.412472e-07,...,9.679144e-07,9.930501e-07,9.276194e-07,9.580637e-07,9.059450e-07,8.706998e-07,8.499051e-07,8.133784e-07,7.914716e-07,7.896999e-07
-84.742386,1.041085e-06,1.039284e-06,1.040344e-06,1.049900e-06,1.060029e-06,1.063219e-06,1.070073e-06,1.079799e-06,1.085161e-06,1.085857e-06,...,1.150984e-06,1.125480e-06,1.066172e-06,1.070847e-06,1.054800e-06,1.036703e-06,1.044141e-06,1.040945e-06,1.040274e-06,1.044430e-06
-83.342598,1.407368e-06,1.378008e-06,1.345738e-06,1.315741e-06,1.308426e-06,1.288609e-06,1.280522e-06,1.262286e-06,1.241339e-06,1.244224e-06,...,1.574488e-06,1.587396e-06,1.568889e-06,1.542182e-06,1.530070e-06,1.500665e-06,1.483569e-06,1.464641e-06,1.445240e-06,1.423410e-06
-81.942467,1.471507e-06,1.412269e-06,1.356593e-06,1.283661e-06,1.215027e-06,1.164251e-06,1.108047e-06,1.058651e-06,1.015321e-06,9.884709e-07,...,2.033032e-06,2.015487e-06,1.996671e-06,1.960370e-06,1.911490e-06,1.848096e-06,1.764632e-06,1.705394e-06,1.620996e-06,1.551488e-06
-80.542145,1.652186e-06,1.582822e-06,1.513313e-06,1.431963e-06,1.385713e-06,1.325140e-06,1.255218e-06,1.209177e-06,1.153289e-06,1.105668e-06,...,2.558888e-06,2.541636e-06,2.489525e-06,2.416039e-06,2.328655e-06,2.235187e-06,2.155240e-06,2.033637e-06,1.871468e-06,1.754503e-06
-79.141708,2.018894e-06,1.885578e-06,1.777444e-06,1.697303e-06,1.627110e-06,1.537826e-06,1.471922e-06,1.415199e-06,1.368018e-06,1.357794e-06,...,3.433475e-06,3.355223e-06,3.271434e-06,3.161988e-06,3.098266e-06,2.908355e-06,2.717661e-06,2.538467e-06,2.375599e-06,2.217573e-06
-77.741196,2.375112e-06,2.169358e-06,1.960048e-06,1.830111e-06,1.701772e-06,1.611450e-06,1.488631e-06,1.420678e-06,1.371058e-06,1.395804e-06,...,4.435786e-06,4.266038e-06,4.031639e-06,3.780113e-06,3.626003e-06,3.368363e-06,3.161669e-06,2.986097e-06,2.790995e-06,2.605248e-06
-76.340630,3.002707e-06,2.797371e-06,2.578995e-06,2.399389e-06,2.252330e-06,2.112775e-06,2.018548e-06,1.925144e-06,1.892580e-06,1.849656e-06,...,5.789106e-06,5.576721e-06,5.415387e-06,5.108515e-06,4.615413e-06,4.129436e-06,3.824391e-06,3.580251e-06,3.404121e-06,3.220558e-06
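
The labeling step in the cell above can be tried without the data file. The following is a minimal sketch that uses small synthetic arrays as stand-ins for `f['pr'][:]`, `f['lat'][:]`, and `f['lon'][:]`; the names `pr`, `lat`, `lon`, `grid`, and `precip` and all values are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins (hypothetical values) for the arrays read from the file.
# pr has shape (time, lat, lon) with a single time step, like f['pr'].
lat = np.array([-60.0, 0.0, 60.0])
lon = np.array([0.0, 90.0, 180.0, 270.0])
pr = np.arange(12, dtype="float32").reshape(1, 3, 4)

# squeeze() drops the length-1 time axis, leaving a 2-D (lat, lon) grid.
grid = pr.squeeze()

# Label rows with latitude and columns with longitude.
precip = pd.DataFrame(
    grid,
    index=pd.Index(lat, name="lat"),
    columns=pd.Index(lon, name="lon"),
)

# Label-based lookup: the grid cell at lat=0, lon=90.
value = precip.loc[0.0, 90.0]
print(precip.shape, value)
```

Passing plain NumPy arrays for the index and columns, rather than the netCDF4 variable objects, keeps the DataFrame independent of the open file.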


## Exercise (export to scientific formats)

Using the NYC Harbor data set—and perhaps also the normalization work done in the previous exercise—save the data to compact scientific data formats, HDF5 and/or NetCDF. 

* Take advantage of the option of saving multiple datasets into HDF5 or NetCDF to break down the data.
* Store the data in its native types per column/cell (Pandas does a good job of inferring data types).
* How large is the resulting HDF5/NetCDF file compared to the original Excel file?
* Compose some interesting queries of the database to extract patterns or features of the data.

## Review of HDF5 and NetCDF

The tutorial at http://docs.h5py.org/en/latest/quick.html is likely to be useful.
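
For comparison with the NetCDF examples, here is a minimal h5py sketch of the same write-then-read pattern. The file name, group name, dataset name, and values are assumptions chosen for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Write to a temporary directory so the sketch leaves nothing behind.
path = os.path.join(tempfile.mkdtemp(), "example.h5")

with h5py.File(path, "w") as h5:
    # Groups behave like folders; datasets behave like NumPy arrays.
    grp = h5.create_group("measurements")
    dset = grp.create_dataset("temp_c", data=np.array([21.5, 22.0, 19.8]))
    # Metadata is attached through the dict-like .attrs interface.
    dset.attrs["units"] = "degC"

with h5py.File(path, "r") as h5:
    dset = h5["measurements/temp_c"]
    values = dset[:]  # read the whole dataset into memory
    units = dset.attrs["units"]

print(values, units)
```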

A new, empty NetCDF file is created with:

```python
>>> from netCDF4 import Dataset
>>> rootgrp = Dataset("test.nc", "w", format="NETCDF4")
>>> print(rootgrp.data_model)
NETCDF4
>>> rootgrp.close()
```

The tutorial at http://nbviewer.ipython.org/github/Unidata/netcdf4-python/blob/master/examples/writing_netCDF.ipynb is likely to be useful.