# Reading netCDF data
- adapted from https://github.com/Unidata/netcdf4-python/blob/master/examples/reading_netCDF.ipynb
- requires [numpy](http://numpy.scipy.org) and netCDF/HDF5 C libraries.
- see e.g. github site: https://github.com/Unidata/netcdf4-python
- Online docs: http://unidata.github.io/netcdf4-python/

## Interactively exploring a netCDF File

Let's explore a netCDF file from the *Atlantic Real-Time Ocean Forecast System*.

First, import netcdf4-python and numpy.

In [None]:
import netCDF4
import numpy as np

In [None]:
f = netCDF4.Dataset('../../data/rtofs_glo_3dz_f006_6hrly_reg3.nc')
print(f) 

## Create a netCDF4.Dataset object
- **`f`** is a `Dataset` object, representing an open netCDF file.
- printing the object gives you summary information, similar to *`ncdump -h`*.

In [None]:
f = netCDF4.Dataset('/pool/data/CMIP6/data/CMIP/MPI-M/MPI-ESM1-2-LR/1pctCO2/r1i1p1f1/day/tasmax/gn/v20190710/tasmax_day_MPI-ESM1-2-LR_1pctCO2_r1i1p1f1_gn_18500101-18691231.nc')
print(f) 

## Access a netCDF variable
- variable objects are stored by name in the **`variables`** dict.
- printing the variable yields summary info (including all the attributes).
- no actual data is read yet (we just have a reference to the variable object with its metadata).

In [None]:
print(f.variables.keys()) # get all variable names
temp = f.variables['temperature']  # temperature variable
print(temp) 

## List the Dimensions

- All variables in a netCDF file have an associated shape, specified by a list of dimensions.
- Let's list all the dimensions in this netCDF file.
- Note that the **`time`** dimension is special (*`unlimited`*), which means it can be appended to.

In [None]:
for d in f.dimensions.items():
    print(d)

Each variable has a **`dimensions`** and a **`shape`** attribute.

In [None]:
temp.dimensions

In [None]:
temp.shape

### Each dimension typically has a variable associated with it (called a *coordinate* variable).
- *Coordinate variables* are 1D variables that have the same name as dimensions.
- Coordinate variables and *auxiliary coordinate variables* (named by the *coordinates* attribute) locate values in time and space.

In [None]:
mt = f.variables['MT']
depth = f.variables['Depth']
x,y = f.variables['X'], f.variables['Y']
print(mt)
print(x)       

## Accessing data from a netCDF variable object

- netCDF variable objects behave much like numpy arrays.
- slicing a netCDF variable object returns a numpy array with the data.
- Boolean array and integer sequence indexing behaves differently for netCDF variables than for numpy arrays. Only 1-d boolean arrays and integer sequences are allowed, and these indices work independently along each dimension (similar to the way vector subscripts work in Fortran).
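
The difference from numpy is easiest to see on a small array. The sketch below uses plain numpy only (no netCDF file is read). `np.ix_` is used here to emulate the per-dimension behaviour described above; the array `a` and the index lists are made-up illustration values.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)  # hypothetical 3x4 array holding 0..11
rows = [0, 2]
cols = [1, 3]

# numpy fancy indexing pairs the lists element by element: (0, 1) and (2, 3)
paired = a[rows, cols]

# indices applied independently along each dimension: every row/column combination
orthogonal = a[np.ix_(rows, cols)]

print(paired.tolist())      # [1, 11]
print(orthogonal.tolist())  # [[1, 3], [9, 11]]
```

With the same two index lists, a netCDF variable would be expected to return the 2x2 result rather than the two-element one.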

In [None]:
time = mt[:]  # Reads the netCDF variable MT, array of one element
print(time) 

In [None]:
dpth = depth[:] # examine depth array
print(dpth) 


In [None]:
xx,yy = x[:],y[:]
print('shape of temp variable: %s' % repr(temp.shape))
tempslice = temp[0, dpth > 400, yy > yy.max()/2, xx > xx.max()/2]
print('shape of temp slice: %s' % repr(tempslice.shape))
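
To see how the shape of such a slice comes about, the same kind of selection can be emulated in plain numpy. This is only a sketch: the array `data` and the coordinate values are hypothetical stand-ins for the time-0 slab and its coordinates, and `np.ix_` is used to apply each boolean mask along its own dimension.

```python
import numpy as np

# Hypothetical (depth, y, x) array and 1-d coordinate arrays
data = np.zeros((4, 5, 6))
depth_vals = np.array([0.0, 100.0, 500.0, 1000.0])
y_vals = np.arange(5.0)
x_vals = np.arange(6.0)

# Each 1-d boolean mask selects along one dimension, independently of the others
sub = data[np.ix_(depth_vals > 400,
                  y_vals > y_vals.max() / 2,
                  x_vals > x_vals.max() / 2)]

print(sub.shape)  # (2, 2, 3)
```

Each axis length of the result equals the number of `True` entries in the corresponding mask.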

## What is the sea surface temperature and salinity at 50N, 140W?
### Finding the latitude and longitude indices of 50N, 140W

- The `X` and `Y` dimensions don't look like longitudes and latitudes.
- Use the auxiliary coordinate variables named in the `coordinates` variable attribute, `Latitude` and `Longitude`.

In [None]:
lat, lon = f.variables['Latitude'], f.variables['Longitude']
print(lat)

Aha!  So we need to find array indices `iy` and `ix` such that `Latitude[iy, ix]` is close to 50.0 and `Longitude[iy, ix]` is close to -140.0 ...

In [None]:
# extract lat/lon values (in degrees) to numpy arrays
latvals = lat[:]; lonvals = lon[:] 
# a function to find the index of the grid point closest
# (in squared distance) to the given lat/lon value.
def getclosest_ij(lats,lons,latpt,lonpt):
    # find squared distance of every point on grid
    dist_sq = (lats-latpt)**2 + (lons-lonpt)**2  
    # 1D index of minimum dist_sq element
    minindex_flattened = dist_sq.argmin()    
    # Get 2D index for latvals and lonvals arrays from 1D index
    return np.unravel_index(minindex_flattened, lats.shape)
iy_min, ix_min = getclosest_ij(latvals, lonvals, 50., -140)
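
The `argmin` plus `np.unravel_index` step can be checked on a small synthetic grid. The grid and the target point below are hypothetical and are not taken from the RTOFS file; this is a sketch of the same technique, not a replacement for the function above.

```python
import numpy as np

# Hypothetical regular grid: longitudes -150..-130 and latitudes 40..60 in 5-degree steps
lon_1d = np.arange(-150.0, -129.0, 5.0)
lat_1d = np.arange(40.0, 61.0, 5.0)
lons, lats = np.meshgrid(lon_1d, lat_1d)  # both have shape (5, 5), indexed [iy, ix]

# Squared distance (in degrees) from every grid point to the target 51N, 134W
dist_sq = (lats - 51.0)**2 + (lons - (-134.0))**2

# argmin works on the flattened array; unravel_index converts back to (iy, ix)
iy, ix = np.unravel_index(dist_sq.argmin(), lats.shape)

print(int(iy), int(ix))                        # 2 3
print(float(lats[iy, ix]), float(lons[iy, ix]))  # 50.0 -135.0
```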

### Now we have all the information we need to find our answer.


```
|----------+--------|
| Variable |  Index |
|----------+--------|
| MT       |      0 |
| Depth    |      0 |
| Y        | iy_min |
| X        | ix_min |
|----------+--------|
```

### What is the sea surface temperature and salinity at the specified point?

In [None]:
sal = f.variables['salinity']
# Read values out of the netCDF file for temperature and salinity
print('%7.4f %s' % (temp[0,0,iy_min,ix_min], temp.units))
print('%7.4f %s' % (sal[0,0,iy_min,ix_min], sal.units))

## Packed integer data
Variables with `scale_factor` and `add_offset` attributes are handled automatically on read.

- short integer data will automatically be returned as float data, with the scale and offset applied.
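
As a rough illustration of the arithmetic involved, unpacking follows the usual convention `unpacked = packed * scale_factor + add_offset`. The sketch below reproduces it with plain numpy; the packed values and the two attribute values are hypothetical and are not taken from any file used here.

```python
import numpy as np

# Hypothetical packed data as stored on disk (16-bit integers)
packed = np.array([-32768, 0, 32767], dtype=np.int16)

# Hypothetical attribute values
scale_factor = 0.01
add_offset = 273.15

# Apply scale and offset; the result is a floating point array
unpacked = packed * scale_factor + add_offset

print(unpacked.dtype.kind)  # f
```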

## Dealing with dates and times
- time variables usually measure time relative to a fixed date using a certain calendar, with units specified like ***`hours since YYYY-MM-DD hh:mm:ss`***.
- the **`num2date`** and **`date2num`** convenience functions convert between these numeric time coordinates and handy python datetime instances.
- **`date2index`** finds the time index corresponding to a datetime instance.

In [None]:
from netCDF4 import num2date, date2num, date2index
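
To make the meaning of such units concrete, the conversion can be emulated with the standard library for the simple case of the ordinary Gregorian calendar. This is only a sketch of the idea behind `num2date` and `date2num`, not their implementation; the reference date and the helper names `hours_to_datetime` and `datetime_to_hours` are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical reference date, as in units 'hours since 2000-01-01 00:00:00'
origin = datetime(2000, 1, 1)

def hours_to_datetime(hours):
    """Numeric time value -> datetime (the direction num2date handles)."""
    return origin + timedelta(hours=hours)

def datetime_to_hours(dt):
    """datetime -> numeric time value (the direction date2num handles)."""
    return (dt - origin).total_seconds() / 3600.0

print(hours_to_datetime(30.0))                        # 2000-01-02 06:00:00
print(datetime_to_hours(datetime(2000, 1, 2, 6, 0)))  # 30.0
```

The library functions additionally handle other time units and non-standard calendars, which this sketch ignores.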


## Simple multi-file aggregation

What if you have a bunch of netCDF files, each with data for a different year, and you want to access all the data as if it were in one file?

In [None]:
!ls -l ../../data/prmsl*nc

**`MFDataset`** uses file globbing to patch together all the files into one big Dataset.
You can also pass it a list of specific files.

Limitations:

- It can only aggregate the data along the leftmost dimension of each variable.
- It only works with `NETCDF3` or `NETCDF4_CLASSIC` formatted files.
- It is kind of slow.
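
Conceptually, aggregating along the leftmost dimension resembles concatenating the per-file arrays along axis 0. The numpy sketch below uses made-up shapes and values; it is an analogy for the first limitation, not a description of how `MFDataset` works internally.

```python
import numpy as np

# Hypothetical per-file arrays with dimensions (time, lat, lon)
year1 = np.zeros((2, 3, 4))
year2 = np.ones((5, 3, 4))

# Join along the leftmost (time) axis; all other axes must have matching lengths
combined = np.concatenate([year1, year2], axis=0)

print(combined.shape)  # (7, 3, 4)
```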

In [None]:
mf = netCDF4.MFDataset('../../data/prmsl*nc')
times = mf.variables['time']
dates = num2date(times[:],times.units)
print('starting date = %s' % dates[0])
print('ending date = %s'% dates[-1])
prmsl = mf.variables['prmsl']
print('times shape = %s' % times.shape)
print('prmsl dimensions = %s, prmsl shape = %s' %\
     (prmsl.dimensions, prmsl.shape))

## Closing your netCDF file

It's good to close netCDF files, but not actually necessary when Dataset is open for read access only.


In [None]:
f.close()


## That's it!

Now you're ready to start exploring your data interactively.

To be continued with **Writing netCDF data** ....