# 3. Xarray & PyNIO

xarray and PyNIO are efficient and easy-to-use tools for reading and writing scientific data, because their internal data structures are netCDF-like. Xarray can read netCDF and GRIB files (the latter with the help of the cfgrib engine, see section 3.6) and handles metadata that follow the netCDF CF conventions. The same is true for PyNIO, which can additionally read HDF and WRF files.

See also http://xarray.pydata.org and https://github.com/NCAR/pynio.

The examples below show how to use xarray and PyNIO to read data from a file and to work with coordinates and metadata.


## 3.1 Read a netCDF file

We want to start directly with opening and reading the netCDF file tsurf.nc from the subdirectory data. 

Import the commonly used modules and define the variable fname (file name).

In [1]:
import numpy as np

fname = './data/tsurf.nc'

**1. xarray**

First we have to load the xarray module, and because we are lazy, we use the common abbreviation **xr** for it.

The function <i>**xr.open_dataset**</i> of xarray is used to read the content of the file. 

The variable name **_ds_**, short for **_dataset_**, is commonly used for the returned object.

In [2]:
import xarray as xr

ds = xr.open_dataset(fname)

print(ds)

<xarray.Dataset>
Dimensions:  (lat: 96, lon: 192, time: 40)
Coordinates:
  * time     (time) datetime64[ns] 2001-01-01 ... 2001-01-10T18:00:00
  * lon      (lon) float64 -180.0 -178.1 -176.2 -174.4 ... 174.4 176.2 178.1
  * lat      (lat) float64 88.57 86.72 84.86 83.0 ... -83.0 -84.86 -86.72 -88.57
Data variables:
    tsurf    (time, lat, lon) float32 ...
Attributes:
    CDI:          Climate Data Interface version 1.9.6 (http://mpimet.mpg.de/...
    Conventions:  CF-1.6
    history:      Thu Oct 10 16:08:50 2019: cdo selname,tsurf rectilinear_gri...
    CDO:          Climate Data Operators version 1.9.6 (http://mpimet.mpg.de/...


Printing the dataset content gives you an overview of the dimension and variable names, their sizes, and the global file attributes.
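
To make the pieces of that overview more tangible, here is a minimal sketch that builds a tiny dataset in memory instead of reading tsurf.nc. The name `demo`, the grid sizes and the constant temperature value are assumptions made only for illustration; the structure (dimensions, coordinates, one data variable, global attributes) mirrors the printout above.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for tsurf.nc: same dimension names, much smaller grid.
time = np.datetime64("2001-01-01T00") + np.arange(4) * np.timedelta64(6, "h")
time = time.astype("datetime64[ns]")
lat = np.linspace(60.0, -60.0, 3)
lon = np.linspace(-180.0, 90.0, 4)
data = np.full((time.size, lat.size, lon.size), 273.15, dtype="float32")

demo = xr.Dataset(
    data_vars={"tsurf": (("time", "lat", "lon"), data, {"units": "K"})},
    coords={"time": time, "lat": lat, "lon": lon},
    attrs={"Conventions": "CF-1.6"},
)

print(dict(demo.sizes))      # dimension names and lengths
print(list(demo.data_vars))  # names of the data variables
print(demo.attrs)            # global attributes
```

Calling `print(demo)` would give the same kind of summary as for the real file.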

<br>

**2. PyNIO**

Like above, we have to import the module first, but this time it's **Nio** (that's short enough). 

PyNIO's function to read the file is <i>**Nio.open_file**</i>.

The name **_f_** for the file object is often used in NCL scripts, which is why we use it here as well, but you can call it whatever you want.

In [3]:
import Nio

f =  Nio.open_file(fname,"r")

print(f)

Nio file:	tsurf.nc
   global attributes:
      CDI : Climate Data Interface version 1.9.6 (http://mpimet.mpg.de/cdi)
      Conventions : CF-1.6
      history : Thu Oct 10 16:08:50 2019: cdo selname,tsurf rectilinear_grid_2D.nc tsurf.nc
      CDO : Climate Data Operators version 1.9.6 (http://mpimet.mpg.de/cdo)
   dimensions:
      time = 40 // unlimited
      lon = 192
      lat = 96
   variables:
      double time [ time ]
         standard_name :	time
         units :	hours since 2001-01-01 00:00:00
         calendar :	standard
         axis :	T
      double lon [ lon ]
         standard_name :	longitude
         long_name :	longitude
         units :	degrees_east
         axis :	X
      double lat [ lat ]
         standard_name :	latitude
         long_name :	latitude
         units :	degrees_north
         axis :	Y
      float tsurf [ time, lat, lon ]
         long_name :	surface temperature
         units :	K
         code :	169
         table :	128



This is very similar to the _ncdump_ output, and corresponds to the output from xarray.

<br>

## 3.2 Show variable names and coordinates

It is always good to have a closer look at your data, and this can be done very easily with xarray and PyNIO.

OK, show me the variables stored in that file (oops, just one :D) and the coordinate variables, too.

**1. xarray**

In [4]:
coords    = ds.coords
variables = ds.variables

print('--> coords:    \n\n', coords)
print('--> variables: \n\n', variables)

--> coords:    

 Coordinates:
  * time     (time) datetime64[ns] 2001-01-01 ... 2001-01-10T18:00:00
  * lon      (lon) float64 -180.0 -178.1 -176.2 -174.4 ... 174.4 176.2 178.1
  * lat      (lat) float64 88.57 86.72 84.86 83.0 ... -83.0 -84.86 -86.72 -88.57
--> variables: 

 Frozen(OrderedDict([('time', <xarray.IndexVariable 'time' (time: 40)>
array(['2001-01-01T00:00:00.000000000', '2001-01-01T06:00:00.000000000',
       '2001-01-01T12:00:00.000000000', '2001-01-01T18:00:00.000000000',
       '2001-01-02T00:00:00.000000000', '2001-01-02T06:00:00.000000000',
       '2001-01-02T12:00:00.000000000', '2001-01-02T18:00:00.000000000',
       '2001-01-03T00:00:00.000000000', '2001-01-03T06:00:00.000000000',
       '2001-01-03T12:00:00.000000000', '2001-01-03T18:00:00.000000000',
       '2001-01-04T00:00:00.000000000', '2001-01-04T06:00:00.000000000',
       '2001-01-04T12:00:00.000000000', '2001-01-04T18:00:00.000000000',
       '2001-01-05T00:00:00.000000000', '2001-01-05T06:00:00.00000000

Ah, that's better. Here we can see the time displayed in a readable way, because xarray decodes the time values to NumPy's datetime64 data type under the hood. The variable and coordinate attributes are shown as well.
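
If you only need the names rather than the full printout, the mappings can be turned into plain lists. The following sketch uses a small synthetic dataset (the name `demo` and its values are assumptions, not the tutorial file) to show the difference between `coords`, `data_vars` and `variables`.

```python
import numpy as np
import xarray as xr

# Small synthetic dataset with one data variable and two coordinates.
demo = xr.Dataset(
    {"tsurf": (("lat", "lon"), np.zeros((2, 3), dtype="float32"))},
    coords={"lat": [10.0, -10.0], "lon": [0.0, 120.0, 240.0]},
)

coord_names = list(demo.coords)     # coordinate variables only
var_names = list(demo.data_vars)    # data variables only
all_names = list(demo.variables)    # coordinates and data variables together

print(coord_names, var_names, all_names)
```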

<br>

**2. PyNIO**

Let us see how PyNIO will do that.


In [5]:
coords_nio    = f.dimensions.keys()
variables_nio = f.variables.keys()

print(coords_nio)
print(variables_nio)

# print(f.variables['varName'])


dict_keys(['time', 'lon', 'lat'])
dict_keys(['time', 'lon', 'lat', 'tsurf'])


In [21]:
coord_nio = f.dimensions.keys()
varNames  = f.variables.keys()

for i in varNames:
    print(f.variables[i])
    print(f.variables[i][:])

Variable: time
Type: double
Total Size: 320 bytes
            40 values
Number of Dimensions: 1
Dimensions and sizes:	[time | 40]
Coordinates: 
            time: [   0.. 234]
Number of Attributes: 4
         standard_name :	time
         units :	hours since 2001-01-01 00:00:00
         calendar :	standard
         axis :	T

[  0.   6.  12.  18.  24.  30.  36.  42.  48.  54.  60.  66.  72.  78.
  84.  90.  96. 102. 108. 114. 120. 126. 132. 138. 144. 150. 156. 162.
 168. 174. 180. 186. 192. 198. 204. 210. 216. 222. 228. 234.]
Variable: lon
Type: double
Total Size: 1536 bytes
            192 values
Number of Dimensions: 1
Dimensions and sizes:	[lon | 192]
Coordinates: 
            lon: [-180..178.125]
Number of Attributes: 4
         standard_name :	longitude
         long_name :	longitude
         units :	degrees_east
         axis :	X

[-180.    -178.125 -176.25  -174.375 -172.5   -170.625 -168.75  -166.875
 -165.    -163.125 -161.25  -159.375 -157.5   -155.625 -153.75  -151.875
 -150. 


## 3.3 Select variable and coordinate variables

So far we have only created a dataset (xarray) and a file object (PyNIO), each containing the coordinate variables and the variable data. Now we want to select the variable **tsurf** and the coordinate variables **lat** and **lon**.

**1. xarray**


In [22]:
tsurf = ds.tsurf
lat   = tsurf.lat
lon   = tsurf.lon

print('Variable tsurf: \n', tsurf.data)
print('\nCoordinate variable lat: \n', lat.data)
print('\nCoordinate variable lon: \n', lon.data)

Variable tsurf: 
 [[[242.38832 242.35121 242.23402 ... 242.62465 242.6266  242.63051]
  [246.98988 247.12074 247.23207 ... 246.65785 246.79262 246.9059 ]
  [246.2145  246.40785 246.66566 ... 245.78285 245.78285 246.00941]
  ...
  [256.27307 256.78674 257.43127 ... 254.5895  255.06606 255.69496]
  [242.54457 242.53676 242.91371 ... 241.52113 241.91566 242.33168]
  [236.11879 235.98012 235.96059 ... 236.11488 236.09145 236.07191]]

 [[245.4956  245.50732 245.51123 ... 245.4956  245.50732 245.4956 ]
  [246.65186 246.68701 246.75146 ... 246.62256 246.63232 246.6128 ]
  [244.76709 243.81787 243.66162 ... 245.00342 245.20264 245.35693]
  ...
  [257.26514 257.70264 258.19873 ... 255.81396 256.26318 256.78076]
  [243.08154 243.18115 243.6128  ... 242.05225 242.49365 242.85107]
  [236.3374  236.19873 236.17725 ... 236.33936 236.31396 236.29443]]

 [[246.92685 246.9542  246.97568 ... 246.87021 246.88583 246.90927]
  [244.24521 244.28036 243.90536 ... 245.24911 244.33505 244.1749 ]
  [242.99521 2
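
Besides the attribute style `ds.tsurf`, xarray also supports dictionary-style access, which is handy when a variable name is stored in a string or collides with a method name. A minimal sketch on synthetic data (the name `demo` and the values are assumptions for illustration):

```python
import numpy as np
import xarray as xr

demo = xr.Dataset(
    {"tsurf": (("lat", "lon"), np.arange(6, dtype="float32").reshape(2, 3))},
    coords={"lat": [10.0, -10.0], "lon": [0.0, 120.0, 240.0]},
)

by_attr = demo.tsurf     # attribute style, as used above
by_key = demo["tsurf"]   # dictionary style, works for any variable name

print(by_attr.identical(by_key))   # both give the same DataArray
print(by_key["lat"].values)        # coordinates travel with the variable
```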



**2. PyNIO**

If you use PyNIO to open a file, the handling differs a little bit. While with xarray you can retrieve the coordinate variable data directly from the selected variable (e.g. **tsurf.lat**), with PyNIO you get them from the file object.

In [23]:
tsurf_nio = f.variables['tsurf'][:,:,:]
lat_nio   = f.variables['lat'][:]
lon_nio   = f.variables['lon'][:]

print('Variable tsurf_nio: \n', tsurf_nio)
print('\nCoordinate variable lat_nio: \n', lat_nio)
print('\nCoordinate variable lon_nio: \n', lon_nio)


Variable tsurf_nio: 
 [[[242.38832 242.35121 242.23402 ... 242.62465 242.6266  242.63051]
  [246.98988 247.12074 247.23207 ... 246.65785 246.79262 246.9059 ]
  [246.2145  246.40785 246.66566 ... 245.78285 245.78285 246.00941]
  ...
  [256.27307 256.78674 257.43127 ... 254.5895  255.06606 255.69496]
  [242.54457 242.53676 242.91371 ... 241.52113 241.91566 242.33168]
  [236.11879 235.98012 235.96059 ... 236.11488 236.09145 236.07191]]

 [[245.4956  245.50732 245.51123 ... 245.4956  245.50732 245.4956 ]
  [246.65186 246.68701 246.75146 ... 246.62256 246.63232 246.6128 ]
  [244.76709 243.81787 243.66162 ... 245.00342 245.20264 245.35693]
  ...
  [257.26514 257.70264 258.19873 ... 255.81396 256.26318 256.78076]
  [243.08154 243.18115 243.6128  ... 242.05225 242.49365 242.85107]
  [236.3374  236.19873 236.17725 ... 236.33936 236.31396 236.29443]]

 [[246.92685 246.9542  246.97568 ... 246.87021 246.88583 246.90927]
  [244.24521 244.28036 243.90536 ... 245.24911 244.33505 244.1749 ]
  [242.995

The variables have different data types:

- xarray returns the variable as a labeled array object called DataArray.
- PyNIO returns the variable data as a NumPy ndarray.

In [24]:
print(type(tsurf))
print(type(tsurf_nio))

<class 'xarray.core.dataarray.DataArray'>
<class 'numpy.ndarray'>
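
The practical difference is that a DataArray keeps its coordinates, so values can be selected by label, while the underlying NumPy array is still available when needed. The sketch below uses a small synthetic array (all names and numbers are assumptions for illustration).

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.array([[280.0, 281.0, 282.0], [270.0, 271.0, 272.0]], dtype="float32"),
    dims=("lat", "lon"),
    coords={"lat": [45.0, -45.0], "lon": [0.0, 120.0, 240.0]},
    name="tsurf",
)

raw = da.values                                          # plain numpy.ndarray, labels dropped
point = da.sel(lat=40.0, lon=130.0, method="nearest")    # label-based lookup
point_value = float(point.values)

print(type(raw).__name__, raw.shape)
print(point_value)
```

With the PyNIO array the same lookup would require finding the nearest indices in `lat_nio` and `lon_nio` yourself.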


## 3.3 Dimensions, shape and size

To get more information about the dimensions, shape and size of a variable, we can use the appropriate attributes.

**1. xarray**


In [25]:
dimensions = ds.dims
shape = tsurf.shape
size  = tsurf.size
rank  = len(shape)

print('dimensions: ', dimensions)
print('shape:      ', shape)
print('size:       ', size)
print('rank:       ', rank)

dimensions:  Frozen(SortedKeysDict({'time': 40, 'lon': 192, 'lat': 96}))
shape:       (40, 96, 192)
size:        737280
rank:        3



**2. PyNIO**


In [26]:
dimensions_nio = f.dimensions
shape_nio = tsurf_nio.shape
size_nio  = tsurf_nio.size
rank_nio  = len(shape_nio)   # or rank_nio = f.variables["tsurf"].rank

print('dimensions: ', dimensions_nio)
print('shape:      ', shape_nio)
print('size:       ', size_nio)
print('rank_nio:   ', rank_nio)

dimensions:  {'time': 40, 'lon': 192, 'lat': 96}
shape:       (40, 96, 192)
size:        737280
rank_nio:    3
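
As a small cross-check of how these quantities relate, the following sketch creates an empty array with the same shape as tsurf (the zeros are placeholder data, not the file content) and shows that the size is the product of the shape and the rank is the number of dimensions.

```python
import numpy as np
import xarray as xr

# Placeholder array with the same dimensions as tsurf in the tutorial file.
da = xr.DataArray(
    np.zeros((40, 96, 192), dtype="float32"),
    dims=("time", "lat", "lon"),
)

print(da.dims)                              # dimension names in storage order
print(dict(da.sizes))                       # name -> length for this variable
print(da.ndim)                              # rank, same as len(da.shape)
print(da.size == int(np.prod(da.shape)))    # size is the product of the shape
```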


## 3.4 Variable attributes

Variable attributes are very important for working with the data in a correct manner.

**1. xarray**


In [27]:
attributes = list(tsurf.attrs)

print('attributes: ', attributes)

attributes:  ['long_name', 'units', 'code', 'table']



**2. PyNIO**

To get the attributes we have to use the file variable object **f.variables['tsurf']** and not the numpy array **tsurf_nio**. 

In [28]:
attributes_nio = list(f.variables['tsurf'].attributes.keys())

print('attributes_nio: ', attributes_nio)

attributes_nio:  ['long_name', 'units', 'code', 'table']


Let's see how we can get the content of an attribute.

**1. xarray**


In [29]:
long_name = tsurf.long_name
units = tsurf.units

print('long_name: ', long_name)
print('units:     ', units)

long_name:  surface temperature
units:      K



**2. PyNIO**

And here we have to use the file variable object **f.variables['tsurf']** again.


In [30]:
long_name_nio = f.variables["tsurf"].attributes['long_name']
units_nio = f.variables["tsurf"].attributes['units']

print('long_name_nio: ', long_name_nio)
print('units_nio:     ', units_nio)

long_name_nio:  surface temperature
units_nio:      K
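
Not every file provides every attribute, so it can be useful to fall back to a default instead of failing. In xarray the attributes are stored in an ordinary dictionary, `attrs`, which makes this straightforward. The sketch below uses a synthetic DataArray; the default string `"n/a"` is an arbitrary choice.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.zeros((2, 2), dtype="float32"),
    dims=("lat", "lon"),
    attrs={"long_name": "surface temperature", "units": "K"},
)

units = da.attrs["units"]                          # raises KeyError if missing
std_name = da.attrs.get("standard_name", "n/a")    # falls back to a default

print(units, std_name)
```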


## 3.5 Time

Xarray and PyNIO handle times in totally different ways. Xarray converts the time values to readable times using NumPy's datetime64 data type internally, whereas PyNIO only returns the numeric values of the coordinate variable time.

**1. xarray**


In [31]:
time = ds.time.data

print('timestep 0: ', time[0])

timestep 0:  2001-01-01T00:00:00.000000000
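
The nanosecond precision in that printout is rarely needed. As a sketch, NumPy itself can shorten the string or convert the value to a standard Python `datetime` object; the value `t0` below is typed in by hand to match the output above.

```python
import numpy as np

t0 = np.datetime64("2001-01-01T00:00:00.000000000")

short = str(np.datetime_as_string(t0, unit="m"))   # string trimmed to minutes
as_datetime = t0.astype("datetime64[s]").item()    # datetime.datetime object

print(short)
print(as_datetime)
```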



**2. PyNIO**


In [32]:
time_nio =  f.variables['time'][:]

print('timestep 0: ', time_nio[0])

timestep 0:  0.0


The returned time value is the value stored in the netCDF file and it has to be converted to a date string.
To convert the time value to a string like xarray's above, the units and the calendar attribute have to be known. 
In this example, we use the **netCDF4** module to convert the time values.


In [33]:
import netCDF4

time_nio_units    = f.variables["time"].attributes['units']
time_nio_calendar = f.variables["time"].attributes['calendar']

date_nio = netCDF4.num2date(time_nio[0], units=time_nio_units, calendar=time_nio_calendar)

print('timestep 0: ', date_nio)

timestep 0:  2001-01-01 00:00:00
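
To illustrate what such a conversion does, here is a simplified sketch using only the Python standard library. It is not how netCDF4 is implemented; the helper `hours_since_to_datetime` is a hypothetical name, and it assumes units of the form `hours since YYYY-MM-DD HH:MM:SS` together with the standard calendar, as in this file.

```python
from datetime import datetime, timedelta

def hours_since_to_datetime(value, units):
    """Convert a numeric time value to a datetime for 'hours since ...' units."""
    unit, _, reference = units.partition(" since ")
    if unit != "hours":
        raise ValueError(f"unsupported time unit: {unit!r}")
    start = datetime.strptime(reference, "%Y-%m-%d %H:%M:%S")
    return start + timedelta(hours=float(value))

units = "hours since 2001-01-01 00:00:00"
first = hours_since_to_datetime(0.0, units)
last = hours_since_to_datetime(234.0, units)

print(first)   # 2001-01-01 00:00:00
print(last)    # 2001-01-10 18:00:00
```

For other units or non-standard calendars, a dedicated library is the safer choice.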


## 3.6 Read a GRIB file

To read a GRIB file, nothing has to be changed for PyNIO (except the file name), but xarray needs the additional module cfgrib, which is used as a so-called _engine_.
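
Because cfgrib is an optional extra, a script may want to check for it before trying to open a GRIB file. The following is one possible sketch using the standard library; the helper name `module_available` is hypothetical.

```python
import importlib.util

def module_available(name):
    """Return True if the named top-level module can be found in this environment."""
    return importlib.util.find_spec(name) is not None

has_cfgrib = module_available("cfgrib")
print("cfgrib available:", has_cfgrib)
```

If the check returns `False`, the module has to be installed before `engine='cfgrib'` can be used.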

**1. xarray**


In [34]:
import cfgrib

ds2 = xr.open_dataset('./data/MET9_IR108_cosmode_0909210000.grb2', engine='cfgrib')

variables2 = ds2.variables

print('--> variables2: \n\n', variables2)

--> variables2: 

 Frozen(OrderedDict([('time', <xarray.Variable ()>
array('2009-09-21T00:00:00.000000000', dtype='datetime64[ns]')
Attributes:
    long_name:      initial time of forecast
    standard_name:  forecast_reference_time), ('latitude', <xarray.Variable (y: 461, x: 421)>
[194081 values with dtype=float64]
Attributes:
    units:          degrees_north
    standard_name:  latitude
    long_name:      latitude), ('longitude', <xarray.Variable (y: 461, x: 421)>
[194081 values with dtype=float64]
Attributes:
    units:          degrees_east
    standard_name:  longitude
    long_name:      longitude), ('valid_time', <xarray.Variable ()>
array('2009-09-21T00:00:00.000000000', dtype='datetime64[ns]')
Attributes:
    standard_name:  time
    long_name:      time), ('p260532', <xarray.Variable (y: 461, x: 421)>
[194081 values with dtype=float32]
Attributes:
    GRIB_paramId:                             500393
    GRIB_shortName:                           OBSMSG_BT_IR10.8
    GRIB_uni


**2. PyNIO**


In [35]:
f2 =  Nio.open_file('./data/MET9_IR108_cosmode_0909210000.grb2',"r")

variables_nio2 = f2.variables.keys()

for i in variables_nio2:
    print(f2.variables[i])
    print(f2.variables[i][:])

Variable: SBTMP_P31_GRLL0_I207
Type: float
Total Size: 776324 bytes
            194081 values
Number of Dimensions: 2
Dimensions and sizes:	[ygrid_0 | 461] x [xgrid_0 | 421]
Coordinates: 
            ygrid_0: not a coordinate variable
            xgrid_0: not a coordinate variable
Number of Attributes: 17
         center :	Offenbach (RSMC)
         production_status :	Operational products
         long_name :	Scaled brightness temperature
         units :	numeric
         _FillValue :	1e+20
         coordinates :	gridlat_0 gridlon_0
         grid_type :	Rotated latitude/longitude
         parameter_discipline_and_category :	Space products, Image format products
         parameter_template_discipline_category_number :	[31, 3, 0, 2]
         forecast_time :	0
         forecast_time_units :	hours
         initial_time :	09/21/2009 (00:00)
         satellite_identifier :	72
         agency_and_instrument_description :	EUMETSAT - Spinning enhanced visible and infrared imager
         instru