# CDO - Climate Data Operators

CDO User Guide https://code.mpimet.mpg.de/projects/cdo/embedded/cdo.pdf

CDO Python bindings introduction https://code.mpimet.mpg.de/attachments/download/18824/cdo-bindings.pdf

### Table of contents

- About CDO
- Import CDO module
- Show CDO version
- Set temporary directory
- Delete temporary files
- List CDO operators
- List all operators starting with 'sel'
- Show information about an operator
- Turn debugging on/off
- Display information about the file content
- Show information about the variables value range, times, levels in more detail
- Copy file
- Select variables
- Select timesteps
- Operator chaining
- Select a sub-region
- Compute the field mean
- Assign data to variable
- Remapping
- Remap data to a grid of another file
- Create a land-sea mask
- Masking data
- Delete temporary files

<br>


## About CDO

See also https://code.mpimet.mpg.de/projects/cdo/wiki

CDO is a large tool set for working on climate and NWP model data. NetCDF 3/4, GRIB 1/2 including SZIP (or AEC) and JPEG compression, EXTRA, SERVICE and IEG are supported as I/O formats. Apart from that, CDO can be used to analyse any kind of gridded data, including data unrelated to climate science. CDO has very small memory requirements and can process files larger than the physical memory.

CDO is open source and released under the terms of the GNU General Public License v2 (GPL).

<br>

## Import CDO module

Import the CDO module and create a `Cdo()` instance named `cdo`, which keeps the following calls short.

In [None]:
from cdo import *

cdo = Cdo()

<br>

## Show CDO version

python-cdo version:

In [None]:
print(cdo.__version__())


Based on CDO version:

In [None]:
print(cdo.version())

<br>

## Set temporary directory

Use the `tempdir` constructor option to store temporary files in a different directory. After a crash or a similar failure, remove anything left behind in that directory.

In [None]:
tempPath = './tmp/'
cdo = Cdo(tempdir=tempPath)

<br>

## Delete temporary files

In [None]:
cdo.cleanTempDir()

<br>

## List CDO operators

More than 800 operators are available.

In [None]:
# list of operators
cdo.operators

In [None]:
# get the number of existing operators of CDO
len(cdo.operators)

**Exercise** How would you get the first 50 operators?

<br>

## List all operators starting with 'sel'

Use a list comprehension:


In [None]:
[key for key, value in cdo.operators.items() if key.startswith('sel')]
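
The same filter can be wrapped in a small helper. The sketch below uses a hypothetical function, `operators_with_prefix`, and a made-up stand-in dictionary in place of `cdo.operators`, so it runs without CDO installed.

```python
def operators_with_prefix(operators, prefix):
    """Return the sorted operator names that start with `prefix`."""
    return sorted(name for name in operators if name.startswith(prefix))

# Stand-in for cdo.operators (the values are placeholders, not real CDO metadata)
demo_operators = {'selvar': 1, 'seltimestep': 1, 'fldmean': 1, 'sellonlatbox': 1}

sel_ops = operators_with_prefix(demo_operators, 'sel')
print(sel_ops)
```

With a real `Cdo()` instance, the call would be `operators_with_prefix(cdo.operators, 'sel')`.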

## Turn debugging on/off

Set the `debug` attribute of the `Cdo` object to turn debugging on or off.


In [None]:
# turn debugging output on
cdo.debug = True

# turn it off again
cdo.debug = False

<br>

## Show information about file content


In [None]:
!pwd

In [None]:
cdo.sinfon(input='../../data/rectilinear_grid_2D.nc')

<br>

## Display information about the file content


In [None]:
infile = '../../data/rectilinear_grid_2D.nc'

#cdo.sinfon(input=infile)

For comparison, here is xarray's dataset information:

In [None]:
import xarray as xr

ds = xr.open_dataset('../../data/rectilinear_grid_2D.nc')

# info() writes to stdout itself and returns None, so no print() is needed
ds.info()

<br>

## Show information about the variables value range, times, levels in more detail

In [None]:
cdo.infon(input='../../data/rectilinear_grid_2D.nc')

<br>

## Copy file

Copy the input file to outfile.nc and change the data precision from float32 to float64.

In [None]:
cdo.copy(input='../../data/rectilinear_grid_2D.nc', options='-b F64', output='outfile.nc')
cdo.sinfon(input='outfile.nc')
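
Writing float32 data as float64 widens the storage type, but it cannot bring back digits that were already rounded away. The following minimal sketch illustrates this with the standard library only; `as_float32` is a hypothetical helper, and the example makes no claim about how CDO converts values internally.

```python
import struct

def as_float32(value):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack('f', struct.pack('f', value))[0]

original = 0.1
stored_f32 = as_float32(original)   # the value a float32 variable would hold
widened_f64 = float(stored_f32)     # the same value stored with 64 bits

print(stored_f32 == original)       # the float32 rounding changed the value
print(abs(widened_f64 - original))  # the rounding error survives the widening
```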

<br>

To make things easier for us, we define the variables infile and outfile for the input and output files.


In [None]:
infile  = '../../data/rectilinear_grid_2D.nc'
outfile = 'outfile.nc'

<br>

## Select variables


In [None]:
cdo.selvar('tsurf', input=infile, output=outfile)
cdo.sinfon(input=outfile)

In [None]:
cdo.selvar('u10,v10', input=infile, output=outfile)
cdo.sinfon(input=outfile)

<br>

## Select timesteps

Select timesteps 1 and 10:

In [None]:
cdo.seltimestep('1,10', input=infile, output=outfile)
cdo.sinfon(input=outfile)

Select timesteps 1 to 10:

In [None]:
cdo.seltimestep('1/10', input=infile, output=outfile)
cdo.sinfon(input=outfile)
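
The two selections differ only in the separator: a comma lists individual timesteps, a slash gives a range. The sketch below expands both forms into explicit lists. The helper `expand_timesteps` is hypothetical and covers only these two forms, not the full syntax CDO accepts.

```python
def expand_timesteps(spec):
    """Expand a selection such as '1,10' or '1/10' into a list of integers."""
    steps = []
    for part in spec.split(','):
        if '/' in part:
            # 'first/last' is an inclusive range
            first, last = (int(x) for x in part.split('/'))
            steps.extend(range(first, last + 1))
        else:
            steps.append(int(part))
    return steps

print(expand_timesteps('1,10'))
print(expand_timesteps('1/10'))
```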

<br>

## Operator chaining

Operators with a fixed number of input files (streams) and only one output file can be combined. Each chained operator must be prefixed with '-', and the chain is executed from right to left.

Select the variables u10 and v10 and then select the first 10 timesteps:

In [None]:
cdo.seltimestep('1/10', input='-selvar,u10,v10 '+infile, output=outfile)
cdo.sinfon(input=outfile)

Combine operators and options in one call to repeat the selection above and change the output precision:

In [None]:
cdo.seltimestep('1/10', input='-selvar,u10,v10 '+infile, options='-b F64', output=outfile)
cdo.sinfon(input=outfile)
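
For longer chains it can help to build the input string from a list instead of concatenating by hand. This is a minimal sketch with a hypothetical helper, `build_chain`, and a placeholder file name `in.nc`. The operators are given in the order they appear in the string, so the last one listed is applied first.

```python
def build_chain(infile, *operators):
    """Join operators (outermost first) and the input file into one string."""
    parts = ['-' + op for op in operators]
    parts.append(infile)
    return ' '.join(parts)

# selvar is applied first, seltimestep second (right to left)
chain = build_chain('in.nc', 'seltimestep,1/10', 'selvar,u10,v10')
print(chain)
```

The resulting string can be passed as `input=` to any operator, for example `cdo.sinfon(input=chain)`.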

<br>

## Select a sub-region

In [None]:
cdo.sellonlatbox('20,30,70,80', input='-seltimestep,1 '+infile, output=outfile)
cdo.sinfon(input=outfile)
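
The parameters of `sellonlatbox` are given in the order `lon1,lon2,lat1,lat2`, so the call above selects 20 to 30 degrees longitude and 70 to 80 degrees latitude. The sketch below shows the idea on plain coordinate lists. `box_indices` is a hypothetical helper that does not attempt longitude wrap-around or any of CDO's grid handling.

```python
def box_indices(lons, lats, lon1, lon2, lat1, lat2):
    """Return the indices of the coordinates that fall inside the box."""
    lon_idx = [i for i, lon in enumerate(lons) if lon1 <= lon <= lon2]
    lat_idx = [j for j, lat in enumerate(lats) if lat1 <= lat <= lat2]
    return lon_idx, lat_idx

# Made-up coordinate axes for illustration
lons = [0, 10, 20, 30, 40]
lats = [60, 70, 80, 90]

lon_idx, lat_idx = box_indices(lons, lats, 20, 30, 70, 80)
print(lon_idx, lat_idx)
```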

<br>

## Compute the field mean

Compute the mean of the horizontal field for each timestep, which yields a time series.

In [None]:
cdo.fldmean(input="-selname,tsurf "+infile, output=outfile)
cdo.sinfon(input=outfile)

Plot the field mean data:

In [None]:
cdo.fldmean(input="-selname,tsurf "+infile, returnXArray='tsurf').plot()
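
According to the CDO documentation, `fldmean` is an area-weighted mean, so it generally differs from a plain average over all grid points. The sketch below approximates the cell-area weights on a regular lon/lat grid with the cosine of the latitude. That weighting is an assumption made for illustration, and `weighted_field_mean` is a hypothetical helper, not CDO's implementation.

```python
import math

def weighted_field_mean(field, lats):
    """Mean of field[lat][lon], weighting each row with cos(latitude)."""
    total = 0.0
    weight_sum = 0.0
    for row, lat in zip(field, lats):
        weight = math.cos(math.radians(lat))
        total += weight * sum(row)
        weight_sum += weight * len(row)
    return total / weight_sum

# Made-up 3 x 2 field: high values at the equator, low values at 60 degrees
lats = [-60.0, 0.0, 60.0]
field = [[1.0, 1.0], [4.0, 4.0], [1.0, 1.0]]

mean = weighted_field_mean(field, lats)
plain = sum(sum(row) for row in field) / 6
print(mean, plain)
```

The equatorial row counts twice as much as each row at 60 degrees, so the weighted mean lies above the plain average.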

<br>

## Assign data to variable

Assign the file variable precip to the Python variable precipitation:

In [None]:
precipitation = cdo.selvar('precip', input=infile, returnXArray='precip')
print(precipitation)

Print the value of the first timestep, first latitude, first longitude:

In [None]:
print(precipitation.values[0,0,0])

Compute the field mean:

In [None]:
tsurf_fldmean = cdo.fldmean(input=infile, returnXArray='tsurf')
print(tsurf_fldmean.values[0:10,0,0])

<br>

## Remapping

Interpolate the data of the input file to a new grid using bilinear interpolation.

To keep things simple, we select the variable tsurf and only the first timestep, and interpolate the data to a global 1 deg x 1 deg longitude/latitude grid (r360x180).

In [None]:
cdo.remapbil('r360x180', input='-seltimestep,1 -selvar,tsurf '+infile, output=outfile)
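
The grid name encodes the number of longitudes and latitudes of a global regular grid, which fixes the grid spacing. A minimal sketch that decodes such a name follows. `parse_regular_grid` is a hypothetical helper and handles only names of the form `r<nlon>x<nlat>`.

```python
import re

def parse_regular_grid(name):
    """Return (nlon, nlat, dlon, dlat) for a grid name such as 'r360x180'."""
    match = re.fullmatch(r'r(\d+)x(\d+)', name)
    if match is None:
        raise ValueError('not a regular grid name: ' + name)
    nlon, nlat = int(match.group(1)), int(match.group(2))
    # a global grid spans 360 degrees of longitude and 180 degrees of latitude
    return nlon, nlat, 360.0 / nlon, 180.0 / nlat

print(parse_regular_grid('r360x180'))
print(parse_regular_grid('r720x360'))
```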

To demonstrate the functionality, we increase the resolution of the input data to a 0.5 deg grid (r720x360) and plot the original and the interpolated data.

Original input data:

In [None]:
import cartopy.crs as ccrs
import xarray as xr
tsurf_orig = xr.open_dataset(infile).tsurf[0,:,:]

data = tsurf_orig.plot(subplot_kws=dict(projection=ccrs.PlateCarree(), facecolor='gray'), transform=ccrs.PlateCarree(),)

data.axes.set_extent([-10.,20.,30.,60.])
data.axes.coastlines()

In [None]:
tsurf = cdo.remapbil('r720x360', input='-seltimestep,1 '+infile, returnXArray='tsurf')

data = tsurf.plot(subplot_kws=dict(projection=ccrs.PlateCarree(), facecolor='gray'), transform=ccrs.PlateCarree(),)

data.axes.set_extent([-10.,20.,30.,60.])
data.axes.coastlines()

<br>

## Remap data to a grid of another file

In [None]:
cdo.topo(options='-f nc', output='topo.nc')

tsurf_2 = cdo.remapbil('topo.nc', input="-seltimestep,1 "+infile, returnXArray='tsurf')

<br>

## Create a land-sea mask

CDO provides a global 0.5 degree topography dataset that can be used to generate a land-sea mask. First, we interpolate the topography data to the grid of the data file. Then, with the operator `gtc`, we set all values greater than 0.5 m to 1 and all other values to 0.

In [None]:
lsm = cdo.gtc(0.5, input='-remapbil,'+infile+' -topo', returnXArray='topo')
print(lsm)
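
What `gtc` does to each grid point can be sketched in plain Python. The function `greater_than_constant` and the small topography array below are made up for illustration, and missing values are not handled.

```python
def greater_than_constant(values, threshold):
    """Return 1.0 where a value exceeds the threshold and 0.0 elsewhere."""
    return [[1.0 if v > threshold else 0.0 for v in row] for row in values]

# Made-up heights in metres: ocean depths are negative, land is positive
topo = [[-120.0, 0.3],
        [0.5, 250.0]]

mask = greater_than_constant(topo, 0.5)
print(mask)
```

Note that a value equal to the threshold is not greater than it, so it is set to 0.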

In [None]:
p = lsm.plot(subplot_kws=dict(projection=ccrs.PlateCarree(), facecolor='gray'), transform=ccrs.PlateCarree(),)
p.axes.set_extent([-20.,20.,30.,60.])
p.axes.coastlines()

<br>

## Masking data

Before we can use a mask on a data variable we need to create a mask file using the same grid as the data variable.

In [None]:
cdo.setname('lsm', input='-gtc,0.5 -remapbil,'+infile+' -topo', options='-f nc', output='lsm.nc')

Now, we want to get only tsurf values over land for timestep 1.

In [None]:
masked = cdo.setctomiss(0, input='-mul lsm.nc -seltimestep,1 '+infile, returnXArray='tsurf')
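
The chain above first multiplies the data by the mask and then turns every zero into a missing value. The following sketch reproduces these two steps on flat lists, using NaN as the missing value. `mask_with` is a hypothetical helper and the numbers are made up.

```python
import math

def mask_with(field, mask):
    """Multiply field by a 0/1 mask, then replace zeros with NaN."""
    product = [f * m for f, m in zip(field, mask)]
    return [math.nan if p == 0 else p for p in product]

field = [280.0, 285.5, 0.0, 290.0]
mask = [1, 0, 1, 1]

masked_values = mask_with(field, mask)
print(masked_values)
```

The third value shows a side effect of this approach: a genuine data value of 0 also becomes missing. For a temperature field in Kelvin this is unlikely to matter.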

Let's see what it looks like.

In [None]:
p = masked.plot(subplot_kws=dict(projection=ccrs.PlateCarree(), facecolor='gray'), transform=ccrs.PlateCarree(),)
p.axes.set_extent([-20.,20.,30.,60.])
p.axes.coastlines()

Use variables instead of output files.

In [None]:
lsm = cdo.setname('lsm', input='-gtc,0.5 -remapbil,'+infile+' -topo', options='-f nc', returnXArray='lsm')

In [None]:
tsurf = cdo.seltimestep(1, input='-selvar,tsurf '+infile, returnXArray='tsurf')

In [None]:
masked = tsurf * lsm
print(masked)

Plot the masked data.

In [None]:
# keep the plot handle so that the extent and coastlines apply to this figure
p = masked.plot(subplot_kws=dict(projection=ccrs.PlateCarree(), facecolor='gray'), transform=ccrs.PlateCarree(),
           cbar_kwargs={'orientation': 'horizontal'})
p.axes.set_extent([-20.,20.,30.,60.])
p.axes.coastlines()

Now we want to mask the land part of the data instead. The easiest way is to invert the mask: set all zeros to one and all ones to NaN. Missing values are not drawn, so the gray axes background (`facecolor='gray'`) shows through.

In [None]:
import numpy as np

# invert the mask: ocean (0) becomes 1, land (1) becomes NaN
lsm = np.where(lsm==0, 1, np.nan)

masked = tsurf * lsm

# keep the plot handle so that the extent and coastlines apply to this figure
p = masked.plot(subplot_kws=dict(projection=ccrs.PlateCarree(), facecolor='gray'), transform=ccrs.PlateCarree(),
           cbar_kwargs={'orientation': 'horizontal'})
p.axes.set_extent([-20.,20.,30.,60.])
p.axes.coastlines()

<br>

## Delete temporary files

In [None]:
cdo.cleanTempDir()

More data analysis packages:
* iris: https://scitools-iris.readthedocs.io/en/stable [example gallery](https://scitools-iris.readthedocs.io/en/stable/generated/gallery/index.html#sphx-glr-generated-gallery)
* SciPy: https://www.scipy.org/docs.html
* geocat: https://geocat.ucar.edu [example gallery](https://geocat-examples.readthedocs.io/en/latest/gallery/index.html)
* seaborn: https://seaborn.pydata.org [example gallery](https://seaborn.pydata.org/examples/index.html)
