# Use CDO to Compare Model and Observation Data


In this notebook we demonstrate how to compare model and observation data:

- Look at the data contained in our example file
- Select a subset of years and convert units
- Data remapping
- Compare model and observation data 

This example uses data from the Coupled Model Intercomparison Project Phase 6 (CMIP6) collection (http://dx.doi.org/10.25914/5b98afc88531e).

---
Inspired by the notebook at https://github.com/NCI-data-analysis-platform/climate-cmip.git
- Authors: NCI Virtual Research Environment Team
- Keywords: CMIP, CDO, concatenate data, data remapping
- Create Date: 2019-Oct; Update Date: 2021-Feb
---
Adapted to DKRZ env: S. Kindermann, August 2022

This notebook is licenced under the [Creative Commons Attribution 4.0 International license](https://creativecommons.org/licenses/by/4.0/)


### Load CDO module

If CDO is provided as an environment module on your system, load it with:

```
$ module load cdo
```

### Check our data

Let's look at the near-surface air temperature (`tas`) from the all-forcing historical simulation (1850 to 2014) of NCAR's CESM2 model:

In [None]:
!ls /pool/data/CMIP6/data/CMIP/NCAR/CESM2/historical/r1i1p1f1/Amon/tas/gn/v20190308/tas_Amon_CESM2_historical_r1i1p1f1_gn_185001-201412.nc

### Have a look at the data file using cdo info

**Basic usage:**  
```
cdo info <filename> | less
```
**less** displays one page at a time in the terminal. You can scroll forwards and backwards to see more; press **q** to quit.
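As a rough illustration of the kind of per-field summary `cdo info` prints (columns such as Gridsize, Miss, Minimum, Mean and Maximum), here is a minimal pure-Python sketch for a single field flattened into a list. The function `field_info` is hypothetical, missing values are represented as `None`, and leaving them out of the statistics is an assumption of this sketch rather than a description of CDO's code.

```python
def field_info(values):
    """Summarise one field (flattened to a list), loosely like one line of `cdo info`.

    Missing values are represented as None; they are counted and left out of
    the minimum, mean and maximum.
    """
    valid = [v for v in values if v is not None]
    return {
        "gridsize": len(values),
        "miss": len(values) - len(valid),
        "min": min(valid) if valid else None,
        "mean": sum(valid) / len(valid) if valid else None,
        "max": max(valid) if valid else None,
    }


# Four dummy grid points in kelvin, one of them missing
print(field_info([271.0, 280.5, None, 290.0]))
# {'gridsize': 4, 'miss': 1, 'min': 271.0, 'mean': 280.5, 'max': 290.0}
```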

### Let's see which years this file includes

We use the operator `showyear` to display all the years in this file:

In [None]:
!cdo showyear /pool/data/CMIP6/data/CMIP/NCAR/CESM2/historical/r1i1p1f1/Amon/tas/gn/v20190308/tas_Amon_CESM2_historical_r1i1p1f1_gn_185001-201412.nc

### Select 10 years of data from the original model file

**Basic usage:**
```
cdo selyear,startyear/endyear <input.nc> <output.nc>
```
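To make the selection rule concrete, here is a small pure-Python sketch of what `selyear` with a year range such as 1991/2000 keeps: every time step whose year lies between the two bounds, inclusive. The function `selyear` and the list of (date, value) pairs are hypothetical stand-ins for a file's time axis, not CDO's implementation.

```python
from datetime import date


def selyear(timesteps, first, last):
    """Keep the time steps whose year lies in [first, last], bounds included.

    A toy model of `cdo selyear,first/last`: `timesteps` is a list of
    (date, value) pairs standing in for the time axis of a NetCDF file.
    """
    return [(t, v) for t, v in timesteps if first <= t.year <= last]


# Hypothetical monthly time axis from 1850-01 to 2014-12, mimicking the
# CESM2 historical file; the values are dummy placeholders.
axis = [(date(y, m, 15), 0.0) for y in range(1850, 2015) for m in range(1, 13)]
subset = selyear(axis, 1991, 2000)
print(len(subset), subset[0][0], subset[-1][0])  # 120 1991-01-15 2000-12-15
```

Ten years of monthly data give 10 x 12 = 120 time steps, which matches the period encoded in the output file name used below (199101-200012).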

In [None]:
### create output directory if it doesn't already exist
import os
outdir = './output'
os.makedirs(outdir, exist_ok=True)

!cdo selyear,1991/2000 /pool/data/CMIP6/data/CMIP/NCAR/CESM2/historical/r1i1p1f1/Amon/tas/gn/v20190308/tas_Amon_CESM2_historical_r1i1p1f1_gn_185001-201412.nc  ./output/tas_Amon_CESM2_historical_r1i1p1f1_gn_199101-200012.nc

### Show the attributes of the data

In [None]:
!cdo showatts ./output/tas_Amon_CESM2_historical_r1i1p1f1_gn_199101-200012.nc 

The units of near-surface air temperature (`tas`) in the model file are kelvin (K). To be consistent with the observation data, which are given in degrees Celsius, we convert kelvin to Celsius: first we subtract 273.15 from the `tas` values, and then we change the `units` attribute to `degC`:

In [None]:
!cdo setattribute,tas@units=degC -subc,273.15 ./output/tas_Amon_CESM2_historical_r1i1p1f1_gn_199101-200012.nc ./output/tas_Amon_CESM2_historical_r1i1p1f1_gn_199101-200012_unitC.nc 
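CDO applies chained operators from right to left, so in the command above `-subc,273.15` is applied to the input first and `setattribute` then relabels the result. A minimal pure-Python sketch of the same two steps on toy data follows; the function `kelvin_to_celsius` and the list/dict representation of a variable are hypothetical.

```python
KELVIN_OFFSET = 273.15  # 0 degC expressed in kelvin


def kelvin_to_celsius(values, attrs):
    """Mimic `cdo setattribute,tas@units=degC -subc,273.15` on toy data.

    `values` is a flat list of temperatures in K and `attrs` a dict of the
    variable's attributes (hypothetical stand-ins for a NetCDF variable).
    New objects are returned; the inputs are not modified.
    """
    converted = [v - KELVIN_OFFSET for v in values]  # step 1: -subc,273.15
    new_attrs = dict(attrs, units="degC")            # step 2: setattribute,tas@units=degC
    return converted, new_attrs


tas_k = [273.15, 288.15, 250.0]
tas_attrs = {"units": "K", "long_name": "Near-Surface Air Temperature"}
tas_c, tas_c_attrs = kelvin_to_celsius(tas_k, tas_attrs)
print([round(v, 2) for v in tas_c], tas_c_attrs["units"])  # [0.0, 15.0, -23.15] degC
```

Both steps matter: subtracting the offset without updating `units` would leave a file whose metadata still claims kelvin.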

### Find observational temperature data and select the years 1991-2000

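The observational temperature data used below were downloaded from the NOAA Physical Sciences Laboratory (PSL) website: https://psl.noaa.gov/repository/entry/show?entryid=synth:e570c8f9-ec09-4e89-93b4-babd5651e7a9:L3VkZWwuYWlydC5wcmVjaXAvYWlyLm1vbi5tZWFuLnY1MDEubmM=. The file contains monthly mean near-surface air temperature from the University of Delaware (UDel) Air Temperature and Precipitation dataset, version 5.01, which is gridded from station observations over land.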

In [None]:
!ls ../../data/air.mon.mean.v501.nc

In [None]:
!cdo selyear,1991/2000 ../../data/air.mon.mean.v501.nc  ./output/air.mon.mean.v501.199101-200012.nc

### Compute the difference between model and observation data

**Basic usage:**  
```
cdo sub <input1.nc> <input2.nc> <output.nc>
```
This operation subtracts input2.nc from input1.nc and writes the result to output.nc.
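The operator works point by point: both inputs must be on the same grid, and each output value is the difference of the two input values at that grid point. A minimal pure-Python sketch of that behaviour follows. The function `sub` and the nested-list fields are hypothetical stand-ins; returning a missing result (`None`) whenever either input is missing mirrors how CDO arithmetic treats missing values.

```python
def sub(field1, field2):
    """Point-by-point difference field1 - field2, in the spirit of `cdo sub`.

    Fields are hypothetical 2-D grids given as nested lists [lat][lon];
    None marks a missing value. Both fields must have the same shape,
    otherwise the operation is refused.
    """
    shape1 = [len(row) for row in field1]
    shape2 = [len(row) for row in field2]
    if shape1 != shape2:
        raise ValueError("grid sizes do not match")
    return [
        [None if a is None or b is None else a - b for a, b in zip(r1, r2)]
        for r1, r2 in zip(field1, field2)
    ]


model = [[15.0, 20.0], [5.0, -2.0]]
obs = [[14.0, None], [6.5, -2.0]]
print(sub(model, obs))  # [[1.0, None], [-1.5, 0.0]]
```

Calling `sub` with fields of different shapes raises `ValueError`, which is the toy counterpart of the grid-size error discussed next.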

In [None]:
!cdo sub ./output/tas_Amon_CESM2_historical_r1i1p1f1_gn_199101-200012_unitC.nc ./output/air.mon.mean.v501.199101-200012.nc ./output/CESM2_DelawareT_dif.nc

However, the operation above runs into the following error:

**cdo sub(Abort): Grid size of the input parameter tas do not match!**

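This is because the model data and the observation data are on different grids with different horizontal resolutions, so they cannot be subtracted point by point. CDO provides several remapping (interpolation) operators; here we use `remapcon`, which performs first-order conservative remapping.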

**Basic usage:**
```
cdo remapcon,<input1.nc> <input2.nc> <output.nc>
```
Here input1.nc is a file on the target grid, i.e. the grid we want our data to match, and input2.nc is the file to be remapped.
So we first remap the model data to the observation grid and then subtract the observations.

In [None]:
!cdo sub -remapcon,./output/air.mon.mean.v501.199101-200012.nc ./output/tas_Amon_CESM2_historical_r1i1p1f1_gn_199101-200012_unitC.nc ./output/air.mon.mean.v501.199101-200012.nc ./output/CESM2_DelawareT_dif.nc
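The idea behind first-order conservative remapping can be sketched in one dimension: each target cell receives the overlap-weighted mean of the source cells it intersects, so the integral of the field (value times cell width) is preserved. `remapcon` itself works with 2-D grid cells on the sphere and weights by overlapping areas; the 1-D function `remap_conservative_1d` below is a hypothetical simplification for illustration only.

```python
def remap_conservative_1d(src_edges, src_values, dst_edges):
    """First-order conservative remapping of cell values on a 1-D axis.

    Each target cell receives the overlap-weighted mean of the source cells
    it intersects, normalised by the total overlapped width. Edges must be
    sorted in ascending order; a target cell with no overlap gets None.
    """
    result = []
    for j in range(len(dst_edges) - 1):
        lo, hi = dst_edges[j], dst_edges[j + 1]
        weighted_sum = 0.0
        covered = 0.0
        for i, value in enumerate(src_values):
            overlap = min(hi, src_edges[i + 1]) - max(lo, src_edges[i])
            if overlap > 0:
                weighted_sum += value * overlap
                covered += overlap
        result.append(weighted_sum / covered if covered > 0 else None)
    return result


# Four fine source cells remapped onto two coarse target cells
src_edges = [0.0, 1.0, 2.0, 3.0, 4.0]
src_values = [1.0, 2.0, 3.0, 4.0]
print(remap_conservative_1d(src_edges, src_values, [0.0, 2.0, 4.0]))  # [1.5, 3.5]
```

In this example the sum of value times width is 10 on both grids (1 + 2 + 3 + 4 on the source grid, 1.5 x 2 + 3.5 x 2 on the target grid), which is the property that makes the method "conservative".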

### Calculate the time-averaged difference and plot it

In [None]:
!cdo timavg ./output/CESM2_DelawareT_dif.nc ./output/CESM2_DelawareT_dif_avg.nc

In [None]:
!ls ./output/CESM2_DelawareT_dif_avg.nc

In [None]:
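import xarray as xr

# open the time-averaged difference and plot the tas field as a 2-D lat/lon map
ds = xr.open_dataset("./output/CESM2_DelawareT_dif_avg.nc")
ds.tas.plot()

# alternatively, view the file interactively with ncview:
#!ncview ./output/CESM2_DelawareT_dif_avg.nc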

<div class="alert alert-info">
<b>Tip: </b> In CDO, an artificial distinction is made between the notions mean (e.g. timmean) and average (e.g. timavg). The mean is regarded as a statistical function that ignores missing values, whereas the average is found simply by adding the sample members and dividing the result by the sample size. For example, the mean of 1, 2, miss and 3 is (1 + 2 + 3)/3 = 2, whereas the average is (1 + 2 + miss + 3)/4 = miss. If there are no missing values in the sample, the average and mean are identical.
</div>
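The distinction in the tip can be written out as two small functions, with `None` standing in for a missing value. The names `cdo_mean` and `cdo_avg` are hypothetical; they only reproduce the worked example from the tip, not CDO's implementation.

```python
def cdo_mean(values):
    """Statistical mean: missing values (None) are ignored."""
    valid = [v for v in values if v is not None]
    return sum(valid) / len(valid) if valid else None


def cdo_avg(values):
    """Arithmetic average: any missing value makes the result missing."""
    if not values or any(v is None for v in values):
        return None
    return sum(values) / len(values)


sample = [1, 2, None, 3]
print(cdo_mean(sample), cdo_avg(sample))  # 2.0 None
```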

![Time-averaged difference between CESM2 and UDel near-surface air temperature, 1991-2000](output/cdo_comp3.png)

We can see that the model-simulated temperature is higher than the observations in some regions and lower in others, and that the differences appear to be larger at high latitudes.

### Summary

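In this example, we showed how to use CDO to select a period from a model file, convert its units, remap it to the grid of an observational dataset, and compute the time-averaged difference between model and observations.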

## Reference

- CDO User Guide: https://code.mpimet.mpg.de/projects/cdo/embedded/cdo.pdf
