# CF checker

A short illustration of what is currently checked by the CF (Climate and Forecast) conventions checker implemented in Valenspy.

In [3]:
import valenspy as vp
import xarray as xr
from pathlib import Path

## What is checked?

Valenspy's own CF conventions checker verifies the following:

- The global attributes `history` and `Conventions` are present.
- Each variable has the attributes `standard_name`, `long_name` and `units`.
- The file is a netCDF file.
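
The attribute checks in the list above can be sketched in plain Python. This is a minimal illustration and not Valenspy's actual implementation: the function name `missing_attributes` and the report layout are assumptions made for this example, and it works on plain dictionaries rather than on a dataset.

```python
# Hypothetical sketch of the attribute-presence checks; not Valenspy's code.
REQUIRED_GLOBAL_ATTRS = ("history", "Conventions")
REQUIRED_VARIABLE_ATTRS = ("standard_name", "long_name", "units")


def missing_attributes(global_attrs, variable_attrs):
    """Return {name: [missing attribute names]} for every incomplete entry.

    global_attrs: mapping of global attribute names to values.
    variable_attrs: mapping of variable name -> mapping of its attributes.
    An empty result means every required attribute is present.
    """
    report = {}
    missing_global = [a for a in REQUIRED_GLOBAL_ATTRS if a not in global_attrs]
    if missing_global:
        report["<global>"] = missing_global
    for name, attrs in variable_attrs.items():
        missing = [a for a in REQUIRED_VARIABLE_ATTRS if a not in attrs]
        if missing:
            report[name] = missing
    return report


# Hand-written attributes for illustration (not read from a real file).
report = missing_attributes(
    {"Conventions": "CF-1.7"},
    {"tas": {"standard_name": "air_temperature", "units": "K"}},
)
print(report)
```

With an xarray dataset `ds`, the two arguments would be `ds.attrs` and `{name: var.attrs for name, var in ds.data_vars.items()}`.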

In addition, the checker compares the variables in the file against `CORDEX_variables.yml`.
If a variable is listed in `CORDEX_variables.yml`, the checker verifies that its attributes match the expected values.
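
A rough sketch of such a comparison is given here. The `EXPECTED` dictionary is a hypothetical stand-in for the content of `CORDEX_variables.yml` (the real file layout may differ), and `attribute_mismatches` is a name invented for this example. The expected values for `tas` are the ones set on the E-OBS data later in this notebook.

```python
# Hypothetical stand-in for CORDEX_variables.yml; the real layout may differ.
EXPECTED = {
    "tas": {
        "units": "K",
        "standard_name": "air_temperature",
        "long_name": "Near-Surface Air Temperature",
    },
}


def attribute_mismatches(name, attrs, expected=EXPECTED):
    """Return {attribute: (found, wanted)} for each attribute that differs.

    Variables that are not listed in `expected` are not compared at all,
    so they always yield an empty result.
    """
    if name not in expected:
        return {}
    return {
        key: (attrs.get(key), wanted)
        for key, wanted in expected[name].items()
        if attrs.get(key) != wanted
    }


# Attributes similar to the E-OBS temperature variable used below.
eobs_like = {
    "units": "Celsius",
    "long_name": "mean temperature",
    "standard_name": "air_temperature",
}
tg_result = attribute_mismatches("tg", eobs_like)    # "tg" is not listed
tas_result = attribute_mismatches("tas", eobs_like)  # "tas" is listed
print(tg_result)
print(tas_result)
```

The same attributes pass under the name `tg` but are flagged under the name `tas`, because only listed variables are compared.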

## Some examples
### CF compliant file
Start with a CF compliant netCDF file.
Note that the `is_cf_compliant` function accepts either a file path or an xarray dataset.

In [4]:
print(vp.demo_data_CF)
ds = xr.open_dataset(vp.demo_data_CF)
ds

/dodrio/scratch/projects/2022_200/project_output/RMIB-UGent/vsc46032_kobe/ValEnsPy/src/valenspy/datafiles/tas_Amon_EC-Earth3-Veg_historical_r1i1p1f1_gr_195301-195312.nc


In [5]:
vp.cf_checks.is_cf_compliant(vp.demo_data_CF)

True

In [6]:
vp.cf_checks.is_cf_compliant(ds)

True

### Non-CF compliant file
The next examples use files that are not (fully) CF compliant.


In [7]:
# Any input that is not a netCDF file (no .nc extension and cannot be loaded
# by xarray) returns False. A path to a file that does not exist also returns False.
vp.cf_checks.is_cf_compliant('file_that_does_not_exist.nc')

False
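
How Valenspy decides whether an input is a netCDF file is not shown in this notebook. As an independent illustration, one way to test a path without loading the data is to inspect the leading bytes of the file. The helper `has_netcdf_signature` below is hypothetical and simplified: an HDF5 signature only shows that a file is HDF5 (the container used by netCDF-4), and only offset 0 is inspected.

```python
import tempfile
from pathlib import Path

# Leading bytes of netCDF classic, 64-bit offset and CDF-5 files.
CLASSIC_MAGIC = (b"CDF\x01", b"CDF\x02", b"CDF\x05")
# Signature of HDF5 files, the container format used by netCDF-4.
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"


def has_netcdf_signature(path):
    """Return True if the file starts with a netCDF or HDF5 signature.

    Hypothetical helper, not part of Valenspy. Missing files return False.
    """
    path = Path(path)
    if not path.is_file():
        return False
    with path.open("rb") as handle:
        header = handle.read(8)
    return header[:4] in CLASSIC_MAGIC or header == HDF5_MAGIC


# Small demonstration on throw-away files.
with tempfile.TemporaryDirectory() as tmp:
    classic = Path(tmp) / "classic.nc"
    classic.write_bytes(b"CDF\x01" + bytes(28))  # fake classic header
    text = Path(tmp) / "notes.nc"
    text.write_bytes(b"not a netCDF file")       # .nc name, wrong content
    classic_ok = has_netcdf_signature(classic)
    text_ok = has_netcdf_signature(text)
    missing_ok = has_netcdf_signature(Path(tmp) / "file_that_does_not_exist.nc")

print(classic_ok, text_ok, missing_ok)
```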

Next, consider an E-OBS temperature data set.

In [8]:

eobs = xr.open_dataset(Path("/dodrio/scratch/projects/2022_200/project_input/External/observations/EOBS/0.1deg/tg_ens_mean_0.1deg_reg_v29.0e.nc"), chunks={'time': 100})
eobs

| | Array | Chunk |
|---|---|---|
| Bytes | 33.01 GiB | 125.06 MiB |
| Shape | (27028, 465, 705) | (100, 465, 705) |
| Dask graph | 271 chunks in 2 graph layers | 271 chunks in 2 graph layers |
| Data type | float32 numpy.ndarray | float32 numpy.ndarray |


In [9]:
vp.cf_checks.is_cf_compliant(Path("/dodrio/scratch/projects/2022_200/project_input/External/observations/EOBS/0.1deg/tg_ens_mean_0.1deg_reg_v29.0e.nc"))

True

It seems that the E-OBS file is also CF compliant! However, although both data sets contain temperature data, they are not yet comparable.
The E-OBS data set passes the `is_cf_compliant` function because `tg` is not in the `CORDEX_variables.yml` file. To use the two data sets together, the E-OBS data should be translated to the CORDEX standard.

In [10]:
eobs = eobs.rename({'tg': 'tas'})
eobs['tas'].attrs

{'units': 'Celsius',
 'long_name': 'mean temperature',
 'standard_name': 'air_temperature',
 'cell_methods': 'time: mean'}

In [11]:
vp.cf_checks.is_cf_compliant(eobs)

False

The data set is no longer "CF compliant": the variable `tas` is expected to be in Kelvin rather than Celsius, and its attributes do not all match the values expected for `tas`.

In [19]:
# Convert from Celsius to Kelvin and set the attributes expected for tas
eobs['tas'] = eobs['tas'] + 273.15
eobs['tas'].attrs['units'] = 'K'
eobs['tas'].attrs['long_name'] = "Near-Surface Air Temperature"
eobs['tas'].attrs['standard_name'] = "air_temperature"
eobs['tas'].attrs

{'units': 'K',
 'long_name': 'Near-Surface Air Temperature',
 'standard_name': 'air_temperature'}

In [21]:
vp.cf_checks.is_cf_compliant(eobs)  # The data is now CF compliant and can be used directly alongside vp.demo_data_CF

True

### TODO

Once the E-OBS data converter is implemented, show that the E-OBS conversion results in CF compliant and CORDEX compliant data.