# CF checker

A short illustration of what the CF (Climate and Forecast) conventions checker implemented in ValEnsPy currently checks.

In [1]:
import valenspy as vp
import xarray as xr
from pathlib import Path

from valenspy.cf_checks import is_cf_compliant, cf_status

## What is checked?

In ValEnsPy, the self-implemented CF conventions checker verifies that:

- The global attributes `history` and `Conventions` are present.
- `time` is a dimension and is of type `datetime64`.
- Each variable has the attributes `standard_name`, `long_name` and `units`.
- The file is a netCDF file.
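
The structural checks listed above can be approximated with plain xarray calls. The sketch below is an assumption about how such checks might look, not the ValEnsPy implementation: the helper name `structural_cf_issues` is hypothetical, and the netCDF file check is left out because the example works on an in-memory dataset.

```python
import numpy as np
import xarray as xr

REQUIRED_GLOBAL_ATTRS = ("history", "Conventions")
REQUIRED_VAR_ATTRS = ("standard_name", "long_name", "units")


def structural_cf_issues(ds: xr.Dataset) -> list[str]:
    """Return a list of readable problems (hypothetical helper, not part of ValEnsPy)."""
    issues = []

    # Global attributes
    for attr in REQUIRED_GLOBAL_ATTRS:
        if attr not in ds.attrs:
            issues.append(f"missing global attribute '{attr}'")

    # time must be a dimension holding datetime64 values
    if "time" not in ds.dims:
        issues.append("'time' is not a dimension")
    elif not np.issubdtype(ds["time"].dtype, np.datetime64):
        issues.append("'time' is not of type datetime64")

    # Per-variable attributes
    for name, var in ds.data_vars.items():
        for attr in REQUIRED_VAR_ATTRS:
            if attr not in var.attrs:
                issues.append(f"variable '{name}' is missing attribute '{attr}'")
    return issues


# Small synthetic dataset that lacks `history` and a `long_name` for tas
time = np.array(["2000-01-01", "2000-01-02"], dtype="datetime64[ns]")
ds_example = xr.Dataset(
    {"tas": ("time", [280.0, 281.0], {"standard_name": "air_temperature", "units": "K"})},
    coords={"time": time},
    attrs={"Conventions": "CF-1.8"},
)
for issue in structural_cf_issues(ds_example):
    print(issue)
```

Returning a list of issues instead of a single boolean makes it easier to see which requirement failed; an empty list corresponds to a dataset that passes these structural checks.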

In addition, the checker compares the variables in the file with the `CORDEX_variables.yml` file.
If a variable is listed in `CORDEX_variables.yml`, the checker verifies that the variable has the correct attributes. Variables that are not listed are reported as unknown to ValEnsPy.
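
The lookup-based comparison can be pictured as matching each variable's attributes against an expected entry. The sketch below is a simplified assumption: the dictionary `EXPECTED` stands in for the contents of `CORDEX_variables.yml` (whose real layout is not shown in this notebook), and `classify_variables` is a hypothetical helper, not a ValEnsPy function.

```python
# Stand-in for CORDEX_variables.yml (assumed structure, for illustration only)
EXPECTED = {
    "tas": {"standard_name": "air_temperature", "units": "K"},
}


def classify_variables(variables: dict[str, dict]) -> dict[str, list[str]]:
    """Sort variables into compliant, non-compliant and unknown (hypothetical helper)."""
    result = {"compliant": [], "non_compliant": [], "unknown": []}
    for name, attrs in variables.items():
        expected = EXPECTED.get(name)
        if expected is None:
            # Not in the lookup: there is nothing to compare against
            result["unknown"].append(name)
        elif all(attrs.get(key) == value for key, value in expected.items()):
            result["compliant"].append(name)
        else:
            result["non_compliant"].append(name)
    return result


# Variable names mapped to their attributes (synthetic example data)
variables = {
    "tas": {"standard_name": "air_temperature", "units": "K"},
    "time_bnds": {},
    "tg": {"units": "Celsius"},
}
report = classify_variables(variables)
share = 100 * len(report["compliant"]) / len(variables)
print(report)
print(f"{share:.2f}% of the variables are compliant")
```

In this sketch the percentage is the number of compliant variables divided by all variables in the dataset, so unknown variables lower the percentage without making the dataset fail.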

## Some examples
These examples only work on the VSC cluster.
### CF compliant file
We start with a CF compliant file.
Note that the `is_cf_compliant` function accepts either a file path or an xarray dataset.

In [2]:
print(vp.demo_data_CF) # Only works on the VSC cluster
ds =  xr.open_dataset(vp.demo_data_CF)
ds

/dodrio/scratch/projects/2022_200/project_output/RMIB-UGent/vsc46032_kobe/ValEnsPy/src/valenspy/datafiles/tas_Amon_EC-Earth3-Veg_historical_r1i1p1f1_gr_195301-195312.nc


In [3]:
is_cf_compliant(vp.demo_data_CF) # A path or a dataset can be passed - here we pass the path

True

In [4]:
cf_status(ds) # A path or a dataset can be passed - here we pass the dataset

The file is ValEnsPy CF compliant.
25.00% of the variables are ValEnsPy CF compliant
ValEnsPy CF compliant: ['tas']
Unknown to ValEnsPy: ['time_bnds', 'lat_bnds', 'lon_bnds']


### Non-CF compliant file
The next examples include files which are not (completely) CF compliant.


In [5]:
is_cf_compliant('file_that_does_not_exist.nc')  # A file that is not a netCDF file (no .nc extension and cannot be loaded by xarray) returns False; a path that does not exist raises FileNotFoundError

FileNotFoundError: [Errno 2] No such file or directory: '/dodrio/scratch/projects/2022_200/project_output/RMIB-UGent/vsc46032_kobe/ValEnsPy/examples/file_that_does_not_exist.nc'
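
Because a missing file raises an exception, a caller may want to test the path before handing it to the checker. The sketch below uses only the standard library; the helper name `existing_netcdf_path` is hypothetical and not part of ValEnsPy.

```python
from pathlib import Path


def existing_netcdf_path(candidate) -> bool:
    """True if candidate is an existing file with a .nc extension (hypothetical helper)."""
    path = Path(candidate)
    return path.suffix == ".nc" and path.is_file()


# Check the path first, and only then decide whether to run the checker on it
print(existing_netcdf_path("file_that_does_not_exist.nc"))
```

This only inspects the file name and existence; it does not confirm that the content is valid netCDF.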

Next, we load the E-OBS temperature dataset.

In [6]:

eobs = xr.open_dataset(Path("/dodrio/scratch/projects/2022_200/project_input/External/observations/EOBS/0.1deg/tg_ens_mean_0.1deg_reg_v29.0e.nc"), chunks={'time': 100})
eobs

| | Array | Chunk |
|---|---|---|
| Bytes | 33.01 GiB | 125.06 MiB |
| Shape | (27028, 465, 705) | (100, 465, 705) |
| Dask graph | 271 chunks in 2 graph layers | 271 chunks in 2 graph layers |
| Data type | float32 numpy.ndarray | float32 numpy.ndarray |


In [7]:
cf_status(eobs)

The file is ValEnsPy CF compliant.
0.00% of the variables are ValEnsPy CF compliant
Unknown to ValEnsPy: ['tg']


It seems that the E-OBS file is also CF compliant! However, although both datasets contain temperature data, they are not comparable, and `tg` is unusable within ValEnsPy. The
E-OBS dataset passed the check because `tg` is not in the `CORDEX_variables.yml` file. To use `tg` and compare it with our demo data, the E-OBS data should be translated to the CORDEX standard.

This can be done manually or, for implemented datasets such as E-OBS, automatically by the input converter (see EOBS_convertor.ipynb for more details).

In [8]:
from valenspy.inputconverter_functions import EOBS_to_CF

ic = vp.InputConverter(EOBS_to_CF)

eobs = ic.convert_input(eobs)  # The input converter automatically runs a cf_status check and prints the result
eobs['tas'].attrs  # The attributes of tas have been updated to be CF compliant; some additional attributes have been added for bookkeeping

The file is ValEnsPy CF compliant.
100.00% of the variables are ValEnsPy CF compliant
ValEnsPy CF compliant: ['tas']


{'units': 'K',
 'standard_name': 'air_temperature',
 'long_name': 'Near-Surface Air Temperature',
 'original_name': 'tg',
 'original_long_name': 'daily mean temperature',
 'dataset': 'EOBS',
 'freq': 'daily',
 'spatial_resolution': '0.1deg',
 'region': 'europe'}

In [9]:
cf_status(eobs)

The file is ValEnsPy CF compliant.
100.00% of the variables are ValEnsPy CF compliant
ValEnsPy CF compliant: ['tas']


Note that the converter also converts the temperature from °C to K automatically.
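
As a rough picture of what such a conversion involves, the sketch below renames `tg` to `tas`, converts °C to K and records the original name. It is an illustrative assumption built on a small synthetic dataset, not the code behind `EOBS_to_CF`; the helper `celsius_tg_to_cf_tas` is hypothetical.

```python
import numpy as np
import xarray as xr


def celsius_tg_to_cf_tas(ds: xr.Dataset) -> xr.Dataset:
    """Rename tg to tas and convert degrees Celsius to kelvin (hypothetical helper)."""
    out = ds.rename({"tg": "tas"})
    original_long_name = out["tas"].attrs.get("long_name", "")
    # Arithmetic drops attributes by default, so they are set again explicitly
    out["tas"] = (out["tas"] + 273.15).assign_attrs(
        units="K",
        standard_name="air_temperature",
        long_name="Near-Surface Air Temperature",
        original_name="tg",
        original_long_name=original_long_name,
    )
    return out


# Synthetic stand-in for an E-OBS-like dataset (values in degrees Celsius)
time = np.array(["1950-01-01", "1950-01-02"], dtype="datetime64[ns]")
eobs_like = xr.Dataset(
    {"tg": ("time", [0.0, 10.0], {"units": "Celsius", "long_name": "daily mean temperature"})},
    coords={"time": time},
)
converted = celsius_tg_to_cf_tas(eobs_like)
print(converted["tas"].values, converted["tas"].attrs["units"])
```

Keeping the original name and long name in the attributes mirrors the bookkeeping attributes shown in the converter output above.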

### Incorrect time dimension (and type) example

In [27]:
eobs_wrong = eobs.rename({'time': 'wrong_time'})

The dataset is now no longer CF compliant because the time dimension is not present. 

In [28]:
if is_cf_compliant(eobs_wrong):
    print('The dataset is CF compliant')
else:
    cf_status(eobs_wrong)

Time dimension is missing or has an incorrect type
The file is NOT ValEnsPy CF compliant.
100.00% of the variables are ValEnsPy CF compliant
ValEnsPy CF compliant: ['tas']


The dataset is also not CF compliant if the time dimension is present but not of type `datetime64`. Here we illustrate this by converting the time coordinate to strings (object dtype).

In [29]:
eobs_wrong = eobs_wrong.rename({'wrong_time': 'time'})
# Convert the time coordinate to strings (object dtype)
eobs_wrong['time'] = eobs_wrong['time'].dt.strftime('%Y-%m-%d')
print(eobs_wrong['time'].dtype, eobs_wrong['time'][0].values)
if is_cf_compliant(eobs_wrong):
    print('The dataset is CF compliant')
else:
    cf_status(eobs_wrong)

object 1950-01-01
Time dimension is missing or has an incorrect type
The file is NOT ValEnsPy CF compliant.
100.00% of the variables are ValEnsPy CF compliant
ValEnsPy CF compliant: ['tas']
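
If a dataset fails the check only because its time coordinate holds strings, one possible repair is to parse the strings back into `datetime64` values. The sketch below uses a small synthetic dataset and standard pandas and xarray calls; it is a suggestion under those assumptions, not part of ValEnsPy.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic dataset whose time coordinate holds date strings
broken = xr.Dataset(
    {"tas": ("time", [273.15, 274.15])},
    coords={"time": np.array(["1950-01-01", "1950-01-02"], dtype=object)},
)

# Parse the strings and put the result back as the time coordinate
fixed = broken.assign_coords(time=pd.to_datetime(broken["time"].values))

print(broken["time"].dtype, "->", fixed["time"].dtype)
```

After this step the time coordinate is `datetime64` again, so the time check described above should no longer be the reason for a failure.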
