# CF checker

A short illustration of what is currently checked by the CF (Climate and Forecast) conventions checker implemented in ValEnsPy.

In [2]:
import valenspy as vp
import xarray as xr
from pathlib import Path

from valenspy import is_cf_compliant, cf_status

## What is checked?

In ValEnsPy, the self-implemented CF conventions checker verifies that:

- the global attributes `history` and `Conventions` are present;
- `time` is a dimension and is of type `datetime64`;
- each variable has the attributes `standard_name`, `long_name` and `units`;
- the file is a netCDF file.
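
The checks listed above can be sketched on an in-memory xarray dataset. This is only an illustration under assumptions: `basic_cf_problems` is a hypothetical helper, not the ValEnsPy implementation, and the netCDF file check is left out because no file is involved.

```python
import numpy as np
import pandas as pd
import xarray as xr

REQUIRED_GLOBAL_ATTRS = ("history", "Conventions")
REQUIRED_VARIABLE_ATTRS = ("standard_name", "long_name", "units")


def basic_cf_problems(ds: xr.Dataset) -> list[str]:
    """Hypothetical helper: list the problems found; an empty list means all checks passed."""
    problems = []

    # Global attributes must be present
    for attr in REQUIRED_GLOBAL_ATTRS:
        if attr not in ds.attrs:
            problems.append(f"missing global attribute '{attr}'")

    # time must be a dimension with a datetime64 coordinate
    if "time" not in ds.dims:
        problems.append("'time' is not a dimension")
    elif "time" not in ds.coords or not np.issubdtype(ds["time"].dtype, np.datetime64):
        problems.append("'time' is not of type datetime64")

    # Every data variable needs the three descriptive attributes
    for name, var in ds.data_vars.items():
        for attr in REQUIRED_VARIABLE_ATTRS:
            if attr not in var.attrs:
                problems.append(f"variable '{name}' is missing attribute '{attr}'")

    return problems


# Small illustrative dataset (values and attributes are made up for the example)
ds = xr.Dataset(
    {
        "tas": (
            "time",
            [280.0, 281.5, 279.2],
            {
                "standard_name": "air_temperature",
                "long_name": "Near-Surface Air Temperature",
                "units": "K",
            },
        )
    },
    coords={"time": pd.date_range("2000-01-01", periods=3, freq="D")},
    attrs={"history": "created for illustration", "Conventions": "CF-1.8"},
)

print(basic_cf_problems(ds))
print(basic_cf_problems(ds.rename({"time": "wrong_time"})))
```

Returning a list of messages rather than a single boolean makes it easier to see which check failed.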

In addition, the checker compares the variables in the file with the `CORDEX_variables.yml` file.
If a variable is listed in `CORDEX_variables.yml`, the checker verifies that the variable has the correct attributes.
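
The comparison against the lookup file could look roughly like the sketch below. The dictionary is a hand-written stand-in: the actual structure of `CORDEX_variables.yml` may differ, and `classify_variables` is a hypothetical name. The `tg` attributes are illustrative values only.

```python
# Assumed excerpt of a lookup table; the real CORDEX_variables.yml may be structured differently.
CORDEX_LOOKUP = {
    "tas": {"standard_name": "air_temperature", "units": "K"},
    "pr": {"standard_name": "precipitation_flux", "units": "kg m-2 s-1"},
}


def classify_variables(variable_attrs, lookup=CORDEX_LOOKUP):
    """Hypothetical helper: sort variables into compliant, non-compliant and unknown."""
    result = {"compliant": [], "non_compliant": [], "unknown": []}
    for name, attrs in variable_attrs.items():
        expected = lookup.get(name)
        if expected is None:
            # Variables missing from the lookup table cannot be checked
            result["unknown"].append(name)
        elif all(attrs.get(key) == value for key, value in expected.items()):
            result["compliant"].append(name)
        else:
            result["non_compliant"].append(name)
    return result


# Attribute dictionaries as they might appear on two variables (illustrative values)
variables = {
    "tas": {"standard_name": "air_temperature", "long_name": "Near-Surface Air Temperature", "units": "K"},
    "tg": {"standard_name": "air_temperature", "long_name": "mean temperature", "units": "Celsius"},
}

print(classify_variables(variables))
```

For an xarray dataset `ds`, the input mapping can be built with `{name: var.attrs for name, var in ds.data_vars.items()}`.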

## Some examples
These examples only work on the VSC cluster.
### CF compliant file
We start with a CF compliant netCDF file.
Note that the `is_cf_compliant` function accepts a file path or an xarray dataset.

In [23]:
ds = vp.InputManager(machine='hortense').load_data("ERA5", ["tas"], period=[2000], freq="daily", region="europe", path_identifiers=["min"])
ds

File paths found:
/dodrio/scratch/projects/2022_200/project_input/External/observations/era5/europe/daily/2m_temperature/era5-daily_min-europe-2m_temperature-2000.nc
/dodrio/scratch/projects/2022_200/project_input/External/observations/era5/europe/2m_temperature/daily/era5-daily-europe-2m_temperature_min-2000.nc


ValueError: Resulting object does not have monotonic global indexes along dimension time

In [15]:
is_cf_compliant(ds)  # A path or a dataset can be passed - here we pass the dataset

True

In [16]:
cf_status(ds) # A path or a dataset can be passed - here we pass the dataset

The file is ValEnsPy CF compliant.
50.00% of the variables are ValEnsPy CF compliant
ValEnsPy CF compliant: ['ts']
Unknown to ValEnsPy: ['time_bnds']


### Non-CF compliant file
The next examples use inputs that are not (completely) CF compliant.


In [17]:
is_cf_compliant('file_that_does_not_exist.nc')  # A plain string (not a Path or an xarray dataset) is passed here, which raises a TypeError

TypeError: The input should be a Path, a list of Paths or an xarray dataset.
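
Judging by the error message, a file location has to be given as a `Path`. A small guard such as the sketch below could convert and validate the input first; `as_existing_netcdf_path` is a hypothetical helper and not part of ValEnsPy.

```python
import tempfile
from pathlib import Path


def as_existing_netcdf_path(candidate):
    """Hypothetical helper: turn a str or Path into a Path and validate it before use."""
    path = Path(candidate)
    if path.suffix != ".nc":
        raise ValueError(f"{path} does not have a .nc extension")
    if not path.is_file():
        raise FileNotFoundError(f"{path} does not exist")
    return path


# Demonstration with a temporary, empty placeholder file
with tempfile.TemporaryDirectory() as tmp_dir:
    placeholder = Path(tmp_dir) / "example.nc"
    placeholder.touch()
    checked = as_existing_netcdf_path(str(placeholder))
    print(checked.name)

# A missing file is reported explicitly instead of failing later
try:
    as_existing_netcdf_path("file_that_does_not_exist.nc")
except FileNotFoundError as error:
    print(error)
```

The returned `Path` could then be passed on to `is_cf_compliant`.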

Next, the E-OBS temperature dataset is loaded directly from file, without the [input manager](input_manager.ipynb) and hence without an [input convertor](input_convertors.ipynb).

In [18]:

eobs = xr.open_dataset(Path("/dodrio/scratch/projects/2022_200/project_input/External/observations/EOBS/0.1deg/tg_ens_mean_0.1deg_reg_v29.0e.nc"), chunks={'time': 100})
eobs

| | Array | Chunk |
|---|---|---|
| Bytes | 33.01 GiB | 125.06 MiB |
| Shape | (27028, 465, 705) | (100, 465, 705) |
| Dask graph | 271 chunks in 2 graph layers | 271 chunks in 2 graph layers |
| Data type | float32 numpy.ndarray | float32 numpy.ndarray |

In [19]:
cf_status(eobs)

The file is ValEnsPy CF compliant.
0.00% of the variables are ValEnsPy CF compliant
Unknown to ValEnsPy: ['tg']


At first sight the E-OBS file is also CF compliant. However, although both datasets contain temperature data, they are not comparable: `tg` is unusable within ValEnsPy. The E-OBS dataset passes the `is_cf_compliant` function only because `tg` is not in the `CORDEX_variables.yml` file, so its attributes are not checked. To use `tg` and compare it with the ERA5 data, the E-OBS data should be translated to the CORDEX standard:

In [22]:
eobs = vp.InputManager(machine='hortense').load_data("EOBS", ["tas"], path_identifiers=["mean"])
eobs

File paths found:
/dodrio/scratch/projects/2022_200/project_input/External/observations/EOBS/0.1deg/tg_ens_mean_0.1deg_reg_v29.0e.nc
The file is ValEnsPy CF compliant.
100.00% of the variables are ValEnsPy CF compliant
ValEnsPy CF compliant: ['tas']


| | Array | Chunk |
|---|---|---|
| Bytes | 33.01 GiB | 127.56 MiB |
| Shape | (27028, 465, 705) | (102, 465, 705) |
| Dask graph | 265 chunks in 5 graph layers | 265 chunks in 5 graph layers |
| Data type | float32 numpy.ndarray | float32 numpy.ndarray |


In [21]:
cf_status(eobs)

The file is ValEnsPy CF compliant.
100.00% of the variables are ValEnsPy CF compliant
ValEnsPy CF compliant: ['pr']


Note that conversion from °C to K is also automatically done by the convertor.
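
Such a unit conversion can be pictured as follows. This is a sketch under assumptions, not the convertor's actual code: `celsius_to_kelvin` is a hypothetical helper and the input values are made up.

```python
import xarray as xr


def celsius_to_kelvin(da: xr.DataArray) -> xr.DataArray:
    """Hypothetical helper: shift values by 273.15 and update the units attribute."""
    converted = da + 273.15
    # Copy the original attributes and overwrite only the units
    converted.attrs = {**da.attrs, "units": "K"}
    return converted


# Illustrative temperature values in degrees Celsius
tg = xr.DataArray(
    [0.0, 12.5],
    dims=["time"],
    attrs={"standard_name": "air_temperature", "units": "Celsius"},
    name="tg",
)

# Convert and rename to the CORDEX variable name
tas = celsius_to_kelvin(tg).rename("tas")
print(tas.values, tas.attrs["units"])
```

Assigning a fresh attribute dictionary leaves the attributes of the original array untouched.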

### Incorrect time dimension (and type) example

In [27]:
eobs_wrong = eobs.rename({'time': 'wrong_time'})

The dataset is now no longer CF compliant because the time dimension is not present. 

In [28]:
if is_cf_compliant(eobs_wrong):
    print('The dataset is CF compliant')
else:
    cf_status(eobs_wrong)

Time dimension is missing or has an incorrect type
The file is NOT ValEnsPy CF compliant.
100.00% of the variables are ValEnsPy CF compliant
ValEnsPy CF compliant: ['tas']


Likewise, if the time dimension is present but not of the type `datetime64`, the dataset is not CF compliant. Here we illustrate this by converting the time coordinate to a string/object type.

In [29]:
eobs_wrong = eobs_wrong.rename({'wrong_time': 'time'})
# Convert the time coordinate to strings (object dtype)
eobs_wrong['time'] = eobs_wrong['time'].dt.strftime('%Y-%m-%d')
print(eobs_wrong['time'].dtype, eobs_wrong['time'][0].values)
if is_cf_compliant(eobs_wrong):
    print('The dataset is CF compliant')
else:
    cf_status(eobs_wrong)

object 1950-01-01
Time dimension is missing or has an incorrect type
The file is NOT ValEnsPy CF compliant.
100.00% of the variables are ValEnsPy CF compliant
ValEnsPy CF compliant: ['tas']
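
A string-typed time coordinate like the one above can usually be repaired by parsing it back into `datetime64` values. The snippet below is a minimal sketch on a small made-up dataset; it assumes the strings are in a format that `pandas.to_datetime` can parse.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Small illustrative dataset whose time coordinate holds strings
ds = xr.Dataset(
    {"tas": ("time", [271.3, 272.0])},
    coords={"time": ["1950-01-01", "1950-01-02"]},
)
print(np.issubdtype(ds["time"].dtype, np.datetime64))

# Parse the strings and replace the coordinate
fixed = ds.assign_coords(time=pd.to_datetime(ds["time"].values))
print(np.issubdtype(fixed["time"].dtype, np.datetime64))
```

After this step the time coordinate is again of a `datetime64` type, which is what the time check expects.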
