# CDAT Migration Regression Testing Notebook (`.nc` files)

This notebook is used to perform regression testing between the development and
production versions of a diagnostic set.

## How it works

It compares the relative differences (%) between ref and test variables between
the dev and `main` branches.

## How to use

PREREQUISITE: The diagnostic set's netCDF stored in `.json` files in two directories
(dev and `main` branches).

1. Make a copy of this notebook under `auxiliary_tools/cdat_regression_testing/<DIR_NAME>`.
2. Run `mamba create -n cdat_regression_test -y -c conda-forge "python<3.12" xarray netcdf4 dask pandas matplotlib-base ipykernel`
3. Run `mamba activate cdat_regression_test`
4. Update `SET_DIR` and `SET_NAME` in the copy of your notebook.
5. Run all cells IN ORDER.
6. Review results for any outstanding differences (>=1e-5 relative tolerance).
   - Debug these differences (e.g., bug in metrics functions, incorrect variable references, etc.)


## Setup Code


In [1]:
import glob

import numpy as np
import xarray as xr
from e3sm_diags.derivations.derivations import DERIVED_VARIABLES


# TODO: Update SET_NAME and SET_DIR
SET_NAME = "lat_lon"
SET_DIR = "792-lat-lon"

DEV_PATH = f"/global/cfs/cdirs/e3sm/www/cdat-migration-fy24/{SET_DIR}/{SET_NAME}/**"
DEV_GLOB = sorted(glob.glob(DEV_PATH + "/*.nc"))
DEV_NUM_FILES = len(DEV_GLOB)

MAIN_PATH = f"/global/cfs/cdirs/e3sm/www/cdat-migration-fy24/main/{SET_NAME}/**"
MAIN_GLOB = sorted(glob.glob(MAIN_PATH + "/*.nc"))
MAIN_NUM_FILES = len(MAIN_GLOB)

In [2]:
def _check_if_files_found():
    if DEV_NUM_FILES == 0 or MAIN_NUM_FILES == 0:
        raise IOError(
            "No files found at DEV_PATH and/or MAIN_PATH. "
            f"Please check {DEV_PATH} and {MAIN_PATH}."
        )


def _check_if_matching_filecount():
    if DEV_NUM_FILES != MAIN_NUM_FILES:
        raise IOError(
            "Number of files do not match at DEV_PATH and MAIN_PATH "
            f"({DEV_NUM_FILES} vs. {MAIN_NUM_FILES})."
        )

    print(f"Matching file count ({DEV_NUM_FILES} and {MAIN_NUM_FILES}).")


def _check_if_missing_files():
    missing_count = 0

    for fp_main in MAIN_GLOB:
        fp_dev = fp_main.replace(SET_DIR, "main")

        if fp_dev not in MAIN_GLOB:
            print(f"No production file found to compare with {fp_dev}!")
            missing_count += 1

    for fp_dev in DEV_GLOB:
        fp_main = fp_main.replace("main", SET_DIR)

        if fp_main not in DEV_GLOB:
            print(f"No development file found to compare with {fp_main}!")
            missing_count += 1

    print(f"Number of files missing: {missing_count}")

In [3]:
def _get_relative_diffs():
    # We are mainly focusing on relative tolerance here (in percentage terms).
    atol = 0
    rtol = 1e-5

    for fp_main in MAIN_GLOB:
        if "test.nc" in fp_main or "ref.nc" in fp_main:
            fp_dev = fp_main.replace("main", SET_DIR)

            print("Comparing:")
            print(f"    * {fp_dev}")
            print(f"    * {fp_main}")

            ds1 = xr.open_dataset(fp_dev)
            ds2 = xr.open_dataset(fp_main)

            var_key = fp_main.split("-")[-3]
            # for 3d vars such as T-200
            var_key.isdigit()
            if var_key.isdigit():
                var_key = fp_main.split("-")[-4]

            print(f"    * var_key: {var_key}")

            dev_data = _get_var_data(ds1, var_key)
            main_data = _get_var_data(ds2, var_key)

            if dev_data is None or main_data is None:
                print("    * Could not find variable key in the dataset(s)")
                continue

            try:
                np.testing.assert_allclose(
                    dev_data,
                    main_data,
                    atol=atol,
                    rtol=rtol,
                )
            except (KeyError, AssertionError) as e:
                print(f"    {e}")
            else:
                print(f"    * All close and within relative tolerance ({rtol})")


def _get_var_data(ds: xr.Dataset, var_key: str) -> np.ndarray:
    """Get the variable data using a list of matching keys.

    The `main` branch saves the dataset using the original variable name,
    while the dev branch saves the variable with the derived variable name.
    The dev branch is performing the expected behavior here.

    Parameters
    ----------
    ds : xr.Dataset
        _description_
    var_key : str
        _description_

    Returns
    -------
    np.ndarray
        _description_
    """

    data = None

    var_keys = DERIVED_VARIABLES[var_key].keys()
    var_keys = [var_key] + list(sum(var_keys, ()))

    for key in var_keys:
        if key in ds.data_vars.keys():
            data = ds[key].values
            break

    return data

## 1. Check for matching and equal number of files


In [4]:
_check_if_files_found()

In [5]:
_check_if_missing_files()

Number of files missing: 0


In [6]:
_check_if_matching_filecount()

OSError: Number of files do not match at DEV_PATH and MAIN_PATH (600 vs. 592).

### Why are there 8 more dev files?


##### Check which files were not produced by `main`


In [7]:
dev_files = [f.split("/")[-1] for f in DEV_GLOB]
main_files = [f.split("/")[-1] for f in MAIN_GLOB]

list(set(dev_files) - set(main_files))

['HadISST-SST-ANN-global_ref.nc',
 'MACv2-AODDUST-ANN-global_diff.nc',
 'MACv2-AODDUST-JJA-global_ref.nc',
 'MACv2-AODDUST-JJA-global_diff.nc',
 'MACv2-AODDUST-ANN-global_ref.nc',
 'HadISST-SST-ANN-global_diff.nc',
 'HadISST-SST-JJA-global_ref.nc',
 'HadISST-SST-JJA-global_diff.nc']

**Root cause: The reason is because these runs are model-only which means test and ref are the same
variables.**

**Conclusion: There is nothing wrong here, just different I/O behaviors when writing out
ref and diff variables. xCDAT will always write out datasets even if they are the same,
while CDAT does not.**

1. `cdat-migration-fy24`

   - The `ref` and `diff` variables are xr.Dataset objects and written out with `_write_vars_to_netcdf()`

2. `main`

   - The `ref` and `diff` variables are `None` when calling `save_netcdf()` are `None`.
     Attempting to write out these variables results in:

   ```python
       2024-03-04 09:39:16,678 [ERROR]: core_parameter.py(_run_diag:267) >> Error in e3sm_diags.driver.lat_lon_driver
       Traceback (most recent call last):
       File "/global/u2/v/vo13/E3SM-Project/e3sm_diags_main/e3sm_diags/parameter/core_parameter.py", line 264, in _run_diag
           single_result = module.run_diag(self)
       File "/global/u2/v/vo13/E3SM-Project/e3sm_diags_main/e3sm_diags/driver/lat_lon_driver.py", line 232, in run_diag
           create_and_save_data_and_metrics(parameter, mv1_domain, mv2_domain)
       File "/global/u2/v/vo13/E3SM-Project/e3sm_diags_main/e3sm_diags/driver/lat_lon_driver.py", line 61, in create_and_save_data_and_metrics
           utils.general.save_ncfiles(
       File "/global/u2/v/vo13/E3SM-Project/e3sm_diags_main/e3sm_diags/driver/utils/general.py", line 352, in save_ncfiles
           if ref.id.startswith("variable_"):
       AttributeError: 'NoneType' object has no attribute 'id'
   ```


## 2 Compare the netCDF files between branches

- Compare "ref" and "test" files
- "diff" files are ignored because getting relative diffs for these does not make sense (relative diff will be above tolerance)


In [8]:
_get_relative_diffs()

Comparing:
    * /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/792-lat-lon/lat_lon/AOD_550/MACv2-AODDUST-ANN-global_test.nc
    * /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/main/lat_lon/AOD_550/MACv2-AODDUST-ANN-global_test.nc
    * var_key: AODDUST
    * All close and within relative tolerance (1e-05)
Comparing:
    * /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/792-lat-lon/lat_lon/AOD_550/MACv2-AODDUST-JJA-global_test.nc
    * /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/main/lat_lon/AOD_550/MACv2-AODDUST-JJA-global_test.nc
    * var_key: AODDUST
    * All close and within relative tolerance (1e-05)
Comparing:
    * /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/792-lat-lon/lat_lon/AOD_550/MACv2-AODVIS-ANN-global_ref.nc
    * /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/main/lat_lon/AOD_550/MACv2-AODVIS-ANN-global_ref.nc
    * var_key: AODVIS
    * All close and within relative tolerance (1e-05)
Comparing:
    * /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/792-lat-lon/la

### Results

- Most files are within rtol 1e-05

Remaining issues:

- `x and y nan location mismatching`: `ALBEDOC`, `TREFHT`, `CLDTOT_TAU1.3_9.4_ISCCP`, `CLDTOT_TAU1.3_ISCCP`, `CLDTOT_TAU9.4_ISCCP`, `CLDTOT_TAU9.4_ISCCP`, `CLDLOW_TAU1.3_9.4_MISR`, `CLDLOW_TAU1.3_MISR`, `CLDLOW_TAU9.4_MISR`, `CLDTOT_TAU1.3_9.4_MISR`, `CLDTOT_TAU1.3_MISR`, `CLDTOT_TAU9.4_MISR`, `CLDHGH_TAU1.3_9.4_MODIS`, `CLDHGH_TAU1.3_MODIS`, `CLDHGH_TAU9.4_MODIS`, `CLDTOT_TAU1.3_9.4_MODIS`, `CLDTOT_TAU1.3_MODIS`, `CLDTOT_TAU9.4_MODIS`, `TAUXY`, `TREFHT`, `TAUXY`
  - Related to https://github.com/E3SM-Project/e3sm_diags/issues/790
- Large relative differences: `PminusE`, `QREFHT`

  - Related to https://github.com/E3SM-Project/e3sm_diags/issues/790

- Shape mismatch: `QREFHT` ((180, 360), (721, 1440))
