# CDAT Migration Regression Testing Notebook (`.nc` files)

This notebook performs regression testing between the development and
production versions of a diagnostic set.

## How it works

It compares the ref and test variables from the dev branch against those from
the `main` branch and reports any relative differences above the tolerance.
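
As a rough illustration of the check, the sketch below reproduces the element-wise criterion that `np.testing.assert_allclose` applies, `|actual - desired| <= atol + rtol * |desired|`, with the dev values as `actual` and the `main` values as `desired`. The helper name `within_tolerance` and the sample arrays are hypothetical, and NaN handling is left out for brevity.

```python
import numpy as np


def within_tolerance(actual, desired, rtol=1e-5, atol=0.0):
    """Return a boolean mask: True where |actual - desired| <= atol + rtol * |desired|."""
    actual = np.asarray(actual, dtype=float)
    desired = np.asarray(desired, dtype=float)
    return np.abs(actual - desired) <= atol + rtol * np.abs(desired)


# Hypothetical values standing in for a dev variable and a main variable.
dev = np.array([100.0, 200.0, 300.0])
main = np.array([100.0, 200.001, 303.0])

# The last element differs by about 1%, far above rtol=1e-5.
mask = within_tolerance(dev, main)
print(mask)
```

With `atol=0`, the check is purely relative, so values near zero must match almost exactly to pass.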

## How to use

PREREQUISITE: The diagnostic set's netCDF (`.nc`) files are stored in two directories
(one for the dev branch and one for the `main` branch).

1. Make a copy of this notebook under `auxiliary_tools/cdat_regression_testing/<DIR_NAME>`.
2. Run `mamba create -n cdat_regression_test -y -c conda-forge "python<3.12" xarray netcdf4 dask pandas matplotlib-base ipykernel`
3. Run `mamba activate cdat_regression_test`
4. Update `SET_DIR` and `SET_NAME` in the copy of your notebook.
5. Run all cells IN ORDER.
6. Review the results for any differences that exceed the relative tolerance (1e-5).
   - Debug these differences (e.g., bug in metrics functions, incorrect variable references, etc.)
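
When debugging a reported difference, it can help to locate the element with the largest relative difference. The following is a minimal sketch, assuming the two variables have already been loaded as NumPy arrays of the same shape (for example via `.values` on the xarray variables). The function name `max_relative_diff` and the sample data are hypothetical.

```python
import numpy as np


def max_relative_diff(dev, main):
    """Return (largest relative difference, index of that element)."""
    dev = np.asarray(dev, dtype=float)
    main = np.asarray(main, dtype=float)

    # Division by zero yields inf or NaN; silence the warnings and handle below.
    with np.errstate(divide="ignore", invalid="ignore"):
        rel = np.abs(dev - main) / np.abs(main)

    # Elements that are exactly equal (including 0 vs 0) have no difference.
    rel = np.where(dev == main, 0.0, rel)

    # nanargmax skips NaNs; it raises ValueError if every element is NaN.
    idx = np.unravel_index(np.nanargmax(rel), rel.shape)
    return float(rel[idx]), tuple(int(i) for i in idx)


# Hypothetical 2x2 arrays: only the last element differs (by roughly 10%).
dev = np.array([[1.0, 2.0], [3.0, 4.4]])
main = np.array([[1.0, 2.0], [3.0, 4.0]])

max_rel, location = max_relative_diff(dev, main)
print(max_rel, location)
```

The returned index can then be used to inspect the corresponding coordinates in both datasets.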


## Setup Code


In [33]:
from collections import defaultdict
import glob

import numpy as np
import xarray as xr

# TODO: Update SET_NAME and SET_DIR
SET_NAME = "lat_lon"
SET_DIR = "792-lat-lon"

DEV_PATH = f"/global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/{SET_DIR}/{SET_NAME}/**"
MAIN_PATH = f"/global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/main/{SET_NAME}/**"

DEV_GLOB = sorted(glob.glob(DEV_PATH + "/*.nc"))
MAIN_GLOB = sorted(glob.glob(MAIN_PATH + "/*.nc"))

if len(DEV_GLOB) == 0 or len(MAIN_GLOB) == 0:
    raise IOError("No files found at DEV_PATH and/or MAIN_PATH.")

if len(DEV_GLOB) != len(MAIN_GLOB):
    raise IOError("Number of files does not match at DEV_PATH and MAIN_PATH.")

In [34]:
def _get_var_to_filepath_map():
    var_to_file = defaultdict(lambda: defaultdict(dict))

    for dev_file, main_file in zip(DEV_GLOB, MAIN_GLOB):
        # Example:
        # "/global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/660-cosp-histogram/cosp_histogram/ISCCP-COSP/ISCCPCOSP-COSP_HISTOGRAM_ISCCP-ANN-global_test.nc"
        file_arr = dev_file.split("/")

        # Example: "test"
        data_type = dev_file.split("_")[-1].split(".nc")[0]

        # Skip `.nc` "diff" files because comparing relative diffs of diffs
        # does not make sense.
        if data_type == "test" or data_type == "ref":
            # Example: "ISCCP"
            model = file_arr[-2].split("-")[0]
            season = "JJA" if "JJA" in dev_file else "ANN"

            var_to_file[model][data_type][season] = (dev_file, main_file)

    return var_to_file


def _get_relative_diffs(var_to_filepath):
    # Absolute tolerance of 0 and relative tolerance of 1e-5 (0.001%).
    # We are mainly focusing on relative tolerance here.
    atol = 0
    rtol = 1e-5

    for model, data_types in var_to_filepath.items():
        for _, seasons in data_types.items():
            for _, filepaths in seasons.items():
                print("Comparing:")
                print(filepaths[0], "\n", filepaths[1])
                ds1 = xr.open_dataset(filepaths[0])
                ds2 = xr.open_dataset(filepaths[1])

                try:
                    var_key = f"COSP_HISTOGRAM_{model}"
                    np.testing.assert_allclose(
                        ds1[var_key].values,
                        ds2[var_key].values,
                        atol=atol,
                        rtol=rtol,
                    )
                except AssertionError as e:
                    print(e)
                else:
                    print(f"   * All close and within relative tolerance ({rtol})")

## 1. Compare the netCDF files between branches

- Compare "ref" and "test" files
- "diff" files are ignored because relative diffs of diff files are not meaningful (they will exceed the tolerance)
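
The filtering described above can be sketched in isolation. The parsing expression matches the one used in `_get_var_to_filepath_map`; the helper names `get_data_type` and `is_comparable` are hypothetical, and the `_diff.nc` file name is an assumed example that follows the same naming pattern.

```python
def get_data_type(filepath):
    """Extract the trailing data type ("test", "ref", or "diff") from a file name."""
    return filepath.split("_")[-1].split(".nc")[0]


def is_comparable(filepath):
    """Only "test" and "ref" files are compared between branches."""
    return get_data_type(filepath) in ("test", "ref")


# File names following the pattern shown in the notebook output.
files = [
    "ISCCPCOSP-COSP_HISTOGRAM_ISCCP-ANN-global_ref.nc",
    "ISCCPCOSP-COSP_HISTOGRAM_ISCCP-ANN-global_test.nc",
    "ISCCPCOSP-COSP_HISTOGRAM_ISCCP-ANN-global_diff.nc",  # assumed name
]

kept = [f for f in files if is_comparable(f)]
print(kept)
```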


In [36]:
var_to_filepaths = _get_var_to_filepath_map()

In [37]:
_get_relative_diffs(var_to_filepaths)

Comparing:
/global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/660-cosp-histogram/cosp_histogram/ISCCP-COSP/ISCCPCOSP-COSP_HISTOGRAM_ISCCP-ANN-global_ref.nc 
 /global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/main/cosp_histogram/ISCCP-COSP/ISCCPCOSP-COSP_HISTOGRAM_ISCCP-ANN-global_ref.nc
   * All close and within relative tolerance (1e-05)
Comparing:
/global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/660-cosp-histogram/cosp_histogram/ISCCP-COSP/ISCCPCOSP-COSP_HISTOGRAM_ISCCP-JJA-global_ref.nc 
 /global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/main/cosp_histogram/ISCCP-COSP/ISCCPCOSP-COSP_HISTOGRAM_ISCCP-JJA-global_ref.nc
   * All close and within relative tolerance (1e-05)
Comparing:
/global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/660-cosp-histogram/cosp_histogram/ISCCP-COSP/ISCCPCOSP-COSP_HISTOGRAM_ISCCP-ANN-global_test.nc 
 /global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/main/cosp_histogram/ISCCP-COSP/ISCCPCOSP-COSP_HISTOGRAM_ISCCP-ANN-global_test.nc
   * All close and within relat

### Results

- All files are within the relative tolerance of 1e-05, which means things should be good to go.
