# CDAT Migration Regression Testing Notebook (`.nc` files)

This notebook is used to perform regression testing between the development and
production versions of a diagnostic set.

## How it works

It compares the "ref" and "test" variables produced on the dev branch with those
produced on the `main` branch, and reports relative differences that exceed the tolerance.
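
To make the comparison concrete, here is a minimal sketch of a relative-difference check on toy data. The arrays `dev` and `main` are made-up values standing in for one variable loaded from each branch; they are not taken from any real diagnostic output.

```python
import numpy as np

# Hypothetical values standing in for one variable from each branch.
dev = np.array([1.0, 2.0, 3.0])
main = np.array([1.0, 2.0, 3.00006])

# Relative difference of each element, measured against the `main` values.
rel_diff = np.abs(dev - main) / np.abs(main)

# Flag elements whose relative difference exceeds the tolerance.
rtol = 1e-5
exceeds = rel_diff > rtol

print(rel_diff)
print(exceeds)
```

Only the last element is flagged here, because its relative difference is roughly 2e-5.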

## How to use

PREREQUISITE: The diagnostic set's netCDF output is stored in `.nc` files in two directories
(one for the dev branch and one for `main`).

1. Make a copy of this notebook under `auxiliary_tools/cdat_regression_testing/<DIR_NAME>`.
2. Run `mamba create -n cdat_regression_test -y -c conda-forge "python<3.12" xarray dask pandas matplotlib-base ipykernel`
3. Run `mamba activate cdat_regression_test`
4. Update `SET_DIR` and `SET_NAME` in the copy of your notebook.
5. Run all cells IN ORDER.
6. Review results for any outstanding differences (relative difference above the 1e-5 tolerance).
   - Debug these differences (e.g., bug in metrics functions, incorrect variable references, etc.)
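
When debugging, it can help to see where the mismatched elements sit in the array. The helper below is a sketch only: `find_mismatches` is a hypothetical name that is not part of this notebook, and it assumes both arrays have the same shape. For finite values it applies the same inequality as `np.testing.assert_allclose`.

```python
import numpy as np


def find_mismatches(dev, main, rtol=1e-5, atol=0.0):
    """Return the indices and relative differences of out-of-tolerance elements.

    Hypothetical helper for illustration. NaNs are not flagged, and a
    mismatch where `main` is exactly 0 yields an infinite relative difference.
    """
    dev = np.asarray(dev, dtype=float)
    main = np.asarray(main, dtype=float)

    abs_diff = np.abs(dev - main)
    # An element mismatches when |dev - main| > atol + rtol * |main|.
    mismatch = abs_diff > atol + rtol * np.abs(main)

    idx = np.argwhere(mismatch)
    rel = abs_diff[mismatch] / np.abs(main[mismatch])
    return idx, rel


# Toy arrays (made up) with a single element outside the tolerance.
dev = np.array([[1.0, 2.0], [3.0, 4.0]])
main = np.array([[1.0, 2.0], [3.0, 4.001]])

idx, rel = find_mismatches(dev, main)
print(idx)
print(rel)
```

The returned indices can then be matched against the variable's coordinates to see, for example, whether the mismatches cluster at particular latitudes or levels.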


## Setup Code


In [1]:
import glob
from collections import defaultdict

import numpy as np
import xarray as xr

# TODO: Update SET_NAME and SET_DIR
SET_NAME = "zonal_mean_2d_stratosphere"
SET_DIR = "655-zonal-mean-2d-stratosphere"

DEV_PATH = f"/global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/{SET_DIR}/{SET_NAME}/**"
MAIN_PATH = f"/global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/main/{SET_NAME}/**"

DEV_GLOB = sorted(glob.glob(DEV_PATH + "/*.nc"))
MAIN_GLOB = sorted(glob.glob(MAIN_PATH + "/*.nc"))

if len(DEV_GLOB) == 0 or len(MAIN_GLOB) == 0:
    raise IOError("No files found at DEV_PATH and/or MAIN_PATH.")

dev_num_files = len(DEV_GLOB)
main_num_files = len(MAIN_GLOB)
if dev_num_files != main_num_files:
    raise IOError(
        f"Number of files do not match at DEV_PATH ({dev_num_files}) and MAIN_PATH ({main_num_files})."
    )

In [2]:
def _get_var_to_filepath_map():
    var_to_file = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))

    for dev_file, main_file in zip(DEV_GLOB, MAIN_GLOB):
        if "relative difference" in dev_file:
            continue

        # Example:
        # "/global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/660-cosp-histogram/cosp_histogram/ISCCP-COSP/ISCCPCOSP-COSP_HISTOGRAM_ISCCP-ANN-global_test.nc"
        file_arr = dev_file.split("/")

        # Example: "test"
        data_type = dev_file.split("_")[-1].split(".nc")[0]

        # Skip `.nc` "diff" files because computing relative diffs for them
        # does not make sense.
        if data_type == "test" or data_type == "ref":
            # Example: [ERA5, OMEGA, ANN, global_ref.nc]
            filename = file_arr[-1].split("-")
            # Example: ERA5
            model = filename[0]
            # Example: OMEGA
            var_key = filename[1]

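            # NOTE: Only "JJA" is detected here. Files for any other season
            # (e.g., SON) are stored under the "ANN" key.
            season = "JJA" if "JJA" in dev_file else "ANN"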

            var_to_file[model][var_key][data_type][season] = (dev_file, main_file)

    return var_to_file


def _get_relative_diffs(var_to_filepath):
    # Absolute tolerance of 0 and relative tolerance of 1e-5.
    # We are mainly focusing on relative tolerance here (in percentage terms).
    atol = 0
    rtol = 1e-5

    for _, var_keys in var_to_filepath.items():
        for var_key, data_types in var_keys.items():
            for _, seasons in data_types.items():
                for _, filepaths in seasons.items():
                    print("Comparing:")
                    print(filepaths[0], "\n", filepaths[1])
                    ds1 = xr.open_dataset(filepaths[0])
                    ds2 = xr.open_dataset(filepaths[1])

                    try:
                        np.testing.assert_allclose(
                            ds1[var_key].values,
                            ds2[var_key].values,
                            atol=atol,
                            rtol=rtol,
                        )
                    except AssertionError as e:
                        print(e)
                    else:
                        print(f"   * All close and within relative tolerance ({rtol})")

## 1. Compare the netCDF files between branches

- Compare "ref" and "test" files
- "diff" files are ignored because getting relative diffs for these does not make sense (relative diff will be above tolerance)
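
The selection of "ref" and "test" files depends on the naming pattern `<model>-<variable>-<season>-<region>_<data type>.nc`. The sketch below shows one way to split a filename into those parts. `parse_nc_filename` is a hypothetical helper, not the notebook's own function, and it assumes the model and variable names contain no hyphens. Unlike the notebook's mapping code, it reads the season directly from the filename.

```python
import os


def parse_nc_filename(filepath):
    """Split a path such as ".../ERA5-OMEGA-SON-global_ref.nc" into its parts.

    Hypothetical helper for illustration only.
    """
    basename = os.path.basename(filepath)
    stem = basename[: -len(".nc")] if basename.endswith(".nc") else basename

    # The data type ("ref", "test", or "diff") follows the last underscore.
    name, _, data_type = stem.rpartition("_")

    # The remaining tokens are separated by hyphens.
    parts = name.split("-")
    return {
        "model": parts[0],
        "var_key": parts[1],
        "season": parts[2],
        "data_type": data_type,
        # Only "ref" and "test" files are compared; "diff" files are skipped.
        "is_compared": data_type in ("ref", "test"),
    }


info = parse_nc_filename("/some/dir/ERA5/ERA5-OMEGA-SON-global_ref.nc")
print(info)
```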


In [3]:
var_to_filepaths = _get_var_to_filepath_map()

In [4]:
var_to_filepaths

defaultdict(<function __main__._get_var_to_filepath_map.<locals>.<lambda>()>,
            {'ERA5': defaultdict(<function __main__._get_var_to_filepath_map.<locals>.<lambda>.<locals>.<lambda>()>,
                         {'OMEGA': defaultdict(dict,
                                      {'ref': {'ANN': ('/global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/655-zonal-mean-2d-stratosphere/zonal_mean_2d_stratosphere/ERA5/ERA5-OMEGA-SON-global_ref.nc',
                                         '/global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/main/zonal_mean_2d_stratosphere/ERA5/ERA5-OMEGA-SON-global_ref.nc'),
                                        'JJA': ('/global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/655-zonal-mean-2d-stratosphere/zonal_mean_2d_stratosphere/ERA5/ERA5-OMEGA-JJA-global_ref.nc',
                                         '/global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/main/zonal_mean_2d_stratosphere/ERA5/ERA5-OMEGA-JJA-global_ref.nc')},
                                    

In [5]:
_get_relative_diffs(var_to_filepaths)

Comparing:
/global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/655-zonal-mean-2d-stratosphere/zonal_mean_2d_stratosphere/ERA5/ERA5-OMEGA-SON-global_ref.nc 
 /global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/main/zonal_mean_2d_stratosphere/ERA5/ERA5-OMEGA-SON-global_ref.nc
   * All close and within relative tolerance (1e-05)
Comparing:
/global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/655-zonal-mean-2d-stratosphere/zonal_mean_2d_stratosphere/ERA5/ERA5-OMEGA-JJA-global_ref.nc 
 /global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/main/zonal_mean_2d_stratosphere/ERA5/ERA5-OMEGA-JJA-global_ref.nc
   * All close and within relative tolerance (1e-05)
Comparing:
/global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/655-zonal-mean-2d-stratosphere/zonal_mean_2d_stratosphere/ERA5/ERA5-OMEGA-SON-global_test.nc 
 /global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/main/zonal_mean_2d_stratosphere/ERA5/ERA5-OMEGA-SON-global_test.nc
   * All close and within relative tolerance (1e-05)
Comparing:
/global/cfs/pro

### Results

There are two datasets not within the relative tolerance:

```python
Comparing:
/global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/655-zonal-mean-2d-stratosphere/zonal_mean_2d_stratosphere/MERRA2/MERRA2-U-SON-global_ref.nc
 /global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/main/zonal_mean_2d_stratosphere/MERRA2/MERRA2-U-SON-global_ref.nc

Not equal to tolerance rtol=1e-05, atol=0

Mismatched elements: 16 / 3610 (0.443%)
Max absolute difference: 1.12679693e-07
Max relative difference: 0.06101062
 x: array([[ 6.962348e-07,  8.762654e-01,  1.605988e+00, ...,  2.470426e+00,
         1.311890e+00,  1.454603e-06],
       [ 1.235800e-06,  1.004880e+00,  1.845646e+00, ...,  2.164340e+00,...
 y: array([[ 6.962348e-07,  8.762654e-01,  1.605988e+00, ...,  2.470426e+00,
         1.311890e+00,  1.454603e-06],
       [ 1.227703e-06,  1.004880e+00,  1.845646e+00, ...,  2.164340e+00,...
Comparing:
/global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/655-zonal-mean-2d-stratosphere/zonal_mean_2d_stratosphere/MERRA2/MERRA2-U-JJA-global_ref.nc
 /global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/main/zonal_mean_2d_stratosphere/MERRA2/MERRA2-U-JJA-global_ref.nc

Not equal to tolerance rtol=1e-05, atol=0

Mismatched elements: 15 / 3610 (0.416%)
Max absolute difference: 2.09578261e-07
Max relative difference: 0.00898989
 x: array([[-9.994902e-07,  1.855954e+00,  3.415055e+00, ..., -1.025078e+00,
        -5.548744e-01, -9.440600e-07],
       [-9.595097e-07,  1.685548e+00,  3.103145e+00, ..., -8.308557e-01,...
 y: array([[-9.994902e-07,  1.855954e+00,  3.415055e+00, ..., -1.025078e+00,
        -5.548744e-01, -9.440600e-07],
       [-9.595177e-07,  1.685548e+00,  3.103145e+00, ..., -8.308557e-01,...
```
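
In both cases the maximum absolute difference is on the order of 1e-7, and the differing values printed above are themselves on the order of 1e-6. With `atol=0`, tiny absolute differences in near-zero values can show up as large relative differences. The sketch below illustrates this with one pair of values copied from the second row of the SON arrays. The `atol=1e-6` value is a hypothetical choice for illustration, not a recommendation from this notebook.

```python
import numpy as np

# One pair of values copied from the second row of the SON arrays printed above.
x = np.array([1.235800e-06])
y = np.array([1.227703e-06])

abs_diff = np.abs(x - y)
rel_diff = abs_diff / np.abs(y)
print(abs_diff, rel_diff)

# With atol=0, the check depends on the relative difference alone and fails.
strict = np.allclose(x, y, rtol=1e-5, atol=0)
print(strict)

# A small, hypothetical absolute tolerance absorbs differences near zero.
relaxed = np.allclose(x, y, rtol=1e-5, atol=1e-6)
print(relaxed)
```

Whether such an absolute tolerance is acceptable depends on the variable and its units, so it remains a judgment call when reviewing the results.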
