# CDAT Migration Regression Testing Notebook

This notebook is used to perform regression testing between the development and
production versions of a diagnostic set.

## How it works

It compares the relative differences (%) between two sets of `.json` files in two
separate directories, one for the refactored code and the other for the `main` branch.

It displays metric values with relative differences >= 2%. Relative differences are used instead of absolute differences because:

- Relative differences are expressed as percentages, which show the scale of a difference relative to the values being compared.
- Absolute differences are raw numbers that do not account for the magnitude
  of the values being compared (e.g., 100.00 vs. 0.0001), which can be misleading.
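
As a minimal sketch of the point above, the snippet below compares two pairs of values that share the same absolute difference but have very different relative differences. The `relative_diff` helper is hypothetical and written only for illustration; it is not necessarily how `get_rel_diffs` computes its result.

```python
import math


def relative_diff(dev: float, main: float) -> float:
    """Return |dev - main| / |main|, or NaN when main is zero (hypothetical helper)."""
    if main == 0:
        return math.nan
    return abs(dev - main) / abs(main)


# Both pairs differ by roughly 0.0001 in absolute terms.
large = relative_diff(100.0001, 100.0)
small = relative_diff(0.0002, 0.0001)

print(f"{large:.6%}")  # 0.000100%
print(f"{small:.2%}")  # 100.00%
```

Only the second pair would be flagged by a 2% threshold, even though the absolute differences are nearly identical.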

## How to use

PREREQUISITE: The diagnostic set's metrics must be stored in `.json` files in two directories
(one for the dev branch and one for the `main` branch).

1. Make a copy of this notebook under `auxiliary_tools/cdat_regression_testing/<DIR_NAME>`.
2. Run `mamba create -n cdat_regression_test -y -c conda-forge "python<3.12" xarray dask pandas matplotlib-base ipykernel`
3. Run `mamba activate cdat_regression_test`
4. Update `DEV_PATH` and `MAIN_PATH` in the copy of your notebook.
5. Run all cells IN ORDER.
6. Review results for any outstanding differences (>= 2%).
   - Debug these differences (e.g., bug in metrics functions, incorrect variable references, etc.)


## Setup Code


In [72]:
import glob
from auxiliary_tools.cdat_regression_testing.utils import (
    get_metrics,
    get_rel_diffs,
    get_num_metrics_above_diff_thres,
    highlight_large_diffs,
    sort_columns,
    update_diffs_to_pct,
    PERCENTAGE_COLUMNS,
)
import xarray as xr

# TODO: Update DEV_PATH and MAIN_PATH to your diagnostic sets.
DEV_PATH = "/global/cfs/cdirs/e3sm/www/vo13/cdat-migration-test/660-cosp-histogram/cosp_histogram/cosp_histogram/model_vs_model"
MAIN_PATH = "/global/cfs/cdirs/e3sm/www/vo13/cdat-migration-test/main-cosp-histogram/cosp_histogram/cosp_histogram/model_vs_model"

DEV_GLOB = sorted(glob.glob(DEV_PATH + "/*.nc"))
MAIN_GLOB = sorted(glob.glob(MAIN_PATH + "/*.nc"))

In [73]:
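# Split the sorted main files into one group per dev file. Each group should
# hold the matching *_diff.nc, *_ref.nc, and *_test.nc files.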
import numpy as np

# MAIN_GLOB_ADJ = [fname for fname in MAIN_GLOB if "diff" not in fname]
MAIN_GLOB_SPLIT = np.array_split(np.array(MAIN_GLOB), len(DEV_GLOB))
MAIN_GLOB_SPLIT

[array(['/global/cfs/cdirs/e3sm/www/vo13/cdat-migration-test/main-cosp-histogram/cosp_histogram/cosp_histogram/model_vs_model/MISRCOSP-COSP_HISTOGRAM_MISR-ANN-global_diff.nc',
        '/global/cfs/cdirs/e3sm/www/vo13/cdat-migration-test/main-cosp-histogram/cosp_histogram/cosp_histogram/model_vs_model/MISRCOSP-COSP_HISTOGRAM_MISR-ANN-global_ref.nc',
        '/global/cfs/cdirs/e3sm/www/vo13/cdat-migration-test/main-cosp-histogram/cosp_histogram/cosp_histogram/model_vs_model/MISRCOSP-COSP_HISTOGRAM_MISR-ANN-global_test.nc'],
       dtype='<U164'),
 array(['/global/cfs/cdirs/e3sm/www/vo13/cdat-migration-test/main-cosp-histogram/cosp_histogram/cosp_histogram/model_vs_model/MISRCOSP-COSP_HISTOGRAM_MISR-DJF-global_diff.nc',
        '/global/cfs/cdirs/e3sm/www/vo13/cdat-migration-test/main-cosp-histogram/cosp_histogram/cosp_histogram/model_vs_model/MISRCOSP-COSP_HISTOGRAM_MISR-DJF-global_ref.nc',
        '/global/cfs/cdirs/e3sm/www/vo13/cdat-migration-test/main-cosp-histogram/cosp_histogram/co
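
The grouping step above appears to rely on the sorted `main` listing containing exactly three files (`_diff`, `_ref`, `_test`) for each dev file. The sketch below reproduces that logic with made-up file names, which are assumptions for illustration only, and adds an explicit length check.

```python
import numpy as np

# Hypothetical file names standing in for the sorted glob results.
dev_files = ["A-ANN.nc", "A-DJF.nc"]
main_files = sorted(
    [
        "A-ANN_diff.nc", "A-ANN_ref.nc", "A-ANN_test.nc",
        "A-DJF_diff.nc", "A-DJF_ref.nc", "A-DJF_test.nc",
    ]
)

# Fail loudly if the two listings do not line up three-to-one.
if len(main_files) != 3 * len(dev_files):
    raise ValueError("Expected three main files per dev file.")

# Split the main files into as many groups as there are dev files.
groups = np.array_split(np.array(main_files), len(dev_files))

for dev_file, group in zip(dev_files, groups):
    print(dev_file, "->", group.tolist())
```

The check matters because `np.array_split` does not raise when the array cannot be divided evenly; it returns groups of unequal size, which would silently pair dev files with the wrong `main` files.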

In [74]:
xr.open_dataset(DEV_GLOB[0])["COSP_HISTOGRAM_MISR_ref"].values

array([[       nan,        nan, 0.73055494,        nan, 0.07541108,
               nan],
       [       nan,        nan, 1.55165947,        nan, 0.20180927,
               nan],
       [       nan,        nan, 2.78143362,        nan, 0.43480856,
               nan],
       [       nan,        nan, 2.9150847 ,        nan, 0.5316223 ,
               nan],
       [       nan,        nan, 1.87635564,        nan, 0.42303243,
               nan],
       [       nan,        nan, 0.99869349,        nan, 0.2943452 ,
               nan],
       [       nan,        nan, 1.0971112 ,        nan, 0.42190844,
               nan],
       [       nan,        nan, 0.76867767,        nan, 0.3576838 ,
               nan],
       [       nan,        nan, 1.51022843,        nan, 0.71505283,
               nan],
       [       nan,        nan, 0.92953173,        nan, 0.54328104,
               nan],
       [       nan,        nan, 0.74486766,        nan, 0.46928398,
               nan],
       [       nan,  

In [75]:
xr.open_mfdataset(MAIN_GLOB_SPLIT[0])

[Output: xarray Dataset repr. Variable names are not present in this export. Each listed array is Dask-backed and loaded as a single chunk ("1 chunks in 2 graph layers").]

| Shape | Chunk shape | Bytes | Data type |
|---|---|---|---|
| (15, 6) | (15, 6) | 720 B | float64 |
| (6, 2) | (6, 2) | 48 B | float32 |
| (15, 6) | (15, 6) | 720 B | float64 |
| (15, 2) | (15, 2) | 240 B | float64 |
| (6, 2) | (6, 2) | 96 B | float64 |
| (15, 6) | (15, 6) | 720 B | float64 |


## 1. Get the metrics for the development and `main` branches and their differences.


In [76]:
var_key = "COSP_HISTOGRAM_MISR"

# Compare each dev file against its matching group of main files.
for file_a, file_b in zip(DEV_GLOB, MAIN_GLOB_SPLIT):
    ds_a = xr.open_dataset(file_a)
    ds_b = xr.open_mfdataset(file_b)

    # The test arrays must match within the default tolerance.
    np.testing.assert_allclose(
        ds_a[f"{var_key}_test"].values, ds_b[f"{var_key}_test"].values
    )
    # On a ref mismatch, print both arrays instead of stopping the loop.
    try:
        np.testing.assert_allclose(
            ds_a[f"{var_key}_ref"].values, ds_b[f"{var_key}_ref"].values
        )
    except AssertionError:
        print(ds_a[f"{var_key}_ref"])
        print(ds_b[f"{var_key}_ref"].load())
    # NOTE: The diff comparison is currently disabled.
    # np.testing.assert_allclose(
    #     ds_a[f"{var_key}_diff"].values, ds_b[f"{var_key}_diff"].values
    # )

<xarray.DataArray 'COSP_HISTOGRAM_MISR_ref' (misr_cth: 15, misr_tau: 6)>
array([[     nan,      nan, 0.730555,      nan, 0.075411,      nan],
       [     nan,      nan, 1.551659,      nan, 0.201809,      nan],
       [     nan,      nan, 2.781434,      nan, 0.434809,      nan],
       [     nan,      nan, 2.915085,      nan, 0.531622,      nan],
       [     nan,      nan, 1.876356,      nan, 0.423032,      nan],
       [     nan,      nan, 0.998693,      nan, 0.294345,      nan],
       [     nan,      nan, 1.097111,      nan, 0.421908,      nan],
       [     nan,      nan, 0.768678,      nan, 0.357684,      nan],
       [     nan,      nan, 1.510228,      nan, 0.715053,      nan],
       [     nan,      nan, 0.929532,      nan, 0.543281,      nan],
       [     nan,      nan, 0.744868,      nan, 0.469284,      nan],
       [     nan,      nan, 0.653314,      nan, 0.447334,      nan],
       [     nan,      nan, 0.290538,      nan, 0.2414  ,      nan],
       [     nan,      nan, 0.
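
The dev `ref` array printed above contains many NaNs. `np.testing.assert_allclose` treats NaNs in matching positions as equal by default (`equal_nan=True`), so one possible cause of a mismatch is a NaN that appears on only one side. That is an assumption, not a diagnosis of this particular output. The sketch below uses small made-up arrays to show how such cells could be located.

```python
import numpy as np

# Made-up arrays for illustration; they are not taken from the real files.
dev = np.array([[np.nan, 0.730555], [np.nan, 1.551659]])
main_same = dev.copy()
main_shifted = np.array([[0.0, 0.730555], [np.nan, 1.551659]])

# NaNs in matching positions compare equal because equal_nan defaults to True.
np.testing.assert_allclose(dev, main_same)

# A NaN that exists on one side only is reported as a mismatch.
try:
    np.testing.assert_allclose(dev, main_shifted)
    nan_mismatch_detected = False
except AssertionError:
    nan_mismatch_detected = True

# Locate the cells where exactly one of the two arrays is NaN.
nan_only_one_side = np.isnan(dev) ^ np.isnan(main_shifted)
print(nan_mismatch_detected)
print(np.argwhere(nan_only_one_side))
```

If `nan_only_one_side` is all `False` for the real arrays, the mismatch comes from the numeric values rather than from the NaN mask.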

## 2. Filter differences to those above maximum threshold (2%).

All values below the threshold are set to `NaN`.

- **If all cells in a row are NaN (< 2%)**, the entire row is dropped to make the results easier to parse.
- Any remaining NaN cells are below the 2% threshold and **should be ignored**.


In [24]:
df_metrics_diffs_thres = df_metrics_diffs[df_metrics_diffs >= 0.02]
df_metrics_diffs_thres = df_metrics_diffs_thres.dropna(
    axis=0, how="all", ignore_index=False
)
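
The filtering step above can be tried on a toy DataFrame. The values and column names below are assumptions chosen for illustration; differences are stored as fractions, so 2% is `0.02`.

```python
import numpy as np
import pandas as pd

# Toy relative differences (fractions), indexed by (var_key, metric).
index = pd.MultiIndex.from_tuples(
    [("PRECT", "mean"), ("RESTOM", "mean"), ("PSL", "rmse")],
    names=["var_key", "metric"],
)
df_diffs = pd.DataFrame(
    {
        "test DIFF (%)": [0.001, 0.30, np.nan],
        "misc DIFF (%)": [np.nan, np.nan, 0.06],
    },
    index=index,
)

# Boolean-mask indexing keeps values >= 0.02 and sets everything else to NaN.
df_thres = df_diffs[df_diffs >= 0.02]

# Rows in which every column is NaN contain no large difference and are dropped.
df_thres = df_thres.dropna(axis=0, how="all")
print(df_thres)
```

Here the `PRECT` row is dropped, while `RESTOM` and `PSL` are kept with NaN in the columns that fell below the threshold. Note that this comparison keeps only positive values: if the differences were signed, a negative difference of more than 2% would also be masked unless `.abs()` is applied first.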

## 3. Combine all DataFrames to get the final result.


In [25]:
import pandas as pd

df_metrics_all = pd.concat(
    [df_metrics_dev.add_suffix("_dev"), df_metrics_main.add_suffix("_main")],
    axis=1,
    join="outer",
)
df_final = df_metrics_diffs_thres.join(df_metrics_all)
df_final = sort_columns(df_final)
df_final = update_diffs_to_pct(df_final)
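
The combine step above can be sketched on toy data as follows. All values are made up, and the `sort_columns` and `update_diffs_to_pct` helpers are left out because their implementations are not shown here.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("PRECT", "mean"), ("RESTOM", "mean")], names=["var_key", "metric"]
)

# Toy metrics for each branch, sharing the same index and column names.
df_dev = pd.DataFrame({"test": [3.05, 0.48], "ref": [3.07, 0.02]}, index=index)
df_main = pd.DataFrame({"test": [3.06, 0.66], "ref": [3.07, 0.16]}, index=index)

# Toy thresholded differences: only RESTOM survived the filter.
df_diffs_thres = pd.DataFrame({"test DIFF (%)": [0.36]}, index=index[1:])

# Suffix the columns so dev and main values can sit side by side.
df_all = pd.concat(
    [df_dev.add_suffix("_dev"), df_main.add_suffix("_main")],
    axis=1,
    join="outer",
)

# DataFrame.join defaults to a left join, so only rows present in the
# thresholded differences are kept.
df_final = df_diffs_thres.join(df_all)
print(df_final)
```

Because the join is a left join on the thresholded differences, variables whose differences were all below the threshold do not appear in the final table.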

## 4. Review variables and metrics above difference threshold.

- <span style="color:red">Red</span> cells are differences >= 2%
- `nan` cells are differences < 2% and **should be ignored**


In [26]:
remove_metrics = ["min", "max"]
df_metrics_sub = df_final.reset_index(names=["var_key", "metric"])
df_metrics_sub = df_metrics_sub[~df_metrics_sub.metric.isin(remove_metrics)]
get_num_metrics_above_diff_thres(df_metrics_all, df_metrics_sub)

* Related variables ['FSNTOA', 'LHFLX', 'LWCF', 'NET_FLUX_SRF', 'PRECT', 'PSL', 'RESTOM', 'TREFHT']
* Number of metrics above 2% max threshold: 11 / 96


In [28]:
highlight_large_diffs(df_metrics_sub)

| | var_key | metric | test_dev | test_main | test DIFF (%) | ref_dev | ref_main | ref DIFF (%) | test_regrid_dev | test_regrid_main | test_regrid DIFF (%) | ref_regrid_dev | ref_regrid_main | ref_regrid DIFF (%) | misc_dev | misc_main | misc DIFF (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | FSNTOA | mean | 239.859777 | 240.00186 | | 241.439641 | 241.544384 | | 239.859777 | 240.00186 | | 241.439641 | 241.544384 | | | | |
| 8 | LHFLX | mean | 88.379609 | 88.47027 | | 88.96955 | 88.976266 | | 88.379609 | 88.47027 | | 88.96955 | 88.976266 | | | | |
| 11 | LWCF | mean | 24.373224 | 24.370539 | | 24.406697 | 24.391579 | | 24.373224 | 24.370539 | | 24.406697 | 24.391579 | | | | |
| 16 | NET_FLUX_SRF | mean | 0.394016 | 0.51633 | 31.04% | -0.068186 | 0.068584 | 200.58% | 0.394016 | 0.51633 | 31.04% | -0.068186 | 0.068584 | 200.58% | | | |
| 19 | PRECT | mean | 3.053802 | 3.05676 | | 3.074885 | 3.074978 | | 3.053802 | 3.05676 | | 3.074885 | 3.074978 | | | | |
| 21 | PSL | rmse | | | | | | | | | | | | | 1.042884 | 0.979981 | 6.03% |
| 23 | RESTOM | mean | 0.481549 | 0.65656 | 36.34% | 0.018041 | 0.162984 | 803.40% | 0.481549 | 0.65656 | 36.34% | 0.018041 | 0.162984 | 803.40% | | | |
| 34 | TREFHT | mean | 14.769946 | 14.741707 | | 13.842013 | 13.800258 | | 14.769946 | 14.741707 | | 13.842013 | 13.800258 | | | | |
| 35 | TREFHT | mean | 9.214224 | 9.114572 | | 8.083349 | 7.957917 | | 9.214224 | 9.114572 | | 8.083349 | 7.957917 | | | | |
| 40 | TREFHT | rmse | | | | | | | | | | | | | 1.160718 | 1.179995 | 2.68% |


## `NET_FLUX_SRF` and `RESTOM` contain the highest differences and should be investigated further
