# CDAT Migration Regression Testing Notebook (`.png` files)

This notebook is used to perform regression testing between the development and
production versions of a diagnostic set.

## How to use

PREREQUISITE: The diagnostic set's plots are stored as `.png` files in two directories
(one for the dev branch and one for `main`).

1. Make a copy of this notebook under `auxiliary_tools/cdat_regression_testing/<DIR_NAME>`.
2. Run `mamba create -n cdat_regression_test -y -c conda-forge "python<3.12" xarray netcdf4 dask pandas matplotlib-base ipykernel`
3. Run `mamba activate cdat_regression_test`
4. Update `DEV_DIR` (and `MAIN_DIR`, if needed) in the copy of your notebook.
5. Run all cells IN ORDER.


## Setup Code


In [1]:
import glob
from typing import List

from auxiliary_tools.cdat_regression_testing.utils import get_image_diffs


DEV_DIR = "843-migration-phase3-model-vs-obs"
DEV_PATH = f"/global/cfs/cdirs/e3sm/www/cdat-migration-fy24/{DEV_DIR}/"

DEV_GLOB = sorted(glob.glob(DEV_PATH + "**/**/*.png"))
DEV_NUM_FILES = len(DEV_GLOB)

MAIN_DIR = "main"
MAIN_PATH = f"/global/cfs/cdirs/e3sm/www/cdat-migration-fy24/{MAIN_DIR}/"
MAIN_GLOB = sorted(glob.glob(MAIN_PATH + "**/**/*.png"))
MAIN_NUM_FILES = len(MAIN_GLOB)


def _remove_unwanted_files(file_glob: List[str]) -> List[str]:
    """Remove files that we don't want to compare.

    * area_mean_time_series -- `main` does not generate netCDF
    * enso_diags -- `main` does not generate netCDF
    * qbo -- variable name differs
    * diurnal_cycle -- variable name differs
    * diff -- comparing the difference between regridded files is not helpful
      between branches because of the influence of floating point errors.
    * ERA5_ext-U10-ANN-global_ref and ERA5_ext-U10-JJA-global_ref -- dev
      branch does not generate these files because it is a model-only run.

    Parameters
    ----------
    file_glob : List[str]
        The file paths to filter.

    Returns
    -------
    List[str]
        The file paths with the unwanted files removed.
    """

    new_glob = []

    for fp in file_glob:
        if (
            "area_mean_time_series" in fp
            or "enso_diags" in fp
            or "qbo" in fp
            or "diurnal_cycle" in fp
            or "diff" in fp
            or "ERA5_ext-U10-ANN-global_ref" in fp
            or "ERA5_ext-U10-JJA-global_ref" in fp
        ):
            continue

        new_glob.append(fp)

    return new_glob


DEV_GLOB = _remove_unwanted_files(DEV_GLOB)
MAIN_GLOB = _remove_unwanted_files(MAIN_GLOB)
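
The filter above is a chain of substring checks. As a minimal sketch (not part of the notebook), the same rule can be written with the substrings held in one tuple, which makes the exclusion list easier to extend. The names `UNWANTED_SUBSTRINGS`, `remove_unwanted_files_sketch`, and the example paths are hypothetical and used only for illustration.

```python
from typing import List, Sequence

# Substrings copied from the filter in `_remove_unwanted_files`.
UNWANTED_SUBSTRINGS = (
    "area_mean_time_series",
    "enso_diags",
    "qbo",
    "diurnal_cycle",
    "diff",
    "ERA5_ext-U10-ANN-global_ref",
    "ERA5_ext-U10-JJA-global_ref",
)


def remove_unwanted_files_sketch(
    file_paths: Sequence[str],
    unwanted: Sequence[str] = UNWANTED_SUBSTRINGS,
) -> List[str]:
    """Keep only the paths that contain none of the unwanted substrings."""
    return [fp for fp in file_paths if not any(s in fp for s in unwanted)]


# Hypothetical paths, for illustration only.
example_paths = [
    "/tmp/main/lat_lon/GPCP/GPCP-PRECT-ANN-global.png",
    "/tmp/main/qbo/ERA-Interim/qbo_diags.png",
    "/tmp/main/lat_lon/GPCP_diff/GPCP-PRECT-ANN-global.png",
]
kept = remove_unwanted_files_sketch(example_paths)
print(kept)
```

Because the match is a plain substring test, any path containing `diff` anywhere (directory or file name) is dropped, which is the same behavior as the original function.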

In [2]:
def _check_if_files_found():
    if DEV_NUM_FILES == 0 or MAIN_NUM_FILES == 0:
        raise IOError(
            "No files found at DEV_PATH and/or MAIN_PATH. "
            f"Please check {DEV_PATH} and {MAIN_PATH}."
        )


def _check_if_matching_filecount():
    if DEV_NUM_FILES != MAIN_NUM_FILES:
        raise IOError(
            "Number of files do not match at DEV_PATH and MAIN_PATH "
            f"({DEV_NUM_FILES} vs. {MAIN_NUM_FILES})."
        )

    print(f"Matching file count ({DEV_NUM_FILES} and {MAIN_NUM_FILES}).")


def _check_if_missing_files():
    missing_dev_files = []
    missing_main_files = []

    for fp_main in MAIN_GLOB:
        fp_dev = fp_main.replace(MAIN_PATH, DEV_PATH)

        if fp_dev not in DEV_GLOB:
            missing_dev_files.append(fp_dev)

    for fp_dev in DEV_GLOB:
        fp_main = fp_dev.replace(DEV_PATH, MAIN_PATH)

        if fp_main not in MAIN_GLOB:
            missing_main_files.append(fp_main)

    return missing_dev_files, missing_main_files
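
`_check_if_missing_files` maps each path to its counterpart by swapping the root directory and then tests membership in a list. A possible alternative, sketched here under the assumption that all paths live under their respective roots, is to compare sets of root-relative paths, which avoids scanning a list for every file. The function name, roots, and file names below are hypothetical.

```python
import posixpath
from typing import Iterable, List, Tuple


def find_missing_files_sketch(
    dev_files: Iterable[str],
    main_files: Iterable[str],
    dev_root: str,
    main_root: str,
) -> Tuple[List[str], List[str]]:
    """Return (missing from dev, missing from main) as root-relative paths."""
    dev_rel = {posixpath.relpath(fp, dev_root) for fp in dev_files}
    main_rel = {posixpath.relpath(fp, main_root) for fp in main_files}

    # Present in main but absent from dev, and vice versa.
    missing_in_dev = sorted(main_rel - dev_rel)
    missing_in_main = sorted(dev_rel - main_rel)
    return missing_in_dev, missing_in_main


# Hypothetical directory layout, for illustration only.
dev_example = ["/data/dev/set_a/x.png", "/data/dev/set_a/only_dev.png"]
main_example = ["/data/main/set_a/x.png", "/data/main/set_a/only_main.png"]

missing_in_dev, missing_in_main = find_missing_files_sketch(
    dev_example, main_example, "/data/dev/", "/data/main/"
)
print(missing_in_dev, missing_in_main)
```

Unlike the notebook's function, this sketch reports relative paths rather than full paths, so the two result lists can be compared directly.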

In [3]:
len(DEV_GLOB), len(MAIN_GLOB)

(641, 639)

## 1. Check for matching and equal number of files


In [4]:
_check_if_files_found()

In [5]:
missing_dev_files, missing_main_files = _check_if_missing_files()

In [6]:
missing_dev_files

['/global/cfs/cdirs/e3sm/www/cdat-migration-fy24/843-migration-phase3-model-vs-obs/annual_cycle_zonal_mean/AOD_550/AOD_550-AODVIS-ANNUALCYCLE-global.png']

In [7]:
missing_main_files

['/global/cfs/cdirs/e3sm/www/cdat-migration-fy24/main/annual_cycle_zonal_mean/MACv2/MACv2-AODVIS-ANNUALCYCLE-global.png',
 '/global/cfs/cdirs/e3sm/www/cdat-migration-fy24/main/annual_cycle_zonal_mean/MERRA2_Aerosols/MERRA2_Aerosols-AODVIS-ANNUALCYCLE-global.png',
 '/global/cfs/cdirs/e3sm/www/cdat-migration-fy24/main/viewer/viewer/e3sm_logo.png']

In [8]:
_check_if_matching_filecount()

OSError: Number of files do not match at DEV_PATH and MAIN_PATH (697 vs. 695).
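
The counts in this error (697 vs. 695) differ from the lengths printed in `In [3]` (641, 639) because `DEV_NUM_FILES` and `MAIN_NUM_FILES` are computed in the setup cell before `_remove_unwanted_files` is applied. A minimal sketch of a check that counts the lists it is given, so the numbers always reflect the filtered files, might look like this. The function name is hypothetical and the check is not part of the notebook.

```python
from typing import Sequence


def check_matching_filecount_sketch(
    dev_files: Sequence[str], main_files: Sequence[str]
) -> str:
    """Raise if the two lists differ in length, else return a summary message."""
    num_dev, num_main = len(dev_files), len(main_files)
    if num_dev != num_main:
        raise OSError(
            "Number of files does not match between dev and main "
            f"({num_dev} vs. {num_main})."
        )
    return f"Matching file count ({num_dev} and {num_main})."


# Hypothetical file lists, for illustration only.
print(check_matching_filecount_sketch(["a.png", "b.png"], ["a.png", "b.png"]))
```

In the notebook this would be called as `check_matching_filecount_sketch(DEV_GLOB, MAIN_GLOB)` after filtering.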

## 2. Compare the plots between branches

- Compare "ref" and "test" files
- "diff" files are ignored because getting relative diffs for these does not make sense (relative diff will be above tolerance)
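
The comparison cell that follows drops `AOD_550` by name before looping, presumably because that plot is listed in `missing_dev_files`. As a hedged sketch of a more general approach, the pairs to compare could be built only from files that exist on both sides. The function name, roots, and paths are hypothetical.

```python
from typing import List, Sequence, Tuple


def pair_common_files_sketch(
    main_files: Sequence[str],
    dev_files: Sequence[str],
    main_root: str,
    dev_root: str,
) -> List[Tuple[str, str]]:
    """Return (dev_path, main_path) pairs for files found in both lists."""
    dev_set = set(dev_files)
    pairs = []
    for main_path in main_files:
        # Same mapping the notebook uses: swap the root directory prefix.
        dev_path = main_path.replace(main_root, dev_root)
        if dev_path in dev_set:
            pairs.append((dev_path, main_path))
    return pairs


# Hypothetical directory layout, for illustration only.
main_example = ["/data/main/set_a/x.png", "/data/main/set_a/only_main.png"]
dev_example = ["/data/dev/set_a/x.png"]

pairs = pair_common_files_sketch(
    main_example, dev_example, "/data/main/", "/data/dev/"
)
print(pairs)
```

Each resulting pair could then be passed to `get_image_diffs(dev_path, main_path)`, matching the argument order used in the notebook.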


In [10]:
MAIN_GLOB = [f for f in MAIN_GLOB if "AOD_550" not in f]

for main_path in MAIN_GLOB:
    dev_path = main_path.replace(MAIN_PATH, DEV_PATH)
    print("Comparing:")
    print(f"    * {main_path}")
    print(f"    * {dev_path}")

    get_image_diffs(dev_path, main_path)

Comparing:
    * /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/main/aerosol_aeronet/AERONET/AERONET-AODABS-ANN-global.png
    * /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/843-migration-phase3-model-vs-obs/aerosol_aeronet/AERONET/AERONET-AODABS-ANN-global.png
     * Difference path /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/843-migration-phase3-model-vs-obs/aerosol_aeronet/AERONET_diff/AERONET-AODABS-ANN-global.png
Comparing:
    * /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/main/aerosol_aeronet/AERONET/AERONET-AODVIS-ANN-global.png
    * /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/843-migration-phase3-model-vs-obs/aerosol_aeronet/AERONET/AERONET-AODVIS-ANN-global.png
     * Difference path /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/843-migration-phase3-model-vs-obs/aerosol_aeronet/AERONET_diff/AERONET-AODVIS-ANN-global.png
Comparing:
    * /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/main/annual_cycle_zonal_mean/CERES-EBAF-TOA-v4.1/ceres_ebaf_toa_v4.1-ALBEDO-ANNUAL

### Results

All the plots are virtually identical. One red dot appears to differ, which is enough to generate a diff plot.
