# CDAT Migration Regression Test (FY24)

This notebook is used to perform regression testing between the development and
production versions of a diagnostic set.

## How it works

It compares the relative differences (%) between two sets of `.json` files in two
separate directories, one for the refactored code and the other for the `main` branch.

It displays metric values whose relative differences are >= 2%. Relative differences are used instead of absolute differences because:

- Relative differences are expressed as percentages, which show the size of a difference in proportion to the value being compared.
- Absolute differences are raw numbers that do not account for the magnitude
  of the values being compared (e.g., 100.00 vs. 0.0001), which can be misleading.
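
As a minimal sketch of this point, the snippet below applies the same formula that `get_rel_diffs` uses to two pairs of made-up values. The helper names `abs_diff` and `rel_diff` and the sample numbers are illustrative assumptions, not part of the notebook.

```python
def abs_diff(actual: float, reference: float) -> float:
    """Absolute difference: a raw number with no sense of scale."""
    return abs(actual - reference)


def rel_diff(actual: float, reference: float) -> float:
    """Relative difference, mirroring abs(actual - reference) / abs(actual)."""
    return abs(actual - reference) / abs(actual)


# Two made-up (actual, reference) pairs with nearly the same absolute difference.
large_pair = (100.00, 100.01)
small_pair = (0.0001, 0.0101)

for actual, reference in (large_pair, small_pair):
    print(
        f"actual={actual}, reference={reference}: "
        f"abs={abs_diff(actual, reference):.4f}, "
        f"rel={rel_diff(actual, reference):.2%}"
    )
```

Both pairs differ by roughly 0.01 in absolute terms, but only the second pair exceeds the 2% relative threshold, which is the case a regression review should surface.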

## How to use

PREREQUISITE: The diagnostic set's metrics must be stored in `.json` files in two
directories (one for the dev branch and one for the `main` branch).

1. Make a copy of this notebook.
2. Run `mamba create -n cdat_regression_test -y -c conda-forge "python<3.12" pandas matplotlib-base ipykernel`
3. Run `mamba activate cdat_regression_test`
4. Update `DEV_PATH` and `PROD_PATH` in your copy of the notebook.
5. Run all cells IN ORDER.
6. Review results for any outstanding differences (>= 2%).
   - Debug these differences (e.g., bug in metrics functions, incorrect variable references, etc.)
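
Before running the cells, it may help to confirm that both directories contain the same set of `.json` file names, since files present on only one side cannot be compared. The sketch below is one possible check. The function name `find_unmatched_files` is hypothetical, and the demo uses throwaway directories with made-up file names rather than real diagnostic output.

```python
import glob
import os
import tempfile
from typing import Set, Tuple


def find_unmatched_files(dev_path: str, prod_path: str) -> Tuple[Set[str], Set[str]]:
    """Return the `.json` file names that exist in only one of the directories."""
    dev = {os.path.basename(p) for p in glob.glob(os.path.join(dev_path, "*.json"))}
    prod = {os.path.basename(p) for p in glob.glob(os.path.join(prod_path, "*.json"))}
    return dev - prod, prod - dev


# Demo on temporary directories; the file names below are made up.
with tempfile.TemporaryDirectory() as dev_dir, tempfile.TemporaryDirectory() as prod_dir:
    for name in ["a-FLUT-x.json", "a-FSNS-x.json"]:
        open(os.path.join(dev_dir, name), "w").close()
    for name in ["a-FLUT-x.json"]:
        open(os.path.join(prod_dir, name), "w").close()

    only_dev, only_prod = find_unmatched_files(dev_dir, prod_dir)

print("Only in dev:", sorted(only_dev))
print("Only in prod:", sorted(only_prod))
```

In the notebook itself, the same function could be called with `DEV_PATH` and `PROD_PATH`; any non-empty result suggests the two runs did not produce the same set of metrics files.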


## Setup Code


In [1]:
import glob
import math
from typing import List

import pandas as pd

# TODO: Update DEV_PATH and PROD_PATH to your diagnostic sets.
DEV_PATH = "/global/cfs/cdirs/e3sm/www/vo13/examples_658/ex1_modTS_vs_modTS_3years/lat_lon/model_vs_model"
PROD_PATH = "/global/cfs/cdirs/e3sm/www/vo13/examples/ex1_modTS_vs_modTS_3years/lat_lon/model_vs_model"

DEV_GLOB = sorted(glob.glob(DEV_PATH + "/*.json"))
PROD_GLOB = sorted(glob.glob(PROD_PATH + "/*.json"))

# The names of the columns that store percentage difference values.
PERCENTAGE_COLUMNS = [
    "test DIFF (%)",
    "ref DIFF (%)",
    "test_regrid DIFF (%)",
    "ref_regrid DIFF (%)",
    "diff DIFF (%)",
    "misc DIFF (%)",
]

## Core Functions


In [2]:
def get_metrics(filepaths: List[str]) -> pd.DataFrame:
    """Get the metrics using a glob of `.json` metric files in a directory.

    Parameters
    ----------
    filepaths : List[str]
        The filepaths for metrics `.json` files.

    Returns
    -------
    pd.DataFrame
        The DataFrame containing the metrics for all of the variables in
        the results directory.
    """
    metrics = []

    for filepath in filepaths:
        df = pd.read_json(filepath)

        filename = filepath.split("/")[-1]
        var_key = filename.split("-")[1]

        # Add the variable key as the first level of a MultiIndex so that
        # each row is identified by (var_key, metric).
        multiindex = pd.MultiIndex.from_product([[var_key], [*df.index]])
        df = df.set_index(multiindex)

        metrics.append(df)

    df_final = pd.concat(metrics)

    # Reorder columns and drop "unit" column (string dtype breaks Pandas
    # arithmetic).
    df_final = df_final[["test", "ref", "test_regrid", "ref_regrid", "diff", "misc"]]

    return df_final


def get_rel_diffs(df_actual: pd.DataFrame, df_reference: pd.DataFrame) -> pd.DataFrame:
    """Get the relative differences between two DataFrames.

    Formula: abs(actual - reference) / abs(actual)

    Parameters
    ----------
    df_actual : pd.DataFrame
        The first DataFrame representing "actual" results (dev branch).
    df_reference : pd.DataFrame
        The second DataFrame representing "reference" results (main branch).

    Returns
    -------
    pd.DataFrame
        The DataFrame containing the relative differences between the
        metrics DataFrames.
    """
    df_diff = abs(df_actual - df_reference) / abs(df_actual)
    df_diff = df_diff.add_suffix(" DIFF (%)")

    return df_diff


def sort_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Sorts the order of the columns for the final DataFrame output.

    Parameters
    ----------
    df : pd.DataFrame
        The final DataFrame output.

    Returns
    -------
    pd.DataFrame
        The final DataFrame output with sorted columns.
    """
    columns = [
        "test_dev",
        "test_prod",
        "test DIFF (%)",
        "ref_dev",
        "ref_prod",
        "ref DIFF (%)",
        "test_regrid_dev",
        "test_regrid_prod",
        "test_regrid DIFF (%)",
        "ref_regrid_dev",
        "ref_regrid_prod",
        "ref_regrid DIFF (%)",
        "diff_dev",
        "diff_prod",
        "diff DIFF (%)",
        "misc_dev",
        "misc_prod",
        "misc DIFF (%)",
    ]

    df_new = df.copy()
    df_new = df_new[columns]

    return df_new


def update_diffs_to_pct(df: pd.DataFrame) -> pd.DataFrame:
    """Update relative diff columns from float to string percentage.

    Parameters
    ----------
    df : pd.DataFrame
        The final DataFrame containing metrics and diffs (floats).

    Returns
    -------
    pd.DataFrame
        The final DataFrame containing metrics and diffs (str percentage).
    """
    df_new = df.copy()
    df_new[PERCENTAGE_COLUMNS] = df_new[PERCENTAGE_COLUMNS].map(
        lambda x: "{0:.2f}%".format(x * 100) if not math.isnan(x) else x
    )

    return df_new

## 1. Get the DataFrame containing development and production metrics.


In [3]:
df_metrics_dev = get_metrics(DEV_GLOB)
df_metrics_prod = get_metrics(PROD_GLOB)
df_metrics_all = pd.concat(
    [df_metrics_dev.add_suffix("_dev"), df_metrics_prod.add_suffix("_prod")],
    axis=1,
    join="outer",
)

## 2. Get DataFrame for differences >= 2%.

- Get the relative differences for all metrics
- Filter down metrics to those with differences >= 2%
  - If all cells in a row are NaN (every difference is < 2%), the entire row is dropped to make the results easier to parse.
  - Any remaining NaN cells represent differences < 2% (or metrics with no value, such as the `misc` columns in the output) and **should be ignored**.
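
The masking and row-dropping behavior described above can be sketched on a toy DataFrame. The values, the index labels, and the `THRESHOLD` constant are made up for illustration; only the boolean-mask and `dropna` pattern comes from the notebook.

```python
import pandas as pd

# Toy relative differences (fractions, not percentages); values are made up.
rel_diffs = pd.DataFrame(
    {
        "test DIFF (%)": [0.001, 0.05, 0.0],
        "diff DIFF (%)": [0.01, 0.001, 0.30],
    },
    index=["mean", "min", "max"],
)

THRESHOLD = 0.02  # 2%

# Boolean-mask indexing keeps values that meet the threshold and replaces
# everything else with NaN.
masked = rel_diffs[rel_diffs >= THRESHOLD]

# Rows where every cell is NaN carry no flagged difference, so drop them.
filtered = masked.dropna(axis=0, how="all")

print(filtered)
```

Here the `mean` row is dropped because both of its differences are below 2%, while `min` and `max` are kept with one NaN cell each.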


In [4]:
df_metrics_diffs = get_rel_diffs(df_metrics_dev, df_metrics_prod)
df_metrics_diffs_thres = df_metrics_diffs[df_metrics_diffs >= 0.02]
df_metrics_diffs_thres = df_metrics_diffs_thres.dropna(
    axis=0, how="all", ignore_index=False
)

## 3. Combine both DataFrames to get the final result.


In [5]:
df_final = df_metrics_diffs_thres.join(df_metrics_all)
df_final = sort_columns(df_final)
df_final = update_diffs_to_pct(df_final)

## 4. Display final DataFrame and review results.

- <span style="color:red">Red</span> cells are differences >= 2%
- `nan` cells are differences < 2% and **should be ignored**
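
The red highlighting relies on a type check: flagged differences have been converted to percentage strings, while unflagged cells remain float NaN. The sketch below reproduces that logic on single values, outside of pandas. The names `to_pct` and `cell_style` are hypothetical stand-ins for the lambdas used in the notebook.

```python
import math
from typing import Union


def to_pct(x: float) -> Union[str, float]:
    """Format a fraction as a percentage string; leave NaN untouched."""
    return "{0:.2f}%".format(x * 100) if not math.isnan(x) else x


def cell_style(x: object) -> str:
    """Return the CSS for a cell: red only for percentage strings."""
    return "background-color : red" if isinstance(x, str) else ""


flagged = to_pct(0.0312)        # a difference that met the threshold
ignored = to_pct(float("nan"))  # a difference that was masked out

print(repr(flagged), "->", repr(cell_style(flagged)))
print(repr(ignored), "->", repr(cell_style(ignored)))
```

Because NaN stays a float, it never matches `isinstance(x, str)` and is therefore left unstyled.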


In [6]:
df_final.reset_index(names=["var_key", "metric"]).style.map(
    lambda x: "background-color : red" if isinstance(x, str) else "",
    subset=pd.IndexSlice[:, PERCENTAGE_COLUMNS],
)

| | var_key | metric | test_dev | test_prod | test DIFF (%) | ref_dev | ref_prod | ref DIFF (%) | test_regrid_dev | test_regrid_prod | test_regrid DIFF (%) | ref_regrid_dev | ref_regrid_prod | ref_regrid DIFF (%) | diff_dev | diff_prod | diff DIFF (%) | misc_dev | misc_prod | misc DIFF (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | FLUT | max | 299.911864 | 299.355074 | | 300.162128 | 299.776167 | | 299.911864 | 299.355074 | | 300.162128 | 299.776167 | | 9.492359 | 9.788809 | 3.12% | | | |
| 1 | FLUT | min | 124.610884 | 125.987072 | | 122.878196 | 124.148986 | | 124.610884 | 125.987072 | | 122.878196 | 124.148986 | | -15.505809 | -17.032325 | 9.84% | | | |
| 2 | FSNS | max | 269.789702 | 269.798166 | | 272.722362 | 272.184917 | | 269.789702 | 269.798166 | | 272.722362 | 272.184917 | | 20.647929 | 24.859852 | 20.40% | | | |
| 3 | FSNS | min | 16.897423 | 17.760889 | 5.11% | 16.710134 | 16.237061 | 2.83% | 16.897423 | 17.760889 | 5.11% | 16.710134 | 16.237061 | 2.83% | -28.822277 | -28.324921 | | | | |
| 4 | FSNTOA | max | 360.624327 | 360.209193 | | 362.188816 | 361.778529 | | 360.624327 | 360.209193 | | 362.188816 | 361.778529 | | 18.602276 | 22.624266 | 21.62% | | | |
| 5 | FSNTOA | mean | 239.859777 | 240.00186 | | 241.439641 | 241.544384 | | 239.859777 | 240.00186 | | 241.439641 | 241.544384 | | -1.579864 | -1.542524 | 2.36% | | | |
| 6 | FSNTOA | min | 44.907041 | 48.256818 | 7.46% | 47.223502 | 50.339608 | 6.60% | 44.907041 | 48.256818 | 7.46% | 47.223502 | 50.339608 | 6.60% | -23.576184 | -23.171864 | | | | |
| 7 | LHFLX | max | 282.280453 | 289.07994 | 2.41% | 275.792933 | 276.297281 | | 282.280453 | 289.07994 | 2.41% | 275.792933 | 276.297281 | | 47.535503 | 53.168924 | 11.85% | | | |
| 8 | LHFLX | mean | 88.379609 | 88.47027 | | 88.96955 | 88.976266 | | 88.379609 | 88.47027 | | 88.96955 | 88.976266 | | -0.589942 | -0.505996 | 14.23% | | | |
| 9 | LHFLX | min | -0.878371 | -0.549248 | 37.47% | -1.176561 | -0.94611 | 19.59% | -0.878371 | -0.549248 | 37.47% | -1.176561 | -0.94611 | 19.59% | -34.375924 | -33.902769 | | | | |
