
CDAT Migration: Refactor annual_cycle_zonal_mean set #798

Merged

Conversation

chengzhuzhang
Contributor

Description

Refactor annual_cycle_zonal_mean with xarray/xcdat.
The driver is fairly short and has a unique _create_annual_cycle function.

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

If applicable:

  • New and existing unit tests pass with my changes (locally and CI/CD build)
  • I have added tests that prove my fix is effective or that my feature works
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)

@chengzhuzhang chengzhuzhang force-pushed the refactor/669-annual_cycle_zonal_mean branch from 7bc2657 to ebe73f1 Compare March 22, 2024 19:57
@chengzhuzhang
Contributor Author

Basic driver and plotting scripts are working, though only with multiprocessing = False. If I switch it on, I hit the errors below. They appear to come from ds = xc.open_mfdataset(**args), which was newly added to read multi-month data and concatenate it into an annual-cycle time series.

Traceback (most recent call last):
  File "/global/u2/c/chengzhu/.vscode-server/extensions/ms-python.debugpy-2024.2.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_comm.py", line 493, in start_client
    s.connect((host, port))
TimeoutError: timed out
2024-03-22 15:14:20,314 [ERROR]: core_parameter.py(_run_diag:341) >> Error in e3sm_diags.driver.annual_cycle_zonal_mean_driver
Traceback (most recent call last):
  File "/global/homes/c/chengzhu/e3sm_diags/e3sm_diags/parameter/core_parameter.py", line 338, in _run_diag
    single_result = module.run_diag(self)
  File "/global/homes/c/chengzhu/e3sm_diags/e3sm_diags/driver/annual_cycle_zonal_mean_driver.py", line 68, in run_diag
    ds_test = test_ds.get_climo_dataset(var_key, "ANNUALCYCLE")
  File "/global/homes/c/chengzhu/e3sm_diags/e3sm_diags/driver/utils/dataset_xr.py", line 365, in get_climo_dataset
    ds = self._get_climo_dataset(season)
  File "/global/homes/c/chengzhu/e3sm_diags/e3sm_diags/driver/utils/dataset_xr.py", line 393, in _get_climo_dataset
    ds = self._open_annual_cycle_climo_dataset(filepath)
  File "/global/homes/c/chengzhu/e3sm_diags/e3sm_diags/driver/utils/dataset_xr.py", line 425, in _open_annual_cycle_climo_dataset
    ds = xc.open_mfdataset(**args)
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xcdat/dataset.py", line 277, in open_mfdataset
    ds = xr.open_mfdataset(
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xarray/backends/api.py", line 1053, in open_mfdataset
    combined = combine_by_coords(
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xarray/core/combine.py", line 958, in combine_by_coords
    concatenated_grouped_by_data_vars = tuple(
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xarray/core/combine.py", line 959, in <genexpr>
    _combine_single_variable_hypercube(
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xarray/core/combine.py", line 630, in _combine_single_variable_hypercube
    concatenated = _combine_nd(
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xarray/core/combine.py", line 232, in _combine_nd
    combined_ids = _combine_all_along_first_dim(
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xarray/core/combine.py", line 267, in _combine_all_along_first_dim
    new_combined_ids[new_id] = _combine_1d(
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xarray/core/combine.py", line 290, in _combine_1d
    combined = concat(
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xarray/core/concat.py", line 252, in concat
    return _dataset_concat(
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xarray/core/concat.py", line 526, in _dataset_concat
    merged_vars, merged_indexes = merge_collected(
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xarray/core/merge.py", line 290, in merge_collected
    merged_vars[name] = unique_variable(
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xarray/core/merge.py", line 137, in unique_variable
    out = out.compute()
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xarray/core/variable.py", line 547, in compute
    return new.load(**kwargs)
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xarray/core/variable.py", line 520, in load
    loaded_data, *_ = chunkmanager.compute(self._data, **kwargs)
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xarray/core/daskmanager.py", line 70, in compute
    return compute(*data, **kwargs)
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/dask/base.py", line 628, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/multiprocessing/context.py", line 281, in _Popen
    return Popen(process_obj)
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/multiprocessing/popen_fork.py", line 66, in _launch
    self.pid = os.fork()
  File "/global/u2/c/chengzhu/.vscode-server/extensions/ms-python.debugpy-2024.2.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydev_bundle/pydev_monkey.py", line 956, in new_fork
    _on_forked_process(setup_tracing=apply_arg_patch and not is_subprocess_fork)
  File "/global/u2/c/chengzhu/.vscode-server/extensions/ms-python.debugpy-2024.2.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydev_bundle/pydev_monkey.py", line 232, in _on_forked_process
    pydevd.settrace_forked(setup_tracing=setup_tracing)
  File "/global/u2/c/chengzhu/.vscode-server/extensions/ms-python.debugpy-2024.2.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/pydevd.py", line 3134, in settrace_forked
    settrace(
  File "/global/u2/c/chengzhu/.vscode-server/extensions/ms-python.debugpy-2024.2.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/pydevd.py", line 2821, in settrace
    _locked_settrace(
  File "/global/u2/c/chengzhu/.vscode-server/extensions/ms-python.debugpy-2024.2.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/pydevd.py", line 2902, in _locked_settrace
    py_db.connect(host, port)  # Note: connect can raise error.
  File "/global/u2/c/chengzhu/.vscode-server/extensions/ms-python.debugpy-2024.2.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/pydevd.py", line 1421, in connect
    s = start_client(host, port)
  File "/global/u2/c/chengzhu/.vscode-server/extensions/ms-python.debugpy-2024.2.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_comm.py", line 493, in start_client
    s.connect((host, port))
TimeoutError: timed out
80.67s - Could not connect to 127.0.0.1: 49425

@chengzhuzhang chengzhuzhang added the cdat-migration-fy24 CDAT Migration FY24 Task label Mar 22, 2024
@chengzhuzhang
Contributor Author

Current results with one variable: https://portal.nersc.gov/cfs/e3sm/cdat-migration-fy24/669-annual_cycle_zonal_mean/viewer/

Other TODO items:

  • refine axis config for plot
  • fix viewer
  • Verify all variable runs

@tomvothecoder
Collaborator

tomvothecoder commented Apr 11, 2024

Basic driver and plotting scripts are working, though only with multiprocessing = False. If I switch it on, I hit the errors below. They appear to come from ds = xc.open_mfdataset(**args), which was newly added to read multi-month data and concatenate it into an annual-cycle time series. (Full traceback quoted above.)

This issue seems to be related to these:

  1. Performance
  2. Conflicts with multiprocessing scheduler using context of fork when calling to_netcdf()

I'm currently debugging and will push fixes.

@tomvothecoder tomvothecoder force-pushed the refactor/669-annual_cycle_zonal_mean branch from ebe73f1 to 629b8e3 Compare April 11, 2024 20:20
Collaborator

@tomvothecoder tomvothecoder left a comment

This commit fixes the multiprocessing=True TimeoutError issue in this comment.

RE: #798 (comment)

Current results with one variable: portal.nersc.gov/cfs/e3sm/cdat-migration-fy24/669-annual_cycle_zonal_mean/viewer

Other TODO items:

* refine axis config for plot

* fix viewer

* Verify all variable runs

I think the only remaining items are the last two bullets.

Comment on lines 447 to 507
# NOTE: This GitHub issue explains why the "coords" and "compat" args
# are defined as they are below: https://github.com/xCDAT/xcdat/issues/641
args = {
    "paths": filepath,
    "decode_times": False,
    "add_bounds": ["X", "Y"],
    "coords": "minimal",
    "compat": "override",
    "chunks": "auto",
}
Collaborator

Notable change: I am going to remove "chunks": "auto" because we end up loading the dataset into memory anyway, which means downstream computational operations all run serially within a single process.

Comment on lines 478 to 527
# NOTE: There seems to be an issue with `open_mfdataset()` and
# using the multiprocessing scheduler defined in e3sm_diags,
# resulting in timeouts and resource locking.
# To avoid this, we load the multi-file dataset into memory before
# performing downstream operations.
# Related GH issue: https://github.com/pydata/xarray/issues/3781
ds.load(scheduler="sync")

Collaborator

Notable change.

Comment on lines 123 to 130
# --------------------------------------------------------------------------
plt.xticks(time, X_TICKS)
lat_formatter = LatitudeFormatter() # type: ignore
ax.yaxis.set_major_formatter(lat_formatter)
ax.tick_params(labelsize=8.0, direction="out", width=1)
ax.xaxis.set_ticks_position("bottom")
ax.yaxis.set_ticks_position("left")
Collaborator

I added this block of code from the old plotter because it was missing here.

Collaborator

I think it fixes the "refine axis config for plot" todo item in this comment.

_save_data_metrics_and_plots(
parameter,
plot_func,
var_key,
test_zonal_mean.to_dataset(),
ref_zonal_mean.to_dataset(),
diff,
metrics_dict={},
metrics_dict=None,
Collaborator

metrics_dict can be set to None after removing the metrics_dict arg from the plot function.

@@ -36,7 +34,6 @@ def plot(
da_test: xr.DataArray,
da_ref: xr.DataArray,
da_diff: xr.DataArray,
metrics_dict: MetricsDict,
Collaborator

Removed unused metrics_dict arg.

Comment on lines 416 to 427
def _open_annual_cycle_climo_dataset(self, filepath: str) -> xr.Dataset:
    """Open 12 monthly mean climatology dataset.

    Parameters
    ----------
    filepath : str
        The path to the climatology datasets.
    """
    args = {"paths": filepath, "decode_times": False, "add_bounds": ["X", "Y"]}
    ds = xc.open_mfdataset(**args)
    return ds

Collaborator

I also replaced _open_annual_cycle_climo_dataset() with an updated version of _open_climo_dataset() that supports multi-file datasets.
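
For reference, a minimal sketch of what the multi-file climatology open could look like, combining the args and the load workaround discussed in this review (the helper name is illustrative; the actual _open_climo_dataset() implementation may differ):

import xcdat as xc


def open_annual_cycle_climo(filepath: str):
    """Open 12 monthly climatology files as a single dataset (sketch)."""
    args = {
        "paths": filepath,
        "decode_times": False,
        "add_bounds": ["X", "Y"],
        # See https://github.com/xCDAT/xcdat/issues/641 for why "coords" and
        # "compat" are set this way.
        "coords": "minimal",
        "compat": "override",
    }
    ds = xc.open_mfdataset(**args)

    # Load into memory with the synchronous scheduler to avoid conflicts with
    # the multiprocessing scheduler used by e3sm_diags.
    # Related GH issue: https://github.com/pydata/xarray/issues/3781
    ds.load(scheduler="sync")

    return ds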

@tomvothecoder
Collaborator

@chengzhuzhang you can pick this set back up. I did not make any progress since our last meeting on 4/15/24 (notes). Specifically, there is still a problem related to:

multiprocessing = True threw a timeout error; fixed by loading the multi-file dataset into memory (which conflicts with the Dask multiprocessing scheduler)

@chengzhuzhang
Contributor Author

chengzhuzhang commented Jul 10, 2024

  1. The viewer is fixed in 322.
  2. I can confirm that with multiprocessing on, it still ran into an error:
2024-07-10 11:27:10,547 [ERROR]: run.py(run_diags:91) >> Error traceback:
Traceback (most recent call last):
  File "/global/u2/c/chengzhu/e3sm_diags/e3sm_diags/run.py", line 89, in run_diags
    params_results = main(params)
  File "/global/u2/c/chengzhu/e3sm_diags/e3sm_diags/e3sm_diags_driver.py", line 371, in main
    parameters_results = _run_with_dask(parameters)
  File "/global/u2/c/chengzhu/e3sm_diags/e3sm_diags/e3sm_diags_driver.py", line 316, in _run_with_dask
    results = bag.map(CoreParameter._run_diag).compute(num_workers=num_workers)
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/dask/base.py", line 342, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/dask/base.py", line 628, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
  3. A full run with all variables running in series also stopped midway.
  4. Errors also occur for 3 variables and are data specific:
2024-07-10 12:29:55,272 [INFO]: annual_cycle_zonal_mean_driver.py(run_diag:56) >> Variable: SCO
2024-07-10 12:30:46,299 [INFO]: annual_cycle_zonal_mean_driver.py(_run_diags_annual_cycle:124) >> Selected region: global
2024-07-10 12:30:50,654 [ERROR]: core_parameter.py(_run_diag:341) >> Error in e3sm_diags.driver.annual_cycle_zonal_mean_driver
TypeError: float() argument must be a string or a real number, not 'tuple'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/global/u2/c/chengzhu/e3sm_diags/e3sm_diags/parameter/core_parameter.py", line 338, in _run_diag
    single_result = module.run_diag(self)
  File "/global/u2/c/chengzhu/e3sm_diags/e3sm_diags/driver/annual_cycle_zonal_mean_driver.py", line 76, in run_diag
    _run_diags_annual_cycle(
  File "/global/u2/c/chengzhu/e3sm_diags/e3sm_diags/driver/annual_cycle_zonal_mean_driver.py", line 142, in _run_diags_annual_cycle
    test_zonal_mean = test_zonal_mean.sel(lat=(-60, 60))
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xarray/core/dataarray.py", line 1617, in sel
    ds = self._to_temp_dataset().sel(
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xarray/core/dataset.py", line 3074, in sel
    query_results = map_index_queries(
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xarray/core/indexing.py", line 193, in map_index_queries
    results.append(index.sel(labels, **options))
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xarray/core/indexes.py", line 748, in sel
    label_array = normalize_label(label, dtype=self.coord_dtype)
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xarray/core/indexes.py", line 545, in normalize_label
    value = np.asarray(value, dtype=dtype)
ValueError: setting an array element with a sequence.
2024-07-10 12:30:50,730 [INFO]: annual_cycle_zonal_mean_driver.py(run_diag:56) >> Variable: TCO
2024-07-10 12:31:24,528 [INFO]: annual_cycle_zonal_mean_driver.py(_run_diags_annual_cycle:124) >> Selected region: global
2024-07-10 12:31:26,916 [ERROR]: core_parameter.py(_run_diag:341) >> Error in e3sm_diags.driver.annual_cycle_zonal_mean_driver
TypeError: float() argument must be a string or a real number, not 'tuple'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/global/u2/c/chengzhu/e3sm_diags/e3sm_diags/parameter/core_parameter.py", line 338, in _run_diag
    single_result = module.run_diag(self)
  File "/global/u2/c/chengzhu/e3sm_diags/e3sm_diags/driver/annual_cycle_zonal_mean_driver.py", line 76, in run_diag
    _run_diags_annual_cycle(
  File "/global/u2/c/chengzhu/e3sm_diags/e3sm_diags/driver/annual_cycle_zonal_mean_driver.py", line 142, in _run_diags_annual_cycle
    test_zonal_mean = test_zonal_mean.sel(lat=(-60, 60))
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xarray/core/dataarray.py", line 1617, in sel
    ds = self._to_temp_dataset().sel(
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xarray/core/dataset.py", line 3074, in sel
    query_results = map_index_queries(
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xarray/core/indexing.py", line 193, in map_index_queries
    results.append(index.sel(labels, **options))
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xarray/core/indexes.py", line 748, in sel
    label_array = normalize_label(label, dtype=self.coord_dtype)
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xarray/core/indexes.py", line 545, in normalize_label
    value = np.asarray(value, dtype=dtype)
ValueError: setting an array element with a sequence.
2024-07-10 12:31:26,916 [INFO]: annual_cycle_zonal_mean_driver.py(run_diag:56) >> Variable: SST
2024-07-10 12:32:03,765 [INFO]: annual_cycle_zonal_mean_driver.py(_run_diags_annual_cycle:124) >> Selected region: global
2024-07-10 12:32:06,626 [INFO]: io.py(_write_to_netcdf:134) >> 'SST' test variable output saved in: /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/669-annual_cycle_zonal_mean/annual_cycle_zonal_mean/SST_CL_HadISST/HadISST_CL-SST-ANNUALCYCLE-global_test.nc
2024-07-10 12:32:06,778 [INFO]: io.py(_write_to_netcdf:134) >> 'SST' ref variable output saved in: /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/669-annual_cycle_zonal_mean/annual_cycle_zonal_mean/SST_CL_HadISST/HadISST_CL-SST-ANNUALCYCLE-global_ref.nc
2024-07-10 12:32:06,783 [INFO]: io.py(_write_to_netcdf:134) >> 'SST' diff variable output saved in: /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/669-annual_cycle_zonal_mean/annual_cycle_zonal_mean/SST_CL_HadISST/HadISST_CL-SST-ANNUALCYCLE-global_diff.nc
2024-07-10 12:32:06,783 [INFO]: io.py(_save_data_metrics_and_plots:66) >> Metrics saved in /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/669-annual_cycle_zonal_mean/annual_cycle_zonal_mean/SST_CL_HadISST/HadISST_CL-SST-ANNUALCYCLE-global.json
2024-07-10 12:32:07,551 [ERROR]: core_parameter.py(_run_diag:341) >> Error in e3sm_diags.driver.annual_cycle_zonal_mean_driver
Traceback (most recent call last):
  File "/global/u2/c/chengzhu/e3sm_diags/e3sm_diags/parameter/core_parameter.py", line 338, in _run_diag
    single_result = module.run_diag(self)
  File "/global/u2/c/chengzhu/e3sm_diags/e3sm_diags/driver/annual_cycle_zonal_mean_driver.py", line 76, in run_diag
    _run_diags_annual_cycle(
  File "/global/u2/c/chengzhu/e3sm_diags/e3sm_diags/driver/annual_cycle_zonal_mean_driver.py", line 167, in _run_diags_annual_cycle
    _save_data_metrics_and_plots(
  File "/global/u2/c/chengzhu/e3sm_diags/e3sm_diags/driver/utils/io.py", line 81, in _save_data_metrics_and_plots
    plot_func(*args)
  File "/global/u2/c/chengzhu/e3sm_diags/e3sm_diags/plot/annual_cycle_zonal_mean_plot.py", line 67, in plot
    _add_colormap(
  File "/global/u2/c/chengzhu/e3sm_diags/e3sm_diags/plot/annual_cycle_zonal_mean_plot.py", line 112, in _add_colormap
    var = var.transpose("lat", "time")
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xarray/core/dataarray.py", line 3022, in transpose
    dims = tuple(utils.infix_dims(dims, self.dims, missing_dims))
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xarray/core/utils.py", line 814, in infix_dims
    existing_dims = drop_missing_dims(dims_supplied, dims_all, missing_dims)
  File "/global/cfs/cdirs/e3sm/zhang40/conda_envs/e3sm_diags_dev_654_zonal_mean_xy/lib/python3.10/site-packages/xarray/core/utils.py", line 906, in drop_missing_dims
    raise ValueError(
ValueError: Dimensions {'lat'} do not exist. Expected one or more of ('time', 'latitude')

@chengzhuzhang
Contributor Author

When multiprocessing=True is set, the error concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending. remains. Without loading the dataset into memory (i.e. ds.load(scheduler="sync")), it raises a TimeoutError instead.

@@ -109,7 +109,8 @@ def _add_colormap(
# Add the contour plot
# --------------------------------------------------------------------------
ax = fig.add_axes(DEFAULT_PANEL_CFG[subplot_num], projection=None)
var = var.transpose("lat", "time")
# var = var.transpose("lat", "time")
var = var.transpose(var.dims[1], var.dims[0])
Contributor Author

One SST dataset has "latitude" instead of "lat" as the dimension name. This code change avoided referencing the dimension name explicitly.

@chengzhuzhang
Contributor Author

More updates: the TimeoutError came from driver/utils/regrid.py:

ds_a_regrid = ds_a_new.regridder.horizontal(
    var_key, output_grid, tool=tool, method=method
)

@@ -413,6 +413,14 @@ def _get_climo_dataset(self, season: str) -> xr.Dataset:
# ds = ds[[self.var, 'lat_bnds', 'lon_bnds']]
ds = ds[[self.var] + keep_bnds]
Contributor Author

This line keeps only the variable (after derivation) and the bounds-related data variables. It removes excess data to reduce memory usage. Note that this change in dataset_xr.py may affect other sets.
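
A hedged sketch of the subsetting described here, using the keep_bnds construction shown later in this PR (variable names other than self.var are illustrative):

# Keep only the target variable (after derivation) plus bounds variables
# (e.g. lat_bnds, lon_bnds) so that loading into memory stays small.
all_vars = list(ds.data_vars.keys())
keep_bnds = [var for var in all_vars if "bnd" in var or "bounds" in var]
ds = ds[[self.var] + keep_bnds]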

# To avoid this, we load the multi-file dataset into memory before
# performing downstream operations.
# Related GH issue: https://github.com/pydata/xarray/issues/3781
ds.load(scheduler="sync")
Contributor Author

@chengzhuzhang chengzhuzhang Jul 12, 2024

This is needed to resolve the conflict between multiprocessing and Dask. It has to be placed here so that only the needed variables are loaded; otherwise, out-of-memory errors still occur, as @tomvothecoder noted.

@chengzhuzhang
Contributor Author

  • f0a80c9 and 15811b8 together address the error where multiprocessing = True threw a timeout error or concurrent.futures.process.BrokenProcessPool. It is fixed by first reducing the dataset size and then loading the multi-file dataset into memory (the latter because Python's multiprocessing conflicts with the Dask multiprocessing scheduler).

  • Regression testing caught an error in main which is addressed by fix save_ncfiles for annual_cycle_zonal_mean #822

  • The regression results mostly matched, except for the AODVIS variable, for which the development branch uses the test data for both the test and ref plots.

@tomvothecoder tomvothecoder force-pushed the refactor/669-annual_cycle_zonal_mean branch from f1dc8eb to 2cf65c4 Compare July 15, 2024 20:41
@tomvothecoder tomvothecoder changed the base branch from cdat-migration-fy24 to main July 15, 2024 20:42
@tomvothecoder tomvothecoder changed the base branch from main to cdat-migration-fy24 July 15, 2024 20:42
@tomvothecoder tomvothecoder force-pushed the refactor/669-annual_cycle_zonal_mean branch 2 times, most recently from 2c188cd to befef87 Compare July 15, 2024 20:53
…igned time coords

- Update `annual_cycle_zonal_mean_plot.py` to convert time coordinates to month integers
@tomvothecoder
Collaborator

tomvothecoder commented Jul 18, 2024

@tomvothecoder Thank you for troubleshooting. I was testing both datasets for the same variable, but the example 2 dataset should be retired (I replaced this dataset in lat_lon but missed this instance in annual_cycle_zonal_mean). As you pointed out, the created dataset is correct; the first time step gives the March mean. We could add a fix to align time (that should fix the plot, whose x-axis ticks start from January). Since this dataset is retired, I think we should just focus on example 1 for now. (I should remember to update the main branch with the new data in the .cfg.)

I just pushed a fix to issue 1 in this commit: 159cdf5 (#798).

It involves setting decode_times=True to properly concatenate the time coordinates. I found that no downstream operations are affected by this change except the annual_cycle_zonal_mean plotter, which uses the time coordinates for plotting. I had to update the plotter to extract the months to use as x-axis values.

Also, I updated the comment above describing how CDAT replaces time coordinates with month integers in _create_annual_cycle() as a workaround to this issue.

@chengzhuzhang
Contributor Author

@tomvothecoder When testing with decode_times = False, I found that for example 1, the decoded time is just not right. For instance, for the January mean climatology file, the time was decoded as time (time) object 2000-07-02 00:30:00. I also found that the time variable's units are standard for ncclimo-generated climatology files for model and obs data. Not sure why MERRA2_Aerosols stands out.

@tomvothecoder
Collaborator

@tomvothecoder When testing with decode_times = False, I found that for example 1, the decoded time is just not right. For instance, for the January mean climatology file, the time was decoded as time (time) object 2000-07-02 00:30:00. I also found that the time variable's units are standard for ncclimo-generated climatology files for model and obs data. Not sure why MERRA2_Aerosols stands out.

Did you mean decode_times=True? If so, I will take a closer look.

@chengzhuzhang
Contributor Author

chengzhuzhang commented Jul 22, 2024

Did you mean decode_times=True? If so, I will take a closer look.

Yes!
I think the climatology data can't be decoded correctly by cftime somehow.

@tomvothecoder
Collaborator

tomvothecoder commented Jul 22, 2024

@tomvothecoder When testing with decode_times = False, I found that for example 1, the decoded time is just not right. For instance, for the January mean climatology file, the time was decoded as time (time) object 2000-07-02 00:30:00. I also found that the time variable's units are standard for ncclimo-generated climatology files for model and obs data. Not sure why MERRA2_Aerosols stands out.

I verified that cftime is decoding the time coordinates correctly. The issue is that the raw time coordinates are not correct relative to the "units" attribute (10782720, 'minutes since 1980-01-01 00:30:00'). The time axis is also missing the "calendar" attribute, with "standard" being subbed in as the default.

I don't think this was caught in the CDAT codebase because the _create_annual_cycle() function avoids this issue by opening each dataset individually, replacing the time coordinate with the month integer, then concatenating the datasets into a single dataset along the time axis.

Although I'm not a fan of a custom I/O function to handle data quality issues, we have to implement a function similar to _create_annual_cycle() as a workaround for this specific case.
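
A rough sketch of what such a workaround could look like (this is an assumption about the shape of the function, not the actual CDAT _create_annual_cycle(); the sorted-glob input is illustrative):

import xarray as xr


def create_annual_cycle(paths):
    # Open each monthly climatology file individually, replace its single time
    # coordinate with the month integer (1-12), then concatenate along time.
    datasets = []
    for month, path in enumerate(sorted(paths), start=1):
        ds = xr.open_dataset(path, decode_times=False)
        ds = ds.assign_coords(time=[month])
        datasets.append(ds)

    return xr.concat(datasets, dim="time")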

cftime decoding -- cftime.DatetimeGregorian(2000, 7, 2, 0, 30, 0, 0, has_year_zero=False)

from glob import glob

import cftime
import xcdat as xc

args = {
    "add_bounds": ["X", "Y"],
    "coords": "minimal",
    "compat": "override",
    "chunks": "auto",
}

filepath = "/global/cfs/cdirs/e3sm/diagnostics/observations/Atm/climatology/MERRA2_Aerosols/MERRA2_Aerosols_[0-1][0-9]_*climo.nc"
paths = sorted(glob(filepath))

# filepath 1: '/global/cfs/cdirs/e3sm/diagnostics/observations/Atm/climatology/MERRA2_Aerosols/MERRA2_Aerosols_01_198001_202101_climo.nc'
ds_raw_time = xc.open_mfdataset(paths[0], **args, decode_times=False)

# 10782720
time_int = ds_raw_time.time.values.item()
# 'minutes since 1980-01-01 00:30:00'
units = ds_raw_time.time.units
# None, so "standard"
calendar = ds_raw_time.time.attrs.get("calendar", "standard")

# cftime.DatetimeGregorian(2000, 7, 2, 0, 30, 0, 0, has_year_zero=False)
cftime.num2date(time_int, units, calendar=calendar)

datetime.datetime decoding -- datetime.datetime(2000, 7, 2, 0, 30)

import datetime

first_step = datetime.datetime(1980, 1, 1, hour=0, minute=30)
time_delta = datetime.timedelta(minutes=10782720)

# datetime.datetime(2000, 7, 2, 0, 30)
print(first_step + time_delta)

@tomvothecoder
Collaborator

tomvothecoder commented Jul 22, 2024

Although I'm not a fan of a custom I/O function to handle data quality issues, we have to implement a function similar to _create_annual_cycle() as a workaround for this specific case.

Actually, the easier thing to do is to ignore the decoded time values, since they aren't used, and to assume the order is 1-12 (Jan to Dec), as the CDAT code does. The main caveat is that the time coordinates must be in ascending order, which they are when opening the datasets in Xarray/xCDAT with decode_times=True.

The only change needed is to update time_months in the plotter to range(1, 13).

# Make sure the months are in order to cover cases where the climatology
# spans more than 1 year, resulting in months being out of order.
# e.g., [3, 4, 5,...1, 2] -> [1,2,3, 4, 5,...]
time_months = sorted([t.dt.month for t in time])
var = var.squeeze()
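
A minimal sketch of the proposed change, reusing the variable names from the snippet above (the surrounding plotter code is assumed):

# Ignore the decoded time values and assume the 12 climatology time steps are
# ordered Jan-Dec, mirroring how the CDAT code uses month integers.
time_months = list(range(1, 13))
var = var.squeeze()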

@chengzhuzhang
Contributor Author

@tomvothecoder I was searching for a code example that reads data using open_mfdataset with a specified order of files: https://stackoverflow.com/questions/75241585/using-xarrays-open-mfdataset-to-open-a-series-of-nc-files

import numpy as np
import pandas as pd
import xarray

ds = xarray.open_mfdataset(
    [f'{i}.nc' for i in range(10)],
    concat_dim=[
        pd.Index(np.arange(10), name="new_dim"),
    ],
    combine="nested",
)

Though I think your solution actually works okay, given that decode_times=True yields time coordinates in ascending order (even though the decoded month value doesn't match the actual climatology month). Updating time_months in the plotter to range(1, 13) puts back the correct month index. I will do another regression test to confirm.

@chengzhuzhang
Contributor Author

@tomvothecoder I'm retesting this set with all variables and realized that the memory issue came back. Then I tested again with the commit that resolved the memory issue (15811b8): no errors. Some change between f2c3568 and 15811b8 brought back the issue. I doubt decode_times is the cause, though.

@tomvothecoder
Collaborator

@tomvothecoder I'm retesting this set with all variables and realized that the memory issue came back. Then I tested again with the commit that resolved the memory issue (15811b8): no errors. Some change between f2c3568 and 15811b8 brought back the issue. I doubt decode_times is the cause, though.

Besides the recent plotter update, decode_times=True is the only other change from commit 159cdf5 (#798). Maybe decoding times is introducing overhead, although it should be lazy in xCDAT. Also, if climatology files are being used, the number of time coordinates to decode should be minimal. More debugging is needed here.

@chengzhuzhang
Contributor Author

Changing back to decode_times = False did not help. And sadly, some git history was wiped out by a few force-pushes. I tried reverting to recent commits, but concurrent.futures.process.BrokenProcessPool always occurs. I'm kind of running out of debugging methods.

@chengzhuzhang
Contributor Author

chengzhuzhang commented Jul 25, 2024

Not sure of the best way to continue troubleshooting after ruling out the args change for open_mfdataset. What I did is swap the dataset_xr.py from commit 15811b8 into the latest code (I did need to edit it slightly to make it work, i.e. change CLIMO_FREQ to Climo_Freq). No memory issue. At least this narrows down the problematic file, and I suspect some changes merged in other PRs introduced the memory problem. I'm stepping through the diffs to see what might be the cause.

The file diff for dataset_xr.py is here https://www.diffchecker.com/mTw8AWif/

- Due to incorrectly updating `keep_bnds` logic
- Add `_encode_time_coords()` to work around the cftime issue `ValueError: "months since" units only allowed for "360_day" calendar`
@tomvothecoder
Collaborator

tomvothecoder commented Jul 25, 2024

I was actually in the middle of debugging here with my comment. I resolved the multiprocessing issue; it was my fault :(

Issues I resolved in f9a9ea7 (#798)

  1. Slow .load() performance and sometimes multiprocessing issue (concurrent.futures.process.BrokenProcessPool)

    • Root cause: My mistake here, and sorry for removing git history with rebasing. I accidentally committed incorrect logic, keep_bnds = [var for var in all_vars if "bnd" or "bounds" in var], which kept all variables in the dataset before .load().
    • Solution: Update it to keep_bnds = [var for var in all_vars if "bnd" in var or "bounds" in var] (see the sketch at the end of this comment).
  2. With decode_times=True, I get ValueError: 'months since' units only allowed for '360_day' calendar for the TCO and SCO reference variables when writing out to netCDF

    • Root cause: The source dataset ('/global/cfs/cdirs/e3sm/diagnostics/observations/Atm/climatology/OMI-MLS/OMI-MLS_01_200501_201701_climo.nc') has the units 'months since 2005-01-01 00:00:00' and is missing the "calendar" attribute ("standard" is used as a default). Once again, the CDAT code does not run into this issue because it replaces time coordinates with month integers.
    • Solution: Added _encode_time_coords() to driver to encode time coordinates to month integers
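
For reference, a small sketch of why the incorrect expression kept everything (the variable list is illustrative):

all_vars = ["PRECT", "lat_bnds", "lon_bnds", "time_bnds"]

# Buggy: "bnd" is a non-empty string, so `"bnd" or "bounds" in var` is always
# truthy and every variable is kept.
buggy = [var for var in all_vars if "bnd" or "bounds" in var]
print(buggy)  # ['PRECT', 'lat_bnds', 'lon_bnds', 'time_bnds']

# Fixed: test each substring against the variable name.
fixed = [var for var in all_vars if "bnd" in var or "bounds" in var]
print(fixed)  # ['lat_bnds', 'lon_bnds', 'time_bnds']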

@tomvothecoder
Collaborator

tomvothecoder commented Jul 25, 2024

I re-ran the regression test notebook with the latest commit. I am still getting the following diffs:

AODVIS

Comparing:
/global/cfs/cdirs/e3sm/www/cdat-migration-fy24/669-annual_cycle_zonal_mean-debug/annual_cycle_zonal_mean/AOD_550/AOD_550-AODVIS-ANNUALCYCLE-global_ref.nc 
 /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/main/annual_cycle_zonal_mean/AOD_550/AOD_550-AODVIS-Annual-Cycle_test.nc
AODVIS
var_key AODVIS

Not equal to tolerance rtol=1e-05, atol=0

Mismatched elements: 1808 / 2160 (83.7%)
Max absolute difference: 0.12250582
Max relative difference: 91.14554689
 x: array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],...
 y: array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],...

ALBEDO -- It just looks like np.inf is being used in xCDAT while np.nan is used with CDAT. I recall this happening in other regression tests. Replacing np.inf with np.nan resolves this issue and vice versa.

Comparing:
/global/cfs/cdirs/e3sm/www/cdat-migration-fy24/669-annual_cycle_zonal_mean-debug/annual_cycle_zonal_mean/CERES-EBAF-TOA-v4.1/ceres_ebaf_toa_v4.1-ALBEDO-ANNUALCYCLE-global_ref.nc 
 /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/main/annual_cycle_zonal_mean/CERES-EBAF-TOA-v4.1/ceres_ebaf_toa_v4.1-ALBEDO-Annual-Cycle_test.nc
ALBEDO
var_key ALBEDO

Not equal to tolerance rtol=1e-05, atol=0

x and y nan location mismatch:
 x: array([[0.69877 , 0.695266, 0.68627 , ...,      inf,      inf,      inf],
       [0.712032, 0.706896, 0.69354 , ...,      inf,      inf,      inf],
       [0.765447, 0.743142, 0.738787, ..., 0.752918, 0.751204, 0.833122],...
 y: array([[0.69877 , 0.695266, 0.68627 , ...,      nan,      nan,      nan],
       [0.712033, 0.706896, 0.69354 , ...,      nan,      nan,      nan],
       [0.765447, 0.743142, 0.738787, ..., 0.752918, 0.751204, 0.833123],...
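
For the comparison itself, a hedged sketch of normalizing np.inf to np.nan before re-checking (paths follow the comparison above; the normalization step is mine, not part of the regression notebook):

import numpy as np
import xcdat as xc

dev_path = "/global/cfs/cdirs/e3sm/www/cdat-migration-fy24/669-annual_cycle_zonal_mean-debug/annual_cycle_zonal_mean/CERES-EBAF-TOA-v4.1/ceres_ebaf_toa_v4.1-ALBEDO-ANNUALCYCLE-global_ref.nc"
main_path = "/global/cfs/cdirs/e3sm/www/cdat-migration-fy24/main/annual_cycle_zonal_mean/CERES-EBAF-TOA-v4.1/ceres_ebaf_toa_v4.1-ALBEDO-Annual-Cycle_test.nc"

var_a = xc.open_dataset(dev_path)["ALBEDO"]
var_b = xc.open_dataset(main_path)["ALBEDO"]

# Treat np.inf (xCDAT output) and np.nan (CDAT output) as the same missing
# value before comparing.
a = np.where(np.isinf(var_a.values), np.nan, var_a.values)
np.testing.assert_allclose(a, var_b.values, rtol=1e-05, atol=0)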

@chengzhuzhang
Contributor Author

chengzhuzhang commented Jul 25, 2024

@tomvothecoder this is a big relief! I skimmed through the file several times and noticed the changed line keep_bnds = [var for var in all_vars if "bnd" or "bound" in var], but was not careful enough to catch the problem! No worries about the AODVIS variable. I will update the .cfg file to replace this obs source with two new data sources.

@tomvothecoder
Collaborator

tomvothecoder commented Jul 25, 2024

I added a debug script for AODVIS that compares the max, min, sum, and mean. All of the values look close.

I think the max relative diff is large because the values are close to 0.

import numpy as np
import xcdat as xc

dev_path = "/global/cfs/cdirs/e3sm/www/cdat-migration-fy24/669-annual_cycle_zonal_mean-debug/annual_cycle_zonal_mean/AOD_550/AOD_550-AODVIS-ANNUALCYCLE-global_ref.nc"
main_path = "/global/cfs/cdirs/e3sm/www/cdat-migration-fy24/main/annual_cycle_zonal_mean/AOD_550/AOD_550-AODVIS-Annual-Cycle_test.nc"


var_a = xc.open_dataset(dev_path)["AODVIS"]
var_b = xc.open_dataset(main_path)["AODVIS"]

"""
Floating point comparison

AssertionError:
Not equal to tolerance rtol=1e-07, atol=0

Mismatched elements: 1808 / 2160 (83.7%)
Max absolute difference: 0.12250582
Max relative difference: 91.14554689
 x: array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],...
 y: array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],...
"""
np.testing.assert_allclose(var_a, var_b)

# Get the max of all values
# -------------------------
# 0.28664299845695496
print(var_a.max().item())
# 0.2866430557436412
print(var_b.max().item())

# Get the min of all values
# -------------------------
# 0.0
print(var_a.min().item())
# 0.0
print(var_b.min().item())

# Get the sum of all values
# -------------------------
# 224.2569122314453
print(var_a.sum().item())
# 224.25691348856003
print(var_b.sum().item())

# Get the mean of all values
# -------------------------
# 0.10382264107465744
print(var_a.mean().item())
# 0.1038226451335926
print(var_b.mean().item())


# %%
# Get the max absolute diff
# -------------------------
# 0.12250582128763199
print((var_a - var_b).max().item())

@chengzhuzhang
Contributor Author

chengzhuzhang commented Jul 25, 2024

I think the max relative diff is large because the values are close to 0.

Yeah, the values and metrics all look very close. Based on the plots I saw earlier, the months were off.
Anyway, based on the comments from #624, I retired AODVIS from MACv1 in lat_lon but missed the annual_cycle_zonal_mean set. I made the update in 7ba0900.

@chengzhuzhang
Contributor Author

@tomvothecoder I think we can merge after the CI/CD tests are completed!

@chengzhuzhang chengzhuzhang merged commit 8a8dafa into cdat-migration-fy24 Jul 25, 2024
2 of 4 checks passed
@chengzhuzhang chengzhuzhang deleted the refactor/669-annual_cycle_zonal_mean branch July 25, 2024 21:31
Labels
cdat-migration-fy24 CDAT Migration FY24 Task