Skip to content
This repository has been archived by the owner on Apr 30, 2021. It is now read-only.

esmlab.anomaly generating error, unrealistically large time values #162

Open
klindsay28 opened this issue Dec 17, 2019 · 5 comments
Open
Labels
bug Something isn't working

Comments

@klindsay28
Copy link

When I run the commands in the following code snippet, I get a ValueError exception. The Traceback is below the code snippet. The values listed in the cftime.DatetimeNoLeap call in the Traceback are fairly suspicious, as if something were uninitialized.

The call to esmlab.anomaly works fine if I comment out the 2nd fname setting, and use a timeseries from CESM2's atm component. I don't see a smoking gun difference between the file that triggers the exception and the file that doesn't.

Code Snippet

import xarray as xr
import esmlab
dir = '/glade/work/klindsay/analysis/CESM2_coup_carb_cycle_JAMES/tseries'
fname = 'SFCO2_OCN_atm_piControl_00_mon.nc'
fname = 'FG_CO2_ocn_piControl_00_mon.nc'
path = '/'.join([dir, fname])
ds = xr.open_dataset(path)
ds_anom = esmlab.anomaly(ds, clim_freq='mon')

Traceback

/glade/work/klindsay/analysis/CESM2_coup_carb_cycle_JAMES/esmlab_dev/esmlab/core.py:291: RuntimeWarning: invalid value encountered in greater
calendar=self.time_attrs['calendar'],
Traceback (most recent call last):
File "foo5.py", line 8, in
ds_anom = esmlab.anomaly(ds, clim_freq='mon')
File "/glade/work/klindsay/analysis/CESM2_coup_carb_cycle_JAMES/esmlab_dev/esmlab/core.py", line 717, in anomaly
slice_mon_clim_time=slice_mon_clim_time
File "/glade/work/klindsay/miniconda3/envs/CESM2_coup_carb_cycle_JAMES_tst/lib/python3.7/contextlib.py", line 74, in inner
return func(*args, **kwds)
File "/glade/work/klindsay/analysis/CESM2_coup_carb_cycle_JAMES/esmlab_dev/esmlab/core.py", line 502, in compute_mon_anomaly
return self.restore_dataset(computed_dset, attrs=attrs)
File "/glade/work/klindsay/analysis/CESM2_coup_carb_cycle_JAMES/esmlab_dev/esmlab/core.py", line 291, in restore_dataset
calendar=self.time_attrs['calendar'],
File "cftime/_cftime.pyx", line 320, in cftime._cftime.num2date
File "cftime/_cftime.pyx", line 869, in cftime._cftime.utime.num2date
File "cftime/_cftime.pyx", line 564, in cftime._cftime.DateFromJulianDay
File "cftime/_cftime.pyx", line 1378, in cftime._cftime.DatetimeNoLeap.init
File "cftime/_cftime.pyx", line 1637, in cftime._cftime.assert_valid_date
ValueError: invalid second provided in cftime.DatetimeNoLeap(-5883517, 2, 27, 0, 0, -2147483648, -2147483648, 5, 58)

Output of esmlab.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 21:52:21)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-693.21.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

esmlab: 2019.4.27.post43
xarray: 0.14.0
pandas: 0.25.2
numpy: 1.17.3
scipy: 1.3.1
xesmf: 0.2.1
cftime: 1.0.3.4
dask: 2.6.0
distributed: 2.6.0
setuptools: 41.4.0
pip: 19.3.1
conda: None
pytest: None
IPython: 7.9.0
sphinx: None

@klindsay28 klindsay28 added the bug Something isn't working label Dec 17, 2019
@mnlevy1981
Copy link
Contributor

Looking at the two files, SFCO2_OCN_atm_piControl_00_mon.nc has time units of days since 0001-01-01 and FG_CO2_ocn_piControl_00_mon.nc has time units of days since 0000-01-01. I was under the impression that cftime didn't like having a year 0, but this appears to be a red herring as

import xarray as xr
import esmlab
dir = '/glade/work/klindsay/analysis/CESM2_coup_carb_cycle_JAMES/tseries'
# fname = 'SFCO2_OCN_atm_piControl_00_mon.nc'
fname = 'FG_CO2_ocn_piControl_00_mon.nc'
path = '/'.join([dir, fname])
ds = xr.open_dataset(path, decode_times=False)
ds.time.values = ds.time.values - 365
ds.time.attrs['units'] = "days since 0001-01-01"
ds = xr.decode_cf(ds)
ds_anom = esmlab.anomaly(ds, clim_freq='mon')

yields the same traceback:

/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/esmlab/core.py:291: RuntimeWarning: invalid value encountered in greater
  calendar=self.time_attrs['calendar'],
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/esmlab/core.py", line 730, in anomaly
    slice_mon_clim_time=slice_mon_clim_time
  File "/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/esmlab/core.py", line 515, in compute_mon_anomaly
    return self.restore_dataset(computed_dset, attrs=attrs)
  File "/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/esmlab/core.py", line 291, in restore_dataset
    calendar=self.time_attrs['calendar'],
  File "cftime/_cftime.pyx", line 320, in cftime._cftime.num2date
  File "cftime/_cftime.pyx", line 869, in cftime._cftime.utime.num2date
  File "cftime/_cftime.pyx", line 564, in cftime._cftime.DateFromJulianDay
  File "cftime/_cftime.pyx", line 1378, in cftime._cftime.DatetimeNoLeap.__init__
  File "cftime/_cftime.pyx", line 1637, in cftime._cftime.assert_valid_date
ValueError: invalid second provided in cftime.DatetimeNoLeap(-5883517, 2, 27, 0, 0, -2147483648, -2147483648, 5, 58)

What's odd is that I can't find the index of ds.time corresponding to cftime.DatetimeNoLeap(-5883517, 2, 27, 0, 0, -2147483648, -2147483648, 5, 58):

for i in range(14400):
    sec = ds.time.values[i].second
    if sec != 0:
            print(f'{i}: {ds.time.values[i]}')

outputs

0: 0001-01-16 12:59:59

So I guess an internal esmlab computation is creating a bad time axis?

@klindsay28
Copy link
Author

If I change fname to fname = 'TOTECOSYSC_lnd_esm-piControl_00_mon.nc', I see the same error, and time:units in this file is days since 0001-01-01.

So, @mnlevy1981 , I also think there is problem in esmlab.

The -2147483648 values lead me to suspect that something isn't getting initialized. I am not able to follow the esmlab code, so I don't know where the problem resides.

@mnlevy1981
Copy link
Contributor

I think @andersy005 is going to need to look into this one, but I just want to point out that your block of code runs fine if you do not decode time when opening the dataset:

import xarray as xr
import esmlab
dir = '/glade/work/klindsay/analysis/CESM2_coup_carb_cycle_JAMES/tseries'
fname = 'SFCO2_OCN_atm_piControl_00_mon.nc'
fname = 'FG_CO2_ocn_piControl_00_mon.nc'
path = '/'.join([dir, fname])
ds = xr.open_dataset(path, decode_times=False)
ds_anom = esmlab.anomaly(ds, clim_freq='mon')

That might be a useful workaround until this issue can be resolved.

@klindsay28
Copy link
Author

Thanks for the workaround @mnlevy1981
Another is to replace the problematic time coordinate with a non-problematic time coordinate.
This assumes you have a non-problematic time coordinate, which fortunately I appear to have in my post-processed atm files. In particular, the following runs without error:

import xarray as xr
import esmlab

def repl_coord(coordname, ds1, ds2, deep=False):
    """
    Return copy of d2 with coordinate coordname replaced, using coordname from ds1.
    The returned Dataset is otherwise a copy of ds2.
    The copy is deep or not depending on the argument deep.
    """
    if 'bounds' in ds2[coordname].attrs:
        tb_name = ds2[coordname].attrs['bounds']
        ds_out = ds2.drop(tb_name).copy(deep)
    else:
        ds_out = ds2.copy(deep)
    ds_out[coordname] = ds1[coordname]
    if 'bounds' in ds1[coordname].attrs:
        tb_name = ds1[coordname].attrs['bounds']
        ds_out[tb_name] = ds1[tb_name]
    return ds_out

dir = '/glade/work/klindsay/analysis/CESM2_coup_carb_cycle_JAMES/tseries'

fname = 'SFCO2_OCN_atm_piControl_00_mon.nc'
path = '/'.join([dir, fname])
ds1 = xr.open_dataset(path)

fname = 'TOTECOSYSC_lnd_piControl_00_mon.nc'
fname = 'FG_CO2_ocn_piControl_00_mon.nc'
path = '/'.join([dir, fname])
ds2 = xr.open_dataset(path)
ds2 = repl_coord('time', ds1, xr.open_dataset(path))
ds2_anom = esmlab.anomaly(ds2, clim_freq='mon')

@klindsay28
Copy link
Author

I got bit by this again, in a workflow where there wasn't a convenient good time coordinate to replace the bad one with. It's also not convenient to add decode_times=False to the open_dataset/open_mfdataset calls.

I found that modifying the time values by a small amount, less than 1.0e-10, can avoid the problem when it arises. So I put a kludge in my wrapper to esmlab.anomaly to change the time values by small amounts if they are close to known problematic values. It's very kludgy.

Any chance that this will get looked at in the near future?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants