
Relying on pandas time coordinate being in "datetime64[ns]" format #151

Closed
JanStreffing opened this issue Sep 25, 2021 · 3 comments

@JanStreffing
Contributor

  • pyfesom2 version: 0.2.0 dev_0
  • Python version: 3.9.6
  • Operating System: centos-linux-release-8.4-1.2105.el8.noarch

Description

We are relying on xr.open_mfdataset here

dataset = xr.open_mfdataset(paths, combine="by_coords", **kwargs)
which by default casts the time coordinate to datetime64[ns]. However, owing to its nanosecond accuracy, this format has a limited valid (no overflow/underflow) range: it is centered on 1970-01-01 and covers roughly ±292 years from there (about 1677 to 2262). See:
pydata/xarray#4454 (comment)
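For reference, pandas exposes the exact bounds of the datetime64[ns] range; a quick sketch:

```python
import pandas as pd

# datetime64[ns] stores nanoseconds since 1970-01-01 in a signed 64-bit
# integer, so the representable window is roughly 1677 to 2262.
print(pd.Timestamp.min)  # 1677-09-21 00:12:43.145224193
print(pd.Timestamp.max)  # 2262-04-11 23:47:16.854775807
```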

Within this range we load data as:

Coordinates:
    time    (time)    datetime64[ns]    2000-01-31T23:20:00...

Outside this range, xarray falls back to cftime objects:

Coordinates:
    time    (time)    object    2270-01-31 23:20:00...

When attempting to load a time series that contains values from both cases, we fail with:

TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_24152/1386845663.py in <module>
      5 
      6     a_ice[exp_name] = {}
----> 7     a_ice[exp_name]['data'] = pf.get_data(exp_path, 'a_ice', years, mesh, how=None, compute=False, silent=True)

/p/project/chhb19/jstreffi/software/pyfesom2/pyfesom2/load_mesh_data.py in get_data(result_path, variable, years, mesh, runid, records, depth, how, ncfile, compute, continuous, silent, **kwargs)
    518             print("Depth is None, 3d field will be returned")
    519 
--> 520     dataset = xr.open_mfdataset(paths, combine="by_coords", **kwargs)
    521     data = select_slices(dataset, variable, mesh, records, depth)
    522 

/p/project/chhb19/jstreffi/software/miniconda3/envs/pyfesom2/lib/python3.9/site-packages/xarray/backends/api.py in open_mfdataset(paths, chunks, concat_dim, compat, preprocess, engine, data_vars, coords, combine, parallel, join, attrs_file, combine_attrs, **kwargs)
    939             # Redo ordering from coordinates, ignoring how they were ordered
    940             # previously
--> 941             combined = combine_by_coords(
    942                 datasets,
    943                 compat=compat,

/p/project/chhb19/jstreffi/software/miniconda3/envs/pyfesom2/lib/python3.9/site-packages/xarray/core/combine.py in combine_by_coords(data_objects, compat, data_vars, coords, fill_value, join, combine_attrs, datasets)
    896         concatenated_grouped_by_data_vars = []
    897         for vars, datasets_with_same_vars in grouped_by_vars:
--> 898             concatenated = _combine_single_variable_hypercube(
    899                 list(datasets_with_same_vars),
    900                 fill_value=fill_value,

/p/project/chhb19/jstreffi/software/miniconda3/envs/pyfesom2/lib/python3.9/site-packages/xarray/core/combine.py in _combine_single_variable_hypercube(datasets, fill_value, data_vars, coords, compat, join, combine_attrs)
    602         )
    603 
--> 604     combined_ids, concat_dims = _infer_concat_order_from_coords(list(datasets))
    605 
    606     if fill_value is None:

/p/project/chhb19/jstreffi/software/miniconda3/envs/pyfesom2/lib/python3.9/site-packages/xarray/core/combine.py in _infer_concat_order_from_coords(datasets)
    109 
    110                 # ensure series does not contain mixed types, e.g. cftime calendars
--> 111                 _ensure_same_types(series, dim)
    112 
    113                 # Sort datasets along dim

/p/project/chhb19/jstreffi/software/miniconda3/envs/pyfesom2/lib/python3.9/site-packages/xarray/core/combine.py in _ensure_same_types(series, dim)
     52         if len(types) > 1:
     53             types = ", ".join(t.__name__ for t in types)
---> 54             raise TypeError(
     55                 f"Cannot combine along dimension '{dim}' with mixed types."
     56                 f" Found: {types}."

TypeError: Cannot combine along dimension 'time' with mixed types. Found: DatetimeGregorian, Timestamp.

I suggest we avoid datetime64[ns] altogether, as we don't need its accuracy. This might mean modifications to diagnostics that use structures such as:

toplot = value[h].sel(time=value[h].time.dt.month.isin([month]))

@JanStreffing JanStreffing added the bug Something isn't working label Sep 25, 2021
@JanStreffing
Contributor Author

Some more thoughts on this. I guess this is not something we can solve in pyfesom2. I have not seen a way to make open_mfdataset read the time coordinate with a type of our choice. If such a way exists, that would solve our problem.

Also, it turns out we don't need any changes to our diagnostic routines, as long as we install nc-time-axis. Our diagnostics work fine above the threshold; the crash only occurs when a single open_mfdataset call spans years both below and above 2263.

If we cannot find a way to control the time coordinate type, I'll open an issue with xarray about it. The solution could be to try converting all time coordinates to cftime objects in xarray/core/combine.py line 54.

@JanStreffing
Contributor Author

As @koldunovn noted, http://xarray.pydata.org/en/stable/generated/xarray.open_mfdataset.html is a wrapper around http://xarray.pydata.org/en/stable/generated/xarray.open_dataset.html

Both accept the use_cftime optional argument, though it's only documented for the latter.

@JanStreffing
Contributor Author

The problem was solved using the use_cftime optional argument.
