Skip to content

opendap / dap4 support for pydap backend #10182

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 25 commits into from
Apr 19, 2025
Merged

Conversation

Mikejmnez
Copy link
Contributor

@Mikejmnez Mikejmnez commented Mar 28, 2025

There have been some (revitalizing) upgrades over the last year to pydap (with some more to come), including authentication, pydap data model (dap2 vs dap4). This Draft PR aims to

  • DOC: Update info on how to authenticate (handled by requests, which recovers credentials )
  • DOC: updates links to new (regularly updated) pydap documentation.
  • DOC: How to define opendap protocol via URLs.
  • DOC: Some info on caching sessions to avoid repeatedly downloading same metadata/data from server. At least point to pydap documentation. (Performance upgrade)
  • Removes slashes in variable names (part of dap4 protocol, which allows for hierarchical variables in dataset)
  • Support for DataTree.
  • Improves testing of backend.
  • removes (largely unused) output_grid option. It is now set to False as default in pydap.
  • Add deprecation warning of output_grid as an input parameter. It will always be False as default (and there is no need for it to be True, as it only adds unnecessary logic).

This PR enables, for example (check test opendap dap4 url)

URL = "dap4://test.opendap.org/opendap/dap4/SimpleGroup.nc4.h5"
ds = xr.open_datatree(URL, engine='pydap')

Screenshot 2025-04-14 at 2 23 17 PM

@github-actions github-actions bot added topic-backends topic-indexing topic-documentation topic-plotting topic-performance topic-zarr Related to zarr storage library CI Continuous Integration tools topic-rolling Automation Github bots, testing workflows, release automation io labels Apr 12, 2025
@Mikejmnez
Copy link
Contributor Author

Apologies for all the added labels / contributors. Didn't have much time to work on this pr this week, and so I had to rebase and merge the main branch to this one. Changed files are correct though.

@Mikejmnez Mikejmnez marked this pull request as ready for review April 14, 2025 15:38
@Mikejmnez
Copy link
Contributor Author

Ready for review :)

Copy link
Contributor

@dcherian dcherian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Mikejmnez

The test_cmp_local_file failure is real.

@@ -49,36 +51,37 @@ def __getitem__(self, key):
)

def _getitem(self, key):
# pull the data from the array attribute if possible, to avoid
# downloading coordinate data twice
array = getattr(self.array, "array", self.array)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't we need this any more?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I set the output_grid=False by default on pydap, so that should not be necessary. But I'll double check (All the tests passed on the xarray-tests environment)

pytest -v xarray/tests/test_backends.py --run-network-tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated/upgraded to a newer pydap version on min_all_dep.yml. The failing test test_cmp_local_file is gone as expected. Meaning this is correct. I am also removing output_grid as an argument. It is not really used. So I added a deprecation warning.

@Mikejmnez
Copy link
Contributor Author

Mikejmnez commented Apr 16, 2025

It looks like the failing for ubuntu-latest py3.10 min-all-deps has pydap pinned to 3.4 , which is at least 1 year old (and would explain the errors). The latest conda version is 3.5.5.

@github-actions github-actions bot added the dependencies Pull requests that update a dependency file label Apr 16, 2025
@dcherian
Copy link
Contributor

We like to keep compatibility for ~ a year. I could see relaxing this in special cases. Can we do 3.5.0? That was released in August so ~ 8 months ago.

@Mikejmnez
Copy link
Contributor Author

We like to keep compatibility for ~ a year. I could see relaxing this in special cases. Can we do 3.5.0? That was released in August so ~ 8 months ago.

Cool. I think it is totally doable. There might be one or two features that may be newer than that, I might just have to add a try / except case once or twice for backwards compatibility.

@Mikejmnez
Copy link
Contributor Author

I set the min version to pydap 3.5.0, which has almost everything except a few methods implemented in 3.5.1. Let me know if there is any other change needed aside from updating/merging main into this branch :) .

@dcherian
Copy link
Contributor

Thanks! I think we can merge. Can you add a note to whats-new mentioning the version bump under Breaking Changes please? It'd be nice to add a note under New Features too described all these great improvements.

@Mikejmnez
Copy link
Contributor Author

Awesome! Will do shortly

…aking Changes` and documentation for backend engine
@dcherian
Copy link
Contributor

Sorry one last test failure on the flaky env (which I think is the only one where pydap tests are run)

ValueError: conflicting sizes for dimension 'lat': length 1 on 'subgroup1_var' and length 2 on {'lat': 'subgroup1_var'}

@Mikejmnez
Copy link
Contributor Author

'lat': 'subgroup1

Hmmm - there might be an issue with that test dataset. I'll look into this

@Mikejmnez
Copy link
Contributor Author

Mikejmnez commented Apr 18, 2025

I don't really know what is going on... I am having a hard time replicating the error. I installed the environment ci/requirements/environment.yml and ran:

pytest xarray/tests/test_backends_datatree.py -v --run-flaky --run-network-tests -W default 

And the test highlighted here passes.

Screenshot 2025-04-17 at 5 50 07 PM

There was, however, another test that failed but only once:

____________________________ TestPyDAPDatatreeIO.test_inherited_coords _________
        with xr.open_dataset(url, engine=self.engine, group="/SimpleGroup") as expected:
/Users/jimenezm/xarray/xarray/tests/test_backends_datatree.py:505: AssertionError

When I run the same testing command, without making any change, all tests pass no matter how many times I try. I repeatedly try to replicate these errors and cannot. I have pydap 3.5.5, xarray installed in dev mode, and python 3.13.3 installed.

For example according to the failed test, the following should fail (python 3.13.3 pydap=3.5.5):

Screenshot 2025-04-17 at 5 43 09 PM

I am kind of confused... Any suggestion @dcherian ?

@github-actions github-actions bot added the topic-DataTree Related to the implementation of a DataTree class label Apr 18, 2025
@Mikejmnez
Copy link
Contributor Author

Ok! I think likely there was something off with the remote dataset, and it got updated. flaky passes all tests. I see Minimum Version Policy fails because of Pydap=3.5.0 (and I think that will be ok).

I looked at the RuntimeError for the docs/readthedocs build which failed, and I see:

RuntimeError: Unexpected warning in `/home/docs/checkouts/readthedocs.org/user_builds/xray/checkouts/10182/doc/user-guide/computation.rst` line 644

Exception occurred:
  File "/home/docs/checkouts/readthedocs.org/user_builds/xray/conda/10182/lib/python3.12/site-packages/IPython/sphinxext/ipython_directive.py", line 603, in process_input
    raise RuntimeError(
RuntimeError: Unexpected warning in `/home/docs/checkouts/readthedocs.org/user_builds/xray/checkouts/10182/doc/user-guide/computation.rst` line 644

However, that environment does not even use pydap. I think that is it.

@dcherian let me know if there is any other thing needed.

Have a good weekend!

@dcherian
Copy link
Contributor

Nice, thanks for your work here. Sorry it was a bit annoying at the end.

@dcherian dcherian merged commit e39f59e into pydata:main Apr 19, 2025
29 of 32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Automation Github bots, testing workflows, release automation CI Continuous Integration tools dependencies Pull requests that update a dependency file io topic-backends topic-DataTree Related to the implementation of a DataTree class topic-documentation topic-indexing topic-performance topic-plotting topic-rolling topic-zarr Related to zarr storage library
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants