Stream ARCO ERA5 data #357

Merged: NoraLoose merged 36 commits into CWorthy-ocean:main from NoraLoose:stream-era5 on Jun 26, 2025

@NoraLoose (Collaborator) commented Jun 23, 2025

This PR is in collaboration with @ScottEilerman. I started from #355, but somehow could not push to it directly. The PR implements streaming ERA5 data directly from the cloud so that users do not need to pre-download their ERA5 data.

Changes

  • Introduced ERA5ARCODataset class.
  • Added gcsfs as an optional dependency to support cloud streaming.
  • Introduced a new pytest marker stream to isolate streaming tests:
  pytest -m stream --stream

This command runs only the streaming tests; a default test run excludes them. Keeping them separate speeds up development because streaming tests take longer.

  • Added streaming tests to CI.
  • Added ERA5 streaming to documentation.
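
For readers unfamiliar with opt-in pytest markers, the stream marker and --stream flag described above could be wired up roughly as follows. This is a minimal sketch based on the standard pytest hook pattern, not the code from this PR: the help text, the skip reason, and the assumption that streaming tests are skipped (rather than deselected) when --stream is absent are all illustrative.

```python
# conftest.py (illustrative sketch; not the project's actual file)
import pytest


def pytest_addoption(parser):
    # Register a --stream flag; streaming tests stay disabled unless it is given.
    parser.addoption(
        "--stream",
        action="store_true",
        default=False,
        help="run tests that stream data from the cloud",
    )


def pytest_configure(config):
    # Declare the marker so that --strict-markers accepts @pytest.mark.stream.
    config.addinivalue_line("markers", "stream: test streams data from the cloud")


def pytest_collection_modifyitems(config, items):
    if config.getoption("--stream"):
        return  # flag given: leave streaming tests runnable
    # Assumption: without the flag, streaming tests are skipped.
    skip_stream = pytest.mark.skip(reason="needs --stream option to run")
    for item in items:
        if "stream" in item.keywords:
            item.add_marker(skip_stream)
```

Under this sketch, a test decorated with @pytest.mark.stream is skipped by a plain pytest run, while pytest -m stream --stream selects the marked tests with -m and enables them with --stream.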

Other changes:

  • Reorganized internal logic for handling use_dask and read_zarr workflows.

  • Cleaned up conftest.py by removing unnecessary request parameters from fixtures.

  • Adjusted fixture start_time values in conftest.py to more intuitive dates (changed from January 31 to February 2) since source data does not include January.

  • Tests added

  • Passes pre-commit run --all-files

  • Changes are documented in docs/releases.md

  • New functions/methods are listed in docs/api.rst

  • New functionality has documentation

@ScottEilerman (Contributor) left a comment

Pending confirmation that the diffs in the data look fine, this is ready to go! Thanks for helping get it cleaned up!

@ScottEilerman mentioned this pull request Jun 25, 2025

@NoraLoose (Collaborator, Author) commented

@ScottEilerman I'm still getting the warning:

RequestsDependencyWarning: Unable to find acceptable character detection dependency (chardet or charset_normalizer).
  warnings.warn(

in the docs notebook when instantiating the SurfaceForcing object with streaming. We can leave it for a future PR to try to fix that.

@ScottEilerman (Contributor) commented

What seems odd here is that charset-normalizer is a dependency of requests and, at least in a fresh environment, seems to get installed via both conda and pip. Is it possible the environment where these doc notebooks are running is maybe in a weird state?
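
One quick way to check this, as a minimal sketch: run the snippet below with the same interpreter (or notebook kernel) that builds the docs and see which character-detection packages are importable. The helper name find_charset_detectors is hypothetical; the two module names are the ones mentioned in the warning text.

```python
import importlib.util


def find_charset_detectors():
    """Return the importable character-detection packages named in the warning."""
    candidates = ("charset_normalizer", "chardet")
    # find_spec returns None when a top-level module cannot be located.
    return [name for name in candidates if importlib.util.find_spec(name) is not None]


if __name__ == "__main__":
    found = find_charset_detectors()
    if found:
        print("Available detectors:", ", ".join(found))
    else:
        print("Neither charset_normalizer nor chardet is importable in this environment.")
```

If the list comes back empty, the environment is probably missing a package that a fresh install of requests would normally pull in, which would point to a stale or partially broken environment rather than a problem in this PR.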

@NoraLoose (Collaborator, Author) commented

> Is it possible the environment where these doc notebooks are running is maybe in a weird state?

That was indeed the case! Thanks!

@NoraLoose (Collaborator, Author) commented

This is good to merge.

@NoraLoose merged commit 23e6148 into CWorthy-ocean:main on Jun 26, 2025
12 checks passed