Adding caravan forcing #407

Daafip · 2024-04-09T14:55:28Z

Hi All,

As this is my first contribution, I would like some feedback if possible.

I was unsure about the naming convention: generate seems to be more closely linked to ESMValTool, so I went with retrieve.

It runs for me with:

```python
from pathlib import Path

from ewatercycle.forcing import sources

path = Path.cwd()
forcing_path = path / "Forcing"
experiment_start_date = "1997-08-01T00:00:00Z"
experiment_end_date = "2005-09-01T00:00:00Z"
HRU_id = 1022500

camels_forcing = sources["LumpedCaravanForcing"].retrieve(
    start_time=experiment_start_date,
    end_time=experiment_end_date,
    directory=forcing_path / "Camels",
    basin_id=f"camels_0{HRU_id}",
    variables=("streamflow", "potential_evaporation_sum"),
)
```

Discussions on the topic: #398

Still todo:

Daafip · 2024-04-10T10:39:14Z

I'm not sure how to complete the renaming from ERA5 to ESMValTool-compatible variable names.
Do you have any insight on how to do that, @BSchilperoort?

Edit: for now I've just supplied a rename dictionary, as in the discussion.
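A minimal sketch of such a rename dictionary, assuming Caravan's ERA5-Land column names on the left and CMOR-style short names (as used by ESMValTool) on the right. The mapping and the `rename_variables` helper are illustrative assumptions, not the code in this PR; the same dictionary can also rename the user-supplied `variables` tuple:

```python
# Hypothetical mapping from Caravan (ERA5-Land) column names to
# CMOR/ESMValTool-style short names; adjust to the actual dataset.
RENAME_ERA5 = {
    "total_precipitation_sum": "pr",
    "potential_evaporation_sum": "evspsblpot",
    "temperature_2m_mean": "tas",
    "temperature_2m_min": "tasmin",
    "temperature_2m_max": "tasmax",
}


def rename_variables(variables: tuple[str, ...]) -> tuple[str, ...]:
    """Map each Caravan name to its short name; unknown names pass through unchanged."""
    return tuple(RENAME_ERA5.get(name, name) for name in variables)


print(rename_variables(("streamflow", "potential_evaporation_sum")))
# ('streamflow', 'evspsblpot')
```

For an xarray dataset, the same mapping could be applied with `ds.rename({k: v for k, v in RENAME_ERA5.items() if k in ds.data_vars})`; the filter matters because `Dataset.rename` raises an error for names that are not in the dataset.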

BSchilperoort

Hi David, nice work! I left some comments with possible improvements to the code. Feel free to request another review when it's ready!

src/ewatercycle/_forcings/caravan.py

BSchilperoort · 2024-04-11T11:35:44Z

Also, it would be nice if you could add tests for this forcing (without downloading anything). If you're unsure how to do that, we can discuss it at some point.

RolfHut · 2024-04-11T20:04:20Z

Regarding conventions: from a didactic point of view, in tutorial notebooks I suggest always using:

```python
import ewatercycle.forcing

camels_forcing = ewatercycle.forcing.sources["LumpedCaravanForcing"].retrieve(
    start_time=experiment_start_date,
    end_time=experiment_end_date,
    directory=forcing_path / "Camels",
    basin_id=f"camels_0{HRU_id}",
    variables=("streamflow", "potential_evaporation_sum"),
)
```

Novice users quickly lose sight of which function belongs to which library when using from ewatercycle.forcing import sources. This might load a bit more than you would ideally want, but it communicates more clearly. In code 'under the hood', I'm more than fine with the shorter version.

Daafip · 2024-04-12T10:16:27Z

I made the suggested changes, Rolf. The example notebook can be found here:
https://gist.github.com/Daafip/ac1b030eb5563a76f4d02175f2716fd7

Daafip · 2024-04-16T08:12:36Z

I'm still facing three issues; maybe you could help with them, Bart?

test_util.test_merge_esmvaltool_datasets fails. If I run the test locally in my IDE I don't have this problem, but in the GitHub tests I do...?

```text
FAILED tests/src/test_util.py::test_merge_esmvaltool_datasets - AssertionError: assert 'height' in {'cell_methods': 'day_of_year: year: mean', 'long_name': 'Near-Surface Air Temperature', 'standard_name': 'air_temperature', 'units': 'K'}
```

mypy throws an error I don't understand in the SonarCloud analysis:

```text
mypy.....................................................................Failed
- hook id: mypy
- exit code: 1

src/ewatercycle/_forcings/caravan.py:10: error: Library stubs not installed for "requests"  [import]
src/ewatercycle/_forcings/caravan.py:10: note: Hint: "python3 -m pip install types-requests"
src/ewatercycle/_forcings/caravan.py:10: note: (or run "mypy --install-types" to install all missing stub packages)
src/ewatercycle/_forcings/caravan.py:10: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
```

I'm not sure how to make the tests avoid downloading anything.

BSchilperoort · 2024-04-18T10:43:09Z

> test_util.test_merge_esmvaltool_datasets fails. If I run the test locally in my IDE I don't have this problem, but in the GitHub tests I do...?

That's odd. The "height" attribute should be in there. Perhaps it's an environment/versioning issue: the environment is created from scratch on GitHub Actions for each test run, so you don't see the error locally because you don't have the latest versions of all packages.

> mypy throws an error I don't understand in the SonarCloud analysis:

Probably also due to a version change. You can add the types-requests package to the setup.cfg file below types-PyYAML. That should fix it.

> I'm not sure how to make the tests avoid downloading anything.

You would have to mock/patch some of the code. For an example, see mock_bmi_client_apptainer() in ewatercycle/tests/src/test_container.py (line 177 in b4ea453).

You'd mock this line in your code, ds = xr.open_dataset(f"{OPENDAP_URL}{dataset}.nc"), with mock.patch("xarray.open_dataset") as ....
You can then set the return value to a dataset you load from a small test file, and assert that the mocked function was called with the correct arguments.
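A minimal sketch of that approach with unittest.mock, assuming xarray is installed. The `OPENDAP_URL` value and `open_caravan_dataset` are hypothetical stand-ins for the corresponding code in caravan.py; the fake dataset is built in memory here, where a real test would load a small file from the test directory:

```python
from unittest import mock

import xarray as xr

# Hypothetical base URL; the real constant lives in caravan.py.
OPENDAP_URL = "https://example.org/thredds/dodsC/caravan/"


def open_caravan_dataset(dataset: str) -> xr.Dataset:
    """Stand-in for the line in caravan.py that opens a remote dataset."""
    return xr.open_dataset(f"{OPENDAP_URL}{dataset}.nc")


def test_open_caravan_dataset_does_not_download():
    # Small in-memory dataset standing in for a local test file.
    fake_ds = xr.Dataset({"streamflow": ("time", [1.0, 2.0, 3.0])})

    # xr.open_dataset is looked up on the xarray module at call time,
    # so patching "xarray.open_dataset" intercepts it: no network access.
    with mock.patch("xarray.open_dataset", return_value=fake_ds) as mocked_open:
        ds = open_caravan_dataset("camels")

    # The code under test received the fake dataset...
    assert ds is fake_ds
    # ...and requested the expected URL exactly once.
    mocked_open.assert_called_once_with(f"{OPENDAP_URL}camels.nc")
```

Patching xarray.Dataset instead would not intercept the call, because open_dataset is a separate module-level function.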

BSchilperoort · 2024-05-01T13:11:46Z

Hi David, we found the test issue and fixed it in #410. Once that's merged into main, it should resolve the problem you encountered here.

On second thought, I do think it would be better to use the generate method to generate the forcing. You can add extra kwargs (e.g. basin_id) and just leave the shape argument unused.
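A minimal sketch of that pattern, assuming a generate() signature with dataset, start_time, end_time, shape and directory parameters. `CaravanForcingSketch` is a hypothetical stand-in, not the class added in this PR, and the real ewatercycle base class may differ in details. Unused parameters are kept so the call looks the same as for other forcings, while source-specific options travel in `**kwargs`:

```python
from __future__ import annotations

from pathlib import Path
from typing import Any


class CaravanForcingSketch:
    """Hypothetical stand-in for LumpedCaravanForcing (not the ewatercycle class)."""

    def __init__(self, directory: Path, start_time: str, end_time: str, basin_id: str):
        self.directory = directory
        self.start_time = start_time
        self.end_time = end_time
        self.basin_id = basin_id

    @classmethod
    def generate(
        cls,
        dataset: Any,
        start_time: str,
        end_time: str,
        shape: Any = None,
        directory: str | Path = ".",
        **kwargs: Any,
    ) -> CaravanForcingSketch:
        # dataset and shape are accepted to match the generic generate() call
        # but ignored: the basin (and its shape) is selected via basin_id.
        # Source-specific options such as basin_id arrive through **kwargs.
        basin_id = kwargs["basin_id"]
        return cls(Path(directory), start_time, end_time, basin_id)


forcing = CaravanForcingSketch.generate(
    dataset=None,
    start_time="1997-08-01T00:00:00Z",
    end_time="2005-09-01T00:00:00Z",
    directory="Forcing/Camels",
    basin_id="camels_01022500",
)
print(forcing.basin_id)  # camels_01022500
```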

From the coverage output in the CI I see this:

```text
Name                                    Stmts   Miss Branch BrPart  Cover
src/ewatercycle/_forcings/caravan.py       85     28     32      3    60%
```

Generally we want 80% code coverage (SonarCloud analysis). I think if you add tests for the extract_basin_shapefile function, you might already be there.
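For context, and assuming coverage.py's documented definition (which pytest-cov reports), the Cover percentage with branch coverage enabled counts branch exits alongside statements, roughly:

$$
\text{Cover} = \frac{(\text{Stmts} - \text{Miss}) + (\text{Branch} - \text{missed branch exits})}{\text{Stmts} + \text{Branch}}
$$

The number of missed branch exits is not printed (BrPart counts partially covered branch lines instead), so the percentage cannot be recomputed exactly from the columns shown. Tests that exercise a previously untested function, such as extract_basin_shapefile, raise both terms of the numerator at once.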

After these things I think this PR is ready to merge! 👍

Daafip · 2024-05-02T10:22:34Z

> I think if you add tests for the extract_basin_shapefile

Done! That bumps it up to 83%:

```text
src/ewatercycle/_forcings/caravan.py       84     11     32      3    83%
```

> On second thought, I do think it would be better to use the generate method to generate the forcing. You can add extra kwargs (e.g. basin_id) and just leave the shape argument unused.

Refactored this now, so it's all ready!

Pytest fails on the other test and SonarCloud is stuck on the token, but both should be fine.

Thanks for all the input & feedback @BSchilperoort @sverhoeven!

Daafip · 2024-05-02T10:37:03Z

> Apart from the technical comments, a more hydrological point: we do have to make sure in the documentation that people are aware that the Caravan dataset is not "just" the CAMELS datasets grouped together, but rather new (ERA5) data derived for the shapefiles in the CAMELS datasets. See https://egusphere.copernicus.org/preprints/2024/egusphere-2024-864/ for an ongoing discussion in the hydrological community about this.

To get back to you on this, @RolfHut: the history attribute of the dataset does mention the data source, and it is now correctly copied over to the new dataset. If users look at the (meta)data, they should see that it's ERA5.
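A minimal sketch of copying that provenance explicitly, assuming xarray. A Dataset assembled from individual variables starts with empty global attributes, so history has to be set by hand; the variable name and history text below are illustrative, not taken from the Caravan files:

```python
import xarray as xr

# Illustrative source dataset; the real Caravan history text differs.
source = xr.Dataset(
    {"evspsblpot": ("time", [0.1, 0.2])},
    attrs={"history": "derived from ERA5-Land (illustrative text)"},
)

# A new Dataset built from the data variables has no global attrs...
forcing_ds = xr.Dataset({"evspsblpot": source["evspsblpot"]})

# ...so the provenance is copied over explicitly, when present.
if "history" in source.attrs:
    forcing_ds.attrs["history"] = source.attrs["history"]

print(forcing_ds.attrs["history"])
```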

src/ewatercycle/_forcings/caravan.py

CHANGELOG.md

Co-authored-by: Bart Schilperoort <b.schilperoort@gmail.com>

src/ewatercycle/_forcings/caravan.py

BSchilperoort · 2024-05-02T11:59:09Z

src/ewatercycle/_forcings/caravan.py

```diff
+            dataset: Unused
+
+            **kwargs:
+                basin_id: str containing the wanted basin_id.
```


Can you add where the user can look up the basin ID?

See, this is where geopandas would be nice :p, but I will add a function that lets the user explore the dataset (IDs).

I was hoping more for something like a webpage listing the basin_ids together with river/basin names, country, etc. I guess that doesn't exist?

Nope, but what I do now is list the datasets in a separate function. That way the user can get the dataset as netCDF and a list of basin IDs.

For advanced users this will do. More novice users choosing a single catchment will likely have to download the combined shapefile, load it into a GIS and pick one.
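A minimal sketch of exploring such a list of basin IDs, assuming Caravan IDs follow the "<dataset>_<gauge id>" pattern seen in basin_id=f"camels_0{HRU_id}" earlier in this thread. `group_basin_ids` is a hypothetical helper, and the IDs below are illustrative rather than read from the data:

```python
from collections import defaultdict
from collections.abc import Iterable


def group_basin_ids(basin_ids: Iterable[str]) -> dict[str, list[str]]:
    """Group basin IDs of the form '<dataset>_<gauge id>' by dataset prefix."""
    groups: dict[str, list[str]] = defaultdict(list)
    for basin_id in basin_ids:
        # Everything before the first underscore is the source dataset.
        dataset, _, _ = basin_id.partition("_")
        groups[dataset].append(basin_id)
    return dict(groups)


# Illustrative IDs only; in practice they would come from the downloaded files.
ids = ["camels_01022500", "camels_01031500", "camelsgb_10002"]
print(group_basin_ids(ids))
# {'camels': ['camels_01022500', 'camels_01031500'], 'camelsgb': ['camelsgb_10002']}
```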

> I was hoping more for something like a webpage listing the basin_ids together with river/basin names, country, etc. I guess that doesn't exist?

It would actually be pretty easy to make an interactive folium/Leaflet map out of this! Similar to this

Let's continue this discussion in issue #398.

Co-authored-by: Bart Schilperoort <b.schilperoort@gmail.com>

BSchilperoort

Nice work! Sorry that it was blocked a bit by the failing test and SonarCloud problems, but now it's ready to merge (after that one last comment above, and pre-commit, haha) 😄

Edit: InputError isn't a built-in exception, so you can pick a different, appropriate one. This is why mypy fails.
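The follow-up in #412 swaps InputError for KeyError. A minimal sketch of raising a built-in exception when basin_id is missing from the extra kwargs; `get_basin_id` is a hypothetical helper name and the message text is an assumption:

```python
from typing import Any


def get_basin_id(**kwargs: Any) -> str:
    """Hypothetical helper: pull basin_id out of generate()'s extra kwargs."""
    if "basin_id" not in kwargs:
        # KeyError is built in. An undefined name such as InputError would
        # itself fail with NameError here, and mypy reports it as undefined.
        msg = "You have to specify a basin_id, e.g. basin_id='camels_01022500'."
        raise KeyError(msg)
    return str(kwargs["basin_id"])


print(get_basin_id(basin_id="camels_01022500"))  # camels_01022500
```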

Daafip added 14 commits April 9, 2024 14:22

adding caravan forcing class.

549879d

forgot to pass basin_id correctly

a9edeea

wrong path

3c9639d

specifiy output path for extraction

27ed1d0

wrong var_name path

290460b

replacing geopandas and wget

ad13709

adding lat/lon to dimensions based on shape

2f1e225

issue with to_array(), should be to_numpy()

e339a81

wrong dataset

a0fd925

wrong dataset ref

693d2c6

move basin properties to coordinates

add464c

Visual tweaks based on sonarcloud feedback

5903b6c

change intersection to union

765bd94

back to intersection, change order

5abb673

Daafip added 3 commits April 10, 2024 14:15

add rename era5 dict for now

711c5a6

also rename variables tuple

7087f09

add example

1968ff4

Daafip marked this pull request as ready for review April 10, 2024 13:58

BSchilperoort requested changes Apr 11, 2024

running pre-commit & refactoring accoring to suggestions

6eade11

refactor name, calling generate throws exception

e889298

Daafip added 3 commits April 12, 2024 12:20

Fix CI problems

982dd2c

Debug failing test

9bad2e0

Couldn't get temp file to work ('only' 30mb)

c7bc51d

Daafip and others added 2 commits April 29, 2024 15:36

adjust & run pre-commit

db8ff02

fixing indent in .pre-commit

fa8cfd5

Daafip and others added 8 commits May 2, 2024 11:04

test new generate object

7c8d7d3

change path in test

e794b74

add link to example notebook

21dd522

Merge pull request #1 from Daafip/dev

db621ed

fix tests

fix failing pre-commit tests

f598631

Merge branch 'dev'

515e50c

issues with tests/setting variables correctly aftere refactor

84a5058

re-add failing test

b9914e3

ensure history is also copied

771d360

BSchilperoort reviewed May 2, 2024

src/ewatercycle/_forcings/caravan.py

Daafip and others added 2 commits May 2, 2024 13:49

final tweaks & changelog

bd77de2

Merge branch 'eWaterCycle:main' into main

d4d4eec

BSchilperoort reviewed May 2, 2024

CHANGELOG.md (outdated)

Update CHANGELOG.md

36d1036

Co-authored-by: Bart Schilperoort <b.schilperoort@gmail.com>

BSchilperoort reviewed May 2, 2024

src/ewatercycle/_forcings/caravan.py

BSchilperoort reviewed May 2, 2024

catch missing basin_id correctly

3889784

Co-authored-by: Bart Schilperoort <b.schilperoort@gmail.com>

BSchilperoort self-requested a review May 2, 2024 12:00

BSchilperoort approved these changes May 2, 2024

Daafip added 2 commits May 2, 2024 15:36

make basin_ids more findable

922d856

Merge branch 'main' of github.com:Daafip/ewatercycle

bffaa63

Daafip merged commit 12d3672 into eWaterCycle:main May 2, 2024
2 of 3 checks passed

Daafip mentioned this pull request May 2, 2024

change InputError to KeyError #412

Merged

Daafip mentioned this pull request Jun 6, 2024

Add CAMELS-USA #426

Open

6 tasks

sverhoeven mentioned this pull request Jun 17, 2024

Add Caravan Forcing to user documentation #431

Open
