Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding caravan forcing #407

Merged
merged 55 commits into from
May 2, 2024
Merged

Adding caravan forcing #407

merged 55 commits into from
May 2, 2024

Conversation

Daafip
Copy link
Collaborator

@Daafip Daafip commented Apr 9, 2024

Hi All,

As this is my first contribution would like some feedback if possible.

Was unsure on the naming convention, generate seems to be more linked to the esmvaltool, thus went for retrieve.

Runs for me when I run:

from ewatercycle.forcing import sources
path = Path.cwd()
forcing_path = path / "Forcing"
experiment_start_date = "1997-08-01T00:00:00Z"
experiment_end_date = "2005-09-01T00:00:00Z"
HRU_id = 1022500

camels_forcing = sources['LumpedCaravanForcing'].retrieve(start_time = experiment_start_date,
                                                          end_time = experiment_end_date,
                                                          directory = forcing_path / "Camels",
                                                          basin_id = f"camels_0{HRU_id}",
                                                          variables = ('streamflow','potential_evaporation_sum'),
                                                          )

Discussions on the topic: #398

Still todo:

  • remove geopandas
  • remove wget
  • add documentation
  • add meaningfull example
  • Rename variables to correct ESMValTool
  • Add test (using mocking)
  • fix failing test: test_util.test_merge_esmvaltool_datasets

@Daafip
Copy link
Collaborator Author

Daafip commented Apr 10, 2024

Not too sure how to complete renaming from ERA-5 to ESMValTool compatible variable names.
Do you have any insight on how to do that @BSchilperoort?

Edit: for now just supplied a rename dictionary as in discussion

@Daafip Daafip marked this pull request as ready for review April 10, 2024 13:58
Copy link
Contributor

@BSchilperoort BSchilperoort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi David, nice work! I left some comments for possible improvements to the code. Feel free to ask a review again when it's ready!

src/ewatercycle/_forcings/caravan.py Outdated Show resolved Hide resolved
src/ewatercycle/_forcings/caravan.py Outdated Show resolved Hide resolved
src/ewatercycle/_forcings/caravan.py Outdated Show resolved Hide resolved
src/ewatercycle/_forcings/caravan.py Outdated Show resolved Hide resolved
src/ewatercycle/_forcings/caravan.py Outdated Show resolved Hide resolved
src/ewatercycle/_forcings/caravan.py Outdated Show resolved Hide resolved
src/ewatercycle/_forcings/caravan.py Outdated Show resolved Hide resolved
src/ewatercycle/_forcings/caravan.py Outdated Show resolved Hide resolved
src/ewatercycle/_forcings/caravan.py Outdated Show resolved Hide resolved
src/ewatercycle/_forcings/caravan.py Outdated Show resolved Hide resolved
@BSchilperoort
Copy link
Contributor

Also, it would be nice if you could add tests for this forcing. (without downloading anything). If you're unsure on how to do that we can discuss it at some point

@RolfHut
Copy link
Contributor

RolfHut commented Apr 11, 2024

regarding conventions: from a didactic point of view in tutorial notebooks I suggest to always use:

import ewatercycle.forcing

camels_forcing = ewatercycle.forcing.sources['LumpedCaravanForcing'].retrieve(start_time = experiment_start_date,
                                                          end_time = experiment_end_date,
                                                          directory = forcing_path / "Camels",
                                                          basin_id = f"camels_0{HRU_id}",
                                                          variables = ('streamflow','potential_evaporation_sum'),
                                                          )

Novice users quickly loose sight of which function belongs to which library when using from ewatercycle.forcing import sources. This might load a bit more than you would ideally want, but communicates more clearly. In code 'under the hood' I'm more than fine with the shorter version.

@Daafip
Copy link
Collaborator Author

Daafip commented Apr 12, 2024

Made the suggested changes Rolf, Example Notebook can be found here:
https://gist.github.com/Daafip/ac1b030eb5563a76f4d02175f2716fd7

@Daafip
Copy link
Collaborator Author

Daafip commented Apr 16, 2024

Still facing three issues, maybe you could help with that Bart?:

  • test_utipl.test_merge_esmvaltool_datasets fails: . If i run the test locally in my IDE I don't have this problem but on the git tests I Do...?
FAILED tests/src/test_util.py::test_merge_esmvaltool_datasets - AssertionError: assert 'height' in {'cell_methods': 'day_of_year: year: mean', 'long_name': 'Near-Surface Air Temperature', 'standard_name': 'air_temperature', 'units': 'K'}
  • MyPy throws an error I dont understand in the SonarCloud analysis:
mypy.....................................................................Failed
-hook id: mypy
-exit code: 1

src/ewatercycle/_forcings/caravan.py:10: error: Library stubs not installed for "requests"  [import]
src/ewatercycle/_forcings/caravan.py:10: note: Hint: "python3 -m pip install types-requests"
src/ewatercycle/_forcings/caravan.py:10: note: (or run "mypy --install-types" to install all missing stub packages)
src/ewatercycle/_forcings/caravan.py:10: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
  • Not sure how to make tests not download

@BSchilperoort
Copy link
Contributor

test_utipl.test_merge_esmvaltool_datasets fails: . If i run the test locally in my IDE I don't have this problem but on the git tests I Do...?

That's odd. The "height" attribute should be in there. Perhaps it's an environment/versioning issue, and the environment is newly created on Github Actions for the test. So you don't see the error as you don't have all the latest versions of packages.

MyPy throws an error I dont understand in the SonarCloud analysis:

Probably also due to a version change. You can add the types-request package to the setup.cfg file below types-PyYAML. That should fix it.

Not sure how to make tests not download

You would have to mock/patch some of the code. For examples see

def mock_bmi_client_apptainer():

You'd mock this line in your code ds = xr.open_dataset(f"{OPENDAP_URL}{dataset}.nc"), with mock.patch("xarray.Dataset') as ....
You can then set the return value to return a dataset you load from a small testing file. And assert that the mocked function was called with the correct arguments.

@BSchilperoort
Copy link
Contributor

Hi David, we found the test issue and fixed it in #410 . Once that's merged to main that should resolve the problem you encounter here.

On second thought I do think that it would be better to use the generate method to generate the forcing. You can add extra kwargs (i.e. basin_id, and just leave the shape one unused).

From the converage output in the CI I see this;

src/ewatercycle/_forcings/caravan.py                   85     28     32      3    60%

Generally we want 80% coverage on the code (sonarcloud analysis). I think if you add tests for the extract_basin_shapefile function that you might be there already.

After these things I think this PR is ready to merge! 👍

@Daafip
Copy link
Collaborator Author

Daafip commented May 2, 2024

I think if you add tests for the extract_basin_shapefile

Done! That Bumps it up to 83%
src/ewatercycle/_forcings/caravan.py 84 11 32 3 83%

On second thought I do think that it would be better to use the generate method to generate the forcing. You can add extra kwargs (i.e. basin_id, and just leave the shape one unused).

Refctored this now, so all ready!

Pytest fails on the other test, sonar cloud is stuck on the token but should both be fine.

Thanks for all the input & feedback @BSchilperoort @sverhoeven!

@Daafip
Copy link
Collaborator Author

Daafip commented May 2, 2024

apart from the technical comments, a more hydrological point: we do have to make sure in the documentation that people are aware that the Caravan dataset is not "just" the Camels grouped together, but rather it is new (ERA) data derived for the shape files in the Camels datasets. See https://egusphere.copernicus.org/preprints/2024/egusphere-2024-864/ for a ongoing discussion in the hydrological community about this.

To get back to you on this @RolfHut, In the data set attributes under history it does mention the data source. This is now correctly copied over to the new datas set. If the user looks at the (meta)data, they should see its ERA5

CHANGELOG.md Outdated Show resolved Hide resolved
Co-authored-by: Bart Schilperoort <b.schilperoort@gmail.com>
dataset: Unused

**kwargs:
basin_id: str containing the wanted basin_id.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add where the user can lookup the basin ID?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See this is where geopandas would be nice :p, but will add a function which lets the user explore the dataset(ids).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping more for something like a webpage listing the basin_id's together with river/basin names, country, etc. I guess that doesn't exist?

Copy link
Collaborator Author

@Daafip Daafip May 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, but what i do now is list the datasets in a seperate function. That way the user can get the dataset as netcdf and a list of basin ID's

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For advanced users this will do, for more novice users choosing a single catchment: likely they will have to download the combinedshapefile, load this into GIS and pick one

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping more for something like a webpage listing the basin_id's together with river/basin names, country, etc. I guess that doesn't exist?

would be pretty easy to make an interactive folium/leaflet map out of this actually! Similar to this

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets continue this discussion in issues: #398

Co-authored-by: Bart Schilperoort <b.schilperoort@gmail.com>
@BSchilperoort BSchilperoort self-requested a review May 2, 2024 12:00
Copy link
Contributor

@BSchilperoort BSchilperoort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! Sorry that it was blocked a bit by the failing test+sonarcloud problems, but now it's ready to merge (after that 1 last comment above, and pre-commit haha) 😄

Edit: InputError isn't a buildin exception. You can pick a different appropriate one. This is why mypy fails.

@Daafip Daafip merged commit 12d3672 into eWaterCycle:main May 2, 2024
2 of 3 checks passed
@Daafip Daafip mentioned this pull request Jun 6, 2024
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants