Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Catalogs refactor #849

Merged
merged 43 commits into from
Apr 25, 2024
Merged

Catalogs refactor #849

merged 43 commits into from
Apr 25, 2024

Conversation

DirkEilander
Copy link
Contributor

@DirkEilander DirkEilander commented Mar 22, 2024

Issue addressed

Fixes #844

Explanation

This PR updates the predefined catalogs of HydroMT. The catalogs are now stored in their own folder with each version having its own data catalog. The versions are tracked in a registry.txt. This file contains the relative path and the hash of the data catalog.
Before the different versions of datacatalogs were retrieved by using the git hashes stored in predefinend_catalogs.yml. This file has henceforth been deprecated. Instead a PredefinedCatalog class has been created to retrieve specific versions of predefined data catalogs.

Example of retrieving different versions of data catalogs:

from hydromt import DataCatalog

data_catalog = DataCatalog(data_libs="deltares_data=v0.5.0") # Retrieve deltares_data v0.5.0 data catalog
data_catalog_latest = DataCatalog(data_libs="deltares_data") # Will implicitly retrieve the latest version of deltares_data

If you want to create your own version of deltares_data for example, you should make a new folder in the data/catalogs/deltares_data/ with the version as its name and put the data catalog file there and call name it data_catalog.yml. If you then run the update_versions.py script to update the registry file.

Checklist

  • Updated tests or added new tests
  • Branch is up to date with main
  • Tests & pre-commit hooks pass
  • Updated documentation if needed
  • Updated changelog.rst if needed
  • For predefined catalogs: update the catalog version in the file itself, the references in data/predefined_catalogs.yml, and the changelog in data/changelog.rst

Additional Notes (optional)

Add any additional notes or information that may be helpful.

@DirkEilander
Copy link
Contributor Author

DirkEilander commented Mar 28, 2024

TODO:

  • try to fix linux vs windows os issue with hash; if not easily fixed remove it
  • add older version to artifact_data and deltares_data (check if version numbering makes sense)
  • docs and changelogs
  • cache data_catalog.yml files in DataCatalog._cache_dir/<catalog_name>//data_catalog.yml
  • if versions.yml not accessible use predefined_catalogs._get_catalog_versions to browse local versions

Decided to postpone to v1

  • create entrypoint for data catalogs plugins

@Tjalling-dejong
Copy link
Contributor

@DirkEilander in the case you are still working, I do not really understand what I need to mock with the entry_points in _get_catalog_eps. The entry_points are empty in the tests and where is hydromt.catalogs entrypoint defined?

@DirkEilander
Copy link
Contributor Author

DirkEilander commented Apr 18, 2024

@DirkEilander in the case you are still working, I do not really understand what I need to mock with the entry_points in _get_catalog_eps. The entry_points are empty in the tests and where is hydromt.catalogs entrypoint defined?

@Tjalling-dejong The idea is to test predefined catalogs which are provided by entrypoints. These can be set by users in their pyproject.toml as a "hydromt.catalogs" entrypoint and are discovered in predefined_catalogs.py. For the core these are bypassed as LOCAL_EPS.

I'm however also fine with skipping this test for now as @savente93 has done a really nice job in v1 to generalize all entrypoints in plugins.py. It would be nice to use that after merging this work in v1 too in which case the tests will probably have te be rewritten. Compared to the other entrypoints this refers to a versions.yml file rather than a class though. But if needed we could refactor the predefined catalog versions into a simply (pydantic) class as well.

@savente93
Copy link
Contributor

@Tjalling-dejong if you want me to go over how we did plugins.py let me know

@Tjalling-dejong Tjalling-dejong marked this pull request as ready for review April 18, 2024 13:09
@savente93
Copy link
Contributor

@Tjalling-dejong can you please update the title, description and checklist?

Copy link
Contributor

@savente93 savente93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll have to finish the full review later, but here are some preliminary comments.

data/catalogs/artifact_data/v0.0.7/data_catalog.yml Outdated Show resolved Hide resolved
data/catalogs/changelog.rst Outdated Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add an explanation on how to call this script? e.g. does it take arguments? if so which?


def _get_file_hash(file_path: Path) -> str:
"""Get the md5 hash of a file."""
hash_func = hashlib.md5()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you use SHA-1? that's what git uses which I think is more consistent.

@DirkEilander DirkEilander changed the title [WIP] Catalogs refactor Catalogs refactor Apr 19, 2024
@Tjalling-dejong
Copy link
Contributor

@savente93 Can you review this PR?
Note that the Build Documentation fails because it tries to retrieve a data catalog versions.yaml from main that does not exist yet.

@savente93
Copy link
Contributor

@Tjalling-dejong I'll have a more indepth look in a bit, the failing docs are okay, but there are still a few issues flagged my sonar cube and the PR description is still empty. Could you fix those in the mean time please

@savente93
Copy link
Contributor

So, perhaps a stupid question but: is this change backwards compatible? i.e. will previous versions of hydromt break if we merge this? (I'll do a review regardless but I'd like some confirmation that this won't break things again before we merge)

@Tjalling-dejong
Copy link
Contributor

So, perhaps a stupid question but: is this change backwards compatible? i.e. will previous versions of hydromt break if we merge this? (I'll do a review regardless but I'd like some confirmation that this won't break things again before we merge)

I dont think it will break previous versions of hydromt. It works even if you do not specify a data catalog version.

Copy link
Contributor

@savente93 savente93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work Tjalling! I like the descriptions and testing. Just a couple small comments left and then it should be good to go :)

data/catalogs/README.rst Outdated Show resolved Hide resolved
data/catalogs/changelog.rst Outdated Show resolved Hide resolved
hydromt/data_catalog.py Show resolved Hide resolved
hydromt/data_catalog.py Show resolved Hide resolved
tests/conftest.py Outdated Show resolved Hide resolved
Tjalling-dejong and others added 4 commits April 25, 2024 15:20
Co-authored-by: Sam Vente <savente93@gmail.com>
Co-authored-by: Sam Vente <savente93@gmail.com>
Copy link
Contributor

@savente93 savente93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but Jaap said he wanted to review as well so I'll leave the final approve to him

@Tjalling-dejong Tjalling-dejong merged commit 07537c7 into main Apr 25, 2024
11 of 12 checks passed
@Tjalling-dejong Tjalling-dejong deleted the catalogs_refactor branch April 25, 2024 15:02
Tjalling-dejong added a commit that referenced this pull request Apr 25, 2024
@savente93 savente93 restored the catalogs_refactor branch May 6, 2024 13:26
@savente93 savente93 mentioned this pull request May 10, 2024
5 tasks
@savente93 savente93 deleted the catalogs_refactor branch May 14, 2024 11:44
@savente93 savente93 mentioned this pull request Jun 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

improve predefined catalog implementation
3 participants