Catalogs refactor #849
…s/eilan_dk/repos/hydromt/data/catalogs/update_versions.py
TODO:
Decided to postpone to v1
…catalogs_refactor
@DirkEilander in case you are still working on this: I do not really understand what I need to mock with the entry_points in _get_catalog_eps. The entry_points are empty in the tests, and where is the hydromt.catalogs entrypoint defined?
@Tjalling-dejong The idea is to test predefined catalogs that are provided by entrypoints. Users can set these in their pyproject.toml as a "hydromt.catalogs" entrypoint, and they are discovered in predefined_catalogs.py. For the core these are bypassed as LOCAL_EPS. I'm however also fine with skipping this test for now, as @savente93 has done a really nice job in v1 to generalize all entrypoints in plugins.py. It would be nice to use that after merging this work in v1 too, in which case the tests will probably have to be rewritten. Compared to the other entrypoints this refers to a
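One way to test entry-point discovery without installing a plugin is to inject the discovery function, so a test can pass a fake instead of patching `importlib.metadata`. This is a minimal sketch, not the actual `_get_catalog_eps` from hydromt: the helper `get_catalog_eps`, the `fake_discover` function, and the `my_plugin.catalogs:my_catalog` target are hypothetical. Only the group name "hydromt.catalogs" comes from the discussion above. The default discovery call assumes Python 3.10 or newer, where `entry_points` accepts a `group` keyword.

```python
from importlib.metadata import EntryPoint, entry_points

# Group name taken from the discussion; a plugin would declare it in
# pyproject.toml under [project.entry-points."hydromt.catalogs"].
GROUP = "hydromt.catalogs"


def get_catalog_eps(discover=entry_points, group=GROUP):
    """Return a {name: EntryPoint} mapping for the given group.

    `discover` is injectable so tests can supply fake entry points
    (hypothetical helper, not hydromt's implementation).
    """
    return {ep.name: ep for ep in discover(group=group)}


def fake_discover(group):
    """Stand-in for importlib.metadata.entry_points in tests."""
    # The target "my_plugin.catalogs:my_catalog" is made up for illustration.
    return [
        EntryPoint(
            name="my_catalog",
            value="my_plugin.catalogs:my_catalog",
            group=group,
        )
    ]


# In a test, pass the fake instead of relying on installed packages.
eps = get_catalog_eps(discover=fake_discover)
```

Injecting the callable avoids having to know the exact import path to patch, which tends to be the fragile part of mocking `entry_points`.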
@Tjalling-dejong if you want me to go over how we did
@Tjalling-dejong can you please update the title, description and checklist?
I'll have to finish the full review later, but here are some preliminary comments.
data/catalogs/update_versions.py
Outdated
Can you add an explanation of how to call this script? E.g. does it take arguments? If so, which?
hydromt/predefined_catalogs.py
Outdated
def _get_file_hash(file_path: Path) -> str:
    """Get the md5 hash of a file."""
    hash_func = hashlib.md5()
Could you use SHA-1? That's what git uses, which I think is more consistent.
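A minimal sketch of what a SHA-1 variant of such a helper could look like. The name `get_file_hash`, the `algorithm` and `chunk_size` parameters, and the temporary file contents are assumptions for illustration, not the code that ended up in the PR.

```python
import hashlib
import tempfile
from pathlib import Path


def get_file_hash(
    file_path: Path, algorithm: str = "sha1", chunk_size: int = 65536
) -> str:
    """Return the hex digest of a file, read in chunks to bound memory use."""
    hash_func = hashlib.new(algorithm)
    with open(file_path, "rb") as f:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hash_func.update(chunk)
    return hash_func.hexdigest()


# Usage: hash a small temporary file standing in for a data catalog.
with tempfile.TemporaryDirectory() as tmp:
    catalog = Path(tmp) / "data_catalog.yml"
    catalog.write_bytes(b"meta:\n  version: v0.0.0\n")
    digest = get_file_hash(catalog)
```

Note that git computes its object IDs over a `blob <size>\0` header followed by the file content, so a plain SHA-1 of the file will not match the output of `git hash-object`.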
@savente93 Can you review this PR?
@Tjalling-dejong I'll have a more in-depth look in a bit. The failing docs are okay, but there are still a few issues flagged by sonar cube and the PR description is still empty. Could you fix those in the meantime, please?
So, perhaps a stupid question, but: is this change backwards compatible? I.e. will previous versions of hydromt break if we merge this? (I'll do a review regardless, but I'd like some confirmation that this won't break things again before we merge.)
I don't think it will break previous versions of hydromt. It works even if you do not specify a data catalog version.
Nice work Tjalling! I like the descriptions and testing. Just a couple small comments left and then it should be good to go :)
Co-authored-by: Sam Vente <savente93@gmail.com>
LGTM, but Jaap said he wanted to review as well, so I'll leave the final approval to him.
This reverts commit 07537c7.
Issue addressed
Fixes #844
Explanation
This PR updates the predefined catalogs of HydroMT. The catalogs are now stored in their own folder with each version having its own data catalog. The versions are tracked in a registry.txt. This file contains the relative path and the hash of the data catalog.
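To illustrate the idea of a registry of relative paths and hashes, here is a hedged sketch. The on-disk format (one `<relative path> <hash>` pair per line), the use of SHA-1, and the helper names `build_registry`, `write_registry`, `read_registry` and `list_versions` are all assumptions; the actual registry.txt layout and the update script in this PR may differ.

```python
import hashlib
import tempfile
from pathlib import Path, PurePosixPath


def build_registry(root: Path, filename: str = "data_catalog.yml") -> dict:
    """Map each catalog file's path (relative to root) to its SHA-1 digest."""
    registry = {}
    for path in sorted(root.rglob(filename)):
        rel = path.relative_to(root).as_posix()
        registry[rel] = hashlib.sha1(path.read_bytes()).hexdigest()
    return registry


def write_registry(registry: dict, registry_path: Path) -> None:
    """Write one '<relative path> <hash>' pair per line (assumed format)."""
    lines = [f"{rel} {digest}" for rel, digest in registry.items()]
    registry_path.write_text("\n".join(lines) + "\n")


def read_registry(registry_path: Path) -> dict:
    """Parse the registry file back into a {relative path: hash} mapping."""
    registry = {}
    for line in registry_path.read_text().splitlines():
        if line.strip():
            rel, digest = line.rsplit(maxsplit=1)
            registry[rel] = digest
    return registry


def list_versions(registry: dict) -> list:
    """Versions are the first path component, e.g. 'v0.1.0/data_catalog.yml'."""
    return sorted(PurePosixPath(rel).parts[0] for rel in registry)


# Usage: two version folders, each holding its own data catalog file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for version in ("v0.1.0", "v0.2.0"):
        (root / version).mkdir()
        (root / version / "data_catalog.yml").write_text(f"version: {version}\n")
    registry = build_registry(root)
    write_registry(registry, root / "registry.txt")
    read_back = read_registry(root / "registry.txt")
    versions = list_versions(read_back)
```

Storing the hash next to the path lets a client check that a downloaded catalog file matches the registered version before using it.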
Previously, the different versions of data catalogs were retrieved using the git hashes stored in predefined_catalogs.yml. This file is now deprecated. Instead, a PredefinedCatalog class has been created to retrieve specific versions of predefined data catalogs.
Example of retrieving different versions of data catalogs:
If you want to create your own version of deltares_data, for example, make a new folder in data/catalogs/deltares_data/ with the version as its name, put the data catalog file there, and name it data_catalog.yml. Then run the update_versions.py script to update the registry file.
Checklist
main