feat(core): new dataset provenance #2181
Force-pushed from 4fc27cc to b210037.
renku/core/management/datasets.py (outdated)
def update_datasets_provenance(self, dataset, remove=False):
@inject.autoparams()
def update_datasets_provenance(
Would it be cleaner to have two methods, add_or_update_dataset_provenance and remove_dataset_provenance?

It would also be cleaner to call the methods on DatasetProvenance directly and inject it where needed, since this method in LocalClient doesn't contain any additional logic (especially since commit_database isn't set to True anywhere). In the future I'd like to rely on LocalClient less and have the dedicated classes do more of the heavy lifting.
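A minimal sketch of the two-method split, assuming a simplified in-memory DatasetProvenance and a stripped-down Dataset record (both hypothetical stand-ins, not renku's actual classes). It only shows how call sites read once the boolean remove flag is replaced by two explicit methods.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Dataset:
    # Hypothetical minimal record; the real renku Dataset has many more fields.
    name: str
    id: str


class DatasetProvenance:
    """Hypothetical stand-in for renku's DatasetProvenance, for illustration only."""

    def __init__(self) -> None:
        # Maps dataset name -> latest known version.
        self._by_name: Dict[str, Dataset] = {}

    def add_or_update_dataset_provenance(self, dataset: Dataset) -> None:
        # Explicit entry point replacing update_datasets_provenance(dataset, remove=False).
        self._by_name[dataset.name] = dataset

    def remove_dataset_provenance(self, dataset: Dataset) -> None:
        # Explicit entry point replacing update_datasets_provenance(dataset, remove=True).
        self._by_name.pop(dataset.name, None)

    def get(self, name: str) -> Optional[Dataset]:
        return self._by_name.get(name)


# Call sites now say what they mean instead of passing a boolean flag.
provenance = DatasetProvenance()
provenance.add_or_update_dataset_provenance(Dataset(name="my-data", id="ds-1"))
provenance.add_or_update_dataset_provenance(Dataset(name="my-data", id="ds-2"))
latest = provenance.get("my-data")
provenance.remove_dataset_provenance(Dataset(name="my-data", id="ds-2"))
```

With an injected DatasetProvenance, code that currently goes through LocalClient would call these methods directly, which is the direction the comment suggests.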
If we inject DatasetProvenance using the binder.bind_to_constructor(Class, lambda: Class(*args)) style of injection, DatasetProvenance is only initialised when code that uses it (more precisely, code that requests it as an injection) is called, and from then on it is a singleton. So we could inject it, ProvenanceGraph and DependencyGraph this way in the with_database CommandBuilder without any performance overhead. It would even save time, since we wouldn't have to call from_database more than once.
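The argument relies on the behaviour described above: a bound constructor runs lazily on the first request and its result is reused afterwards. To keep the sketch dependency-free, the snippet below re-implements that behaviour with a hypothetical Binder class; it is not python-inject's code. The loader counter and the "project-db" argument to from_database are placeholders that make the "load only once" claim observable.

```python
from typing import Any, Callable, Dict


class Binder:
    """Hypothetical, stdlib-only mimic of lazy singleton binding; not python-inject itself."""

    def __init__(self) -> None:
        self._constructors: Dict[type, Callable[[], Any]] = {}
        self._instances: Dict[type, Any] = {}

    def bind_to_constructor(self, cls: type, constructor: Callable[[], Any]) -> None:
        # Only the factory is stored; nothing is constructed at bind time.
        self._constructors[cls] = constructor

    def instance(self, cls: type) -> Any:
        # Construct on the first request, then hand back the same object every time.
        if cls not in self._instances:
            self._instances[cls] = self._constructors[cls]()
        return self._instances[cls]


class DatasetProvenance:
    loads = 0  # counts calls to the (assumed expensive) loader below

    @classmethod
    def from_database(cls, database: str) -> "DatasetProvenance":
        # Placeholder for reading the provenance index from the database.
        cls.loads += 1
        return cls()


binder = Binder()
binder.bind_to_constructor(DatasetProvenance, lambda: DatasetProvenance.from_database("project-db"))
loads_after_bind = DatasetProvenance.loads  # still 0: binding is lazy

first = binder.instance(DatasetProvenance)
second = binder.instance(DatasetProvenance)
```

Binding everything up front is therefore cheap: commands that never ask for DatasetProvenance never pay for loading it, and those that do load it once.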
renku/core/management/datasets.py (outdated)
@@ -1488,7 +1462,7 @@ def _check_url(url):
if not is_git:
    # NOTE: Check if the url is a redirect.
    url = requests.head(url, allow_redirects=True).url
    u = parse.urlparse(url)
    _ = parse.urlparse(url)
I know I wrote this, but now I'm confused: does this line do anything other than raise an uncaught error if the redirect URL is not a valid URL? I think we could remove it. It looks like a copy-and-paste error on my part.
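The question can be checked directly with the standard library. urllib.parse.urlparse splits a string into components and, for the most part, does not validate it; it raises ValueError only in narrow cases such as an unbalanced IPv6 bracket. The helper name parses_without_error below is illustrative only.

```python
from urllib import parse


def parses_without_error(url: str) -> bool:
    """Return True if urlparse accepts the string, False if it raises ValueError."""
    try:
        parse.urlparse(url)
    except ValueError:
        return False
    return True


# Plain garbage is accepted: urlparse splits strings, it does not validate them.
garbage = parse.urlparse("not a url")
accepted_garbage = parses_without_error("not a url")

# One of the few inputs that does raise: an IPv6 host missing its closing bracket.
rejected_ipv6 = parses_without_error("http://[::1")
```

So a discarded `_ = parse.urlparse(url)` is at best a very weak validity check, which supports removing it.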
Force-pushed from b210037 to 246ffa4.
Force-pushed from 1d2f216 to eb75698.
Force-pushed from 4162ff6 to c3053ac.
Force-pushed from db58b29 to 54e4313.
Force-pushed from c3053ac to ec62fa7.
Description

Adds two indexes to the new persistent layer: datasets and datasets-provenance. The first index maps dataset names to dataset objects for all current datasets in a project. The second index maps dataset ids to dataset objects; it contains the current datasets and all previous versions of all datasets.

Fixes #2119