As a Renku user, once I remove a dataset from a project, I don't want this dataset to be listed as one of that project's datasets anymore.
Acceptance criteria:
drive the change from acceptance-tests;
triples-generator to feed information about deleted files to the `renku log` command while generating triples;
`GET /knowledge-graph/datasets` NOT to return a removed dataset if it existed on a single project where it was removed, or if it existed on multiple forks of the project and was removed on all of them; if a removed dataset is not removed on some forks, it should still be searchable;
`GET /knowledge-graph/datasets/:id` to return `NOT FOUND (404)` when the requested `id` is the identifier of a deleted dataset, assuming there are no forks sharing the same dataset or the dataset is removed on all the projects; the resource should return the dataset details if the dataset is shared with at least one fork and is NOT removed on all the projects;
`GET /knowledge-graph/projects/:namespace/:name/datasets` NOT to return datasets deleted on the project with the given `namespace` and `name`, even if they are not removed on projects sharing exactly the same dataset (fork case);
think of a case when a parent dataset (in terms of dataset modification) was removed; probably the `wasDerivedFrom` triple should be removed from the direct child dataset.
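The endpoint rules above reduce to a per-project removal check. The following is a minimal Python sketch of that logic only, assuming a hypothetical `DatasetUsage` record listing the projects (paths) a dataset is part of and the subset where it was removed; in the KG these checks would live inside the dataset queries, and the names here are illustrative, not renku-graph's API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetUsage:
    """Hypothetical summary of one dataset shared by a project and its forks."""

    projects: frozenset[str]    # paths of all projects the dataset is part of
    removed_on: frozenset[str]  # subset of `projects` where the dataset was removed

    def active_projects(self) -> frozenset[str]:
        # Projects on which the dataset still exists.
        return self.projects - self.removed_on


def searchable(usage: DatasetUsage) -> bool:
    # GET /knowledge-graph/datasets: keep the dataset while any project still has it.
    return bool(usage.active_projects())


def details_status(usage: DatasetUsage) -> int:
    # GET /knowledge-graph/datasets/:id: 404 once the dataset is removed on every project.
    return 200 if usage.active_projects() else 404


def listed_for_project(usage: DatasetUsage, project: str) -> bool:
    # GET /knowledge-graph/projects/:namespace/:name/datasets: a project that removed
    # the dataset no longer lists it, even if its forks still do.
    return project in usage.active_projects()
```

For example, with `DatasetUsage(frozenset({"group/parent", "user/fork"}), frozenset({"group/parent"}))` the dataset stays searchable and its details resolve, but `group/parent` no longer lists it among its datasets.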
Original acceptance criteria:
Option 1:
the triples curation process to look for removed datasets' metadata files in the project's `.renku/datasets` folder;
"removed" here means deleted in a particular commit (`git diff --name-only --diff-filter=D HEAD~1..HEAD` could be an option here);
do nothing when no deleted metadata files are found;
if deleted metadata files are found, extract the dataset identifiers from the paths (e.g. if a deleted file is `.renku/datasets/c42f08db-27f4-44d0-9b55-6dfe6ca96ec9/metadata.yml`, the identifier is `c42f08db-27f4-44d0-9b55-6dfe6ca96ec9`);
generate a delete query removing the `schema:isPartOf` link between the dataset with the found identifier and the project the triples are generated for;
generate a delete query removing the whole dataset entity if:
it's not linked to any other project;
there are no descendant datasets;
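A minimal Python sketch of the Option 1 steps, assuming the repository is checked out locally and `git` is on the PATH. `deleted_files`, `removed_dataset_ids`, `unlink_query` and `can_remove_entity` are hypothetical helpers; the SPARQL shape (a `schema:identifier` literal on the dataset and the `http://schema.org/` prefix) is an assumption about the KG schema, not a confirmed query from the triples generator.

```python
from __future__ import annotations

import re
import subprocess

# Deleted dataset metadata paths, e.g. .renku/datasets/<identifier>/metadata.yml
METADATA_PATH = re.compile(r"^\.renku/datasets/([^/]+)/metadata\.yml$")


def deleted_files(repo_dir: str) -> list[str]:
    """Paths deleted in the latest commit (fails on a root commit, which has no HEAD~1)."""
    result = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=D", "HEAD~1..HEAD"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def removed_dataset_ids(paths: list[str]) -> list[str]:
    """Dataset identifiers taken from deleted metadata paths; an empty result means do nothing."""
    return [m.group(1) for p in paths if (m := METADATA_PATH.match(p))]


def unlink_query(dataset_id: str, project_uri: str) -> str:
    """Hypothetical SPARQL update dropping the schema:isPartOf link to one project.

    Assumes datasets carry their identifier as schema:identifier. No escaping is done,
    so inputs must already be trusted (e.g. identifiers and project URIs built by the service).
    """
    return (
        "PREFIX schema: <http://schema.org/>\n"
        f"DELETE {{ ?ds schema:isPartOf <{project_uri}> }}\n"
        "WHERE {\n"
        f'  ?ds schema:identifier "{dataset_id}" ;\n'
        f"      schema:isPartOf <{project_uri}> .\n"
        "}"
    )


def can_remove_entity(linked_to_other_projects: bool, has_descendants: bool) -> bool:
    """The whole dataset entity may go only if no other project links to it and nothing derives from it."""
    return not linked_to_other_projects and not has_descendants
```

Checking the two removal conditions would itself need queries against the KG (other `schema:isPartOf` links, datasets derived from this one); here their results are simply passed in as booleans.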
Option 2:
the triples curation process to look for removed datasets' metadata files in the project's `.renku/datasets` folder;
once a deleted dataset is identified (see Option 1 above), the triples curation process does not generate a dataset entity removal query but inserts an `invalidatedBy` triple pointing to the commit Activity where the dataset metadata was removed;
the only problem I can see here is that the dataset-finding queries would get more complicated (and they are already complicated enough): all the queries reaching a dataset entity would have to check whether there's an `invalidatedBy` link to a commit Activity of a certain project, as I suppose we wouldn't be unlinking the project from the dataset;
the KG queries have to be able to deal with multiple `invalidatedBy` links on a single dataset (the case of project forks);
cross-check whether the above assumptions still hold once we tackle the dataset immutability issue in renku-python.
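To illustrate why Option 2 complicates the queries, here is a hypothetical in-memory model in Python (not the KG schema): `isPartOf` links stay in place, each removal adds an `invalidatedBy` entry tied to the project whose commit Activity removed the metadata, and a dataset counts as present on a project only if none of those entries belongs to it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical in-memory model of a dataset entity under Option 2."""

    # Every project linked via schema:isPartOf; these links are never removed.
    part_of: set[str]
    # invalidatedBy links: commit Activity id -> project the activity belongs to.
    # Forks removing the same dataset each add their own entry.
    invalidated_by: dict[str, str] = field(default_factory=dict)

    def invalidate(self, activity_id: str, project: str) -> None:
        # Record an invalidatedBy link instead of deleting the entity.
        self.invalidated_by[activity_id] = project

    def visible_projects(self) -> set[str]:
        # The extra check every dataset query would need: a linked project only
        # counts if none of the invalidating activities belongs to it.
        invalidating = set(self.invalidated_by.values())
        return {p for p in self.part_of if p not in invalidating}
```

After `ds = Dataset({"group/parent", "user/fork"})` and `ds.invalidate("commit-1", "group/parent")`, only `user/fork` remains visible; a second invalidation from the fork leaves the dataset with two `invalidatedBy` entries and no visible project.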
Option 3:
no changes are made to a removed dataset entity; only the dataset queries are updated so they look for files that get invalidated; if all of a dataset's parts (effectively its underlying files) are invalidated, the dataset should not be retrieved by the queries;
I imagine this approach would make the KG queries very complicated, and the complexity would have to be repeated in all the queries touching dataset entities;
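The Option 3 rule can be sketched in Python as a set check, assuming each dataset is reduced to the set of its part (file) paths and the invalidated files are already known; `dataset_visible` and `visible_datasets` are hypothetical names, and treating a dataset with no parts as visible is an assumption the issue does not settle.

```python
from __future__ import annotations


def dataset_visible(parts: set[str], invalidated_files: set[str]) -> bool:
    """Hide a dataset only when every one of its parts (files) is invalidated."""
    # Assumption: a dataset with no parts stays visible rather than being hidden.
    return not parts or not parts <= invalidated_files


def visible_datasets(datasets: dict[str, set[str]], invalidated_files: set[str]) -> list[str]:
    """Identifiers of the datasets a query would still return, sorted for stable output."""
    return sorted(
        ds_id for ds_id, parts in datasets.items() if dataset_visible(parts, invalidated_files)
    )
```

In the KG, the equivalent condition would have to be expressed in every query that touches datasets, which is the complexity the issue warns about.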