feat(core): make update work with new storage #2304
Looks really nice now. It's great seeing things finally come together!
all_activities = defaultdict(set)


def have_identical_inputs_and_outputs(activity1, activity2):
    return sorted(u.entity.path for u in activity1.usages) == sorted(
I think set() instead of sorted() would work just as well. It doesn't really matter, but there's no real reason to sort.
I believe there is no clear way of comparing activities. The idea here was to distinguish cases like cat A A B > C and cat A B B > C.
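To make the distinction concrete: comparing as sets collapses repeated inputs, so the two commands look identical, while comparing sorted lists keeps how many times each input is used. A minimal sketch using plain path strings in place of Renku's usage entities (the function names are hypothetical); collections.Counter is shown as an alternative that keeps multiplicity without sorting:

```python
from collections import Counter


def same_as_sets(paths1, paths2):
    # Duplicates are discarded, so ["A", "A", "B"] and ["A", "B", "B"] both become {"A", "B"}
    return set(paths1) == set(paths2)


def same_as_sorted_lists(paths1, paths2):
    # Sorting keeps every occurrence, so the number of times each input is used matters
    return sorted(paths1) == sorted(paths2)


def same_as_multisets(paths1, paths2):
    # Counter compares element counts, which is order-insensitive and needs no sorting
    return Counter(paths1) == Counter(paths2)


# Inputs of `cat A A B > C` and `cat A B B > C`
inputs1 = ["A", "A", "B"]
inputs2 = ["A", "B", "B"]

print(same_as_sets(inputs1, inputs2))          # True
print(same_as_sorted_lists(inputs1, inputs2))  # False
print(same_as_multisets(inputs1, inputs2))     # False
```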
renku/core/commands/update.py (outdated)
if paths:
    # NOTE: Add the activity to check if it also matches the condition
    downstream_chains.append((activity,))
    downstream_chains = [c for c in downstream_chains if any(g.entity.path in paths for g in c[-1].generations)]
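The list comprehension keeps only the chains whose last activity generates one of the requested paths. A self-contained sketch of that filter, using hypothetical stand-in dataclasses rather than Renku's actual Activity/Generation models, and exact path matching only:

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Entity:
    path: str


@dataclass
class Generation:
    entity: Entity


@dataclass
class Activity:
    name: str
    generations: List[Generation] = field(default_factory=list)


def filter_chains(chains: List[Tuple[Activity, ...]], paths: List[str]) -> List[Tuple[Activity, ...]]:
    """Keep chains whose final activity generates at least one requested path."""
    return [c for c in chains if any(g.entity.path in paths for g in c[-1].generations)]


a = Activity("a", [Generation(Entity("data/b.csv"))])
b = Activity("b", [Generation(Entity("result.txt"))])

chains = [(a,), (a, b)]
# Only the chain ending in `b` produces result.txt
print([[act.name for act in c] for c in filter_chains(chains, ["result.txt"])])  # [['a', 'b']]
```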
Can users ask Renku to update a whole folder? If so, we'd need to check for that here.
Yes, they can. I've fixed this and added a test for it.
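Supporting folders means a generated file should also match when one of the requested paths is a directory that contains it. One way to express that check is with pathlib; this is a sketch, and the helper name is hypothetical, not necessarily how the fix in this PR is written:

```python
from pathlib import PurePosixPath
from typing import Iterable


def is_within_paths(path: str, requested: Iterable[str]) -> bool:
    """Return True if `path` equals a requested path or lies inside a requested folder."""
    candidate = PurePosixPath(path)
    for requested_path in requested:
        parent = PurePosixPath(requested_path)
        # `parents` lists every ancestor directory, so files nested at any depth match
        if candidate == parent or parent in candidate.parents:
            return True
    return False


print(is_within_paths("data/raw/a.csv", ["data"]))    # True
print(is_within_paths("data2/a.csv", ["data"]))       # False
print(is_within_paths("result.txt", ["result.txt"]))  # True
```

Comparing path components rather than using str.startswith avoids treating data2/a.csv as being inside data.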
if len(activities) > 1:
    activity_collection = ActivityCollection(activities=activities)
    activity_gateway.add_activity_collection(activity_collection)
It looks like the "activity-collections" index only exists to enable some tests. It is a bit odd to me to store something in users' repositories that is only used by our tests, essentially littering the database with unneeded data.
I guess we could have a TestingActivityGateway(ActivityGateway) in tests/ and inject that in tests, so that this index only exists when testing code?
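The suggestion is plain dependency injection: production code talks to an ActivityGateway, and only the test suite swaps in a subclass that keeps the extra index. A minimal sketch, where ActivityGateway, TestingActivityGateway, ActivityCollection, and record_update are all hypothetical stand-ins rather than Renku's actual interfaces:

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class ActivityCollection:
    # Hypothetical: groups activities that were executed together
    activities: List[Any]


class ActivityGateway:
    """Hypothetical production gateway: stores activities, keeps no test-only index."""

    def __init__(self) -> None:
        self.activities: List[Any] = []

    def add(self, activity: Any) -> None:
        self.activities.append(activity)

    def add_activity_collection(self, collection: ActivityCollection) -> None:
        # Production code does not persist the grouping, so nothing extra is stored
        pass


class TestingActivityGateway(ActivityGateway):
    """Hypothetical test-only gateway that also records which activities ran together."""

    def __init__(self) -> None:
        super().__init__()
        self.activity_collections: List[ActivityCollection] = []

    def add_activity_collection(self, collection: ActivityCollection) -> None:
        self.activity_collections.append(collection)


def record_update(gateway: ActivityGateway, activities: List[Any]) -> None:
    # Mirrors the reviewed snippet: only group when more than one activity was executed
    for activity in activities:
        gateway.add(activity)
    if len(activities) > 1:
        gateway.add_activity_collection(ActivityCollection(activities=activities))


gateway = TestingActivityGateway()  # injected by the test instead of the real gateway
record_update(gateway, ["step-1", "step-2"])
print([c.activities for c in gateway.activity_collections])  # [['step-1', 'step-2']]
```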
ActivityCollection marks that these activities were executed together as the result of an update or a rerun, so it's not just for testing. I wasn't sure what other metadata we need to include here (specifically, whether an Activity needs a link to its ActivityCollection, if it has one). What do you think?
A link from ActivityCollection to Activity makes sense to me. Best to discuss it with the KG team as well.
@@ -141,6 +141,9 @@ def workflow_graph(self):
                workflow_graph.add_node(node)
                continue

            if not next(self.graph.predecessors(node), None):
It might only make a difference on very large repositories with complex workflows, but
    intermediate_predecessor = next(self.graph.predecessors(node), None)
    if not intermediate_predecessor:
        continue
    [...]
    source = next(self.graph.predecessors(intermediate_predecessor), None)
would only have to calculate the predecessor once.
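A runnable illustration of the suggestion. To stay self-contained it uses a small hypothetical TinyDiGraph whose predecessors() returns an iterator (the same shape as networkx's DiGraph.predecessors) and counts how often it is called; the two-step lookup and the node names are simplified assumptions, not the actual workflow_graph code:

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterator, List, Optional


class TinyDiGraph:
    """Hypothetical stand-in exposing predecessors() as an iterator, counting calls."""

    def __init__(self) -> None:
        self._pred: Dict[Hashable, List[Hashable]] = defaultdict(list)
        self.calls = 0

    def add_edge(self, u: Hashable, v: Hashable) -> None:
        self._pred[v].append(u)

    def predecessors(self, node: Hashable) -> Iterator[Hashable]:
        self.calls += 1
        return iter(self._pred.get(node, []))


def two_step_source(graph: TinyDiGraph, node: Hashable) -> Optional[Hashable]:
    """Return the first predecessor of `node`'s first predecessor, or None."""
    # Compute the intermediate predecessor once and reuse it for the second lookup
    intermediate_predecessor = next(graph.predecessors(node), None)
    if intermediate_predecessor is None:
        return None
    return next(graph.predecessors(intermediate_predecessor), None)


graph = TinyDiGraph()
graph.add_edge("plan-a", "output.csv")  # plan-a generates output.csv
graph.add_edge("output.csv", "plan-b")  # plan-b uses output.csv

print(two_step_source(graph, "plan-b"))  # plan-a
print(graph.calls)                       # 2
```

Calling predecessors() once for the check and again for the value would cost three lookups here instead of two.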
It always helps with the code's readability.
     name: str,
     derived_from: str = None,
-    plans: List[Union["CompositePlan", Plan]] = None,
+    plans: List[Plan] = None,
Should it be AbstractPlan? A CompositePlan created by a user could contain another CompositePlan.
👍
I changed it back to what it was before (Union["CompositePlan", Plan]), which should also help IDE type linters.
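The string "CompositePlan" in the annotation is a forward reference, which lets a composite nest other composites as well as plain plans. A minimal sketch with hypothetical dataclasses (not Renku's actual models, which may use an AbstractPlan base instead, and with an empty-list default in place of None), plus a recursive helper that flattens a nested composite:

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Plan:
    name: str


@dataclass
class CompositePlan:
    name: str
    # Forward reference: children may be plain plans or other composites
    plans: List[Union["CompositePlan", Plan]] = field(default_factory=list)


def leaf_plans(plan: Union[CompositePlan, Plan]) -> List[str]:
    """Flatten a (possibly nested) composite into the names of its leaf plans."""
    if isinstance(plan, Plan):
        return [plan.name]
    names: List[str] = []
    for child in plan.plans:
        names.extend(leaf_plans(child))
    return names


inner = CompositePlan("inner", [Plan("clean"), Plan("train")])
outer = CompositePlan("outer", [Plan("download"), inner])
print(leaf_plans(outer))  # ['download', 'clean', 'train']
```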
Thank you!
Description
Makes renku update work with the new storage/metadata. It has a PATHS parameter that limits the update to the specified paths.

Fixes #2257