Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(core): make update work with new storage #2304

Merged
merged 4 commits into from Sep 14, 2021

Conversation

m-alisafaee
Copy link
Contributor

@m-alisafaee m-alisafaee commented Sep 1, 2021

Description

renku update with the new storage/metadata. It has a PATHS parameter which limits the update to the specified paths.

Fixes #2257

@m-alisafaee m-alisafaee changed the base branch from 2130-workflow-execute to master September 1, 2021 23:52
@m-alisafaee m-alisafaee marked this pull request as ready for review September 2, 2021 07:20
@m-alisafaee m-alisafaee requested a review from a team as a code owner September 2, 2021 07:20
renku/cli/update.py Outdated Show resolved Hide resolved
renku/core/commands/update.py Outdated Show resolved Hide resolved
renku/core/commands/update.py Outdated Show resolved Hide resolved
renku/core/models/workflow/composite_plan.py Outdated Show resolved Hide resolved
renku/core/commands/update.py Outdated Show resolved Hide resolved
@m-alisafaee m-alisafaee marked this pull request as draft September 6, 2021 22:16
@m-alisafaee m-alisafaee marked this pull request as ready for review September 7, 2021 14:38
@m-alisafaee m-alisafaee marked this pull request as draft September 8, 2021 11:57
@m-alisafaee m-alisafaee marked this pull request as ready for review September 10, 2021 08:20
Copy link
Member

@Panaetius Panaetius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really nice now. It's great seeing things finally come together!

renku/cli/update.py Outdated Show resolved Hide resolved
renku/core/utils/git.py Outdated Show resolved Hide resolved
all_activities = defaultdict(set)

def have_identical_inputs_and_outputs(activity1, activity2):
return sorted(u.entity.path for u in activity1.usages) == sorted(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think set() instead of sorted() would work just as well. Not that it matters but there's not really a reason for sorting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe there is not clear way of comparing activities. The idea here was to cover cases like cat A A B > C and cat A B B > C.

if paths:
# NOTE: Add the activity to check if it also matches the condition
downstream_chains.append((activity,))
downstream_chains = [c for c in downstream_chains if any(g.entity.path in paths for g in c[-1].generations)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can users ask Renku to update a whole folder? if so we'd need to check it here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they can. I've fixed this and added a test for it.

renku/core/commands/update.py Outdated Show resolved Hide resolved

if len(activities) > 1:
activity_collection = ActivityCollection(activities=activities)
activity_gateway.add_activity_collection(activity_collection)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the "activity-collections" only exists to enable some tests. it is a bit odd to me to store something in users' repositories that is only used for our tests, essentially littering the database with unneeded data.

I guess we could have a TestingActivityGateway(ActivityGateway) in tests/ and inject that for tests, to only have this index when testing code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ActivityCollection is to mark that these activities have been executed together as a result of an update or a rerun. So, it's not just for testing. I was not sure what other metadata we need to include here (specifically if we need a link in the Activity to its ActivityCollection if any). What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a link from ActivityCollection to Activity makes sense to me. Best to discuss it with the KG team as well

@@ -141,6 +141,9 @@ def workflow_graph(self):
workflow_graph.add_node(node)
continue

if not next(self.graph.predecessors(node), None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might only make a difference on very large repositories with complex workflows, but

intermediate_predecessor = next(self.graph.predecessors(node), None)
if not intermediate_predecessor:
    continue
[...]
source = next(self.graph.predecessors(intermediate_predecessor), None)

would only have to calculate the predecessor once.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It always helps with the code's readability.

name: str,
derived_from: str = None,
plans: List[Union["CompositePlan", Plan]] = None,
plans: List[Plan] = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be AbstractPlan ? A CompositePlan created by a user could contain a CompositePlan

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍
I made it as it was before (Union["CompositePlan", Plan]) which should also help with ide type linter.

@m-alisafaee m-alisafaee marked this pull request as draft September 13, 2021 12:49
@m-alisafaee m-alisafaee force-pushed the 2257-new-renku-update branch 6 times, most recently from 45f3998 to 7279445 Compare September 13, 2021 16:21
@m-alisafaee m-alisafaee marked this pull request as ready for review September 13, 2021 17:46
Copy link
Member

@Panaetius Panaetius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@m-alisafaee m-alisafaee merged commit c047ed9 into master Sep 14, 2021
@m-alisafaee m-alisafaee deleted the 2257-new-renku-update branch September 14, 2021 16:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

implement renku update with new database
2 participants