Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(workflow): avoid unnecessary parent runs #1476

Merged
merged 11 commits into from Sep 10, 2020
Merged

Conversation

m-alisafaee
Copy link
Contributor

@m-alisafaee m-alisafaee commented Aug 18, 2020

Description

Reuse parent runs in WorkflowRun when possible. Do not create parent runs when there is only a single step. Do not serialize WorkflowRun.

Fixes #1399
Fixes #1400

@m-alisafaee m-alisafaee force-pushed the 1400-reuse-parent-run branch 4 times, most recently from 5cc582a to dc51321 Compare August 21, 2020 16:26
@m-alisafaee m-alisafaee marked this pull request as ready for review August 21, 2020 18:00
@m-alisafaee m-alisafaee requested a review from a team as a code owner August 21, 2020 18:00
@Panaetius
Copy link
Member

Panaetius commented Aug 25, 2020

I tested this a bit and it looks very nice. I did find an issue/inconsistency in how rerun/update works, but I'm not sure if this is related to this PR or if it's how things always worked.

Let's say we have the following situation:

input1 ---A--> intermediate1-----|
                                 |
                                 |--C--> output
                                 |
input2 --B--> intermediate2------|

With A,B,C being Run's.

And then I change input1 and run renku update, this yield a new wf yaml with a new parent and A and C as subprocess (not B since input2 was not changed).

Now I do renku rerun output, [1] which yields a new parent with A, B and C as subprocess. I guess it reruns B as well since it's a dependency for C? Already a bit weird.

I do another renku rerun output, but this time I get no parent, it merely reruns C and that's the only Run in the generated yaml. So I'd expect for two reruns in a row to have the same plans, but they differ.

Now I do a renku rerun --from input1 output and I get a new parent with steps A, B and C, but this should have the same parent as [1]. So this looks like a bug on the PR. I'm also not sure why this did not just rerun A and C without B, since --from should only follow the one path from my understanding.

On a more general level, I think we need to decide what rerun and update refer to for the planned redesign of graph building. If I do a renku update followed by a renku rerun somefile, does that mean that I (A) rerun the update (rerun all steps involved in the update) or (B) rerun the step that produced somefile(i.e. the last step of the update)? I'd vote for (B) but it's something we need to design.

Rerun only running the last step (instead of the whole DAG) is ok, to prevent a user from accidentally kicking off a huge chain of steps, but it's something we should communicate clearly. And then the --from flag is used to denote a node in the DAG that rerun should start from, and ONLY things downstream from nodes specified with --from should be rerun. But if a have a workflow like above and I edit input1 and do a renku update (so only A and C are part of this new workflow), and then do renku rerun --from input2 output (input2 was not part of the previous update), should this still work and would this also rerun A, since A was part of the preceding update but not of the branch of the DAG specified by --from?

I think it comes down to "We have one, and only one, big, possibly disconnected DAG at each point in time in the repo and each run/update/rerun modifies this DAG and acts upon it(deletes and adds edges)", in which case parent Run nodes are more virtual and only the children really matter; or if it's more "Each executed renku command is a separate DAG and all of them together form the lineage", with the latter being closer to what we have now, but with us trying to somehow glue together the single, big DAG from this. Something we should define clearly in the graph redesign. Maybe we could just have a single DAG datastructure consisting of (linked) Runs and the provenance just points to start/end nodes in that DAG?

The current implementation seems to do a bit of a mix of things, and I'm really not sure what we want to address in this PR and what should be held off for the graph building redesign.

@m-alisafaee
Copy link
Contributor Author

I've tested your scenario on master and it has the same issue with rerun with and without --from option; it runs the last step or all steps in turns. I believe this is an issue with graph generation. I don't think it's in the scope of this PR to fix that. I'll create an issue for that and we will decide either to fix it or to wait for the redesigned graph generation.
I've fixed the issue with not reusing a parent run by checking all WorkflowRuns that generate an output if we cannot find one in the graph.
Regarding the rerun behavior, I also think it should run only the last step that generated a file when running renku rerun file. Using --from option users can specify a different path to execute. Rerun should not include updated inputs automatically because that what update is for. We should discuss these cases in more details.

@m-alisafaee m-alisafaee force-pushed the 1400-reuse-parent-run branch 4 times, most recently from 327b204 to 543017d Compare August 31, 2020 14:00
Panaetius
Panaetius previously approved these changes Sep 8, 2020
@m-alisafaee m-alisafaee merged commit b908ffd into master Sep 10, 2020
@m-alisafaee m-alisafaee deleted the 1400-reuse-parent-run branch September 10, 2020 13:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reuse parent Runs if possible Don't create parent Runs if it is not necessary
3 participants