Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[2/n] Implement application logics to delete and handle deleted artifact contents #1399

Conversation

likawind
Copy link
Contributor

@likawind likawind commented Jun 1, 2023

Describe your changes and why you are making these changes

This PR adds application logics to handle artifact content deletion if the user choose to opt-out snapshotting. The outline of the work lifecycle follows these steps:

  • During wf registration, we assume client provides
    the right should_persist tag to each artifact based on user inputs
  • During wf preview, no change is made there
  • During wf execution, no change is made on the execution logic.
  • During wf execution, we modified DB updates such that artifact with should_persist will be directly marked as 'deleted'
  • After wf execution, we added a step in eng.ExecuteWorkflow() that deletes all necessary artf contents. Note that this cannot be done during execution as downstream op may need still need to read the contents.
  • All APIs that read results returns content whenever possible regardless of exec state. Thus no change is necessary. The only caveat is that during execution, there will be time where the artf is marked as deleted while the content is still available, and we may want to have the client hide the content to provide consistent user view.

Related issue number (if any)

ENG-3000

Tests

  • Integration test should pass on this PR
  • We will test through this feature in follow up PRs after client side changes are done

Checklist before requesting a review

  • I have created a descriptive PR title. The PR title should complete the sentence "This PR...".
  • I have performed a self-review of my code.
  • I have included a small demo of the changes. For the UI, this would be a screenshot or a Loom video.
  • If this is a new feature, I have added unit tests and integration tests.
  • I have run the integration tests locally and they are passing.
  • I have run the linter script locally (See python3 scripts/run_linters.py -h for usage).
  • All features on the UI continue to work correctly.
  • Added one of the following CI labels:
    • run_integration_test: Runs integration tests
    • skip_integration_test: Skips integration tests (Should be used when changes are ONLY documentation/UI)

Copy link
Contributor

@kenxu95 kenxu95 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All APIs that read results returns content whenever possible regardless of exec state. Thus no change is necessary. The only caveat is that during execution, there will be time where the artf is marked as deleted while the content is still available, and we may want to have the client hide the content to provide consistent user view.

Is this consistent view addressed in this PR, or is that a follow-up? I assume there will be additional PRs that deal with the client?

src/golang/lib/engine/aq_engine.go Outdated Show resolved Hide resolved
@@ -23,7 +23,11 @@ const (
CanceledExecutionStatus ExecutionStatus = "canceled"
FailedExecutionStatus ExecutionStatus = "failed"
SucceededExecutionStatus ExecutionStatus = "succeeded"
UnknownExecutionStatus ExecutionStatus = "unknown"
// 'erased' refers to 'erased after success'.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is "erased" here mean?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here is a good place to lay out what the lifecycle is, with an ascii diagram:
Registered -> Pending -> Running -> Failed
.......................................................-> Cancelled
.......................................................-> Succeeded -> Deleted [Optional]

@@ -25,6 +25,12 @@ type Artifact interface {
Signature() uuid.UUID
Type() shared.ArtifactType
Name() string
ShouldPersistContent() bool

// DeleteContent removes the artifact content from storage.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you note that this only deletes the latest artifact result? Also, what happens if the artifact content was never written due to workflow failure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It only try to delete the content when it's computed, we can clarify that. I don't think it's the latest result, in fact in the context of this artifact object (which has a result attached), we delete content of that result

@@ -147,6 +147,15 @@ func (eng *aqEngine) ScheduleWorkflow(
return nil
}

func (eng *aqEngine) deleteTemporaryArtifactContents(ctx context.Context, dag dag_utils.WorkflowDag) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you note here that this is best effort. Also, that it only deletes the latest artifact result right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a bit weird to explicitly note about the latest result. Similar to the other commend, current dag / op / artf objects have a result included and is used to handle the lifecycle of 'one workflow execution'. And all operations over these objects are tied to that execution. In this way I feel which result to delete is clear from the context? Also technically, if we reconstruct a workflow from an older run, we can use the same method to delete contents from that run as well

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

current dag / op / artf objects have a result included and is used to handle the lifecycle of 'one workflow execution'.

I think I'm confused cus you can execute the same dag / op/ artf multiple times. An artifact is different than an artifact result so as soon as I see this method signature I'm thinking "Which artifact result am I deleting". But ok I think I see what you mean, an artf in orchestration means something different than an artifact in our data model? This seemed like implicit context I wanted to call out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think our current abstraction allows us to re-execute a dag by calling ExecuteWorkflow(dag) multiple times? If you look at the constructor, we explicitly require a result ID to pass in to construct the WorkflowDag object: NewWorkflowDag(..., dagResultID,...) . In this way I think our abstraction is explicitly designed to have one object track one result.

I think it's a good idea to clarify this context somewhere, and I will add some comments to WorkflowDag object's definition. I just don't feel it's specific to this feature so we don't have to call out that on the specific API we added

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'm not saying I want us to change the way we execute, I just wanted that context clarified somewhere cus I do think it's an important callout. Agreed we can put it at a higher layer (maybe in orchestration or as you said, in workflowDag)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sg! I updated comments for wfDag / artf / op objects, lmk if you have suggestions on where else we can clarify this

@likawind
Copy link
Contributor Author

likawind commented Jun 1, 2023

@kenxu95 yes client change will be in future PRs

@likawind
Copy link
Contributor Author

likawind commented Jun 1, 2023

@kenxu95 updated comments!

@@ -372,6 +372,10 @@ func (bo *baseOperator) PersistResult(ctx context.Context) error {
artifactExecState.Status = shared.CanceledExecutionStatus
}

if artifactExecState.Status == shared.SucceededExecutionStatus && !outputArtifact.ShouldPersistContent() {
artifactExecState.Status = shared.DeletedExecutionStatus
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This setting of status to Deletion before the artifact is even deleted seems confusing actually, can we not update the status in DeleteContent() instead? That way all the deletion logic is handled in one place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this way the artifact will be marked as succeeded during execution and deleted later. I'm not sure if it's going to confuse users since they will see the intermediate succeed state

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - I actually find that less confusing since it very clearly follows the lifecycle, where deleted is just a future state of succeeded. Right now, we technically would have operators using artifacts already marked as deleted, which seems weirder?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good, let's do what you suggested

@likawind likawind requested a review from kenxu95 June 1, 2023 23:25
Copy link
Contributor

@kenxu95 kenxu95 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unblocking with a couple final thoughts. Thanks!

src/golang/lib/engine/aq_engine.go Outdated Show resolved Hide resolved
} else {
a.execState.Status = shared.DeletedExecutionStatus
a.execState.Timestamps.DeletedAt = &now
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like logic we could potentially push down into SetExecState()? If that helper can make updates more additively, it'll simplify all the callers a lot.

@likawind likawind merged commit 836e1d2 into eng-3000-allow-users-to-opt-out-of-data-snapshots-1 Jun 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants