Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[3/n] Allows user to disable snapshots from SDK #1400

Conversation

likawind
Copy link
Contributor

@likawind likawind commented Jun 1, 2023

Describe your changes and why you are making these changes

This PR adds SDK changes to disable snapshots on both artf and wf level. The API is implemented based on https://aqueducthq.slack.com/archives/C01U6RREC7M/p1685489357239879 . And the behavior looks like the following:

  • user can enable / disable snapshots each individual artifact (enabled by default)
  • when user turns off workflow level snapshot at publish_flow, this is equivalent to call disable_snapshots for all artifacts from non param / metrics / checks operators. In other words, if the user explicitly disable snapshots for those excluded artifacts, the option will be respected.
  • See the manual QA example as a reference to the above behavior

Related issue number (if any)

ENG-3000

Tests

Verified the manual QA as a part of #1401 .

Checklist before requesting a review

  • I have created a descriptive PR title. The PR title should complete the sentence "This PR...".
  • I have performed a self-review of my code.
  • I have included a small demo of the changes. For the UI, this would be a screenshot or a Loom video.
  • If this is a new feature, I have added unit tests and integration tests.
  • I have run the integration tests locally and they are passing.
  • I have run the linter script locally (See python3 scripts/run_linters.py -h for usage).
  • All features on the UI continue to work correctly.
  • Added one of the following CI labels:
    • run_integration_test: Runs integration tests
    • skip_integration_test: Skips integration tests (Should be used when changes are ONLY documentation/UI)

return count < 10


def deploy(client, resource_name):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vsreekanti would like your input to confirm this call chain should result in:

  • reviews, op_artf and check_disabled_artf to be deleted
  • outputs of metrics and check are persisted

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I feel allowing disabling check artf is very confusing. I'm wondering if we should just return an error if the user attempts to do so. Maybe the same applies to params.

Disabling metric artf makes sense to me though, since esp. table metrics can have sensitive content as they are just a grammar sugar over regular operators.

Comment on lines 606 to 608
disable_snapshots:
If set 'True', disables snapshots for all artifacts not generating from params / metrics / checks
by calling `.disable_snapshot()` on each of these artifacts.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence seems fragmented and I am a bit confused about what it means. Pasting it to GPT gave me the following:

disable_snapshots:
    If set to 'True', this option disables snapshots for all artifacts that are not generated from parameters, metrics, or checks. It achieves this by calling the `.disable_snapshot()` method on each of these artifacts.

Is this what you're trying to say? @likawind

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thank you :)

Copy link
Contributor

@kenxu95 kenxu95 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you paste a loom demo of the manual_qa_tests?

def enable_snapshot(self) -> None:
self._dag.update_artifact_should_persist(self._artifact_id, True)

def disable_snapshot(self) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use the set_ prefix for any write methods?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's been discussed in https://aqueducthq.slack.com/archives/C01U6RREC7M/p1685489357239879 and given it's this close to release, I'd stick with the decision than making more back-and-forth to the channel. We've been using other verbs like bound max over artifacts so I don't see it as a huge misalignment as long as they start with verbs

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh I didn't realize this was user-facing actually. That's good for now yea

def update_artifact_should_persist(self, artifact_id: uuid.UUID, should_persist: bool) -> None:
self.must_get_artifact(artifact_id).should_persist = should_persist

def disable_snapshots_from_non_params_metrics_checks(self) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally find this name a little confusing. I'd prefer if we kept the method more generic disable_snapshots() and added a nice docstring here indicating that parameters, metric and check artifacts are excluded.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually prefer the naming to be self-descriptive esp. that here the underlying assumption is not clear to whoever scan through the caller codes.

Screenshot 2023-06-05 at 2 21 06 PM

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about disable_snapshots_not_from_params_metrics_checks ?

Copy link
Contributor

@kenxu95 kenxu95 Jun 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also prefer my methods to be meaningful, but the description needs to stop somewhere. In this case, it feels like we're leaking implementation details into the method name. Specifically, I'd rather not be listing things in my names (params, metrics, checks), since it's not scalable or very readable imo (also not consistent with rest of code). Nbd if you want to keep it, it just jumped out at me.

Copy link
Contributor Author

@likawind likawind Jun 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm more worried that disable_snapshot is misleading because we don't actually disable all snapshots. I do agree that this name is too long and what you mentioned about listing stuff makes sense. Welcome to advices as long as we somehow indicate only some snapshots get disabled here :)

Copy link
Contributor Author

@likawind likawind Jun 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been thinking about things like disable_user_snapshots but this is also inaccurate..

disable_user_data_snapshots?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to disable_sensitive_snapshots. sensitive may confuse reader but that should lead them to the comment where we clarify things

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mm does disable_op_snapshots or disable_op_output_snapshots work? No strong opinions as long as we document it in the function.

Copy link
Contributor Author

@likawind likawind Jun 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah sounds good, I guess I just don't want it look too harmless so that caller reader would try to look at the docs, rather than skip over it with some wrong assumptions

@@ -14,6 +14,7 @@ class ArtifactMetadata(BaseModel):
# If true, this artifact name is expected to be unique in the DAG.
explicitly_named: bool = False
from_local_data: bool = False
should_persist: bool = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment here about what should_persist means.

@likawind
Copy link
Contributor Author

likawind commented Jun 6, 2023

@kenxu95 manual QA is updated in #1401

@kenxu95 kenxu95 self-requested a review June 6, 2023 00:12
@likawind likawind merged commit f557240 into eng-3000-allow-users-to-opt-out-of-data-snapshots-2 Jun 6, 2023
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants