[DS][24/n] Remove all AssetSubset references from the context and result objects #21612

Conversation

OwenKephart
Contributor

Summary & Motivation

As title -- for the new codepaths, helps avoid a lot of convert_to_valid_asset_subset() business.

I didn't bother updating the AutoMaterializeRule side of the world in a nicer way, as this code will eventually be removed anyway.

Had to add some new methods to replace the & / | / - operators, this time properly labeled with "compute_"

How I Tested These Changes

Contributor Author

OwenKephart commented May 2, 2024

This stack of pull requests is managed by Graphite. Learn more about stacking.


@OwenKephart OwenKephart force-pushed the 05-01-Update_SchedulingContext_object branch from 132764e to 5744f34 Compare May 2, 2024 22:59
@OwenKephart OwenKephart force-pushed the 05-02-Remove_all_AssetSubset_references_from_the_context_and_result_objects branch from b9355f4 to 6baf92a Compare May 2, 2024 22:59
@OwenKephart OwenKephart force-pushed the 05-01-Update_SchedulingContext_object branch from 5744f34 to 8bc6892 Compare May 2, 2024 23:59
@OwenKephart OwenKephart force-pushed the 05-02-Remove_all_AssetSubset_references_from_the_context_and_result_objects branch from 6baf92a to e5e6ed8 Compare May 2, 2024 23:59
Comment on lines +167 to +176
    def compute_union(self, other: "AssetSlice") -> "AssetSlice":
        return _slice_from_subset(
            self._asset_graph_view, self._compatible_subset | other.convert_to_valid_asset_subset()
        )

    def compute_intersection(self, other: "AssetSlice") -> "AssetSlice":
        return _slice_from_subset(
            self._asset_graph_view, self._compatible_subset & other.convert_to_valid_asset_subset()
        )

Member

Hmm, I don't think we need to use the compute language here, right? I think this is guaranteed to be fairly fast.

Contributor Author

I think at the moment there's one case where things get a bit rough: an in-memory time-window subset that only contains the time windows, not the partition keys themselves.

In order to compute the intersection, the current logic has to compute the partition keys for those time windows.

We could theoretically make this more efficient for time-window-to-time-window intersections (just intersect the time windows themselves), but I think there will always be the case where one subset is encoded in terms of partition keys and the other in terms of time windows, and converting one to the other is non-negligible work.
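
For illustration, here is a minimal sketch of the two cases described above. It assumes daily partitions and uses a hypothetical `TimeWindow` type plus hypothetical helpers `intersect_windows` and `daily_keys`; none of these names come from Dagster, and this is not the code in this PR. Window-to-window intersection only walks the window lists, while the mixed case has to expand windows into keys first.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple, Set


class TimeWindow(NamedTuple):
    """Hypothetical half-open window [start, end); not Dagster's class."""

    start: datetime
    end: datetime


def intersect_windows(a: List[TimeWindow], b: List[TimeWindow]) -> List[TimeWindow]:
    """Intersect two sorted, non-overlapping window lists with a two-pointer sweep.

    The cost depends on the number of windows, not on how many partitions they span.
    """
    result: List[TimeWindow] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i].start, b[j].start)
        end = min(a[i].end, b[j].end)
        if start < end:
            result.append(TimeWindow(start, end))
        # Advance whichever window ends first.
        if a[i].end <= b[j].end:
            i += 1
        else:
            j += 1
    return result


def daily_keys(windows: List[TimeWindow]) -> Set[str]:
    """Expand windows into daily partition keys (assumed key format: YYYY-MM-DD).

    The cost grows with the number of days covered, which is the expensive step.
    """
    keys: Set[str] = set()
    for window in windows:
        day = window.start
        while day < window.end:
            keys.add(day.strftime("%Y-%m-%d"))
            day += timedelta(days=1)
    return keys


a = [TimeWindow(datetime(2024, 1, 1), datetime(2024, 3, 1))]
b = [TimeWindow(datetime(2024, 2, 1), datetime(2024, 4, 1))]

# Window-to-window: no partition keys are ever materialized.
overlap = intersect_windows(a, b)

# Mixed encodings: the windows must be expanded to keys before intersecting.
key_subset = {"2024-02-10", "2024-05-01"}
mixed = daily_keys(a) & key_subset
```

In this sketch the mixed case expands two months of windows into 60 keys just to find a single match, which is the kind of hidden work the `compute_` prefix is meant to flag.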

Contributor Author

Ah and then there's the especially-nasty case of multi-partitions definitions, which would require even more cleverness to make efficient in all cases.

Member

Got it. Very happy we are doing compute then instead of an operator that might just silently do that!
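
As a rough sketch of the naming convention agreed on in this thread: a set-like type can leave out operator overloads and expose only explicitly named methods, so a potentially expensive conversion never hides behind `&` or `|`. `KeySlice` is a hypothetical stand-in rather than Dagster's `AssetSlice`, and `compute_difference` is an assumed name for the replacement of the `-` operator.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class KeySlice:
    """Hypothetical stand-in for a slice of partition keys; not Dagster's AssetSlice."""

    keys: FrozenSet[str]

    # Deliberately no __and__ / __or__ / __sub__: callers must spell out the
    # potentially expensive call instead of writing `a & b`.
    def compute_union(self, other: "KeySlice") -> "KeySlice":
        return KeySlice(self.keys | other.keys)

    def compute_intersection(self, other: "KeySlice") -> "KeySlice":
        return KeySlice(self.keys & other.keys)

    def compute_difference(self, other: "KeySlice") -> "KeySlice":
        # Assumed name; the PR description only says the `-` operator was replaced.
        return KeySlice(self.keys - other.keys)


left = KeySlice(frozenset({"2024-01-01", "2024-01-02"}))
right = KeySlice(frozenset({"2024-01-02", "2024-01-03"}))
both = left.compute_intersection(right)
```

With this shape, `left & right` raises a `TypeError`, so the cost of the operation is always visible at the call site.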

@@ -433,7 +435,8 @@ def evaluate_for_asset(self, context: "SchedulingContext") -> "SchedulingResult"
                 ignore_subset=context.legacy_context.materialized_requested_or_discarded_since_previous_tick_subset,
             )
         )
-        return SchedulingResult.create(context, true_subset, subsets_with_metadata)
+        true_slice = context.asset_graph_view.get_asset_slice_from_subset(true_subset)
+        return SchedulingResult.create(context, true_slice, subsets_with_metadata)
Member

gentle reminder that i do want to revisit the true_slice verbiage

Contributor Author

noted !

@OwenKephart OwenKephart force-pushed the 05-01-Update_SchedulingContext_object branch from 8bc6892 to 7b1cf10 Compare May 3, 2024 16:58
@OwenKephart OwenKephart force-pushed the 05-02-Remove_all_AssetSubset_references_from_the_context_and_result_objects branch from e5e6ed8 to 41e4314 Compare May 3, 2024 16:58
@OwenKephart OwenKephart force-pushed the 05-01-Update_SchedulingContext_object branch from 7b1cf10 to 45c84a3 Compare May 3, 2024 21:11
@OwenKephart OwenKephart force-pushed the 05-02-Remove_all_AssetSubset_references_from_the_context_and_result_objects branch from 41e4314 to a03b80f Compare May 3, 2024 21:11
Contributor Author

OwenKephart commented May 3, 2024

Merge activity

  • May 3, 5:31 PM EDT: @OwenKephart started a stack merge that includes this pull request via Graphite.
  • May 3, 6:25 PM EDT: Graphite rebased this pull request as part of a merge.
  • May 3, 6:28 PM EDT: @OwenKephart merged this pull request with Graphite.

@OwenKephart OwenKephart force-pushed the 05-01-Update_SchedulingContext_object branch from 45c84a3 to 9fffe56 Compare May 3, 2024 22:22
Base automatically changed from 05-01-Update_SchedulingContext_object to master May 3, 2024 22:23
@OwenKephart OwenKephart force-pushed the 05-02-Remove_all_AssetSubset_references_from_the_context_and_result_objects branch from a03b80f to 2fb533a Compare May 3, 2024 22:24
@OwenKephart OwenKephart merged commit 37a50be into master May 3, 2024
1 check was pending
@OwenKephart OwenKephart deleted the 05-02-Remove_all_AssetSubset_references_from_the_context_and_result_objects branch May 3, 2024 22:28
cmpadden pushed a commit that referenced this pull request May 6, 2024
…ult objects (#21612)

Comment on lines +71 to +72
# construct is used here for performance
return SchedulingContext.construct(
Member

X

danielgafni pushed a commit to danielgafni/dagster that referenced this pull request Jun 18, 2024
…ult objects (dagster-io#21612)
