Enable different start/end partitions defs in TimeWindowPartitionMapping #14449
@@ -298,17 +298,6 @@ def __str__(self) -> str:
)
return partition_def_str

def __eq__(self, other):
Tests in this PR exposed a bug with this function. We recently added an `end` param to time window partitions defs, and since `end` is not included in this function, it falsely evaluated defs with different `end` values as equivalent.
I don't think this function is necessary though: this class is a named tuple, so it should already have an `__eq__` function that we don't need to maintain like this one.
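To illustrate the bug described above, here is a minimal sketch. The class names and fields (`PartitionsDefSketch`, `PartitionsDefWithStaleEq`, `start`, `end`, `fmt`) are hypothetical stand-ins, not the actual Dagster classes. It only shows how a hand-written `__eq__` can go stale when a field is added, while the tuple-derived `__eq__` compares every field automatically.

```python
from typing import NamedTuple, Optional


class PartitionsDefSketch(NamedTuple):
    """Hypothetical stand-in for a time window partitions def."""

    start: str
    end: Optional[str] = None  # the field that was added later
    fmt: str = "%Y-%m-%d"


class PartitionsDefWithStaleEq(PartitionsDefSketch):
    """Same fields, plus a hand-written __eq__ that predates `end`."""

    def __eq__(self, other):
        # `end` is missing here, so defs that differ only in `end`
        # are wrongly reported as equal.
        return (
            isinstance(other, PartitionsDefWithStaleEq)
            and self.start == other.start
            and self.fmt == other.fmt
        )


# The inherited tuple __eq__ compares all fields, including `end`.
plain_a = PartitionsDefSketch(start="2023-01-01")
plain_b = PartitionsDefSketch(start="2023-01-01", end="2023-06-01")
plain_equal = plain_a == plain_b

# The stale override ignores `end`.
stale_a = PartitionsDefWithStaleEq(start="2023-01-01")
stale_b = PartitionsDefWithStaleEq(start="2023-01-01", end="2023-06-01")
stale_equal = stale_a == stale_b
```

In this sketch `plain_equal` is `False` and `stale_equal` is `True`, which is the false equivalence the tests surfaced. Deleting the override removes the need to update it whenever a field is added.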
This reminds me of the discussion we've been having about a partitioned asset that snapshots an unpartitioned asset. As I mentioned on #14305 (comment), I think we should try to come up with some vocabulary to talk about the situation where a partition isn't materializable because the upstream data that would be required to compute it isn't present. In the future, instead of just raising an error, the UI could actually give an indication that this is the case. E.g. if you tried to select those partitions in the backfill dialog, we could prevent you and pop up a tooltip explaining the situation. I don't think we need to do that now, but I think we should try to find a name that would still make sense if/when we do do it.
Btw should we mark this parameter as experimental just to be safe?
Removed support for the following edge case:
@sryza would appreciate a second look at this. Previously this PR raised an error when encountering the edge case above regardless of if If you are in the weird edge case above where an hourly is downstream of a daily, if I also made the new arg experimental.
Background here: https://www.notion.so/dagster/Handle-invalid-parents-in-auto-materialize-5fe6dbdfba0f43a6beae88fb61ccc07d?pvs=4

This PR builds upon #14449 and introduces graceful handling for auto-materialize cases where an invalid parent exists.

Errors are currently raised in `PartitionMapping.get_upstream_partitions_for_partitions` when mapped upstream partitions are nonexistent. It is good that an error occurs when launching backfills and runs in these cases, but undesirable for the error to occur in auto-materialization. This PR introduces logic to gracefully handle invalid parents without erroring in auto-materialize.

**Modifies the auto-materialize logic:**
- Only considers partitions that can be materialized. If a partition has invalid upstreams, it should not be possible to auto-materialize it.
- Modifies `is_reconciled` to recursively check only valid upstream partitions. If all of a partition's parents are invalid, then returns `True` if the partition is materialized, `False` if not. This enables reconciling downstream assets.
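The recursive rule in the second bullet can be sketched roughly as follows. This is a simplified illustration, not the actual `is_reconciled` implementation. The function signature, the string partition keys, and the `parents`/`valid`/`materialized` inputs are all assumptions made for the example, and the real check presumably also looks at staleness rather than just materialization.

```python
from typing import Dict, List, Set


def is_reconciled_sketch(
    partition: str,
    parents: Dict[str, List[str]],  # hypothetical: partition -> mapped upstream partitions
    valid: Set[str],                # hypothetical: partitions that actually exist
    materialized: Set[str],         # hypothetical: partitions that have been materialized
) -> bool:
    """Simplified reconciliation check that skips invalid parents."""
    if partition not in materialized:
        return False
    # Only recurse into parents that exist; invalid parents are ignored.
    valid_parents = [p for p in parents.get(partition, []) if p in valid]
    # With no valid parents, all() over an empty sequence is True, so the
    # result reduces to "is this partition materialized?".
    return all(
        is_reconciled_sketch(p, parents, valid, materialized)
        for p in valid_parents
    )


# Downstream partition whose only mapped parent does not exist upstream.
graph = {"down|2023-01-01": ["up|2023-01-01"], "down|2023-02-01": ["up|2023-02-01"]}
existing = {"down|2023-01-01", "down|2023-02-01", "up|2023-02-01"}

invalid_parent_materialized = is_reconciled_sketch(
    "down|2023-01-01", graph, existing, {"down|2023-01-01"}
)
invalid_parent_unmaterialized = is_reconciled_sketch(
    "down|2023-01-01", graph, existing, set()
)
valid_parent_unmaterialized = is_reconciled_sketch(
    "down|2023-02-01", graph, existing, {"down|2023-02-01"}
)
```

Under these assumptions, a partition whose parents are all invalid counts as reconciled exactly when it is itself materialized, which is what lets assets further downstream be reconciled.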
Requested by a user: https://dagster.slack.com/archives/C01U954MEER/p1682972599494859
Sometimes, a downstream time-partitioned asset Z might be dependent on an upstream time-partitioned asset Y that starts later than Z. For example:
Selecting one of these partitions with a nonexistent upstream errors within a call to `get_upstream_partitions_for_partitions`.

This PR enables this behavior by adding an additional `raise_error_on_nonexistent_upstream_partition` arg to `TimeWindowPartitionsDefinition`, which when set to True (the default) raises an error when upstream partitions fall outside of the start-end time range of their partitions def.

For the users above in this special case, they can set this bool to False. In this case, no error will be raised and the method will just filter for existent partitions. There are no changes to behavior for when `get_downstream_partitions_for_partitions` is called.
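The raise-versus-filter behavior described above can be sketched in plain Python. This is an illustration only and does not use Dagster's API. The helper `existent_upstream_days`, its parameters other than the flag name, and the choice of an exclusive `upstream_end` are assumptions made for the example.

```python
from datetime import date
from typing import List, Optional


def existent_upstream_days(
    requested: List[date],
    upstream_start: date,
    upstream_end: Optional[date] = None,  # assumed exclusive; None means open-ended
    raise_error_on_nonexistent_upstream_partition: bool = True,
) -> List[date]:
    """Return the requested upstream days that fall inside the upstream range."""

    def exists(day: date) -> bool:
        return day >= upstream_start and (upstream_end is None or day < upstream_end)

    missing = [d for d in requested if not exists(d)]
    if missing and raise_error_on_nonexistent_upstream_partition:
        raise ValueError(f"Nonexistent upstream partitions requested: {missing}")
    # With the flag off, silently keep only the partitions that exist.
    return [d for d in requested if exists(d)]


# The downstream asset asks for days before the upstream asset's start.
wanted = [date(2023, 1, 30), date(2023, 1, 31), date(2023, 2, 1), date(2023, 2, 2)]
filtered = existent_upstream_days(
    wanted,
    upstream_start=date(2023, 2, 1),
    raise_error_on_nonexistent_upstream_partition=False,
)
```

With the flag set to `False`, `filtered` keeps only the two February days. With the default, the same call raises instead.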