What's the issue?
When initiating a bulk backfill (materializing multiple partitions simultaneously) from the Dagster UI for a downstream asset that uses a custom PartitionMapping, a DagsterInvariantViolationError occurs. The error message suggests that Dagster is incorrectly attempting to validate the downstream asset's partition keys directly against the upstream asset's partition definition, instead of using the partition keys transformed by the custom PartitionMapping. This indicates that the custom partition mapping logic is being bypassed or ignored during the backfill planning phase for bulk operations.
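For readers without the attachment, the rough shape of such a custom mapping is sketched below. This is an illustration, not the attached repro: the fileA.config/fileB.config fan-out is hardcoded from the example keys in this report, the UpstreamPartitionsResult field names are the ones used in 1.10.x, and the import comes from an internal Dagster module.

```python
from dagster import PartitionMapping

# UpstreamPartitionsResult is not exported at the top level; this import path is
# internal and may move between Dagster releases.
from dagster._core.definitions.partition_mapping import UpstreamPartitionsResult


class ReproGenericPartitionMapping(PartitionMapping):
    """Maps a downstream key like 'groupA/item1' to the upstream keys
    'groupA/item1/fileA.config' and 'groupA/item1/fileB.config'."""

    # Hardcoded fan-out, taken from the example keys in this report.
    UPSTREAM_SUFFIXES = ("fileA.config", "fileB.config")

    @property
    def description(self) -> str:
        return "Fans each downstream item key out to its per-file upstream keys."

    def get_upstream_mapped_partitions_result_for_partitions(
        self,
        downstream_partitions_subset,
        downstream_partitions_def,
        upstream_partitions_def,
        current_time=None,
        dynamic_partitions_store=None,
    ):
        downstream_keys = (
            downstream_partitions_subset.get_partition_keys()
            if downstream_partitions_subset is not None
            else []
        )
        upstream_keys = [
            f"{key}/{suffix}"
            for key in downstream_keys
            for suffix in self.UPSTREAM_SUFFIXES
        ]
        return UpstreamPartitionsResult(
            partitions_subset=upstream_partitions_def.subset_with_partition_keys(upstream_keys),
            # Deliberately empty; see "Additional information" below.
            required_but_nonexistent_subset=upstream_partitions_def.empty_subset(),
        )

    def get_downstream_partitions_for_partitions(
        self,
        upstream_partitions_subset,
        upstream_partitions_def,
        downstream_partitions_def,
        current_time=None,
        dynamic_partitions_store=None,
    ):
        # Strip the trailing "/<file>.config" segment to recover the downstream item key.
        downstream_keys = {
            key.rsplit("/", 1)[0]
            for key in upstream_partitions_subset.get_partition_keys()
        }
        return downstream_partitions_def.subset_with_partition_keys(sorted(downstream_keys))
```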
What did you expect to happen?
- I expected Dagster to correctly invoke the custom PartitionMapping (ReproGenericPartitionMapping in the provided example) during the planning phase of the bulk backfill.
- This mapping should transform the downstream partition keys (e.g., groupA/item1) into the corresponding upstream partition keys (e.g., groupA/item1/fileA.config, groupA/item1/fileB.config); a direct-call sketch of this transformation follows this list.
- The backfill process should then validate these correctly mapped upstream keys. If the (correctly mapped) upstream partitions exist, the backfill should proceed.
- Debugger breakpoints or logs within the custom PartitionMapping should be hit during the bulk backfill, just as they are during a successful single-partition materialization.
- The DagsterInvariantViolationError related to "invalid partitions" (where the "invalid" partitions listed are actually the unmapped downstream keys) should not occur if the mapping is correctly applied.
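To make the expected transformation concrete, the mapping can be exercised directly, outside of any run or backfill. This sketch reuses the ReproGenericPartitionMapping class sketched above; the DynamicPartitionsDefinition names are assumptions based on the reproduction steps below.

```python
from dagster import DynamicPartitionsDefinition

# Assumed partition-definition names, matching the reproduction steps below.
upstream_def = DynamicPartitionsDefinition(name="upstream_generic_partitions_repro")
downstream_def = DynamicPartitionsDefinition(name="downstream_generic_partitions_repro")

downstream_subset = downstream_def.subset_with_partition_keys(["groupA/item1", "groupB/item2"])
result = ReproGenericPartitionMapping().get_upstream_mapped_partitions_result_for_partitions(
    downstream_subset, downstream_def, upstream_def
)

# Expected: the four groupX/itemY/file{A,B}.config upstream keys...
print(sorted(result.partitions_subset.get_partition_keys()))
# ...and nothing flagged as required-but-nonexistent.
print(list(result.required_but_nonexistent_subset.get_partition_keys()))
```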
How to reproduce?
- Save the attached Python code as repro_bug_report.py.
- Run dagster dev -f repro_bug_report.py in your terminal.
- Open the Dagster UI (usually http://127.0.0.1:3000).
- Add Dynamic Partitions (manually as below, or programmatically as sketched after this list):
  - Navigate to the Assets page, then click the Partitions tab.
  - For upstream_generic_partitions_repro, add the following partition keys one by one:
    - groupA/item1/fileA.config
    - groupA/item1/fileB.config
    - groupB/item2/fileA.config
    - groupB/item2/fileB.config
  - For downstream_generic_partitions_repro, add the following partition keys one by one:
    - groupA/item1
    - groupB/item2
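If clicking through the UI for all eight keys is tedious, the same keys can be registered from a small script. This is a hedged convenience sketch: it assumes upstream_generic_partitions_repro and downstream_generic_partitions_repro are the DynamicPartitionsDefinition names in the repro, and that DAGSTER_HOME points at the same instance dagster dev is using.

```python
from dagster import DagsterInstance

# Assumes DAGSTER_HOME is set so this resolves to the same instance as the webserver.
instance = DagsterInstance.get()

instance.add_dynamic_partitions(
    "upstream_generic_partitions_repro",
    [
        "groupA/item1/fileA.config",
        "groupA/item1/fileB.config",
        "groupB/item2/fileA.config",
        "groupB/item2/fileB.config",
    ],
)
instance.add_dynamic_partitions(
    "downstream_generic_partitions_repro",
    ["groupA/item1", "groupB/item2"],
)
```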
- Trigger Bulk Backfill:
  - Go back to the main Assets page (graph or list view).
  - Find the asset named processed_item_asset_repro.
  - Click "Materialize".
  - In the materialization dialog, select "All" partitions or manually select both groupA/item1 and groupB/item2.
  - Click "Launch backfill".
- Observe the Error: The backfill will fail, and a DagsterInvariantViolationError will appear in the Dagster UI and/or in the terminal logs for dagster-daemon or dagster-webserver. The error message will be similar to:
  DagsterInvariantViolationError: Asset partition subset EntitySubset<AssetKey(['processed_item_asset_repro'])>(...) depends on invalid partitions EntitySubset<AssetKey(['source_file_asset_repro'])>(DefaultPartitionsSubset(subset={'groupA/item1', 'groupB/item2'}))
Verification (Single Partition):
- Follow steps 1-4 above.
- Find processed_item_asset_repro.
- Click "Materialize".
- Select only one partition (e.g., groupA/item1).
- Launch the run. This run should succeed, and the logs from ReproGenericPartitionMapping (e.g., [ReproGenericPartitionMapping] Calling get_upstream_mapped_partitions_result_for_partitions...) should be visible in the run's detailed logs in the UI. This confirms the mapping works for single materializations.
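A command-line variant of the same single-partition check (assuming the asset and partition names above) should surface the mapping's log lines as well:

```
dagster asset materialize -f repro_bug_report.py --select processed_item_asset_repro --partition groupA/item1
```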
Dagster version
dagster, version 1.10.15
Deployment type
Dagster Helm chart
Deployment details
No response
Additional information
- The core of the issue seems to be that Dagster's backfill planner, specifically the logic around _should_backfill_atomic_asset_subset_unit (from dagster._core.execution.asset_backfill), does not utilize the provided custom PartitionMapping when processing multiple partitions in a bulk operation. Instead, it appears to default to an identity mapping for dependency validation.
- This is evidenced by the fact that debugger breakpoints (or log statements) placed within the custom PartitionMapping's methods (get_upstream_mapped_partitions_result_for_partitions) are not hit when the bulk backfill fails, but they are hit when a single partition is materialized successfully.
- The custom PartitionMapping in the repro (ReproGenericPartitionMapping) intentionally returns an empty required_but_nonexistent_subset in its UpstreamPartitionsResult, to demonstrate that the error is not caused by the mapping itself identifying missing mapped upstream partitions, but rather by Dagster's premature validation using unmapped downstream keys.
- The file repro_bug_report.py contains the minimal code to demonstrate this (an approximate reconstruction is sketched below).
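For readers without the attachment, the wiring in repro_bug_report.py is presumably along the lines of the sketch below. This is an approximation built from the names in this report (and the ReproGenericPartitionMapping sketch near the top of the issue), not the attached file; a non-loaded deps= dependency is assumed here because the single-partition verification succeeds without the upstream partitions ever being materialized.

```python
from dagster import AssetDep, Definitions, DynamicPartitionsDefinition, asset

# Names taken from this report; everything else is an approximation.
upstream_generic_partitions_repro = DynamicPartitionsDefinition(
    name="upstream_generic_partitions_repro"
)
downstream_generic_partitions_repro = DynamicPartitionsDefinition(
    name="downstream_generic_partitions_repro"
)


@asset(partitions_def=upstream_generic_partitions_repro)
def source_file_asset_repro(context) -> None:
    context.log.info(f"Producing config file partition {context.partition_key}")


@asset(
    partitions_def=downstream_generic_partitions_repro,
    deps=[AssetDep(source_file_asset_repro, partition_mapping=ReproGenericPartitionMapping())],
)
def processed_item_asset_repro(context) -> None:
    context.log.info(f"Processing item partition {context.partition_key}")


defs = Definitions(assets=[source_file_asset_repro, processed_item_asset_repro])
```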
Message from the maintainers
Impacted by this issue? Give it a 👍! We factor engagement into prioritization.