DagsterInvariantViolationError During Bulk Backfill with Custom PartitionMapping (Mapping Logic Bypassed) #30109

@adnaniazi

What's the issue?

When initiating a bulk backfill (materializing multiple partitions simultaneously) from the Dagster UI for a downstream asset that uses a custom PartitionMapping, a DagsterInvariantViolationError occurs. The error message suggests that Dagster is incorrectly attempting to validate the downstream asset's partition keys directly against the upstream asset's partition definition, instead of using the partition keys transformed by the custom PartitionMapping. This indicates that the custom partition mapping logic is being bypassed or ignored during the backfill planning phase for bulk operations.

What did you expect to happen?

  • I expected Dagster to correctly invoke the custom PartitionMapping (ReproGenericPartitionMapping in the provided example) during the planning phase of the bulk backfill.

  • This mapping should transform the downstream partition keys (e.g., groupA/item1) into the corresponding upstream partition keys (e.g., groupA/item1/fileA.config, groupA/item1/fileB.config).

  • The backfill process should then validate these correctly mapped upstream keys. If the (correctly mapped) upstream partitions exist, the backfill should proceed.

  • Debugger breakpoints or logs within the custom PartitionMapping should be hit during the bulk backfill, just as they are during a successful single-partition materialization.

  • The DagsterInvariantViolationError related to "invalid partitions" (where the invalid partitions listed are actually the unmapped downstream keys) should not occur if the mapping is correctly applied.
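The key transformation expected from the mapping can be sketched in plain Python. This is only an illustration of the expected result, not the Dagster API and not the code in repro_bug_report.py: the function name `map_downstream_to_upstream` is hypothetical, and the prefix rule (an upstream key belongs to a downstream key when it starts with that key plus "/") is an assumption inferred from the example keys in this report.

```python
from typing import Dict, Iterable, List


def map_downstream_to_upstream(
    downstream_keys: Iterable[str],
    upstream_keys: Iterable[str],
) -> Dict[str, List[str]]:
    """Hypothetical helper: for each downstream key, collect the upstream
    keys that start with '<downstream_key>/' (assumed prefix rule)."""
    upstream_sorted = sorted(upstream_keys)
    return {
        d: [u for u in upstream_sorted if u.startswith(d + "/")]
        for d in downstream_keys
    }


# Partition keys taken from the reproduction steps in this report.
UPSTREAM = [
    "groupA/item1/fileA.config",
    "groupA/item1/fileB.config",
    "groupB/item2/fileA.config",
    "groupB/item2/fileB.config",
]
DOWNSTREAM = ["groupA/item1", "groupB/item2"]

mapping = map_downstream_to_upstream(DOWNSTREAM, UPSTREAM)
for downstream_key, mapped in mapping.items():
    print(downstream_key, "->", mapped)
```

Under this assumed rule, every selected downstream key resolves to upstream keys that already exist, so validation of the mapped keys should find nothing invalid.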

How to reproduce?

  1. Save the attached Python code as repro_bug_report.py.
  2. Run dagster dev -f repro_bug_report.py in your terminal.
  3. Open the Dagster UI (usually http://127.0.0.1:3000).
  4. Add dynamic partitions: navigate to the Assets page, then click the Partitions tab.

     For upstream_generic_partitions_repro, add the following partition keys one by one:

     groupA/item1/fileA.config
     groupA/item1/fileB.config
     groupB/item2/fileA.config
     groupB/item2/fileB.config

     For downstream_generic_partitions_repro, add the following partition keys one by one:

     groupA/item1
     groupB/item2

  5. Trigger the bulk backfill:
     • Go back to the main Assets page (graph or list view).
     • Find the asset named processed_item_asset_repro.
     • Click "Materialize".
     • In the materialization dialog, select "All" partitions or manually select both groupA/item1 and groupB/item2.
     • Click "Launch backfill".
  6. Observe the error: the backfill will fail, and a DagsterInvariantViolationError will appear in the Dagster UI and/or the terminal logs for dagster-daemon or dagster-webserver. The error message will be similar to:

DagsterInvariantViolationError: Asset partition subset EntitySubset<AssetKey(['processed_item_asset_repro'])>(...) depends on invalid partitions EntitySubset<AssetKey(['source_file_asset_repro'])>(DefaultPartitionsSubset(subset={'groupA/item1', 'groupB/item2'}))

Verification (Single Partition):

  • Follow steps 1-4 above.
  • Find processed_item_asset_repro.
  • Click "Materialize".
  • Select only one partition (e.g., groupA/item1).
  • Launch the run. This run should succeed, and the logs from ReproGenericPartitionMapping (e.g., [ReproGenericPartitionMapping] Calling get_upstream_mapped_partitions_result_for_partitions...) should be visible in the run's detailed logs in the UI. This confirms the mapping works for single materializations.

Dagster version

dagster, version 1.10.15

Deployment type

Dagster Helm chart

Deployment details

No response

Additional information

  • The core of the issue seems to be that Dagster's backfill planner, specifically the logic around _should_backfill_atomic_asset_subset_unit (from dagster._core.execution.asset_backfill), does not utilize the provided custom PartitionMapping when processing multiple partitions in a bulk operation. Instead, it appears to default to an identity mapping for dependency validation.
  • This is evidenced by the fact that debugger breakpoints (or log statements) placed within the custom PartitionMapping's methods (get_upstream_mapped_partitions_result_for_partitions) are not hit when the bulk backfill fails, but they are hit when a single partition is materialized successfully.
  • The custom PartitionMapping in the repro (ReproGenericPartitionMapping) intentionally returns an empty required_but_nonexistent_subset in its UpstreamPartitionsResult to demonstrate that the error is not caused by the mapping itself identifying missing mapped upstream partitions, but rather by Dagster's premature validation using unmapped downstream keys.
  • The file repro_bug_report.py contains the minimal code to demonstrate this.
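To make the suspected failure mode concrete, here is a minimal plain-Python sketch comparing dependency validation under an identity mapping with validation under a prefix-based mapping. It does not use or reproduce Dagster internals: `identity_mapping`, `prefix_mapping`, and `invalid_upstream_keys` are hypothetical names, and the prefix rule is an assumption based on the example keys above.

```python
from typing import Callable, Iterable, Set

# Upstream partition keys from the reproduction steps.
UPSTREAM_KEYS = {
    "groupA/item1/fileA.config",
    "groupA/item1/fileB.config",
    "groupB/item2/fileA.config",
    "groupB/item2/fileB.config",
}


def identity_mapping(downstream_key: str, upstream_keys: Iterable[str]) -> Set[str]:
    """Treat the downstream key as if it were itself an upstream key."""
    return {downstream_key}


def prefix_mapping(downstream_key: str, upstream_keys: Iterable[str]) -> Set[str]:
    """Assumed custom rule: upstream keys that start with '<downstream_key>/'."""
    return {u for u in upstream_keys if u.startswith(downstream_key + "/")}


def invalid_upstream_keys(
    downstream_keys: Iterable[str],
    upstream_keys: Set[str],
    mapping: Callable[[str, Iterable[str]], Set[str]],
) -> Set[str]:
    """Return required upstream keys that do not exist in upstream_keys."""
    required: Set[str] = set()
    for key in downstream_keys:
        required |= mapping(key, upstream_keys)
    return required - upstream_keys


selected = ["groupA/item1", "groupB/item2"]
invalid_with_identity = invalid_upstream_keys(selected, UPSTREAM_KEYS, identity_mapping)
invalid_with_prefix = invalid_upstream_keys(selected, UPSTREAM_KEYS, prefix_mapping)

print("identity:", sorted(invalid_with_identity))
print("prefix:  ", sorted(invalid_with_prefix))
```

With the identity mapping, the invalid set is exactly the two unmapped downstream keys, which matches the subset reported in the DagsterInvariantViolationError. With the prefix mapping, the invalid set is empty.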

Message from the maintainers

Impacted by this issue? Give it a 👍! We factor engagement into prioritization.

Labels: type: bug