
Merge Strategy Configuration Inconsistency Between Code and Schema #2412

@kang8

Description


dlt version

dlt 1.8.0

Describe the problem

When I change a resource's merge strategy from "upsert" to "delete-insert" in code, the change is not reflected in the schema stored in the "_dlt_version" table. The pipeline keeps using the previously stored "upsert" strategy instead of the "delete-insert" strategy now specified in code. As a result, duplicates are not handled the way the new merge strategy is supposed to handle them.

Expected behavior

When the merge strategy specified in code does not match the one stored in the "_dlt_version" schema, dlt should either:

  1. Update the schema automatically to reflect the new merge strategy, or, preferably,
  2. Raise a clear error that points out the mismatch and suggests a full refresh to update the schema (e.g., "The merge strategy in your code (delete-insert) doesn't match the one in the schema (upsert). Consider using full_refresh to update your schema.")

I suggest implementing solution #2 (raising an error on a mismatch) rather than updating the schema automatically, because it gives users explicit control over their pipeline and makes it clear what is happening.
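A minimal sketch of the check proposed in solution #2, written against plain dictionaries rather than dlt internals. It assumes that the code side declares the strategy the way dlt resources do (`write_disposition={"disposition": "merge", "strategy": "delete-insert"}`) and that the stored table schema records it under an `x-merge-strategy` key. The error class and helper names are hypothetical, not part of dlt.

```python
from typing import Any, Mapping, Optional


class MergeStrategyMismatchError(Exception):
    """Hypothetical error type for the mismatch check proposed above."""


def stored_merge_strategy(schema: Mapping[str, Any], table_name: str) -> Optional[str]:
    # Assumed stored-schema layout: {"tables": {<name>: {"x-merge-strategy": ...}}}
    table = schema.get("tables", {}).get(table_name, {})
    return table.get("x-merge-strategy")


def check_merge_strategy(
    code_write_disposition: Mapping[str, str],
    schema: Mapping[str, Any],
    table_name: str,
) -> None:
    """Raise if the strategy declared in code differs from the stored one."""
    wanted = code_write_disposition.get("strategy")
    stored = stored_merge_strategy(schema, table_name)
    # Nothing to compare on a first run or when the code sets no strategy.
    if wanted is None or stored is None:
        return
    if wanted != stored:
        raise MergeStrategyMismatchError(
            f"The merge strategy in your code ({wanted}) doesn't match the one "
            f"in the schema ({stored}). Consider using full_refresh to update your schema."
        )


# Example: the schema from the first run still says "upsert".
stored_schema = {
    "tables": {
        "contacts": {"write_disposition": "merge", "x-merge-strategy": "upsert"},
    }
}

try:
    check_merge_strategy(
        {"disposition": "merge", "strategy": "delete-insert"}, stored_schema, "contacts"
    )
except MergeStrategyMismatchError as exc:
    print(exc)
```

Failing loudly here, instead of silently keeping the stored hint, is the behavior the report asks for.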

Steps to reproduce

  1. Create a dlt pipeline with a resource using "upsert" merge strategy
  2. Run the pipeline to load some data
  3. Change the merge strategy in your code from "upsert" to "delete-insert"
  4. Run the pipeline again
  5. Observe that despite the code change, the "upsert" strategy is still being used
  6. Verify by checking that duplicate data isn't being handled according to "delete-insert" logic
  7. Run with a full refresh (pipeline.run(full_refresh=True)) and confirm that the expected "delete-insert" behavior is now observed
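To make step 6 concrete, here is a toy in-memory model, not dlt's generated SQL, of why duplicates in a single batch can survive an upsert-style load but not a delete-insert load. It assumes that delete-insert deduplicates the incoming batch by primary key before deleting and inserting, and that the upsert model matches each incoming row only against rows already in the target, so two new rows with the same key are both inserted. Real destinations may differ (for example, some MERGE implementations reject duplicate source keys outright). The function names are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, object]


def delete_insert(target: List[Row], incoming: List[Row], key: str) -> List[Row]:
    """Toy delete-insert: dedupe the batch by key (last row wins),
    delete matching target rows, then insert the deduplicated rows."""
    deduped: Dict[object, Row] = {}
    for row in incoming:
        deduped[row[key]] = row  # later rows overwrite earlier ones
    kept = [dict(r) for r in target if r[key] not in deduped]
    return kept + [dict(r) for r in deduped.values()]


def upsert(target: List[Row], incoming: List[Row], key: str) -> List[Row]:
    """Toy upsert: rows whose key already exists in the target are updated,
    all others are inserted. The batch itself is NOT deduplicated."""
    result = [dict(r) for r in target]
    existing = {r[key] for r in target}  # matched against the pre-load target only
    for row in incoming:
        if row[key] in existing:
            for r in result:
                if r[key] == row[key]:
                    r.update(row)
        else:
            result.append(dict(row))
    return result


target = [{"id": 1, "email": "a@example.com"}]
batch = [{"id": 2, "email": "b@example.com"}, {"id": 2, "email": "b2@example.com"}]

print(upsert(target, batch, "id"))         # id 2 appears twice
print(delete_insert(target, batch, "id"))  # id 2 appears once
```

Under these assumptions, a run that silently keeps "upsert" leaves the duplicate `id` in place, which matches the symptom described in steps 5 and 6.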

Operating system

macOS, Linux

Runtime environment

Kubernetes

Python version

3.11

dlt data source

intercom contacts https://developers.intercom.com/docs/references/rest-api/api.intercom.io/contacts/searchcontacts

dlt destination

Snowflake

Other deployment details

No response

Additional information

This issue creates a confusing experience for users changing merge strategies. When changing from "upsert" to "delete-insert" to handle duplicates better, users expect the change to take effect immediately. Instead, the old behavior persists without any clear indication of why.

Metadata


Labels

question (Further information is requested)


Projects

Status

Done
