diff --git a/core/connectors/sinks/mongodb_sink/README.md b/core/connectors/sinks/mongodb_sink/README.md index 550612cc3..5e0dd4cfa 100644 --- a/core/connectors/sinks/mongodb_sink/README.md +++ b/core/connectors/sinks/mongodb_sink/README.md @@ -137,10 +137,11 @@ This connector provides **at-least-once** delivery semantics. ### Behavior - Messages may be delivered more than once on retry or restart -- Uses Iggy message ID as MongoDB `_id` for document identity -- **Insert-only mode**: duplicate key error is a hard failure (not upsert) +- Uses a deterministic composite MongoDB `_id`: `stream:topic:partition:message_id` +- Duplicate key collisions are treated as idempotent replay of already-written messages +- The sink remains insert-only; it does not upsert existing documents ### Known Limitations -- On network timeout during insert, retry may cause duplicate key error -- Sink does not upsert on duplicate (future improvement) +- On network timeout during insert, MongoDB may partially commit a batch before returning an error +- The sink does not upsert on duplicate; replay safety relies on deterministic `_id` values and duplicate-key tolerance diff --git a/core/integration/tests/connectors/mongodb/mongodb_sink.rs b/core/integration/tests/connectors/mongodb/mongodb_sink.rs index 37c522e34..474b809ed 100644 --- a/core/integration/tests/connectors/mongodb/mongodb_sink.rs +++ b/core/integration/tests/connectors/mongodb/mongodb_sink.rs @@ -261,7 +261,7 @@ async fn large_batch_processed_correctly(harness: &TestHarness, fixture: MongoDb server(connectors_runtime(config_path = "tests/connectors/mongodb/sink.toml")), seed = seeds::connector_stream )] -async fn duplicate_key_is_explicit_failure_and_not_silent_success( +async fn duplicate_key_is_idempotent_replay_not_sink_error( harness: &TestHarness, fixture: MongoDbSinkFixture, ) {