[Bug]: MongoDB Output transform fails to match on ObjectId fields after BSON 5.x upgrade #7183

@ttideveloper

Description

Apache Hop version?

2.17.0

Java version?

openjdk version "17.0.19" 2026-04-21 LTS
OpenJDK Runtime Environment Microsoft-13877129 (build 17.0.19+10-LTS)
OpenJDK 64-Bit Server VM Microsoft-13877129 (build 17.0.19+10-LTS, mixed mode, sharing)

Operating system

Windows

What happened?

Component: MongoDB Plugin (hop-transform-mongodb)

Description:

After the upgrade to BSON driver 5.6.1 in Hop 2.17, the MongoDB Output transform can no longer use _id (or any ObjectId-typed field) as an update match field. This is a regression from Hop 2.16.

Root cause: MongoDbOutputData.setMongoValueFromHopValue() calls Document.parse() on the incoming field value when json_field=Y. For ObjectId fields, the value is MongoDB extended JSON: {"$oid": "6a15efdd0a93aa5ed3b4b947"}. In BSON 4.x, Document.parse() handled this correctly. In BSON 5.x, the parser recognizes {"$oid": "..."} as an OBJECT_ID type at the top level and throws BsonInvalidOperationException because readStartDocument expects DOCUMENT, not OBJECT_ID.

Stack trace:

org.bson.BsonInvalidOperationException: readStartDocument can only be called when CurrentBSONType is DOCUMENT, not when CurrentBSONType is OBJECT_ID.
at org.bson.AbstractBsonReader.verifyBSONType(AbstractBsonReader.java:689)
at org.bson.AbstractBsonReader.checkPreconditions(AbstractBsonReader.java:721)
at org.bson.AbstractBsonReader.readStartDocument(AbstractBsonReader.java:449)
at org.bson.codecs.DocumentCodec.decode(DocumentCodec.java:173)
at org.bson.codecs.DocumentCodec.decode(DocumentCodec.java:44)
at org.bson.Document.parse(Document.java:129)
at org.bson.Document.parse(Document.java:114)
at org.apache.hop.pipeline.transforms.mongodboutput.MongoDbOutputData.setMongoValueFromHopValue(MongoDbOutputData.java:818)
at org.apache.hop.pipeline.transforms.mongodboutput.MongoDbOutputData.getQueryObject(MongoDbOutputData.java:576)
at org.apache.hop.pipeline.transforms.mongodboutput.MongoDbOutput.processRow(MongoDbOutput.java:171)
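
One possible way to sidestep the failing call, sketched under assumptions rather than taken from the Hop codebase: check whether the incoming value is a bare {"$oid": "..."} wrapper before it reaches Document.parse(), and pull out the hex string so the caller can build the ObjectId directly (for example with new ObjectId(hex) from org.bson.types). The names ObjectIdJson and extractOidHex are hypothetical, and the sketch uses only the JDK.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hop or the BSON library.
final class ObjectIdJson {

    // Matches only a bare {"$oid": "<24 hex chars>"} value, allowing surrounding whitespace.
    private static final Pattern TOP_LEVEL_OID =
        Pattern.compile("\\s*\\{\\s*\"\\$oid\"\\s*:\\s*\"([0-9a-fA-F]{24})\"\\s*\\}\\s*");

    private ObjectIdJson() {}

    // Returns the 24-character hex string if json is a top-level $oid wrapper, else empty.
    static Optional<String> extractOidHex(String json) {
        if (json == null) {
            return Optional.empty();
        }
        Matcher m = TOP_LEVEL_OID.matcher(json);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

Any value for which this returns empty (ordinary documents, nested ObjectIds, malformed hex) would continue down the existing Document.parse() path unchanged.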

Workarounds attempted:

  • Setting json_field=N avoids the parse crash, but the _id value is then passed as a String, which won't match an ObjectId in MongoDB (different BSON types). Updates silently match zero documents.
  • Matching on alternative non-ObjectId fields (e.g., integer fields) works but sacrifices the built-in _id index and requires schema-specific workarounds.

Reproduction test (JUnit 5, depends on org.mongodb:bson:5.6.1):

import org.bson.BsonInvalidOperationException;
import org.bson.Document;
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.*;

class BsonObjectIdParseTest {

    @Test
    void parseObjectIdExtendedJson_failsOnBson5() {
        // Reproduces the crash in MongoDbOutputData.setMongoValueFromHopValue()
        assertThrows(BsonInvalidOperationException.class, () -> {
            Document.parse("{\"$oid\": \"6a15efdd0a93aa5ed3b4b947\"}");
        });
    }

    @Test
    void parseDocumentContainingObjectId_stillWorks() {
        // Wrapping inside a parent document works, which suggests a possible fix path
        Document doc = Document.parse("{\"_id\": {\"$oid\": \"6a15efdd0a93aa5ed3b4b947\"}}");
        assertNotNull(doc.getObjectId("_id"));
    }

}
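
The wrapping idea from the second test could be sketched as a small helper that embeds the incoming JSON value under a throwaway key before parsing. This is only an illustration: JsonValueWrapper, wrap and WRAPPER_KEY are hypothetical names, and the block covers just the JDK-only string step so that it runs without the BSON library.

```java
// Hypothetical helper, not part of Hop or the BSON library.
final class JsonValueWrapper {

    // Throwaway key under which the original value is nested (name is an assumption).
    static final String WRAPPER_KEY = "value";

    private JsonValueWrapper() {}

    // Turns <json> into {"value": <json>} so the parser always sees a DOCUMENT at the top level.
    static String wrap(String json) {
        return "{\"" + WRAPPER_KEY + "\": " + json.trim() + "}";
    }

    // Assumed follow-up with org.mongodb:bson on the classpath:
    //   Object parsed = org.bson.Document.parse(wrap(json)).get(WRAPPER_KEY);
    // For {"$oid": "..."} this should yield an ObjectId, as in the second test above.
}
```

For the value in this report, wrap("{\"$oid\": \"6a15efdd0a93aa5ed3b4b947\"}") produces {"value": {"$oid": "6a15efdd0a93aa5ed3b4b947"}}, which has the same shape as the input that parses successfully in parseDocumentContainingObjectId_stillWorks.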

Issue Priority

Priority: 3

Issue Component

Component: Other
