SearchHit.sourceAsString adds superfluous type metadata #3002

Open
tastyminerals opened this issue Feb 16, 2024 · 2 comments

tastyminerals commented Feb 16, 2024

I am struggling to figure out how to deserialize a valid JSON hit string into a SearchHit. We use the JSON string -> SearchHit pattern a lot in our tests, and the SearchHit from the classic Elasticsearch library allowed us to do the following:

    def sourceFixtureAsSearchHit(fileName: String, docId: Int = 1): SearchHit = {
        val fixture = loadUTF8FixtureAsString(fileName)
        val source = new BytesArray(fixture)
        val hit = new SearchHit(docId)
        hit.sourceRef(source)
        hit
    }

The elastic4s SearchHit doesn't provide sourceRef. Hence, we parse the JSON string (via circe) into a map, which in this snippet is a Map[String, Json]:

    val sourceMap = parser.parse(fixture).getOrElse(Json.Null)
      .asObject.map(_.toMap)
      .getOrElse(Map.empty[String, Json])

and then pass it to the elastic4s SearchHit as SearchHit(_source = sourceMap). However, serializing it back via hit.sourceAsString produces a different JSON representation. For example, the original JSON

   {"document_id" : "0b85846f-2c7b-4cc8-b265-6c3fdf1da815"}

becomes

    {
      "document_id": {
        "value": "0b85846f-2c7b-4cc8-b265-6c3fdf1da815",
        "array": false,
        "null": false,
        "boolean": false,
        "number": false,
        "string": true,
        "object": false
      }
    }

This drastically increases the size of the resulting string: 214 lines become roughly 3k. So this doesn't look like the correct way to create a SearchHit from a string. How does one deserialize a string into a SearchHit?


tastyminerals commented Feb 19, 2024

Apparently, a workaround is possible: use .sourceAsMap instead of .sourceAsString and convert the map to a new JSON string on your end. So the question is: why does .sourceAsString add all that additional type metadata? If this goes unnoticed, significantly more data will be indexed :(
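
The workaround could look roughly like the sketch below. It assumes the map values are the circe Json instances that were put into _source earlier. SourceWorkaround and sourceMapToJsonString are hypothetical names, and the Map[String, AnyRef] parameter type is an assumption about what .sourceAsMap returns.

```scala
import io.circe.{Json, JsonObject}

object SourceWorkaround {
  // Hypothetical helper: rebuilds a compact JSON string from a source map
  // whose values are circe Json instances.
  // Assumption: the map comes from hit.sourceAsMap on a manually built SearchHit.
  def sourceMapToJsonString(source: Map[String, AnyRef]): String = {
    // Keep only entries that really hold circe Json; anything else is dropped.
    val fields: Map[String, Json] = source.collect { case (k, v: Json) => k -> v }
    // Let circe, not Jackson, do the printing.
    Json.fromJsonObject(JsonObject.fromMap(fields)).noSpaces
  }
}
```

Note that entries whose values are not circe Json are silently dropped by the collect call, so this only fits hits whose _source was built from circe values in the first place.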

@tastyminerals tastyminerals changed the title JSON string to SearchHit deserialization? SearchHit.sourceAsString adds superfluous type metadata Feb 19, 2024
@tastyminerals

This has something to do with Jackson, which elastic4s uses. Whenever a SearchHit is instantiated manually and _source is set, the downstream .sourceAsString generates JSON with type elements. For now we avoid it only by calling .sourceAsMap on the manually instantiated SearchHit, converting the resulting map into a Json object, and then turning that back into a String using circe.
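
A possible explanation, not confirmed in this thread: the extra keys (array, null, boolean, number, string, object) look like the isArray, isNull, isString, ... predicates of circe's Json class, which Jackson may pick up as bean properties when it serializes a value it has no dedicated serializer for. If that is the case, unwrapping the circe values into plain Scala values before building the SearchHit might avoid the problem. A minimal sketch, where PlainSource, toPlain and sourceMap are hypothetical names and the Long/Double mapping for numbers is an assumption:

```scala
import io.circe.{Json, parser}

object PlainSource {
  // Recursively unwrap a circe Json value into plain Scala values
  // (null, Boolean, Long/Double, String, Vector, Map).
  def toPlain(json: Json): Any = json.fold[Any](
    null,                                      // JSON null
    b => b,                                    // JSON boolean
    n => n.toLong.fold[Any](n.toDouble)(l => l), // prefer Long, else Double (assumption)
    s => s,                                    // JSON string
    arr => arr.map(toPlain),                   // JSON array -> Vector[Any]
    obj => obj.toMap.map { case (k, v) => k -> toPlain(v) } // JSON object -> Map
  )

  // Parse a fixture string into a map of plain values.
  // Invalid JSON or a non-object top level yields an empty map.
  def sourceMap(fixture: String): Map[String, Any] =
    parser.parse(fixture).toOption
      .flatMap(_.asObject)
      .map(_.toMap.map { case (k, v) => k -> toPlain(v) })
      .getOrElse(Map.empty[String, Any])
}
```

The resulting map could then be passed as _source. Whether .sourceAsString then returns the compact form depends on how the Jackson mapper in elastic4s handles Scala collections, which is not verified here.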
