Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect missing deserializer before hitting JSON "slow path" under polymorphism #2478

Closed
alshain opened this issue Oct 16, 2023 · 7 comments
Closed
Assignees
Labels

Comments

@alshain
Copy link

alshain commented Oct 16, 2023

Describe the bug

When attempting to decode data that contains unknown types (e.g. generated by a newer version of our application), the StreamingJsonDecode abandons streaming and tries to build the entire (huge) structure in-memory using JsonObject etc, which would ultimately fail anyway because the deserializer simply isn't available.

I think it'd be possible to error out earlier and not fall back to the non-streaming approach in such a case.

I've tracked this down to this fragment in StreamingJsonDecode:

            val discriminator = deserializer.descriptor.classDiscriminator(json)
            val type = lexer.peekLeadingMatchingValue(discriminator, configuration.isLenient)
            var actualSerializer: DeserializationStrategy<Any>? = null
            if (type != null) {
                actualSerializer = deserializer.findPolymorphicSerializerOrNull(this, type)
            }
            if (actualSerializer == null) {
                // Fallback if we haven't found discriminator or serializer
                return decodeSerializableValuePolymorphic<T>(deserializer as DeserializationStrategy<T>)
            }

Suppose type != null because we did find the discriminator, but actualSerializer == null because the serializer isn't known in the current application version. Is there even a point to try to fallback to the "slow path"?

It'd be nice if you could error out right then and there before entering the slow-path.

Are there cases where it's required to enter this code-path when you find the discriminator but don't find a deserializer? From my reading of the code, you always error out here:

val actualSerializer = deserializer.findPolymorphicSerializerOrNull(this, type)
        ?: throwSerializerNotFound(type, jsonTree)
@sandwwraith
Copy link
Member

Hi, thanks for your investigation and detailed report. Yes, I think you're right — when the serializer can't be found by the discriminator, it is an error anyway. The only difference is that throwSerializerNotFound accepts jsonTree to append it to the error message. But this can also be done directly from the input string.
By the way, do you have any benchmarks or measurements on how this affects your application preformance?

@sandwwraith sandwwraith self-assigned this Oct 16, 2023
@alshain
Copy link
Author

alshain commented Oct 16, 2023

@sandwwraith Awesome, thanks!

By the way, do you have any benchmarks or measurements on how this affects your application preformance?

Not really, we only noticed that the slow-path exists because we went OOM because of it. We might not have noticed otherwise. We deserialize 120MiB of JSON and it's not exceedingly fast in the first place.

But thanks to the OOM, we can see that it took at least 1.2 gigs of additional memory but I don't know much longer or more memory it would have taken before finishing.

Ideally, it wouldn't print the remaining 119MiB of JSON into the log :)

image

@sandwwraith
Copy link
Member

But thanks to the OOM, we can see that it took at least 1.2 gigs of additional memory but I don't know much longer or more memory it would have taken before finishing.

That's sufficient evidence, thanks. Not sure how you are expecting it to be deserialized when class discriminator is known, though :)

Ideally, it wouldn't print the remaining 119MiB of JSON into the log

Of course, there's truncation:

private fun CharSequence.minify(offset: Int = -1): CharSequence {

@alshain
Copy link
Author

alshain commented Oct 17, 2023

Not sure how you are expecting it to be deserialized when class discriminator is known, though :)

Under streaming, it works just fine. The resulting object graph is ~250MiB in size according to a heap dump.

sandwwraith added a commit that referenced this issue Oct 17, 2023
Previously, when discriminator was found as the first key in Json, but there was no deserializer for it, we still fell back to a slow path with JsonTree.
It was actually meaningless because a slow path always throws exception when a serializer is not found.
Such behavior led to unnecessary memory pressure & consumption in exceptional cases (see linked ticket for details).

Also make polymorphic deserialization exception messages more meaningful and make them more consistent with serialization ones.

Also fix behavior when the actual discriminator value is JsonNull (it should be treated as missing, not as "null" string).

Fixes #2478
@alshain
Copy link
Author

alshain commented Oct 23, 2023

@sandwwraith Thanks for implementing the fix. Do you have a timeline on the release? 😇

@sandwwraith
Copy link
Member

@alshain A week or two after Kotlin 1.9.20

@alshain
Copy link
Author

alshain commented Nov 16, 2023

Thank you ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants