Support oneof declaraction in protobuf #2546

xiaozhikang0916 · 2024-01-16T15:02:50Z

It is an implementation of #2538 , to support oneof declaraction in protobuf.

Some major changes:

Introduce a new annotation ProtoOneOf to mark a property of class should be decoded and encoded in the oneof way.
ProtoNumber now can be attached to a class, to mark the proto id if such class is an implementation of oneof field.
SealedClassSerializer.subclassSerializers now is a public property, because when decoding message with oneof mark, I need to find the related serializer of concrete class with ProtoNumber annotation, assisted by change 2.

For change 3, there may be some other implementation to achieve it. For example, binding the proto id with specific serializer in ProtoOneOf annotation like

public annotation class ProtoOneOf(public vararg val numbers: ProtoOneOfPair)

public annotation class ProtoOneOfPair(public val number: Int, public val serializer: KClass<KSerializer<*>>)

But it may be tedious for user to write some super heavy tag for a property in data class, and hard to know that the protoNumber annotated in oneof field has priority to number in inner proerty. so current implementation became my choise.

If everything in this PR is OK, I will continue to update comments of code and docs.

pdvrieze

Looks like a good starting point. It probably makes sense to also allow non-sealed polymorphic children.
One thing I'm not sure that you handled is how to deal with child classes that are not annotated with @ProtoNum (you may want to ignore them/throw an error on encoding).

formats/protobuf/commonMain/src/kotlinx/serialization/protobuf/internal/Helpers.kt

formats/protobuf/commonMain/src/kotlinx/serialization/protobuf/internal/ProtobufDecoding.kt

formats/protobuf/commonMain/src/kotlinx/serialization/protobuf/internal/ProtobufEncoding.kt

xiaozhikang0916 · 2024-01-17T14:48:22Z

One thing I'm not sure that you handled is how to deal with child classes that are not annotated with @ProtoNum (you may want to ignore them/throw an error on encoding).

It should be an error in this case. Without the annotation it can never get a proper ProtoNumber since there is no guarantee on the order of the sub serializer, and not all oneof fields start with id 1.

But it is hard to find a proper time to throw an error. The best choise should be in compile time but a specific format has nothing to do with the plugin. Maybe perform a traversal check every time an oneof field is encountered?

pdvrieze · 2024-01-17T17:53:02Z

For a type that contains a OneOf you enumerate all leaf serializers anyway. At that point you can reliably throw an exception of there are any that have missing annotations. You could theoretically do it by deep inspection of the types, but that would be overkill. From my perspective reliable runtime errors can be acceptable.

The main thing to keep in mind is how it interacts with changed/updated classes (different library version/classloader shenanigans, etc.) when that is not direct under developer control (say you have a library that uses protobuf, some user implements one of the interfaces (when not sealed) and now the application doesn't work because the library uses protobuf internally... (maybe required annotations are something that the plugin could be made to support).

Complicating is that deserialization can safely ignore types without ProtoNum, they will never match. Serialization, when such children are present, will be an issue as there is no proper protoNum present. As an option, it could be an idea to support fallback to general polymorphic format (you'd only need one "special" value number that is used in such case - the @ProtoNum value of the "owner field" would be backwards compatible).

xiaozhikang0916 · 2024-01-18T05:15:36Z

it could be an idea to support fallback to general polymorphic format

I might argue that it would lead to inconsistency of encoding and decoding. The field annotated with ProtoOneOf has no number, and the proto number of property in child class may conflict with existing field and casue overwritting. The serializer cannot decode the same value from what it encoded to. This behaviour is error prone, hard to debug and confusing.

how it interacts with changed/updated classes

It could be some trouble theoretically, but it is more a core lib issue here in my opinion. The same usage may also fail when not using ProtoOneOf or even not in ProtoBuf format, and if some one managed to make it work with custom serialize module passing in, just specify the proto number of serializer when registering.

pdvrieze

It is possible to implement this without depending on AbstractPolymorphicSerializer. This implementation expects certain internal behaviour from the serialization library that is not part of its contract. Instead it can use the OneOfEncoder/Decoder to synthesize the expected structure for polymorphic types - and leave determining the (de)serializer to use to the actual serializer implementation (rather than assuming 2 cases and re-implementing their logic).

formats/protobuf/commonMain/src/kotlinx/serialization/protobuf/internal/ProtobufDecoding.kt

formats/protobuf/commonMain/src/kotlinx/serialization/protobuf/internal/ProtobufEncoding.kt

formats/protobuf/commonMain/src/kotlinx/serialization/protobuf/internal/ProtobufDecoding.kt

xiaozhikang0916 · 2024-01-21T04:31:21Z

The shceme generator on current branch can only generate oneof scheme for sealed type, by the support of all subserializer encoded in the sealed serializer descriptor.

For an open one, the ProtoBuf instance should be passed in to achieve the SerializersModule and iterate the SerialModuleImpl.polyBase2Serializers.

What do you think of it? @rotilho

rotilho · 2024-01-21T09:09:00Z

Makes sense to me.

I wonder though if using a repeatable @ProtobufOneOf to pair the num with a subclass wouldn't be a more straightforward approach. This, however, would force to know upfront all subclasses, probably not a problem for sealed but may be to open polymorphism.

xiaozhikang0916 · 2024-01-21T11:39:22Z

I would like to share my use case for context.

Our groups are using proto3 syntex to define RPC methods and messages between clients(iOS and android) and servers(go). As we are adopting KMM in clients, I am migrating the data class in Java generated by proto plugin to kotlin data class, which are generated in compile time.

In this scenario, all the data classes are static snd not editable, to match the definition of messages in proto files. So the oneof elements are enumerable, and sealed interface fit this case best. That why I prefer sealed polymorphic in my implementation.

Though supporting of open polymorphic has some meaning in theory, I still wonder if there are some actual use cases of it. In my opinion:

If the proto file exists, the oneof elements are settled, so the declaraction can be sealed.
If there is no proto file, such as communicating between kotlin processes, oneof is not necessary here, just use the common polymorphic serializer.
If developer wants to use different kinds rather than the pre-defined sealed class, just map them in runtime.

I wonder though if using a repeatable @ProtobufOneOf to pair the num with a subclass wouldn't be a more straightforward approach.

@rotilho As I said in the top description, this way of declaraction can do the work, but I don't think it is a good api, as developer need to write so many annotations on a property( in my case, there will be more than 20), making it unreadable. And for sealed interfaces which I prefer most, the mapping relations are settled at the moment they are defined, meaning that these mappings are redundent.

xiaozhikang0916 · 2024-01-30T05:18:38Z

Rewrote decoding in contract of common polymorphic serializer.

But it still needs to find the actual serializer to get a proper serial name for polymorphic serializer.

@pdvrieze

xiaozhikang0916 · 2024-02-05T16:44:34Z

Sorry for my urging, but is there any work to do for this request to be accepted?

rotilho · 2024-02-15T21:31:42Z

Hey @pdvrieze. Any chance to take a look on this PR when you have some time? I'd love to be able to use it

xiaozhikang0916 · 2024-02-23T15:17:35Z

Hi @sandwwraith could you please also take a look on this PR whether it is qualified to be merged.

I am looking forward to use it in my product code.

sandwwraith · 2024-02-29T11:21:58Z

@xiaozhikang0916 Sure thing, I'll take a look at it soon

xiaozhikang0916 · 2024-04-01T08:43:48Z

Found some api check failed in the TeamCity check, but running apiDump in pure dev branch would add public synthetic class kotlinx/serialization/json/internal/JsonStreamsKt$EntriesMappings, which should not exist, in the Json format api file.

Just remove it from commit history.

sandwwraith

Good job, I like the idea overall. However, I have two big design questions:

Is val numbers in @ProtoOneOf really required? The annotation itself is necessary to differentiate between regular polymorphism and oneOf. However, numbers in it seems excessive. First, we can already get them from descriptors using a mechanism similar to descriptor.getActualOneOfSerializer. It is heavy processing, but result of extractProtoOneOfIds (the only place using annotation.numbers) is cached anyway. Second, it is too easy to make a mistake where you put @ProtoNumber on a class but forget to update @ProtoOneOf content.
OneOf in protobuf is defined inside the outer message (i.e. class) itself, while we're using @ProtoNumber on a definition of a nested message. In other words, if I have @ProtoNumber(4) class X: SomeIface, I have committed to use number 4 for it in everywhere. And if I already have class Y(@ProtoNumber(4) val s: String), I cannot add val x: SomeIface to it, because I cannot override the number on a use-site of oneof class. This looks like a big design limitation; perhaps you can say more if it will bother people or not.

I haven't proof-read docs yet, but I've left comments about implementation. PTAL at the comments on tests — looks like they're working incorrectly with value classes.

formats/protobuf/commonMain/src/kotlinx/serialization/protobuf/internal/ProtobufDecoding.kt

core/commonMain/src/kotlinx/serialization/SealedSerializer.kt

formats/protobuf/commonMain/src/kotlinx/serialization/protobuf/internal/ProtobufDecoding.kt

formats/protobuf/commonTest/src/kotlinx/serialization/protobuf/schema/SchemaValidationsTest.kt

formats/protobuf/commonTest/src/kotlinx/serialization/protobuf/ProtobufOneOfTest.kt

xiaozhikang0916 · 2024-04-04T09:18:24Z

Is val numbers in @ProtoOneOf really required?

You are right, numbers here is unnecessary. It may need a benchmark in the future, but works fine for now.

ProtoNumber of oneof

In a protobuf-file-based develop mode, the proto files will be the source-of-truth to generate data classes with some plugin. In the definition of protobuf, each oneof field can only be declared in a message and cannot be reused. So the generated sealed interface has one-to-one relation with a concreate data class, no need to worry about the id conflicts.

In a kotlin-based develop mode, the case what you worry about may happen. But in practice, oneof is a special mechanism of protobuf that developers will have few needs to depend on. In other words, why not just use the common polymorphic of common serialization?

However, as a designed limitation, I would add some more explaination in the docs, and some runtime check in ProtobufDecoding for duplated proto ids, but cannot find some simular check-point in ProtobufEncoding.

sandwwraith · 2024-04-04T10:44:54Z

But in practice, oneof is a special mechanism of protobuf that developers will have few needs to depend on.

Perhaps. However, you can write Kotlin classes without a special proto plugin to be compatible with the existing schema. I'm not sure if it is popular in protobuf to reuse messages in oneof, or if everyone just uses unique classes for that. Anyway, we can work with this limitation and look at the feedback

sandwwraith · 2024-04-04T10:47:01Z

Oh, and about

Found some api check failed in the TeamCity check, but running apiDump in pure dev branch would add public synthetic class kotlinx/serialization/json/internal/JsonStreamsKt$EntriesMappings, which should not exist, in the Json format api file.

This is a bug in BCV (Kotlin/binary-compatibility-validator#141) which is fixed in 0.14.0. I'll update the version soon.

However, there are some examples failing, probably due to a new duplicate check: https://teamcity.jetbrains.com/buildConfiguration/KotlinTools_KotlinxSerialization_RuntimeLibraryBuildAggregated/4559581?hideTestsFromDependencies=false&hideProblemsFromDependencies=false&expandBuildChangesSection=true&expandBuildTestsSection=true&expandBuildProblemsSection=true

xiaozhikang0916 · 2024-04-04T13:11:49Z

I'm not sure if it is popular in protobuf to reuse messages in oneof.

It is totally fine to reuse message between oneof fields. Let me elaborate a little bit.

Giving messages like

message ShareMessage {
    string common = 1;
}

message Some {
    string name = 1;
    oneof pack {
        ShareMessage share_in_some = 2;
    }
}

Definations of message Some in kt should be data class Some the outer class, sealed interface IPack the oneof interface, data class ShareInSome(val value: ShareMessage): IPack message holders for each oneof field with proto ids, and data class ShareMessage, the message to be sharing.

Now we share ShareMessage in another message:

message Another {
    string another_name = 1;
    oneof another_pack {
        ShareMessage share_in_another = 3;
    }
}

Note that we cannot share pack in Some in a protobuf file, but only redefine another oneof field with every messages we want to share, e.g. ShareMessage here.

The oneof interface for previous class IPack, as a representative of the oneof pack declaraction, should not be reused either. We should have another oneof interface IAnotherPack, another group of message holders with proto ids stand for theriselves. In this way, each group of message holders and oneof interface can only be used once, which prevent the id conflicts.

In summary, oneof in protobuf is a complecated mechanism, and using it in kotlin data class requires some prerequisites for knowledge of protobuf.

sandwwraith · 2024-04-04T14:45:26Z

Thanks for the explanation. I've re-read everything — subclasses are meant to be used in one particular place, and there is even a requirement mentioned that they should have exactly one property. In that case, they don't have to be reusable.
It is not entirely an optimal way to do this, but I think it is the best way we can do it in case a polymorphism is required.

Is it possible though, to enforce 'subclass should have only one property' requirement via SerialDescriptor.elementCount == 1? This place is easy to miss in the docs, and some runtime aid may be helpful.

sandwwraith · 2024-04-04T15:09:17Z

Is it possible though, to enforce 'subclass should have only one property' requirement via SerialDescriptor.elementCount == 1? This place is easy to miss in the docs, and some runtime aid may be helpful.

UPD: found require(descriptor.elementsCount == 1) in OneOfElementEncoder

xiaozhikang0916 · 2024-04-04T15:40:05Z

Yes, both encoder and decoder have a check of elementsCount in init block during runtime. It would be better to have a check in buildtime but can do nothing more inside a particular serialization format lib, and I think it is done enough for now.

docs/formats.md

formats/protobuf/commonMain/src/kotlinx/serialization/protobuf/ProtoTypes.kt

formats/protobuf/commonMain/src/kotlinx/serialization/protobuf/internal/Helpers.kt

formats/protobuf/commonMain/src/kotlinx/serialization/protobuf/internal/ProtobufEncoding.kt

formats/protobuf/commonTest/src/kotlinx/serialization/protobuf/ProtobufOneOfTest.kt

formats/protobuf/commonMain/src/kotlinx/serialization/protobuf/internal/ProtobufDecoding.kt

formats/protobuf/commonTest/src/kotlinx/serialization/protobuf/ProtobufOneOfTest.kt

sandwwraith · 2024-04-04T16:21:36Z

t would be better to have a check in buildtime but can do nothing more inside a particular serialization format lib, and I think it is done enough for now.

Yes, unfortunately we do not have any special mechanisms to provide formats with compile-time checks.

xiaozhikang0916 · 2024-04-06T12:19:31Z

I have re-thought about the design of oneof declaration. Is it necessary to annotate the message holder class with ProtoNumber?

The message holder, as described above, should have only 1 property. We can still use the origin form of ProtoNumber annotated at this property, as the one we need.

In this way, we need no to introduce another annotation target to ProtoNumber, need no to think about that which id will override another. And oneof class can be defined in the same way developers get used of, reducing the likelihood of errors.

In fact, there were some error found in my implementation and test case during this refactoring, which should be fixed right now.

By the way, I think writing unit tests with data in binary format is quite cryptic here, e.g. hard to change the hex string after proto id updated. Maybe some utils like protoscope is necessary in preparing binary data.

sandwwraith · 2024-04-25T10:50:32Z

It seems you need to run knitCheck to update documentation (likely because I've suggested to add a comment to a code sample)

sandwwraith

Thank you for such a big contribution, it was nice working with you!

rotilho · 2024-04-26T09:10:07Z

Awesome guys! Thank you very much for working on this 🎉🎉🎉

pdvrieze reviewed Jan 17, 2024

View reviewed changes

pdvrieze reviewed Jan 18, 2024

View reviewed changes

xiaozhikang0916 requested a review from pdvrieze January 25, 2024 02:45

xiaozhikang0916 force-pushed the feature/protobuf-oneof branch from e5e703e to 533881d Compare January 30, 2024 04:30

xiaozhikang0916 force-pushed the feature/protobuf-oneof branch 2 times, most recently from 2ba6984 to 37b3c3e Compare April 1, 2024 08:41

sandwwraith requested changes Apr 3, 2024

View reviewed changes

xiaozhikang0916 requested a review from sandwwraith April 4, 2024 10:05

sandwwraith requested changes Apr 4, 2024

View reviewed changes

xiaozhikang0916 added 12 commits April 25, 2024 10:06

Add duplicated id check in decoding

f01723a

Update protobuf docs

9d5a401

Check unique proto id only for class type

4a0adf5

Api dump for core lib

88311e7

Update docs by review

ac0af92

Some cleaning

b60c76c

Fix schema generator for oneof message

2a80a1f

Some exception messages and tests

d2eb264

Use property proto number in oneof class

8ac87da

Update docs

014817f

Update docs

6dec429

Add comment in proto oneof test for hex data

17f3022

xiaozhikang0916 force-pushed the feature/protobuf-oneof branch from 14de8d0 to 2ec2edc Compare April 25, 2024 06:23

xiaozhikang0916 added 8 commits April 25, 2024 14:56

Comments and test for proto oneOf id index mapping

3fd14b0

Comments and test using non-polymorphic class in oneOf

e5f6755

Revert docs of ProtoNumber

ee1f7ec

Docs of ProtoOneOf

96e04b7

Update docs by review

c056e5d

Refine exception msg if bad oneof def found

a5a53b7

Checking one of element with 2 properties should fail

848885a

No checking proto id duplication

79dfb9d

xiaozhikang0916 force-pushed the feature/protobuf-oneof branch from 2ec2edc to 79dfb9d Compare April 25, 2024 06:57

Clean comment of extractParameters

09550db

xiaozhikang0916 mentioned this pull request Apr 25, 2024

[ProtoBuf] Should we check duplicating of proto id? #2653

Open

update docs dump

188f998

sandwwraith approved these changes Apr 25, 2024

View reviewed changes

sandwwraith merged commit 251bca7 into Kotlin:dev Apr 25, 2024
3 checks passed

xiaozhikang0916 deleted the feature/protobuf-oneof branch April 26, 2024 00:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support oneof declaraction in protobuf #2546

Support oneof declaraction in protobuf #2546

xiaozhikang0916 commented Jan 16, 2024 •

edited

Loading

pdvrieze left a comment

xiaozhikang0916 commented Jan 17, 2024

pdvrieze commented Jan 17, 2024

xiaozhikang0916 commented Jan 18, 2024

pdvrieze left a comment

xiaozhikang0916 commented Jan 21, 2024

rotilho commented Jan 21, 2024 •

edited

Loading

xiaozhikang0916 commented Jan 21, 2024 •

edited

Loading

xiaozhikang0916 commented Jan 30, 2024

xiaozhikang0916 commented Feb 5, 2024

rotilho commented Feb 15, 2024

xiaozhikang0916 commented Feb 23, 2024

sandwwraith commented Feb 29, 2024

xiaozhikang0916 commented Apr 1, 2024

sandwwraith left a comment

xiaozhikang0916 commented Apr 4, 2024

sandwwraith commented Apr 4, 2024

sandwwraith commented Apr 4, 2024

xiaozhikang0916 commented Apr 4, 2024

sandwwraith commented Apr 4, 2024

sandwwraith commented Apr 4, 2024

xiaozhikang0916 commented Apr 4, 2024

sandwwraith commented Apr 4, 2024

xiaozhikang0916 commented Apr 6, 2024

sandwwraith commented Apr 25, 2024

sandwwraith left a comment

rotilho commented Apr 26, 2024

Support oneof declaraction in protobuf #2546

Support oneof declaraction in protobuf #2546

Conversation

xiaozhikang0916 commented Jan 16, 2024 • edited Loading

pdvrieze left a comment

Choose a reason for hiding this comment

xiaozhikang0916 commented Jan 17, 2024

pdvrieze commented Jan 17, 2024

xiaozhikang0916 commented Jan 18, 2024

pdvrieze left a comment

Choose a reason for hiding this comment

xiaozhikang0916 commented Jan 21, 2024

rotilho commented Jan 21, 2024 • edited Loading

xiaozhikang0916 commented Jan 21, 2024 • edited Loading

xiaozhikang0916 commented Jan 30, 2024

xiaozhikang0916 commented Feb 5, 2024

rotilho commented Feb 15, 2024

xiaozhikang0916 commented Feb 23, 2024

sandwwraith commented Feb 29, 2024

xiaozhikang0916 commented Apr 1, 2024

sandwwraith left a comment

Choose a reason for hiding this comment

xiaozhikang0916 commented Apr 4, 2024

Is val numbers in @ProtoOneOf really required?

ProtoNumber of oneof

sandwwraith commented Apr 4, 2024

sandwwraith commented Apr 4, 2024

xiaozhikang0916 commented Apr 4, 2024

sandwwraith commented Apr 4, 2024

sandwwraith commented Apr 4, 2024

xiaozhikang0916 commented Apr 4, 2024

sandwwraith commented Apr 4, 2024

xiaozhikang0916 commented Apr 6, 2024

sandwwraith commented Apr 25, 2024

sandwwraith left a comment

Choose a reason for hiding this comment

rotilho commented Apr 26, 2024

xiaozhikang0916 commented Jan 16, 2024 •

edited

Loading

rotilho commented Jan 21, 2024 •

edited

Loading

xiaozhikang0916 commented Jan 21, 2024 •

edited

Loading