Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support oneof declaraction in protobuf #2546

Merged
merged 46 commits into from
Apr 25, 2024

Conversation

xiaozhikang0916
Copy link
Contributor

@xiaozhikang0916 xiaozhikang0916 commented Jan 16, 2024

It is an implementation of #2538 , to support oneof declaraction in protobuf.

Some major changes:

  1. Introduce a new annotation ProtoOneOf to mark a property of class should be decoded and encoded in the oneof way.
  2. ProtoNumber now can be attached to a class, to mark the proto id if such class is an implementation of oneof field.
  3. SealedClassSerializer.subclassSerializers now is a public property, because when decoding message with oneof mark, I need to find the related serializer of concrete class with ProtoNumber annotation, assisted by change 2.

For change 3, there may be some other implementation to achieve it. For example, binding the proto id with specific serializer in ProtoOneOf annotation like

public annotation class ProtoOneOf(public vararg val numbers: ProtoOneOfPair)

public annotation class ProtoOneOfPair(public val number: Int, public val serializer: KClass<KSerializer<*>>)

But it may be tedious for user to write some super heavy tag for a property in data class, and hard to know that the protoNumber annotated in oneof field has priority to number in inner proerty. so current implementation became my choise.

If everything in this PR is OK, I will continue to update comments of code and docs.

Copy link
Contributor

@pdvrieze pdvrieze left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a good starting point. It probably makes sense to also allow non-sealed polymorphic children.
One thing I'm not sure that you handled is how to deal with child classes that are not annotated with @ProtoNum (you may want to ignore them/throw an error on encoding).

@xiaozhikang0916
Copy link
Contributor Author

One thing I'm not sure that you handled is how to deal with child classes that are not annotated with @ProtoNum (you may want to ignore them/throw an error on encoding).

It should be an error in this case. Without the annotation it can never get a proper ProtoNumber since there is no guarantee on the order of the sub serializer, and not all oneof fields start with id 1.

But it is hard to find a proper time to throw an error. The best choise should be in compile time but a specific format has nothing to do with the plugin. Maybe perform a traversal check every time an oneof field is encountered?

@pdvrieze
Copy link
Contributor

For a type that contains a OneOf you enumerate all leaf serializers anyway. At that point you can reliably throw an exception of there are any that have missing annotations. You could theoretically do it by deep inspection of the types, but that would be overkill. From my perspective reliable runtime errors can be acceptable.

The main thing to keep in mind is how it interacts with changed/updated classes (different library version/classloader shenanigans, etc.) when that is not direct under developer control (say you have a library that uses protobuf, some user implements one of the interfaces (when not sealed) and now the application doesn't work because the library uses protobuf internally... (maybe required annotations are something that the plugin could be made to support).

Complicating is that deserialization can safely ignore types without ProtoNum, they will never match. Serialization, when such children are present, will be an issue as there is no proper protoNum present. As an option, it could be an idea to support fallback to general polymorphic format (you'd only need one "special" value number that is used in such case - the @ProtoNum value of the "owner field" would be backwards compatible).

@xiaozhikang0916
Copy link
Contributor Author

it could be an idea to support fallback to general polymorphic format

I might argue that it would lead to inconsistency of encoding and decoding. The field annotated with ProtoOneOf has no number, and the proto number of property in child class may conflict with existing field and casue overwritting. The serializer cannot decode the same value from what it encoded to. This behaviour is error prone, hard to debug and confusing.

how it interacts with changed/updated classes

It could be some trouble theoretically, but it is more a core lib issue here in my opinion. The same usage may also fail when not using ProtoOneOf or even not in ProtoBuf format, and if some one managed to make it work with custom serialize module passing in, just specify the proto number of serializer when registering.

Copy link
Contributor

@pdvrieze pdvrieze left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible to implement this without depending on AbstractPolymorphicSerializer. This implementation expects certain internal behaviour from the serialization library that is not part of its contract. Instead it can use the OneOfEncoder/Decoder to synthesize the expected structure for polymorphic types - and leave determining the (de)serializer to use to the actual serializer implementation (rather than assuming 2 cases and re-implementing their logic).

@xiaozhikang0916
Copy link
Contributor Author

The shceme generator on current branch can only generate oneof scheme for sealed type, by the support of all subserializer encoded in the sealed serializer descriptor.

For an open one, the ProtoBuf instance should be passed in to achieve the SerializersModule and iterate the SerialModuleImpl.polyBase2Serializers.

What do you think of it? @rotilho

@rotilho
Copy link
Contributor

rotilho commented Jan 21, 2024

Makes sense to me.

I wonder though if using a repeatable @ProtobufOneOf to pair the num with a subclass wouldn't be a more straightforward approach. This, however, would force to know upfront all subclasses, probably not a problem for sealed but may be to open polymorphism.

@xiaozhikang0916
Copy link
Contributor Author

xiaozhikang0916 commented Jan 21, 2024

I would like to share my use case for context.

Our groups are using proto3 syntex to define RPC methods and messages between clients(iOS and android) and servers(go). As we are adopting KMM in clients, I am migrating the data class in Java generated by proto plugin to kotlin data class, which are generated in compile time.

In this scenario, all the data classes are static snd not editable, to match the definition of messages in proto files. So the oneof elements are enumerable, and sealed interface fit this case best. That why I prefer sealed polymorphic in my implementation.

Though supporting of open polymorphic has some meaning in theory, I still wonder if there are some actual use cases of it. In my opinion:

  • If the proto file exists, the oneof elements are settled, so the declaraction can be sealed.
  • If there is no proto file, such as communicating between kotlin processes, oneof is not necessary here, just use the common polymorphic serializer.
  • If developer wants to use different kinds rather than the pre-defined sealed class, just map them in runtime.

I wonder though if using a repeatable @ProtobufOneOf to pair the num with a subclass wouldn't be a more straightforward approach.

@rotilho As I said in the top description, this way of declaraction can do the work, but I don't think it is a good api, as developer need to write so many annotations on a property( in my case, there will be more than 20), making it unreadable. And for sealed interfaces which I prefer most, the mapping relations are settled at the moment they are defined, meaning that these mappings are redundent.

@xiaozhikang0916
Copy link
Contributor Author

Rewrote decoding in contract of common polymorphic serializer.

But it still needs to find the actual serializer to get a proper serial name for polymorphic serializer.

@pdvrieze

@xiaozhikang0916
Copy link
Contributor Author

Sorry for my urging, but is there any work to do for this request to be accepted?

@rotilho
Copy link
Contributor

rotilho commented Feb 15, 2024

Hey @pdvrieze. Any chance to take a look on this PR when you have some time? I'd love to be able to use it

@xiaozhikang0916
Copy link
Contributor Author

Hi @sandwwraith could you please also take a look on this PR whether it is qualified to be merged.

I am looking forward to use it in my product code.

@sandwwraith
Copy link
Member

@xiaozhikang0916 Sure thing, I'll take a look at it soon

@xiaozhikang0916 xiaozhikang0916 force-pushed the feature/protobuf-oneof branch 2 times, most recently from 2ba6984 to 37b3c3e Compare April 1, 2024 08:41
@xiaozhikang0916
Copy link
Contributor Author

Found some api check failed in the TeamCity check, but running apiDump in pure dev branch would add public synthetic class kotlinx/serialization/json/internal/JsonStreamsKt$EntriesMappings, which should not exist, in the Json format api file.

Just remove it from commit history.

Copy link
Member

@sandwwraith sandwwraith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job, I like the idea overall. However, I have two big design questions:

  1. Is val numbers in @ProtoOneOf really required? The annotation itself is necessary to differentiate between regular polymorphism and oneOf. However, numbers in it seems excessive. First, we can already get them from descriptors using a mechanism similar to descriptor.getActualOneOfSerializer. It is heavy processing, but result of extractProtoOneOfIds (the only place using annotation.numbers) is cached anyway. Second, it is too easy to make a mistake where you put @ProtoNumber on a class but forget to update @ProtoOneOf content.

  2. OneOf in protobuf is defined inside the outer message (i.e. class) itself, while we're using @ProtoNumber on a definition of a nested message. In other words, if I have @ProtoNumber(4) class X: SomeIface, I have committed to use number 4 for it in everywhere. And if I already have class Y(@ProtoNumber(4) val s: String), I cannot add val x: SomeIface to it, because I cannot override the number on a use-site of oneof class. This looks like a big design limitation; perhaps you can say more if it will bother people or not.

I haven't proof-read docs yet, but I've left comments about implementation. PTAL at the comments on tests — looks like they're working incorrectly with value classes.

@xiaozhikang0916
Copy link
Contributor Author

Is val numbers in @ProtoOneOf really required?

You are right, numbers here is unnecessary. It may need a benchmark in the future, but works fine for now.

ProtoNumber of oneof

In a protobuf-file-based develop mode, the proto files will be the source-of-truth to generate data classes with some plugin. In the definition of protobuf, each oneof field can only be declared in a message and cannot be reused. So the generated sealed interface has one-to-one relation with a concreate data class, no need to worry about the id conflicts.

In a kotlin-based develop mode, the case what you worry about may happen. But in practice, oneof is a special mechanism of protobuf that developers will have few needs to depend on. In other words, why not just use the common polymorphic of common serialization?

However, as a designed limitation, I would add some more explaination in the docs, and some runtime check in ProtobufDecoding for duplated proto ids, but cannot find some simular check-point in ProtobufEncoding.

@sandwwraith
Copy link
Member

But in practice, oneof is a special mechanism of protobuf that developers will have few needs to depend on.

Perhaps. However, you can write Kotlin classes without a special proto plugin to be compatible with the existing schema. I'm not sure if it is popular in protobuf to reuse messages in oneof, or if everyone just uses unique classes for that. Anyway, we can work with this limitation and look at the feedback

@sandwwraith
Copy link
Member

Oh, and about

Found some api check failed in the TeamCity check, but running apiDump in pure dev branch would add public synthetic class kotlinx/serialization/json/internal/JsonStreamsKt$EntriesMappings, which should not exist, in the Json format api file.

This is a bug in BCV (Kotlin/binary-compatibility-validator#141) which is fixed in 0.14.0. I'll update the version soon.

However, there are some examples failing, probably due to a new duplicate check: https://teamcity.jetbrains.com/buildConfiguration/KotlinTools_KotlinxSerialization_RuntimeLibraryBuildAggregated/4559581?hideTestsFromDependencies=false&hideProblemsFromDependencies=false&expandBuildChangesSection=true&expandBuildTestsSection=true&expandBuildProblemsSection=true

@xiaozhikang0916
Copy link
Contributor Author

I'm not sure if it is popular in protobuf to reuse messages in oneof.

It is totally fine to reuse message between oneof fields. Let me elaborate a little bit.

Giving messages like

message ShareMessage {
    string common = 1;
}

message Some {
    string name = 1;
    oneof pack {
        ShareMessage share_in_some = 2;
    }
}

Definations of message Some in kt should be data class Some the outer class, sealed interface IPack the oneof interface, data class ShareInSome(val value: ShareMessage): IPack message holders for each oneof field with proto ids, and data class ShareMessage, the message to be sharing.

Now we share ShareMessage in another message:

message Another {
    string another_name = 1;
    oneof another_pack {
        ShareMessage share_in_another = 3;
    }
}

Note that we cannot share pack in Some in a protobuf file, but only redefine another oneof field with every messages we want to share, e.g. ShareMessage here.

The oneof interface for previous class IPack, as a representative of the oneof pack declaraction, should not be reused either. We should have another oneof interface IAnotherPack, another group of message holders with proto ids stand for theriselves. In this way, each group of message holders and oneof interface can only be used once, which prevent the id conflicts.

In summary, oneof in protobuf is a complecated mechanism, and using it in kotlin data class requires some prerequisites for knowledge of protobuf.

@sandwwraith
Copy link
Member

Thanks for the explanation. I've re-read everything — subclasses are meant to be used in one particular place, and there is even a requirement mentioned that they should have exactly one property. In that case, they don't have to be reusable.
It is not entirely an optimal way to do this, but I think it is the best way we can do it in case a polymorphism is required.

Is it possible though, to enforce 'subclass should have only one property' requirement via SerialDescriptor.elementCount == 1? This place is easy to miss in the docs, and some runtime aid may be helpful.

@sandwwraith
Copy link
Member

Is it possible though, to enforce 'subclass should have only one property' requirement via SerialDescriptor.elementCount == 1? This place is easy to miss in the docs, and some runtime aid may be helpful.

UPD: found require(descriptor.elementsCount == 1) in OneOfElementEncoder

@xiaozhikang0916
Copy link
Contributor Author

Yes, both encoder and decoder have a check of elementsCount in init block during runtime. It would be better to have a check in buildtime but can do nothing more inside a particular serialization format lib, and I think it is done enough for now.

@sandwwraith
Copy link
Member

t would be better to have a check in buildtime but can do nothing more inside a particular serialization format lib, and I think it is done enough for now.

Yes, unfortunately we do not have any special mechanisms to provide formats with compile-time checks.

@xiaozhikang0916
Copy link
Contributor Author

I have re-thought about the design of oneof declaration. Is it necessary to annotate the message holder class with ProtoNumber?

The message holder, as described above, should have only 1 property. We can still use the origin form of ProtoNumber annotated at this property, as the one we need.

In this way, we need no to introduce another annotation target to ProtoNumber, need no to think about that which id will override another. And oneof class can be defined in the same way developers get used of, reducing the likelihood of errors.

In fact, there were some error found in my implementation and test case during this refactoring, which should be fixed right now.

By the way, I think writing unit tests with data in binary format is quite cryptic here, e.g. hard to change the hex string after proto id updated. Maybe some utils like protoscope is necessary in preparing binary data.

@sandwwraith
Copy link
Member

It seems you need to run knitCheck to update documentation (likely because I've suggested to add a comment to a code sample)

Copy link
Member

@sandwwraith sandwwraith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for such a big contribution, it was nice working with you!

@sandwwraith sandwwraith merged commit 251bca7 into Kotlin:dev Apr 25, 2024
3 checks passed
@xiaozhikang0916 xiaozhikang0916 deleted the feature/protobuf-oneof branch April 26, 2024 00:40
@rotilho
Copy link
Contributor

rotilho commented Apr 26, 2024

Awesome guys! Thank you very much for working on this 🎉🎉🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants