
Protobuf schema incompatibility between registered version and version used by producer #1851

Closed
cemeyer2 opened this issue Apr 27, 2021 · 5 comments


@cemeyer2

With Confluent Platform 6.1.0 (other versions tested as well), the following issue occurs:

  • I generate a protobuf schema file which has nested types
  • I register that schema using curl directly against the Schema Registry API
  • I use protoc to generate a Python binding for that schema
  • I attempt to produce a message with that schema, with the serializer configured not to auto-register

Expected: the producer produces a message to the topic
Actual: the Schema Registry fails the compatibility check before the producer can produce

More explicitly,

Given:

syntax = "proto3";
package test_schema_registry_bug;

message MessageEnvelope {
  oneof message {
    SubTypeA message_TypeA = 1;
    SubTypeB message_TypeB = 2;
  }
}
message SubTypeA {
  string name = 1;
}
message SubTypeB {
  string name = 1;
}

and then register that with curl with the payload:

{
  "schemaType": "PROTOBUF",
  "schema": "syntax = \"proto3\";\npackage test_schema_registry_bug;\n\nmessage MessageEnvelope {\n  oneof message {\n    SubTypeA message_TypeA = 1;\n    SubTypeB message_TypeB = 2;\n  }\n}\nmessage SubTypeA {\n  string name = 1;\n}\nmessage SubTypeB {\n  string name = 1;\n}"
}

like so:

curl -XPOST -H 'Content-Type:application/vnd.schemaregistry.v1+json' http://localhost:8081/subjects/test-value/versions --data @register_body.json
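Rather than hand-escaping newlines and quotes, the payload above can be generated from the raw .proto source; a minimal sketch (the schema text is inlined here, and register_body.json matches the file name used in the curl command):

```python
import json

# The .proto source exactly as written above
proto_source = '''syntax = "proto3";
package test_schema_registry_bug;

message MessageEnvelope {
  oneof message {
    SubTypeA message_TypeA = 1;
    SubTypeB message_TypeB = 2;
  }
}
message SubTypeA {
  string name = 1;
}
message SubTypeB {
  string name = 1;
}'''

# json.dumps handles the newline/quote escaping for us
payload = json.dumps({"schemaType": "PROTOBUF", "schema": proto_source})
with open("register_body.json", "w") as f:
    f.write(payload)
```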

Then use protoc to generate a Python binding, which outputs a file named schema_pb2.py
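For reference, this is the standard protoc invocation (assuming the schema is saved as schema.proto in the current directory):

```shell
# Emit schema_pb2.py into the current directory
protoc --python_out=. schema.proto
```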

and use the following producer code:

from confluent_kafka import SerializingProducer
from confluent_kafka.serialization import StringSerializer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.protobuf import ProtobufSerializer

from schema_pb2 import MessageEnvelope, SubTypeA, SubTypeB

schema_registry_conf = {'url': 'http://localhost:8081'}
schema_registry_client = SchemaRegistryClient(schema_registry_conf)

serializer_conf = {'auto.register.schemas': False}
protobuf_serializer = ProtobufSerializer(MessageEnvelope, schema_registry_client, serializer_conf)

producer_conf = {'bootstrap.servers': 'localhost:9092',
                 'key.serializer': StringSerializer('utf_8'),
                 'value.serializer': protobuf_serializer}

producer = SerializingProducer(producer_conf)

message = MessageEnvelope(
    message_TypeA=SubTypeA(name='typeA')
)
producer.produce(topic='test', key='my_key', value=message)
producer.flush()

It fails with:

confluent_kafka.error.ValueSerializationError: KafkaError{code=_VALUE_SERIALIZATION,val=-161,str="Schema not found (HTTP status code 404, SR code 40403)"}

Looking at the generated Python, the descriptor is slightly different, which causes a mismatch:

in the schema registry:

syntax = "proto3"; package test_schema_registry_bug; message MessageEnvelope { oneof message { SubTypeA message_TypeA = 1; SubTypeB message_TypeB = 2; } } message SubTypeA { string name = 1; } message SubTypeB { string name = 1; } 

vs

what the python client is attempting to use:

syntax = "proto3"; package test_schema_registry_bug; message MessageEnvelope { oneof message { .test_schema_registry_bug.SubTypeA message_TypeA = 1; .test_schema_registry_bug.SubTypeB message_TypeB = 2; } } message SubTypeA { string name = 1; } message SubTypeB { string name = 1; }

As can be seen, the protoc-generated Python binding fully qualifies the names of the nested types in the definition of MessageEnvelope. I believe the protobuf provider in the Schema Registry should not treat this as an incompatibility.
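The mismatch can be observed without a live registry by loading a descriptor into a pool and inspecting the resolved type. The sketch below is a hypothetical reconstruction of what protoc emits (trimmed to one oneof branch; names mirror the schema above): even though the .proto source writes the relative name SubTypeA, the runtime descriptor only carries the fully qualified name, which is what the serializer renders back into schema text.

```python
from google.protobuf import descriptor_pb2, descriptor_pool

# Hand-built FileDescriptorProto approximating protoc's output for the
# schema above (only MessageEnvelope and SubTypeA, for brevity)
fdp = descriptor_pb2.FileDescriptorProto()
fdp.name = "schema.proto"
fdp.package = "test_schema_registry_bug"
fdp.syntax = "proto3"

env = fdp.message_type.add()
env.name = "MessageEnvelope"
env.oneof_decl.add().name = "message"
field = env.field.add()
field.name = "message_TypeA"
field.number = 1
field.label = descriptor_pb2.FieldDescriptorProto.LABEL_OPTIONAL
field.type = descriptor_pb2.FieldDescriptorProto.TYPE_MESSAGE
field.type_name = "SubTypeA"   # relative, exactly as written in the .proto
field.oneof_index = 0

sub = fdp.message_type.add()
sub.name = "SubTypeA"
name_field = sub.field.add()
name_field.name = "name"
name_field.number = 1
name_field.label = descriptor_pb2.FieldDescriptorProto.LABEL_OPTIONAL
name_field.type = descriptor_pb2.FieldDescriptorProto.TYPE_STRING

# Loading the proto into a pool resolves the reference; the resulting
# descriptor knows only the fully qualified name of the nested type
pool = descriptor_pool.DescriptorPool()
pool.Add(fdp)
md = pool.FindMessageTypeByName("test_schema_registry_bug.MessageEnvelope")
print(md.fields_by_name["message_TypeA"].message_type.full_name)
# → test_schema_registry_bug.SubTypeA
```

So any schema text rendered from the runtime descriptor will qualify the type names, and a byte-for-byte lookup against the registered (unqualified) source fails with the 404 above.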

@rayokota
Member

In Java you can specify use.latest.version=true. This config has not yet been implemented in Python. cc @mhowlett
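For reference, on the Java side that setting goes into the serializer configuration, e.g. producer properties along these lines (a sketch; key names as documented for the Confluent Java serializers, not verified against a specific version):

```properties
# Do not try to register the schema from the client
auto.register.schemas=false
# Serialize against the latest registered version instead of looking up
# the client's locally derived schema text
use.latest.version=true
```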

@cemeyer2
Author

@rayokota Is this only a client-side fix, or should it also be addressed in the Schema Registry? It seems the Schema Registry could address this bug as well.

@rayokota
Member

rayokota commented Jun 2, 2021

@jiangok-open

jiangok-open commented Feb 3, 2022

Is the client fix only available for Java? For example, I use Go. Since there is no Confluent Schema Registry client for Go, I have to rely on broker-side validation. Is the fix available in Go? If not, what is the workaround? Otherwise this issue is blocking. Thanks!

@rayokota
Member

rayokota commented Jul 8, 2022

@rayokota rayokota closed this as completed Jul 8, 2022