Python Avro consumer cannot consume non-union fields #108
Comments
A workaround is to change the Python class from

    class User(Record):
        name = String()
        age = Integer()

to

    class User(Record):
        _sorted_fields = True
        name = String()
        age = Integer(required=True)

To be compatible with the Java client, we have to configure the
BewareMyPower added a commit to BewareMyPower/pulsar-client-python that referenced this issue on May 22, 2023
Fixes apache#108

### Motivation

Currently the Python client uses the reader schema, which is the schema of the consumer, to decode Avro messages. However, when the writer schema is different from the reader schema, decoding fails.

### Modifications

Add an `attach_client` method to `Schema` and call it when creating consumers and readers. This method stores a reference to a `_pulsar.Client` instance, which leverages the C++ APIs added in apache/pulsar-client-cpp#257 to fetch schema info. The `AvroSchema` class fetches and caches the writer schema if it is not cached, then uses both the writer schema and the reader schema to decode messages.

Add `test_schema_evolve` to test that consumers or readers can decode any message whose writer schema is different from the reader schema.
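The failure described in the Motivation can be illustrated with a minimal, standard-library-only sketch of Avro binary decoding. It is not the client's implementation: `read_long`, `read_value`, `decode`, and both field lists are hypothetical names, and the schemas (including the field order) are assumptions based on the analysis in this issue. Decoding the payload with the consumer's schema misreads the plain `int` as a union branch index, while decoding with the writer schema succeeds. Real Avro schema resolution does more than this (matching fields by name, type promotion, defaults).

```python
import io


def read_long(buf):
    """Decode one Avro int/long: a zigzag-encoded variable-length integer."""
    shift = 0
    result = 0
    while True:
        byte = buf.read(1)[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (result >> 1) ^ -(result & 1)


def read_value(buf, schema):
    """Decode one value for the tiny schema subset used in this sketch."""
    if isinstance(schema, list):  # union: branch index first, then the value
        index = read_long(buf)
        if not 0 <= index < len(schema):
            raise ValueError(f"union branch {index} out of range for {schema}")
        return read_value(buf, schema[index])
    if schema == "null":
        return None
    if schema == "int":
        return read_long(buf)
    if schema == "string":
        return buf.read(read_long(buf)).decode("utf-8")
    raise ValueError(f"unsupported schema: {schema}")


def decode(payload, fields):
    """Decode a record given as an ordered list of (name, schema) pairs."""
    buf = io.BytesIO(payload)
    return {name: read_value(buf, schema) for name, schema in fields}


# Assumed writer (producer) schema: `age` is a plain int.
WRITER_FIELDS = [("name", ["null", "string"]), ("age", "int")]
# Assumed reader (consumer) schema: every field is a union with null.
READER_FIELDS = [("name", ["null", "string"]), ("age", ["null", "int"])]

# User{name="xyz", age=10} encoded with the writer schema:
# union branch 1, string length 3, "xyz", then int 10.
payload = bytes([0x02, 0x06]) + b"xyz" + bytes([0x14])

print(decode(payload, WRITER_FIELDS))
try:
    decode(payload, READER_FIELDS)
except ValueError as exc:
    print("reader schema alone fails:", exc)
```

Decoding with the writer schema and only then mapping the result onto the reader's fields is the idea behind fetching and caching the writer schema; the sketch skips the mapping step because both assumed schemas use the same field names.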
shibd pushed a commit that referenced this issue on May 25, 2023
How to reproduce
Run a Pulsar standalone 2.11.
First, create a Python consumer whose schema is a class with a string field `name` and an integer field `age`:
Then, set the schema compatibility to `FORWARD`:
Then, run the Java producer to send a message (`User{name="xyz", age=10}`):
Then, the Python consumer application will crash with the following logs:
Analysis
There are two bugs. First, the schema definition generated by the Python client differs from the one generated by the Java client. The two classes are copied here:
Check the schema definitions (`pulsar-admin schemas get my-topic -v <version>`) and we can find there are two versions of the schema:
We can see:
`nullable` fields (or, to say it correctly, they are Avro unions), even including the `int` field
`nullable` fields since primitive types cannot be `null`
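The difference in schema shape can be modelled with plain dictionaries. The sketch below is a simplified model, not the Python client's code: `avro_field` and `record_schema` are hypothetical helpers, and it assumes that a non-required field becomes a union with `null`, that `required=True` yields the bare type, and that `_sorted_fields = True` orders fields by name.

```python
def avro_field(name, avro_type, required=False):
    """Hypothetical helper: a non-required field becomes a union with "null"."""
    field_type = avro_type if required else ["null", avro_type]
    return {"name": name, "type": field_type}


def record_schema(name, fields, sorted_fields=False):
    """Hypothetical helper: build a record schema, optionally sorting fields by name."""
    if sorted_fields:
        fields = sorted(fields, key=lambda f: f["name"])
    return {"type": "record", "name": name, "fields": list(fields)}


# Shape assumed for the original Python class: every field is a union.
original = record_schema(
    "User",
    [avro_field("name", "string"), avro_field("age", "int")],
)

# Shape assumed for the workaround class: `age` is a plain int, fields sorted.
workaround = record_schema(
    "User",
    [avro_field("name", "string"), avro_field("age", "int", required=True)],
    sorted_fields=True,
)

print(original["fields"])
print(workaround["fields"])
```

Under these assumptions, only the workaround shape declares `age` as a non-union `int`, which is the form the analysis attributes to primitive fields.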