New AvroDeserializer class requires a reader schema #834
Thanks for the feedback @mparry. Reader schema requirement: this was actually an item of debate during the design process, the question being whether the Deserializer should be polymorphic (since we can access the writer schema from the Schema Registry). Ultimately the decision was made that the deserializer should only handle a single type, the one defined in the reader schema. That said, we marked these features as experimental for exactly this reason: to see how our users intend to use the API. I'll link any subsequent issues related to the same thing here to document interest in handling "Generic" Avro records. Added a fix for the error message/exception raising to the v1.4.1rc branch. Thanks for catching that!
The benefit that I'm interested in isn't so much polymorphism (though sure, I can see that too) but rather that the Python client needs no advance knowledge of the schema. In a dynamic language, this is quite a nice feature! Clearly, if you want to use the data in depth then you probably need to know something about the schema, but I think there are various use cases where this isn't true. To expand on my original example, we have tools that simply monitor and display the data being published on a topic, and it's quite undesirable to have to manually propagate the schema downstream to them when the registry serves this purpose rather well. I guess my other argument would be that this seems like something of a regression from the existing functionality - i.e. as it stands, you can't necessarily simply replace
+1
In my case I would like to retrieve schemas from the Schema Registry and use the writer_schema instead of a reader_schema. So my suggestion is to make schema_str optional in AvroDeserializer.

I could prepare a patch and PR if you wish.
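A minimal sketch of what the suggested optional-schema_str behavior could look like. All class and variable names here are illustrative, not the library's actual API; the point is just the fallback rule: no reader schema supplied means "decode with whatever writer schema the message's embedded schema id resolves to".

```python
# Illustrative sketch only -- not confluent-kafka-python's real classes.
class OptionalReaderDeserializer:
    def __init__(self, registry, schema_str=None):
        self.registry = registry          # schema-id -> schema-str lookup
        self.reader_schema = schema_str   # None means "use writer schema"

    def pick_schema(self, writer_schema_id):
        # With a reader schema, Avro schema resolution would apply;
        # without one, the writer schema is used directly.
        writer = self.registry[writer_schema_id]
        return self.reader_schema if self.reader_schema is not None else writer

registry = {42: '{"type": "string"}'}
generic = OptionalReaderDeserializer(registry)  # no reader schema needed
print(generic.pick_schema(42))  # resolves the writer schema from the id
```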
I'm experiencing similar difficulty when trying to replace the legacy AvroConsumer with a DeserializingConsumer and would be willing to take a crack at preparing a fix. I see two ways of doing this. Either a new class, e.g.:
I definitely agree with this issue. Looking at the example for the AvroConsumer (confluent-kafka-python/examples/avro_consumer.py, lines 86 to 97 at c33a0cc), in the current state it's necessary to provide a schema to deserialize messages from a topic. You can definitely get the writer schema from the registry, so that piece is OK. But the schema version/id is already encoded inside the message itself (see confluent-kafka-python/src/confluent_kafka/schema_registry/avro.py, lines 231 to 232 at c33a0cc), so why should the consumer need to know it in advance? Requiring the schema up front makes encoding the schema id redundant - it puts the cart before the horse.
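For context on the schema id being "encoded inside the message": Confluent's wire format prefixes every serialized message with a magic byte (0) and a 4-byte big-endian schema id, followed by the Avro binary body. A minimal sketch of extracting that id (function name is my own, for illustration):

```python
import struct

def parse_wire_format(payload: bytes):
    """Split a Confluent-framed message into (schema_id, avro_body)."""
    if len(payload) < 5:
        raise ValueError("message too short for wire-format header")
    # ">bI": big-endian signed byte (magic) + 4-byte unsigned int (schema id)
    magic, schema_id = struct.unpack(">bI", payload[:5])
    if magic != 0:
        raise ValueError(f"unexpected magic byte {magic}")
    return schema_id, payload[5:]

sid, body = parse_wire_format(b"\x00\x00\x00\x00\x2a" + b"avro-bytes")
print(sid)  # 42
```

This is exactly why a deserializer can, in principle, fetch the writer schema per message instead of demanding one at construction time.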
+1. One of the most important features of Avro (imo) is the ability to read data without advance knowledge of the schema; the schema (or a reference to it) is included with the data. Now, I also like the ability to create projections of Avro-encoded data by supplying a compatible reader schema when I want to (primarily to ignore fields I don't care about). Both use cases are supported by Avro, and I think both should be supported by the deserializer. I took a look at the code, and I believe this would mean a breaking change to the
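To make the projection use case concrete, here is a toy illustration (deliberately not Avro's full schema-resolution algorithm, which also handles defaults, promotions, and aliases): the reader schema declares only the fields the consumer cares about, and everything else in the writer's record is dropped.

```python
# Toy projection: keep only the fields the reader schema declares.
writer_record = {"id": 1, "name": "alice", "internal_audit_blob": "..."}

reader_schema = {  # simplified reader schema: a subset of the writer's fields
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
    ],
}

def project(record, reader_schema):
    wanted = {f["name"] for f in reader_schema["fields"]}
    return {k: v for k, v in record.items() if k in wanted}

print(project(writer_record, reader_schema))  # {'id': 1, 'name': 'alice'}
```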
The root cause of this decision, I think, is that it mimics the equivalent API in strongly typed languages, where the schema is always present in the generated class. Since the new serde interface is still marked as experimental, I think we can err on the side of making the breaking parameter-order change to make the API a bit nicer. Thanks for reporting, @mparry, and thanks for the endorsement, everyone else.
I just wanted to cast my vote for this change as well.
Is there any workaround for this issue? I'd like to be able to consume a topic without specifying the reader schema in advance. |
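One workaround (before the reader-schema requirement is relaxed) is to fetch the subject's latest schema from the registry at startup and pass it as the reader schema. Sketched here against a stub client so it is self-contained; with the real library the equivalent lookup is `SchemaRegistryClient.get_latest_version(subject)`, whose result carries the schema string (check the API of your installed version -- attribute names below are simplified).

```python
# Stub stand-ins for the Schema Registry client, for illustration only.
class StubRegisteredSchema:
    def __init__(self, schema_str):
        self.schema_str = schema_str

class StubRegistryClient:
    def __init__(self, subjects):
        self._subjects = subjects

    def get_latest_version(self, subject):
        return StubRegisteredSchema(self._subjects[subject])

def reader_schema_for_topic(client, topic):
    # Assumes the default subject naming strategy: "<topic>-value" for values.
    return client.get_latest_version(f"{topic}-value").schema_str

client = StubRegistryClient({"orders-value": '{"type": "string"}'})
schema_str = reader_schema_for_topic(client, "orders")
print(schema_str)  # pass this as schema_str when constructing AvroDeserializer
```

The obvious caveat: the consumer pins whatever schema was latest at startup, so newer writer schemas still rely on Avro's resolution rules being compatible with it.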
This should be closed, as of #1000 |
Closing this issue as the required PR is already merged and released. |
Description

The new `AvroDeserializer` class added in 1.4.0 requires a reader schema, but this seems unnecessary and means that it's not possible to use these new classes in the same way as the old `AvroConsumer` - i.e. in `AvroConsumer` the default values for `reader_key_schema` and `reader_value_schema` are `None`.

Specifically, we have use cases where we are just interested in consuming (and, say, displaying) whatever messages are forthcoming on a topic, and it's sufficient - and indeed preferable - to just use the corresponding writer schema from the registry.

I would be happy to provide a PR addressing this, unless the reader schema requirement is deliberate for some reason.

(A couple of other minor issues I noticed while looking at `schema_registry/avro.py`: i) the error messages about `to_dict` not being callable have the arguments the wrong way around; ii) the attempt to raise a `ValueError` about unrecognized properties will fail; the second call to `format()` should be `join()` instead.)

Checklist

Please provide the following information:

- (`confluent_kafka.version()` and `confluent_kafka.libversion()`): 1.4.0