Avoid ClassCastException for the same class when used with Flink #509
Fix to avoid the ClassCastException for the same class when used with Flink.
It is documented that when Avro is used with Flink, a ClassCastException can be thrown due to a caching issue. The issue occurs when the Confluent Kafka client is configured with "specific.avro.reader" = true, so that deserialization is performed with the SpecificDatumReader instead of the generic reader.
This fix extends the deserializer with an extra configuration option, "force.new.specific.avro.instance", which forces the SpecificDatumReader to use a new instance of SpecificData when used together with specific.avro.reader.
TODO: investigate the performance penalty of creating a new instance instead of using the singleton
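For illustration, here is a minimal sketch of a consumer configuration with both flags enabled. The two property names come from this PR; the bootstrap servers, group id, and schema registry URL are placeholder assumptions:

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
props.put(ConsumerConfig.GROUP_ID_CONFIG, "example-group");           // placeholder
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
        "io.confluent.kafka.serializers.KafkaAvroDeserializer");
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
        "io.confluent.kafka.serializers.KafkaAvroDeserializer");
props.put("schema.registry.url", "http://localhost:8081");            // placeholder

// Deserialize into generated SpecificRecord classes...
props.put("specific.avro.reader", true);
// ...and force a fresh SpecificData instance per deserializer (the option added by this PR).
props.put("force.new.specific.avro.instance", true);

KafkaConsumer<String, Object> consumer = new KafkaConsumer<>(props);
```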
It looks like @andrikod hasn't signed our Contributor License Agreement yet.
You can read and sign our full Contributor License Agreement here.
Once you've signed, reply with
Appreciation of efforts,
ewencp left a comment
@andrikod Barring one comment on the documentation, the change itself looks reasonable. My main question is whether this is really the right place to fix the problem. This sounds a lot like an issue with the way Flink handles classloading, so it's unclear to me that this deserializer is where it should be solved (case in point, the issue isn't limited to this deserializer, or even to Avro; it extends to other libraries as well).
Is anything being done in Flink to address this? Is fixing it separately in each affected library really the right approach?
We are hitting exactly the same issue while integrating Flink with Confluent's Avro Schema Registry through this Avro deserializer.
As I understand it, Avro's SpecificData.get() returns a JVM-wide singleton that caches Class instances and other reflection-related objects. On the other hand, when Flink is deployed in standalone mode, it is essentially a set of long-running JVMs that load and unload the class definitions of the jobs to execute at the moment the fat jars are uploaded.
What happens is that when a new version of a Flink job gets re-submitted, Flink reloads the class definitions accordingly, but has no way to invalidate Avro's JVM-wide SpecificData cache, which leads to the "X cannot be cast to Z" errors mentioned by @andrikod.
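To make the mechanism concrete, here is a minimal sketch using Avro's public API (nothing below is specific to this PR):

```java
import org.apache.avro.specific.SpecificData;

// The JVM-wide singleton: its internal class cache outlives any
// single Flink job's classloader, which is what goes stale here.
SpecificData shared = SpecificData.get();

// A fresh instance is instead bound to a classloader of your choosing,
// e.g. the context classloader of the currently submitted job.
SpecificData scoped =
        new SpecificData(Thread.currentThread().getContextClassLoader());
```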
I am not aware of other frameworks that do dynamic class reloading and are also impacted by this, although it sounds plausible to me that they are all impacted in a similar fashion.
May I suggest we try to align the lifespan of the SpecificData cache instance with the lifespan of the application?
I think if we create one instance of SpecificData per instance of the deserializer, the cache would then live and die with the job's classloader rather than with the JVM (see the sketch below).
@ewencp, does that sound reasonable to you?
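A minimal sketch of that idea, assuming a deserializer shaped roughly like the one in this repo (the class and method names below are illustrative, not the actual ones):

```java
import org.apache.avro.Schema;
import org.apache.avro.io.DatumReader;
import org.apache.avro.specific.SpecificData;
import org.apache.avro.specific.SpecificDatumReader;

// Hypothetical deserializer: one SpecificData per instance, so the
// class cache is scoped to the classloader that created it.
public class ScopedAvroDeserializer {

    // Dies with this deserializer (and with the Flink job owning it),
    // instead of living JVM-wide like SpecificData.get().
    private final SpecificData specificData =
            new SpecificData(Thread.currentThread().getContextClassLoader());

    DatumReader<Object> readerFor(Schema writerSchema, Schema readerSchema) {
        return new SpecificDatumReader<>(writerSchema, readerSchema, specificData);
    }
}
```

Within one job submission the cache still works as before; it is simply thrown away together with the deserializer when the job's classloader is unloaded.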
PS: I published more details on my blog about how we integrate Flink with Confluent's Schema Registry, where we just discovered we hit the issue above.
We are having classloading issues when using this deserializer with Flink as well.
I tried sv3nd's approach by adding a field holding a single instance of SpecificData to the deserializer.
To make a long story short: this approach works!
So we can still benefit from a cached SpecificData, just scoped to the deserializer instance rather than to the whole JVM.
Let me know if you'd like us to open a pull request containing this solution.
I think this issue is now properly fixed starting from Flink 1.4.
Shortly put, Flink 1.4 changed dependency management for Avro so that the Avro classes are always part of the user code, and the Avro schema cache is therefore no longer JVM-wide.