[BEAM-9041, BEAM-9042] SchemaCoder equals should not rely on from/toRowFunction equality #10492
TheNeuralBit
left a comment
I made sure we had a good equals for most of the toRow/fromRow functions we generate when I changed SchemaCoder#equals in #9493. It looks like I missed some since these lambdas are still here, but I would prefer adding the equals rather than just reverting to byte equality if possible.
You have a good point that new implementations could easily neglect to implement equals though. Maybe we could mitigate that by defining our own interface that requires equals to be implemented?
cc: @kennknowles
Couldn't you just implement equals here instead of changing to comparing byte equality of the serialized function in SchemaCoder?
This is a second different issue about capture of Avro schema on serialization (the key change is the transient part) so not really related to equals. As explained above I put both together because I use equality to validate the roundtrip of serialization/deserialization.
This is not a revert. The previous version did not compare the from/toRow functions for equality. Do you have any suggestions on how to compare both functions? It is not really clear to me how to do so, in particular for functions with no state.
Sorry, "revert" was a poor choice of words. I just meant that a big part of that PR was making equals work for the functions produced by GetterBasedSchemaProvider, so I would like them to continue to be used. For a function without any state, I would write the equals function to just compare this.getClass().equals(other.getClass()), since any two instances will do the same thing.
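A minimal sketch of that suggestion follows. The `SerializableFunction` interface here is a local stand-in declared so the example compiles on its own, and `UpperCaseFn` is a hypothetical conversion, not one of the functions in the PR.

```java
import java.io.Serializable;

// Local stand-in for a serializable function interface (an assumption for this sketch).
interface SerializableFunction<InputT, OutputT> extends Serializable {
  OutputT apply(InputT input);
}

// Hypothetical conversion with no fields: every instance behaves identically.
final class UpperCaseFn implements SerializableFunction<String, String> {
  @Override
  public String apply(String input) {
    return input.toUpperCase();
  }

  @Override
  public boolean equals(Object other) {
    // There is no state to compare, so the same class implies the same behavior.
    return other != null && getClass().equals(other.getClass());
  }

  @Override
  public int hashCode() {
    // Keep hashCode consistent with the class-based equals above.
    return getClass().hashCode();
  }
}

final class StatelessEqualsDemo {
  public static void main(String[] args) {
    SerializableFunction<String, String> a = new UpperCaseFn();
    SerializableFunction<String, String> b = new UpperCaseFn();
    SerializableFunction<String, String> lambda = s -> s.toUpperCase();
    System.out.println(a.equals(b));      // true
    System.out.println(lambda.equals(a)); // false: a lambda only has identity equality
  }
}
```

The lambda case shows why the remaining lambdas are a problem for an equals-based comparison: they inherit identity equality from Object and cannot override it.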
…On Mon, Jan 6, 2020 at 2:00 PM Ismaël Mejía ***@***.***> wrote:
This is not a revert. Previous version did not compare from/toRow
functions for equality. Do you have any suggestion on how to compare both
functions? It is not really clear to me how to do so in particular for
functions with no state.
Hmm. I am torn on this. I agree with @TheNeuralBit that mostly these conversions should just own their own In the case of the
I see things more clearly now; thanks for the explanation, @TheNeuralBit. I was not aware of the goals of the other PR. I have one question related to that PR: why did you use the strict conversion (no widening/narrowing) from/to row only on the schemaCoder that generates GenericRecords
After thinking more about the issue, we probably don't have a nice solution for it. We cannot define So the alternative would be to change the signature of SchemaCoder to use a new class that implements SerializableFunction and equals, but this would require many changes for little return. Maybe the easiest thing to do here is the right one: just document that the functions to convert from/to rows in SchemaCoder must implement equals/hashCode. WDYT? I will update the PR with the fix for the Avro case in the meantime.
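For illustration, the "new class" alternative could look roughly like the sketch below. `EqualityAwareFunction` and `PrefixFn` are hypothetical names, and `java.util.function.Function` stands in for the function interface so the example is self-contained. Re-declaring equals and hashCode as abstract makes the compiler reject any concrete subclass that omits them.

```java
import java.io.Serializable;
import java.util.Objects;
import java.util.function.Function;

// Hypothetical base class (not an existing API): subclasses must supply equals/hashCode.
abstract class EqualityAwareFunction<InputT, OutputT>
    implements Function<InputT, OutputT>, Serializable {
  @Override
  public abstract boolean equals(Object other);

  @Override
  public abstract int hashCode();
}

// Hypothetical subclass with state: equality is defined by the captured field.
final class PrefixFn extends EqualityAwareFunction<String, String> {
  private final String prefix;

  PrefixFn(String prefix) {
    this.prefix = prefix;
  }

  @Override
  public String apply(String input) {
    return prefix + input;
  }

  @Override
  public boolean equals(Object other) {
    return other instanceof PrefixFn && prefix.equals(((PrefixFn) other).prefix);
  }

  @Override
  public int hashCode() {
    return Objects.hash(prefix);
  }
}
```

One cost of this design is that a lambda can no longer be passed where the abstract class is expected, so every existing call site that uses a lambda would have to be rewritten as a named class.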
5d5b015 to
dfa7d69
PR updated. I used your equals suggestion to tackle the first part instead of the byte equality change, and improved the serialization of the internal Avro schema. Now we should be good to go. PTAL @TheNeuralBit
TheNeuralBit
left a comment
This LGTM except that it seems to have broken some integration tests? I'm not sure what went wrong.
Why did you use the strict conversion (no widening/narrowing) from/to row only on the schemaCoder that generates GenericRecords AvroUtils#schemaCoder(java.lang.Class) and not in the others. Or maybe the real question is why we need to do this strictly?
I think this is a question for @reuvenlax - in my PR I tried to just preserve the behavior that he had implemented previously. (Primarily in #7290)
I'm not 100% sure but it looks like the failures are occurring when avroSchema is null. Either way I think you need to check if avroSchemaAsString is null here.
Yes, you are right: I forgot to check the nullability of the string before the parse. I will fix that and add a method for this. Hopefully everything will be green after that. Thanks for the hint.
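The pattern under discussion (a transient, non-serializable schema carried as a String, with a null check before parsing) can be sketched as follows. `FakeSchema` is a stand-in for a non-serializable schema class such as Avro's Schema, and `SchemaHolder` with its `readObject` hook is an assumption made for this example, not the code in the PR.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Stand-in for a schema class that is not Serializable (an assumption for this sketch).
final class FakeSchema {
  private final String json;

  private FakeSchema(String json) {
    this.json = json;
  }

  static FakeSchema parse(String json) {
    return new FakeSchema(json);
  }

  @Override
  public String toString() {
    return json;
  }
}

// Hypothetical holder: the schema travels as a String and is re-parsed on deserialization.
final class SchemaHolder implements Serializable {
  private transient FakeSchema schema;  // skipped by Java serialization
  private final String schemaAsString;  // null when no schema was supplied

  SchemaHolder(FakeSchema schema) {
    this.schema = schema;
    this.schemaAsString = schema == null ? null : schema.toString();
  }

  FakeSchema getSchema() {
    return schema;
  }

  private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
    in.defaultReadObject();
    // Guard against the null case before parsing.
    schema = schemaAsString == null ? null : FakeSchema.parse(schemaAsString);
  }

  // Serializes and deserializes a holder, wrapping checked exceptions for convenience.
  static SchemaHolder roundTrip(SchemaHolder holder) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(holder);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (SchemaHolder) in.readObject();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

Calling `SchemaHolder.roundTrip(new SchemaHolder(null))` returns a holder whose schema is still null instead of failing inside the parse call.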
cf76be7 to
fe17671
Merging now that the tests are green.
Sorry if I rushed the merge a bit, @TheNeuralBit. I just wanted to cherry-pick it to unblock the 2.18.0 release.
Not at all, sorry I didn't approve sooner.
This PR fixes both issues because (1) one fix depends on the other, and (2) to make it easier to validate/cherry pick into 2.18.0's branch.
BEAM-9041: Don't rely on equality for the from/to functions in SchemaCoder because nobody implements equals on SerializableFunction. I tried to do this with byte equality; maybe there is a better way to do it, but I could not think of another.
BEAM-9042: Since Avro's Schema class is not Serializable, I made it transient. Another approach could have been to transform the Schema into a String (not sure if this is needed, but I can change it if you think it is worth it).
R: @TheNeuralBit
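The byte-equality idea mentioned for BEAM-9041 can be sketched as below: two functions are treated as equal when their Java-serialized forms match. `SuffixFn` and `ByteEquality` are hypothetical names used only for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.function.Function;

// Hypothetical function class that does NOT override equals.
final class SuffixFn implements Function<String, String>, Serializable {
  private final String suffix;

  SuffixFn(String suffix) {
    this.suffix = suffix;
  }

  @Override
  public String apply(String input) {
    return input + suffix;
  }
}

final class ByteEquality {
  private ByteEquality() {}

  // Serializes a value with standard Java serialization.
  static byte[] serialize(Serializable value) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(value);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Treats two values as equal when their serialized bytes are identical.
  static boolean sameSerializedForm(Serializable a, Serializable b) {
    return Arrays.equals(serialize(a), serialize(b));
  }
}
```

With this helper, `new SuffixFn("x")` and another `new SuffixFn("x")` compare as the same even though `equals` reports them as different. The trade-off is that every comparison pays for two serializations.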