AVRO-3240: fix deserializer schema backward compatibility #1379
I cannot comment without seeing what exactly does not work :-/ |
Ok I'm sorry if my description is not clear enough. Let's see if my Rust is good enough to write a great test now 👍 |
Ok @martin-g, in my quest to write a test for this (no success so far) I found something that I clearly do not understand but which triggers the same error as my initial problem here. Disclaimer: I'm not sure they are related, but since I don't understand how to fix the failing test below, I can't rule it out. Basically, you can't call
#[test]
fn test_from_avro_datum_multiple() {
    let schema = Schema::parse_str(SCHEMA).unwrap();
    // Encoded record: a = 27 (zig-zag varint 54), b = "foo" (length 3, then the bytes)
    let mut encoded: &'static [u8] = &[54, 6, 102, 111, 111];
    let mut record = Record::new(&schema).unwrap();
    record.put("a", 27i64);
    record.put("b", "foo");
    let expected = record.into();
    // First call succeeds and consumes the slice behind `encoded`
    from_avro_datum(&schema, &mut encoded, None).unwrap();
    // Second call on the same reader is the one that fails
    assert_eq!(
        from_avro_datum(&schema, &mut encoded, None).unwrap(),
        expected
    );
}
Gives
|
df892f4 to cf520b4
@martin-g I finally tracked down the problem to the The root cause is that I fixed the |
This is expected. You try to unwrap a None the second time. |
Cool! I will take a look soon! |
I'm sorry but I'm not sure I'm following you here. Calling the same function twice with the same parameters is an expected failure? My understanding is that Rust's slice &[u8] does implement the For instance, using a |
You understood it correctly! With https://doc.rust-lang.org/std/io/trait.Read.html#impl-Read-2 says
If you want to |
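To make the point about the reader concrete: the standard library implements Read for &[u8] by copying from the slice and then updating the slice to point at the unread remainder. The sketch below uses only std (not the Avro crate); take_bytes is a hypothetical helper introduced for illustration. It shows that a second read through the same &mut &[u8] finds nothing left, while a fresh slice over the same bytes reads fine.

```rust
use std::io::Read;

/// Hypothetical helper: reads exactly `n` bytes from `reader`,
/// returning None when fewer than `n` bytes are left.
fn take_bytes(reader: &mut &[u8], n: usize) -> Option<Vec<u8>> {
    let mut buf = vec![0u8; n];
    reader.read_exact(&mut buf).ok().map(|_| buf)
}

fn main() {
    let data = [54u8, 6, 102, 111, 111];

    // First read: succeeds and advances the slice past the bytes it returned.
    let mut encoded: &[u8] = &data;
    assert_eq!(take_bytes(&mut encoded, 5), Some(data.to_vec()));
    assert!(encoded.is_empty());

    // Second read through the same reader: nothing is left to read.
    assert_eq!(take_bytes(&mut encoded, 1), None);

    // A fresh slice over the same bytes starts from the beginning again.
    let mut again: &[u8] = &data;
    assert_eq!(take_bytes(&mut again, 5), Some(data.to_vec()));
}
```

So reading the same datum twice would need a new slice (or a copy of the original one) for each call, since the reader itself is mutated by the first call.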
10e7162 to ffe34c7
Sorry for the late reply @martin-g I took some time off, now back to business! I pushed the corrections you asked for. |
I've checked out your PR locally but I am not able to push to your branch. Anyway, I've fixed the formatting and now I will have to squash all commits for this PR into one and then push it to master. |
Actually, I could merge the PR and then fix the formatting in a follow up commit! |
Done! |
Done! |
I don't understand why this is required: according to the spec, the exact writer schema should be made available at deserialization, while this feature seems to be about deserializing while specifying a different "writer schema" than the data was actually written with. |
@Ten0 |
Yes and when using that we normally still need to specify the actual writer schema, and reader schema is provided as a separate argument. avro/lang/rust/avro/src/reader.rs Lines 573 to 588 in e22f029
I can't find mention of schema evolution that is supposed to work this way in the spec. There is something similar called "schema compatibility", but it essentially says "if your data happens to be wire compatible that's fine":
and that's not the case here because it also says null fields should still normally always be serialized as the union discriminant, even at the end of the record:
where schema evolution refers, as you mentioned, to the reader schema process. |
When providing your own schema to from_avro_datum, the
deserialization is not backward compatible with messages
containing a previous schema, even if the schemas are created
to be backward compatible.
This is due to the decode_variable function in utils returning
Error::ReadVariableIntegerBytes when the reader object is smaller
than expected and thus cannot fill the read_exact buffer.
When reading a message generated with an older schema version than
the one we are statically using, the payload buffer is by nature
smaller than the payload expected from a message serialized with
a newer schema (backward compatible schemas are basically append
only).
The proposed fix takes that into account and just returns an empty
Ok(), which will be interpreted as null. This is in line with the
fact that a backward compatible schema has to allow null values
for the newly added fields.