AVRO-3683: [Rust] Read/Write with multiple schemas #2014
@markfarnan The impl is almost ready but as you can see the test in |
Thanks. It's also panicking with my test case, which uses slightly more complicated structs (the ones in the PR) and a struct rather than a manually constructed value. I'll see if I can work out why, or whether it's me, or I'll update the PR in the morning with a test case that uses your new functions. Here is the snippet that panics inside "to_avro_datum_schemata", in case it's useful.
|
cfe8512
to
2274338
Compare
The only way I see to support this is to have a let actual = to_avro_datum_schemata(&main_schema, &schemata.as_slice(), record_value).unwrap(); where |
That fits with how I was thinking it would need to be done, both for read and write. The only alternative I can think of would be to pass the name of the record to be used and let the resolver find it in the schemata. Either way, ideally there is an easy way to find the relevant schema in the schemata slice. |
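A minimal sketch of the name-based alternative mentioned above: pick the main schema out of the list by its full name. `NamedSchema` and `find_by_fullname` are hypothetical stand-ins for illustration only; they are not part of the apache-avro API, whose `Schema` type is much richer.

```rust
/// Hypothetical, simplified stand-in for a parsed named schema.
/// (Assumption for illustration; not the apache-avro `Schema` type.)
#[derive(Debug, Clone, PartialEq)]
struct NamedSchema {
    /// Full name, e.g. "ns.A" (namespace plus name).
    fullname: String,
    /// Full names of the other schemas this one refers to.
    refs: Vec<String>,
}

/// Linear search for the schema with the given full name.
/// Returns `None` when no schema in the slice has that name.
fn find_by_fullname<'a>(schemata: &'a [NamedSchema], fullname: &str) -> Option<&'a NamedSchema> {
    schemata.iter().find(|s| s.fullname == fullname)
}
```

With a helper like this, a caller would only need to pass the record name and the full list, at the cost of one scan per lookup.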
@markfarnan I've re-worked it. Now the new roundtrip test passes. |
Awesome, will do! |
Found a problem. The order in which the schemas are given to parse_list seems to matter. If they are passed in descending order of reference, everything works. If they are passed out of order, so that references have to resolve forward, then to_avro_datum_schemata panics. (parse_list itself seems to manage fine with any order.) I.e. if you modify your test thus:, it will panic.
This will be a problem with large schemas. In my use case some records have references that use 20+ schemas; guaranteeing they are provided in the right order could be a nightmare. |
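One common way to make reference resolution independent of input order is a two-pass approach: first collect every defined name, then check each reference against that set. The sketch below illustrates the idea only; `SchemaDef` and `unresolved_refs` are hypothetical names, and this is not a description of how the PR or the apache-avro crate resolves references.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified schema definition: a full name plus the
/// full names it references. (Assumption; not the apache-avro `Schema`.)
struct SchemaDef {
    name: String,
    refs: Vec<String>,
}

/// Returns (referrer, missing reference) pairs.
/// Pass 1 collects every defined name; pass 2 checks each reference
/// against that set, so the result does not depend on the order of
/// `schemata`.
fn unresolved_refs(schemata: &[SchemaDef]) -> Vec<(String, String)> {
    // Pass 1: index all names that are defined anywhere in the list.
    let known: HashSet<&str> = schemata.iter().map(|s| s.name.as_str()).collect();

    // Pass 2: report references that no schema in the list defines.
    let mut missing = Vec::new();
    for s in schemata {
        for r in &s.refs {
            if !known.contains(r.as_str()) {
                missing.push((s.name.clone(), r.clone()));
            }
        }
    }
    missing
}
```

Because all names are known before any reference is checked, a list given as `[A, B]` and one given as `[B, A]` produce the same result.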
On the flip side, I tested with one slightly more complex schema, passed in the right order, and it seems to round-trip correctly! |
Hi! I'm joining the conversation midway and am wondering about a few things. If I understand correctly, this implementation means that for a complex message (i.e. one with deep schema recursion) the user of the API has to build a vector with all the schemas and keep track of the root one. Right? |
This PR is about the use case when there are more than one schemata and they refer to each other. |
Then the user could just use the current APIs with just one argument with the root/main schema. |
Update: I've been testing this PR with our protocol schemas, and so far it works fine.
|
Ha, OK! Do you think there is a performance difference between the two methods, considering the same schema, once completely expanded and once with this API? |
I am pretty sure the new (multiple schemata) methods will be slower than the old (single schema)! |
thanks @martin-g ! I will give it a try. |
992cf06
to
9e1f6ec
Compare
…te/write Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
22fe1e1
to
777cf97
Compare
It is much easier to deal with. Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
…o file Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
@markfarnan I think I am ready with this PR. Please review it and test it before I merge! |
@markfarnan I see you gave my comment above a thumbs-up. Could you please explicitly confirm whether you have tested the changes? Thanks! |
@markfarnan Ping! |
Checking this week. I'm still somewhat blocked by the missing upstream PRs, though I've got a temporary workaround for now. |
Just wanted to add my support for this PR from the sidelines. I hope it will unblock the use of Rust in my org: we have hundreds of interdependent schemas and ran across this problem during evaluation. Thanks for all your good work! |
@chupaty Are you saying that you have reviewed and tested the PR with your application(s)? |
I'd love to be able to commit more time to this, but my very brief comments are: I initially tried running it against my dataset (~200 schemas), but ran into problems with ambiguous schema definitions (note that my previous workflow of using to_avro_datum(...) still works as well as it did in 0.14.0). I tried to reproduce this with a minimal dataset (a hacked test_avro_3683_schemata_writer_reader), but ran into problems when I changed the order of the schemas loaded by Schema::parse_list, i.e. switching schema 'a' and schema 'b': let schemata: Vec<Schema> = Schema::parse_list(&[SCHEMA_B_STR, SCHEMA_A_STR]).unwrap(); I also have concerns about how the schemata are handled in my use case (i.e. lots of schemas): the O(N) search over the schemata would run with a fairly large N, and some big-ish Vecs would potentially be passed around. I probably can't investigate much more in the short term, but will update when I can. |
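If the lookup over the schemata is indeed a linear scan, one possible mitigation for large N is to build a name index once and reuse it for every lookup. This is only a sketch of that idea: `NamedSchema` and `index_by_name` are hypothetical names, not apache-avro types or functions, and nothing here reflects how the crate is actually implemented.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a parsed schema.
/// (Assumption for illustration; not the apache-avro `Schema` type.)
struct NamedSchema {
    fullname: String,
}

/// Builds a name -> schema index in a single O(N) pass.
/// The map borrows the schemas rather than cloning them, so no large
/// Vec is copied; each later lookup is an average O(1) hash probe.
fn index_by_name(schemata: &[NamedSchema]) -> HashMap<&str, &NamedSchema> {
    schemata.iter().map(|s| (s.fullname.as_str(), s)).collect()
}
```

The trade-off is one up-front pass and the memory for the map, which pays off only when many lookups are made against the same list.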
There is a discussion about releasing Avro 1.11.2 in the dev@ mailing list. Also CC @woile @WaterKnight1998 |
LGTM, this doesn't really affect me, as the avdl parser needs to solve all the references before even dealing with values. But it's a good addition 👍🏻 |
Checking this over the weekend. |
@martin-g I confirm this works for me. Thanks! I think it is ready to merge and include in the release. |
* AVRO-3683: Add support for using multiple schemata for resolve/validate/write
* AVRO-3683: WIP
* AVRO-3683: WIP compiles
* AVRO-3683: WIP compiles and all tests pass
* AVRO-3683: WIP Add support for reading
* AVRO-3683: WIP
* AVRO-3683: WIP Use a main schema and pass all other schemata for resolution.
* AVRO-3683: Formatting
* AVRO-3683: WIP Add support for multiple schemata in Reader/Writer APIs
* AVRO-3646: Formatting
* AVRO-3683: Use Vec instead of slice reference for schemata. It is much easier to deal with.
* AVRO-3683: Fix the resolving of the writer schema when reading an Avro file
* AVRO-3683: Cleaup
---------
Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
(cherry picked from commit b8b83b7)
Thank you, @markfarnan ! |
AVRO-3683
What is the purpose of the change
Make it possible to read/write Avro messages by using several schemata which depend on each other.
The new APIs provide methods which are similar to the single schema ones, but use a main schema and secondary ones.
The secondary ones are used to resolve any dependencies which are not in the main one.
Verifying this change
New tests are added.
Documentation