-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AVRO-2247 - improved java reading performance with new reader #391
Conversation
I will play with it. To get it to build, I added licenses to all the files: https://github.com/rstata-projects/avro/tree/unchuckable-fast-avro For some reason I can't issue a pull request to your fork, can you pull this change from my repo? Also, how do you invoke the benchmark? |
Hi, @rstata. First of all, thanks for looking into it. It means a lot. I'm sorry about the license files; totally forgot about them files this time 😞 I pulled your change from your repo and pushed it into mine. No clue what's up with github and the pull request there, if anybody has a pointer on what I would need to set in my repo, any advice is welcome. Invoking the benchmark: By default, it will use 5 warmup iterations and 5 measurement iterations with 10 seconds each, and do all of that 5 times, which totals up to almost 3 hours, but it can easily be reduced to more reasonable limits (20 minutes), like: The current benchmark classes are only a small excerpt of cases of Perf.java (but trying to replicate them as good as possible). I can gladly add more if it helps the project; it might make sense to move that to a different ticket though, I guess. |
I've run your code against
The last sub-column of "avro-2274 (...) vs" results is the more relevant. What we see here are a large number of record-related cases showing speedups of 20-30% and even more. This is very promising. I am currently running the JMH-based benchmarks. These do not have an (obvious) mechanism for comparing the "before/after" performance of your proposed changes, but I will be interested in seeing if they do better in reducing the variance between runs. I haven't inspected your code yet. I'll do that as well, and offer some opinions. |
I agree that JMH will still be hard pressed for before/after comparisons, unless the change can be toggled with a feature switch at runtime (which fortunately is the case with the proposed change). |
Am currently refactoring the code, to use the refactored |
On the one hand, the performance results I posted a few days ago certainly demonstrate there is some perfomance improvements to be had for GenericDatumReader. On the other hand, this change introduces 2800 lines of new code that looks like it'd be tedious to maintain. Also, the comparison here isn't apples to apples, because the old code is more aggressive about reusing objects, and it attempts to apply conversions, which is pure overhead for the performance tests we're using but aren't in other cases. Finally, looking more closely at GenericDatumReader, it has built into it a BUNCH of "customization" points -- methods and objects that can be replaced to customize the reading process, all of which add overhead in the inner-most loop. It's not clear whether how much of the performance gains come from the pre-computation of actions versus simply getting rid of all these customization points. I'm tempted to extend the AVRO-2275 work so that the Action-tree generated by Resolver is a complete mirror of the reader's schema (right now, it stops at DoNothing nodes, which for Unions in particular could be pretty high-up in the schema's tree). Then one could write a FastGenericDatumReader class that simply walks that tree to decode the object. I suspect the resulting code would be on the order of 100 lines and would capture almost all the speed found in this fast-avro patch. (And one could decorate the Action objects with any Conversions for LogicalTypes found in the reader's schema, making it quick and easy to apply conversions while doing the walk.) |
@unchuckable -- send an email to "rstata - at - yahoo - . - com" to better coordinate. Thanks. |
Updated PR to reflect current state of the work, based on Raymie's work in #395 (his changes are contained within this PR, too, and the overall code change of this PR has condensed down to less than 800 lines of code; his refactoring helped lots and made code a lot clearer!) and resolved merge conflicts with current main branch. As the code comes with a feature switch and leaves most things entirely untouched when deactivated, I'd love to see it considered for inclusion in a near-future release, then it might be possible to get some feedback on the real-life effects of the proposed reader design. Feedback welcome. |
This is pretty awesome. I agree with @rstata that we can probably make this more maintainable with some structural changes. Making it "reader schema shaped" is probably ideal. It is similar to ideas I had oh, ... a long time ago when I was heavily contributing to Avro. I discussed some of them with @cutting long ago. The fundamental issue with the old approach is three-fold:
I have thought of this sort of work as the first step. It is an "interpreter" of instructions for transforming data The next step would be to compile those JIT style, if needed; In some cases we might want the interpreter only, if the iteration count is not high enough, in others we can aggressively compile. As for bytecode rewriting later, there are two options I see:
The latter may be fastest after some warmup time (and is easier), but the former will be faster before the JIT has a chance to warm up. |
We introduced automatic code formatting. For more info see the "How to contribute" page. This probably affected this PR can you please rebase it. |
Hi there. Nice to see there's still interest in this. I had been planning to wait with rebasing on @rstata to finish his work on the resolver that I make use of, but I think I'll try to get it all into a merge-able state both function and formatting-wise again since I didn't hear from Raymie in like forever. Will hopefully find some time to do that soon. |
Okay, I pulled myself together and rebased the whole work on the current master branch, incorporating only those parts of AVRO-2275 that have become a dependency for my code to work. @scottcarey - I very much agree with your approach and your perspective. With the approach I present, individual I'll be glad for comments, or (favourably) a positive merge decision 😉 |
Hello, fellas. Since this MR has been in mergeable state and non-considered for a long time despite some interest from various sides, is there any work you'd like to see to get things considered for the main branch? Performance comparisons? Or should I just drop the matter? I'm willing to invest more work into it, but I'd need some pointers in case more work is required. |
lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java
Outdated
Show resolved
Hide resolved
lang/java/avro/src/main/java/org/apache/avro/generic/GenericDatumReader.java
Outdated
Show resolved
Hide resolved
I took a quick look. Seems much more maintainable. Can you post what are the performance improvements? |
Thanks Jamie for taking your time to review the change. I'd suggest the following:
I wanna verify performance improvements, since it has been quite some time since the original code was submitted, and performance impact might vary. |
Alright, I just tested a merge of the code with the current I ran the Perf-Tests that are supposed to be influenced by the changes of this pull request. Without FastRead (
With FastRead (
Observations:
I'll do the nit-fixes in a bit. Please get back at me with your opinion. |
@rstata, anyone - comments? Or shall we rather just drop this one? |
Hello! I'm pretty confident that this is safe to merge, especially having no effect when org.apache.avro.fastread is turned off. I've reproduced (informally) the performance benefits of turning it on, and the technique looks sound. There was a lot of discussion of future work and theory in this and the old PR, so having this in master and/or a release would definitely be motivating and inspiring to continue on the path, as well as help find regressions (if any!) in the new reader. I think one of the reasons this PR took so long is that it's tricky to change the APIs -- we love our stability and backwards compatibility! I think if we had AVRO-2083 or similar already in place, this would have been long accepted... Thanks for this work! I'm going to send a message to the dev@ mailing list to see if we can get this a release sooner-rather-than-later! |
Thank you, @RyanSkraba. If you think it safe, it might be interesting to even backport the feature, behind a feature switch, to 1.8 or 1.9, to be able to get more feedback on the approach from people who might be locked from 1.9 by incompatibilities. |
FYI: if you are interested in Avro performance, I would recommend checking out the fast-serde module of our avro-util project: https://github.com/linkedin/avro-util It is a fork of RTB House's fast-serde project, with a bunch of additional performance and compatibility improvements. It does runtime code-gen to build tailored serializer and deserializer classes, one for each reader/writer pair. It works with GR and SR, and with Avro versions 1.4, 1.7 and 1.8. It can probably work with the other Avro versions as well (1.5, 1.6, 1.9), thanks to our compatibility helper module (also in the same project), but we simply haven't tested those versions yet. Cheers. -F |
Cannot reopen the original PR (#354), since I've rebased to current master.
I've tried to adress the points that @rstata brought up with my approach. The feature switch between traditional and newly suggested reader mechanism now is done inside
GenericDatumReader
. All tests provided with the avro project run smoothly (I stole @rstata's idea to trigger the tests an additional time with the feature switch enabled). Also fixed defaulting in a way that takes advantage of immutable values and only actually re-reads default objects with a distinct decoder when really required.If there is any more things that would need testing, please do give me a pointer.
Overall, the newly proposed writer sacrifices time building a
DatumReader
, allowing it to perform the actual reading at a highly improved rate. For all applications that are remotely "big data", that tradeoff should turn out highly beneficial.I also included a small module (
benchmark
) that uses JMH to test the performance of the proposed reader approach against the current generic reader. Using JMH should be preferable to Perf.java, for it allows to perform benchmarks in a controlled and statistical significant way.As stated in the last PR, I'm open to any changes, fire ahead. It's the overall concept and its aparent reader performance gains that I'm chasing after, not having my implementation find its way into the main branch 1:1.