AVRO-2247 - improved java reading performance with new reader #354
Can you rebase and fix the merge conflict? |
Rebased as requested, and added small change to Perf.java to use As noted, am curious for any feedback and willing to work on implementation and style details. Just need to know if this is something worth pursuing. With current changes, I get the following Perf.java comparison:
|
Wow, that is an incredible speedup. Just curious, why implement a completely new reader, instead of optimizing the existing ones? Would be nice to run the speed tests every time, to avoid performance regression. |
I don't see where ExecutionSteps are created. Is some of the code missing from the patch? |
@Fokko - Actually, the reason was twofold. For one, I was looking at Raymie's code generation for AVRO-2090 and was considering working up a concept for on-the-fly bytecode generation for deserialization. Coming up with something that creates an execution plan was kind of the natural first step for that. I'd really like to extend it at a later point so that the ExecutionSteps generate inlined bytecode on the fly, so the JVM can optimize even more.

On the other hand, I tried to understand the ResolvingGrammarGenerator and had a hard time with it, so I tried to build something that felt easier for me, and was kind of surprised by the results. I'm well aware it would be preferable to improve on what's already there, but I felt that the one-stage "execution plan" approach was too different from the two-stage "DatumReader and ResolvingDecoder" approach. I'm happy, though, even if this PR only serves as inspiration for other changes, and am willing to assist in getting things done another way, too.

@cutting - The ExecutionSteps are created in |
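As a rough illustration of the one-stage "execution plan" idea described above (this is not the code from the patch): the plan is built once per schema as a list of steps, and reading a record just runs those steps in order. All names here (`ExecutionPlanSketch`, `Step`, `FieldType`, `plan`, `read`) are hypothetical, the "schema" is a plain ordered map, and the sketch reads fixed-width values from a `ByteBuffer` rather than Avro's actual binary encoding.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from the patch or from Avro.
class ExecutionPlanSketch {

    /** One precomputed step: reads a single value and stores it in the record. */
    interface Step {
        void execute(ByteBuffer in, Map<String, Object> record);
    }

    /** Toy field types; a real schema is far richer. */
    enum FieldType { INT, LONG, DOUBLE }

    /** Builds the plan once per schema, so the per-record loop does no type dispatch. */
    static List<Step> plan(Map<String, FieldType> schema) {
        List<Step> steps = new ArrayList<>();
        for (Map.Entry<String, FieldType> field : schema.entrySet()) {
            String name = field.getKey();
            switch (field.getValue()) {
                case INT:
                    steps.add((in, rec) -> rec.put(name, in.getInt()));
                    break;
                case LONG:
                    steps.add((in, rec) -> rec.put(name, in.getLong()));
                    break;
                case DOUBLE:
                    steps.add((in, rec) -> rec.put(name, in.getDouble()));
                    break;
            }
        }
        return steps;
    }

    /** Reads one record by running every step of the plan in order. */
    static Map<String, Object> read(List<Step> plan, ByteBuffer in) {
        Map<String, Object> record = new LinkedHashMap<>();
        for (Step step : plan) {
            step.execute(in, record);
        }
        return record;
    }

    public static void main(String[] args) {
        Map<String, FieldType> schema = new LinkedHashMap<>();
        schema.put("id", FieldType.LONG);
        schema.put("score", FieldType.DOUBLE);
        List<Step> plan = plan(schema); // built once, reused for every record

        ByteBuffer in = ByteBuffer.allocate(16);
        in.putLong(42L).putDouble(1.5).flip();
        System.out.println(read(plan, in)); // prints {id=42, score=1.5}
    }
}
```

The point of the design is that all schema inspection happens in `plan`, so the hot path in `read` is a flat loop over small, monomorphic steps.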
I've been meaning to comment on this for a while. Looking at your code quickly, I wasn't convinced that it worked for recursive records (and maybe not even for nested records). Also, the solution as posted re-implements schema resolution. The schema-resolution code is subjected to a large number of regression tests that came about because the resolution logic is subtle in places. A re-implementation of that logic should be subjected to that test suite, which yours is not.

Inspired by both your JIRA (AVRO-2247) and my own thoughts about further improving performance of reading with resolution, I have refactored the schema-resolution logic away from the resolving-grammar generation logic. I have published this in the branch You might want to look to see if this would be a good foundation for implementing your improvements. Start at the new Resolver class, and also look to see how ResolvingGrammarGenerator uses the output of Resolver. However, be warned that I intend to "re-write history" on this code pretty severely before proposing it as an actual improvement, so you might want to wait about a week before actually depending upon this code.

Over the last few weeks I've been working on the performance-testing suite. What I've found is that results vary significantly between runs of this suite: in places, by over 30%! Across the board, I see variance of over 5% between runs for over 40% of the test cases. With this much variance, it's impossible to say whether a proposed performance improvement is really an improvement (and impossible to tell whether an attempt to improve one set of performance cases has degraded performance elsewhere). By the end of next week I should have a proposed set of changes to the performance benchmark, plus a "cookbook" for using it (on AWS), which minimizes variance between runs of the suite. With that in place, I will return to the |
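To make the run-to-run variance figures above concrete, here is a small sketch of one way such numbers could be computed. It assumes "variance" means the relative spread `(max - min) / mean` of a test case's timings across runs; the comment does not say which measure was actually used, and the class and method names (`RunVariance`, `relativeSpread`, `fractionAbove`) are hypothetical.

```java
// Hypothetical sketch: the spread measure below is an assumption, not the suite's own metric.
class RunVariance {

    /** Relative spread of one test case across runs: (max - min) / mean. */
    static double relativeSpread(double[] timings) {
        if (timings.length == 0) {
            throw new IllegalArgumentException("no timings");
        }
        double min = timings[0];
        double max = timings[0];
        double sum = 0.0;
        for (double t : timings) {
            min = Math.min(min, t);
            max = Math.max(max, t);
            sum += t;
        }
        return (max - min) / (sum / timings.length);
    }

    /** Fraction of test cases whose relative spread exceeds the given threshold. */
    static double fractionAbove(double[][] timingsPerCase, double threshold) {
        int noisy = 0;
        for (double[] timings : timingsPerCase) {
            if (relativeSpread(timings) > threshold) {
                noisy++;
            }
        }
        return (double) noisy / timingsPerCase.length;
    }

    public static void main(String[] args) {
        // Made-up timings: two test cases, two runs each.
        double[][] timings = { { 90.0, 110.0 }, { 100.0, 101.0 } };
        System.out.println(fractionAbove(timings, 0.05));
    }
}
```

A measured speedup smaller than a test case's spread cannot be distinguished from noise, which is the concern raised in the comment.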
Thanks for your feedback, raymie. Hadn't expected to receive that much input, but I'll try to address your points:
I would be very grateful for some feedback on whether you consider the current approach I present worth spending more time on or whether there are more/other things that would keep it from being considered beneficial for the project. |
Okay, I just ran across a major showstopper with this approach when it comes to using default values (and subsequent modifications of the latter). Also, the discussion above and my subsequent study of some of the code Raymie pointed me to have helped me understand some concepts that I somehow couldn't wrap my head around before. Thanks to everyone who took the time to look over my submission. I've learned plenty from it already. |
Note: I'd still be grateful for feedback on the concept of the readers as I tried to implement them (i.e. unifying |
This is the first implementation of a proposed new reader design, as described in AVRO-2247, that improves reading performance for both generic and specific records. Please let me know what you think. Classes could be consolidated into inner classes, but I did not want to spend too much effort on aesthetics before getting feedback on whether this feature is feasible.
The feature can be enabled per GenericData or SpecificData instance, or by setting the system property `org.apache.avro.fastread` to `true`. Note that in order to see effects in Perf, it would be required to replace calls to `new GenericDatumReader( schema )` with `GenericData.get().createDatumReader( schema )` (this change is not included yet).
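
A minimal sketch of how a system-property switch like the one described above can select a reader implementation at creation time. Only the property name `org.apache.avro.fastread` comes from the description; `FastReadToggle`, `Reader`, and `createReader` are hypothetical stand-ins rather than Avro classes, and how the patch itself reads the property is an assumption.

```java
// Hypothetical sketch: stand-in types only, no Avro classes are used.
class FastReadToggle {

    /** Property name taken from the PR description. */
    static final String FAST_READ_PROPERTY = "org.apache.avro.fastread";

    /** Stand-in for a datum reader. */
    interface Reader {
        String name();
    }

    /** Factory method: picks the implementation when the reader is created. */
    static Reader createReader() {
        // Boolean.getBoolean is true only if the property is set to "true" (case-insensitive).
        boolean fast = Boolean.getBoolean(FAST_READ_PROPERTY);
        return fast ? () -> "fast reader" : () -> "classic reader";
    }

    public static void main(String[] args) {
        System.out.println(createReader().name()); // depends on -Dorg.apache.avro.fastread
        System.setProperty(FAST_READ_PROPERTY, "true");
        System.out.println(createReader().name());
    }
}
```

This also shows why going through a factory matters for the benchmark: code that constructs a reader directly with `new` bypasses the switch, whereas code that asks a factory for a reader picks up the setting.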