
AVRO-2247 - improved java reading performance with new reader #354

Closed · wants to merge 5 commits

Conversation

@unchuckable (Contributor)

This is the first implementation of a proposed new reader design, as described in AVRO-2247, that improves reading performance for both generic and specific records. Please let me know what you think. Classes could be consolidated into inner classes, but I did not want to spend too much effort on aesthetics before getting feedback on whether this feature is feasible.

The feature can be enabled per GenericData or SpecificData instance, or by setting the system property org.apache.avro.fastread to true. Note that in order to see the effect in Perf, calls to new GenericDatumReader( schema ) would need to be replaced with GenericData.get().createDatumReader( schema ) (this change is not included yet).
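As a hedged illustration (not code from this PR): given the property name and the factory call described above, opting in from a test harness might look like the following. The class name and the single-field schema are made up for the example.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.io.DatumReader;

public class FastReadSetup {
    public static void main(String[] args) {
        // Opt in globally via the system property named in this PR; it must
        // be set before the reader is created.
        System.setProperty("org.apache.avro.fastread", "true");

        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Foo\",\"fields\":"
                + "[{\"name\":\"bar\",\"type\":\"string\"}]}");

        // Use the factory instead of `new GenericDatumReader(schema)` so the
        // GenericData instance can hand back its fast-read implementation.
        DatumReader<?> reader = GenericData.get().createDatumReader(schema);
        System.out.println(reader.getClass().getName());
    }
}
```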

@Fokko (Contributor) commented Nov 7, 2018

Can you rebase and fix the merge conflict?

@unchuckable (Contributor, Author)

Rebased as requested, and added a small change to Perf.java to use GenericData.get().createDatumReader( schema ) instead of new GenericDatumReader( schema ).
Also, I am now using WeakIdentityHashMap instead of WeakHashMap for schema lookups, for an additional speedup.
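For context on that map change (an illustration, not code from this PR): WeakHashMap keys on equals()/hashCode(), which for Schema means a structural walk of the schema tree on every lookup, while an identity-keyed map only compares references. A small self-contained sketch of the difference, with a made-up schema and cache values:

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.WeakHashMap;

import org.apache.avro.Schema;

public class SchemaCacheSketch {
    public static void main(String[] args) {
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Foo\",\"fields\":"
                + "[{\"name\":\"bar\",\"type\":\"string\"}]}");

        // WeakHashMap hashes and compares Schema structurally, so every
        // get() pays for a deep traversal of the schema.
        Map<Schema, String> structuralCache = new WeakHashMap<>();
        structuralCache.put(schema, "cached reader");

        // An identity-keyed map compares references only, so lookups stay
        // cheap regardless of schema size. (IdentityHashMap is shown for
        // illustration; the PR uses a *weak* identity map so cached readers
        // do not pin their schemas in memory.)
        Map<Schema, String> identityCache = new IdentityHashMap<>();
        identityCache.put(schema, "cached reader");

        System.out.println(structuralCache.get(schema));
        System.out.println(identityCache.get(schema));
    }
}
```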

As noted, I am curious about any feedback and willing to work on implementation and style details. I just need to know if this is something worth pursuing.

With the current changes, I get the following Perf.java comparison:

| test name | time (fast read disabled) | time (fast read enabled) |
| --- | --- | --- |
| FooBarSpecificRecordTestRead | 5534 ms | 3115 ms |
| GenericRead | 4711 ms | 3422 ms |
| GenericStringsRead | 4902 ms | 3695 ms |
| GenericNested_Read | 7190 ms | 4961 ms |
| GenericNestedFake_Read | 2581 ms | 2461 ms |
| GenericWithDefault_Read | 8400 ms | 3746 ms |
| GenericWithOutOfOrder_Read | 4627 ms | 3549 ms |
| GenericWithPromotion_Read | 4991 ms | 3673 ms |
| GenericOneTimeDecoderUse_Read | 4618 ms | 3496 ms |
| GenericOneTimeReaderUse_Read | 7035 ms | 4693 ms |
| GenericOneTimeUse_Read | 6965 ms | 4721 ms |

@Fokko (Contributor) commented Nov 7, 2018

Wow, that is an incredible speedup. Just curious: why implement a completely new reader instead of optimizing the existing ones? It would be nice to run the speed tests every time, to avoid performance regressions.

@cutting (Contributor) commented Nov 7, 2018

I don't see where ExecutionSteps are created. Is some of the code missing from the patch?

@unchuckable (Contributor, Author)

@Fokko - Actually, the reason was twofold. For one, I was looking at Raymie's code generation for AVRO-2090 and was considering working up a concept for on-the-fly bytecode generation for deserialization, and coming up with something that creates an execution plan was the natural first step toward that. I'd really like to extend it later so that the ExecutionSteps generate inlined bytecode on the fly, letting the JVM optimize even more. On the other hand, I tried to understand the ResolvingGrammarGenerator and had a hard time with it, so I built something that felt easier to me, and was somewhat surprised by the results. I'm well aware it would be preferable to improve on what's already there, but the one-stage "execution plan" approach felt too different from the two-stage "DatumReader and ResolvingDecoder" approach. I'm happy even if this PR only serves as inspiration for other changes, and I'm willing to assist in getting things done another way, too.

@cutting - The ExecutionSteps are created in FastReader.initializeRecordReader(...).

@rstata (Contributor) commented Nov 9, 2018

I've been meaning to comment on this for a while. Looking at your code quickly, I wasn't convinced that it works for recursive records (and maybe not even for nested records). Also, the solution as posted re-implements schema resolution. The existing schema-resolution code is covered by a large number of regression tests that came about because the resolution logic is subtle in places; a re-implementation of that logic should subject itself to that test suite, which yours does not.

Inspired by both your JIRA (AVRO-2247) and my own thoughts about further improving the performance of reading with resolution, I have refactored the schema-resolution logic away from the resolving-grammar generation logic. I have published this in the branch 'refactor-resolving-2018-11-09' of my Avro fork. This code is "bug for bug" compatible with Avro's existing schema resolution (e.g., it implements the funky "best match" algorithm currently used for unions), and it passes the full schema-resolution regression suite.

You might want to look at whether this would be a good foundation for implementing your improvements. Start at the new Resolver class, and also look at how ResolvingGrammarGenerator uses the output of Resolver. However, be warned that I intend to "re-write history" on this code pretty severely before proposing it as an actual improvement, so you might want to wait about a week before actually depending upon this code.

Over the last few weeks I've been working on the performance-testing suite. What I've found is that the variance between runs of this suite is significant: in places, over 30%! Across the board, more than 40% of the test cases show over 5% variance between runs. With this much variance, it's impossible to say whether a proposed performance improvement is really an improvement (and impossible to tell whether an attempt to improve one set of performance cases has degraded performance elsewhere).

By the end of next week I should have a proposed set of changes to the performance benchmark, plus a "cookbook" for using it (on AWS), which minimizes variance between runs of the suite. With that in place, I will return to the refactor-resolving work and submit it along with testing that shows it doesn't degrade performance (and, in fact, improves it in places).

@unchuckable (Contributor, Author) commented Nov 9, 2018

Thanks for your feedback, Raymie. I hadn't expected to receive that much input, but I'll try to address your points:

  • I moved the feature switch from GenericData.createDatumReader to GenericDatumReader.read (even though I don't quite get why the factory method isn't the default way to instantiate a new reader; can someone help me understand?). This way, the newly written code is subject to ALL unit tests of the project. (Admittedly, I had to disable the ReusingArrayReader due to some problems there; I will address that later if desired.) With minor adjustments (most of which were fixes to exception types and messages), all tests seem to pass. If there are further test frameworks to use, please let me know.

  • I am well aware of the deviations in performance measurements. That's why everyone should take Perf.java results with a grain of salt: most tests there are VERY sensitive to even small changes and have too large a spread. For real performance measurement, a module that uses JMH or similar would be preferable, ideally one that measures not only time but also allocation activity (see the sketch after this list). Also, structures of different depth and diversity should be checked.

  • It's nice to see that you have pursued similar ideas with your branch. I think the main difference is that I try to do all verification ONCE at reader creation, while your take still uses the existing DatumReader, which requires some avoidable lookups at read time (things like SpecificData.getClass(), etc.), and the ResolvingDecoder, which still relies on parser information for nearly every read operation. My approach replaces both ResolvingDecoder and DatumReader, making all necessary decisions only once where possible and storing the results in instance variables instead of maps (hoping to affect performance positively that way; see the note above). The downside of my approach in its current form is that it only works for generic and specific records. I have not looked into what would need to change to support the other kinds of data (reflective, protobuf, thrift), but I consider generic and specific records the most important use case.

  • Sadly, the mechanism I present does not take advantage of the generated reader code of AVRO-2090, but it offers performance benefits of a similar extent (depending on the actual data structures) and works for all kinds of generic and specific records.

  • The big benefit of the current approach is the strong speedup when dealing with default values. I do agree that a large part of the gain could perhaps be achieved with a smaller change.
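On the JMH point above: a minimal sketch of what such a benchmark could look like, not part of this PR. The class name and single-field schema are made up; running it with JMH's built-in `-prof gc` profiler also reports allocation rates, as suggested in the list.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.concurrent.TimeUnit;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;
import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@State(Scope.Thread)
@Fork(2)
@Warmup(iterations = 5)
@Measurement(iterations = 5)
public class GenericReadBench {
    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Foo\",\"fields\":"
            + "[{\"name\":\"bar\",\"type\":\"string\"}]}");

    byte[] payload;
    GenericDatumReader<GenericRecord> reader;

    @Setup
    public void setup() throws IOException {
        // Serialize one record up front so the benchmark measures decoding only.
        GenericRecord record = new GenericData.Record(SCHEMA);
        record.put("bar", "hello");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(SCHEMA).write(record, encoder);
        encoder.flush();
        payload = out.toByteArray();
        reader = new GenericDatumReader<>(SCHEMA);
    }

    @Benchmark
    public GenericRecord read() throws IOException {
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(payload, null);
        return reader.read(null, decoder);
    }
}
```

Forked, warmed-up iterations would also make the run-to-run variance Raymie describes much easier to detect than Perf.java's single wall-clock loops.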

I would be very grateful for feedback on whether you consider the current approach worth spending more time on, or whether there are other things that would keep it from being considered beneficial for the project.

@unchuckable (Contributor, Author)

Okay, I just ran across a major showstopper with this approach when it comes to using default values (and subsequent modifications of them).

Also, the discussion above and the subsequent study of some of the code Raymie pointed me to have helped me understand some concepts that I couldn't get my head around before.
I'll try to evaluate how I can fix the problem, or how I might incorporate some of these ideas into the existing code with less intrusive changes ;)

Thanks to everyone who took the time to look over my submission. I've learned plenty from it already.

@unchuckable (Contributor, Author)

Note: I'd still be grateful for feedback on the concept of the readers as I tried to implement them (i.e. unifying DatumReader, ResolvingDecoder, and Parser into one functional structure for faster evaluation).
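To make that concept concrete for reviewers, here is a hypothetical, heavily simplified sketch of the "execution plan" idea; the names below do not match the actual classes in this PR. All resolution decisions are made once when the plan is built, so reading a record is just running a flat list of precomputed steps:

```java
import java.io.IOException;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.io.Decoder;

// One precomputed action per resolved reader field, e.g. "read a string
// into field 3" or "fill field 5 with its default value".
interface ExecutionStep {
    void execute(GenericData.Record record, Decoder in) throws IOException;
}

// Stands in for the DatumReader/ResolvingDecoder/Parser trio at read time:
// schema resolution happens once, when the step list is assembled.
class RecordPlan {
    private final Schema readerSchema;
    private final List<ExecutionStep> steps;

    RecordPlan(Schema readerSchema, List<ExecutionStep> steps) {
        this.readerSchema = readerSchema;
        this.steps = steps;
    }

    GenericData.Record read(Decoder in) throws IOException {
        GenericData.Record record = new GenericData.Record(readerSchema);
        for (ExecutionStep step : steps) {
            step.execute(record, in);
        }
        return record;
    }
}
```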
