
AVRO-2247 - improved java reading performance with new reader #354

Closed · wants to merge 5 commits

Conversation

@unchuckable (Contributor)

This is the first implementation of a proposed new reader design, as described in AVRO-2247, that improves reading performance for both generic and specific records. Please let me know what you think. Classes could be consolidated into inner classes, but I did not want to spend too much effort on aesthetics before getting feedback on whether this feature is feasible.

The feature can be enabled per GenericData or SpecificData instance, or by setting the system property org.apache.avro.fastread to true. Note that in order to see the effect in Perf, calls to new GenericDatumReader( schema ) would need to be replaced with GenericData.get().createDatumReader( schema ) (this change is not included yet).
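As a hedged illustration (not code from this PR): given the property name and the factory call described above, opting in from a test harness might look like the following. The class name and the single-field schema are made up for the example.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.io.DatumReader;

public class FastReadSetup {
    public static void main(String[] args) {
        // Opt in globally via the system property named in this PR; it must
        // be set before the reader is created.
        System.setProperty("org.apache.avro.fastread", "true");

        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Foo\",\"fields\":"
                + "[{\"name\":\"bar\",\"type\":\"string\"}]}");

        // Use the factory instead of `new GenericDatumReader(schema)` so the
        // GenericData instance can hand back its fast-read implementation.
        DatumReader<?> reader = GenericData.get().createDatumReader(schema);
        System.out.println(reader.getClass().getName());
    }
}
```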

@Fokko (Contributor) commented Nov 7, 2018

Can you rebase and fix the merge conflict?

@unchuckable (Contributor, Author)

Rebased as requested, and added a small change to Perf.java to use GenericData.get().createDatumReader( schema ) instead of new GenericDatumReader( schema ).
Also, I am now using WeakIdentityHashMap instead of WeakHashMap for schema lookups, for an additional speedup.
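For context on that map change (an illustration, not code from this PR): WeakHashMap keys on equals()/hashCode(), which for Schema means a structural walk of the schema tree on every lookup, while an identity-keyed map only compares references. A small self-contained sketch of the difference, with a made-up schema and cache values:

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.WeakHashMap;

import org.apache.avro.Schema;

public class SchemaCacheSketch {
    public static void main(String[] args) {
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Foo\",\"fields\":"
                + "[{\"name\":\"bar\",\"type\":\"string\"}]}");

        // WeakHashMap hashes and compares Schema structurally, so every
        // get() pays for a deep traversal of the schema.
        Map<Schema, String> structuralCache = new WeakHashMap<>();
        structuralCache.put(schema, "cached reader");

        // An identity-keyed map compares references only, so lookups stay
        // cheap regardless of schema size. (IdentityHashMap is shown for
        // illustration; the PR uses a *weak* identity map so cached readers
        // do not pin their schemas in memory.)
        Map<Schema, String> identityCache = new IdentityHashMap<>();
        identityCache.put(schema, "cached reader");

        System.out.println(structuralCache.get(schema));
        System.out.println(identityCache.get(schema));
    }
}
```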

As noted, I am curious about any feedback and willing to work on implementation and style details. I just need to know if this is something worth pursuing.

With the current changes, I get the following Perf.java comparison:

| test name | time (fast read disabled) | time (fast read enabled) |
| --- | --- | --- |
| FooBarSpecificRecordTestRead | 5534 ms | 3115 ms |
| GenericRead | 4711 ms | 3422 ms |
| GenericStringsRead | 4902 ms | 3695 ms |
| GenericNested_Read | 7190 ms | 4961 ms |
| GenericNestedFake_Read | 2581 ms | 2461 ms |
| GenericWithDefault_Read | 8400 ms | 3746 ms |
| GenericWithOutOfOrder_Read | 4627 ms | 3549 ms |
| GenericWithPromotion_Read | 4991 ms | 3673 ms |
| GenericOneTimeDecoderUse_Read | 4618 ms | 3496 ms |
| GenericOneTimeReaderUse_Read | 7035 ms | 4693 ms |
| GenericOneTimeUse_Read | 6965 ms | 4721 ms |

@Fokko (Contributor) commented Nov 7, 2018

Wow, that is an incredible speedup. Just curious: why implement a completely new reader instead of optimizing the existing ones? It would be nice to run the speed tests every time, to avoid performance regressions.

@cutting (Contributor) commented Nov 7, 2018

I don't see where ExecutionSteps are created. Is some of the code missing from the patch?

@unchuckable (Contributor, Author)

@Fokko - Actually, the reason was twofold. For one, I was looking at Raymie's code generation for AVRO-2090 and was considering working up a concept for on-the-fly bytecode generation for deserialization, and coming up with something that creates an execution plan was the natural first step toward that. I'd really like to extend it later so that the ExecutionSteps generate inlined bytecode on the fly, letting the JVM optimize even more. On the other hand, I tried to understand the ResolvingGrammarGenerator and had a hard time with it, so I built something that felt easier to me, and was somewhat surprised by the results. I'm well aware it would be preferable to improve on what's already there, but the one-stage "execution plan" approach felt too different from the two-stage "DatumReader and ResolvingDecoder" approach. I'm happy even if this PR only serves as inspiration for other changes, and I'm willing to assist in getting things done another way, too.

@cutting - The ExecutionSteps are created in FastReader.initializeRecordReader(...).

@rstata (Contributor) commented Nov 9, 2018

I've been meaning to comment on this for a while. Looking at your code quickly, I wasn't convinced that it works for recursive records (and maybe not even for nested records). Also, the solution as posted re-implements schema resolution. The existing schema-resolution code is covered by a large number of regression tests that came about because the resolution logic is subtle in places; a re-implementation of that logic should subject itself to that test suite, which yours does not.

Inspired by both your JIRA (AVRO-2247) and my own thoughts about further improving the performance of reading with resolution, I have refactored the schema-resolution logic away from the resolving-grammar generation logic. I have published this in the branch 'refactor-resolving-2018-11-09' of my Avro fork. This code is "bug for bug" compatible with Avro's existing schema resolution (e.g., it implements the funky "best match" algorithm currently used for unions), and it passes the full schema-resolution regression suite.

You might want to look at whether this would be a good foundation for implementing your improvements. Start at the new Resolver class, and also look at how ResolvingGrammarGenerator uses the output of Resolver. However, be warned that I intend to "re-write history" on this code pretty severely before proposing it as an actual improvement, so you might want to wait about a week before actually depending upon this code.

Over the last few weeks I've been working on the performance-testing suite. What I've found is that the variance between runs of this suite is significant: in places, over 30%! Across the board, more than 40% of the test cases show over 5% variance between runs. With this much variance, it's impossible to say whether a proposed performance improvement is really an improvement (and impossible to tell whether an attempt to improve one set of performance cases has degraded performance elsewhere).

By the end of next week I should have a proposed set of changes to the performance benchmark, plus a "cookbook" for using it (on AWS), which minimizes variance between runs of the suite. With that in place, I will return to the refactor-resolving work and submit it along with testing that shows it doesn't degrade performance (and, in fact, improves it in places).

@unchuckable (Contributor, Author) commented Nov 9, 2018

Thanks for your feedback, Raymie. I hadn't expected to receive that much input, but I'll try to address your points:

  • I moved the feature switch from GenericData.createDatumReader to GenericDatumReader.read (even though I don't quite get why the factory method isn't the default way to instantiate a new reader; can someone help me understand?). This way, the newly written code is subject to ALL unit tests of the project. (Admittedly, I had to disable the ReusingArrayReader due to some problems there; I will address that later if desired.) With minor adjustments (most of which were fixes to exception types and messages), all tests seem to pass. If there are further test frameworks to use, please let me know.

  • I am well aware of the deviations in performance measurements. That's why everyone should take Perf.java results with a grain of salt: most tests there are VERY sensitive to even small changes and have too large a spread. For real performance measurement, a module that uses JMH or similar would be preferable, ideally one that measures not only time but also allocation activity (see the sketch after this list). Also, structures of different depth and diversity should be checked.

  • It's nice to see that you have pursued similar ideas with your branch. I think the main difference is that I try to do all verification ONCE at reader creation, while your take still uses the existing DatumReader, which requires some avoidable lookups at read time (things like SpecificData.getClass(), etc.), and the ResolvingDecoder, which still relies on parser information for nearly every read operation. My approach replaces both ResolvingDecoder and DatumReader, making all necessary decisions only once where possible and storing the results in instance variables instead of maps (hoping to affect performance positively that way; see the note above). The downside of my approach in its current form is that it only works for generic and specific records. I have not looked into what would need to change to support the other kinds of data (reflective, protobuf, thrift), but I consider generic and specific records the most important use case.

  • Sadly, the mechanism I present does not take advantage of the generated reader code of AVRO-2090, but it offers performance benefits of a similar extent (depending on the actual data structures) and works for all kinds of generic and specific records.

  • The big benefit of the current approach is the strong speedup when dealing with default values. I do agree that a large part of the gain could perhaps be achieved with a smaller change.
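On the JMH point above: a minimal sketch of what such a benchmark could look like, not part of this PR. The class name and single-field schema are made up; running it with JMH's built-in `-prof gc` profiler also reports allocation rates, as suggested in the list.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.concurrent.TimeUnit;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;
import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@State(Scope.Thread)
@Fork(2)
@Warmup(iterations = 5)
@Measurement(iterations = 5)
public class GenericReadBench {
    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Foo\",\"fields\":"
            + "[{\"name\":\"bar\",\"type\":\"string\"}]}");

    byte[] payload;
    GenericDatumReader<GenericRecord> reader;

    @Setup
    public void setup() throws IOException {
        // Serialize one record up front so the benchmark measures decoding only.
        GenericRecord record = new GenericData.Record(SCHEMA);
        record.put("bar", "hello");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(SCHEMA).write(record, encoder);
        encoder.flush();
        payload = out.toByteArray();
        reader = new GenericDatumReader<>(SCHEMA);
    }

    @Benchmark
    public GenericRecord read() throws IOException {
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(payload, null);
        return reader.read(null, decoder);
    }
}
```

Forked, warmed-up iterations would also make the run-to-run variance Raymie describes much easier to detect than Perf.java's single wall-clock loops.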

I would be very grateful for feedback on whether you consider the current approach worth spending more time on, or whether there are other things that would keep it from being considered beneficial for the project.

@unchuckable (Contributor, Author)

Okay, I just ran across a major showstopper with this approach when it comes to using default values (and subsequent modifications of them).

Also, the discussion above and the subsequent study of some of the code Raymie pointed me to have helped me understand some concepts that I couldn't get my head around before.
I'll try to evaluate how I can fix the problem, or how I might incorporate some of these ideas into the existing code with less intrusive changes ;)

Thanks to everyone who took the time to look over my submission. I've learned plenty from it already.

@unchuckable (Contributor, Author)

Note: I'd still be grateful for feedback on the concept of the readers as I tried to implement them (i.e. unifying DatumReader, ResolvingDecoder, and Parser into one functional structure for faster evaluation).
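To make that concept concrete for reviewers, here is a hypothetical, heavily simplified sketch of the "execution plan" idea; the names below do not match the actual classes in this PR. All resolution decisions are made once when the plan is built, so reading a record is just running a flat list of precomputed steps:

```java
import java.io.IOException;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.io.Decoder;

// One precomputed action per resolved reader field, e.g. "read a string
// into field 3" or "fill field 5 with its default value".
interface ExecutionStep {
    void execute(GenericData.Record record, Decoder in) throws IOException;
}

// Stands in for the DatumReader/ResolvingDecoder/Parser trio at read time:
// schema resolution happens once, when the step list is assembled.
class RecordPlan {
    private final Schema readerSchema;
    private final List<ExecutionStep> steps;

    RecordPlan(Schema readerSchema, List<ExecutionStep> steps) {
        this.readerSchema = readerSchema;
        this.steps = steps;
    }

    GenericData.Record read(Decoder in) throws IOException {
        GenericData.Record record = new GenericData.Record(readerSchema);
        for (ExecutionStep step : steps) {
            step.execute(record, in);
        }
        return record;
    }
}
```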
