NIFI-5805: Pool the BinaryEncoders used by the WriteAvroResultWithExt…#3160
markap14 wants to merge 2 commits into apache:master
…ernalSchema writer. Unfortunately, the writer that embeds schemas does not allow for this optimization due to the Avro API
Performance difference when iterating 10 million times, each time creating a new Avro writer and writing a single record: runtime dropped from about 58 seconds to about 12 seconds, so roughly a 5x improvement in this use case.
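To illustrate the pooling idea behind that measurement, here is a minimal, JDK-only sketch. It is not the code from this PR: `FakeEncoder`, `Writer`, and `createdAfter` are hypothetical stand-ins, and the real change passes a polled `BinaryEncoder` to Avro's `EncoderFactory` instead of constructing one directly. The sketch only shows why creating one writer per record stops being expensive once the costly object is recycled through a bounded queue.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: none of these class names come from the PR.
final class PooledEncoderSketch {

    /** Stand-in for an expensive-to-create object such as an Avro BinaryEncoder. */
    static final class FakeEncoder {
        static int created = 0;                     // counts constructions for the demo
        final byte[] buffer = new byte[64 * 1024];  // the costly part: a large buffer
        FakeEncoder() { created++; }
    }

    /** A writer borrows an encoder when constructed and returns it when closed. */
    static final class Writer implements AutoCloseable {
        private final BlockingQueue<FakeEncoder> recycleQueue;
        private final FakeEncoder encoder;

        Writer(BlockingQueue<FakeEncoder> recycleQueue) {
            this.recycleQueue = recycleQueue;
            FakeEncoder reusable = recycleQueue.poll();  // null when the pool is empty
            this.encoder = (reusable != null) ? reusable : new FakeEncoder();
        }

        @Override
        public void close() {
            recycleQueue.offer(encoder);  // dropped without error if the pool is full
        }
    }

    /** Creates one writer per iteration and reports how many encoders were built. */
    static int createdAfter(int iterations, int poolSize) {
        FakeEncoder.created = 0;
        BlockingQueue<FakeEncoder> pool = new LinkedBlockingQueue<>(poolSize);
        for (int i = 0; i < iterations; i++) {
            try (Writer w = new Writer(pool)) {
                // a single record would be written here
            }
        }
        return FakeEncoder.created;
    }

    public static void main(String[] args) {
        System.out.println(createdAfter(1000, 32));
    }
}
```

With sequential use, as in the loop above, only the first writer has to construct an encoder; every later writer finds the recycled one in the queue. The bounded queue is what ties the maximum pool size to heap usage.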
ijokarumawak left a comment
@markap14 A 5x perf improvement is awesome! The change looks good. However, I posted a few comments on usability. Please take a look. Thanks!
        .required(true)
        .build();

    static final PropertyDescriptor ENCODER_POOL_SIZE = new Builder()
Since this configuration is performance related and depends on the environment, I'd suggest supporting Expression Language (EL) with Variable Registry scope.
        .name("encoder-pool-size")
        .displayName("Encoder Pool Size")
        .description("Avro Writers require the use of an Encoder. Creation of Encoders is expensive, but once created, they can be reused. This property controls the maximum number of Encoders that" +
            " can be pooled and reused. Setting this value too small can result in degraded performance, but setting it higher can result in more heap being used.")
Just for clarification, I'd suggest adding a note that this property has no effect with the 'Embed Avro Schema' strategy.
        this.recycleQueue = recycleQueue;

        BinaryEncoder reusableEncoder = recycleQueue.poll();
        encoder = EncoderFactory.get().blockingBinaryEncoder(buffered, reusableEncoder);
We should probably add a debug log here to show whether the current pool size fits the actual usage. If reusableEncoder is frequently null and the user wants to improve performance, they can then increase the pool size.
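One way to picture this suggestion is a small counter around the `poll()` call. The sketch below is only an assumption about how such bookkeeping could look: `PoolStats`, `pollAndRecord`, and `summary` are hypothetical names, and the string it builds is what a caller might pass to a debug logger. It does not reflect what the follow-up commit actually logs.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: tracks how often poll() finds a reusable object.
final class PoolUsageSketch {

    static final class PoolStats {
        final AtomicLong hits = new AtomicLong();    // poll() returned a pooled object
        final AtomicLong misses = new AtomicLong();  // poll() returned null

        /** Polls the queue and records whether a reusable object was available. */
        <T> T pollAndRecord(BlockingQueue<T> queue) {
            T pooled = queue.poll();
            if (pooled == null) {
                misses.incrementAndGet();
            } else {
                hits.incrementAndGet();
            }
            return pooled;
        }

        /** Message a caller could emit at debug level. */
        String summary() {
            return "encoder pool hits=" + hits.get() + ", misses=" + misses.get();
        }
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(4);
        queue.offer("pooled-encoder");
        PoolStats stats = new PoolStats();
        stats.pollAndRecord(queue);  // finds the pooled object
        stats.pollAndRecord(queue);  // queue is now empty
        System.out.println(stats.summary());
    }
}
```

A high share of misses would suggest the pool is too small for the workload, which is the signal the comment above asks for.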
@ijokarumawak thanks for the review - and all great points! New commit coming momentarily.
New commit pushed.
@markap14 The updates look good to me. I'm +1 on this PR. However, there are some conflicts with the recent change and I think it's safe for you to resolve the conflict instead of me doing that. Please address conflicts and merge it. Thanks!
Thank you for submitting a contribution to Apache NiFi.
In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:
For all changes:
Is there a JIRA ticket associated with this PR? Is it referenced
in the commit message?
Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
Has your PR been rebased against the latest commit within the target branch (typically master)?
Is your initial contribution a single, squashed commit?
For code changes:
For documentation related changes:
Note:
Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.