NIFI-5805: Pool the BinaryEncoders used by the WriteAvroResultWithExt…#3160

Closed
markap14 wants to merge 2 commits into apache:master from markap14:NIFI-5805

Conversation

@markap14
Contributor

@markap14 markap14 commented Nov 8, 2018

…ernalSchema writer. Unfortunately, the writer that embeds schemas does not allow for this optimization due to the Avro API

Thank you for submitting a contribution to Apache NiFi.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

  • Is there a JIRA ticket associated with this PR? Is it referenced
    in the commit message?

  • Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.

  • Has your PR been rebased against the latest commit within the target branch (typically master)?

  • Is your initial contribution a single, squashed commit?

For code changes:

  • Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder?
  • Have you written or updated unit tests to verify your changes?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly?
  • If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly?
  • If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties?

For documentation related changes:

  • Have you ensured that format looks appropriate for the output in which it is rendered?

Note:

Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.

@markap14
Contributor Author

markap14 commented Nov 8, 2018

Performance difference when iterating 10 million times, each time creating a new avro writer and writing a single record, was an improvement from about 58 seconds to about 12 seconds. So approximately 5x improvement in this use case.
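The speedup comes from reusing encoders instead of constructing one per record. The pattern can be illustrated with a self-contained sketch (plain Java, no NiFi/Avro dependencies; the class and member names here are illustrative, not the PR's actual code):

```java
import java.util.Queue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative object pool mirroring the PR's encoder recycling:
// poll() returns a pooled instance or null on a miss, offer() returns
// an instance to the pool, and the bounded capacity caps heap usage.
public class EncoderPoolSketch {
    static final Queue<StringBuilder> pool = new LinkedBlockingQueue<>(4);

    static StringBuilder borrow() {
        StringBuilder reusable = pool.poll();   // null on a pool miss
        return reusable != null ? reusable : new StringBuilder();
    }

    static void recycle(StringBuilder sb) {
        sb.setLength(0);                        // reset state before reuse
        pool.offer(sb);                         // dropped silently if the pool is full
    }

    public static void main(String[] args) {
        StringBuilder first = borrow();         // pool miss: new instance created
        recycle(first);
        StringBuilder second = borrow();        // pool hit: same instance returned
        System.out.println(first == second);    // true
    }
}
```

The same borrow/recycle flow applies to Avro's `BinaryEncoder`, where construction is the expensive step being avoided.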

Member

@ijokarumawak ijokarumawak left a comment


@markap14 5x perf improvement is awesome! The change looks good. However, I posted a few comments about usability. Please take a look. Thanks!

.required(true)
.build();

static final PropertyDescriptor ENCODER_POOL_SIZE = new Builder()


Since this configuration is performance related and depends on environment, I'd suggest supporting variable registry EL.

.name("encoder-pool-size")
.displayName("Encoder Pool Size")
.description("Avro Writers require the use of an Encoder. Creation of Encoders is expensive, but once created, they can be reused. This property controls the maximum number of Encoders that" +
" can be pooled and reused. Setting this value too small can result in degraded performance, but setting it higher can result in more heap being used.")
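In NiFi's API, the reviewer's suggestion would amount to adding an `expressionLanguageSupported(...)` call to the builder shown above. A sketch of the descriptor as a configuration fragment (the default value and validator are illustrative, not taken from the PR):

```java
// Sketch: allow Encoder Pool Size to be set from the variable registry
// via Expression Language, per the review comment.
static final PropertyDescriptor ENCODER_POOL_SIZE = new PropertyDescriptor.Builder()
    .name("encoder-pool-size")
    .displayName("Encoder Pool Size")
    .description("...") // description text as quoted above
    .expressionLanguageSupported(ExpressionLanguageScope.VARIABLE_REGISTRY)
    .addValidator(StandardValidators.POSITIVE_INTEGER_VALIDATOR)
    .defaultValue("32") // illustrative default, not from the PR
    .required(true)
    .build();
```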


Just for clarification, I'd suggest adding a note mentioning that this property doesn't have any effect with the 'Embed Avro Schema' strategy.

this.recycleQueue = recycleQueue;

BinaryEncoder reusableEncoder = recycleQueue.poll();
encoder = EncoderFactory.get().blockingBinaryEncoder(buffered, reusableEncoder);


Probably we should add a debug log here to show whether the current pool size fits actual usage. If reusableEncoder is often null and the user wants to improve performance, they can increase the pool size, etc.
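Concretely, the suggested instrumentation might look like this at the snippet above (a sketch of the reviewer's idea, not the merged code; the `logger` field is assumed to be available):

```java
BinaryEncoder reusableEncoder = recycleQueue.poll();
if (reusableEncoder == null) {
    // Pool miss: a new encoder must be created, which is the expensive
    // path. Frequent misses suggest raising Encoder Pool Size.
    logger.debug("No pooled BinaryEncoder available; creating a new one");
}
encoder = EncoderFactory.get().blockingBinaryEncoder(buffered, reusableEncoder);
```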

@markap14
Contributor Author

markap14 commented Nov 9, 2018

@ijokarumawak thanks for the review - and all great points! New commit coming momentarily.

@markap14
Contributor Author

markap14 commented Nov 9, 2018

New commit pushed.

@ijokarumawak
Member

@markap14 The updates look good to me. I'm +1 on this PR. However, there are some conflicts with the recent change and I think it's safe for you to resolve the conflict instead of me doing that. Please address conflicts and merge it. Thanks!

@asfgit asfgit closed this in d3b1674 Nov 12, 2018
