Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LUCENE-10657: CopyBytes now saves one memory copy on ByteBuffersDataOutput #1034

Merged
merged 4 commits into from
Jul 19, 2022

Conversation

luyuncheng
Copy link
Contributor

This is derived from LUCENE-10627 at #987
Jira: https://issues.apache.org/jira/browse/LUCENE-10657

The abstract method copyBytes in DataOutput have to copy from input to a copyBuffer and then write into ByteBuffersDataOutput.blocks, i think there is unnecessary, we can override it, copy directly from input into output.

with override this method, we can:

  1. Reduce memory copy in Lucene90CompressingStoredFieldsWriter#copyOneDoc -> bufferdDocs.copyBytes(DataInput input)
  2. Reduce memory copy in Lucene90CompoundFormat.writeCompoundFile -> data.copyBytes when input is BufferedChecksumIndexinput and output is ByteBuffersDataOutput
  3. Reduce memory IndexWriter#copySegmentAsIs ->CopyFrom -> copyBytes

…eDoc -> bufferdDocs.copyBytes(DataInput input)

Abstract method copyBytes need to copy from input to a buffer and then write into ByteBuffersDataOutput, i think there is unnecessary, we can override it, copy directly from input into output
Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic makes sense to me, can you add unit tests?

1. Add UnitTest For ByteBuffersDataOutput.copyBytes
@luyuncheng
Copy link
Contributor Author

The logic makes sense to me, can you add unit tests?

Thanks for your review, in commits 926dd i added check for the direct memory, and i added unit tests for different allocator in copyBytes

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, can you add a CHANGES entry that copyBytes now saves one memory copy on ByteBuffersDataOutput?

@luyuncheng luyuncheng changed the title LUCENE-10657: Reduce memory copy in DataOutput copyBytes LUCENE-10657: CopyBytes now saves one memory copy on ByteBuffersDataOutput Jul 18, 2022
@luyuncheng
Copy link
Contributor Author

luyuncheng commented Jul 18, 2022

This looks good to me, can you add a CHANGES entry that copyBytes now saves one memory copy on ByteBuffersDataOutput?

@jpountz Thanks for your review and suggestions, I have been updated the CHANGES entry

@jpountz jpountz merged commit e5bf76b into apache:main Jul 19, 2022
jpountz pushed a commit that referenced this pull request Jul 19, 2022
…utput (#1034)

Abstract method copyBytes need to copy from input to a buffer and then write into ByteBuffersDataOutput, i think there is unnecessary, we can override it, copy directly from input into output
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants