-
Notifications
You must be signed in to change notification settings - Fork 483
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ORC-1286: [C++] replace DataBuffer with BlockBuffer in class BufferedOutputStream #1275
Conversation
…OutputStream(PART II)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@coderex2522 . Could you use a new JIRA instead of using PART 1
and PART II
?
Apache community uses ORC JIRA ID for trace-ability.
@dongjoon-hyun Thank you for your suggestion. I create a new JIRA ORC-1286. |
Thank you for updating the PR and creating JIRA, @coderex2522 . |
c++/src/io/OutputStream.cc
Outdated
if (mergeBlock.data + mergeBlock.size == curBlock.data) { | ||
mergeBlock.size += curBlock.size; | ||
} else { | ||
outputStream->write(mergeBlock.data, mergeBlock.size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may also limit each write not to exceed outputStream->getNaturalWriteSize()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The writeTo() of BlockBuffer will limit each write not to exceed NaturalWriteSize of outputStream.
c++/src/BlockBuffer.cc
Outdated
if (chunkSize == 0) { | ||
throw std::logic_error("Natural write size cannot be zero"); | ||
} | ||
char* chunk = memoryPool.malloc(chunkSize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If blockNumber == 1, we should avoid this unnecessary allocation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if natural write size less than block size, here need to malloc chunk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, however you can still get rid of the allocation when blockNumber == 1 && naturalWriteSize >= block.size. BTW, we can keep the current implementation for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allocation can be avoided here when blockNumber == 1 && currentSize <= NaturalWriteSize.
c++/src/io/OutputStream.cc
Outdated
SCOPED_STOPWATCH(metrics, IOBlockingLatencyUs, IOCount); | ||
outputStream->write(dataBuffer->data(), dataSize); | ||
SCOPED_STOPWATCH(metrics, IOBlockingLatencyUs, nullptr); | ||
uint64_t IOCount = dataBuffer->writeTo(outputStream); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Return IOCount is weird. Better to pass the pointer of metrics into the writeTo function as a parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
c++/src/BlockBuffer.cc
Outdated
if (chunkSize == 0) { | ||
throw std::logic_error("Natural write size cannot be zero"); | ||
} | ||
char* chunk = memoryPool.malloc(chunkSize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, however you can still get rid of the allocation when blockNumber == 1 && naturalWriteSize >= block.size. BTW, we can keep the current implementation for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM +1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM for Apache ORC 1.9.
Merged to main for Apache ORC 1.9. cc @williamhyun , too |
…OutputStream ### What changes were proposed in this pull request? This PR can solve the huge memory taken by BufferedOutputStream and refactor the write data logic in class CompressionBase. ### Why are the changes needed? This patch use BlockBuffer to replace DataBuffer of class BufferedOutputStream in order to solve the [issue](apache#1240). ### How was this patch tested? The UTs in TestBufferedOutputStream.cc and TestCompression.cc can cover this patch. Add the TestBlockBuffer.write_to UT. Closes apache#1275 from coderex2522/ORC-1280-PART2. Authored-by: coderex2522 <rex6817@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
What changes were proposed in this pull request?
This PR can solve the huge memory taken by BufferedOutputStream and refactor the write data logic in class CompressionBase.
Why are the changes needed?
This patch use BlockBuffer to replace DataBuffer of class BufferedOutputStream in order to solve the issue.
How was this patch tested?
The UTs in TestBufferedOutputStream.cc and TestCompression.cc can cover this patch. Add the TestBlockBuffer.write_to UT.