HIVE-24266 #1576

Merged
merged 2 commits into from Oct 14, 2020

@szlta szlta commented Oct 13, 2020

In an HDFS environment, if a writer uses hflush to write ORC ACID files during a transaction commit, the results may appear to be missing when the table is read before the file is completely persisted to disk (that is, synced).

This happens because hflush does not persist the new buffers to disk; it only ensures that new readers can see the new content. As a result the block information, which BISplitStrategy relies on, is incomplete. Although the side file (_flush_length) tracks the proper end of the file being written, this information is ignored in favour of the block information, and we may end up generating a very short split instead of one covering the larger, available length.
ETLSplitStrategy does not even attempt to use the ACID side file when calculating the file length, so that needs to be fixed too.
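
To illustrate the idea of preferring the side file over block information, here is a minimal, self-contained sketch. It is not the code from this patch: the class and method names (`SplitLengthSketch`, `lastFlushLength`, `effectiveLength`) are hypothetical, and it assumes the side file is a sequence of 8-byte big-endian lengths appended on each flush, with the last complete entry marking the readable end of the data file.

```java
import java.nio.ByteBuffer;
import java.util.OptionalLong;

public class SplitLengthSketch {

    // Hypothetical helper: returns the last complete 8-byte entry found in
    // the side file's contents, or empty if there is no complete entry.
    // Assumption: each flush appends the data file's length as a big-endian long.
    public static OptionalLong lastFlushLength(byte[] sideFileBytes) {
        ByteBuffer buf = ByteBuffer.wrap(sideFileBytes); // big-endian by default
        OptionalLong last = OptionalLong.empty();
        while (buf.remaining() >= Long.BYTES) {
            last = OptionalLong.of(buf.getLong());
        }
        // Fewer than 8 trailing bytes would be a partially written entry; ignore it.
        return last;
    }

    // Hypothetical helper: prefer the length recorded in the side file and
    // fall back to the length derived from block information otherwise.
    public static long effectiveLength(long blockReportedLength, OptionalLong sideFileLength) {
        return sideFileLength.isPresent() ? sideFileLength.getAsLong() : blockReportedLength;
    }

    public static void main(String[] args) {
        // Simulated side file: two complete entries followed by 3 stray bytes.
        ByteBuffer side = ByteBuffer.allocate(2 * Long.BYTES + 3);
        side.putLong(1024L).putLong(4096L).put(new byte[3]);

        // Block information reports 0 bytes because nothing has been synced yet.
        long len = effectiveLength(0L, lastFlushLength(side.array()));
        System.out.println(len); // prints 4096
    }
}
```

With this preference, a split computed while the writer has only hflush-ed would cover the 4096 readable bytes rather than the shorter length implied by the incomplete block information.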

Moreover, newly committed rows might not appear because of OrcTail caching in ETLSplitStrategy. For now I'm just going to recommend turning that cache off for anyone who wants real-time row updates to be read:

set hive.orc.cache.stripe.details.mem.size=0;
..as tweaking that code would probably open a can of worms..

Change-Id: I54f414af6b81f3180217f70ac7ac03d2c376324b
Change-Id: I5ba9f0f98d694b99b4a62ee9a322177a2a1c914d

@pvary pvary left a comment

+1 pending tests
