HIVE-24266 #1576

szlta · 2020-10-13T13:00:27Z

In an HDFS environment, if a writer uses hflush to write ORC ACID files during a transaction commit, the committed rows may appear to be missing when the table is read before the file is completely persisted to disk (that is, synced).

This is because hflush does not persist the new buffers to disk; it only ensures that new readers can see the new content. As a result, the block information that BISplitStrategy relies on is incomplete. Although the side file (_flush_length) tracks the proper end of the file being written, this information is ignored in favour of the block information, and we may end up generating a very short split instead of one covering the larger, available length.
ETLSplitStrategy does not even attempt to consult the ACID side file when calculating the file length, so that needs to be fixed too.
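
To illustrate the idea (this is not the code from the patch), here is a minimal, self-contained sketch. It assumes the side file is a sequence of 8-byte big-endian lengths of which the last complete entry is the logical end of the data file. The class and method names (`SideFileLength`, `lastFlushLength`, `effectiveLength`) are hypothetical, and the side file is modelled as an in-memory byte array rather than being read through the HDFS API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class SideFileLength {

    /**
     * Returns the last complete 8-byte length found in the side file bytes,
     * or -1 if no complete entry exists. Assumed layout, for illustration only.
     */
    static long lastFlushLength(byte[] sideFileBytes) throws IOException {
        long result = -1;
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(sideFileBytes))) {
            // A trailing partial entry (fewer than 8 bytes) is ignored.
            while (in.available() >= Long.BYTES) {
                result = in.readLong();
            }
        }
        return result;
    }

    /**
     * Prefers the length recorded in the side file over the length derived
     * from block information, which may lag behind after an hflush.
     */
    static long effectiveLength(long blockReportedLength, long sideFileLength) {
        return sideFileLength >= 0 ? sideFileLength : blockReportedLength;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(1024L); // length after the first flush
            out.writeLong(4096L); // length after the second flush
        }
        long sideLength = lastFlushLength(bytes.toByteArray());
        // Block information still reports a stale, shorter length.
        System.out.println(effectiveLength(512L, sideLength));
    }
}
```

In this sketch a split generator would call `effectiveLength` with whatever length it derived from block locations, falling back to that value only when no side file entry is available.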

Moreover, newly committed rows might not appear because of OrcTail caching in ETLSplitStrategy. For now I'm just going to recommend turning that cache off for anyone who wants real-time row updates to be readable:

set hive.orc.cache.stripe.details.mem.size=0;
...as tweaking that code would probably open a can of worms.

Change-Id: I54f414af6b81f3180217f70ac7ac03d2c376324b

ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java

Change-Id: I5ba9f0f98d694b99b4a62ee9a322177a2a1c914d

pvary

+1 pending tests

HIVE-24266 - initial commit

66d61ad

Change-Id: I54f414af6b81f3180217f70ac7ac03d2c376324b

szlta requested a review from pvary October 13, 2020 13:00

kgyrtkirk added the tests pending label Oct 13, 2020

pvary reviewed Oct 13, 2020

ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java (outdated, resolved)

pvary reviewed Oct 13, 2020

ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java (outdated, resolved)

kgyrtkirk added tests unstable and removed tests pending labels Oct 13, 2020

Addressing review comments

82d734c

Change-Id: I5ba9f0f98d694b99b4a62ee9a322177a2a1c914d

kgyrtkirk added tests pending and removed tests unstable labels Oct 14, 2020

pvary approved these changes Oct 14, 2020

kgyrtkirk added the tests failed, tests pending and tests passed labels and removed the tests pending and tests failed labels Oct 14, 2020

szlta merged commit 5d1a7fa into apache:master Oct 14, 2020

szlta deleted the HIVE-24266 branch October 14, 2020 19:21
