[HUDI-8605] Fix Trino failure when reading corrupted block at end of log file #12393

Open
usberkeley wants to merge 1 commit into apache:master from usberkeley:HUDI-8605

@usberkeley
Contributor

@usberkeley usberkeley commented Dec 2, 2024

Change Logs

Background

When a corrupted block appears at the end of a log file, the Trino reader (LogScanner) fails to read the file. To check for corruption, Hudi uses InputStream#seek to jump to the expected end of the log block and expects an EOFException if the block runs past the end of the file. However, Trino's TrinoInputStream#seek does not necessarily throw an EOFException when seeking beyond the end of the file. On some file systems (for example, AzureInputStream#seek), it may throw a plain IOException instead.

Why do we need this PR?

The issue is not unique to Trino: the reader may encounter other custom InputStream implementations that wrap end-of-file conditions in their own exception classes. Therefore, instead of relying on the exception type, we compare the position obtained from InputStream#getPos with the file length to determine whether the end of the file has been reached.
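
To illustrate the idea, here is a minimal, self-contained sketch. It is not the code in this PR: `SeekableStream`, `GenericErrorStream`, `truncatedByException`, and `truncatedByPosition` are hypothetical names invented for the example, and the stream only simulates a file system whose seek throws a plain IOException past the end of the file.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

final class EofCheckSketch {

    /** Hypothetical stand-in for a seekable input stream; not a Hudi or Trino type. */
    interface SeekableStream {
        void seek(long pos) throws IOException;
        long getPos();
    }

    /** Simulates a stream that reports a seek past the end with a plain IOException. */
    static final class GenericErrorStream implements SeekableStream {
        private final long length;
        private long pos;

        GenericErrorStream(long length, long initialPos) {
            this.length = length;
            this.pos = initialPos;
        }

        @Override
        public void seek(long target) throws IOException {
            if (target > length) {
                throw new IOException("cannot seek past end of stream");
            }
            pos = target;
        }

        @Override
        public long getPos() {
            return pos;
        }
    }

    /** Fragile check: reports truncation only when seek throws EOFException. */
    static boolean truncatedByException(SeekableStream in, long blockSize) {
        long start = in.getPos();
        try {
            in.seek(start + blockSize);
            in.seek(start); // restore the original position
            return false;
        } catch (EOFException e) {
            return true;
        } catch (IOException e) {
            // A plain IOException cannot be told apart from a real I/O failure,
            // so the scan fails instead of skipping the corrupted block.
            throw new UncheckedIOException("unexpected I/O error", e);
        }
    }

    /** Position-based check: independent of which exception seek would throw. */
    static boolean truncatedByPosition(SeekableStream in, long fileLength, long blockSize) {
        return in.getPos() + blockSize > fileLength;
    }
}
```

With a 100-byte file and the stream positioned at offset 90, a declared block size of 20 makes `truncatedByPosition` return true, while `truncatedByException` fails with an UncheckedIOException because the simulated stream never throws EOFException.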

Impact

Fixes the Trino failure when reading a corrupted block at the end of a log file.

Risk level (write none, low medium or high below)

low

Documentation Update

none

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@github-actions github-actions bot added the size:L PR with lines of changes in (300, 1000] label Dec 2, 2024
@usberkeley usberkeley marked this pull request as draft December 2, 2024 04:04
@hudi-bot
Collaborator

hudi-bot commented Dec 2, 2024

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@usberkeley usberkeley marked this pull request as ready for review December 2, 2024 06:08
@yihua yihua self-assigned this Dec 13, 2024
@yihua
Contributor

yihua commented Dec 12, 2025

@voonhous could you review this?
