Skip to content

[HUDI-6896] HoodieAvroHFileReader.RecordIterator iteration never terminates#9789

Merged
bvaradar merged 1 commit intoapache:masterfrom
lokeshj1703:HUDI-6896
Oct 30, 2023
Merged

[HUDI-6896] HoodieAvroHFileReader.RecordIterator iteration never terminates#9789
bvaradar merged 1 commit intoapache:masterfrom
lokeshj1703:HUDI-6896

Conversation

@lokeshj1703
Copy link
Collaborator

@lokeshj1703 lokeshj1703 commented Sep 26, 2023

Change Logs

org.apache.hudi.io.storage.HoodieAvroHFileReader.RecordIterator#hasNext uses org.apache.hadoop.hbase.io.hfile.HFileScanner#isSeeked to seek to the first line of the file.

        if (!scanner.isSeeked()) {
          hasRecords = scanner.seekTo();
        }

if isSeeked returns false, scanner seeks to start of file.

After end of file is reached, isSeeked would still return false and the next time hasNext is called it seeks to start of file again leading to an infinite loop.

Documentation for HFileScanner#isSeeked
True is scanner has had one of the seek calls invoked; i.e. seekBefore(Cell) or seekTo() or seekTo(Cell). Otherwise returns false.

The PR adds a flag(eof) so that false is returned for hasNext if the flag is true.

Impact

NA

Risk level (write none, low medium or high below)

low

Documentation Update

NA

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

Copy link
Member

@codope codope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Is it possible to add a unit test to test the state change of eof?

@hudi-bot
Copy link
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

// NOTE: This is required for idempotency
if (eof) {
return false;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems a fundamental change, why the HFile reader can work now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under what condition does the infinite iteration happen? How to reproduce it in a test?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is being handled correctly in KeyPrefixIterator but is missing in this RecordIterator. The fix is to bring the change also to this class.

@bvaradar bvaradar added the type:bug Bug reports and fixes label Oct 4, 2023
@bvaradar bvaradar self-assigned this Oct 4, 2023
// NOTE: This is required for idempotency
if (eof) {
return false;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is being handled correctly in KeyPrefixIterator but is missing in this RecordIterator. The fix is to bring the change also to this class.

@bvaradar bvaradar merged commit d85b57e into apache:master Oct 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

type:bug Bug reports and fixes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants