
[HUDI-8955] Resolve Kafka beginning offsets with retention to prevent OffsetOutOfRange exception #12762

Merged
yihua merged 2 commits into apache:master from kroushan-nit:HUDI-8955 on Feb 24, 2025

Conversation

@kroushan-nit
Contributor

@kroushan-nit kroushan-nit commented Feb 2, 2025

Change Logs

  • Streaming jobs reading from a Kafka source can fail with OffsetOutOfRangeException when they try to read messages whose offsets are no longer available in a partition. This can happen because messages may expire under the topic's retention policy while the job is still running.

  • The fix reads the topic's retention config and moves the starting offset forward by a configurable buffer, so that successive failures do not keep losing data as more messages expire between attempts.
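The retention-based reset described above can be sketched in a few lines. This is a minimal Python model under stated assumptions, not Hudi's implementation. It assumes three things: a partition is represented as `(offset, timestamp_ms)` pairs with non-decreasing timestamps, the configurable buffer is expressed in milliseconds, and the helper name `resolve_start_offset` and its parameters are hypothetical. A real consumer would typically perform the timestamp-to-offset lookup with Kafka's `KafkaConsumer.offsetsForTimes`. That method returns the earliest offset whose timestamp is greater than or equal to the given timestamp.

```python
from bisect import bisect_left
from typing import List, Tuple


def resolve_start_offset(records: List[Tuple[int, int]], requested: int,
                         now_ms: int, retention_ms: int, buffer_ms: int) -> int:
    """Hypothetical helper: pick a start offset that should not expire before it is read.

    records: (offset, timestamp_ms) pairs for one partition, sorted by offset,
             with timestamps assumed non-decreasing (required for bisect).
    requested: the checkpointed offset the job wants to resume from.
    """
    if not records:
        return requested
    earliest = records[0][0]
    next_offset = records[-1][0] + 1  # offset the next produced message would get
    # Anything older than now - retention may be deleted at any moment; the buffer
    # moves the cutoff further forward so the job does not race the log cleaner.
    cutoff_ms = now_ms - retention_ms + buffer_ms
    timestamps = [ts for _, ts in records]
    idx = bisect_left(timestamps, cutoff_ms)  # first record with timestamp >= cutoff
    safe = records[idx][0] if idx < len(records) else next_offset
    # Only ever skip forward from the checkpoint, and never past the log end.
    return min(max(requested, earliest, safe), next_offset)


records = [(100, 1_000), (101, 2_000), (102, 3_000), (103, 4_000)]
print(resolve_start_offset(records, 100, 10_000, 8_500, 500))  # 101
```

With `retention_ms=8_500` and `buffer_ms=500` at `now_ms=10_000`, the cutoff is 2000 ms. A checkpoint at offset 100 therefore moves forward to 101. A checkpoint already past the cutoff is left unchanged. The buffer trades a few extra skipped messages for not restarting right at the edge of the retention window.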

Kafka supports both time-based and size-based retention; this PR only accounts for time-based retention.
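Because only time-based retention is considered, the retention lookup reduces to reading `retention.ms` from the topic config. The sketch below is hedged and makes the following assumptions:

- The config has already been fetched as a string map, for example via Kafka's Admin `describeConfigs`.
- The helper name `time_based_retention_ms` is hypothetical.
- When the key is absent, the code falls back to Kafka's 7-day default.

In Kafka, a `retention.ms` of `-1` means messages never expire by time, so there is nothing to skip. `retention.bytes` is ignored.

```python
from typing import Dict, Optional

# Assumed fallback: Kafka's default retention is 7 days (log.retention.hours=168).
DEFAULT_RETENTION_MS = 7 * 24 * 60 * 60 * 1000


def time_based_retention_ms(topic_config: Dict[str, str]) -> Optional[int]:
    """Hypothetical helper: return time-based retention in ms, or None if unlimited.

    Size-based retention (retention.bytes) is intentionally ignored, matching the
    PR's scope of handling time-based retention only.
    """
    raw = topic_config.get("retention.ms")
    value = int(raw) if raw is not None else DEFAULT_RETENTION_MS
    # Kafka uses -1 to mean "no time limit", so nothing can expire by age.
    return None if value < 0 else value


print(time_based_retention_ms({"retention.ms": "86400000", "retention.bytes": "1073741824"}))  # 86400000
print(time_based_retention_ms({"retention.ms": "-1"}))  # None
```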

Impact

No API changes

Risk level (write none, low, medium or high below)

No significant risk: retention-based message skipping only happens when the user explicitly enables the corresponding config.

Documentation Update


This PR introduces a new config: https://github.com/apache/hudi/pull/12762/files#diff-f4175c9ad2bf10bb31a3ddf52cf922ae555c65b387dfe305ed760d27a11bc4d1R170

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@kroushan-nit kroushan-nit marked this pull request as draft February 2, 2025 14:42
@github-actions github-actions bot added the size:L PR with lines of changes in (300, 1000] label Feb 2, 2025
@kroushan-nit kroushan-nit changed the title [HUDI-8955] Resolve Kafka beginning offsets with retention to prevent… [HUDI-8955] Resolve Kafka beginning offsets with retention to prevent OffsetOutOfRange exception Feb 4, 2025
@kroushan-nit kroushan-nit marked this pull request as ready for review February 5, 2025 13:53
@hudi-bot
Collaborator

CI report:


Contributor

@yihua yihua left a comment


LGTM

@yihua yihua merged commit 6edf209 into apache:master Feb 24, 2025
43 checks passed
voonhous pushed a commit to voonhous/hudi that referenced this pull request Apr 8, 2025
… OffsetOutOfRange exception (apache#12762)

(cherry picked from commit 6edf209)
voonhous pushed a commit to voonhous/hudi that referenced this pull request Apr 9, 2025
… OffsetOutOfRange exception (apache#12762)

(cherry picked from commit 6edf209)
voonhous pushed a commit to voonhous/hudi that referenced this pull request Apr 15, 2025
… OffsetOutOfRange exception (apache#12762)

(cherry picked from commit 6edf209)

Labels

release-1.0.2 size:L PR with lines of changes in (300, 1000]


3 participants