[HUDI-5162] Allow user specified start offset for streaming query#7138

Merged
YannByron merged 5 commits into apache:master from boneanxs:Improve_streaming
Nov 20, 2022

@boneanxs
Contributor

@boneanxs boneanxs commented Nov 4, 2022

Change Logs

Add a new config, hoodie.datasource.streaming.startOffset, that lets users specify the start offset for a streaming query.
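As a rough illustration of how the new config might be passed to a Spark Structured Streaming read, here is a minimal sketch. The table path and application name are hypothetical, and the value "latest" is an assumption inferred from the HoodieLatestOffsetRangeLimit case in the diff; check the config description for the values actually accepted. It also assumes Spark and a Hudi bundle are on the classpath and that a Hudi table already exists at the given path.

```scala
import org.apache.spark.sql.SparkSession

object StartOffsetReadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hudi-start-offset-sketch") // hypothetical app name
      .master("local[*]")
      .getOrCreate()

    // Hypothetical location of an existing Hudi table.
    val basePath = "/tmp/hudi/example_table"

    val stream = spark.readStream
      .format("hudi")
      // Config added by this PR. "latest" is an assumed value; leaving the
      // option unset keeps the default of starting from the earliest offset.
      .option("hoodie.datasource.streaming.startOffset", "latest")
      .load(basePath)

    // Print incoming micro-batches to the console for inspection.
    val query = stream.writeStream
      .format("console")
      .start()

    query.awaitTermination()
  }
}
```

Omitting the option entirely should behave as before this change, since the default remains the earliest offset.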

Impact

Existing users should not be affected: if the config is not set, the default behavior of fetching from the earliest offset is kept.

Risk level (write none, low, medium or high below)

low

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change

  • The config description must be updated if new configs are added or the default values of configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
    ticket number here and follow the instructions to make changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@boneanxs
Contributor Author

boneanxs commented Nov 8, 2022

@nsivabalan @YannByron could you take a look now? The CI has passed.

@xushiyan added the priority:high (Significant impact; potential bugs) label Nov 11, 2022
@boneanxs
Contributor Author

@hudi-bot run azure

case HoodieEarliestOffsetRangeLimit =>
INIT_OFFSET
case HoodieLatestOffsetRangeLimit =>
getLatestOffset.getOrElse(throw new HoodieException("Cannot fetch latest offset from table, " +
Contributor

Can we fall back to INIT_OFFSET when getLatestOffset is empty? I mean getLatestOffset.getOrElse(INIT_OFFSET).
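To make the difference between the two behaviors concrete, the following standalone sketch contrasts the version in the diff (throw when no latest offset exists) with the suggested fallback. It is a simplified model, not the PR's code: the base trait HoodieOffsetRangeLimit, the String offset type, the placeholder INIT_OFFSET value, and the use of IllegalStateException in place of HoodieException are all assumptions made so the example runs without Hudi on the classpath.

```scala
// Stand-ins for the types in the diff; the real definitions live in Hudi.
sealed trait HoodieOffsetRangeLimit // hypothetical base trait
case object HoodieEarliestOffsetRangeLimit extends HoodieOffsetRangeLimit
case object HoodieLatestOffsetRangeLimit extends HoodieOffsetRangeLimit

object StartOffsetSketch {
  // Placeholder value; the real INIT_OFFSET is defined in the Hudi code base.
  val INIT_OFFSET: String = "INIT"

  // Behavior in the diff under review: fail if no latest offset is available.
  // IllegalStateException stands in for HoodieException here.
  def resolveStrict(limit: HoodieOffsetRangeLimit,
                    getLatestOffset: Option[String]): String =
    limit match {
      case HoodieEarliestOffsetRangeLimit => INIT_OFFSET
      case HoodieLatestOffsetRangeLimit =>
        getLatestOffset.getOrElse(
          throw new IllegalStateException("Cannot fetch latest offset from table"))
    }

  // Suggested alternative: fall back to INIT_OFFSET when there is no latest offset.
  def resolveWithFallback(limit: HoodieOffsetRangeLimit,
                          getLatestOffset: Option[String]): String =
    limit match {
      case HoodieEarliestOffsetRangeLimit => INIT_OFFSET
      case HoodieLatestOffsetRangeLimit   => getLatestOffset.getOrElse(INIT_OFFSET)
    }
}
```

With the fallback, a "latest" request against a table that has no latest offset starts from the initial offset instead of failing the query; the strict version surfaces the empty table as an error.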

package org.apache.hudi

import org.apache.hadoop.fs.Path

Contributor

Please keep the import style, which separates imports from different packages with a blank line.
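For illustration, a small standalone file following that convention might look like the sketch below. The grouping shown (java imports, a blank line, then scala imports) is an assumption about the intended layout; the thread does not spell out the exact group order that the project's style checks enforce, and the object name is hypothetical.

```scala
import java.util.concurrent.TimeUnit

import scala.concurrent.duration.Duration

object ImportStyleSketch {
  // Uses both import groups so neither is flagged as unused.
  def tenSecondsInMillis: Long = Duration(10, TimeUnit.SECONDS).toMillis
}
```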

@YannByron
Contributor

Nice work. Looks good; I have just left two comments to address.

@boneanxs
Contributor Author

@hudi-bot run azure

@hudi-bot
Collaborator

CI report:

Bot commands

@hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

@boneanxs
Contributor Author

Hey @YannByron, all comments are addressed. Please take a look.

@YannByron YannByron merged commit d976671 into apache:master Nov 20, 2022
satishkotha pushed a commit to satishkotha/incubator-hudi that referenced this pull request Dec 12, 2022
fengjian428 pushed a commit to fengjian428/hudi that referenced this pull request Apr 5, 2023