Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KAFKA-15091: Fix misleading Javadoc for SourceTask::commit #13948

Merged

Conversation

yashmayya
Copy link
Contributor

From https://issues.apache.org/jira/browse/KAFKA-15091:

The Javadocs for SourceTask::commit state that the method should:

Commit the offsets, up to the offsets that have been returned by poll().

However, this is obviously incorrect given how the Connect runtime (when not configured with exactly-once support for source connectors) performs polling and offset commits on separate threads. There's also some extensive discussion on the semantics of that method in KAFKA-5716 where it's made clear that altering the behavior of the runtime to align with the documented semantics of that method is not a viable option.

We should update the Javadocs for this method to state that it does not have anything to do with the offsets returned from SourceTask:poll and is instead just a general, periodically-invoked hook to let the task know that an offset commit has taken place (but with no guarantees as to which offsets have been committed and which ones correspond to still-in-flight records).

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

* being committed won't necessarily correspond to the latest offsets returned by this source task via
* {@link #poll()}. When exactly-once support is disabled, offsets are committed periodically and asynchronously
* (i.e. on a separate thread from the one which calls {@link #poll()}). When exactly-once support is enabled,
* offsets are committed on transaction commits (also see {@link TransactionBoundary}).
* <p>
* SourceTasks are not required to implement this functionality; Kafka Connect will record offsets
* automatically. This hook is provided for systems that also need to store offsets internally
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on the fence on whether or not to remove this bit as well since I can't think of a realistic use-case where this method would help with offset tracking in the source system in its current form (and hence the question on deprecation of this method in https://issues.apache.org/jira/browse/KAFKA-15091).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "store offsets internally" language isn't great, but I'd rather leave it for now and explore it further if/when we start discussing deprecating this method. Connector developers might theoretically use this for acknowledging JMS records, for example, which in a very loose sense is storing offsets (or at least, some JMS-specific equivalent of them) in that system.

@yashmayya yashmayya requested a review from C0urante July 6, 2023 11:12
Copy link
Contributor

@C0urante C0urante left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks Yash!

@C0urante C0urante merged commit d6aaddf into apache:trunk Jul 18, 2023
1 check failed
@C0urante C0urante added the docs label Jul 18, 2023
C0urante pushed a commit that referenced this pull request Jul 18, 2023
C0urante pushed a commit that referenced this pull request Jul 18, 2023
C0urante pushed a commit that referenced this pull request Jul 18, 2023
jolshan added a commit to confluentinc/kafka that referenced this pull request Jul 18, 2023
* ak/trunk: (110 commits)
  MINOR: Update docs to include ZK deprecation notice and information (apache#14031)
  KAFKA-15091: Fix misleading Javadoc for SourceTask::commit (apache#13948)
  KAFKA-14669: Use the generated docs for MirrorMaker configs in the doc (apache#13658)
  KAFKA-14953: Add tiered storage related metrics (apache#13944)
  KAFKA-15121: Implement the alterOffsets method in the FileStreamSourceConnector and the FileStreamSinkConnector (apache#13945)
  Revert "MINOR: Update .asf.yaml file with refreshed github_whitelist, and collaborators" (apache#14037)
  MINOR: Update .asf.yaml file with refreshed github_whitelist, and collaborators
  KAFKA-14737: Move kafka.utils.json to server-common (apache#13585)
  KAFKA-14647: Move TopicFilter to server-common/utils (apache#13158)
  MINOR: remove unused variable in examples (apache#14021)
  ...
jeqo pushed a commit to jeqo/kafka that referenced this pull request Jul 21, 2023
srpconfluent pushed a commit to confluentinc/kafka that referenced this pull request Jul 24, 2023
)

Reviewers: Chris Egerton <chrise@aiven.io>
(cherry picked from commit 35d4e9e)
Cerchie pushed a commit to Cerchie/kafka that referenced this pull request Jul 25, 2023
jeqo pushed a commit to aiven/kafka that referenced this pull request Aug 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants