
KAFKA-14368: Connect offset write REST API #13465

Merged
merged 12 commits into apache:trunk from yashmayya:KAFKA-14368-offset-write-api on May 26, 2023

Conversation

yashmayya
Contributor

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@yashmayya
Contributor Author

Hi @C0urante, this will need to be rebased once #13424 and #13434 are merged, but I raised this draft PR to solicit some early feedback if possible - particularly around this statement from the KIP:

Offsets will be reset transactionally for each topic that they exist in: a single transaction will be used to emit all tombstone records for the connector's dedicated offsets topic (if one is used) and another transaction will be used to emit all tombstone records for the worker's global offsets topic.

We don't currently use a transactional producer for writing offsets to the worker's global offset backing store. Wouldn't doing so be a breaking change, due to the requirement of additional ACLs for the worker's producer principal? For workers where exactly-once source support is enabled, the only way we could do so would be if the connector's configured offsets topic is the same as the worker's global offsets topic (in which case, a connector-specific offset store is used with a transactional producer). If the connector's configured offsets topic is a different one, or if exactly-once source support is not enabled, I don't think we'll be able to write all the provided offsets in a single transaction for the worker's global offset backing store, right?
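For concreteness, here is a minimal sketch of what "a single transaction per offsets topic" implies at the producer level; the class name, topic handling, and transactional ID below are illustrative, not the worker's actual implementation:

import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArraySerializer;

public class TombstoneResetSketch {
    // Hypothetical helper: emit a tombstone (null value) for every source
    // partition key in one transaction, per the KIP statement quoted above.
    // Error handling (abortTransaction on failure) is elided for brevity.
    static void resetOffsets(String offsetsTopic, List<byte[]> partitionKeys) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
        // This setting is what requires the extra (TransactionalId WRITE) ACLs
        // for the producer principal - the compatibility concern raised above.
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "connect-offsets-reset");
        try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            producer.beginTransaction();
            for (byte[] key : partitionKeys)
                producer.send(new ProducerRecord<>(offsetsTopic, key, null)); // null value = tombstone
            producer.commitTransaction();
        }
    }
}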

Also one minor comment on the KIP section - the "tombstone records" bit that was copied from the resetting offsets section needs to be updated.

@C0urante C0urante added the connect and kip (Requires or implements a KIP) labels on Mar 28, 2023
@C0urante
Contributor

Hi Yash! Thanks for the draft. I'll try to get to it sometime this week, but that may not happen.

With regards to your points:

We don't currently use a transactional producer for writing offsets to the worker's global offset backing store. Wouldn't doing so be a breaking change due to the requirement of additional ACLs for the worker's producer principal?

Since we're introducing a new opt-in API, it's not a breaking change. That said, it's probably worth calling out in the KIP and on the discussion thread. And now that I think about it, maybe a transaction on the global offsets topic isn't really necessary if we've already done a round of zombie fencing. Thoughts? We should notify the discussion thread for the KIP no matter what, just wanted to bounce the idea off you first.

If the connector's configured offsets topic is a different one or if exactly-once source support is not enabled, I don't think we'll be able to write all the provided offsets in a single transaction for the worker's global offset backing store right?

The use of transactions is only necessary if exactly-once source support is enabled for source connectors (both paragraphs that mention the use of transactions begin with "If exactly-once source support is enabled").

Also one minor comment on the KIP section - the "tombstone records" bit that was copied from the resetting offsets section needs to be updated.

🤦 (done)

@yashmayya
Contributor Author

Thanks for the swift reply Chris, and no rush on the review - I mainly wanted to get clarification regarding transactions on the offset topic(s).

Since we're introducing a new opt-in API, it's not a breaking change. That said, it's probably worth calling out in the KIP and on the discussion thread. And now that I think about it, maybe a transaction on the global offsets topic isn't really necessary if we've already done a round of zombie fencing. Thoughts? We should notify the discussion thread for the KIP no matter what, just wanted to bounce the idea off you first.

Ah yeah, that's a good point about the opt-in API - we wouldn't require the use of a transactional producer for regular, connector-task-initiated offset writes to the worker's global offset backing store. However, I agree that it doesn't seem necessary to use transactions on the global offsets topic if exactly-once source support is enabled and the connector is using a custom offsets topic (if it isn't, we can write to the global offsets topic transactionally using the transactional producer corresponding to the connector). This would be in line with how regular offset writes are handled: not only do we not write offsets transactionally to a connector's secondary store (the worker's global offset backing store, assuming the connector has a custom offsets topic configured as its primary store), but we also essentially ignore any errors arising from writes to the secondary store.
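As a rough sketch of that regular write path (a hypothetical wrapper class; the real logic lives in the framework's connector offset store), the primary store's result decides success while secondary-store failures are only logged:

import java.nio.ByteBuffer;
import java.util.Map;
import java.util.concurrent.Future;
import org.apache.kafka.connect.storage.OffsetBackingStore;
import org.apache.kafka.connect.util.Callback;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class PrimarySecondaryWriteSketch {
    private static final Logger log = LoggerFactory.getLogger(PrimarySecondaryWriteSketch.class);
    private final OffsetBackingStore primaryStore;   // connector's custom offsets topic
    private final OffsetBackingStore secondaryStore; // worker's global offsets topic

    PrimarySecondaryWriteSketch(OffsetBackingStore primary, OffsetBackingStore secondary) {
        this.primaryStore = primary;
        this.secondaryStore = secondary;
    }

    Future<Void> set(Map<ByteBuffer, ByteBuffer> offsets, Callback<Void> callback) {
        // Errors from the secondary (global) store are logged and swallowed
        secondaryStore.set(offsets, (error, ignored) -> {
            if (error != null)
                log.warn("Failed to write offsets to secondary store; ignoring", error);
        });
        // Only the primary store's write determines the overall result
        return primaryStore.set(offsets, callback);
    }
}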

The use of transactions is only necessary if exactly-once source support is enabled for source connectors (both paragraphs that mention the use of transactions begin with "If exactly-once source support is enabled").

Ah okay, I misunderstood then. I thought the "If exactly-once source support is enabled" bit was only applicable to the zombie fencing.

@yashmayya yashmayya force-pushed the KAFKA-14368-offset-write-api branch 3 times, most recently from 84ca2f3 to 3bfe686 on April 6, 2023 11:45
@yashmayya yashmayya force-pushed the KAFKA-14368-offset-write-api branch 2 times, most recently from 2a41100 to 4dd4686 on April 9, 2023 11:20
@yashmayya yashmayya force-pushed the KAFKA-14368-offset-write-api branch from d3f2e09 to d237d6c on April 11, 2023 14:33
@yashmayya yashmayya marked this pull request as ready for review April 11, 2023 14:33
@yashmayya yashmayya changed the title KAFKA-14368: WIP: Connect offset write REST API KAFKA-14368: Connect offset write REST API Apr 11, 2023
Contributor

@gharris1727 gharris1727 left a comment

Change looks very good, I did a pass over the main implementation and only found nits.
Looking forward to getting this in before feature freeze!

@gharris1727
Contributor

gharris1727 commented Apr 11, 2023

So I was manually testing this feature and ran across a serialization problem. Here's the most concise repro case I can think of:

$ curl -sSX PATCH -H "Content-Type: application/json" localhost:8083/connectors/test/offsets -d '{
  "offsets": [
    {
      "partition": {
        "float": 1.0
      },
      "offset": {
        "key": "value"
      }
    }
  ]
}' | jq .
{
  "message": "The Connect framework managed offsets for this connector have been altered successfully. However, if this connector manages offsets externally, they will need to be manually altered in the system that the connector uses."
}
$ curl -sSX GET localhost:8083/connectors/test/offsets | jq .
{
  "offsets": [
    {
      "partition": {
        "float": 1
      },
      "offset": {
        "key": "value"
      }
    }
  ]
}
$ curl -sSX PATCH -H "Content-Type: application/json" localhost:8083/connectors/test/offsets -d '{
  "offsets": [
    {
      "partition": { 
        "float": 1
      },  
      "offset": {
        "key": "value"
      }
    }
  ]
}' | jq .
{
  "message": "The Connect framework managed offsets for this connector have been altered successfully. However, if this connector manages offsets externally, they will need to be manually altered in the system that the connector uses."
}
$ curl -sSX GET localhost:8083/connectors/test/offsets | jq .
{
  "offsets": [
    {
      "partition": {
        "float": 1
      },
      "offset": {
        "key": "value"
      }
    },
    {
      "partition": {
        "float": 1
      },
      "offset": {
        "key": "value"
      }
    }
  ]
}

The GET portion of the API is mapping decimals like 1.0 to the integer-looking 1, which is distinct when serialized by the JsonConverter in the offsets topic. When you copy-paste the result of a GET into a subsequent PATCH, it actually edits a completely different partition, since the equality check (and Kafka keying/compaction) is done on the serialized form.

I think the ConnectorOffset serialization needs to be tweaked to force showing the decimals, to be consistent with the JsonConverter.
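To illustrate the serialization concern, here is a minimal standalone sketch, assuming a schemaless JsonConverter similar to the one used for the offsets topic (the class name and topic string are illustrative):

import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Map;
import org.apache.kafka.connect.json.JsonConverter;

public class FloatKeyDemo {
    public static void main(String[] args) {
        JsonConverter converter = new JsonConverter();
        // Schemaless mode, mirroring how Connect serializes internal topic data
        converter.configure(Collections.singletonMap("schemas.enable", "false"), true);
        byte[] asDouble = converter.fromConnectData("offsets", null, Map.of("float", 1.0d));
        byte[] asLong = converter.fromConnectData("offsets", null, Map.of("float", 1L));
        // Prints {"float":1.0} and {"float":1}: distinct byte sequences, so they
        // act as distinct record keys for Kafka compaction and offset lookups.
        System.out.println(new String(asDouble, StandardCharsets.UTF_8));
        System.out.println(new String(asLong, StandardCharsets.UTF_8));
    }
}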

Other manual testing seems to indicate this works great. LGTM once the above is addressed.

@yashmayya
Contributor Author

yashmayya commented Apr 12, 2023

I tried really hard to reproduce this via integration tests and I wasn't able to, so I tried doing a repro manually just as you outlined above. Turns out that this isn't actually a bug in either the GET or PATCH offsets API - jq is automatically converting floats like x.0 into x (you can try the exact same repro above without using jq to confirm).

Thanks for the manual testing (I've done some myself too, but the more the merrier of course) - it's really appreciated, since there are only so many cases that ITs can cover!

@gharris1727
Contributor

jq is automatically converting floats like x.0 into x

Thanks for catching that, you're completely correct. I should have cut jq out of my tests to verify that, but I hadn't even considered that jq would change the output.

It sounds like there's nothing to be done on the framework side then. Hopefully offsets containing floating point numbers are rare enough that not too many people end up finding the same footgun.

@@ -114,6 +127,7 @@ public class Worker {

public static final long CONNECTOR_GRACEFUL_SHUTDOWN_TIMEOUT_MS = TimeUnit.SECONDS.toMillis(5);
public static final long EXECUTOR_SHUTDOWN_TERMINATION_TIMEOUT_MS = TimeUnit.SECONDS.toMillis(1);
public static final long ALTER_OFFSETS_TIMEOUT_MS = TimeUnit.SECONDS.toMillis(5);
Contributor

Hmm... I think this may be too short. With sink connectors it's fairly straightforward to alter consumer group offsets, but for source connectors we have to start and complete a read-to-end of the offsets topic, then write the new offsets to it. And in both cases, we have the alterOffsets connector method to worry about as well.

Can we make the Worker API for altering offsets asynchronous, similar to what we do for reading offsets?

I know that there's concern about tasks being brought up for the connector while the request is being handled, but I think this might be alright.

If the connector is a sink connector, the requests to alter its consumer group's offsets will be rejected by the broker if any tasks are active.

If the connector is a source connector and exactly-once support is enabled, zombie fencing will take place and we won't be able to complete our write to the offsets topic.

Unless I'm mistaken, the only case that's left is non-exactly-once source connectors, which IMO is acceptable for us to ignore since we can't guarantee that there aren't zombie tasks running around writing their own offsets anyway.
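For illustration, a hedged sketch of what an asynchronous Worker method could look like; the method name, signature, and executor usage here are hypothetical, not the PR's actual code:

import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.kafka.connect.util.Callback;

// Hypothetical shape of an async alter-offsets API on the Worker.
class AsyncAlterOffsetsSketch {
    private final ExecutorService executor = Executors.newCachedThreadPool();

    void alterConnectorOffsets(String connName,
                               Map<Map<String, ?>, Map<String, ?>> offsets,
                               Callback<Void> cb) {
        // Run the slow parts (read-to-end of the offsets topic, the connector's
        // alterOffsets() hook, consumer-group/admin calls for sink connectors)
        // off the herder thread and report the outcome via the callback.
        executor.submit(() -> {
            try {
                doAlter(connName, offsets);
                cb.onCompletion(null, null);
            } catch (Throwable t) {
                cb.onCompletion(t, null);
            }
        });
    }

    private void doAlter(String connName, Map<Map<String, ?>, Map<String, ?>> offsets) {
        // elided: the actual offset-altering logic
    }
}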

Contributor Author

Hm, but for non-exactly-once source connectors (which is the default mode), this would leave the door open to confusing behavior where users could get successful responses from the alter offsets API but the connector could completely ignore the overwritten offsets (if the user resumes the connector in the interim). I agree that the zombie task case is unhandled for non-EoS source connectors, but at least that would typically only occur for misbehaving connector plugins, whereas making the alter offsets API async would allow users to shoot themselves in the foot. I don't disagree that the 5 second timeout is quite non-ideal, and even more concerning is the fact that if a connector's alterOffsets method hangs, it can disable a worker (something that was a big problem with other connector methods until your elegant fix in #8069). I'm just trying to weigh the pros and cons here, but it does seem like doing alter offset operations asynchronously in the worker has more benefits than drawbacks.

Copy link
Contributor

@C0urante C0urante Apr 13, 2023

users could get successful responses from the alter offsets API but the connector could completely ignore the overwritten offsets (if the user resumes the connector in the interim).

That's a fair point. I was thinking that with the STOPPED state and a possible additional guard to prevent cancelled source tasks from committing offsets, we would have reasonable protection against zombie tasks overwriting recently-altered offsets. However, I wasn't thinking of the other scenario, where the offset alter request is initiated and left ongoing while the connector is resumed.

I think the risk of blocking the herder thread that you've brought up is perhaps the most convincing argument still in favor of making this operation asynchronous. There are risks not just with calls to the alterOffsets method, but also with reading to the end of offsets topics and contacting the transaction coordinator (if altering offsets for an exactly-once source connector), to name a few.

If we really want to get fancy, one way to decrease the risks of an asynchronous API for non-exactly-once source connectors would be to refuse to assign tasks for connectors with ongoing offset alterations during rebalance, even if the connector is resumed. The task configs for that connector could continue to live in the config topic; we'd just hold off on assigning them until the operation succeeds. Of course, this doesn't work if the leader of the cluster changes while an offset alter request is being serviced, but then (correct me if I'm wrong?) the same risks apply even with a synchronous API (although they're probably less likely). We could also try to add interruption logic that cancels any in-progress offset alter/reset requests when a rebalance starts. Either of these would be fine as a follow-up ticket, if they sound reasonable at all.

Contributor Author

Thanks for the detailed response and it sounds like we're on the same page now. I've refactored the alter offsets worker API to be asynchronous.

possible additional guard to prevent cancelled source tasks from committing offsets

I guess we don't really need to worry too much about cancelled source tasks since during regular task stop, we also remove the periodic offset commit task in the SourceTaskOffsetCommitter?

one way we could try to decrease the risks of an asynchronous API for non-exactly-once source connectors could be to refuse to assign tasks for connectors with ongoing offset alterations during rebalance, even if the connector is resumed

That's an interesting idea, but it does seem to be a pretty invasive change w.r.t. the current rebalancing logic, which is agnostic to all ongoing operations in the workers. The limitation is also a valid one, and yeah, the same risks apply even with the sync API, although I'm not sure I follow why you think it's less likely? Isn't it more likely that a synchronous alter offsets request hangs and causes the leader to fall out of the group, leading to a new leader being elected?

We could also try to add interruption logic that cancels any in-progress offset alter/reset requests when a rebalance starts

We would need to be careful about the exact points where we allow interruptions. For instance, we wouldn't want to abandon a request midway through writing offsets (in the non-EoS source connector case where it isn't an atomic operation, or for consumer groups when we're altering offsets for some partitions + resetting offsets for some others). Although this does seem like a more appealing option overall, and I've filed this Jira as a potential follow-up item - https://issues.apache.org/jira/browse/KAFKA-14910

Contributor

@C0urante C0urante left a comment

Whew, finally made it all the way through the non-testing changes!

Thanks for your patience Yash, and apologies for the delays. If it helps, we can get together and discuss this further in person at Kafka Summit to help speed things along; let me know.

@yashmayya yashmayya force-pushed the KAFKA-14368-offset-write-api branch 2 times, most recently from f99e610 to 8704674 Compare May 23, 2023 12:08
…s concurrently instead of sequentially; improve checks in DistributedHerder::alterConnectorOffsetsChecks; various other renames, rewordings, refactors and simplifications.
Comment on lines 1458 to 1433
offsetStore.configure(config);
// This reads to the end of the offsets topic and can be a potentially time-consuming operation
offsetStore.start();
Contributor

Believe this still needs to be addressed?

Contributor

@C0urante C0urante left a comment

Getting really close!

… getFenceZombieSourceTasksCallable method; refactor composite future construction in Worker::alterSinkConnectorOffsets
Contributor

@C0urante C0urante left a comment

Thanks Yash, I've finished a full pass over the test code. Everything's looking great, this is really close to being merged!

…eword exception message on attempting to alter offsets for connector not in stopped state; introduce additional null checks for Kafka topic names and partitions in SinkUtils::parseSinkConnectorOffsets; various minor test improvements
Contributor

@C0urante C0urante left a comment

LGTM, thanks Yash! Looking forward to the offset reset PR 😄

@yashmayya
Contributor Author

I just noticed that testAlterSinkConnectorOffsetsDifferentKafkaClusterTargeted is failing in the CI run. There's also https://issues.apache.org/jira/browse/KAFKA-14956 where testGetSinkConnectorOffsetsDifferentKafkaClusterTargeted has been failing on CI for a while. Interestingly, neither of them has failed for me locally in over 100 runs under various loads. The point at which both of them are failing is also interesting:

  1. We create an embedded Connect cluster with its own embedded backing Kafka cluster
  2. We create a second embedded Kafka cluster
  3. We configure a sink connector in the embedded Connect cluster which consumes from a topic on the second embedded Kafka cluster
  4. We produce 10 messages each to 5 different partitions of a topic on the second Kafka cluster (which the connector is configured to consume from)
  5. We use the offsets read REST API to get the consumer group offsets for the sink connector and wait until it "catches up" to the expected offsets. This operation is retried for up to 15 seconds; if the consumer group offsets (obtained via an admin client in the worker) don't match the expected offsets by then, the test fails (see the sketch below).

Both tests are failing at this point. Since they consistently pass locally, it doesn't seem to be a correctness issue with connectors that target different Kafka clusters. I'm wondering if we need to up the timeout, although 15 seconds should be enough to consume just 50 messages 😕
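For reference, the shape of that wait-and-retry step (illustrative only; the supplier, constant, and class name are stand-ins for the test's real helpers):

import java.util.function.Supplier;
import org.apache.kafka.test.TestUtils;

public class OffsetCatchUpWait {
    // Sketch of step 5: poll the offsets read endpoint until it matches the
    // expected offsets, failing once the timeout elapses.
    static void waitForExpectedOffsets(Supplier<Object> currentOffsets,
                                       Object expectedOffsets) throws InterruptedException {
        final long timeoutMs = 15_000L; // the 15-second retry window mentioned above
        TestUtils.waitForCondition(
            () -> expectedOffsets.equals(currentOffsets.get()),
            timeoutMs,
            "Consumer group offsets for the sink connector did not catch up in time");
    }
}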

@C0urante
Contributor

Ah, spoke too soon!

I'd be open to bumping timeouts. If this does turn out to be a correctness issue (which is still possible since the timing on CI may be different and therefore more likely to unearth certain kinds of concurrency bugs), we can investigate further.

Also worth noting that WorkerTest::testAlterOffsetsSourceConnectorError is failing right now because offsetStore::stop hasn't been invoked by the time we check for it. I think you handle this kind of issue elsewhere by adding a second timeout(1000) argument when making calls to Mockito::verify; hopefully that's sufficient for this test as well?
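That Mockito pattern would look roughly like this (the store type and test name are assumptions based on the discussion, not the exact test code):

import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.timeout;
import static org.mockito.Mockito.verify;

import org.apache.kafka.connect.storage.ConnectorOffsetBackingStore;
import org.junit.Test;

public class AsyncVerifySketch {
    @Test
    public void testOffsetStoreEventuallyStopped() {
        ConnectorOffsetBackingStore offsetStore = mock(ConnectorOffsetBackingStore.class);
        // ... trigger the asynchronous alter-offsets path that should stop the store ...
        // timeout(1000) polls for up to a second before asserting, instead of
        // checking immediately and racing the async cleanup.
        verify(offsetStore, timeout(1000)).stop();
    }
}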

…ector offset alter tests in WorkerTest; increase offset read timeouts in OffsetsApiIntegrationTest
@yashmayya yashmayya force-pushed the KAFKA-14368-offset-write-api branch from edf20dc to 08c20a9 on May 25, 2023 14:47
@yashmayya yashmayya requested a review from C0urante May 26, 2023 02:11
Contributor

@C0urante C0urante left a comment

🎉

@C0urante C0urante merged commit 7ff2dbb into apache:trunk May 26, 2023
1 check failed
@yashmayya
Contributor Author

Thanks Chris!
