Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KAFKA-16047: Use REQUEST_TIMEOUT_MS_CONFIG in AdminClient.fenceProducers #16151

Merged
merged 2 commits into from
Jun 4, 2024

Conversation

edoardocomar
Copy link
Contributor

Use REQUEST_TIMEOUT_MS_CONFIG in AdminClient.fenceProducers as transaction timeout.
No transaction will be started with this timeout, but ReplicaManager.appendRecords uses this value as its timeout. Use REQUEST_TIMEOUT_MS_CONFIG like a regular producer append to allow for replication to take place.

Co-Authored-By: Adrian Preston prestona@uk.ibm.com

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

Use REQUEST_TIMEOUT_MS_CONFIG in AdminClient.fenceProducers as transaction
timeout.
No transaction will be started with this timeout, but
ReplicaManager.appendRecords uses this value as its timeout.
Use REQUEST_TIMEOUT_MS_CONFIG like a regular producer append
to allow for replication to take place.

Co-Authored-By: Adrian Preston <prestona@uk.ibm.com>
@@ -4569,7 +4569,7 @@ public ListTransactionsResult listTransactions(ListTransactionsOptions options)
public FenceProducersResult fenceProducers(Collection<String> transactionalIds, FenceProducersOptions options) {
AdminApiFuture.SimpleAdminApiFuture<CoordinatorKey, ProducerIdAndEpoch> future =
FenceProducersHandler.newFuture(transactionalIds);
FenceProducersHandler handler = new FenceProducersHandler(logContext);
FenceProducersHandler handler = new FenceProducersHandler(logContext, requestTimeoutMs);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a requestTimeoutMs in options, can you use that if specified?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as an override to the AdminClient REQUEST_TIMEOUT_MS_CONFIG ?
The options doc says it's on override on API timeout
but yes we could use that too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at it, and the option timeoutMs is used already as an API timeout - ie to calc a deadline when retrying.
The requestTimeoutMs is instead used for a single append, so I'd actually prefer not to mix the two and leave the PR as is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@edoardocomar edoardocomar Jun 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @gharris1727 I'd appreciate if you can take a 2nd look or approve as IMHO without this small fix, exactly once MM2/connect is unusable in production, so I'd like to have this in 3.8.0 and 3.7.1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the AdminClient is no longer waiting for the fence request to complete, does it make sense for the underlying append to continue?

If the AdminClient is willing to wait longer for the fence request to complete, should the underlying append be aborted early?

If someone wants to raise the timeout for this one operation, I don't think that we should require them to increase the client-global request.timeout.ms.

@edoardocomar
Copy link
Contributor Author

edoardocomar commented Jun 4, 2024

If someone wants to raise the timeout for this one operation, I don't think that we should require them to increase the client-global request.timeout.ms.

I agree to that. Hopefully the commit addresses your concerns.

Copy link
Contributor

@gharris1727 gharris1727 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @edoardocomar and @prestona for the fix, and @akaltsikis for finding this originally!

@edoardocomar
Copy link
Contributor Author

Jenkins PR Jdk8 connect failure retested manually

./gradlew --no-daemon --no-build-cache cleantest connect:runtime:test --tests "ConnectorRestartApiIntegrationTest.testMultiWorkerRestartBothConnectorAndTasks"

PASSED

@edoardocomar edoardocomar merged commit eea9ebf into apache:trunk Jun 4, 2024
1 check failed
edoardocomar added a commit that referenced this pull request Jun 4, 2024
…ers (#16151)

Use REQUEST_TIMEOUT_MS_CONFIG in AdminClient.fenceProducers, 
or options.timeoutMs if specified, as transaction timeout.

No transaction will be started with this timeout, but
ReplicaManager.appendRecords uses this value as its timeout.
Use REQUEST_TIMEOUT_MS_CONFIG like a regular producer append
to allow for replication to take place.

Co-Authored-By: Adrian Preston <prestona@uk.ibm.com>
edoardocomar added a commit that referenced this pull request Jun 4, 2024
…ers (#16151)

Use REQUEST_TIMEOUT_MS_CONFIG in AdminClient.fenceProducers, 
or options.timeoutMs if specified, as transaction timeout.

No transaction will be started with this timeout, but
ReplicaManager.appendRecords uses this value as its timeout.
Use REQUEST_TIMEOUT_MS_CONFIG like a regular producer append
to allow for replication to take place.

Co-Authored-By: Adrian Preston <prestona@uk.ibm.com>
@edoardocomar
Copy link
Contributor Author

merged in trunk, cherry-picked to 3.7.1 and 3.8.0. thanks @gharris1727

@edoardocomar edoardocomar deleted the KAFKA-16047 branch June 4, 2024 11:02
sjhajharia pushed a commit to sjhajharia/kafka that referenced this pull request Jun 4, 2024
…ers (apache#16151)

Use REQUEST_TIMEOUT_MS_CONFIG in AdminClient.fenceProducers,
or options.timeoutMs if specified, as transaction timeout.

No transaction will be started with this timeout, but
ReplicaManager.appendRecords uses this value as its timeout.
Use REQUEST_TIMEOUT_MS_CONFIG like a regular producer append
to allow for replication to take place.

Co-Authored-By: Adrian Preston <prestona@uk.ibm.com>
@edoardocomar
Copy link
Contributor Author

I think this also fixes https://issues.apache.org/jira/browse/KAFKA-16488

TaiJuWu pushed a commit to TaiJuWu/kafka that referenced this pull request Jun 8, 2024
…ers (apache#16151)

Use REQUEST_TIMEOUT_MS_CONFIG in AdminClient.fenceProducers, 
or options.timeoutMs if specified, as transaction timeout.

No transaction will be started with this timeout, but
ReplicaManager.appendRecords uses this value as its timeout.
Use REQUEST_TIMEOUT_MS_CONFIG like a regular producer append
to allow for replication to take place.

Co-Authored-By: Adrian Preston <prestona@uk.ibm.com>
gongxuanzhang pushed a commit to gongxuanzhang/kafka that referenced this pull request Jun 12, 2024
…ers (apache#16151)

Use REQUEST_TIMEOUT_MS_CONFIG in AdminClient.fenceProducers, 
or options.timeoutMs if specified, as transaction timeout.

No transaction will be started with this timeout, but
ReplicaManager.appendRecords uses this value as its timeout.
Use REQUEST_TIMEOUT_MS_CONFIG like a regular producer append
to allow for replication to take place.

Co-Authored-By: Adrian Preston <prestona@uk.ibm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
2 participants