Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KAFKA-14275: KRaft Controllers should crash after failing to apply any metadata record #12709

Merged

Conversation

niket-goel
Copy link
Contributor

Make all faults in metadata processing on standby controllers be fatal. This is the same behavior-wise as the active controller.

@cmccabe
Copy link
Contributor

cmccabe commented Oct 3, 2022

Thanks, @niket-goel.

  • I'd prefer to keep the name fatalFaultHandler since it accurately describes what the handler does. It makes it clear that you're terminating the process (when not in test context) which is good to know

  • as per our earlier discussion, we shouldn't need to remove metadataFaultHandler from ControllerServer, etc. since we'll still be using it there (later). Just from QuorumController, at the moment. This should also make the PR smaller

  • as you mentioned offline, we do need a test for this. it shouldn't be too hard to get a bad record committed and then wait for everyone to invoke the FaultHandler. I believe there is a test that does this already, but maybe just for the active. I can't recall if we use the same fault handler for everyone when in testkit -- if so, we can just wait for that this thing to be called 3 times (or whatever)

@niket-goel
Copy link
Contributor Author

Thanks for the review @cmccabe

I'd prefer to keep the name fatalFaultHandler

Agreed! I don't think I changed that. Please let me know if I did.

Just from QuorumController, at the moment

Sounds good. I will add it back in.

if so, we can just wait for that this thing to be called 3 times (or whatever)

Yeah, I guess that is one way of doing it. Will play around to get a test in the next iteration.

…y metadata record

Make all faults in metadata processing on standby controllers be fatal.
This is the same behavior-wise as the active controller.
@niket-goel niket-goel force-pushed the kafka-14275-controller-metadata-fault-handling branch from 8f81299 to 64d6396 Compare October 8, 2022 01:24
@niket-goel
Copy link
Contributor Author

The two tests that are failing in the build pass locally reliably:

kafka.api.TransactionsTest.testCommitTransactionTimeout(String).quorum=zk
org.apache.kafka.controller.QuorumControllerTest.testFenceMultipleBrokers()

Copy link
Contributor

@hachikuji hachikuji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a minor comment. LGTM overall.

}
}

private final static List<ApiMessageAndVersion> CORRUPT_RECORD =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: perhaps better to call this INVALID_RECORD with a brief comment mentioning the topicId is not set. Also, since it's not used in any other tests, maybe we can keep it local to testFatalMetadataReplayErrorOnStandbys? It's always possible to factor it out if it does get used in other tests.

Copy link
Contributor

@hachikuji hachikuji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@niket-goel
Copy link
Contributor Author

niket-goel commented Oct 11, 2022

The failed tests (5) on the most recent version of the PR seem unrelated and flaky.

@niket-goel
Copy link
Contributor Author

Two different tests failing this time (timeout and metric not updated)

org.apache.kafka.common.network.SslTransportLayerTest
org.apache.kafka.streams.processor.internals.DefaultStateUpdaterTest

@hachikuji hachikuji merged commit 98a3dcb into apache:trunk Oct 11, 2022
niket-goel added a commit to niket-goel/kafka that referenced this pull request Oct 12, 2022
…y metadata record (apache#12709)

Make all faults in metadata processing on standby controllers be fatal. This is the same behavior-wise as the active controller. This prevents a standby controller from eventually becoming active with incomplete state.

Reviewers: Colin Patrick McCabe <cmccabe@apache.org>, Jason Gustafson <jason@confluent.io>
(cherry picked from commit 98a3dcb)
cmccabe pushed a commit that referenced this pull request Oct 13, 2022
…y metadata record (#12709)

Make all faults in metadata processing on standby controllers be fatal. This is the same behavior-wise as the active controller. This prevents a standby controller from eventually becoming active with incomplete state.

Reviewers: Colin Patrick McCabe <cmccabe@apache.org>, Jason Gustafson <jason@confluent.io>
guozhangwang pushed a commit to guozhangwang/kafka that referenced this pull request Jan 25, 2023
…y metadata record (apache#12709)

Make all faults in metadata processing on standby controllers be fatal. This is the same behavior-wise as the active controller. This prevents a standby controller from eventually becoming active with incomplete state.

Reviewers: Colin Patrick McCabe <cmccabe@apache.org>, Jason Gustafson <jason@confluent.io>
rutvijmehta-harness pushed a commit to rutvijmehta-harness/kafka that referenced this pull request Feb 9, 2024
…y metadata record (apache#12709)

Make all faults in metadata processing on standby controllers be fatal. This is the same behavior-wise as the active controller. This prevents a standby controller from eventually becoming active with incomplete state.

Reviewers: Colin Patrick McCabe <cmccabe@apache.org>, Jason Gustafson <jason@confluent.io>
rutvijmehta-harness added a commit to rutvijmehta-harness/kafka that referenced this pull request Feb 9, 2024
…y metadata record (apache#12709) (#48)

Make all faults in metadata processing on standby controllers be fatal. This is the same behavior-wise as the active controller. This prevents a standby controller from eventually becoming active with incomplete state.

Reviewers: Colin Patrick McCabe <cmccabe@apache.org>, Jason Gustafson <jason@confluent.io>

Co-authored-by: Niket <niket-goel@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants