MINOR: Add draft threat model + SECURITY.md + AGENTS.md for security-model discoverability by potiuk · Pull Request #22431 · apache/kafka

potiuk · 2026-05-31T10:18:17Z

This is a draft proposal for the Kafka PMC to review — please correct,
reject, or discuss as needed. Nothing here is a requirement; the
maintainers are the decision-makers, and this describes Kafka as the
PMC says it is.

This PR adds THREAT_MODEL.md + SECURITY.md + AGENTS.md, wiring
AGENTS.md -> SECURITY.md -> THREAT_MODEL.md.

Framing: Kafka is a configurable platform — it provides mechanisms
(SASL/mTLS auth, an ACL authorizer, TLS, quotas) and the operator
chooses which listeners use them; a broker can run wide open
(PLAINTEXT, no authorizer) or fully locked down. The untrusted network
client is the adversary; the operator and trusted cluster peers /
metadata quorum are out of model.

Draft-first, mostly inferred (~16 documented / 0 maintainer / ~58
inferred); every *(inferred)* claim routes to a numbered §14
question. The wave-1 rulings decide VALID-vs-misconfiguration:

Is running the default PLAINTEXT listener with no authorizer a
supported posture (network-trust), so an "unauthenticated broker"
report against defaults is by-design — or should it be VALID?
Under StandardAuthorizer, is the default
allow.everyone.if.no.acl.found "no ACL ⇒ deny"?
Does the Connect REST API require authentication by default, and
how should connector-config URL handling (SSRF) be treated?

Scope note: this covers the broker + Connect; Kafka Streams is treated
as a client library (in-app trust), and tools/shell/trogdor/tests are
out of the runtime model.

Context: the ASF Security team is preparing the project for an automated
agentic security scan we're piloting. Drafted via the
threat-model-producer
rubric. If you'd rather author it yourselves, close this PR and we'll
regroup.

Reviewers: Christo Lolov lolovc@amazon.com, Luke Chen
showuon@gmail.com, Mickael Maison mickael.maison@gmail.com

…l discoverability Adds a draft (v0) threat model plus SECURITY.md and AGENTS.md so an automated scan agent can discover the model via AGENTS.md -> SECURITY.md -> THREAT_MODEL.md. The model is a proposal for the PMC to review; most claims are (inferred) and route to open questions in its section 14. Generated-by: Claude Code (Claude Opus 4.8)

clolov

Thanks for opening this PR. I have tried providing answers to the questions I have answers to and pulled in others who I think are better-suited at answering the ones I don't.

clolov · 2026-06-01T09:49:53Z

+## §14 Open questions for the maintainers
+
+**Wave 1 — the default-posture rulings (decide VALID-vs-misconfig; §5a/§8/§9):**
+1. Is running a broker with the **default PLAINTEXT listener and no authorizer** a *supported* posture (relying


My opinion is that a "PLAINTEXT listener and no authorizer" is valid only for development.

clolov · 2026-06-01T09:52:41Z

+1. Is running a broker with the **default PLAINTEXT listener and no authorizer** a *supported* posture (relying
+   on network controls), so an "unauthenticated broker" report against defaults is `BY-DESIGN` — or should it
+   be `VALID`? *Proposed:* operator must secure before exposing; open default is dev-only.
+2. With the StandardAuthorizer, what is the default of **`allow.everyone.if.no.acl.found`**, and is "no ACL ⇒


The default is DENY (source: https://github.com/apache/kafka/blob/trunk/metadata/src/main/java/org/apache/kafka/metadata/authorizer/StandardAuthorizer.java#L210-L214)
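
For illustration only, here is a minimal Python sketch of the decision order this answer implies. It is not the StandardAuthorizer code: the `Acl` class, the `authorize` function and their parameters are hypothetical names, matching is literal only, and the assumption is that the `allow.everyone.if.no.acl.found` fallback applies when the resource has no ACLs at all.

```python
from dataclasses import dataclass

ALLOWED, DENIED = "ALLOWED", "DENIED"


@dataclass(frozen=True)
class Acl:
    principal: str   # e.g. "User:alice"
    operation: str   # e.g. "READ"
    resource: str    # e.g. "topic:orders" (literal match only in this sketch)
    permission: str  # "ALLOW" or "DENY"


def authorize(principal, operation, resource, acls,
              super_users=frozenset(),
              allow_everyone_if_no_acl_found=False):
    """Hypothetical model of the decision order, not Kafka's implementation."""
    # Assumption: super users bypass ACL evaluation entirely.
    if principal in super_users:
        return ALLOWED
    resource_acls = [a for a in acls if a.resource == resource]
    # No ACL on the resource: fall back to the configured default (DENY unless overridden).
    if not resource_acls:
        return ALLOWED if allow_everyone_if_no_acl_found else DENIED
    matching = [a for a in resource_acls
                if a.principal == principal and a.operation == operation]
    # Assumption: a matching DENY takes precedence over any ALLOW.
    if any(a.permission == "DENY" for a in matching):
        return DENIED
    if any(a.permission == "ALLOW" for a in matching):
        return ALLOWED
    # The resource has ACLs, but none match this principal and operation.
    return DENIED
```

With no ACLs and the default setting, `authorize("User:alice", "READ", "topic:orders", [])` returns `DENIED`, which is the "no ACL ⇒ deny" behaviour the question asks about.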

clolov · 2026-06-01T10:15:35Z

+**Wave 2 — auth/authz mechanics (§8):**
+4. Which **SASL mechanisms** are recommended/discouraged by default, and does the broker enforce TLS for
+   credential-exposing mechanisms (PLAIN)? *Proposed:* SCRAM/GSSAPI/OAUTHBEARER recommended; PLAIN requires TLS.
+5. Are **delegation tokens** and idempotent/transactional state gated by ACLs the same as normal operations?


Idempotent producers and transactions are gated by ACLs (source: https://kafka.apache.org/43/security/authorization-and-acls/#operations-and-resources-on-protocols)

Delegation tokens are gated by a mix of ACLs and additional token-specific checks, i.e. something authenticated with a token cannot create another token. For the purposes of this question, the answer is yes.

clolov · 2026-06-01T10:27:11Z

+   operator-trusted configs is out of model, but an unauthenticated REST API is the real exposure.
+
+**Wave 2 — auth/authz mechanics (§8):**
+4. Which **SASL mechanisms** are recommended/discouraged by default, and does the broker enforce TLS for


SCRAM/GSSAPI/OAUTHBEARER are recommended.

As far as I am aware nothing enforces TLS for PLAIN (sources: https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/security/auth/SecurityProtocol.java#L28, https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/security/auth/SecurityProtocol.java#L32). Everything over PLAINTEXT should be development-only.

For OAUTHBEARER we support client_credentials and client_assertion (source: https://cwiki.apache.org/confluence/display/KAFKA/KIP-1258%3A+Add+Support+for+OAuth+Client+Assertion+to+client_credentials+Grant+Type)
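
Since nothing enforces TLS for PLAIN, a scan or review would have to detect that combination itself. A minimal sketch of such a check, assuming the broker configuration is already available as a dict of strings: the function names are hypothetical, and per-listener overrides (`listener.name.<name>.sasl.enabled.mechanisms`) are deliberately ignored here.

```python
UNENCRYPTED_SASL = "SASL_PLAINTEXT"


def parse_protocol_map(value):
    """Parse 'NAME:PROTOCOL,NAME:PROTOCOL' into a dict."""
    mapping = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        name, _, protocol = entry.partition(":")
        mapping[name.strip()] = protocol.strip().upper()
    return mapping


def listeners_exposing_plain(props):
    """Return listener names where SASL/PLAIN credentials would travel unencrypted."""
    # sasl.enabled.mechanisms defaults to GSSAPI when it is not set.
    mechanisms = {
        m.strip().upper()
        for m in props.get("sasl.enabled.mechanisms", "GSSAPI").split(",")
        if m.strip()
    }
    if "PLAIN" not in mechanisms:
        return []
    protocols = parse_protocol_map(props.get("listener.security.protocol.map", ""))
    return sorted(name for name, proto in protocols.items()
                  if proto == UNENCRYPTED_SASL)
```

For example, a map of `INTERNAL:SASL_PLAINTEXT,EXTERNAL:SASL_SSL` with `PLAIN` enabled flags only `INTERNAL`.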

clolov · 2026-06-01T10:28:42Z

+**Wave 3 — DoS, peers, §11a (§7/§8/§11a):**
+6. What **request-size / quota / throttling** guarantees bound RPC DoS, and where is the resource line?
+   *Proposed:* `socket.request.max.bytes` + quotas bound it; beyond that, operator config.
+7. Confirm **cluster peers / the KRaft quorum / ZooKeeper** are trusted (out of §7). *Proposed:* yes.


Let's trust cluster peers and KRaft quorum for now. Let's remove references to Apache ZooKeeper. Technically versions 3.9.x still have ZooKeeper, but it isn't on trunk.

clolov · 2026-06-01T10:29:20Z

+6. What **request-size / quota / throttling** guarantees bound RPC DoS, and where is the resource line?
+   *Proposed:* `socket.request.max.bytes` + quotas bound it; beyond that, operator config.
+7. Confirm **cluster peers / the KRaft quorum / ZooKeeper** are trusted (out of §7). *Proposed:* yes.
+8. What do scanners most often (re)report that the PMC considers a **non-finding**? (Seeds §11a.)


@mimaison maybe you know some examples? Or @showuon?

clolov · 2026-06-01T10:33:24Z

+8. What do scanners most often (re)report that the PMC considers a **non-finding**? (Seeds §11a.)
+
+**Meta:**
+9. Confirm this model lives as root `THREAT_MODEL.md` referenced from a new `SECURITY.md`, covering the broker


Ideally the THREAT_MODEL.md makes it in as a new page under docs/security, but I am happy to move it as a subsequent step if that formatting would somehow break things. I think it is fair to treat Streams as a client library for now (@mjsax maybe you can weigh in here?). Given that we expose REST APIs from Connect, I have a feeling we need to go into a bit more depth (@mimaison thoughts?).

clolov · 2026-06-01T10:34:08Z

+   be `VALID`? *Proposed:* operator must secure before exposing; open default is dev-only.
+2. With the StandardAuthorizer, what is the default of **`allow.everyone.if.no.acl.found`**, and is "no ACL ⇒
+   deny" the intended secured behavior? *Proposed:* deny by default under StandardAuthorizer.
+3. Does the **Connect REST API** require authentication by default, and is connector-config URL handling


@mimaison you may be the best-suited person to answer this question? If not maybe you know who might be?

clolov · 2026-06-01T11:31:22Z

+   *Proposed:* yes.
+
+**Wave 3 — DoS, peers, §11a (§7/§8/§11a):**
+6. What **request-size / quota / throttling** guarantees bound RPC DoS, and where is the resource line?


The configurations which protect against a DoS and their default values are:
socket.request.max.bytes at 100 MiB
queued.max.requests at 500
connection.failed.authentication.delay.ms at 100 ms

We further have the following unset by default ones:
queued.max.request.bytes
max.connections{.per.ip}
max.connection.creation.rate

By default quotas are unset. They can be set on a produce/consume level, on a generic request level, or on the number of mutations processable by the controller.
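
The values above can be captured as a small check that reports which of the unset-by-default limits an operator has not configured. This is only a sketch: the helper names are hypothetical, the input is assumed to be the broker properties as a dict, and quotas are left out because they are set dynamically rather than in the properties file.

```python
# Defaults as listed in this comment.
DOCUMENTED_DEFAULTS = {
    "socket.request.max.bytes": 100 * 1024 * 1024,  # 100 MiB
    "queued.max.requests": 500,
    "connection.failed.authentication.delay.ms": 100,
}

# Limits that are unset unless the operator configures them.
UNSET_BY_DEFAULT = (
    "queued.max.request.bytes",
    "max.connections",
    "max.connections.per.ip",
    "max.connection.creation.rate",
)


def effective_limits(overrides):
    """Merge operator overrides over the documented defaults."""
    limits = dict(DOCUMENTED_DEFAULTS)
    for key in DOCUMENTED_DEFAULTS:
        if key in overrides:
            limits[key] = int(overrides[key])
    return limits


def unconfigured_limits(overrides):
    """List the unset-by-default limits the operator has left unset."""
    return [key for key in UNSET_BY_DEFAULT if key not in overrides]
```

A report against a broker where `unconfigured_limits` is non-empty would then be a question about operator configuration rather than about the defaults themselves.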

clolov · 2026-06-01T11:38:47Z

+| Metadata control plane | KRaft quorum (`raft`, `metadata`) / ZooKeeper (legacy) | network | **Yes (peer-trust)** |
+| Coordinators | group / transaction / share coordinators | — | **Yes** |
+| Storage + tiered storage | log segments; remote-storage plugins | filesystem; remote store | **Yes** |
+| Kafka Connect | REST control plane + connector plugins | network egress; plugin code | **Yes (addendum C)** |


What is addendum C?

clolov · 2026-06-01T11:42:24Z

+  an untrusted REST caller (if the REST API is unauthenticated) is the real finding.
+- **Findings in `tools`, `shell`, `trogdor`, `tests`, `docker`, samples** — out of scope (§3).
+- **Streams application-level issues** — out of the broker model (§3).
+- **Idempotent-producer / replication internals** not reachable from an unauthorized client — out of surface.


What is the reason that an idempotent producer is grouped with the replication internals? Is the idea here that Kafka has some internal state (i.e. for idempotent producer or for replication) which lives on brokers and is not exposed?

clolov · 2026-06-01T11:50:47Z

+| Kafka Connect | REST control plane + connector plugins | network egress; plugin code | **Yes (addendum C)** |
+| Kafka Streams | client library (runs in the app) | — | Light → §3 |
+| Clients library | parses broker responses | — | **Yes (client-side)** |
+| tools / shell / trogdor / tests / docker | — | — | No → §3 |


I don't know whether you need an exhaustive list or just a list of examples, but let's also exclude committer-tools.

And bin. It contains the .sh files which we use to start a broker, for example, but if we assume this is out of scope because it is the operator's responsibility, I don't think we need to look into it either.

Generated-by: Claude Code

potiuk · 2026-06-04T02:31:44Z

Thanks @clolov — pushed a revision addressing the review:

Removed all ZooKeeper references (gone from trunk; legacy 3.9.x out of scope).
Framed PLAINTEXT listener + no authorizer as development-only; authorizer default = DENY (StandardAuthorizer); idempotent producers / transactions / delegation tokens are ACL-gated (a token can't mint another).
SASL guidance (SCRAM/GSSAPI/OAUTHBEARER recommended; nothing enforces TLS for PLAIN; OAUTHBEARER client_credentials + client_assertion).
DoS defaults documented (socket.request.max.bytes=100MiB, queued.max.requests=500, connection.failed.authentication.delay.ms=100ms; queued.max.request.bytes/max.connections*/quotas unset by default).
Kafka Streams treated as a client library; committer-tools + bin out of scope; defined "addendum C".

Left as open §14 questions where you pinged others: in-cluster-peer threat examples (@mimaison / @showuon), the Streams-as-client-library final call (@mjsax), Connect/REST depth (@mimaison), and whether the out-of-scope list should be exhaustive vs examples. WDYT?

showuon

Thanks for the PR. I had a review, but I don't know how this works. Could we try running with this, then update it based on the results and run again?

showuon · 2026-06-04T03:19:24Z

+
+Apache Kafka follows the [Apache Software Foundation security process](https://www.apache.org/security/).
+Please report suspected vulnerabilities **privately** to `security@apache.org` (the Kafka PMC is reachable
+at `private@kafka.apache.org`). Do **not** open public GitHub issues or pull requests for security reports.


Suggested change

at `private@kafka.apache.org`). Do **not** open public GitHub issues or pull requests for security reports.

at `private@kafka.apache.org`). Do **not** open public JIRA issues or pull requests for security reports.

showuon · 2026-06-04T07:09:46Z

+1. **(@mimaison / @showuon)** Are there concrete **in-cluster-peer threats** (a malicious broker/controller
+   holding valid cluster credentials) worth eventually modelling, even though peers + the KRaft quorum are
+   trusted *for now*? *(Solicited by @clolov; no examples yet.)*
+2. **(@mjsax)** Confirm **Kafka Streams** should be treated as a **client library** (current position) rather


No, I don't recall any in-cluster-peer threats. I think it's safe to trust them.

mimaison

Thanks for getting the process started. I left a number of comments across the files.

Overall I found the markdown files really hard to read and review. There is a lot of duplication, odd formatting and inconsistent phrasing/grammar. Maybe it does not really matter as it's for LLMs (or maybe it's even by design?).

mimaison · 2026-06-04T13:05:24Z

@@ -0,0 +1,15 @@
+# Security Policy


Do we require the SECURITY.md and THREAT_MODEL.md files? I'd rather have these sections in our documentation directly, also in markdown in the repository. That way they would be available on the website and to end users.

mimaison · 2026-06-04T13:07:13Z

+Security model: [SECURITY.md](./SECURITY.md) -> [THREAT_MODEL.md](./THREAT_MODEL.md)
+
+Agents that scan this repository should consult `SECURITY.md` and the linked
+`THREAT_MODEL.md` before reporting issues. Kafka is a configurable platform: it


I'm a bit confused by the sentences after
"Agents that scan this repository should consult `SECURITY.md` and the linked `THREAT_MODEL.md` before reporting issues."

Is it trying to be a TLDR of the other files?

mimaison · 2026-06-04T13:19:18Z

+development-only posture** *(maintainer — clolov)*. The adversary is an **untrusted network client** of a
+broker (or the Connect REST API); the operator and trusted cluster peers are out of model.


The adversary is an **untrusted network client** of a broker (or the Connect REST API);

I'm not sure I understand that sentence.

I agree that the adversary is a client, in the sense that an attack cannot depend on a rogue administrator. Clients only access Kafka components via the network: via Kafka RPCs to brokers and controllers, and via HTTP/REST to Connect workers. An attack cannot rely on local access to the broker, controller, or Connect worker hosts.

However, I think an adversary could also be a trusted client that breaches its boundaries. For example, a client has READ access but manages to perform a WRITE (privilege escalation). I guess it depends on what exactly you meant by untrusted.

mimaison · 2026-06-04T13:24:27Z

+| Kafka Connect | REST control plane + connector plugins | network egress; plugin code | **Yes (addendum C — §4)** |
+| Kafka Streams | client library (runs in the app) | — | Client library → §3 *(maintainer — clolov, pending @mjsax)* |
+| Clients library | parses broker responses | — | **Yes (client-side)** |
+| tools / committer-tools / bin / shell / trogdor / tests / docker | — | — | No → §3 *(maintainer — clolov)* |


tools is pretty much the same as Clients library. These are just clients set up to perform specific tasks.

mimaison · 2026-06-04T13:25:28Z

+| Storage + tiered storage | log segments; remote-storage plugins | filesystem; remote store | **Yes** |
+| Kafka Connect | REST control plane + connector plugins | network egress; plugin code | **Yes (addendum C — §4)** |
+| Kafka Streams | client library (runs in the app) | — | Client library → §3 *(maintainer — clolov, pending @mjsax)* |
+| Clients library | parses broker responses | — | **Yes (client-side)** |


What do we mean by "parses broker responses"? Is it that responses from the brokers could be malicious and that clients must handle them gracefully?

mimaison · 2026-06-04T14:05:29Z

+
+- **"Unauthenticated access / no TLS"** against a default/sample config — the PLAINTEXT + no-authorizer
+  default is a **development-only** posture; `OUT-OF-MODEL: non-default-build` *(maintainer — clolov)*.
+- **"Admin/cluster operation succeeds for an authorized principal"** — by design; the admin is trusted (§7).


Do we really need to state that if you're allowed to perform an action, being able to perform it is not an issue? Or are we trying to say something else that I don't understand?

mimaison · 2026-06-04T14:05:58Z

+- **Idempotent-producer / replication internals** not reachable from an unauthorized client — these are
+  broker-internal state (e.g. the producer-ID / sequence state for idempotence, and replication state) that
+  is not directly exposed to an unauthorized client and is ACL-gated where it is reachable; they are grouped
+  together as broker-internal-state surface rather than client-facing surface *(answering @clolov's grouping
+  question; idempotent/transactional access itself is ACL-gated — §8.2)*.


I have no idea what this is referring to?

mimaison · 2026-06-04T14:42:22Z

+*(maintainer)*; the items below are the ones still genuinely open or pending another reviewer.)*
+
+**Pending other reviewers:**
+1. **(@mimaison / @showuon)** Are there concrete **in-cluster-peer threats** (a malicious broker/controller


I think it's out of scope, brokers and controllers are assumed to be trusted. If a user manages to make a malicious broker/controller join the cluster they must have already breached a number of other security measures.

mimaison · 2026-06-04T14:44:56Z

+   trusted *for now*? *(Solicited by @clolov; no examples yet.)*
+2. **(@mjsax)** Confirm **Kafka Streams** should be treated as a **client library** (current position) rather
+   than modelled in its own right. *(Folded as the working position; final call pending.)*
+3. **(@mimaison)** **Kafka Connect (addendum C)** — because Connect exposes a **REST API**, does it warrant a


As said in https://github.com/apache/kafka/pull/22431/changes#r3356434930, Connect is in a pretty poor state if we don't consider it a client side application.

Regardless, we should separate it from the brokers/controllers as it works differently.

mimaison · 2026-06-04T14:56:24Z

+## §11a Known non-findings (recurring false positives)
+
+*(v0 seed — the PMC will own the authoritative list — §14.)*
+


Known issues (not considered security issues) include:

FileConfigProvider and DirectoryConfigProvider don't handle symlinks in their path allowlist validation logic. While we should fix this issue, I don't consider it a security issue as it requires local disk access and the ability to create arbitrary symlinks on Connect workers.

https://issues.apache.org/jira/browse/KAFKA-20450: Over the years we got a lot of reports for this one. We tidied the code recently but it's not in a release yet. Again not a security issue as it requires disk write access on Connect workers for exploitation.

Abuse of sasl.server.max.receive.size: The default value of 512kB could (should?) be reduced.

Abuse of connections.max.idle.ms pre authentication: The same idle period is applied to both authenticated and unauthenticated connections. The config defaults to 10 minutes; it could be shorter pre-auth.

mimaison · 2026-06-04T15:35:51Z

I also want to note we, well mostly @clolov, had started work in another PR last week: #22398

Our draft is less comprehensive and much shorter but I find it way easier to read and well organized. So I'm considering whether it would be easier to update it, by lifting a few sections from here, to match what is needed for the scan. I understand it's an option you gave us:

If you'd rather author it yourselves, close this PR and we'll regroup.

But I'd be curious what other PMC members think?

potiuk · 2026-06-05T00:53:25Z

Thanks @mimaison, @showuon (and @clolov) — genuinely helpful, and the readability point is fair. Since the PMC already has its own draft in #22398 that you find easier to work with, the right move is to defer to that one — it's PMC-authored and PMC-owned, which is exactly where a threat model should live.

Rather than keep two competing PRs open, I'll close this one (#22431) in favor of #22398. Two pieces from here might be worth grafting across, since they're the bits most directly useful when triaging scan output (and easy to under-include in a first pass):

§11a "known non-findings" — the recurring false-positive suppression list; highest-leverage section for keeping scan noise down.
§13 triage dispositions — the closed set of outcomes (VALID / BY-DESIGN / OUT-OF-MODEL / …) so every finding routes somewhere.

I'm happy to open a small follow-up against #22398 porting just those two (trimmed to your format) — or leave it entirely to you. And @showuon, on how it's used: the model is the reference you classify each scan finding against (in-scope / out-of-scope / known-non-finding), and it's meant to be iterated as real results arrive — so "run with it and refine" is exactly right.

Closing this in favor of #22398; just say the word if you'd like the §11a/§13 graft.
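
As a rough illustration of the classification step described above (in-scope / out-of-scope / known-non-finding), here is a minimal sketch. The `triage` function and its arguments are hypothetical, and the two sets are placeholders seeded from this thread; the out-of-scope list in particular was still under discussion in the review.

```python
# Placeholder lists taken from the draft and review comments; not authoritative.
OUT_OF_SCOPE_COMPONENTS = {
    "committer-tools", "bin", "shell", "trogdor", "tests", "docker",
}
KNOWN_NON_FINDINGS = {
    "KAFKA-20450",
}


def triage(component, known_issue=None):
    """Route a scan finding to one of the three buckets named in the thread."""
    # A report that matches a recorded non-finding is suppressed first.
    if known_issue in KNOWN_NON_FINDINGS:
        return "known-non-finding"
    # Findings in excluded components are outside the runtime model.
    if component in OUT_OF_SCOPE_COMPONENTS:
        return "out-of-scope"
    # Everything else is checked against the threat model itself.
    return "in-scope"
```

For example, `triage("trogdor")` yields `"out-of-scope"`, while `triage("connect", known_issue="KAFKA-20450")` yields `"known-non-finding"`. Both sets are meant to be edited as real scan results arrive.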

github-actions Bot added the docs label May 31, 2026

clolov reviewed Jun 1, 2026

potiuk changed the title ~~Add draft threat model + SECURITY.md + AGENTS.md for security-model discoverability~~ MINOR: Add draft threat model + SECURITY.md + AGENTS.md for security-model discoverability Jun 2, 2026

potiuk added 2 commits June 2, 2026 21:17

Merge branch 'trunk' into asf-security/threat-model-2026-05-31

c953d9c

Revise threat model per PMC review

5bbb3e8

Generated-by: Claude Code

showuon reviewed Jun 4, 2026

mimaison reviewed Jun 4, 2026

potiuk closed this Jun 5, 2026
