Skip to content

docs(security): explicit security model, role/capability matrix, AGENTS.md linkage#40503

Merged
rusackas merged 17 commits into
apache:masterfrom
sha174n:docs/security-model-and-agents-md
May 30, 2026
Merged

docs(security): explicit security model, role/capability matrix, AGENTS.md linkage#40503
rusackas merged 17 commits into
apache:masterfrom
sha174n:docs/security-model-and-agents-md

Conversation

@sha174n
Copy link
Copy Markdown
Contributor

@sha174n sha174n commented May 28, 2026

SUMMARY

Makes .github/SECURITY.md the canonical, principle-based source of
truth for what Apache Superset considers a security vulnerability, and
wires AGENTS.md to point at it.

The new Security Model section in .github/SECURITY.md defines:

  • Three trust boundaries (Admin role / operator / codebase)
  • A role and capability matrix (Public / Gamma / sql_lab / Alpha /
    Admin / Embedded guest token) covering data read, object write, SQL
    execution, database management, and user/role management
  • A single in/out-of-scope test phrased as one question
  • Illustrative class lists for both in-scope and out-of-scope findings,
    written as principles rather than enumerations of specific files or
    libraries, so the model stays valid as the codebase evolves

The new Security and Threat Model section near the top of
AGENTS.md gives LLM agents and automated tooling a short,
structured entry point: it inlines the trust boundaries and canonical
authorization pattern (@has_access_api + raise_for_access), and
defers to .github/SECURITY.md for the full matrix and scope lists.

BEFORE/AFTER SCREENSHOTS

N/A, documentation-only change.

TESTING INSTRUCTIONS

Markdown renders cleanly on GitHub. No code paths changed.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration
  • Introduces new feature or API
  • Removes existing feature or API

@netlify
Copy link
Copy Markdown

netlify Bot commented May 28, 2026

Deploy Preview for superset-docs-preview ready!

Name Link
🔨 Latest commit bc2fc2f
🔍 Latest deploy log https://app.netlify.com/projects/superset-docs-preview/deploys/6a1a6f08b56f53000842f6e3
😎 Deploy Preview https://deploy-preview-40503--superset-docs-preview.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.
🤖 Make changes Run an agent on this branch

To edit notification comments on pull requests, go to your Netlify project configuration.

@pull-request-size pull-request-size Bot added size/L and removed size/M labels May 28, 2026
sha174n added 7 commits May 28, 2026 17:26
…TS.md linkage

Adds a Security Model section to .github/SECURITY.md so the file is now
the canonical, principle-based source of truth for what Apache Superset
considers a vulnerability. The section is written in terms of trust
boundaries (Admin / Operator / Codebase) and a role-and-capability
matrix rather than enumerating specific files or libraries, so the
model stays valid as the codebase evolves and is unambiguous for both
human reporters and automated scanners.

The Vulnerability Scope section is reframed around a single test:

  "Does this finding let a principal perform an action the role and
   capability matrix does not entitle them to?"

In-scope and out-of-scope class lists are illustrative applications of
that test, not exhaustive enumerations.

AGENTS.md gains a top-level "Security and Threat Model" section that
directs LLM agents and automated tooling to .github/SECURITY.md first,
inlines the trust-boundary summary and the canonical authorization
pattern, and cross-references the matrix. The pre-existing
"Architecture Patterns > Security & Features" block is left in place
with a one-line pointer to the new section.
Applies eight of ten review suggestions:

- Trust Boundaries: the Admin claim now lists concrete capabilities
  (database registration, arbitrary SQL through SQL Lab, Jinja
  rendering, configuration override) that make the OS-trust-equivalent
  framing unavoidable, so the CNA Operational Rules 4.1 citation
  supports the reasoning rather than replacing it.
- Trust Boundaries: adds the fail-closed / fail-open asymmetry to the
  operator boundary so the boundary cannot be read as a blanket
  disclaimer for insecure defaults shipped by the codebase.
- Roles and Capabilities: explicit note that the Public role follows
  the same custom-role rule (operator grants shift the boundary for
  that deployment, do not redefine the model).
- In Scope: SSRF bullet now describes a concrete test ("Apache
  Superset itself, not the operator, controls the outbound destination
  set") rather than the circular "codebase is responsible for
  restricting".
- In Scope: new bullet for insecure-by-default behavior, pairing with
  the codebase-fail-closed responsibility above.
- Out of Scope: third-party dependency clause split into two cases
  (transitive / operator-selected: out; direct-pinned-vulnerable or
  misused-by-codebase: in).
- AGENTS.md: authorization-pattern paragraph qualifies "not a finding
  by itself" with "on authorization grounds", and notes that
  injection, SSRF, XSS, or other classes are evaluated separately.
- AGENTS.md: new requirements block for automated tooling, requiring
  each finding to name the matrix row violated and the assumed
  principal; findings that cannot identify both are filed as questions,
  not vulnerabilities.

Two suggestions intentionally not applied:

- "Theoretical attack vectors" wording kept strict on purpose; softening
  to "deprioritized but reopenable on PoC" invites every speculative
  submission and undermines the triage filter.
- Outcome of Reports section reviewed for consistency with the new
  model; no rewording needed.
The previous wording "Theoretical attack vectors without a demonstrable,
reproducible exploit" was the one place in the document that used
"demonstrable" as a load-bearing word without distinguishing
demonstrability from exploitability. That conflated two different
filter outcomes (the reporter did not bother to demonstrate vs. the
finding is not exploitable) into a single permanent closure, which is
the same kind of policy laundering the rest of the model is structured
to avoid.

The replacement is no less strict in practice (the burden still rests
with the reporter; findings without a PoC are still closed), it just
names what the filter is filtering for and removes the permanence
claim by allowing a refile if a proof of concept is later produced.
The Alpha and sql_lab rows did not match either superset/security/manager.py
or docs/admin_docs/security/security.mdx. Fixed:

- Alpha read column: now "all data sources", matching
  ALPHA_ONLY_PERMISSIONS = {"all_database_access", "all_datasource_access"}
  in superset/security/manager.py and the public docs ("Alpha users have
  access to all data sources").
- Alpha SQL column: now "no by default (requires the sql_lab role)".
  SQLLAB_ONLY_PERMISSIONS are gated to the sql_lab role
  (superset/security/manager.py: set_role("sql_lab", ...) at line 1515)
  and the docs explicitly state that Alpha needs sql_lab on a per
  database basis.
- Alpha manage-databases column: footnoted as "data upload to existing
  databases only" to capture ALPHA_ONLY_PMVS = {("can_upload", "Database")}.
- Gamma SQL column: same correction as Alpha; needs sql_lab role.
- Gamma write column: "own and explicitly shared" replaced with the
  doc-accurate "own charts and dashboards on granted datasets" ("shared"
  was not a real Superset concept).
- Removed the standalone "sql_lab" row, which presented the additive role
  as a peer principal. Added an explicit note below the table naming it
  additive and tying its scope to the base role's database grants.
- Embedded guest token: tightened "datasets bound to the embedded dashboard"
  to "data sources bound to the dashboards in the token's resources claim",
  matching the JWT structure in superset/security/guest_token.py.

The model itself is unchanged; only its description in the matrix.
… enumeration

Two changes:

1. Removed the parenthetical "(REST API, web UI, SQL Lab, embedded SDK,
   notification system, server-side workers)" from the codebase trust
   boundary. The phrase "across its product surface" carries the
   principle on its own; enumerating components reintroduces the
   blacklist pattern the rest of the document was structured to avoid,
   and dates the doc as new surfaces are added.

2. Added documented-control enforcement to Trust Boundary 3 and a
   matching bullet in the in-scope list. The previous model covered
   matrix bypass, injection, authentication bypass, SSRF, XSS, and
   insecure defaults, but had no explicit principle for the class of
   findings where an operator has enabled a Superset-documented control
   (denylists, allowlists, sanitizers, validators) and a user evades it
   via a parser quirk, dialect trick, encoding, or comment splice. The
   new bullet makes that class in scope and explicitly applies it to
   all principals including Admin, since the operator's configuration
   of the control is the policy the codebase failed to enforce.
… boundaries

The previous commit overcommitted by putting Apache Superset on the hook
for enforcing any operator-configurable control as if it were a security
boundary. That positioning is wrong for most of the configurable
hardening Apache Superset exposes (SQL function or table denylists, URI
restrictions on already-authorized database connectors, sanitization
passes layered on top of an already-authorized SQL Lab session): these
are belt-and-braces controls operators can layer on top of the role and
capability matrix, not firewall-grade guarantees the codebase commits
to.

Two related changes:

1. Trust Boundary 3: drop the broad documented-control responsibility
   added in the previous commit. The codebase commits to enforcing the
   matrix; configurable hardening is treated separately under
   Vulnerability Scope.

2. In-Scope bullet rewritten to cover only controls Apache Superset
   documents specifically as security boundaries (row-level security,
   access checks tied to the matrix, documented sandboxes, and similar
   features whose documentation positions them as security-relevant).

3. New Out-of-Scope bullet making the carve-out explicit: bypasses of
   configurable defense-in-depth hardening that the documentation does
   not position as a security boundary are hardening improvements, not
   vulnerabilities.

This restores the intended boundary: the matrix is the line, hardening
is opt-in layering on top of it.
… wording, lift headings

Six small fixes from a re-read of the full document. None of these
change the security model; they remove an internal contradiction,
sharpen ambiguous wording, and make the heading hierarchy match the
substantive structure.

1. Remove the legacy third-party-dependency paragraph in "Outcome of
   Reports" that contradicted the case (a)/(b) paragraph added in the
   previous commit. The new paragraph stays; the old one is gone.

2. Tighten "Outcome of Reports" itself. "may be converted into public
   GitHub issues" is now "are typically converted into public GitHub
   issues, where the community can contribute fixes alongside the
   maintainers", and explicitly notes that the triage decision and
   reasoning are communicated back to the reporter in either case.

3. Trust Boundary 3: the standalone "Apache Superset is not, by
   default, a SQL or database firewall" sentence has been folded into
   the Out-of-Scope hardening bullet under Vulnerability Scope, where
   it belongs alongside the explicit carve-out. In its place, Trust
   Boundary 3 now scopes the codebase's commitments to "the role and
   capability matrix and to controls Apache Superset's own
   documentation explicitly positions as security boundaries", which
   makes the test for the in-scope "documented security boundary"
   bullet checkable instead of judgement-call.

4. Promote Trust Boundaries, Roles and Capabilities, and Vulnerability
   Scope from bold subsections to `###` headings. They are the
   substantive body of the Security Model and should carry section
   weight. Trailing sections (Outcome of Reports, Vulnerability
   Aggregation & CVE Attribution) stay as `**bold**` since they are
   lighter-weight footers and unchanged from before this PR.

5. Embedded guest token row: "data sources bound to the dashboards in
   the token's resources claim" tightened to "data sources reachable
   through the embedded dashboards the token authorizes": same
   meaning, less JWT-claim jargon for non-technical reporters.

6. Drop "by default" from the firewall disclaimer. Apache Superset is
   not a SQL or database firewall; the qualifier invited the question
   "what if I configure it to be one?", which it cannot.

MITRE CNA citations (Operational Rules 4.1 in the Admin trust boundary
and Out-of-Scope, and 4.1.10/4.1.11/4.2.13 in Vulnerability
Aggregation) are unchanged and remain in their original locations.
@sha174n sha174n force-pushed the docs/security-model-and-agents-md branch from 9e47c2f to 9f5dd3a Compare May 28, 2026 16:28
@sha174n sha174n marked this pull request as ready for review May 28, 2026 18:02
@dosubot dosubot Bot added the doc Namespace | Anything related to documentation label May 28, 2026
@github-actions github-actions Bot removed the doc Namespace | Anything related to documentation label May 28, 2026
Copy link
Copy Markdown
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments

Comment thread SECURITY.md
Comment thread SECURITY.md
Comment thread AGENTS.md Outdated
sha174n added 2 commits May 29, 2026 00:01
Move .github/SECURITY.md to the repo root for discoverability (matches
Apache Spark's convention) and update AGENTS.md cross-references. Add an
explicit out-of-scope callout that compromise of operator-managed
backend infrastructure (metastore, cache backends, message brokers,
secret stores) is post-compromise and not a Superset vulnerability, and
strengthen the AGENTS.md Trust Boundary section with the same statement.
Comment thread SECURITY.md Outdated
Comment thread AGENTS.md Outdated
sha174n and others added 2 commits May 29, 2026 06:00
- Replace "annihilating" with "resolving" in the reporting intro per
  reviewer suggestion.
- Restate the codebase-trust-boundary clause in AGENTS.md in concrete
  terms (entry points: routes, commands, DAOs, UI handlers, jobs)
  instead of the "product surface" phrasing.
- Fix `../AGENTS.md` link in SECURITY.md; the file now lives at the
  repo root so the relative path is just `AGENTS.md`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown
Member

@dpgaspar dpgaspar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

other topics we could add:

  • Intentionally unauthenticated endpoints

  • allow_browser_login = True will be misread

    30+ API classes set allow_browser_login = True at the class level. This is a FAB setting that means "accept session cookies in addition to API tokens" — it does not weaken auth. But a scanner reading
    the name literally could flag it as "browser login allowed without protection." The doc should clarify this is a transport mechanism, not an auth bypass.

  • Guest token auth has no decorator — it's middleware

    Unlike @protect() or @has_access_api, guest token authentication works through Flask-Login's request_loader hook in security/manager.py. There is no @guest_token_required decorator. A scanner looking
    for decorators on endpoints that serve embedded dashboards will see none and flag them.

    The doc should explain: guest tokens are validated via the X-GuestToken header in the request loader, gated by the EMBEDDED_SUPERSET feature flag. It's not a decorator pattern.

  • is_owner() as an implicit authorization layer

is_owner() checks in the security manager are used inside raise_for_access() as a fallback — "you don't have the dataset permission, but you own this object, so access granted." A scanner won't see a
decorator or explicit call; it's buried in the security manager logic. If the doc doesn't mention ownership as an authorization mechanism, the scanner might flag owner-accessible routes as "missing
explicit permission check."

  • Feature-flagged security boundaries

    Several security behaviors are gated by feature flags but the PR doesn't enumerate them:

    • EMBEDDED_SUPERSET — gates the entire guest token auth system. If off, request_loader skips token validation entirely.
    • DASHBOARD_RBAC — changes how DashboardAccessFilter works (role-based vs. datasource-based)

Comment thread AGENTS.md Outdated
Comment thread AGENTS.md
sha174n and others added 2 commits May 29, 2026 14:36
…vel checks

Address review feedback on AGENTS.md:
- Distinguish @Protect(), @has_access_api, and @has_access by route type
- Limit raise_for_access expectation to data-bearing resources
  (dashboards, charts, datasets, queries, database/table access,
  query contexts); annotations, tags, CSS templates, reports, and
  RLS rules rely on route-level decorators plus DAO base_filters
  by design.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown
Member

@rusackas rusackas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! We can always iterate :D

@rusackas rusackas merged commit 8e47eb1 into apache:master May 30, 2026
55 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants