Claude/audit project improvements wm0w8 #11

Merged: criptogus merged 4 commits into main from claude/audit-project-improvements-wm0w8 on May 16, 2026

@criptogus (Owner)

What does this PR add?

Type

  • New package (skill / playbook / soul / guardrail)
  • Improvement to an existing package (bumped version)
  • Platform code or docs

Checklist (for content PRs)

  • Filename matches slug and the type's folder
  • At least 2 worked examples (skills) — realistic input + exact expected output
  • Original work, public-domain, or properly attributed; no secrets / PII
  • bun run validate:content passes locally
  • I've read CONTRIBUTING.md and agree to license under Apache-2.0 (code) / CC-BY-SA-4.0 (content)

Notes for reviewers

claude added 4 commits May 16, 2026 14:10
Audit of the upload + publish + MCP-serving paths found a critical chain:

- author.server.insertDraftPackage hardcoded author_handle="@admin" and
  author_verified=true for EVERY uploaded package, regardless of uploader.
- The UI bulkUploadPackages trusted a client-supplied `publish:true`.
- The "packages author write" RLS policy is FOR ALL USING(author_id=uid)
  with no WITH CHECK and no column-level guard. Since the browser uses the
  anon key, an authenticated user could bypass all server-fn gates with a
  direct PostgREST update setting is_published=true,
  review_status='approved', and author_verified=true. The result: an
  arbitrary or malicious skill served by the MCP discovery tools to every
  connected agent as admin-verified and review-approved, with no
  adversarial testing.

Fixes:
- insertDraftPackage now always creates a private, unverified, draft
  package (author_verified=false, is_published=false, review_status=draft).
- New migration adds BEFORE INSERT/UPDATE triggers that revert any change
  to trust/visibility columns (author_verified, author_handle,
  review_status, reviewed_*, is_published, install_count, author_id) unless
  the caller is an admin; the author write policy gets a WITH CHECK; any
  published-but-unapproved drift is normalized.
- UI upload no longer offers an instant-publish toggle; publishing routes
  through submit-for-review + admin approval. Dead `publish` plumbing
  removed from the upload pipeline; admin import/meta-ads flows now publish
  via an explicit, authorized post-insert update.
- MCP instructions/tool copy corrected (they advertised publish:true which
  the code never honored).
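
The trust-column guard described in the fixes above can be pictured as a pure function over the old and new row. The following TypeScript is a minimal sketch, not the migration itself (the real guard is a SQL trigger that is not shown in this PR text). The `PackageRow` shape, the `title` column, the `isAdmin` flag, and the name `guardTrustColumns` are all assumptions made for illustration; the reviewed_* columns are left out for brevity, and only the UPDATE path is modeled.

```typescript
// Hypothetical row shape: the trust/visibility columns named in the commit
// message, plus one ordinary column ("title") as a stand-in for editable data.
type PackageRow = {
  author_id: string;
  author_handle: string;
  author_verified: boolean;
  review_status: string;
  is_published: boolean;
  install_count: number;
  title: string;
};

// Columns the guard protects (reviewed_* omitted in this sketch).
const GUARDED_COLUMNS = [
  "author_id",
  "author_handle",
  "author_verified",
  "review_status",
  "is_published",
  "install_count",
] as const;

// Models the BEFORE UPDATE behavior described above: for non-admin callers,
// every guarded column keeps its old value; all other columns take the new one.
function guardTrustColumns(
  oldRow: PackageRow,
  newRow: PackageRow,
  isAdmin: boolean,
): PackageRow {
  if (isAdmin) return { ...newRow };
  const result: PackageRow = { ...newRow };
  for (const col of GUARDED_COLUMNS) {
    // Revert the attempted change by copying the previous value back.
    (result as Record<string, unknown>)[col] = oldRow[col];
  }
  return result;
}
```

Under this model, the direct PostgREST update described in the audit leaves the trust columns untouched: a non-admin who sends is_published=true gets the row's previous value back, while an ordinary edit such as a title change still goes through.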

https://claude.ai/code/session_01CV6zb1KBVoe3eBttyK9U4Z

Make "every published primitive is adversarially vetted" an enforced
invariant rather than a convention an admin could skip.

- review.functions.setReviewStatus now blocks approval (422) unless the
  CURRENT version of the package has an adversarial run with zero
  high/critical failures, severity-weighted score >= 0.9 and pass rate
  >= 0.9.
- Defense-in-depth: a BEFORE UPDATE trigger (require_adversarial_pass)
  enforces the same bar at the database layer, so no path — server fn,
  admin tool, or direct service-role write — can flip a package to
  published/approved without a passing run on the live version. Trigger
  ordering means it composes correctly with the trust-column guard.
- Admin import / meta-ads "publish" now submits drafts into the gated
  review queue (review_status='pending') instead of auto-approving, so
  there is a single enforced publish chokepoint.
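
The approval bar described above can be sketched as a small predicate. This is an illustrative TypeScript model only: the `AdversarialRun` shape, its field names, and the helper names are assumptions, not the actual code in review.functions or the require_adversarial_pass trigger. It also assumes that any single qualifying run on the current version is enough, which the commit message does not spell out.

```typescript
type Severity = "low" | "medium" | "high" | "critical";

// Hypothetical shape of one adversarial run; field names are assumptions.
type AdversarialRun = {
  packageVersion: string;
  failures: { severity: Severity }[];
  weightedScore: number; // severity-weighted score, assumed to lie in [0, 1]
  passRate: number; // fraction of cases passed, assumed to lie in [0, 1]
};

// Thresholds taken from the commit message.
const MIN_WEIGHTED_SCORE = 0.9;
const MIN_PASS_RATE = 0.9;

// True when at least one run on the CURRENT version has no high/critical
// failures and meets both thresholds. Runs on other versions are ignored.
function passesAdversarialBar(
  currentVersion: string,
  runs: AdversarialRun[],
): boolean {
  return runs.some(
    (run) =>
      run.packageVersion === currentVersion &&
      !run.failures.some(
        (f) => f.severity === "high" || f.severity === "critical",
      ) &&
      run.weightedScore >= MIN_WEIGHTED_SCORE &&
      run.passRate >= MIN_PASS_RATE,
  );
}

// HTTP-style outcome for an approval attempt: 422 when the bar is not met.
function approvalStatus(
  currentVersion: string,
  runs: AdversarialRun[],
): 200 | 422 {
  return passesAdversarialBar(currentVersion, runs) ? 200 : 422;
}
```

Keying the check on the current version matters: a passing run on an earlier version does not carry over, so editing a package after it was tested requires a fresh run before approval.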

https://claude.ai/code/session_01CV6zb1KBVoe3eBttyK9U4Z

Resolved package.json by taking the superset:
- keep validate:content with --experimental-strip-types (the
  prompt-injection guard gate added in PR#8 needs the TS loader);
  PR#7's audit-skills.mjs also uses --experimental-strip-types so
  this is consistent
- keep audit:skills (PR#7's malicious-function / injection gate)
- test:plain keeps tests/trust-attestation.test.mjs
- test:ts is the union: adds tests/audit-skills.test.mjs alongside
  the trust/guard suites

CI workflow auto-merged: it now runs both validate:content and
audit:skills on content/security PRs. The two security gates are
complementary: validate:content enforces schema and prompt-injection
checks on contributed packages; audit:skills adds malicious-function /
exfiltration heuristics. Both reuse the production inspectContent
guard so detection stays in sync.

criptogus merged commit 6b00216 into main on May 16, 2026
1 of 2 checks passed

2 participants