PMCS Skills Schema — a governance extension for production grade corpus management #1118
Replies: 3 comments
-
This is actually the most thoughtful critique I've seen of the Agent Skills standard. You've nailed the exact problem: the standard tells you how to package a skill, but nothing about how to keep a corpus healthy as it grows. And yeah, twelve different mental models of what a skill is will absolutely destroy routing reliability. The six dimensions you laid out make a lot of sense, especially conflict surface and decay resistance. Those are the ones that silently kill production corpora because nobody thinks to document them until it's too late. The fact that the schema itself is deployable as a Claude skill is clever meta-humor, but also genuinely useful for adoption. I'd be curious to see what an audit of a real enterprise corpus surfaces. My guess is most skills are "Prompt Grade" on the trigger definition dimension alone, because people write vague triggers like "when the user asks about data" and then wonder why every conversation routes to the same skill. Have you run this against any public corpora yet? It would be interesting to see the distribution of PMCS Grade vs. Marginal vs. Prompt Grade across something like the official skills repository.
-
Thank you. You have identified the exact dimension where I expect the highest failure rate too. Trigger definition is where the gap between "feels right when you write it" and "works reliably in production" is widest. "When the user asks about data" is a perfect example of a vague trigger that looks reasonable in isolation and becomes a routing disaster at scale.
I have not run a formal audit against a public corpus yet, but that is the right next move. The official Anthropic skills repository is the obvious candidate: real skills, public, built by a range of contributors with different mental models. I will run the PMCS schema against a sample and post the results here. My prediction aligns with yours: trigger definition and conflict surface will show the highest Prompt Grade concentration. Decay resistance will be the surprise. Most skills will have no decay management at all, because the problem is invisible until something silently becomes wrong.
Results incoming.
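To make the decay point concrete, here is a minimal sketch of what a review trigger might look like. The `review_by` field, the placeholder skill names and dates, and the `stale_skills` helper are all hypothetical. Nothing in the Agent Skills standard or in the schema text quoted in this thread defines them.

```python
from datetime import date

# Hypothetical decay metadata: skill names, field names, and dates are
# placeholders for illustration, not audit data.
SKILLS = [
    {"name": "skill-a", "review_by": date(2026, 1, 1)},
    {"name": "skill-b", "review_by": date(2027, 1, 1)},
    {"name": "skill-c", "review_by": None},  # no decay management declared
]


def stale_skills(skills, today):
    """Return names of skills that are overdue for review or declare no review date."""
    flagged = []
    for skill in skills:
        review_by = skill.get("review_by")
        # A missing date is treated as a finding, since undeclared decay
        # is the case that fails silently.
        if review_by is None or review_by < today:
            flagged.append(skill["name"])
    return flagged
```

Treating a missing date as a finding rather than a pass is a design choice in this sketch: it makes "no decay management at all" show up in a report instead of staying invisible.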
(In reply to the comment above, posted by Umer Nazakat on Fri, May 15, 2026 at 3:32 PM.)
-
Results from the first formal run. I pulled eight skills from the official repository spanning all four category groups — document skills (docx, pptx), development and technical (webapp-testing, mcp-builder, skill-creator), creative and design (algorithmic-art, frontend-design), and enterprise (brand-guidelines) — and scored each against the six PMCS dimensions. Forty-eight scores total.
Skill | Trigger | Failure modes | Scope boundary | Handoff | Conflict surface | Decay resistance | Overall
-- | -- | -- | -- | -- | -- | -- | --
docx | ✅ PMCS |
The prediction from the prior comment was accurate on two dimensions and wrong on one.
Trigger definition came out stronger than expected. Seven of eight skills are PMCS Grade on this dimension. Anthropic's approach to description authoring (the "pushy" instruction in the skill-creator SKILL.md, the explicit negative cases in the document skills) produces testable entry conditions. The corpus is not failing at the front door.
Conflict surface is the structural gap. Seven of eight skills are Prompt Grade. Not a single skill in the sample declares a routing conflict with an adjacent skill by name. frontend-design, algorithmic-art, and canvas-design have overlapping domains. docx, pdf, and pptx are adjacent document skills. mcp-builder and webapp-testing both operate in the same technical development space. None of them name the overlap. A corpus agent routing "build me a generative data visualization" has no schema-level guidance on which of three plausible skills should take the request.
Decay resistance is worse than expected: six of eight skills are Prompt Grade. This is the invisible failure. The mcp-builder skill references live SDK URLs from the modelcontextprotocol repository with fetch instructions. That is a partial mitigation, not a decay strategy. The rest have no mechanism for detecting when their content has silently become wrong. For document skills targeting stable file formats this is lower risk. For mcp-builder and webapp-testing, which depend on evolving toolchains, it is a live problem.
Failure mode coverage produced the most interesting finding. skill-creator and webapp-testing are the only two PMCS Grade skills on this dimension, and it is not coincidental: they are the two skills that are explicitly meta-cognitive about the contexts in which AI assistance fails. skill-creator names undertriggering as a failure mode and builds architectural instruction to address it. webapp-testing names context window pollution and defines the exact decision branch that avoids it. The other six skills define correct behavior but leave incorrect behavior unspecified.
The overall distribution is 15 PMCS Grade, 16 Marginal, and 17 Prompt Grade across the 48 scores. The corpus is not a routing disaster. It is a corpus where the packaging layer is strong and the governance layer is absent. The skills in this repository are well built by the standard they were built to. The PMCS schema is measuring a different thing. The two dimensions with zero PMCS Grade representation, conflict surface and decay resistance, are precisely the ones that produce no observable failures in a small corpus and compounding failures in a large one. This is why the governance layer problem is invisible until it isn't.
For v1.1 of the schema I am considering whether conflict surface belongs in the SKILL.md itself or in a corpus-level manifest that sits above the individual skills. There is an argument that a skill should not need to know what other skills exist, and that this knowledge belongs at the corpus layer. The intake gate question for conflict surface currently requires the author to name adjacent skills at write time, which only works if the author has full corpus visibility. Most contributors do not.
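The corpus-level manifest idea could look something like the sketch below. Everything in it is an assumption for illustration: the manifest shape, the `domains` tags and the values assigned to each skill, the sample conflict entry, and the `undeclared_overlaps` helper are hypothetical and not part of v1.0 of the schema or of the Agent Skills standard. The check flags pairs of skills whose domain tags intersect but that have no declared conflict entry.

```python
from itertools import combinations

# Hypothetical corpus-level manifest. The skill names come from the audit
# above; the domain tags and the conflict entry are illustrative guesses,
# not audit data.
MANIFEST = {
    "skills": {
        "frontend-design": {"domains": {"visual-design", "web-ui"}},
        "algorithmic-art": {"domains": {"visual-design", "generative"}},
        "canvas-design": {"domains": {"visual-design"}},
        "docx": {"domains": {"documents"}},
    },
    # Each declared conflict names two skills and which one should win the route.
    "conflicts": [
        {
            "between": ("algorithmic-art", "canvas-design"),
            "prefer": "algorithmic-art",
            "when": "request asks for generative or code-driven output",
        },
    ],
}


def undeclared_overlaps(manifest):
    """Return (skill_a, skill_b, shared_domains) for overlapping pairs with no declared conflict."""
    declared = {frozenset(c["between"]) for c in manifest["conflicts"]}
    skills = manifest["skills"]
    gaps = []
    # Sorting the names keeps the report order stable between runs.
    for a, b in combinations(sorted(skills), 2):
        shared = skills[a]["domains"] & skills[b]["domains"]
        if shared and frozenset((a, b)) not in declared:
            gaps.append((a, b, sorted(shared)))
    return gaps
```

Under this design the individual SKILL.md only declares its own domain, and the overlap is detected at the corpus layer, so a contributor would not need full corpus visibility at write time.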
-
The Agent Skills standard tells you how to package a skill. It does not tell you whether a skill is safe to add to a production corpus.
Six months of enterprise corpus growth has produced a recognizable failure pattern. Teams move fast, build many skills, and then discover their agent cannot route reliably because twelve different mental models of what a skill is have produced twelve incompatible scopes, overlapping trigger conditions, and no declared conflicts between adjacent skills. The corpus grows. The routing gets worse.
The Anthropic standard addresses the infrastructure layer well. The SKILL.md format, progressive disclosure loading, and distribution via Git are all solid. What it does not specify is the governance layer: whether a skill's trigger is precise enough to route unambiguously, whether it names its failure modes, whether it declares conflicts with adjacent skills, and whether its time-sensitive content will silently become wrong.
I have been working on a governance extension that operates on top of the standard rather than replacing it. The PMCS Skills Schema defines six dimensions that distinguish a production-grade skill from a prompt with a file extension:
Trigger definition: is the entry condition explicit and testable?
Failure mode coverage: does the skill name what bad output looks like, not just good?
Scope boundary: is the operating domain defined precisely enough to detect overlap?
Handoff protocol: does the skill define what it passes downstream and under what conditions?
Conflict surface: does the skill declare routing conflicts with named adjacent skills?
Decay resistance: is time-sensitive content flagged with review triggers?
Each dimension has a named failure mode taxonomy, a detection signal, and a three-tier scoring rubric (PMCS Grade / Marginal / Prompt Grade). The schema also includes a Corpus Health Report structure for auditing an entire corpus, a six-question Intake Governance Protocol that functions as a pre-merge gate, and a PMCS Grade SKILL.md template that extends the Anthropic standard format.
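One way to picture the rubric is as a small data model. The sketch below is an assumption about how scores might be recorded and tallied, not the format the specification uses. The names `Grade`, `DIMENSIONS`, `SkillScore`, `tally`, and `tally_by_dimension` are hypothetical, and the snake_case dimension keys are my own shorthand for the six dimensions listed above.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Grade(Enum):
    PMCS = "PMCS Grade"
    MARGINAL = "Marginal"
    PROMPT = "Prompt Grade"


# The six dimensions named in the schema; the key spellings are an assumption.
DIMENSIONS = (
    "trigger_definition",
    "failure_mode_coverage",
    "scope_boundary",
    "handoff_protocol",
    "conflict_surface",
    "decay_resistance",
)


@dataclass
class SkillScore:
    skill: str
    grades: dict  # dimension name -> Grade

    def __post_init__(self):
        # Require exactly one grade per dimension so corpus tallies stay comparable.
        missing = set(DIMENSIONS) - set(self.grades)
        extra = set(self.grades) - set(DIMENSIONS)
        if missing or extra:
            raise ValueError(
                f"{self.skill}: missing={sorted(missing)} extra={sorted(extra)}"
            )


def tally(scores):
    """Count grades across every (skill, dimension) pair in the corpus."""
    counts = Counter()
    for score in scores:
        counts.update(score.grades.values())
    return counts


def tally_by_dimension(scores):
    """Count grades per dimension, to show where a corpus is weakest."""
    return {dim: Counter(s.grades[dim] for s in scores) for dim in DIMENSIONS}
```

A corpus report in this shape would be a list of `SkillScore` objects, with `tally` giving the overall distribution and `tally_by_dimension` showing which dimension concentrates the Prompt Grade scores.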
The schema itself is deployable as a Claude skill — the document contains its own SKILL.md activation file.
Full specification (9 sections, markdown):
This is version 1.0. Feedback from practitioners actively building enterprise corpora is the thing that will make version 1.1 better. If you have a corpus that is exhibiting routing failures and want to run it against the schema, I am interested in what the audit surfaces.