PMCS Skills Schema — a governance extension for production grade corpus management #1118

jupitermecury-afk · 2026-05-10T12:33:28Z

The Agent Skills standard tells you how to package a skill. It does not tell you whether a skill is safe to add to a production corpus.

Six months of enterprise corpus growth has produced a recognizable failure pattern. Teams move fast, build many skills, and then discover their agent cannot route reliably because twelve different mental models of what a skill is have produced twelve incompatible scopes, overlapping trigger conditions, and no declared conflicts between adjacent skills. The corpus grows. The routing gets worse.

The Anthropic standard addresses the infrastructure layer well. The SKILL.md format, progressive disclosure loading, and distribution via Git are solid. What it does not specify is the governance layer: whether a skill's trigger is precise enough to route unambiguously, whether it names its failure modes, whether it declares conflicts with adjacent skills, and whether its time-sensitive content will silently become wrong.

I have been working on a governance extension that operates on top of the standard rather than replacing it. The PMCS Skills Schema defines six dimensions that distinguish a production-grade skill from a prompt with a file extension:

Trigger definition — is the entry condition explicit and testable?
Failure mode coverage — does the skill name what bad output looks like, not just good?
Scope boundary — is the operating domain defined precisely enough to detect overlap?
Handoff protocol — does the skill define what it passes downstream and under what conditions?
Conflict surface — does the skill declare routing conflicts with named adjacent skills?
Decay resistance — is time-sensitive content flagged with review triggers?

Each dimension has a named failure mode taxonomy, a detection signal, and a three-tier scoring rubric (PMCS Grade / Marginal / Prompt Grade). The schema also includes a Corpus Health Report structure for auditing an entire corpus, a six-question Intake Governance Protocol that functions as a pre-merge gate, and a PMCS Grade SKILL.md template that extends the Anthropic standard format.
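
To make the shape of a score concrete, here is a minimal Python sketch of one skill's scorecard: six dimensions, each assigned one of the three tiers. The class names, field names, and dimension identifiers are illustrative assumptions, not part of the published schema, and the skill in the usage note is hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    PMCS_GRADE = "PMCS Grade"
    MARGINAL = "Marginal"
    PROMPT_GRADE = "Prompt Grade"


# Dimension names follow the post; the snake_case identifiers are assumed.
DIMENSIONS = (
    "trigger_definition",
    "failure_mode_coverage",
    "scope_boundary",
    "handoff_protocol",
    "conflict_surface",
    "decay_resistance",
)


@dataclass
class SkillScore:
    skill: str
    tiers: dict[str, Tier]  # dimension identifier -> tier

    def __post_init__(self):
        # Require exactly one tier per dimension, no more and no fewer.
        missing = set(DIMENSIONS) - set(self.tiers)
        extra = set(self.tiers) - set(DIMENSIONS)
        if missing or extra:
            raise ValueError(
                f"{self.skill}: missing={sorted(missing)} extra={sorted(extra)}"
            )

    def weakest(self):
        """Return the dimensions scored Prompt Grade, in schema order."""
        return [d for d in DIMENSIONS if self.tiers[d] is Tier.PROMPT_GRADE]
```

For example, a hypothetical `example-skill` scored Marginal everywhere except conflict surface would report `["conflict_surface"]` from `weakest()`, which is the dimension an intake reviewer would look at first.
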
The schema itself is deployable as a Claude skill: the document contains its own SKILL.md activation file.

Full specification (9 sections, markdown):

This is version 1.0. Feedback from practitioners actively building enterprise corpora is the thing that will make version 1.1 better. If you have a corpus that is exhibiting routing failures and want to run it against the schema, I am interested in what the audit surfaces.

UMER-Devtechs · 2026-05-15T19:32:12Z

This is actually the most thoughtful critique I've seen of the Agent Skills standard. You've nailed the exact problem: the standard tells you how to package a skill, but nothing about how to keep a corpus healthy as it grows. And yeah, twelve different mental models of what a skill is will absolutely destroy routing reliability. The six dimensions you laid out make a lot of sense, especially conflict surface and decay resistance. Those are the ones that silently kill production corpora because nobody thinks to document them until it's too late. The fact that the schema itself is deployable as a Claude skill is clever meta-humor, but also genuinely useful for adoption. I'd be curious to see what an audit of a real enterprise corpus surfaces. My guess is most skills are "Prompt Grade" on the trigger definition dimension alone, because people write vague triggers like "when the user asks about data" and then wonder why every conversation routes to the same skill. Have you run this against any public corpora yet? It would be interesting to see the distribution of PMCS Grade vs. Marginal vs. Prompt Grade across something like the official skills repository.

jupitermecury-afk · 2026-05-15T19:53:49Z

Thank you. You have identified the exact dimension where I expect the highest failure rate too. Trigger definition is where the gap between "feels right when you write it" and "works reliably in production" is widest. "When the user asks about data" is a perfect example of a vague trigger that looks reasonable in isolation and becomes a routing disaster at scale. I have not run a formal audit against a public corpus yet, but that is the right next move. The official Anthropic skills repository is the obvious candidate: real skills, public, and built by a range of contributors with different mental models. I will run the PMCS schema against a sample and post the results here. My prediction aligns with yours: trigger definition and conflict surface will show the highest Prompt Grade concentration. Decay resistance will be the surprise. Most skills will have no decay management at all, because the problem is invisible until something silently becomes wrong. Results incoming.

jupitermecury-afk · 2026-05-15T20:05:42Z

Results from the first formal run.

I pulled eight skills from the official repository spanning all four category groups — document skills (docx, pptx), development and technical (webapp-testing, mcp-builder, skill-creator), creative and design (algorithmic-art, frontend-design), and enterprise (brand-guidelines) — and scored each against the six PMCS dimensions. Forty-eight scores total.

The prediction from the prior comment was accurate on two dimensions and wrong on one.

Trigger definition came out stronger than expected. Seven of eight skills are PMCS Grade on this dimension. Anthropic's approach to description authoring — the "pushy" instruction in the skill-creator SKILL.md, the explicit negative cases in document skills — produces testable entry conditions. The corpus is not failing at the front door.

Conflict surface is the structural gap. Seven of eight skills are Prompt Grade. Not a single skill in the sample declares a routing conflict with an adjacent skill by name. frontend-design and algorithmic-art have overlapping domains, both with each other and with canvas-design, which sits in the repository outside this sample. docx and pptx are adjacent document skills, as is pdf (also outside the sample). mcp-builder and webapp-testing operate in the same technical development space. None of them name the overlap. A corpus agent routing "build me a generative data visualization" has no schema-level guidance on which of three plausible skills should take the request.
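
One rough way to surface candidate overlaps like these is to compare skill descriptions pairwise and flag similar pairs that declare no conflict with each other. The sketch below is an assumption-laden illustration, not the PMCS detection signal: it uses plain Jaccard similarity over description words, a hypothetical `conflicts` field on each skill, and an arbitrary threshold of 0.3.

```python
import re
from itertools import combinations

# A small, arbitrary stopword list; a real audit would tune this.
STOPWORDS = {"a", "an", "and", "the", "for", "of", "to", "or", "in", "with"}


def tokens(description):
    """Lowercase word set for a description, minus stopwords."""
    words = re.findall(r"[a-z]+", description.lower())
    return {w for w in words if w not in STOPWORDS}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def undeclared_overlaps(skills, threshold=0.3):
    """Flag skill pairs with similar descriptions and no declared conflict.

    skills maps a skill name to {"description": str, "conflicts": set of names}.
    Both keys are hypothetical; the Agent Skills format has no conflicts field.
    """
    flagged = []
    for left, right in combinations(sorted(skills), 2):
        score = jaccard(
            tokens(skills[left]["description"]),
            tokens(skills[right]["description"]),
        )
        declared = (
            right in skills[left]["conflicts"]
            or left in skills[right]["conflicts"]
        )
        if score >= threshold and not declared:
            flagged.append((left, right, score))
    return flagged
```

Word overlap is a crude proxy for routing overlap, so a flagged pair is a prompt for human review rather than a finding. Once either skill in a pair names the other in its `conflicts` set, the pair stops being reported.
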

Decay resistance is worse than expected. Six of eight Prompt Grade. This is the invisible failure. The mcp-builder skill references live SDK URLs from the modelcontextprotocol repository with fetch instructions — that is a partial mitigation, not a decay strategy. The rest have no mechanism for detecting when their content has silently become wrong. For document skills targeting stable file formats this is lower risk. For mcp-builder and webapp-testing, which depend on evolving toolchains, it is a live problem.
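
A decay strategy in the sense used here could be as simple as a review date attached to any skill whose content depends on a moving target. The following sketch assumes two hypothetical metadata fields, `time_sensitive` and `review_by`, neither of which exists in the Agent Skills format today, and reports skills that either lack a review trigger or have passed it.

```python
from datetime import date


def decay_findings(skills, today):
    """Report time-sensitive skills with a missing or overdue review date.

    skills maps a skill name to a metadata dict with the assumed keys
    "time_sensitive" (bool) and "review_by" (datetime.date or None).
    """
    findings = {}
    for name, meta in skills.items():
        if not meta.get("time_sensitive"):
            continue  # stable content needs no review trigger
        review_by = meta.get("review_by")
        if review_by is None:
            findings[name] = "no review trigger declared"
        elif review_by < today:
            findings[name] = f"review overdue since {review_by.isoformat()}"
    return findings
```

Passing `today` explicitly rather than calling `date.today()` inside the function keeps the check deterministic, which matters if it runs as part of a pre-merge gate.
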

Failure mode coverage produced the most interesting finding. skill-creator and webapp-testing are the only two PMCS Grade skills on this dimension, and it is not coincidental — they are the two skills that are explicitly meta-cognitive about the contexts in which AI assistance fails. skill-creator names undertriggering as a failure mode and builds architectural instruction to address it. webapp-testing names context window pollution and defines the exact decision branch that avoids it. The other six skills define correct behavior but leave incorrect behavior unspecified.

The overall distribution is 15 PMCS Grade, 16 Marginal, 17 Prompt Grade across the 48 scores. The corpus is not a routing disaster — it is a corpus where the packaging layer is strong and the governance layer is absent. The skills in this repository are well-built by the standard they were built to. The PMCS schema is measuring a different thing.
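
As a quick arithmetic check on those totals, the three tier counts should account for every skill and dimension pair in the sample. The snippet below uses only the figures reported in this comment; the variable names are illustrative.

```python
# Totals as reported above; per-skill scores are not reproduced here.
reported = {"PMCS Grade": 15, "Marginal": 16, "Prompt Grade": 17}

skills_sampled = 8
dimensions = 6
total = sum(reported.values())

# Every (skill, dimension) cell should carry exactly one tier.
assert total == skills_sampled * dimensions

# Share of all scores that fall in each tier.
shares = {tier: count / total for tier, count in reported.items()}
for tier, share in shares.items():
    print(f"{tier}: {share:.0%}")
```

The shares come out at roughly 31%, 33%, and 35%, so no single tier dominates the sample as a whole. The concentration shows up only when the scores are broken out by dimension.
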

The two dimensions with zero PMCS Grade representation — conflict surface and decay resistance — are precisely the ones that produce no observable failures in a small corpus and compounding failures in a large one. This is why the governance layer problem is invisible until it isn't.

For v1.1 of the schema I am considering whether conflict surface belongs in the SKILL.md itself or in a corpus-level manifest that sits above the individual skills. There is an argument that a skill should not need to know what other skills exist, and that this knowledge belongs at the corpus layer. The intake gate question for conflict surface currently requires the author to name adjacent skills at write time, which only works if the author has full corpus visibility. Most contributors do not.
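
To illustrate the corpus-level option, here is a minimal sketch of what validating such a manifest might involve. Nothing here is specified by the schema: the manifest shape (a list of entries with `skills` and `resolution` keys) and the validation rules are assumptions made for the example.

```python
def validate_manifest(skill_names, conflicts):
    """Check a corpus-level conflict manifest and return a list of errors.

    skill_names is the list of skills present in the corpus.
    conflicts is a list of entries shaped like
    {"skills": [name_a, name_b], "resolution": "routing rule in prose"}.
    This shape is hypothetical, not part of PMCS or Agent Skills.
    """
    known = set(skill_names)
    seen = set()
    errors = []
    for i, entry in enumerate(conflicts):
        pair = frozenset(entry["skills"])
        if len(pair) != 2:
            errors.append(f"entry {i}: a conflict must name two distinct skills")
            continue
        unknown = sorted(pair - known)
        if unknown:
            errors.append(f"entry {i}: unknown skills {unknown}")
        if pair in seen:
            errors.append(f"entry {i}: duplicate declaration for {sorted(pair)}")
        seen.add(pair)
        if not entry.get("resolution", "").strip():
            errors.append(f"entry {i}: no routing resolution given")
    return errors
```

The design point is that individual skills stay unaware of each other, while whoever maintains the corpus owns the manifest and can run a check like this at merge time. The trade-off is that conflict knowledge no longer travels with a skill when it is copied into another corpus.
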
