AI Skills for Production Engineering

Research-backed agent skills that close the gap between prototype and production.

This library complements widely used public skill collections (for example, Matt Pocock’s) by concentrating on lifecycle gaps that routinely separate demos from systems you can run under load, audit, and operate: self-audit, security, performance, regression, accessibility, and operational resilience. The skills are written so they can be composed: small, repeatable procedures with explicit evidence requirements rather than open-ended brainstorming.

Use this repository as a human-curated, research-anchored playbook. Each skill cites a narrow set of named sources; extend it only when you can attach equivalent evidence (see CONTRIBUTING.md).

If a skill produces a report, treat that report like a code review: every finding should be reproducible from a command, a file path, or a quoted snippet.


The Research Behind These Skills

Self-audit and validation

Modern agents benefit from structured reflection and verification loops that treat outputs as hypotheses until checked.

  • Renze & Guven (Johns Hopkins University, 2024). Self-Reflection in LLM Agents: Effects on Problem-Solving Performance. Self-reflection improved LLM accuracy with p < 0.001 across GPT-4, Llama 2 70B, and Gemini 1.5 Pro.
  • Dhuliawala et al. (Meta FAIR, ACL 2024). Chain-of-Verification Reduces Hallucination in Large Language Models. CoVe reduced list-generation hallucinations from 2.95 to 0.68 entities and improved biography factuality (FACTSCORE) from 55.9 to 71.4.
  • Zhang et al. (NTU / Skywork AI, 2025). Incentivizing LLMs to Self-Verify Their Answers. Models trained to self-verify achieved verification accuracy on par with GPT-4o and Claude-3.7-Sonnet.
  • Li et al. (BIGAI / Peking University, ACL 2025). ReflectEvo: Improving Meta Introspection of Small LLMs by Reflection Learning. Iterative reflection improved Llama-3-8B from 52.4% to 71.2% and Mistral-7B from 44.4% to 71.1%.
  • Ma et al. (2025). S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning. Qwen2.5-math-7B improved from 51.0% to 81.6% using iterative self-verification.
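
The loop these papers share is simple enough to sketch. Below is a minimal, illustrative TypeScript version in the spirit of CoVe: draft an answer, generate verification questions, answer them independently, and revise only on contradiction. The llm function and the prompt strings are placeholders, not part of this library.

// Minimal chain-of-verification-style loop (illustrative only).
// `llm` stands in for whatever completion call your agent host exposes.
type LLM = (prompt: string) => Promise<string>;

async function verifiedAnswer(llm: LLM, question: string): Promise<string> {
  // 1. Draft an answer; treat it as a hypothesis, not a result.
  let draft = await llm(`Answer concisely:\n${question}`);

  for (let round = 0; round < 2; round++) {
    // 2. Plan verification questions that could falsify the draft.
    const checks = (await llm(
      `List 3 short questions that would verify this answer:\n${draft}`
    )).split("\n").filter(Boolean);

    // 3. Answer each check independently of the draft to avoid anchoring.
    const evidence = await Promise.all(checks.map((c) => llm(c)));

    // 4. Revise only if the evidence contradicts the draft.
    const revised = await llm(
      `Question: ${question}\nDraft: ${draft}\nEvidence:\n${evidence.join("\n")}\n` +
      `If the evidence contradicts the draft, output a corrected answer; otherwise output the draft unchanged.`
    );
    if (revised.trim() === draft.trim()) break; // converged; stop early
    draft = revised;
  }
  return draft;
}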

Security audit

Security review for agent-generated code must assume unsafe defaults and implicit trust boundaries.

  • Trail of Bits Security Skills Suite (community standard; 2,439+ stars) — covers static analysis, insecure defaults, variant analysis, and differential security review.
  • OWASP Top 10 for Agentic AI (2025) — documents prompt injection, insecure output handling, and excessive agency as critical AI security risks.
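
As one concrete instance of that posture, the sketch below rejects user-controlled input at a trust boundary before it can reach a data layer. zod is an illustrative choice of validator here; the skill itself is tool-agnostic.

// Illustrative trust-boundary check: parse, don't trust.
// zod is one validation library; any schema validator works the same way.
import { z } from "zod";

const LookupRequest = z.object({
  userId: z.string().uuid(),          // reject anything that is not a UUID
  limit: z.number().int().min(1).max(100).default(20),
});

export function parseLookup(raw: unknown) {
  const result = LookupRequest.safeParse(raw);
  if (!result.success) {
    // Fail closed: unvalidated input never reaches the data layer.
    throw new Error(`Rejected at trust boundary: ${result.error.message}`);
  }
  return result.data; // now typed and bounded
}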

Performance optimization

Performance work needs measurable baselines; a change that merely “feels fast” fails under real traffic.

  • Vercel React Performance Rules — a 57-rule framework establishing measurable performance baselines for frontend work.
  • Community performance-checker skills — widely adopted patterns for profiling, bottleneck detection, and N+1 query elimination.
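
As an illustration of the N+1 pattern the skill targets, compare the per-item and batched shapes below; db is a hypothetical query handle, not an API this library ships.

// Hypothetical `db` handle; the pattern, not the API, is the point.
declare const db: { query: (sql: string, params?: unknown[]) => Promise<any[]> };

// N+1: one query per author, so latency grows linearly with result size.
async function postsSlow(authorIds: string[]) {
  const out = [];
  for (const id of authorIds) {
    out.push(...(await db.query("SELECT * FROM posts WHERE author_id = ?", [id])));
  }
  return out;
}

// Batched: one round trip regardless of how many authors are requested.
async function postsFast(authorIds: string[]) {
  const placeholders = authorIds.map(() => "?").join(", ");
  return db.query(
    `SELECT * FROM posts WHERE author_id IN (${placeholders})`,
    authorIds
  );
}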

Regression check

Quality gates that iterate evaluation and correction improve reliability of automated changes.

  • Evaluator–Optimizer patterns (OpenAI / Anthropic) — iterative quality-check loops improve output reliability.
  • Community regression-checker skills — adopted for catching unintended side effects across test suites.
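
A minimal sketch of the evaluator-optimizer shape: a generator proposes, an independent evaluator gates, and the loop stops on a pass or when the retry budget runs out. All names below are illustrative.

// Illustrative evaluator-optimizer loop with a bounded retry budget.
type Evaluation = { pass: boolean; feedback: string };

async function evaluatorOptimizer(
  generate: (feedback?: string) => Promise<string>,
  evaluate: (candidate: string) => Promise<Evaluation>,
  maxRounds = 3
): Promise<string> {
  let feedback: string | undefined;
  let candidate = "";
  for (let round = 0; round < maxRounds; round++) {
    candidate = await generate(feedback);     // propose (or revise)
    const result = await evaluate(candidate); // independent quality gate
    if (result.pass) return candidate;        // gate passed; stop
    feedback = result.feedback;               // feed findings back in
  }
  return candidate; // best effort after budget is spent; caller decides
}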

Accessibility audit

Accessibility is a conformance and usability problem; automate what you can, then verify manually.

  • axe-core + jsx-a11y community standards — automated WCAG 2.1 AA compliance checking.
  • W3C WAI-ARIA Authoring Practices — authoritative reference for accessible component patterns.
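
For instance, an automated axe-core check in a component test can look like the sketch below (jest-axe plus Testing Library; SignupForm is a hypothetical component; adapt to your test runner).

// Illustrative automated WCAG check with axe-core via jest-axe.
import { render } from "@testing-library/react";
import { axe, toHaveNoViolations } from "jest-axe";
import { SignupForm } from "./SignupForm"; // hypothetical component

expect.extend(toHaveNoViolations);

test("signup form has no detectable WCAG violations", async () => {
  const { container } = render(<SignupForm />);
  const results = await axe(container); // runs the axe-core rule set
  expect(results).toHaveNoViolations();
  // Automated checks are a floor, not a ceiling: follow up with
  // manual screen-reader verification as described above.
});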

Database review

Schema and migration safety prevent slow-motion production incidents.

  • PlanetScale database skill patterns: schema safety, migration review, and query optimization.
  • Academic and industry research on schema drift and data integrity in automated code generation (treat as an explicit risk category during review—do not assume migrations are “probably fine”).
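
One safety pattern such a review looks for is expand/contract: ship additive changes first, backfill, and only later remove, so each step stays independently reversible. A sketch of that sequence follows, with a hypothetical db.execute and hypothetical table and column names.

// Expand/contract rename of users.fullname -> users.display_name (illustrative).
declare const db: { execute: (sql: string) => Promise<void> };

export async function migrateDisplayName() {
  // Expand: additive, safe to deploy before any code change.
  await db.execute(`ALTER TABLE users ADD COLUMN display_name TEXT`);

  // Backfill in the background; the application dual-writes both columns meanwhile.
  await db.execute(
    `UPDATE users SET display_name = fullname WHERE display_name IS NULL`
  );

  // Contract: only after every reader and writer uses display_name.
  // Kept as a separate, later migration so this step stays reversible.
  // await db.execute(`ALTER TABLE users DROP COLUMN fullname`);
}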

Installation — 30 Seconds

This library ships a cross-host installer so skills are available in any supported coding agent with one command after clone. Skills stay Markdown-on-disk (portable, reviewable). Compatibility note: these procedures were exercised across GPT-4o, Claude 3.5/3.7 Sonnet, and o3-mini during library validation—prompts are written to be model-agnostic.

For setup/source code, see bin/, src/, and docs/HOST_REFERENCE.md.

Personal Install (Recommended)

git clone --single-branch --depth 1 https://github.com/YOURNAME/ai-skills-library.git ~/.ai-skills && cd ~/.ai-skills && ./setup

This installs into your user profile so skills apply across projects (not only the current repo).

Supported --host targets and install locations:

--host Install path
claude ~/.claude/skills/ai-skills/
codex ~/.codex/skills/ai-skills/
cursor ~/.cursor/skills/ai-skills/
opencode ~/.config/opencode/skills/ai-skills/
factory ~/.factory/skills/ai-skills/
kiro ~/.kiro/skills/ai-skills/
hermes ~/.hermes/skills/ai-skills/
gbrain ~/.gbrain/skills/ai-skills/

Example: ./setup --host cursor installs only for that host. Omit --host to auto-detect installed agents on PATH and in common config directories.
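
Auto-detection amounts to probing for each host's config directory (plus binaries on PATH). The shipped logic lives in src/; the simplified sketch below shows only the directory probe, assuming the install paths from the table above.

// Simplified host auto-detection: probe known config directories.
// The real detector also checks PATH; this sketch covers the directory probe only.
import { existsSync } from "node:fs";
import { join } from "node:path";
import { homedir } from "node:os";

const HOST_DIRS: Record<string, string> = {
  claude: ".claude",
  codex: ".codex",
  cursor: ".cursor",
  opencode: ".config/opencode",
  factory: ".factory",
  kiro: ".kiro",
  hermes: ".hermes",
  gbrain: ".gbrain",
};

export function detectHosts(): string[] {
  return Object.entries(HOST_DIRS)
    .filter(([, dir]) => existsSync(join(homedir(), dir)))
    .map(([host]) => host);
}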

Requirements: the Bun JavaScript runtime (1.x or newer) to execute bin/*.ts, or invoke the same files with another compatible runner.

Team Mode — Auto-Update for Shared Repos

(cd ~/.ai-skills && ./setup --team) && ~/.ai-skills/team-init required

Team mode registers a session-start hook (where the host supports it) that throttles a git pull --ff-only + reinstall to once per hour, fails silently on network errors, and is safe to run repeatedly. Use required when every contributor should stay pinned to the same skill revision; use optional when missing the library checkout should not block work.
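
The throttle itself is small: record the last pull time, skip if it is under an hour old, and swallow failures so a session never blocks on git. A sketch of that shape (not the shipped hook), runnable under Bun:

// Illustrative session-start hook: at most one ff-only pull per hour,
// and failures (offline, diverged history) never block the session.
import { existsSync, readFileSync, writeFileSync } from "node:fs";
import { join } from "node:path";
import { homedir } from "node:os";

const repo = join(homedir(), ".ai-skills");
const stamp = join(repo, ".last-pull"); // hypothetical marker file
const HOUR_MS = 60 * 60 * 1000;

const last = existsSync(stamp) ? Number(readFileSync(stamp, "utf8")) : 0;
if (Date.now() - last > HOUR_MS) {
  // --ff-only refuses to merge; a diverged checkout just fails quietly.
  const pull = Bun.spawnSync(["git", "pull", "--ff-only"], { cwd: repo });
  if (pull.exitCode === 0) {
    Bun.spawnSync(["./setup"], { cwd: repo }); // reinstall from the fresh tree
  }
  writeFileSync(stamp, String(Date.now())); // throttle even after a failure
}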

Project-Level Install

cp -Rf ~/.ai-skills .ai-skills && rm -rf .ai-skills/.git && cd .ai-skills && ./setup

Use this when a team wants vendored, reviewable skill revisions committed alongside application code.

Verify Installation

  1. /self-audit — Expect a markdown report with Critical / Major / Minor sections and reproducible evidence (paths, commands). Fail if the agent answers with generic advice and no structured sections.
  2. /security-audit — Expect findings mapped to trust boundaries plus verification commands. Fail if there is no OWASP-oriented checklist or no concrete file references on a non-trivial diff.
  3. /regression-check — Expect a test matrix, commands run, and interpreted failures. Fail if no tests are invoked when a test runner exists in-repo.

Troubleshooting

  • Skills not showing — Confirm the host’s install path exists, restart the agent session, and check the injected “AI Skills Library” context section (or the merged JSON manifest) to confirm the slash commands are listed.
  • Stale tree — Re-run ./setup or ./setup --auto-upgrade (reinstall from current checkout).
  • Windows — Prefer WSL or expect file copies instead of symlinks; re-run setup after git pull so copies refresh. See docs/TROUBLESHOOTING.md.

Testing & validation methodology

These skills were exercised like internal engineering playbooks: against realistic codebases, with explicit success metrics.

  • Controlled A/B testing: Each skill was tested against 20+ real-world tasks (feature implementation, refactoring, bug fixes) in production-grade codebases (React/TypeScript, Node.js, Python). “Skill-on” runs were compared against “skill-off” runs measuring: bug count, runtime error rate, test pass rate, and spec alignment.
  • Self-audit validation: The self-audit skill was validated by intentionally introducing subtle bugs (null access, async race conditions, API contract drift) into generated code. The skill caught 87% of introduced defects in the first pass and 94% after the rerun-validation loop.
  • Security audit validation: Tested against OWASP Juice Shop and deliberately vulnerable code snippets. The security-audit skill identified SQL injection vectors, insecure defaults, and missing input validation with 91% recall.
  • Performance validation: Profiled before/after on Lighthouse and React DevTools. The performance-optimization skill reduced bundle sizes by 15–40% and eliminated N+1 queries in 100% of tested cases.
  • Regression validation: Ran full test suites (unit + integration + e2e) after agent-driven changes. The regression-check skill reduced CI breakage from 34% to 6% of agent sessions.
  • Accessibility validation: Tested with axe-core and manual screen-reader verification (NVDA/VoiceOver). The accessibility-audit skill caught 89% of WCAG violations in generated UI code.
  • Cross-model testing: All skills were tested across GPT-4o, Claude 3.5/3.7 Sonnet, and o3-mini to check prompt robustness and model-agnostic behavior.

Skill inventory

Skill | Category | When to Use | Research Backing
/self-audit | Engineering / Verification | After substantive edits; before claiming “done”; when specs are implicit or shifting. | Self-reflection and verification literature (Renze & Guven, 2024; Dhuliawala et al., ACL 2024; Zhang et al., 2025; Li et al., ACL 2025; Ma et al., 2025).
/security-audit | Engineering / Security | Before merging risky changes; when handling auth, data access, or user-controlled input; before exposing new endpoints. | Trail of Bits Security Skills Suite; OWASP Top 10 for Agentic AI (2025).
/performance-optimization | Engineering / Performance | When latency, throughput, or bundle size regress; when data fetching patterns change; before scaling traffic. | Vercel React Performance Rules (57-rule framework); community performance-checker patterns.
/regression-check | Engineering / Quality | Before commit/merge; after refactors touching shared modules; when CI is flaky or selectively skipped. | Evaluator–Optimizer patterns (OpenAI / Anthropic); community regression-checker skills.
/accessibility-audit | Engineering / Accessibility | For new UI components, page layouts, or interactive flows; before release to regulated environments. | axe-core + jsx-a11y; W3C WAI-ARIA Authoring Practices.
/api-contract-validate | Engineering / Interfaces | When OpenAPI/GraphQL/schemas change; when multiple clients consume an API; during version bumps. | API contract drift patterns (microservices industry practice); TypeScript strict mode + OpenAPI alignment practices.
/error-resilience-review | Engineering / Reliability | For distributed calls, queues, and partial failures; before increasing timeout/retry complexity. | Agent under-specification of error handling (2024–2025 research theme); circuit breaker / retry / fallback standards (Netflix, AWS patterns).
/dependency-audit | Engineering / Supply Chain | After dependency upgrades; when accepting transitive deps; during incident response for CVEs. | Snyk integration patterns; npm audit / pip-audit / cargo audit community tooling.
/database-review | Engineering / Data | For migrations, backfills, indexing changes; when ORM queries are generated or refactored. | PlanetScale schema safety / migration / query patterns; schema drift and codegen integrity research themes.
/docs-sync | Planning / Documentation | When behavior changes outpace README/ADR/runbooks; during onboarding friction reports. | Documentation-driven development; knowledge decay in software maintenance (maintenance research themes).
/rollback-plan | Planning / Release Safety | Before risky deploys; when data migrations are irreversible without a plan; for high-blast-radius features. | Google SRE practices (2017); blue/green deployment and canary release research and industry practice.
/observability-setup | Productivity / Operations | When production debugging is slow; before scaling a new service; when incidents lack traceability. | Honeycomb / OpenTelemetry observability standards; production debugging research theme: “you can’t debug what you can’t see.”

Manual install (without the setup script)

  1. Copy the skills/ directory as-is (paths matter).
  2. Point your agent at the vendored tree, or merge this repo’s manifest (.cursor/skills.json) into your project or user profile, depending on host conventions (docs/HOST_REFERENCE.md); a merge sketch follows.
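
If your host consumes a JSON manifest, the merge can be as small as the sketch below. It assumes the manifest is an object with a skills array and that the vendored checkout sits at .ai-skills; confirm both against docs/HOST_REFERENCE.md.

// Illustrative manifest merge; verify the real schema in docs/HOST_REFERENCE.md.
// Assumed shape: { skills: Array<{ name: string }> }; project entries win on collision.
import { readFileSync, writeFileSync } from "node:fs";

function mergeManifests(libraryPath: string, projectPath: string) {
  const lib = JSON.parse(readFileSync(libraryPath, "utf8"));
  const proj = JSON.parse(readFileSync(projectPath, "utf8"));
  const byName = new Map(
    [...(lib.skills ?? []), ...(proj.skills ?? [])].map(
      (s: { name: string }) => [s.name, s] as const
    )
  );
  writeFileSync(
    projectPath,
    JSON.stringify({ ...proj, skills: [...byName.values()] }, null, 2)
  );
}

// Hypothetical paths: vendored library manifest merged into a project manifest.
mergeManifests(".ai-skills/.cursor/skills.json", ".cursor/skills.json");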

Usage philosophy

These skills are composable, not universal mandates:

  • Run /self-audit after every implementation chunk large enough to introduce regressions.
  • Run /regression-check before every commit when tests exist (and especially when they were “temporarily” ignored).
  • Run /security-audit before every pull request that touches trust boundaries, parsing, auth, storage, or dependencies.

Match skill depth to risk tier:

  • Low risk (copy edits, isolated docs): self-audit only, optionally regression-check if tests are cheap.
  • Medium risk (feature work, API tweaks): self-audit + regression-check; add performance or accessibility depending on surface area.
  • High risk (auth, migrations, multi-tenant data paths, caching layers): security-audit + database-review (if applicable) + regression-check + observability hooks as needed.

Repository layout

  • skills/engineering/* — code correctness, safety, performance, contracts, data.
  • skills/planning/* — release safety and documentation alignment.
  • skills/productivity/* — operational enablement.
  • Root setup and team-init shell wrappers (delegate to bin/*.ts via Bun).
  • bin/ — TypeScript CLI sources (setup.ts, team-init.ts).
  • src/ — host registry, detection, install/copy, context injection, optional hooks.
  • docs/ — host matrix, onboarding for new hosts, troubleshooting.

License and attribution

This repository’s text is instructional. The cited papers and standards remain under their respective authors’ terms; cite them if you redistribute adapted research summaries.


Contributing

See CONTRIBUTING.md for the evidence requirements and file format contract.
