diff --git a/AGENTS.md b/AGENTS.md index 427813d..793fa6b 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -17,14 +17,16 @@ When running evals or testing skills, create all workspaces in a temp location: **Why:** Eval artifacts — branches, commits, local git config — leak into the real repo history and are painful to clean up. The skill source lives in a git repo; eval output does not belong here. -## Per-Skill Evals - -Every repo-managed skill must include its own `evals/evals.json` file at `skills/<skill-name>/evals/evals.json`. - -- Treat this as a required artifact for every first-party skill in this repo -- Run evals **per skill**, not as one shared repo-level eval file -- Run evals from a temp workspace such as `$env:TEMP/<skill-name>-workspace/`, never from inside this repository -- When creating or modifying a repo-managed skill, run both `with_skill` and `without_skill` comparison executions from that temp workspace before the work is considered complete +## Per-Skill Evals + +Every repo-managed skill must include its own `evals/evals.json` file at `skills/<skill-name>/evals/evals.json`. 
+ +- Treat this as a required artifact for every first-party skill in this repo +- Eval entries may include an optional `files` array of skill-relative fixture paths such as `evals/files/example.md` +- When `files` is present, keep the paths relative to `skills/<skill-name>/` and stage those fixtures into the temp eval workspace for both `with_skill` and `without_skill` runs +- Run evals **per skill**, not as one shared repo-level eval file +- Run evals from a temp workspace such as `$env:TEMP/<skill-name>-workspace/`, never from inside this repository +- When creating or modifying a repo-managed skill, run both `with_skill` and `without_skill` comparison executions from that temp workspace before the work is considered complete - For a brand-new skill, the baseline is `without_skill`; for an existing skill, use either `without_skill` or the previous/original skill version as the baseline, matching the `skill-creator` benchmark flow - Generate the human-review artifacts too: aggregate the comparison into `benchmark.json` and launch `eval-viewer/generate_review.py` from the installed Anthropic `skill-creator` copy (typically under `~/.agents/skills/skill-creator/` or `~/.claude/skills/skill-creator/`) so the user can inspect `Outputs` and `Benchmark` before sign-off - Deterministic scaffold/template skills must keep local deterministic validators as well; evals supplement validators, they do not replace them @@ -80,20 +82,22 @@ After changing any repo-managed skill, sync the touched files across the repo co Every skill follows this layout: ``` -skills/<skill-name>/ -├── SKILL.md # Required — the skill definition (loaded by Claude) -├── FORMS.md # Optional — structured form fields for parameter collection -├── assets/ # Optional — file templates, fonts, icons used in output -│ └── <variant>/ # Group by variant when a skill supports multiple (e.g. library/, app/) -├── scripts/ # Optional — executable code (Python, Bash, etc.) 
-├── references/ # Optional — detailed reference docs the agent consults during generation -└── evals/ # Required for repo-managed skills — per-skill eval prompts and expectations -``` +skills/<skill-name>/ +├── SKILL.md # Required — the skill definition (loaded by Claude) +├── FORMS.md # Optional — structured form fields for parameter collection +├── assets/ # Optional — file templates, fonts, icons used in output +│ └── <variant>/ # Group by variant when a skill supports multiple (e.g. library/, app/) +├── scripts/ # Optional — executable code (Python, Bash, etc.) +├── references/ # Optional — detailed reference docs the agent consults during generation +└── evals/ # Required for repo-managed skills — per-skill eval prompts and expectations + └── files/ # Optional — input fixtures referenced by evals/evals.json files[] +``` - `SKILL.md` is the entry point — it contains the workflow, conventions, and step-by-step instructions -- `assets/` holds file templates, fonts, icons, and other static content used in output (the agent reads and substitutes placeholders) -- `references/` holds detailed specs that `SKILL.md` references but are too long to inline -- `evals/` holds the per-skill `evals.json` definitions used to verify that the skill still works after changes +- `assets/` holds file templates, fonts, icons, and other static content used in output (the agent reads and substitutes placeholders) +- `references/` holds detailed specs that `SKILL.md` references but are too long to inline +- `evals/` holds the per-skill `evals.json` definitions used to verify that the skill still works after changes +- `evals/files/` holds optional skill-local fixture inputs referenced by `evals/evals.json` when a benchmark needs attached source material ## Template Files Are Literal diff --git a/CHANGELOG.md b/CHANGELOG.md index 0649db3..db1061f 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -6,6 +6,26 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), ## [Unreleased] +## 
[0.3.2] - 2026-03-23 + +This is a minor release introducing the markdown-illustrator skill for visualization-first document analysis, with expanded repository branding, comprehensive skill documentation, and foundational eval fixture infrastructure across the skill suite. + +### Added + +- `markdown-illustrator` skill that reads markdown files and generates a document-wide Visual Brief plus one compiled diffusion-ready prompt, with zero follow-up questions and inferred visual strategy defaults (hero-focused cinematic editorial by default, steerable toward whiteboard, blackboard, isometric, or blueprint styles). +- Hero image assets for repository branding at `/assets/hero.jpg` and for individual skills (`trunk-first-repo/assets/hero.jpg`). +- Optional `files` array support in eval infrastructure (`evals/evals.json`) to stage skill-relative fixture paths into temporary eval workspaces for both `with_skill` and `without_skill` runs. +- Eval fixtures for `markdown-illustrator` with real-world examples (microservices architecture, product launch, transformers explanation). +- Benchmark contract reference documentation in `skill-creator-agnostic` with fixture guidance patterns. + +### Changed + +- Enhanced README with a markdown-illustrator installation snippet and a comprehensive "Why markdown-illustrator?" section explaining visual-brief anchoring, inferred defaults, good trigger examples, and reference visual directions for users. +- Extended AGENTS.md with detailed eval fixture documentation, explaining the optional `files` property and the fixture staging workflow for skill evaluation. +- Updated CONTRIBUTING.md with eval fixture guidance and temp-workspace isolation setup instructions. +- Improved the validation script (`validate-skill-templates.ps1`) to enforce fixture file path checks and consistency across skills. +- Applied the fixture guidance pattern to `skill-creator-agnostic` with benchmark contract examples and reference documentation. 
+ ## [0.3.1] - 2026-03-19 This is a patch release introducing three new NuGet-focused skills and runner-agnostic benchmark tooling, with enhanced release automation, comprehensive documentation standardization, and skill refinements. @@ -82,7 +102,8 @@ This is a minor release that introduces two complementary git workflow skills, e - Improved scaffold fidelity with hidden `.bot` asset preservation, explicit UTF-8 and BOM handling, and checks aimed at preventing mojibake or incomplete generated output. -[Unreleased]: https://github.com/codebeltnet/agentic/compare/v0.3.1...HEAD +[Unreleased]: https://github.com/codebeltnet/agentic/compare/v0.3.2...HEAD +[0.3.2]: https://github.com/codebeltnet/agentic/compare/v0.3.1...v0.3.2 [0.3.1]: https://github.com/codebeltnet/agentic/compare/v0.3.0...v0.3.1 [0.3.0]: https://github.com/codebeltnet/agentic/compare/v0.2.0...v0.3.0 [0.2.0]: https://github.com/codebeltnet/agentic/compare/v0.1.0...v0.2.0 diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index a1a93cc..ff811a9 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -68,18 +68,21 @@ The `description` is the most important field — it's how the AI decides to loa Evals let you verify the skill works and measure improvement over a baseline. Every repo-managed skill in this repository must include `evals/evals.json`: -```json -{ - "skill_name": "your-skill-name", - "evals": [ - { - "id": 0, - "prompt": "The user message to test against", - "expected_output": "What a correct response looks like — used for manual or automated grading" - } - ] -} -``` +```json +{ + "skill_name": "your-skill-name", + "evals": [ + { + "id": 0, + "prompt": "The user message to test against", + "expected_output": "What a correct response looks like — used for manual or automated grading", + "files": ["evals/files/example.md"] + } + ] +} +``` + +`files` is optional. When present, list one or more fixture files relative to `skills/<skill-name>/`. 
A common pattern is to store those fixtures under `evals/files/` so benchmark runners can copy or attach the same source inputs for both `with_skill` and `without_skill` runs. Aim for 3–5 evals that cover distinct scenarios: happy path, edge cases, and cases where the skill should *not* do something. @@ -130,9 +133,10 @@ powershell -NoProfile -ExecutionPolicy Bypass -File .\scripts\validate-skill-tem - [ ] `SKILL.md` has valid front matter with `name` and `description` - [ ] Skill is stack-agnostic (or clearly scoped to a specific tech in the name/description) - [ ] Examples are generic — no personal emails, usernames, or project-specific identifiers -- [ ] At least one eval in `evals/evals.json` -- [ ] The skill's `evals/evals.json` exists and its `skill_name` matches the folder/frontmatter name -- [ ] Skill changes were benchmarked from a temp workspace with both `with_skill` and `without_skill` runs +- [ ] At least one eval in `evals/evals.json` +- [ ] The skill's `evals/evals.json` exists and its `skill_name` matches the folder/frontmatter name +- [ ] Any optional `files` entries in `evals/evals.json` point to real fixture files under the same skill folder +- [ ] Skill changes were benchmarked from a temp workspace with both `with_skill` and `without_skill` runs - [ ] `benchmark.json` and `eval-viewer/generate_review.py` from the installed Anthropic `skill-creator` copy were used so a human could compare `Outputs` and `Benchmark` - [ ] `scripts/validate-skill-templates.ps1` passes for the current working tree when changing scaffold or template behavior - [ ] If CI is enabled for the branch, the GitHub Actions validation job passes too diff --git a/README.md b/README.md index 2cd86d4..9475d6c 100644 --- a/README.md +++ b/README.md @@ -1,5 +1,7 @@ # Agentic Skills +![Skills Applied](assets/hero.jpg) + A curated collection of [skills](https://skills.sh) — reusable instruction sets that teach AI agents how to follow specific workflows, conventions, and standards. 
Designed to work with any agent that supports the skills ecosystem: GitHub Copilot, Claude Code, Cursor, Codex, OpenCode, and [many more](https://skills.sh). ## What are skills? @@ -14,7 +16,7 @@ Another part of that workflow is now mandatory too: when a repo-managed skill is One more consistency rule matters for form-driven skills: native input fields are treated as a host feature, not something a model can rely on. Skills in this repo must stay usable with or without UI widgets, and must fall back to the same deterministic one-field-at-a-time flow when the host only supports plain chat. -Validation follows the same philosophy: run `scripts/validate-skill-templates.ps1` locally for the fast feedback loop, and let GitHub Actions rerun that same script on pull requests as the safety net. That validator also checks skill frontmatter metadata such as per-skill `evals/evals.json` files and the 1024-character YAML description limit; it does not replace the paired benchmark review workflow. +Validation follows the same philosophy: run `scripts/validate-skill-templates.ps1` locally for the fast feedback loop, and let GitHub Actions rerun that same script on pull requests as the safety net. That validator also checks skill frontmatter metadata such as per-skill `evals/evals.json` files, optional eval fixture paths declared through `files`, and the 1024-character YAML description limit; it does not replace the paired benchmark review workflow. 
## Install a skill @@ -49,6 +51,7 @@ npx skills add https://github.com/codebeltnet/agentic --skill git-nuget-release- npx skills add https://github.com/codebeltnet/agentic --skill git-nuget-readme npx skills add https://github.com/codebeltnet/agentic --skill git-visual-squash-summary npx skills add https://github.com/codebeltnet/agentic --skill skill-creator-agnostic +npx skills add https://github.com/codebeltnet/agentic --skill markdown-illustrator npx skills add https://github.com/codebeltnet/agentic --skill trunk-first-repo npx skills add https://github.com/codebeltnet/agentic --skill dotnet-strong-name-signing # npx skills add https://github.com/codebeltnet/agentic --skill another-skill @@ -75,6 +78,7 @@ npx skills add https://github.com/codebeltnet/agentic --skill dotnet-strong-name | [git-nuget-readme](skills/git-nuget-readme/SKILL.md) | Git-aware NuGet README companion for .NET repos that advertise a package from `src/`. Resolves the real packable project the README should sell, combines git history with actual package metadata, source capabilities, and relevant tests when feasible, preserves honest badge/docs/contributing sections, and writes a forthcoming, adoption-friendly `README.md` with repo-derived branding, clear value, install, framework-support, and quick-start guidance. | | [git-visual-squash-summary](skills/git-visual-squash-summary/SKILL.md) | Non-mutating grouped-summary companion to `git-visual-commits`. Turns noisy commit stacks into a curated set of compact summary lines for PR or squash contexts, preserving technical identifiers, merging overlap, dropping low-signal noise, highlighting distinct meaningful efforts, and avoiding changelog-style wording or unsupported claims. | | [skill-creator-agnostic](skills/skill-creator-agnostic/SKILL.md) | Runner-agnostic overlay for Anthropic `skill-creator`. 
Adds repo and environment guardrails for skill authoring and benchmarking: temp-workspace isolation, `iteration-N/eval-name/{config}/run-N/` benchmark layout, valid `grading.json` summaries, generated `benchmark.json`, honest `MEASURED` vs `SIMULATED` labeling, and sync/README discipline for repo-managed skills. | +| [markdown-illustrator](skills/markdown-illustrator/SKILL.md) | Reads a markdown file and answers directly in chat with one document-wide Visual Brief plus one compiled prompt. Infers a compact visual strategy by default, keeps follow-up questions near zero, and only branches when the user explicitly asks for added specificity. | | [dotnet-new-lib-slnx](skills/dotnet-new-lib-slnx/SKILL.md) | Scaffold a new .NET NuGet library solution following codebeltnet engineering conventions. Dynamic defaults for TFM/repository metadata, latest-stable NuGet package resolution, tuning projects plus a tooling-based benchmark runner, TFM-aware test environments, strong-name signing, NuGet packaging, DocFX documentation, CI/CD pipeline, and code quality tooling. | | [dotnet-new-app-slnx](skills/dotnet-new-app-slnx/SKILL.md) | Scaffold a new .NET standalone application solution following codebeltnet engineering conventions. Supports Console, Web, and Worker host families with Startup or Minimal hosting patterns; Web expands into Empty Web, Web API, MVC, or Web App / Razor, plus functional tests and a simplified CI pipeline. | | [trunk-first-repo](skills/trunk-first-repo/SKILL.md) | Initialize a git repository following [scaled trunk-based development](https://trunkbaseddevelopment.com/#scaled-trunk-based-development). Seeds an empty `main` branch and creates a versioned feature branch (`v0.1.0/init`), enforcing a PR-first workflow where content only reaches main through peer-reviewed pull requests. 
| @@ -120,6 +124,12 @@ npx skills add https://github.com/codebeltnet/agentic --skill git-visual-squash- npx skills add https://github.com/codebeltnet/agentic --skill skill-creator-agnostic ``` +`markdown-illustrator` + +```bash +npx skills add https://github.com/codebeltnet/agentic --skill markdown-illustrator +``` + `dotnet-new-lib-slnx` ```bash @@ -239,6 +249,90 @@ Anthropic's `skill-creator` is an excellent base workflow, but the day-to-day fr - **PowerShell-safe** — calls out UTF-8 no BOM, stable counting, provider-path normalization, and other Windows-specific pitfalls - **Repo-managed discipline** — keeps per-skill evals, local-install sync, and README updates in scope for first-party skills +### Why markdown-illustrator? + +Markdown-heavy documents often need one image that sells the whole idea fast: a conference opener, article cover, pitch-slide hero, or visual hook that makes the audience want to keep reading. The problem with many prompt workflows is that they branch immediately into model menus, theme toggles, and style comparisons before the document has even been understood. + +**markdown-illustrator** keeps the job focused. It reads the markdown, distills the whole document into a visualization-first Visual Brief, silently infers a compact visual strategy from the request, and turns that shared brief into one compiled prompt returned directly in chat. If you explicitly ask for a named model or a narrower aesthetic, it honors that request without dragging you through a selection workflow. 
+ +- **Visual-Brief first** — distills the document into subject, narrative, visual opportunity, mood, and must-show elements before prompting +- **One shared Visual Brief, one committed result** — optimized for covers, keynote slides, and "capture the essence" illustration requests where decisiveness matters more than variants +- **Prompt-compiler behavior** — translates abstract meaning into concrete visual structure, readable composition, physical medium cues, and explicit failure-mode control +- **Infer, don't interrogate** — defaults to a strong non-interactive strategy instead of turning intent, treatment, abstraction, and label density into follow-up questions +- **Hero-first defaults** — when the request is underspecified, the skill defaults toward `hero + cinematic editorial + concept-led + minimal labels + 16:9 (or 3:2 when it composes better)` rather than a dry explainer graphic +- **Cross-diffuser by design** — prefers strong natural-language prompting over vendor-specific branching unless the user asks +- **Text-safe prompting** — steers away from dense embedded copy, fake words, and fragile readable text unless very short labels are truly necessary +- **Anti-repetition by default** — avoids repeated labels, bullets, steps, callouts, mirrored panels, and echoed document fragments so the image reads like one authoritative artifact rather than many near-duplicates +- **No selection detours** — skips file creation, model-family, style, theme, and scope menus so the workflow stays fast and focused +- **User steerable when needed** — the skill stays minimal, but users can still explicitly steer toward directions like `whiteboard`, `blackboard`, `isometric`, or `blueprint` + +#### Inferred Defaults For markdown-illustrator + +The skill should not ask the user to configure these unless the request is genuinely ambiguous in a way that affects correctness. It infers a compact strategy and proceeds. 
+ +- **Intent** — infer `hero`, `digest`, `diagram`, or `cover` from the user's phrasing; if there is no stronger signal, default to `hero` +- **Visual treatment** — preserve explicit styles such as `whiteboard`, `blackboard`, `scientific`, `hand-drawn`, `isometric`, or `minimal`; otherwise default to `cinematic editorial` +- **Abstraction level** — use `concept-led` for spectacle and interest-raising requests, `balanced` for explanatory or onboarding requests, and `literal` only when the user explicitly asks for strict fidelity +- **Label density** — default to `minimal`, move toward `none` for hero or infographic-first requests, and use `academic` only for scientific or textbook-style requests +- **Aspect ratio** — honor explicit ratios, otherwise default to a wide frame: prefer `16:9`, use `3:2` when the composition is more editorial or object-centered, and avoid square by default + +#### Good Trigger Examples For markdown-illustrator + +These phrasings reliably signal the skill's intent: a markdown file goes in, and one document-wide visual direction comes back. + +- `Use markdown-illustrator on SKILL.md and return the Visual Brief plus one final prompt.` +- `Read roadmap.md and create one strong visual direction that captures the whole document.` +- `Create a visual digest for onboarding-notes.md.` +- `Turn launch-plan.md into a keynote opener image prompt.` +- `Use markdown-illustrator on systems.md and keep it blackboard style.` +- `Turn product-brief.md into a single Flux-ready hero-image prompt.` + +#### Common Visual Directions For markdown-illustrator + +These are reference directions for users, not built-in branches in the skill. If you want one of them, ask for it explicitly in the prompt. 
+ +`whiteboard` + +- Pros: approachable, collaborative, strong for brainstorming, product planning, workshops, and messy human energy +- Cons: can feel too casual or cluttered for polished keynote or editorial uses +- Guidance: ask for this when the document is about ideation, strategy sessions, or product thinking + +`blackboard` + +- Pros: dramatic, intellectual, layered, great for systems thinking and technical storytelling +- Cons: can become visually noisy if the source material is already dense +- Guidance: ask for this when the document is about architecture, strategy, layered concepts, or technical explanation + +`isometric` + +- Pros: excellent for platforms, ecosystems, infrastructure, and layered technical worlds +- Cons: weaker for abstract or emotional narratives that need symbolism more than structure +- Guidance: ask for this when the document describes systems, services, stacks, networks, or architectural relationships + +`blueprint` + +- Pros: precise, engineered, authoritative, strong for protocols, design intent, and technical rigor +- Cons: can feel cold or overly schematic for marketing or human-centered subjects +- Guidance: ask for this when the document should feel exact, technical, and intentionally designed + +`editorial illustration` + +- Pros: expressive, conceptual, and strong for article covers, essays, and symbolic storytelling +- Cons: less literal, so it may underperform when the image must explain concrete architecture +- Guidance: ask for this when the document needs metaphor, mood, or a polished publication-style visual + +`cinematic` + +- Pros: emotional, aspirational, high-impact, strong for keynote heroes and launch moments +- Cons: can become too grand if the source material really needs clarity over spectacle +- Guidance: ask for this when the image should feel premium, dramatic, and audience-grabbing + +`minimal poster` + +- Pros: high signal-to-noise, memorable, clean, and strong for one dominant idea +- Cons: can 
oversimplify documents with important operational or technical nuance +- Guidance: ask for this when the document has one central idea that can be reduced to a powerful symbol + ### Why dotnet-new-lib-slnx and dotnet-new-app-slnx? Starting a new .NET solution "from scratch" usually means copying from your last project, deleting half of it, and spending an hour wiring up CI, MSBuild props, versioning, and code quality tooling. Every new repo drifts slightly from the last one. Six months later, no two solutions look the same. @@ -316,6 +410,7 @@ skills/ scripts/ # Optional — executable code (Python, Bash, etc.) references/ # Optional — detailed reference docs evals/ # Required for repo-managed skills — per-skill evals/evals.json + files/ # Optional — eval fixture inputs referenced by evals/evals.json files[] ``` ## Contributing diff --git a/assets/hero.jpg b/assets/hero.jpg new file mode 100644 index 0000000..0866e7c Binary files /dev/null and b/assets/hero.jpg differ diff --git a/scripts/validate-skill-templates.ps1 b/scripts/validate-skill-templates.ps1 index 86aabee..d85f861 100644 --- a/scripts/validate-skill-templates.ps1 +++ b/scripts/validate-skill-templates.ps1 @@ -284,6 +284,29 @@ Add-ValidationResult -Results $results -Name 'All repo-managed skills include va if ([string]::IsNullOrWhiteSpace([string]$eval.expected_output)) { throw "$relativeEvalPath contains an eval without expected_output" } + if ($eval.PSObject.Properties.Name -contains 'files' -and $null -ne $eval.files) { + $fixturePaths = @($eval.files) + + foreach ($fixturePath in $fixturePaths) { + if ([string]::IsNullOrWhiteSpace([string]$fixturePath)) { + throw "$relativeEvalPath contains a blank files entry" + } + + $normalizedFixturePath = ([string]$fixturePath).Trim() -replace '\\', '/' + if ($normalizedFixturePath.StartsWith('/')) { + throw "$relativeEvalPath contains an absolute files entry '$fixturePath'" + } + if ($normalizedFixturePath -match '^[A-Za-z]:/') { + throw "$relativeEvalPath 
contains a drive-qualified files entry '$fixturePath'" + } + if (($normalizedFixturePath -split '/') -contains '..') { + throw "$relativeEvalPath contains a parent-directory files entry '$fixturePath'" + } + + $skillRelativeFixturePath = 'skills/{0}/{1}' -f $skillDir.Name, $normalizedFixturePath + [void](Get-FileText -RepoRoot $repoRoot -RelativePath $skillRelativeFixturePath -GitRef $Ref) + } + } } } } diff --git a/skills/markdown-illustrator/SKILL.md b/skills/markdown-illustrator/SKILL.md new file mode 100644 index 0000000..4057209 --- /dev/null +++ b/skills/markdown-illustrator/SKILL.md @@ -0,0 +1,145 @@ +--- +name: markdown-illustrator +description: > + Turn a markdown document into a visualization-first chat response consisting of one Visual Brief and one high-quality diffuser prompt generated with best-effort reasoning. Use when the user references a .md file and wants a hero image, cover image, visual digest, keynote opener, illustration, or diffuser prompt, especially for requests like "turn roadmap.md into a keynote opener image" or "create a visual digest for onboarding-notes.md". Default to zero follow-up questions, no file creation, and no style/theme/model menus; infer a compact visual strategy from the request and document, and only honor extra specificity when the user explicitly asks for a named model, aesthetic, or visual treatment such as whiteboard or blackboard. +--- + +# Markdown Illustrator + +![Markdown Illustrator](assets/hero.png) + +This skill reads a markdown file and answers directly in chat with one visualization-focused Visual Brief plus one final prompt compiled to be concrete, readable, and diffusion-ready. + +## Critical + +- Keep the workflow narrow by default: one document-wide visual interpretation, one shared summary, one final prompt, direct chat response. +- Use the Visual Brief as the anchor for the final prompt. +- Default to best-effort reasoning across diffusers. 
Only become narrower or model-specific when the user explicitly asks. +- Treat final prompt generation like prompt compilation: preserve intent, but strengthen clarity, structure, physical grounding, and renderability. +- Do not turn visual strategy into a questionnaire. Infer a small set of visual defaults and proceed unless the user explicitly asks to steer them. +- Default to de-duplicated composition. Show each major concept, phase, label, callout, panel, or document fragment once unless the user explicitly asks for repetition, comparison, or multi-panel variation. + +## Output Contract + +Return this structure directly in chat: + +```markdown +## Visual Brief + +**Subject:** [one-sentence description of what the document is truly about] +**Audience:** [who the document is for] +**Core narrative:** [the tension, movement, or transformation inside the document] +**Visual opportunity:** [the most compelling scene, metaphor, or spatial idea to visualize] +**Mood:** [the emotional tone the image should carry] +**Must-show elements:** [3-6 concrete visual anchors that belong in the image] +**Avoid:** [things that would misrepresent or weaken the image] + +## Final Prompt + +[one production-ready prompt paragraph] +``` + +The Visual Brief is the anchor. The final prompt is the best single visual interpretation of that shared meaning. + +## Workflow + +Follow the documented procedure in this file in order: `Reading the Markdown` -> `Writing the Visual Brief` -> `Inferring Visual Strategy` -> `Compiling the Final Prompt`. + +1. Read the referenced markdown file for meaning, not for headings. +2. Distill the whole document into a Visual Brief optimized for visualization. +3. Infer a compact visual strategy from the request and document without asking follow-up questions. +4. Generate one best-effort final prompt from that summary and strategy. +5. Return the result directly in chat. +6. 
If the user explicitly asked for constraints such as `Flux`, `photorealistic`, `editorial illustration`, `16:9`, `avoid people`, `whiteboard`, or `blackboard`, honor them inside the final prompt without turning the interaction into a selection flow. + +## Reading the Markdown + +Extract: + +- the real subject of the document +- the intended audience +- the central transformation, conflict, or promise +- recurring symbols, systems, environments, or objects +- the emotional energy the image should carry + +Do not mirror every section. Distill the document into one image-worthy idea. + +Ignore headings that are structural only, such as `Table of Contents`, `References`, or appendices, unless the user explicitly wants them reflected. + +## Writing the Visual Brief + +Write the summary for visualization, not for literary analysis. + +- **Subject** should identify what the image is fundamentally about. +- **Audience** should clarify how polished, technical, aspirational, or educational the image should feel. +- **Core narrative** should name the movement or tension, such as migration, orchestration, discovery, simplification, growth, launch, or coordination. +- **Visual opportunity** should identify the strongest scene or metaphor, not a style label. +- **Mood** should guide lighting, composition energy, and texture. +- **Must-show elements** should name only the elements that materially improve recognition and fidelity. +- **Avoid** should protect against misleading clichés, irrelevant detail, or flat literalism. + +## Inferring Visual Strategy + +Infer these dimensions silently. Do not present them as a menu. + +- **Intent**: choose `hero`, `digest`, `diagram`, or `cover`. + - Use `hero` for requests such as `hero image`, `breathtaking`, `cinematic`, `launch`, `raise interest`, or when no stronger signal exists. + - Use `digest` for requests such as `digest`, `overview`, `summary`, `onboarding`, or `explain`. 
+ - Use `diagram` for requests such as `diagram`, `systems`, `process`, `architecture`, `workflow`, or `pipeline`. + - Use `cover` for requests such as `cover image`, `keynote opener`, `poster`, or `editorial cover`. +- **Visual treatment**: preserve explicit user styles such as `whiteboard`, `blackboard`, `scientific`, `hand-drawn`, `isometric`, or `minimal line art`. If the user gives no style signal, default to `cinematic editorial`. +- **Abstraction level**: choose `literal`, `balanced`, or `concept-led`. + - Use `concept-led` for `wow`, `hero`, `cinematic`, `breathtaking`, `desirable`, or `raise interest`. + - Use `balanced` for `digest`, `overview`, `onboarding`, `educational`, or `systems`. + - Use `literal` only when the user explicitly asks for faithful, exact, or direct process depiction. + - Default to `concept-led` for `hero` and `cover`, and `balanced` for `digest` and `diagram`. +- **Label density**: choose `none`, `minimal`, `light`, or `academic`. + - Use `none` or `minimal` for `hero`, `cinematic`, `minimal text`, or `infographic-first`. + - Use `academic` for `scientific`, `textbook`, `educational diagram`, or explicit labeling requests. + - Otherwise default to `minimal`. +- **Aspect ratio**: prefer wide compositions. + - Honor an explicit user ratio such as `16:9`, `3:2`, `4:3`, or `1:1`. + - Otherwise default to `16:9` for hero, cover, cinematic, keynote, and most digest requests. + - Use `3:2` when the composition feels more editorial, poster-like, or object-centered and slightly less panoramic framing improves clarity. + - Avoid defaulting to `1:1` unless the user explicitly asks for it. + +Keep the output contract unchanged: always return both `Visual Brief` and `Final Prompt` unless the user explicitly asks for prompt-only output. + +## Compiling the Final Prompt + +Write one vivid prompt paragraph that another agent or human can paste into a diffuser. + +Default behavior: + +- Optimize for broad diffuser compatibility. 
+- Prefer natural language over vendor-specific syntax. +- Convert abstract ideas into concrete visual elements. Represent concepts as visible objects, scenes, marks, layers, flows, or diagram components. +- Be specific about the scene, relationships, and visual anchors when that specificity strengthens the image. +- Favor one memorable visual concept over a crowded checklist. +- Use best-effort reasoning to choose the strongest visual treatment for the document. + +Compiler rules: + +- Enforce strong composition. Prefer clear spatial structure such as `on the left`, `in the center`, and `on the right` when useful. +- Show transformation explicitly. If the document is about change, process, or reasoning, visualize it through arrows, flows, transitions, intermediate stages, or layered progression. +- Use the inferred strategy to bias the image strongly instead of asking for clarification. + - `hero` and `cover`: prioritize spectacle, memorability, atmosphere, premium composition, and one dominant visual idea over procedural completeness. + - `digest`: prioritize readability, overview, and a satisfying big-picture synthesis without turning the image into a wall of cards. + - `diagram`: prioritize system legibility, visual flow, and structural clarity without becoming text-heavy. + - `concept-led`: prefer one striking scene or metaphor that captures the document's meaning, even if not every process step is shown literally. + - `balanced` or `literal`: keep more visible process fidelity, but still avoid sterile box-and-arrow overload. +- Prefer one authoritative artifact over repeated fragments. If the image resembles a document, diagram, scientific plate, dashboard, or infographic, bias strongly toward one clean instance of each structural element rather than many similar panels or echoed blocks. +- Add cognitive structure when relevant, but use embedded text sparingly. 
Prefer arrows, icons, blocks, and symbolic grouping over text-heavy diagrams because image models often render words poorly. +- If labels or annotations are truly necessary, keep them very short, secondary, and non-critical to success. Prefer one-to-three-word phrases over readable sentences or dense copy. +- Ground the image in a physical medium when appropriate. If the chosen treatment benefits from a medium such as whiteboard, chalkboard, blueprint, or another concrete surface, describe material details like marker strokes, chalk dust, lighting, texture, and small imperfections. +- Steer the composition toward a wide frame by default. Prefer `16:9` unless `3:2` clearly better supports the scene; avoid implying `1:1` unless the user explicitly asked for square output. +- Control complexity aggressively. Keep one dominant idea, one obvious focal path, and one readable hierarchy. +- Add negative constraints directly into the prompt to prevent failure modes, using phrases like `no clutter`, `no chaos`, `no unnecessary elements`, `clean composition`, and `intentional layout` where appropriate. +- Add anti-repetition constraints by default, especially for structured visuals. Use phrases such as `no duplicated sections`, `no repeated bullets`, `no repeated steps`, `no echoed callouts`, `no mirrored panels`, `one authoritative page or diagram`, and `one instance per concept` unless the user explicitly asked for repetition. +- When text rendering is a risk, add a direct steer such as `no gibberish text`, `no fake words`, `no dense paragraphs`, or `minimal legible labels only if essential`. +- Describe what is seen, not what is implied. Replace conceptual wording with observable visual detail. +- Preserve the user's meaning. Improve execution, not intent. + +Output rule: + +- Output a single refined prompt only under `## Final Prompt`, with no extra explanation inside that section. 
diff --git a/skills/markdown-illustrator/assets/hero.png b/skills/markdown-illustrator/assets/hero.png new file mode 100644 index 0000000..b49a5e7 Binary files /dev/null and b/skills/markdown-illustrator/assets/hero.png differ diff --git a/skills/markdown-illustrator/evals/evals.json b/skills/markdown-illustrator/evals/evals.json new file mode 100644 index 0000000..9b716a9 --- /dev/null +++ b/skills/markdown-illustrator/evals/evals.json @@ -0,0 +1,68 @@ +{ + "skill_name": "markdown-illustrator", + "evals": [ + { + "id": 1, + "prompt": "I have this markdown file about microservices architecture. It's super text-heavy and I want one strong hero-image prompt that captures the whole document visually. Just answer in chat.", + "expected_output": "A direct chat response containing one Visual Brief and one compiled final prompt that captures the document as a whole without file creation or follow-up selection questions.", + "files": ["evals/files/microservices-architecture.md"], + "expectations": [ + "Responds directly in chat instead of creating a companion file", + "Contains a Visual Brief section with visualization-oriented fields", + "Contains a Final Prompt section", + "Contains exactly one final prompt", + "The final prompt reflects the same document-wide microservices architecture story as the Visual Brief", + "The final prompt uses concrete visible elements rather than abstract wording alone", + "The final prompt shows strong composition or reading direction when useful", + "The final prompt includes complexity-control language such as clean composition or no clutter when appropriate", + "The final prompt avoids relying on dense embedded text and steers away from gibberish text, fake words, or excessive readable copy unless minimal labels are essential", + "The final prompt avoids repeated labels, repeated steps, duplicated sections, echoed callouts, and mirrored layout fragments unless the user explicitly asked for repetition", + "Does not introduce theme, style, 
scope, or model-family selection menus", + "Infers a strong visual strategy instead of asking the user to choose one", + "When the user does not specify an aspect ratio, the prompt favors a wide composition such as 16:9 or 3:2 instead of defaulting to square", + "Uses compiled best-effort reasoning instead of producing multiple prompt variants" + ] + }, + { + "id": 2, + "prompt": "Generate one hero-image prompt for product-launch.md. Keep it cinematic where appropriate and optimize it for Flux, but answer directly in chat.", + "expected_output": "A direct chat response with one Visual Brief and one compiled final prompt that honors the user's explicit cinematic and Flux constraints without file creation or branching into menus.", + "files": ["evals/files/product-launch.md"], + "expectations": [ + "Responds directly in chat instead of creating a companion file", + "Contains a Visual Brief section with visualization-oriented fields", + "Contains exactly one final prompt", + "The final prompt honors the explicit cinematic and Flux-specific request", + "The final prompt uses precise visual language and concrete scene structure", + "The final prompt includes explicit transformation or convergence cues where relevant", + "The final prompt avoids relying on dense embedded text and steers away from gibberish text, fake words, or excessive readable copy unless minimal labels are essential", + "The final prompt avoids repeated labels, repeated steps, duplicated sections, echoed callouts, and mirrored layout fragments unless the user explicitly asked for repetition", + "The final prompt preserves the user's explicit cinematic and Flux constraints while remaining compatible with a wide default frame", + "Does not ask the user to choose anything before generating the result", + "Does not generate extra family, theme, or style variants", + "Keeps the final prompt anchored to the same summary rather than section-by-section prompting" + ] + }, + { + "id": 3, + "prompt": "I've got 
this transformers-explained.md doc for onboarding new ML engineers. Default behavior is fine. I want one strong visual direction, but keep it anchored in one shared summary and answer in chat.", + "expected_output": "A direct chat response with a visualization-focused Visual Brief and one compiled final prompt that turns the full-document meaning into a single useful visual direction, inferring the visual strategy instead of asking the user to configure it.", + "files": ["evals/files/transformers-explained.md"], + "expectations": [ + "Responds directly in chat instead of creating a companion file", + "Contains a Visual Brief section with visualization-oriented fields", + "Contains exactly one final prompt", + "The final prompt captures the transformer architecture and onboarding context at the whole-document level", + "The final prompt translates conceptual ML ideas into observable visual structures", + "The final prompt preserves readability through controlled complexity and obvious hierarchy", + "The response infers a digest- or diagram-leaning strategy from the onboarding context instead of defaulting blindly to a spectacle-first hero treatment", + "The final prompt avoids relying on dense embedded text and steers away from gibberish text, fake words, or excessive readable copy unless minimal labels are essential", + "The final prompt avoids repeated labels, repeated steps, duplicated sections, echoed callouts, and mirrored layout fragments unless the user explicitly asked for repetition", + "When the user does not specify an aspect ratio, the prompt still favors a natural wide format such as 16:9 or 3:2 instead of square", + "Avoids section-by-section prompts, style catalogs, and theme variants", + "Uses the Visual Brief as the shared anchor for the final prompt", + "Chooses one best-effort direction rather than offering multiple options" + ] + } + ] +} diff --git a/skills/markdown-illustrator/evals/files/microservices-architecture.md 
b/skills/markdown-illustrator/evals/files/microservices-architecture.md new file mode 100644 index 0000000..bc64346 --- /dev/null +++ b/skills/markdown-illustrator/evals/files/microservices-architecture.md @@ -0,0 +1,48 @@ +# Microservices at Scale: A Cost-Aware Architecture + +## The Monolith-to-Microservices Spectrum + +Not every system needs microservices. A monolith serves well when the team is small, the domain is simple, and deployment cadence is weekly. Microservices earn their complexity when teams grow past 20 engineers, domains diverge, and independent deployment becomes a bottleneck. + +The spectrum runs from pure monolith through modular monolith, to service-oriented architecture, to fine-grained microservices. Most organizations land somewhere in the middle — and that is fine. + +## Service Mesh and Observability + +A service mesh (Istio, Linkerd) handles cross-cutting concerns: mTLS, retries, circuit breaking, and traffic shaping. Without it, every team reinvents the wheel — and gets it wrong in different ways. + +Observability stacks typically combine: +- **Metrics**: Prometheus + Grafana for throughput, latency percentiles (p50, p95, p99), error rates +- **Traces**: Jaeger or Zipkin for request-level flow across services +- **Logs**: Structured JSON logs shipped to Elasticsearch or Loki + +The cost of observability infrastructure is non-trivial — often 10-15% of total cloud spend. Budget for it explicitly. + +## GPU Inference as a Service + +Running ML models in production means managing GPU fleets. Key economics: +- On-demand A100: ~$3.50/GPU/hour +- Reserved instances: ~$1.80/GPU/hour (1-year commitment) +- Spot/preemptible: ~$1.05/GPU/hour (can be reclaimed) + +Utilization is the critical metric. An idle GPU at $3.50/hr is pure waste. Autoscaling based on queue depth (not CPU) keeps utilization above 70%. + +Batch inference (offline) should always use spot instances. 
Real-time inference needs a mix: reserved for baseline load, on-demand for spikes. + +## Data Pipeline Architecture + +Event-driven pipelines (Kafka, Pulsar) decouple producers from consumers. The pattern: +1. Service emits event to topic +2. Stream processor transforms/enriches +3. Sink writes to data warehouse (BigQuery, Snowflake) +4. Analytics layer queries aggregated data + +Backpressure handling matters — a slow consumer should not crash the pipeline. Use consumer group lag monitoring and automatic partition rebalancing. + +## Cost Optimization Strategies + +Cloud costs grow faster than revenue if left unchecked. Three levers: +- **Right-sizing**: Most VMs are over-provisioned by 40%. Use utilization data to downsize. +- **Spot/preemptible workloads**: Stateless batch jobs, CI/CD runners, and dev environments can tolerate interruption. +- **Reserved capacity**: Commit to 1-year or 3-year reservations for stable baseline workloads. Savings: 30-60%. + +Tag everything. Untagged resources are invisible to cost allocation — and invisible costs are uncontrolled costs. diff --git a/skills/markdown-illustrator/evals/files/product-launch.md b/skills/markdown-illustrator/evals/files/product-launch.md new file mode 100644 index 0000000..e41ac42 --- /dev/null +++ b/skills/markdown-illustrator/evals/files/product-launch.md @@ -0,0 +1,43 @@ +# Product Launch Playbook + +## Vision and Strategy + +We are building a developer-first API platform that makes it trivial to add authentication to any application. The market is crowded (Auth0, Clerk, Firebase Auth) but fragmented — no single solution handles mobile, web, and server-to-server auth well across all frameworks. + +Our bet: a unified SDK that works identically in React, Swift, Kotlin, and server-side Node/Python/Go. One API surface, one dashboard, one billing model. + +## Target Personas + +### Indie Developer (Solo) +Ships side projects on weekends. Needs auth that "just works" in 5 minutes. 
Price-sensitive — free tier is critical. Values clear docs over feature depth. + +### Startup CTO (Team of 5-15) +Scaling fast. Needs SSO, RBAC, and audit logs yesterday. Will pay $500-2000/month for reliability and compliance (SOC2, HIPAA). Cares about uptime SLA. + +### Enterprise Architect +Evaluates over 6-month cycles. Needs SAML, SCIM, custom domains, data residency. Budget is not the constraint — security review and vendor risk assessment are. + +## Go-to-Market Phases + +**Phase 1 (Months 1-3)**: Developer preview. Ship core auth (email/password, OAuth, magic link). Free tier only. Goal: 500 developers, 10 production apps. + +**Phase 2 (Months 4-6)**: Paid launch. Add SSO, RBAC, webhooks. Launch Pro tier at $29/month. Goal: 50 paying customers. + +**Phase 3 (Months 7-12)**: Enterprise. SAML, SCIM, SLA, dedicated support. Custom pricing. Goal: 5 enterprise contracts. + +## What This Is NOT + +- Not a full identity platform (no KYC, no identity verification) +- Not a user management system (no CRM, no marketing automation) +- Not a security product (no WAF, no DDoS protection) + +We do one thing — authentication — and we do it exceptionally well. 
+ +## Risks and Mitigations + +| Risk | Impact | Mitigation | +|------|--------|------------| +| Auth0 drops pricing | High | Differentiate on DX, not price | +| Security breach | Critical | Bug bounty program, pen testing, SOC2 from day 1 | +| Slow enterprise sales | Medium | Focus on PLG (product-led growth) for steady revenue | +| SDK maintenance burden | Medium | Code generation from OpenAPI spec | diff --git a/skills/markdown-illustrator/evals/files/transformers-explained.md b/skills/markdown-illustrator/evals/files/transformers-explained.md new file mode 100644 index 0000000..64e2204 --- /dev/null +++ b/skills/markdown-illustrator/evals/files/transformers-explained.md @@ -0,0 +1,32 @@ +# Understanding Transformer Architectures + +## Attention Is All You Need + +The transformer architecture replaced recurrent networks by computing attention over all positions in parallel. The core insight: instead of processing tokens sequentially (which creates a bottleneck), let every token attend to every other token simultaneously. + +Self-attention computes three projections for each token: Query (Q), Key (K), and Value (V). The attention score between two tokens is the dot product of Q and K, scaled by the square root of the dimension, then passed through softmax to get weights. These weights are applied to V to produce the output. + +Multi-head attention runs this process multiple times in parallel with different learned projections, then concatenates the results. Each "head" can learn to attend to different types of relationships — syntactic, semantic, positional. + +## Scaling Laws and Compute Budget + +Kaplan et al. (2020) showed that model performance follows power laws across three axes: parameters (N), dataset size (D), and compute budget (C). The Chinchilla paper refined this: for a fixed compute budget, the optimal model is smaller and trained on more data than previously thought. 
+
+Key numbers:
+- GPT-3: 175B params, ~300B tokens, ~3.6E23 FLOPs
+- Chinchilla: 70B params, 1.4T tokens, ~5.8E23 FLOPs (similar compute, better performance)
+- LLaMA 2 70B: 2T tokens training data
+
+The takeaway: throwing more parameters at a problem without proportionally scaling data is wasteful. Compute-optimal training balances both.
+
+## Inference Optimization
+
+Training a model is expensive but one-time. Inference runs forever and dominates total cost. Key techniques:
+
+**Quantization**: Reduce precision from FP32 → FP16 → INT8 → INT4. Each step roughly halves memory and doubles throughput, with small accuracy loss. GPTQ and AWQ are popular 4-bit methods.
+
+**KV Cache**: During autoregressive generation, cache the Key and Value tensors from previous tokens. Without caching, every new token recomputes attention over the full context; with the cache, the per-token attention cost drops from O(n²) to O(n).
+
+**Speculative Decoding**: Use a small "draft" model to propose several tokens, then verify them in parallel with the large model. If the draft model is good enough, this provides 2-3x speedup with no quality loss.
+
+**Batching**: Group multiple requests and process them together. Continuous batching (rather than static batching) maximizes GPU utilization by inserting new requests as old ones complete.
diff --git a/skills/skill-creator-agnostic/SKILL.md b/skills/skill-creator-agnostic/SKILL.md
index 0e645e7..fe494f8 100644
--- a/skills/skill-creator-agnostic/SKILL.md
+++ b/skills/skill-creator-agnostic/SKILL.md
@@ -20,6 +20,7 @@ On Windows or when running from PowerShell, also read `references/windows-powers
 - Keep all eval workspaces under a temp root such as `$env:TEMP/<skill-name>-workspace/`, never inside the source repo.
 - For repo-managed skills, keep `skills/<skill-name>/`, `~/.claude/skills/<skill-name>/`, and `~/.agents/skills/<skill-name>/` in sync before calling the work done.
 - Every repo-managed skill must keep a per-skill `evals/evals.json`.
+- If an eval entry declares `files`, treat those paths as skill-relative fixtures and stage them into the temp workspace for both benchmark configurations. - Benchmark directories must follow `iteration-N/eval-name/{config}/run-N/` exactly; do not flatten files directly under `with_skill/` or `without_skill/`. - `grading.json` must include both `expectations` and a populated `summary` object with `passed`, `failed`, `total`, and `pass_rate`. - Generate `benchmark.json` through `skill-creator/scripts/aggregate_benchmark.py`; never hand-author it. @@ -67,6 +68,7 @@ Read or create the per-skill `evals/evals.json`, then ensure each eval has a cor iteration-N/ eval-1-name/ eval_metadata.json + fixtures/ with_skill/ run-1/ grading.json @@ -80,6 +82,7 @@ iteration-N/ ``` Keep `eval_metadata.json` at the eval-directory level. Put run artifacts under `run-N/` so `aggregate_benchmark.py` can discover them. +If `evals/evals.json` declares `files`, copy those skill-relative fixtures into `fixtures/` at the eval-directory level and make them available to both runs. ### Step 5: Run paired benchmarks @@ -93,6 +96,7 @@ For `MEASURED` runs: - save the real outputs - save transcripts or command logs when available - keep timings and token counts tied to the actual run +- use the same staged fixture files for both `with_skill` and `without_skill` runs when the eval declares `files` For `SIMULATED` runs: diff --git a/skills/skill-creator-agnostic/evals/evals.json b/skills/skill-creator-agnostic/evals/evals.json index 6138005..fa52cf8 100644 --- a/skills/skill-creator-agnostic/evals/evals.json +++ b/skills/skill-creator-agnostic/evals/evals.json @@ -57,6 +57,17 @@ "Keeps repo-managed sync and README update expectations in scope for first-party skills", "Frames the new skill as cross-runner guidance rather than a single-runner fork" ] + }, + { + "id": 6, + "prompt": "This skill's evals.json includes files like evals/files/input.md. 
How should the benchmark runner handle them so with_skill and without_skill stay comparable?", + "expected_output": "The response treats files as skill-relative fixtures, stages them into the temp workspace, and uses the same inputs for both configurations.", + "expectations": [ + "Treats eval files as skill-relative fixture paths rather than workspace-relative paths", + "Stages declared fixture files into the temp eval workspace before running either configuration", + "Uses the same staged fixtures for with_skill and without_skill runs", + "Keeps the guidance grounded in the existing iteration-N/eval-name/{config}/run-N/ benchmark contract" + ] } ] } diff --git a/skills/skill-creator-agnostic/references/benchmark-contract.md b/skills/skill-creator-agnostic/references/benchmark-contract.md index 9d05c05..da9dea6 100644 --- a/skills/skill-creator-agnostic/references/benchmark-contract.md +++ b/skills/skill-creator-agnostic/references/benchmark-contract.md @@ -8,6 +8,7 @@ Use this reference whenever a skill benchmark must be reproducible across differ iteration-N/ eval-1-name/ eval_metadata.json + fixtures/ with_skill/ run-1/ grading.json @@ -24,8 +25,29 @@ Key rules: - `aggregate_benchmark.py` walks `run-*` directories. If files live directly under `with_skill/` or `without_skill/`, the benchmark will discover zero runs. - `eval_metadata.json` belongs at the `eval-*` directory level, not inside each run directory. +- `fixtures/` is optional and should contain copied input files referenced by `evals/evals.json` `files[]` entries when the eval depends on attached source material. - `outputs/` may contain files, diffs, transcripts, or other evidence the reviewer should inspect. 
+
+## Optional Eval Fixture Files
+
+Repo-managed skills may declare optional attached input files in `evals/evals.json`:
+
+```json
+{
+  "id": 1,
+  "prompt": "Use the attached markdown file to generate a Visual Brief and final prompt.",
+  "expected_output": "A direct chat response grounded in the attached source material.",
+  "files": ["evals/files/example.md"]
+}
+```
+
+Rules:
+
+- `files` paths are relative to `skills/<skill-name>/`, not to the temp workspace.
+- Keep fixture inputs under the skill folder, usually `evals/files/`.
+- Copy declared fixtures into the eval-level `fixtures/` directory before running either configuration.
+- Use the same staged fixtures for both `with_skill` and `without_skill` runs so the comparison stays fair.
+
 ## Required Files
 
 ### eval_metadata.json
diff --git a/skills/trunk-first-repo/SKILL.md b/skills/trunk-first-repo/SKILL.md
index e6601dd..f0b0289 100644
--- a/skills/trunk-first-repo/SKILL.md
+++ b/skills/trunk-first-repo/SKILL.md
@@ -6,6 +6,8 @@ description: >
 
 # Trunk-First Repo
 
+![Trunk-First Repo](assets/hero.jpg)
+
 Initialize a folder as a git repository following [scaled trunk-based development](https://trunkbaseddevelopment.com/#scaled-trunk-based-development). The core principle: **main is sacred** — it starts empty and content only enters through peer-reviewed pull requests from short-lived feature branches.
 
 This matters because it prevents accidental pushes to main, establishes a clean PR-based workflow from day one, and makes the git history meaningful by design rather than as an afterthought.
@@ -48,9 +50,9 @@
 git remote add origin {REMOTE_URL}
 git push -u origin main
 ```
 
-If skipped, remind the user they can add it later:
-
-> When you're ready, run: `git remote add origin <REMOTE_URL>` followed by `git push -u origin main`
+If skipped, remind the user they can add it later:
+
+> When you're ready, run: `git remote add origin <REMOTE_URL>` followed by `git push -u origin main`
 
 ### Step 4: Summary
diff --git a/skills/trunk-first-repo/assets/hero.jpg b/skills/trunk-first-repo/assets/hero.jpg
new file mode 100644
index 0000000..4a2da93
Binary files /dev/null and b/skills/trunk-first-repo/assets/hero.jpg differ