A model-aware Agent Skill that silently refines your prompt for the model currently answering.
You just ask. The active model reshapes the request for itself, preserves your language, and answers without showing the rewrite.
Project Introduction | Quick Start | Feature Demonstration | Strategies | Evaluation | Platforms | Examples
Prompt Refine is a lightweight, cross-platform Agent Skill. After activation, it detects which model is currently running the conversation and applies that model family's prompting strategy before answering.
The core design is simple but important: route by host model, not by task. If Claude is answering, Prompt Refine uses the Anthropic strategy for the whole conversation. If GPT is answering, it uses the OpenAI strategy. A coding task never switches Claude into GPT-style prompting, and a writing task never switches GPT into Claude-style XML.
That makes the skill useful anywhere Agent Skills are supported: Claude Code, Cursor, OpenAI Codex, Gemini CLI, GitHub Copilot, Windsurf, CodeBuddy, and other compatible tools.
It is context-aware: follow-up requests can inherit the relevant goal, constraints, terminology, and preferences from the conversation, while the newest user instruction still wins.
It is intentionally lightweight: no runtime dependencies, no app server, no extra optimizer call, and only a short skill file plus one selected strategy file in context. The goal is better structure without spending a pile of extra tokens.
The same user request gets a different internal shape depending on the host model. These examples show the hidden rewrite style; in normal mode the user only sees the final answer.
User request:
Help me analyze this market.
Anthropic Claude shape:
<role>You are a senior market analyst specializing in competitive intelligence.</role>
<context>
The user has not named the market, geography, customer segment, or timeframe.
Preserve uncertainty; make practical assumptions explicit instead of inventing facts.
</context>
<task>
Analyze the competitive landscape for the most likely intended market.
</task>
<constraints>
- Start by naming assumptions about market, audience, geography, and timeframe.
- Separate confident analysis from unknowns.
- Do not claim current market data unless it was provided or can be verified.
- Ask only the one or two follow-up questions that would most improve the analysis.
</constraints>
<format>
Use these sections: Assumptions, Competitive Map, Barriers And Switching Costs,
Strategic Implications, Unknowns, Next Questions.
</format>
<success_criteria>
The answer should be useful before the user clarifies the market, while making clear
which parts depend on assumptions.
</success_criteria>OpenAI GPT shape:
Goal: Turn an underspecified market-analysis request into a useful first-pass competitive landscape.
User request:
"""Help me analyze this market."""
Relevant context:
- Market, geography, audience, and timeframe are missing.
- Preserve uncertainty and make assumptions explicit.
Instructions:
1. State the assumed market scope first.
2. Identify likely player categories and competitive dynamics.
3. Compare barriers, switching costs, and strategic implications.
4. Flag unknowns instead of inventing facts.
Hard constraints:
- Do not claim current market data unless it was provided or can be verified.
- Ask only 1-2 follow-up questions.
Output format: Markdown headings for Assumptions, Competitive Map, Barriers,
Strategic Implications, Unknowns, and Next Questions.
User request:
Write a 5-item npm release checklist. Keep each item under 8 words.
Anthropic Claude shape:
<context>
The user gave a tightly constrained formatting request. Do not expand the task.
</context>
<task>Write exactly five npm release checklist items.</task>
<constraints>
- Each item must be under 8 words.
- Cover package.json, README, LICENSE, version, and dry-run publishing.
- Return checklist items only; no intro or explanation.
</constraints>
<format>Use a numbered list with one short imperative phrase per item.</format>
<success_criteria>
Exactly 5 items, each under 8 words, with all requested topics covered.
</success_criteria>OpenAI GPT shape:
Task: Write exactly five npm release checklist items.
Context: The user already provided clear hard constraints, so preserve them and do not add scope.
Hard constraints:
- Under 8 words per item.
- Cover package.json, README, LICENSE, version, and dry-run publishing.
- Return only the checklist.
Output contract:
- Numbered list.
- Exactly 5 lines.
- No intro or outro.
Quality check before answering: each item is under 8 words and covers one requested release topic.
Only the final answer. The rewrite stays silent unless /refine verbose is enabled. For clear prompts, Prompt Refine should stay minimal and protect the user's exact constraints.
The strategy always follows the host model, not the topic: Claude gets Claude-shaped structure, GPT gets GPT-shaped structure.
Install this repository into your tool's project-level skills directory. For Claude Code:
git clone https://github.com/Li-Bailiang/prompt-refine-skill.git .claude/skills/prompt-refineTo avoid copying the .git folder, use a release archive or:
npx degit Li-Bailiang/prompt-refine-skill .claude/skills/prompt-refineThe skill is also published on npm as
prompt-refine-skill (versioned
releases). npm does not auto-register an Agent Skill; use it as a versioned source and
unpack the package into your tool's skills directory:
mkdir -p .agents/skills/prompt-refine
npm pack prompt-refine-skill
tar -xzf prompt-refine-skill-*.tgz --strip-components=1 -C .agents/skills/prompt-refineThe git clone and degit commands above place the files directly in your tool's skills
directory.
Activate it in a conversation:
/prompt-refine
Available in-session controls:
/refine verbose # Show a compact original -> refined diff before each answer
/refine off # Stop refining for the rest of the conversation
/prompt-refine # Re-activate after context compaction or a new session
| Tool | Project-level skill path |
|---|---|
| Claude Code | .claude/skills/prompt-refine |
| Cursor | .cursor/skills/prompt-refine or .agents/skills/prompt-refine |
| OpenAI Codex | .agents/skills/prompt-refine |
| Gemini CLI | .gemini/skills/prompt-refine or .agents/skills/prompt-refine |
| GitHub Copilot (VS Code) | .github/skills/prompt-refine or .agents/skills/prompt-refine |
| Windsurf | .windsurf/skills/prompt-refine |
| CodeBuddy | .codebuddy/skills/prompt-refine |
Most tools also accept the shared .agents/skills/ convention. User-level paths differ by platform, so use each tool's official docs when installing globally.
| Host model | Strategy file | Source family |
|---|---|---|
| OpenAI GPT (GPT-5 family) | strategies/openai.md |
OpenAI prompting guidance |
| Anthropic Claude | strategies/anthropic.md |
Anthropic prompt engineering |
| Google Gemini | strategies/google-gemini.md |
Gemini prompt design |
| Meta Llama | strategies/meta-llama.md |
Llama prompting guidance |
| DeepSeek V4 (+ R1) | strategies/deepseek.md |
DeepSeek prompt library |
| Mistral / Codestral | strategies/mistral.md |
Mistral best practices |
| Qwen | strategies/qwen.md |
Alibaba Model Studio guidance |
| xAI Grok | strategies/xai-grok.md |
xAI Grok prompting references |
| Perplexity Sonar | strategies/perplexity.md |
Perplexity prompt guide |
| Kimi / Moonshot AI | strategies/kimi.md |
Kimi prompt best practices |
| Cohere Command | strategies/cohere.md |
Cohere docs |
| Amazon Nova | strategies/amazon-nova.md |
Nova prompt guide |
| Microsoft Phi | strategies/microsoft-phi.md |
Phi Cookbook |
| Unknown host | strategies/universal.md |
Conservative fallback |
Prompt Refine was evaluated in a blind, position-swapped A/B test on 120 vague prompts (60 English, 60 Chinese, 32 domains). The same generator model answered each prompt twice — once raw, once with Prompt Refine active — and an independent judge scored the two answers without knowing which was which. Each pair was judged twice with the answers swapped to cancel order bias.
| Result | |
|---|---|
| Refine vs raw win-rate | 74.0% (167 wins / 52 losses / 21 ties of 240 judgments) |
| 95% bootstrap CI (per prompt, n = 120) | [66.9%, 80.6%] |
| Sign test | p < 0.0001 |
| English / Chinese split | 75.0% / 72.9% |
| Length-matched win-rate | 64.7% (refine answer within ±25% of raw length) |
The length-matched figure is reported alongside the headline to rule out a length preference in the judge. On length-matched pairs the current release wins 64.7%, versus 50.5% for the previous version of the skill — evidence of a genuine quality gain, not just longer answers.
| Dimension | Δ |
|---|---|
| actionability | +0.96 |
| completeness | +0.81 |
| structure | +0.49 |
| clarification | +0.35 |
| language fidelity | +0.03 |
| Check | Result |
|---|---|
scaffold leakage (<role> / <task> / rewritten prompt in output) |
0 / 120 |
| prose-language switches on Chinese prompts (code stripped) | 0 / 60 |
| parse fallbacks · skipped prompts | 0 · 0 |
Prompt Refine also has a small non-regression suite for clear or constraint-heavy prompts: JSON/config output, word limits, language fidelity, and direct-answer tasks. On the current 6-prompt guard suite, refine wins 66.7% of 12 position-swapped judgments (8 wins / 4 losses / 0 ties). Treat this as an early guardrail, not a broad proof.
Models: generator claude-sonnet-4-6, judge claude-opus-4-8. The host-model strategy under test is Anthropic (strategies/anthropic.md); other strategy files ship with the same design but have not yet been evaluated at this scale.
The evaluation harness, prompts, rubrics, anonymized answer pairs, judge JSON, run
commands, and checked-in result summaries are available in the GitHub repository under
eval/. The eval files are kept out of the npm package so normal skill
installation stays lightweight.
Prompt Refine is deliberately simple, and it is honest about what it is not:
- Best-effort, not deterministic. It refines while the activation stays in the model's
context. On a long, compacted conversation it can lapse until you re-run
/prompt-refine. - Depends on the host model following meta-instructions. Models that do not reliably follow "silently restructure, then answer" will benefit less.
- Only the Anthropic strategy is evaluated at scale. The other strategy files ship with the same design but have not been benchmarked equivalently (see Evaluation).
- Strategies track fast-moving vendor docs. They summarize official guidance and need periodic updates as that guidance changes.
- Little benefit on already-clear prompts. By design the intervention can be none — it is most useful on vague or underspecified requests.
| Prompt Refine | Standalone prompt optimizers | |
|---|---|---|
| Form | Agent Skill | Web or desktop app |
| Model fit | Uses the currently running model's strategy | Generic or manually selected |
| Output | Silent final answer | Shows optimized prompt |
| Activation | Conversation-scoped and toggleable | Usually one-off |
| Language | Preserves original language and intent | Depends on implementation |
| Token cost | Low: short skill + one strategy | Often another full prompt pass |
| Dependencies | None | Often app-specific |
Prompt Refine follows the SKILL.md Agent Skill convention and is designed for tools that can load project-level skills, including Claude Code, Cursor, OpenAI Codex, Gemini CLI, GitHub Copilot, Windsurf, CodeBuddy, and compatible agents.
MIT License. Free to use, modify, and distribute.
Issues and pull requests are welcome. For new or improved model strategies, read CONTRIBUTING.md first.
If Prompt Refine saves you time, please consider giving the repo a ⭐ — it genuinely helps other people discover the project.