Prompt Refine

A model-aware Agent Skill that silently refines your prompt for the model currently answering.

You just ask. The active model reshapes the request for itself, preserves your language, and answers without showing the rewrite.


Project Introduction

Prompt Refine is a lightweight, cross-platform Agent Skill. After activation, it detects which model is currently running the conversation and applies that model family's prompting strategy before answering.

The core design is simple but important: route by host model, not by task. If Claude is answering, Prompt Refine uses the Anthropic strategy for the whole conversation. If GPT is answering, it uses the OpenAI strategy. A coding task never switches Claude into GPT-style prompting, and a writing task never switches GPT into Claude-style XML.

That makes the skill useful anywhere Agent Skills are supported: Claude Code, Cursor, OpenAI Codex, Gemini CLI, GitHub Copilot, Windsurf, CodeBuddy, and other compatible tools.

It is context-aware: follow-up requests can inherit the relevant goal, constraints, terminology, and preferences from the conversation, while the newest user instruction still wins.

It is intentionally lightweight: no runtime dependencies, no app server, no extra optimizer call, and only a short skill file plus one selected strategy file in context. The goal is better structure without spending a pile of extra tokens.

Feature Demonstration

The same user request gets a different internal shape depending on the host model. These examples show the hidden rewrite style; in normal mode the user only sees the final answer.

1. Vague Request: Add The Missing Shape

User request:

Help me analyze this market.

Anthropic Claude shape:

<role>You are a senior market analyst specializing in competitive intelligence.</role>
<context>
The user has not named the market, geography, customer segment, or timeframe.
Preserve uncertainty; make practical assumptions explicit instead of inventing facts.
</context>
<task>
Analyze the competitive landscape for the most likely intended market.
</task>
<constraints>
- Start by naming assumptions about market, audience, geography, and timeframe.
- Separate confident analysis from unknowns.
- Do not claim current market data unless it was provided or can be verified.
- Ask only the one or two follow-up questions that would most improve the analysis.
</constraints>
<format>
Use these sections: Assumptions, Competitive Map, Barriers And Switching Costs,
Strategic Implications, Unknowns, Next Questions.
</format>
<success_criteria>
The answer should be useful before the user clarifies the market, while making clear
which parts depend on assumptions.
</success_criteria>

OpenAI GPT shape:

Goal: Turn an underspecified market-analysis request into a useful first-pass competitive landscape.

User request:
"""Help me analyze this market."""

Relevant context:
- Market, geography, audience, and timeframe are missing.
- Preserve uncertainty and make assumptions explicit.

Instructions:
1. State the assumed market scope first.
2. Identify likely player categories and competitive dynamics.
3. Compare barriers, switching costs, and strategic implications.
4. Flag unknowns instead of inventing facts.

Hard constraints:
- Do not claim current market data unless it was provided or can be verified.
- Ask only 1-2 follow-up questions.

Output format: Markdown headings for Assumptions, Competitive Map, Barriers,
Strategic Implications, Unknowns, and Next Questions.

2. Clear Request: Preserve The Constraints

User request:

Write a 5-item npm release checklist. Keep each item under 8 words.

Anthropic Claude shape:

<context>
The user gave a tightly constrained formatting request. Do not expand the task.
</context>
<task>Write exactly five npm release checklist items.</task>
<constraints>
- Each item must be under 8 words.
- Cover package.json, README, LICENSE, version, and dry-run publishing.
- Return checklist items only; no intro or explanation.
</constraints>
<format>Use a numbered list with one short imperative phrase per item.</format>
<success_criteria>
Exactly 5 items, each under 8 words, with all requested topics covered.
</success_criteria>

OpenAI GPT shape:

Task: Write exactly five npm release checklist items.

Context: The user already provided clear hard constraints, so preserve them and do not add scope.

Hard constraints:
- Under 8 words per item.
- Cover package.json, README, LICENSE, version, and dry-run publishing.
- Return only the checklist.

Output contract:
- Numbered list.
- Exactly 5 lines.
- No intro or outro.

Quality check before answering: each item is under 8 words and covers one requested release topic.

What The User Sees

Only the final answer. The rewrite stays silent unless /refine verbose is enabled. For clear prompts, Prompt Refine should stay minimal and protect the user's exact constraints.

The strategy always follows the host model, not the topic: Claude gets Claude-shaped structure, GPT gets GPT-shaped structure.
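To make the contrast concrete, here is a minimal Python sketch that renders the same refined fields in either shape. Prompt Refine itself is a set of instruction files, not code, so everything here is hypothetical illustration: the function names `render_claude_shape` and `render_gpt_shape`, the `FIELDS` dictionary, and the `GPT_LABELS` mapping are all assumptions, with section names borrowed loosely from the examples above.

```python
from typing import Dict

# Hypothetical refined fields; keys mirror the tags used in the Claude examples.
FIELDS: Dict[str, str] = {
    "task": "Write exactly five npm release checklist items.",
    "constraints": "- Each item must be under 8 words.\n- Return checklist items only.",
    "format": "Use a numbered list.",
}

# Assumed label mapping, taken loosely from the GPT examples.
GPT_LABELS = {"task": "Task", "constraints": "Hard constraints", "format": "Output format"}


def render_claude_shape(fields: Dict[str, str]) -> str:
    """Wrap each field in an XML-style tag, as in the Claude examples."""
    parts = []
    for name, body in fields.items():
        # Multi-line bodies go on their own lines; short ones stay inline.
        if "\n" in body:
            parts.append(f"<{name}>\n{body}\n</{name}>")
        else:
            parts.append(f"<{name}>{body}</{name}>")
    return "\n".join(parts)


def render_gpt_shape(fields: Dict[str, str]) -> str:
    """Render the same fields as labeled plain-text sections."""
    parts = []
    for name, body in fields.items():
        label = GPT_LABELS.get(name, name.capitalize())
        separator = "\n" if "\n" in body else " "
        parts.append(f"{label}:{separator}{body}")
    return "\n\n".join(parts)


print(render_claude_shape(FIELDS))
print()
print(render_gpt_shape(FIELDS))
```

The content of the request is identical in both renderings; only the packaging changes with the host model.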

Quick Start

Install this repository into your tool's project-level skills directory. For Claude Code:

git clone https://github.com/Li-Bailiang/prompt-refine-skill.git .claude/skills/prompt-refine

To avoid copying the .git folder, use a release archive or:

npx degit Li-Bailiang/prompt-refine-skill .claude/skills/prompt-refine

The skill is also published on npm as prompt-refine-skill (versioned releases). npm does not auto-register an Agent Skill; use it as a versioned source and unpack the package into your tool's skills directory:

mkdir -p .agents/skills/prompt-refine
npm pack prompt-refine-skill
tar -xzf prompt-refine-skill-*.tgz --strip-components=1 -C .agents/skills/prompt-refine

By contrast, the git clone and degit commands above place the files directly in your tool's skills directory, so no unpacking step is needed.

Activate it in a conversation:

/prompt-refine

Available in-session controls:

/refine verbose    # Show a compact original -> refined diff before each answer
/refine off        # Stop refining for the rest of the conversation
/prompt-refine     # Re-activate after context compaction or a new session

Install Paths

| Tool | Project-level skill path |
|---|---|
| Claude Code | .claude/skills/prompt-refine |
| Cursor | .cursor/skills/prompt-refine or .agents/skills/prompt-refine |
| OpenAI Codex | .agents/skills/prompt-refine |
| Gemini CLI | .gemini/skills/prompt-refine or .agents/skills/prompt-refine |
| GitHub Copilot (VS Code) | .github/skills/prompt-refine or .agents/skills/prompt-refine |
| Windsurf | .windsurf/skills/prompt-refine |
| CodeBuddy | .codebuddy/skills/prompt-refine |

Most tools also accept the shared .agents/skills/ convention. User-level paths differ by platform, so use each tool's official docs when installing globally.

Built-in Strategies

| Host model | Strategy file | Source family |
|---|---|---|
| OpenAI GPT (GPT-5 family) | strategies/openai.md | OpenAI prompting guidance |
| Anthropic Claude | strategies/anthropic.md | Anthropic prompt engineering |
| Google Gemini | strategies/google-gemini.md | Gemini prompt design |
| Meta Llama | strategies/meta-llama.md | Llama prompting guidance |
| DeepSeek V4 (+ R1) | strategies/deepseek.md | DeepSeek prompt library |
| Mistral / Codestral | strategies/mistral.md | Mistral best practices |
| Qwen | strategies/qwen.md | Alibaba Model Studio guidance |
| xAI Grok | strategies/xai-grok.md | xAI Grok prompting references |
| Perplexity Sonar | strategies/perplexity.md | Perplexity prompt guide |
| Kimi / Moonshot AI | strategies/kimi.md | Kimi prompt best practices |
| Cohere Command | strategies/cohere.md | Cohere docs |
| Amazon Nova | strategies/amazon-nova.md | Nova prompt guide |
| Microsoft Phi | strategies/microsoft-phi.md | Phi Cookbook |
| Unknown host | strategies/universal.md | Conservative fallback |
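The routing rule can be pictured as a plain lookup from host model to strategy file. The sketch below is a conceptual model only: in practice the skill instructs the model to identify itself, and the keyword matching, the `STRATEGY_BY_KEYWORD` list, and the `select_strategy` function are hypothetical names introduced here. Only the file paths come from the table above.

```python
# Strategy file paths come from the table; the keyword matching is an assumption.
STRATEGY_BY_KEYWORD = [
    ("gpt", "strategies/openai.md"),
    ("claude", "strategies/anthropic.md"),
    ("gemini", "strategies/google-gemini.md"),
    ("llama", "strategies/meta-llama.md"),
    ("deepseek", "strategies/deepseek.md"),
    ("mistral", "strategies/mistral.md"),
    ("codestral", "strategies/mistral.md"),
    ("qwen", "strategies/qwen.md"),
    ("grok", "strategies/xai-grok.md"),
    ("sonar", "strategies/perplexity.md"),
    ("kimi", "strategies/kimi.md"),
    ("moonshot", "strategies/kimi.md"),
    ("command", "strategies/cohere.md"),
    ("nova", "strategies/amazon-nova.md"),
    ("phi", "strategies/microsoft-phi.md"),
]
FALLBACK_STRATEGY = "strategies/universal.md"


def select_strategy(host_model: str) -> str:
    """Pick one strategy file from the host model name alone."""
    name = host_model.lower()
    for keyword, path in STRATEGY_BY_KEYWORD:
        if keyword in name:
            return path
    # Unknown hosts get the conservative fallback.
    return FALLBACK_STRATEGY


print(select_strategy("claude-sonnet-4-6"))
print(select_strategy("mystery-host"))
```

Note that `select_strategy` takes no task argument. That is the point of routing by host model: a coding request and a writing request on the same host resolve to the same strategy file.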

Evaluation

Prompt Refine was evaluated in a blind, position-swapped A/B test on 120 vague prompts (60 English and 60 Chinese, spanning 32 domains). The same generator model answered each prompt twice, once raw and once with Prompt Refine active, and an independent judge scored the two answers without knowing which was which. Each pair was judged twice with the answers swapped to cancel order bias, giving 240 judgments in total.
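As a rough illustration of position swapping, the sketch below judges one pair twice and reports each verdict from the refined answer's point of view. It is not the harness in eval/; the `Judge` signature, the verdict strings, and `judge_pair_swapped` are assumptions made for this example.

```python
from typing import Callable, List

# Hypothetical judge interface: takes (first_answer, second_answer) and
# returns "first", "second", or "tie" without knowing which is refined.
Judge = Callable[[str, str], str]


def judge_pair_swapped(raw: str, refined: str, judge: Judge) -> List[str]:
    """Judge one pair twice with positions swapped.

    Returns two outcomes for the refined answer: "win", "loss", or "tie".
    """
    outcomes = []
    for refined_first in (True, False):
        first, second = (refined, raw) if refined_first else (raw, refined)
        verdict = judge(first, second)
        if verdict == "tie":
            outcomes.append("tie")
        else:
            # The refined answer wins when the judge picked its position.
            refined_won = (verdict == "first") == refined_first
            outcomes.append("win" if refined_won else "loss")
    return outcomes


# A judge that always prefers the first position contributes one win and
# one loss, so its order bias cancels out.
print(judge_pair_swapped("raw", "refined", lambda a, b: "first"))
```

With two judgments per prompt, 120 prompts yield 240 judgments.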

Headline results

| Metric | Result |
|---|---|
| Refine vs raw win-rate | 74.0% (167 wins / 52 losses / 21 ties of 240 judgments) |
| 95% bootstrap CI (per prompt, n = 120) | [66.9%, 80.6%] |
| Sign test | p < 0.0001 |
| English / Chinese split | 75.0% / 72.9% |
| Length-matched win-rate | 64.7% (refine answer within ±25% of raw length) |

The length-matched figure is reported alongside the headline to control for a length preference in the judge. On length-matched pairs the current release wins 64.7%, versus 50.5% for the previous version of the skill, which suggests a genuine quality gain rather than just longer answers.
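The headline arithmetic can be checked in a few lines. The results do not state how ties are scored; counting each tie as half a win is an assumption that happens to reproduce the reported 74.0% from 167 wins, 52 losses, and 21 ties. The length-match filter is likewise a sketch: `is_length_matched` is a hypothetical helper, and measuring length in characters is an assumption.

```python
def win_rate(wins: int, losses: int, ties: int) -> float:
    """Win-rate with each tie counted as half a win (assumed convention)."""
    total = wins + losses + ties
    return (wins + 0.5 * ties) / total


def is_length_matched(raw: str, refined: str, tolerance: float = 0.25) -> bool:
    """True when the refined answer's length is within ±tolerance of the
    raw answer's length. Length in characters is an assumption."""
    if not raw:
        return False
    return abs(len(refined) - len(raw)) <= tolerance * len(raw)


# Headline counts from the results above.
print(f"{win_rate(167, 52, 21):.1%}")  # prints 74.0%
```

The same formula gives 66.7% for the guard suite's 8 wins, 4 losses, and 0 ties.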

Per-dimension delta (refine − raw, 1–5 scale)

| Dimension | Δ |
|---|---|
| actionability | +0.96 |
| completeness | +0.81 |
| structure | +0.49 |
| clarification | +0.35 |
| language fidelity | +0.03 |

Robustness

| Check | Result |
|---|---|
| scaffold leakage (<role> / <task> / rewritten prompt in output) | 0 / 120 |
| prose-language switches on Chinese prompts (code stripped) | 0 / 60 |
| parse fallbacks · skipped prompts | 0 · 0 |
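A scaffold-leakage check of this kind can be approximated with a tag scan. The sketch below is not the check used in eval/, which may differ; it assumes the scaffold tags are the ones shown in the Claude-shaped examples earlier, and it only covers leaked tags, not a rewritten prompt echoed in prose. `SCAFFOLD_TAGS`, `has_scaffold_leak`, and `count_leaks` are hypothetical names.

```python
import re
from typing import Iterable

# Tag names taken from the Claude-shaped examples earlier in this README.
SCAFFOLD_TAGS = ("role", "context", "task", "constraints", "format", "success_criteria")
_TAG_RE = re.compile(r"</?(?:%s)>" % "|".join(SCAFFOLD_TAGS), re.IGNORECASE)


def has_scaffold_leak(answer: str) -> bool:
    """True if the final answer contains an opening or closing scaffold tag."""
    return _TAG_RE.search(answer) is not None


def count_leaks(answers: Iterable[str]) -> int:
    """Number of answers that leak at least one scaffold tag."""
    return sum(has_scaffold_leak(answer) for answer in answers)


answers = [
    "1. Bump the version.\n2. Run a dry-run publish.",
    "<task>Write exactly five items.</task>\n1. Bump the version.",
]
print(count_leaks(answers), "/", len(answers))
```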

Guard suite

Prompt Refine also has a small non-regression suite for clear or constraint-heavy prompts: JSON/config output, word limits, language fidelity, and direct-answer tasks. On the current 6-prompt guard suite, refine wins 66.7% of 12 position-swapped judgments (8 wins / 4 losses / 0 ties). Treat this as an early guardrail, not a broad proof.

Models: generator claude-sonnet-4-6, judge claude-opus-4-8. The host-model strategy under test is Anthropic (strategies/anthropic.md); other strategy files ship with the same design but have not yet been evaluated at this scale.

The evaluation harness, prompts, rubrics, anonymized answer pairs, judge JSON, run commands, and checked-in result summaries are available in the GitHub repository under eval/. The eval files are kept out of the npm package so normal skill installation stays lightweight.

Limitations

Prompt Refine is deliberately simple, and it is honest about what it is not:

  • Best-effort, not deterministic. It refines while the activation stays in the model's context. On a long, compacted conversation it can lapse until you re-run /prompt-refine.
  • Depends on the host model following meta-instructions. Models that do not reliably follow "silently restructure, then answer" will benefit less.
  • Only the Anthropic strategy is evaluated at scale. The other strategy files ship with the same design but have not been benchmarked equivalently (see Evaluation).
  • Strategies track fast-moving vendor docs. They summarize official guidance and need periodic updates as that guidance changes.
  • Little benefit on already-clear prompts. By design the intervention can be none — it is most useful on vague or underspecified requests.

Why Prompt Refine?

| | Prompt Refine | Standalone prompt optimizers |
|---|---|---|
| Form | Agent Skill | Web or desktop app |
| Model fit | Uses the currently running model's strategy | Generic or manually selected |
| Output | Silent final answer | Shows optimized prompt |
| Activation | Conversation-scoped and toggleable | Usually one-off |
| Language | Preserves original language and intent | Depends on implementation |
| Token cost | Low: short skill + one strategy | Often another full prompt pass |
| Dependencies | None | Often app-specific |

Compatible Platforms

Prompt Refine follows the SKILL.md Agent Skill convention and is designed for tools that can load project-level skills, including Claude Code, Cursor, OpenAI Codex, Gemini CLI, GitHub Copilot, Windsurf, CodeBuddy, and compatible agents.

License

MIT License. Free to use, modify, and distribute.

Contributing

Issues and pull requests are welcome. For new or improved model strategies, read CONTRIBUTING.md first.

Show your support

If Prompt Refine saves you time, please consider giving the repo a ⭐ — it genuinely helps other people discover the project.
