360 skills across 17 categories. Versioned, benchmarked, and openly evolving.
Stop rediscovering. Start building on what the community has already proven.
47 skills are battle-tested today. 313 are stubs waiting for a real example, real I/O, and real failure modes; see meta/QUALITY-REPORT.md for the full list. PRs that turn a stub into a production-ready entry are the highest-impact contribution you can make.
🌐 Browse Live UI · 🗺️ Systems · 🏗️ Blueprints · 📊 Benchmarks · 🔬 Labs · 🤝 Contribute · 🗺 Roadmap
🐦 Share Skills Tree on X / Twitter →
🌐 Read in your language: 🇬🇧 English · 🇸🇦 العربية · 🇨🇳 中文 · 🇪🇸 Español · 🇩🇪 Deutsch · 🇫🇷 Français · 🇮🇳 हिन्दी · 🇯🇵 日本語 · 🇰🇷 한국어 · 🇧🇷 Português · 🇷🇺 Русский
Every AI agent builder rediscovers the same skills from scratch.
Someone learns RAG the hard way. Someone else figures out memory injection at 2am. A third person spends a week benchmarking ReAct vs LATS — and never shares the results. A fourth discovers the same failure modes you already hit last month.
That collective knowledge is disappearing into Slack threads, private repos, and Twitter bookmarks.
Skills Tree fixes that.
Skills Tree is the shared operating system for AI agent capabilities.
A living, versioned, community-powered index of everything an agent can do — at its best, documented with working code, real benchmarks, failure modes, and evolution history.
We don't pretend every entry is finished. Battle-tested skills (badged 🟢 verified) are production-ready and copy-paste safe. Yellow / unscanned skills are the community's TODO list — open files, real problem space, and the clearest signal of where contributions are most useful.
It's not a list. It's infrastructure being built in public.
If you're new, read these first. Each one ships with runnable code, typed I/O, failure modes, and a model-comparison table.
- ReAct — Thought → Action → Observation, the foundation of tool-using agents
- Chain of Thought — explicit step-by-step reasoning + self-consistency
- Tree of Thought — branched reasoning with scoring + beam search
- Reflection / Reflexion — critique → revise loop on top of any output
- Self-Consistency — sample N chains, majority-vote
- Planning — typed, DAG-validated plans your executor can run
- Task Decomposition — break a goal into atomic, runnable subtasks
- RAG — chunk → embed → retrieve → cite, end-to-end with confidence + threshold
- Vector Store Retrieval — typed top-k cosine search with metadata filtering
- Embedding Generation — batched, content-hash-cached, Matryoshka-truncatable
- Memory Injection — top-K user memories per turn
- Short-Term Memory — token-budgeted rolling window (the foundation for everything else)
- Function / Tool Calling — the primitive that turns an LLM into an agent
- OpenAI API — chat, structured outputs, tools, embeddings, streaming, retry
- Anthropic API — Claude with tool loop, prompt caching, streaming
- Translation — placeholder-safe MT with glossary + tone
- Paraphrasing — simplify / formalize / diversify
- OCR — VLM + classical OCR with confidence-based human-review routing
- Code Generation — spec → AST-validated source with self-repair on failure
- Bug Fixing — agentic loop: read → patch → test → repeat until green
- Code Review — automated critique with severity tiers
- Web Search — Tavily/Serper/Brave with recency + host allowlist + TTL cache
- Web Scraping — trafilatura + BS4 fallback, metadata, redirect-safe
- Input Sanitization — 4-layer defense: structural + boundary + content + isolation
- File Write — atomic, crash-safe file writes for agents
- HTTP Request — production HTTP with idempotency, retry-on-idempotent-only, header redaction
- Dependency Auditor — vulnerability + license + freshness audit
The full battle-tested set is auto-listed in
meta/QUALITY-REPORT.md. The same report names every stub that needs upgrading — those are the highest-impact PRs you can submit.
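As an illustration of one entry in the list above, the "token-budgeted rolling window" behind Short-Term Memory could be sketched as follows. This is a minimal sketch, not the repository's implementation: `RollingWindow` and `count_tokens` are hypothetical names, and the whitespace word count is a crude stand-in for a real tokenizer.

```python
from collections import deque


def count_tokens(text: str) -> int:
    # Assumption: one token per whitespace-separated word.
    # A real agent would use the model's own tokenizer here.
    return len(text.split())


class RollingWindow:
    """Keep the most recent messages that fit inside a token budget."""

    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self.messages = deque()
        self.total = 0

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})
        self.total += count_tokens(content)
        # Evict the oldest messages until the budget is met,
        # but always keep the newest message.
        while self.total > self.max_tokens and len(self.messages) > 1:
            oldest = self.messages.popleft()
            self.total -= count_tokens(oldest["content"])

    def as_list(self) -> list:
        return list(self.messages)
```

Evicting from the front keeps the window cheap to maintain, at the cost of dropping old context outright rather than summarizing it.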
skills-tree/
│
├── skills/ → 360 atomic skill files (47 battle-tested, 313 stubs awaiting upgrade)
│ run `python3 tools/check_skill_quality.py` for the live count
├── systems/ → Multi-skill workflows (research agent, code reviewer...)
├── blueprints/ → Copy-paste production architectures
├── benchmarks/ → Head-to-head, reproducible skill comparisons
├── labs/ → Experimental & bleeding-edge capabilities
│
├── docs/ → Interactive web UI (GitHub Pages)
├── i18n/ → Localized READMEs (Arabic, Chinese, Spanish, German, French, Hindi, Japanese, Korean, Portuguese, Russian)
├── meta/ → Schema, glossary, frameworks, roadmap, changelog
└── requirements.txt → Pinned Python deps for CI workflows
| # | Category | Skills | What It Covers |
|---|---|---|---|
| 01 | 👁️ Perception | 36 | Text, images, PDFs, code, sensors, databases, screens |
| 02 | 🧠 Reasoning | 39 | Planning, deduction, abduction, causal chains, commonsense |
| 03 | 🗄️ Memory | 19 | Working, episodic, semantic, vector, injection, forgetting |
| 04 | ⚡ Action Execution | 21 | File I/O, HTTP, email, shell, database writes |
| 05 | 💻 Code | 28 | Write, run, debug, review, refactor, test, deploy |
| 06 | 💬 Communication | 15 | Summarize, translate, draft, argue, adapt tone |
| 07 | 🔧 Tool Use | 32 | APIs — GitHub, Slack, Stripe, OpenAI, MCP, A2A |
| 08 | 🎭 Multimodal | 14 | Images, audio, video, VQA, 3D, charts |
| 09 | 🤖 Agentic Patterns | 23 | ReAct, CoT, ToT, MCTS, LATS, RAG, Debate |
| 10 | 🖥️ Computer Use | 20 | Click, type, scroll, OCR, terminal, VM, a11y tree |
| 11 | 🌐 Web | 17 | Search, scrape, crawl, login, fill forms, parse RSS |
| 12 | 📊 Data | 18 | ETL, SQL, embeddings, time series, anomaly detection |
| 13 | 🎨 Creative | 14 | Copywriting, image prompts, SVG, music, scripts |
| 14 | 🔒 Security | 13 | Sandboxing, secret scanning, audit logs, rollback |
| 15 | 🎼 Orchestration | 22 | Multi-agent, state machines, retry, consensus |
| 16 | 🏺 Domain-Specific | 28 | Medical, legal, finance, DevOps, education, science |
| 17 | 🛠️ Infrastructure | 1 | Dependency auditing & supply-chain tooling (early) |
Counts above reflect skill files on disk and are auto-synced by tools/update_readme_counts.py (run nightly via update-skill-count.yml). If you spot a drift, open an issue.
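To check the counts by hand, a per-category tally of skill files could look roughly like the sketch below. It assumes the `skills/<category>/<skill>.md` layout seen in paths such as `skills/03-memory/memory-injection.md`; `count_skills` is a hypothetical helper, not the logic of `tools/update_readme_counts.py`, and it counts every Markdown file in a category directory.

```python
from collections import Counter
from pathlib import Path


def count_skills(skills_root: Path) -> Counter:
    """Count Markdown files directly under each category directory."""
    counts = Counter()
    for path in skills_root.glob("*/*.md"):
        # path.parent.name is the category folder, e.g. "03-memory".
        counts[path.parent.name] += 1
    return counts
```

Calling `sum(count_skills(Path("skills")).values())` from the repository root would then give a total to compare against the table.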
Every skill file is self-contained. A battle-tested entry looks like this:
# Memory Injection
Category: memory | Level: intermediate | Stability: stable | Version: v2
## Description
Dynamically inject relevant past memories into an agent's system prompt
before each turn — giving the model user context without filling the window.
## Example
```python
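# Assumes client = anthropic.Anthropic() and that base_system,
# top_k_memories, and user_message are already defined.
response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1024,
    system=f"{base_system}\n\n## Memory\n{top_k_memories}",
    messages=[{"role": "user", "content": user_message}],
)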
```
## Benchmarks → benchmarks/memory/injection-strategies.md
## Related → working-memory.md · rag.md · vector-store-retrieval.md
## Changelog → v1 (2025-03) · v2 (2026-04, added retrieval scoring)
Every skill includes:
- ✅ What it does and why it matters
- ✅ Typed inputs/outputs
- ✅ Runnable Python code (claude-opus-4-5 / gpt-4o)
- ✅ Frameworks table (LangChain, LangGraph, CrewAI, mem0...)
- ✅ Failure modes and edge cases
- ✅ Related skills cross-links
- ✅ Version history
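The Memory Injection example above takes `top_k_memories` as given. One way to produce it, sketched here under stated assumptions, is to rank stored memories by cosine similarity to the current query and join the best `k` into the system prompt. The names `cosine` and `build_system_prompt` are hypothetical, the embeddings are assumed to be precomputed lists of floats, and the skill file itself may use a different scoring method.

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_system_prompt(base_system: str, query_vec: list, memories: list, k: int = 3) -> str:
    """memories is a list of (text, embedding) pairs; embeddings are assumed precomputed."""
    ranked = sorted(memories, key=lambda m: cosine(query_vec, m[1]), reverse=True)
    top_k_memories = "\n".join(f"- {text}" for text, _ in ranked[:k])
    # Same prompt shape as the Memory Injection example.
    return f"{base_system}\n\n## Memory\n{top_k_memories}"
```

Capping the injection at `k` items is what keeps user context from filling the window.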
Skills are not static files. They evolve as the community learns:
v1 — Initial entry: description + minimal example
v2 — Enriched: better example + failure modes + related skills
v3 — Battle-tested: benchmarks + model comparison + production notes
To upgrade a skill:
- Bump the version in frontmatter
- Add a changelog entry explaining what improved
- Open a PR titled `improve: skill-name — v1 → v2`
The best versions surface naturally — through PR merge frequency and inclusion in Systems + Blueprints.
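The upgrade steps can be made mechanical with a small helper. The sketch below only covers the version bump and the PR title format quoted above; `bump_version` and `pr_title` are hypothetical names, and the frontmatter and changelog edits are left out because their exact layout is not shown here.

```python
import re


def bump_version(version: str) -> str:
    """Turn 'v1' into 'v2', 'v2' into 'v3', and so on."""
    match = re.fullmatch(r"v(\d+)", version)
    if match is None:
        raise ValueError(f"expected a version like 'v1', got {version!r}")
    return f"v{int(match.group(1)) + 1}"


def pr_title(skill_name: str, old_version: str) -> str:
    """Build a title in the 'improve: skill-name — v1 → v2' format."""
    return f"improve: {skill_name} — {old_version} → {bump_version(old_version)}"
```

For example, `pr_title("memory-injection", "v1")` yields a title for a v1 to v2 upgrade of that skill.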
See how skills combine into real, working agent pipelines:
| System | Skills Used | Use Case |
|---|---|---|
| Research Agent | Web search + RAG + Summarize + Cite | Deep research automation |
| Coding Agent | Code reading + Write + Debug + Test | End-to-end code generation |
| Code Reviewer | Code reading + Reasoning + Comment gen | Automated PR reviews |
| Data Pipeline Agent | DB reading + ETL + Anomaly detection | Automated data ops |
| Customer Support Bot | Memory injection + Intent + Response gen | Personalized support |
| Computer Use Agent | Screen reading + OCR + Click + Type | Full GUI automation |
| Data Analyst | SQL + Charts + Summarize + Insight gen | Automated data analysis |
| Voice Agent | Audio transcription + NLU + TTS | Real-time voice interaction |
Copy-paste architectures for the most common agent patterns:
| Blueprint | Description |
|---|---|
| RAG Stack | Embed → store → retrieve → generate, fully wired |
| Multi-Agent Workflow | Sequential orchestration with handoffs |
| Multi-Agent Mesh | N specialists + orchestrator, parallel execution |
| Computer Use Browser | Browser automation via Playwright + vision |
| Human-in-the-Loop | Approval gates, escalation, audit trails |
| Self-Healing Agent | Error detection, retry logic, rollback |
| Memory-First Agent | Profile + episodic + vector memory combined |
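To give a feel for what a blueprint wires together, the "retry logic, rollback" part of the Self-Healing Agent row might be sketched as below. This is an assumption-laden illustration and not the blueprint itself: `run_with_retry` is a hypothetical helper, the backoff is plain exponential, and the `sleep` parameter exists only so the delay can be swapped out in tests.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retry(
    action: Callable[[], T],
    rollback: Callable[[], None],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run action; on failure call rollback, wait, and try again."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            rollback()  # undo partial side effects before retrying or giving up
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...
    raise AssertionError("unreachable")
```

Rolling back before every retry keeps each attempt starting from a clean state, which matters when the action writes files or calls external APIs.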
We test so you don't have to:
| Benchmark | Winner | Margin | Link |
|---|---|---|---|
| ReAct vs LATS (HotpotQA) | LATS | +8.3% accuracy | → |
| RAG retrieval strategies | HyDE | +12% recall | → |
| Memory injection methods | Top-K semantic | Best cost/quality ratio | → |
| Function calling comparison | Claude 3.7 | +6% on tool accuracy | → |
Every benchmark includes methodology, dataset, and reproducible test scripts.
Auto-updated weekly · Full leaderboard →
🔥 Most Active Skills
- skills/09-agentic-patterns/react.md: 12 community improvements this month
- skills/03-memory/memory-injection.md: v2 with retrieval scoring
- skills/02-reasoning/causal.md: new benchmark comparison added
⚡ Battle-Tested (used in 10+ public projects)
ReAct · Chain of Thought · RAG Pipeline · Memory Injection · Tool Use
🔬 Hot in Labs
- labs/reasoning/tree-of-agents.md: multi-agent tree search
- labs/memory/episodic-compression.md: lossy-but-useful memory compression
- labs/tool-use/adaptive-tool-selection.md: dynamic tool filtering for large registries
Four types of contributions — all valued:
| Type | What It Is | PR Title Format |
|---|---|---|
| New Skill | A capability not yet indexed | feat: add [skill] to [category] |
| Skill Upgrade | Bump v1→v2 with better content | improve: [skill] — v1→v2 |
| Benchmark | Head-to-head with real numbers | benchmark: [skill-a] vs [skill-b] |
| System / Blueprint | Multi-skill workflow or architecture | system: add [name] |
git clone https://github.com/SamoTech/skills-tree.git
cp meta/skill-template.md skills/05-code/my-new-skill.md
# Fill in every section → open a PR
- ❌ No generic prompts or vague descriptions
- ❌ No skills without a working code example
- ✅ Must solve a real, specific problem
- ✅ Must be structured and reusable
- ✅ Must include inputs, outputs, and at least one runnable example
Full guide: CONTRIBUTING.md
# Clone
git clone https://github.com/SamoTech/skills-tree.git
# Find a skill by keyword
grep -r "memory injection" skills/ --include="*.md" -l
# Read a full system end-to-end
cat systems/research-agent.md
# See benchmark results
cat benchmarks/tool-use/function-calling-comparison.md
🏗️ Agent Builders → Production skill patterns, ready to use today
🔬 AI Researchers → Benchmarks, taxonomy, and full capability coverage
📐 System Architects → Blueprints for multi-agent production systems
🎓 Learners → Structured path from basic skills → advanced systems
🤝 Contributors → A community that improves everything together
See the full plan: meta/ROADMAP.md
Near-term (v2.x):
- Skill dependency graph — visual map of how skills relate
- Skill Paths — curated learning tracks (e.g., "Build a Research Agent in 5 skills")
- JSON/YAML export of all skill metadata for programmatic use
- Community skill ratings and upvotes
- Auto-leaderboard: Top Skills This Week, Most Improved, Battle-Tested
Medium-term (v3.0):
- CLI: `skills-tree search "memory injection"` → returns ranked results
- LangChain Hub / MCP registry integration
- ✅ Localization: Arabic, Chinese, Spanish READMEs (shipped in v2.1)
- Automated changelog generation on PR merge
Long-term vision:
- Skills Tree becomes the canonical reference for AI agent capabilities
- Every major agent framework links here as the skill index
- 1000+ skills, all battle-tested, all benchmarked
AI agents are becoming teammates, not tools.
Skills Tree is the shared foundation they run on — a living OS of capabilities that the community builds, tests, and evolves together.
Every skill added here saves every agent builder who comes after you. Every benchmark run here prevents someone else from wasting a week. Every system documented here becomes a launchpad for the next builder.
This is not a repo. It's infrastructure for the AI-native era.
⭐ Star this repo · 🌐 Browse Skills · 🤝 Contribute · 🗺 Roadmap · 💖 Sponsor
The AI Agent Skill OS — built by the community, for the community.