Skills Tree

📆 This Week's Highlights — April 27, 2026

✨ New Skills

feat: 10 more battle-tested skills (RAG, embeddings, codegen, web, security) + fix Devin Review on #61
feat: real-skills audit — accuracy fixes, 10 production rewrites, quality CI

The AI Agent Skill OS — Build Smarter Agents, Faster

360 skills across 17 categories. Versioned, benchmarked, and openly evolving.
Stop rediscovering. Start building on what the community has already proven.

47 skills are battle-tested today. 313 are stubs waiting for a real example, real I/O, and real failure modes — see meta/QUALITY-REPORT.md for the full list. PRs that turn a stub into a production-ready entry are the highest-impact contribution you can make.

🌐 Browse Live UI · 🗺️ Systems · 🏗️ Blueprints · 📊 Benchmarks · 🔬 Labs · 🤝 Contribute · 🗺 Roadmap

🐦 Share Skills Tree on X / Twitter →

🌐 Read in your language: 🇬🇧 English · 🇸🇦 العربية · 🇨🇳 中文 · 🇪🇸 Español · 🇩🇪 Deutsch · 🇫🇷 Français · 🇮🇳 हिन्दी · 🇯🇵 日本語 · 🇰🇷 한국어 · 🇧🇷 Português · 🇷🇺 Русский

The Problem

Every AI agent builder rediscovers the same skills from scratch.

Someone learns RAG the hard way. Someone else figures out memory injection at 2am. A third person spends a week benchmarking ReAct vs LATS — and never shares the results. A fourth discovers the same failure modes you already hit last month.

That collective knowledge is disappearing into Slack threads, private repos, and Twitter bookmarks.

Skills Tree fixes that.

What This Is

Skills Tree is the shared operating system for AI agent capabilities.

A living, versioned, community-powered index of what agents can do. The best entries are documented with working code, real benchmarks, failure modes, and evolution history.

We don't pretend every entry is finished. Battle-tested skills (badged 🟢 verified) are production-ready and copy-paste safe. Yellow (unscanned) skills are the community's TODO list. Each is an open file in a real problem space, and together they are the clearest signal of where contributions are most useful.

It's not a list. It's infrastructure being built in public.

🚀 Start Here — Battle-Tested Skills

If you're new, read these first. Each one ships with runnable code, typed I/O, failure modes, and a model-comparison table.

Agent reasoning loops

ReAct — Thought → Action → Observation, the foundation of tool-using agents
Chain of Thought — explicit step-by-step reasoning + self-consistency
Tree of Thought — branched reasoning with scoring + beam search
Reflection / Reflexion — critique → revise loop on top of any output
Self-Consistency — sample N chains, majority-vote
Planning — typed, DAG-validated plans your executor can run
Task Decomposition — break a goal into atomic, runnable subtasks
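The Planning entry's "DAG-validated" check can be sketched with the standard library's `graphlib` (Python 3.9+). This is a minimal illustration, not the skill file's own code. The plan shape (each step id mapped to the ids it depends on) and the `validate_plan` name are assumptions.

```python
from graphlib import CycleError, TopologicalSorter


def validate_plan(steps: dict[str, list[str]]) -> list[str]:
    """Return an order in which the plan's steps can run, or raise ValueError.

    `steps` maps each step id to the ids it depends on (a hypothetical plan shape).
    """
    # Reject dependencies on steps that don't exist; graphlib would silently add them.
    for step, deps in steps.items():
        missing = [d for d in deps if d not in steps]
        if missing:
            raise ValueError(f"step {step!r} depends on unknown steps {missing}")
    try:
        # static_order() yields every step after all of its dependencies.
        return list(TopologicalSorter(steps).static_order())
    except CycleError as exc:
        # args[1] holds the detected cycle as a list of nodes.
        raise ValueError(f"plan contains a cycle: {exc.args[1]}") from exc


plan = {
    "search": [],
    "read": ["search"],
    "summarize": ["read"],
    "cite": ["read"],
    "answer": ["summarize", "cite"],
}
print(validate_plan(plan))  # "search" comes first, "answer" last
```

Validating the plan before execution means an executor can run steps in the returned order without checking dependencies at runtime. A cyclic or dangling plan is rejected up front.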

Retrieval & memory

RAG — chunk → embed → retrieve → cite, end-to-end with confidence + threshold
Vector Store Retrieval — typed top-k cosine search with metadata filtering
Embedding Generation — batched, content-hash-cached, Matryoshka-truncatable
Memory Injection — top-K user memories per turn
Short-Term Memory — token-budgeted rolling window (the foundation for everything else)
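The following brute-force sketch shows what "typed top-k cosine search with metadata filtering" means. The `Doc` type and the `top_k(..., where=...)` signature are hypothetical. A real vector store would use an approximate nearest-neighbour index rather than scoring every document.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Doc:
    id: str
    vector: List[float]
    metadata: Dict[str, str] = field(default_factory=dict)


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # treat zero vectors as unrelated


def top_k(query: List[float], docs: List[Doc], k: int = 3,
          where: Optional[Dict[str, str]] = None) -> List[Tuple[str, float]]:
    """Brute-force top-k cosine search over docs whose metadata matches every `where` pair."""
    where = where or {}
    # Filter first so all k results come from the allowed subset, not the whole corpus.
    candidates = [d for d in docs if all(d.metadata.get(key) == val for key, val in where.items())]
    scored = [(d.id, cosine(query, d.vector)) for d in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


docs = [
    Doc("a", [1.0, 0.0], {"lang": "en"}),
    Doc("b", [0.9, 0.1], {"lang": "de"}),
    Doc("c", [0.6, 0.8], {"lang": "en"}),
]
print(top_k([1.0, 0.0], docs, k=2, where={"lang": "en"}))
```

Filtering before ranking is a design choice. Ranking first and filtering afterwards can return fewer than k hits when the best matches fail the filter.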

Calling LLMs in production

Function / Tool Calling — the primitive that turns an LLM into an agent
OpenAI API — chat, structured outputs, tools, embeddings, streaming, retry
Anthropic API — Claude with tool loop, prompt caching, streaming
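Function / Tool Calling comes down to the same step on both APIs: the model names a tool and supplies arguments, and your code runs it and returns a result. The following provider-agnostic sketch shows that dispatch step without any network calls. The `TOOLS` registry, the `@tool` decorator, and the stub `get_weather` are hypothetical.

```python
import json
from typing import Any, Callable, Dict, Union

TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function as a tool under its own name (hypothetical registry)."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def get_weather(city: str, unit: str = "c") -> dict:
    # Stub: a real tool would call a weather service here.
    return {"city": city, "temp": 21, "unit": unit}


def dispatch(name: str, arguments: Union[str, dict]) -> str:
    """Execute one tool call and return a JSON string to send back as the tool result."""
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool {name!r}"})
    try:
        # OpenAI returns arguments as a JSON string; Anthropic tool_use blocks carry a dict.
        args = json.loads(arguments) if isinstance(arguments, str) else arguments
        return json.dumps(TOOLS[name](**args))
    except (TypeError, ValueError) as exc:
        # Malformed JSON or wrong arguments go back to the model so it can retry.
        return json.dumps({"error": str(exc)})
```

The dispatcher returns errors as tool results instead of raising them. This lets the model see its mistake and correct the call on the next turn of the loop.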

Working with text

Translation — placeholder-safe MT with glossary + tone
Paraphrasing — simplify / formalize / diversify
OCR — VLM + classical OCR with confidence-based human-review routing
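"Placeholder-safe" translation usually means masking template fields before the text reaches the MT model and restoring them afterwards. The following sketch assumes `{name}`-style and `%s`/`%d` placeholders and an `__PH0__` token format. Both are illustrative choices, and `translate` stands in for any MT or LLM call.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical placeholder syntax: {name}-style fields and printf-style %s / %d.
PLACEHOLDER = re.compile(r"\{[A-Za-z_][A-Za-z0-9_]*\}|%[sd]")


def protect(text: str) -> Tuple[str, List[str]]:
    """Swap each placeholder for an opaque token the MT model should leave alone."""
    found: List[str] = []

    def _swap(match):
        found.append(match.group(0))
        return f"__PH{len(found) - 1}__"

    return PLACEHOLDER.sub(_swap, text), found


def restore(text: str, found: List[str]) -> str:
    """Put the original placeholders back, failing loudly if the translation lost one."""
    for i, original in enumerate(found):
        token = f"__PH{i}__"
        if token not in text:
            raise ValueError(f"translation dropped placeholder {original!r}")
        text = text.replace(token, original)
    return text


def translate_safe(text: str, translate: Callable[[str], str]) -> str:
    masked, found = protect(text)
    return restore(translate(masked), found)
```

Raising on a missing token is deliberate. A translation that drops `{user}` would otherwise ship as a silently broken UI string.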

Code

Code Generation — spec → AST-validated source with self-repair on failure
Bug Fixing — agentic loop: read → patch → test → repeat until green
Code Review — automated critique with severity tiers
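The "AST-validated source with self-repair" idea behind Code Generation can be sketched with Python's `ast` module. This is a minimal sketch, not the skill's implementation. `generate` is a hypothetical wrapper around whatever LLM call you use, and the repair prompt wording is an assumption.

```python
import ast
from typing import Callable, Optional


def syntax_error(source: str) -> Optional[str]:
    """Return a short error message if `source` is not valid Python, else None."""
    try:
        ast.parse(source)
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"
    return None


def generate_with_repair(spec: str, generate: Callable[[str], str], max_attempts: int = 3) -> str:
    """Ask `generate` for code and feed parse errors back until the output parses."""
    prompt = spec
    for _ in range(max_attempts):
        source = generate(prompt)
        error = syntax_error(source)
        if error is None:
            return source
        # Self-repair: show the model its own output plus the parser's complaint.
        prompt = f"{spec}\n\nYour previous code failed to parse ({error}):\n{source}\nReturn corrected code only."
    raise RuntimeError(f"no valid code after {max_attempts} attempts")


# Usage with a fake model that gets it right on the second try.
seen_prompts = []
outputs = iter(["def add(a, b)\n    return a + b", "def add(a, b):\n    return a + b"])


def fake_llm(prompt: str) -> str:
    seen_prompts.append(prompt)
    return next(outputs)


code = generate_with_repair("Write add(a, b).", fake_llm)
```

Passing `ast.parse` only proves the code is syntactically valid. Behavioural correctness still needs the test-driven loop described under Bug Fixing.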

Web

Web Search — Tavily/Serper/Brave with recency + host allowlist + TTL cache
Web Scraping — trafilatura + BS4 fallback, metadata, redirect-safe
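The TTL cache mentioned under Web Search can be as small as the following sketch. The `TTLCache` class and `cached_search` helper are hypothetical, and `search` stands in for a real Tavily, Serper, or Brave client. Recency filtering and the host allowlist are left out.

```python
import time
from typing import Any, Callable, Dict, List, Optional, Tuple


class TTLCache:
    """In-memory cache whose entries expire `ttl` seconds after they are stored."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at >= self.ttl:
            del self._store[key]  # stale: evict so the caller fetches fresh results
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (self.clock(), value)


def cached_search(query: str, search: Callable[[str], List[dict]], cache: TTLCache) -> List[dict]:
    # Normalize case and whitespace so trivial query variants share one cache entry.
    key = " ".join(query.lower().split())
    hit = cache.get(key)
    if hit is not None:
        return hit
    results = search(query)  # stands in for a Tavily / Serper / Brave client call
    cache.set(key, results)
    return results
```

`time.monotonic` is used instead of wall-clock time so that system clock adjustments cannot extend or cut short an entry's lifetime.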

Security

Input Sanitization — 4-layer defense: structural + boundary + content + isolation
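The following sketch covers only the first two of the four Input Sanitization layers, structural cleanup and boundary marking. The `<untrusted_input>` tag name, the length cap, and the `sanitize` function are assumptions. This alone does not stop prompt injection; the content and isolation layers still matter.

```python
import re
import unicodedata

# Hypothetical delimiter; the point is that untrusted text must not be able to close it early.
OPEN, CLOSE = "<untrusted_input>", "</untrusted_input>"
TAG_LIKE = re.compile(r"</?\s*untrusted_input\s*>", re.IGNORECASE)


def sanitize(text: str, max_chars: int = 8000) -> str:
    # Structural: NFKC folds look-alikes such as fullwidth '<' into ASCII before matching.
    text = unicodedata.normalize("NFKC", text)
    # Drop control and format characters (e.g. zero-width spaces), keeping newline and tab.
    text = "".join(ch for ch in text if ch in "\n\t" or unicodedata.category(ch)[0] != "C")
    text = text[:max_chars]
    # Boundary: neutralize anything shaped like our delimiters, then wrap the input.
    text = TAG_LIKE.sub("[removed-tag]", text)
    return f"{OPEN}\n{text}\n{CLOSE}"
```

Normalizing before pattern matching is the important ordering. Matching first would let a fullwidth or zero-width-padded closing tag slip past the boundary check.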

Action execution

File Write — atomic, crash-safe file writes for agents
HTTP Request — production HTTP with idempotency, retry-on-idempotent-only, header redaction
Dependency Auditor — vulnerability + license + freshness audit
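The usual technique behind "atomic, crash-safe file writes" (the File Write entry) is to write to a temporary file, flush and fsync it, and then rename it over the target. The following sketch shows that pattern with the standard library. `atomic_write` is a hypothetical helper name, not necessarily what the skill file uses.

```python
import os
import tempfile


def atomic_write(path: str, data: str, encoding: str = "utf-8") -> None:
    """Replace `path` so readers see either the old contents or the new, never a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be on the same filesystem as the target for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".part")
    try:
        with os.fdopen(fd, "w", encoding=encoding) as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        # Anything failed before the rename: remove the temp file and re-raise.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

For full durability on POSIX, production code often also fsyncs the containing directory after `os.replace`, so that the rename itself survives a power loss.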

The full battle-tested set is auto-listed in meta/QUALITY-REPORT.md. The same report names every stub that needs upgrading — those are the highest-impact PRs you can submit.

What's Inside

skills-tree/
│
├── skills/          → 360 atomic skill files (47 battle-tested, 313 stubs awaiting upgrade)
│                     run `python3 tools/check_skill_quality.py` for the live count
├── systems/         → Multi-skill workflows (research agent, code reviewer...)
├── blueprints/      → Copy-paste production architectures
├── benchmarks/      → Head-to-head, reproducible skill comparisons
├── labs/            → Experimental & bleeding-edge capabilities
│
├── docs/            → Interactive web UI (GitHub Pages)
├── i18n/            → Localized READMEs (Arabic, Chinese, Spanish, German, French, Hindi, Japanese, Korean, Portuguese, Russian)
├── meta/            → Schema, glossary, frameworks, roadmap, changelog
└── requirements.txt → Pinned Python deps for CI workflows

🗂️ The 17 Skill Categories

#	Category	Skills	What It Covers
01	👁️ Perception	36	Text, images, PDFs, code, sensors, databases, screens
02	🧠 Reasoning	39	Planning, deduction, abduction, causal chains, commonsense
03	🗄️ Memory	19	Working, episodic, semantic, vector, injection, forgetting
04	⚡ Action Execution	21	File I/O, HTTP, email, shell, database writes
05	💻 Code	28	Write, run, debug, review, refactor, test, deploy
06	💬 Communication	15	Summarize, translate, draft, argue, adapt tone
07	🔧 Tool Use	32	APIs — GitHub, Slack, Stripe, OpenAI, MCP, A2A
08	🎭 Multimodal	14	Images, audio, video, VQA, 3D, charts
09	🤖 Agentic Patterns	23	ReAct, CoT, ToT, MCTS, LATS, RAG, Debate
10	🖥️ Computer Use	20	Click, type, scroll, OCR, terminal, VM, a11y tree
11	🌐 Web	17	Search, scrape, crawl, login, fill forms, parse RSS
12	📊 Data	18	ETL, SQL, embeddings, time series, anomaly detection
13	🎨 Creative	14	Copywriting, image prompts, SVG, music, scripts
14	🔒 Security	13	Sandboxing, secret scanning, audit logs, rollback
15	🎼 Orchestration	22	Multi-agent, state machines, retry, consensus
16	🏺 Domain-Specific	28	Medical, legal, finance, DevOps, education, science
17	🛠️ Infrastructure	1	Dependency auditing & supply-chain tooling (early)

Counts above reflect skill files on disk and are auto-synced by tools/update_readme_counts.py (run nightly via update-skill-count.yml). If you spot a drift, open an issue.

A Skill in 60 Seconds

Every skill file follows the same self-contained format. Here is an excerpt from a battle-tested entry:

````markdown
# Memory Injection
Category: memory | Level: intermediate | Stability: stable | Version: v2

## Description
Dynamically inject relevant past memories into an agent's system prompt
before each turn, giving the model user context without filling the window.

## Example
```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1024,  # required by the Messages API
    # base_system, top_k_memories and user_message come from your app and memory store
    system=f"{base_system}\n\n## Memory\n{top_k_memories}",
    messages=[{"role": "user", "content": user_message}],
)
```

## Benchmarks  → benchmarks/memory/injection-strategies.md
## Related     → working-memory.md · rag.md · vector-store-retrieval.md
## Changelog   → v1 (2025-03) · v2 (2026-04, added retrieval scoring)
````

Every battle-tested skill includes:

✅ What it does and why it matters
✅ Typed inputs/outputs
✅ Runnable Python code (claude-opus-4-5 / gpt-4o)
✅ Frameworks table (LangChain, LangGraph, CrewAI, mem0...)
✅ Failure modes and edge cases
✅ Related skills cross-links
✅ Version history
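The Memory Injection excerpt passes a `top_k_memories` string into the system prompt. The following sketch shows one way to build that string from scored memories. It assumes the scores come from a vector search like the Vector Store Retrieval skill. The `build_system_prompt` name and the `min_score` threshold are hypothetical.

```python
from typing import List, Tuple


def build_system_prompt(base_system: str, memories: List[Tuple[float, str]],
                        k: int = 3, min_score: float = 0.5) -> str:
    """Append the k best memories scoring at least `min_score` under a '## Memory' heading."""
    relevant = sorted((m for m in memories if m[0] >= min_score), key=lambda m: m[0], reverse=True)[:k]
    if not relevant:
        return base_system  # nothing relevant: skip the section instead of sending an empty one
    top_k_memories = "\n".join(f"- {text}" for _, text in relevant)
    return f"{base_system}\n\n## Memory\n{top_k_memories}"


memories = [
    (0.91, "Prefers answers with Python examples"),
    (0.42, "Asked about pandas last week"),
    (0.77, "Is building a RAG chatbot"),
]
print(build_system_prompt("You are a helpful assistant.", memories, k=2))
```

The score threshold keeps weakly related memories out of the prompt. This is what stops injection from filling the context window with noise.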

Skill Versioning — How Evolution Works

Skills are not static files. They evolve as the community learns:

v1 — Initial entry: description + minimal example
v2 — Enriched: better example + failure modes + related skills
v3 — Battle-tested: benchmarks + model comparison + production notes

To upgrade a skill:

Bump the version in frontmatter
Add a changelog entry explaining what improved
Open a PR titled improve: skill-name — v1 → v2
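The "bump the version" step can be scripted. The following sketch assumes the version is stored as `Version: vN`, as in the Memory Injection excerpt's metadata line, or as a YAML-style `version: vN`. The real template may differ, and `bump_version` is a hypothetical helper.

```python
import re
from typing import Tuple

# Matches "Version: v2" (as in the excerpt's metadata line) or a YAML-style "version: v2".
VERSION_FIELD = re.compile(r"\b(version:\s*)v(\d+)\b", re.IGNORECASE)


def bump_version(skill_md: str) -> Tuple[str, str]:
    """Increment the first `Version: vN` field; return (updated_text, new_version)."""
    match = VERSION_FIELD.search(skill_md)
    if match is None:
        raise ValueError("no 'Version: vN' field found")
    new_version = f"v{int(match.group(2)) + 1}"
    updated = skill_md[:match.start()] + match.group(1) + new_version + skill_md[match.end():]
    return updated, new_version


line = "Category: memory | Level: intermediate | Stability: stable | Version: v2"
print(bump_version(line))
```

The changelog entry still needs a human. Only the mechanical version bump is automated here.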

The best versions surface naturally — through PR merge frequency and inclusion in Systems + Blueprints.

🗺️ Systems — Multi-Skill Workflows

See how skills combine into real, working agent pipelines:

System	Skills Used	Use Case
Research Agent	Web search + RAG + Summarize + Cite	Deep research automation
Coding Agent	Code reading + Write + Debug + Test	End-to-end code generation
Code Reviewer	Code reading + Reasoning + Comment gen	Automated PR reviews
Data Pipeline Agent	DB reading + ETL + Anomaly detection	Automated data ops
Customer Support Bot	Memory injection + Intent + Response gen	Personalized support
Computer Use Agent	Screen reading + OCR + Click + Type	Full GUI automation
Data Analyst	SQL + Charts + Summarize + Insight gen	Automated data analysis
Voice Agent	Audio transcription + NLU + TTS	Real-time voice interaction

🏗️ Blueprints — Production Architectures

Copy-paste architectures for the most common agent patterns:

Blueprint	Description
RAG Stack	Embed → store → retrieve → generate, fully wired
Multi-Agent Workflow	Sequential orchestration with handoffs
Multi-Agent Mesh	N specialists + orchestrator, parallel execution
Computer Use Browser	Browser automation via Playwright + vision
Human-in-the-Loop	Approval gates, escalation, audit trails
Self-Healing Agent	Error detection, retry logic, rollback
Memory-First Agent	Profile + episodic + vector memory combined

📊 Benchmarks — Real Numbers, Reproducible

We test so you don't have to:

Benchmark	Winner	Margin	Link
ReAct vs LATS (HotpotQA)	LATS	+8.3% accuracy	→
RAG retrieval strategies	HyDE	+12% recall	→
Memory injection methods	Top-K semantic	Best cost/quality ratio	→
Function calling comparison	Claude 3.7	+6% on tool accuracy	→

Every benchmark includes methodology, dataset, and reproducible test scripts.

🏆 This Week's Highlights

Auto-updated weekly · Full leaderboard →

🔥 Most Active Skills

skills/09-agentic-patterns/react.md — 12 community improvements this month
skills/03-memory/memory-injection.md — v2 with retrieval scoring
skills/02-reasoning/causal.md — new benchmark comparison added

⚡ Battle-Tested (used in 10+ public projects)

ReAct · Chain of Thought · RAG Pipeline · Memory Injection · Tool Use

🔬 Hot in Labs

labs/reasoning/tree-of-agents.md — multi-agent tree search
labs/memory/episodic-compression.md — lossy-but-useful memory compression
labs/tool-use/adaptive-tool-selection.md — dynamic tool filtering for large registries

🤝 How to Contribute

Four types of contributions — all valued:

Type	What It Is	PR Title Format
New Skill	A capability not yet indexed	`feat: add [skill] to [category]`
Skill Upgrade	Bump v1→v2 with better content	`improve: [skill] — v1→v2`
Benchmark	Head-to-head with real numbers	`benchmark: [skill-a] vs [skill-b]`
System / Blueprint	Multi-skill workflow or architecture	`system: add [name]`

```bash
git clone https://github.com/SamoTech/skills-tree.git
cd skills-tree
cp meta/skill-template.md skills/05-code/my-new-skill.md
# Fill in every section → open a PR
```

Quality Rules

❌ No generic prompts or vague descriptions
❌ No skills without a working code example
✅ Must solve a real, specific problem
✅ Must be structured and reusable
✅ Must include inputs, outputs, and at least one runnable example

Full guide: CONTRIBUTING.md

Quick Start

```bash
# Clone
git clone https://github.com/SamoTech/skills-tree.git
cd skills-tree

# Find a skill by keyword (case-insensitive)
grep -ri "memory injection" skills/ --include="*.md" -l

# Read a full system end-to-end
cat systems/research-agent.md

# See benchmark results
cat benchmarks/tool-use/function-calling-comparison.md
```

Or browse the live UI →

Who This Is For

🏗️  Agent Builders       → Production skill patterns, ready to use today
🔬  AI Researchers        → Benchmarks, taxonomy, and full capability coverage
📐  System Architects     → Blueprints for multi-agent production systems
🎓  Learners              → Structured path from basic skills → advanced systems
🤝  Contributors          → A community that improves everything together

🗺️ Roadmap

See the full plan: meta/ROADMAP.md

Near-term (v2.x):

Skill dependency graph — visual map of how skills relate
Skill Paths — curated learning tracks (e.g., "Build a Research Agent in 5 skills")
JSON/YAML export of all skill metadata for programmatic use
Community skill ratings and upvotes
Auto-leaderboard: Top Skills This Week, Most Improved, Battle-Tested
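The planned JSON/YAML export could start as simply as the following sketch. It parses the `Key: value | Key: value` metadata line shown in the Memory Injection excerpt. The assumption that every skill file uses that exact layout, and the `export_skill` helper itself, are hypothetical.

```python
import json
from typing import Dict


def parse_meta_line(line: str) -> Dict[str, str]:
    """Parse a 'Key: value | Key: value' metadata line into a lowercase-keyed dict."""
    meta: Dict[str, str] = {}
    for part in line.split("|"):
        key, sep, value = part.partition(":")
        if sep:  # skip fragments without a colon
            meta[key.strip().lower()] = value.strip()
    return meta


def export_skill(markdown: str) -> Dict[str, str]:
    """Take the title from the first '# ' heading and fields from the first 'a: b | c: d' line."""
    lines = markdown.splitlines()
    title = next((ln[2:].strip() for ln in lines if ln.startswith("# ")), "")
    meta_line = next((ln for ln in lines if " | " in ln and ":" in ln), "")
    return {"title": title, **parse_meta_line(meta_line)}


sample = (
    "# Memory Injection\n"
    "Category: memory | Level: intermediate | Stability: stable | Version: v2\n\n"
    "## Description\n..."
)
print(json.dumps(export_skill(sample), indent=2))
```

Running this over every file under `skills/` and collecting the records would give a single machine-readable index.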

Medium-term (v3.0):

CLI: skills-tree search "memory injection" → returns ranked results
LangChain Hub / MCP registry integration
✅ Localization: Arabic, Chinese, Spanish READMEs (shipped in v2.1)
Automated changelog generation on PR merge

Long-term vision:

Skills Tree becomes the canonical reference for AI agent capabilities
Every major agent framework links here as the skill index
1000+ skills, all battle-tested, all benchmarked

Vision

AI agents are becoming teammates, not tools.

Skills Tree is the shared foundation they run on — a living OS of capabilities that the community builds, tests, and evolves together.

Every skill added here saves every agent builder who comes after you. Every benchmark run here prevents someone else from wasting a week. Every system documented here becomes a launchpad for the next builder.

This is not a repo. It's infrastructure for the AI-native era.

⭐ Star this repo · 🌐 Browse Skills · 🤝 Contribute · 🗺 Roadmap · 💖 Sponsor

The AI Agent Skill OS — built by the community, for the community.
