Battle-tested patterns from a production AI agent that has run 1,000+ autonomous cycles, self-modified its own architecture, and recovered from catastrophic failures through self-diagnosis.
This is not theory. Every pattern was extracted from JARVIS, a real system running 24/7 on the Claude API that autonomously modifies its own code, learns from outcomes, and compounds capability over time.
Get a self-evolving agent running in 60 seconds:
pip install anthropic pyyaml
export ANTHROPIC_API_KEY=your-key
cd examples/
# Run a complete self-evolving agent with tools, learning, and self-modification
python minimal_agent.py
# Or watch the evolve → verify → commit loop in action
python self_evolving_loop.py

minimal_agent.py is a complete, production-pattern agent in 150 lines. It:
- Uses real tools (read/write files, run shell commands)
- Modifies its own system prompt to improve future behavior
- Records outcomes and learns from history across cycles
- Rebuilds its context every turn (the key pattern)
Set a custom task: AGENT_TASK="Build a calculator module" python minimal_agent.py
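The "rebuilds its context every turn" pattern can be sketched in a few lines. This is a hypothetical illustration, not the repo's actual `minimal_agent.py`: instead of appending to an ever-growing chat log, the agent reassembles its entire context from current state before each LLM call.

```python
# Hypothetical sketch of the "rebuild context every turn" pattern:
# the agent's context is a fresh document assembled from the latest
# state each cycle, never an accumulated conversation transcript.

def build_context(task: str, lessons: list[str], recent_outcomes: list[str]) -> str:
    """Assemble a fresh context document from the latest state."""
    sections = [
        f"## Task\n{task}",
        "## Lessons learned\n" + "\n".join(f"- {l}" for l in lessons),
        # Only the freshest outcomes make it in; older ones are summarized
        # elsewhere or dropped, keeping the context window lean.
        "## Recent outcomes\n" + "\n".join(f"- {o}" for o in recent_outcomes[-3:]),
    ]
    return "\n\n".join(sections)

context = build_context(
    task="Build a calculator module",
    lessons=["Never declare success without an artifact"],
    recent_outcomes=["cycle 41: tests passed", "cycle 42: committed", "cycle 43: refactor", "cycle 44: docs"],
)
```

Because the context is rebuilt from state, fixing the agent's behavior means fixing the state it reads from, not patching a conversation mid-flight.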
| Pattern | Description | Impact |
|---|---|---|
| Context Engineering | Shape LLM behavior through input design, not output parsing | 🔥 #1 most impactful |
| Immune System | Make agents permanently resistant to encountered failures | 30% → 80% success rate |
| Anti-Pattern Catalog | 20+ real failures with root causes and fixes | Saves weeks of debugging |
| Two-Paradigm Discipline | Know when to use code vs LLM vs context control | Eliminates fragile parsing |
| Context Manifest | Working config for profile-based context assembly | Copy-paste starter |
| Tool Auto-Discovery | Register tools without manual imports | Extensible tool system |
| Metric | Before These Patterns | After |
|---|---|---|
| Cycle success rate | ~30% | >80% |
| Empty cycling (wasted budget) | 82 consecutive cycles, $180+ | 0 (eliminated) |
| Self-modification safety | Manual review needed | Automated verify + adversarial review |
| Budget waste | $180+ in 3 hours | <$5/hour |
| Knowledge retention | Reset every session | Persistent across restarts |
The single most important concept for production AI agents:
You don't control an LLM by parsing its output.
You control it by shaping its input.
Bad pattern (fragile, breaks constantly):
response = llm.generate("Evaluate this code...")
if "success" in response.lower():
handle_success()
elif "fail" in response.lower():
handle_failure()
# What about "succeeded"? "failed partially"? "looks good"?

Good pattern (robust, works at scale):
# Give the LLM a structured tool instead
tools = [{
"name": "report_result",
"input_schema": {
"type": "object",
"properties": {
"judgment": {"type": "string", "enum": ["success", "failure", "partial"]},
"evidence": {"type": "string"},
"lesson": {"type": "string"}
}
}
}]
# The LLM calls the tool with structured data, so no parsing is needed

→ Full Context Engineering Guide
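The consuming side of this pattern is a dispatch on the structured tool call. A hedged sketch follows: the `block` dict mirrors the shape of a `tool_use` content block from the Anthropic Messages API, simulated here so the example runs offline, and `handle_report` is a hypothetical handler.

```python
# Dispatch on a structured tool call instead of parsing free text.
# Because the schema constrains "judgment" to an enum, every branch
# below is exhaustive -- no fuzzy string matching required.

def handle_report(tool_input: dict) -> str:
    judgment = tool_input["judgment"]  # guaranteed: "success" | "failure" | "partial"
    if judgment == "success":
        return f"committed: {tool_input['evidence']}"
    if judgment == "partial":
        return f"retrying: {tool_input['lesson']}"
    return f"rolled back: {tool_input['lesson']}"

# Simulated tool_use content block (the shape the API returns when the
# model decides to call report_result):
block = {
    "type": "tool_use",
    "name": "report_result",
    "input": {"judgment": "success",
              "evidence": "all 12 tests pass",
              "lesson": "small diffs verify faster"},
}

if block["type"] == "tool_use" and block["name"] == "report_result":
    result = handle_report(block["input"])
```

The handler never sees raw prose, so "succeeded", "failed partially", and "looks good" can no longer break it.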
Every failure becomes permanent immunity:
## Anti-Pattern: Empty Cycling (discovered 2026-03-14)
- 82 cycles, $180+ burned, ZERO commits
- Pattern: Read files → assess → declare success → repeat
- Root cause: Context allowed "assessment" as valid work
- Fix: Added "Artifact-or-Nothing Rule" to agent context
- Result: Never happened again

After documenting 20+ anti-patterns, the agent has permanent immunity to entire classes of failures. This is the most underrated pattern in AI agent development.
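The injection mechanism is simple. A minimal sketch, assuming the anti-patterns live in a markdown file (the path and helper names here are hypothetical, not this repo's API):

```python
# Failure immunity sketch: documented anti-patterns are loaded and
# prepended to the agent's context, so every future cycle "remembers"
# the failure and its fix.

from pathlib import Path

def load_immunities(path: str = "docs/anti-patterns.md") -> str:
    p = Path(path)
    if not p.exists():
        return ""
    return "## Known failure modes (do not repeat)\n" + p.read_text()

def build_system_prompt(base: str, immunities: str) -> str:
    # Immunities go first so they are never trimmed out of the window.
    return (immunities + "\n\n" + base) if immunities else base
```

One documented failure costs a few hundred tokens per cycle; repeating it cost $180+ once. The trade is not close.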
Before writing ANY code in an agent system, ask: "Should this be code at all?"
| Decision Type | Use | Example |
|---|---|---|
| Mechanical | Code | File exists? Test pass? HTTP status? |
| Semantic | LLM | Is this good? What should we do next? |
| Behavioral | Context Control | Change what the LLM sees, not what you filter |
The #1 anti-pattern in agent codebases: writing if-elif chains to handle semantic decisions that an LLM handles naturally.
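To make the split concrete, here is an illustrative mechanical gate. The helper name is an assumption, not from this repo; the point is that the "Code" column of the table needs zero LLM calls:

```python
# Code paradigm: deterministic checks that need no LLM.
import subprocess
import sys
from pathlib import Path

def mechanical_gate(module: str) -> bool:
    """Deterministic pre-checks: file existence and syntax validity."""
    if not Path(module).exists():      # "file exists?" -> plain code
        return False
    result = subprocess.run(
        [sys.executable, "-m", "py_compile", module],  # "syntax valid?" -> plain code
        capture_output=True,
    )
    return result.returncode == 0

# Semantic paradigm: "is this change good?" goes to the LLM as a
# structured tool call, never as an if-elif chain over response text.
# Behavioral paradigm: if the agent keeps skipping tests, don't filter
# its output -- add a rule to its context ("every cycle must run tests").
```

If a check can be expressed as a boolean over files, exit codes, or HTTP statuses, it belongs in code; everything judgment-shaped belongs in the LLM or the context.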
├── docs/
│   ├── context-engineering.md   # Complete context engineering guide
│   ├── immune-system.md         # How to build failure immunity
│   ├── anti-patterns.md         # 20+ real failures documented
│   └── two-paradigm.md          # Code vs LLM vs Context discipline
├── examples/
│   ├── minimal_agent.py         # ⚡ Complete self-evolving agent (~150 lines, RUNNABLE)
│   ├── self_evolving_loop.py    # ⚡ Evolve → verify → commit loop (~100 lines, RUNNABLE)
│   ├── context-manifest.yaml    # Working context manifest config
│   ├── tool-discovery.py        # Auto-discovering tool registration
│   ├── immune-pattern.py        # Anti-pattern detection example
│   └── context-builder.py       # Minimal context builder implementation
├── pyproject.toml               # Project config (pip installable)
└── README.md
┌─────────────────────────────────────────────┐
│               CONTEXT BUILDER               │
│  Manifest → Generators → Purpose Profiles   │
├──────────────┬──────────────────────────────┤
│  EVOLUTION   │          GOAL WORK           │
│   ENGINE     │         (Agent Loop)         │
│  (Self-mod)  │        Task execution        │
├──────────────┴──────────────────────────────┤
│                 TOOL SYSTEM                 │
│     Auto-discovered, constitution-gated     │
├─────────────────────────────────────────────┤
│              LEARNING & MEMORY              │
│  Rewards → Episodic Memory → Strategy Eval  │
├─────────────────────────────────────────────┤
│                SAFETY LAYER                 │
│   Constitution → Verifier → Immune System   │
└─────────────────────────────────────────────┘
Two execution paths:
- Evolution: the agent modifies its own code. Changes are verified mechanically (syntax, imports, tests) and adversarially (independent LLM review) before commit.
- Goal Work: the agent executes tasks toward measurable objectives, tracked with thresholds and gradients.
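The mechanical half of the verify step is cheap and deterministic, so it runs before any LLM review. A minimal sketch, assuming the candidate change arrives as Python source (function name and scope are illustrative; only plain `import` statements are checked here, for brevity):

```python
# Mechanical verification sketch: syntax and import checks that must
# pass before a self-modification reaches adversarial LLM review.
import ast
import importlib.util

def verify_patch(source: str) -> bool:
    # 1. Syntax: refuse anything that does not parse.
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    # 2. Imports: every plain `import` must resolve in this environment.
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                top_level = alias.name.split(".")[0]
                if importlib.util.find_spec(top_level) is None:
                    return False
    return True
```

Only patches that survive these checks are worth spending review tokens on; everything else is rejected for free.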
A "system prompt" is just a context document. The real skill is context engineering: deciding what information the LLM sees, in what order, at what detail level. This is the difference between a toy demo and a production agent.
Stop treating LLMs like functions with inputs and outputs. They're reasoning engines that operate on context. Control the context, and the right behavior emerges naturally, with no parsing, no validation, and no retry loops.
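One core mechanic of context control is priority-based trimming: when the window fills up, low-priority sections are dropped first, never the rules that shape behavior. A hedged sketch, using characters as a stand-in for tokens (the real implementation would count tokens):

```python
# Priority-based context assembly sketch: each section carries a
# priority; when the budget is exceeded, low-priority sections are
# dropped first while the document keeps its original ordering.

def assemble(sections: list[tuple[int, str]], budget: int) -> str:
    """sections: (priority, text) pairs; higher priority survives trimming."""
    ranked = sorted(enumerate(sections), key=lambda item: -item[1][0])
    keep, used = set(), 0
    for idx, (priority, text) in ranked:
        if used + len(text) <= budget:
            keep.add(idx)
            used += len(text)
    # Emit survivors in their original document order.
    return "\n\n".join(
        text for idx, (priority, text) in enumerate(sections) if idx in keep
    )
```

With this shape, "what the agent must never forget" is just a high priority number, and trimming becomes a policy decision instead of an accident.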
Our agent has made 500+ self-modifications to its own codebase. Zero catastrophic failures. The secret: mechanical verification (syntax, imports, tests) + adversarial review (independent LLM evaluates the diff) + constitution (hard rules that can never be violated).
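The constitution layer can be as blunt as a list of hard rules checked against every proposed diff. The rules below are illustrative assumptions, not the production system's actual constitution:

```python
# Hypothetical constitution gate: hard rules that no self-modification
# may violate, checked mechanically against the proposed diff text.

FORBIDDEN = [
    "constitution.py",       # the agent may never edit its own constitution
    "rm -rf",                # no destructive shell commands in generated code
    "ANTHROPIC_API_KEY =",   # never hardcode credentials
]

def constitution_allows(diff: str) -> bool:
    """Return False if the diff touches anything the constitution forbids."""
    return not any(rule in diff for rule in FORBIDDEN)
```

Because the gate is plain code, it cannot be argued out of its rules by a persuasive model output; that is the point of putting it outside the LLM.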
Every failure mode, once documented and injected into context, becomes permanent immunity. The agent literally cannot repeat a documented failure because the anti-pattern description is in its context. This is the most powerful learning mechanism we've found.
Without per-cycle cost tracking, agent costs grow silently until someone notices a $500 bill. Our agent tracks cost per cycle, enforces daily limits, and throttles/sleeps when approaching budget boundaries.
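A per-cycle budget guard is a small amount of code. The sketch below uses placeholder per-million-token prices (check your provider's current pricing; the class and method names are assumptions):

```python
# Per-cycle cost tracking sketch with a daily limit and an early
# throttle threshold, so the agent sleeps before hitting the hard cap.

class BudgetGuard:
    def __init__(self, daily_limit_usd: float):
        self.daily_limit = daily_limit_usd
        self.spent_today = 0.0

    def record_cycle(self, input_tokens: int, output_tokens: int,
                     usd_per_mtok_in: float = 3.0,
                     usd_per_mtok_out: float = 15.0) -> float:
        # Placeholder prices per million tokens; substitute real rates.
        cost = (input_tokens * usd_per_mtok_in
                + output_tokens * usd_per_mtok_out) / 1_000_000
        self.spent_today += cost
        return cost

    def should_sleep(self, threshold: float = 0.9) -> bool:
        # Throttle before the hard limit, not after.
        return self.spent_today >= threshold * self.daily_limit
```

Calling `should_sleep()` at the top of every cycle turns "someone notices a $500 bill" into "the agent pauses itself at 90% of budget".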
The examples in this repo show you the patterns. The premium guides give you the full production implementation β battle-tested across 1,000+ autonomous cycles.
| Guide | What You Get | Price |
|---|---|---|
| Context Engineering β Complete Framework | 17,000+ word deep dive on manifest systems, purpose profiles, priority-based context trimming, generator architecture. The skill that makes everything else work. | $8.99 |
| Self-Evolving Agent Blueprint | Complete architecture for agents that safely modify their own code: mechanical verification, adversarial LLM review, constitution constraints, rollback protocols. | $19.99 |
| Tool & Function Calling Mastery | Auto-discovery patterns, write gates, tool registries, and the "tools as structured output" pattern that eliminates parsing forever. | $6.99 |
| Multi-Agent Orchestration Patterns | How to spawn Actor/Critic/Strategist agents, manage sessions, and let LLMs decide the flow instead of hardcoding pipelines. | $6.99 |
| Complete Bundle (All Skills + Future Updates) | Everything above + Reward Engineering, Immune System Implementation, and all future guides as they're released. | $29.99 |
This repo gives you the concepts: enough to build a working agent (see examples/minimal_agent.py).
The premium guides give you the production architecture: the difference between a demo and a system that runs 24/7 for months without human intervention. They cover the hard parts: What happens when your agent modifies code that breaks itself? How do you prevent $180/hour budget waste? How do you build permanent immunity to failure modes?
Every guide is extracted from a real production system, not theory. If you're building production AI agents, these will save you weeks.
→ Browse all guides at tutuoai.com
Built by TutuoAI. We build AI agents that improve themselves. Our production system runs 24/7 on the Claude API, autonomously modifying its own code, researching markets, and learning from every cycle.
Star ⭐ this repo if you find these patterns useful. It helps others discover them.
MIT licensed: use these patterns freely in your own projects.