Summary
Combine ClawCode's dreaming system with Karpathy's autoresearch pattern to create a self-improving knowledge agent that autonomously researches gaps detected during the day, validates findings, and only promotes verified knowledge to long-term memory.
Currently, dreaming promotes memories based on recall frequency. This proposal adds an active research loop during the REM phase that fills knowledge gaps and verifies information before promotion — shifting from "remember what was searched often" to "learn what was missing and verify it's correct."
Motivation
Today's dreaming system is passive — it only consolidates what the agent has already encountered. But during daily use, the agent frequently hits knowledge gaps (low-score searches, unanswered questions, partial results). These gaps surface at search time but are never recorded or acted upon.
Karpathy's autoresearch showed that a simple propose→test→keep/discard loop can run hundreds of experiments overnight with zero human intervention, keeping only the ones that validate. The same pattern applies to knowledge consolidation:
| | AutoResearch (Karpathy) | Dreaming + AutoResearch |
|---|---|---|
| Optimizes | val_bpb (training metric) | Knowledge coverage & accuracy |
| Modifies | train.py (code) | MEMORY.md (knowledge) |
| Validation | Did loss improve? | Is it verifiable and correct? |
| Keep/Discard | git commit / git revert | promote / discard snippet |
| Loop trigger | Continuous (while true) | Nightly (3 AM, integrated with dreaming) |
Proposed Design
Gap Detection (during day)
Extend trackRecall to also track gaps — searches that returned zero or low-quality results:
```ts
interface KnowledgeGap {
  query: string;
  timestamp: string;
  resultCount: number;
  maxScore: number; // 0 if no results
  gapType: "no_results" | "low_confidence" | "partial_match";
}
```
Store gaps in memory/.dreams/knowledge-gaps.json, alongside the existing short-term-recall.json.
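A minimal sketch of the gap check that the trackRecall extension could run after each memory search. The `SearchHit` shape, the `detectGap` name, the assumption that scores are normalized to 0..1, and the 0.35/0.6 cut-offs separating `low_confidence` from `partial_match` are all hypothetical; the proposal doesn't define these boundaries.

```typescript
// Re-declared from the proposal so this sketch is self-contained.
interface KnowledgeGap {
  query: string;
  timestamp: string;
  resultCount: number;
  maxScore: number; // 0 if no results
  gapType: "no_results" | "low_confidence" | "partial_match";
}

// Hypothetical shape of one memory-search hit; the real result type may differ.
interface SearchHit {
  path: string;
  score: number; // assumed normalized to 0..1
}

// Hypothetical thresholds; they would need tuning against real score distributions.
const LOW_CONFIDENCE_BELOW = 0.35;
const PARTIAL_MATCH_BELOW = 0.6;

/** Returns a KnowledgeGap when the search looks like a gap, or null when a confident hit exists. */
function detectGap(query: string, hits: SearchHit[], now: Date = new Date()): KnowledgeGap | null {
  const maxScore = hits.reduce((max, h) => Math.max(max, h.score), 0);
  let gapType: KnowledgeGap["gapType"];
  if (hits.length === 0) {
    gapType = "no_results";
  } else if (maxScore < LOW_CONFIDENCE_BELOW) {
    gapType = "low_confidence";
  } else if (maxScore < PARTIAL_MATCH_BELOW) {
    gapType = "partial_match";
  } else {
    return null; // good enough: this is a recall, not a gap
  }
  return {
    query,
    timestamp: now.toISOString(),
    resultCount: hits.length,
    maxScore,
    gapType,
  };
}
```

A confident hit returns null, so only genuine gaps would be appended to knowledge-gaps.json, while ordinary recalls keep going to short-term-recall.json as they do today.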
Enhanced REM Phase — Research Loop
During the REM phase, for each detected gap:
```
for each gap in knowledge-gaps.json (sorted by frequency):
  1. RESEARCH — gather information
     ├─ WebSearch (if available)
     ├─ Codebase search (Grep/Read relevant files)
     ├─ Documentation lookup
     └─ Cross-reference with existing memory
  2. VALIDATE — verify findings
     ├─ Multi-source confirmation
     ├─ Code cross-reference (if domain is code-related)
     └─ Assign confidenceScore (0.0 - 1.0)
  3. EVALUATE — keep or discard
     ├─ confidence >= 0.7 → KEEP → stage as Deep phase candidate
     └─ confidence < 0.7 → DISCARD → log reason in DREAMS.md
```
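The loop above could look roughly like this in TypeScript. `research` and `validate` are injected async functions (hypothetical names and signatures) standing in for the LLM- and tool-backed steps, so the ordering, the 0.7 keep/discard threshold, the per-night cap, and the time budget all stay deterministic. Gaps are grouped by a trimmed, lowercased query to obtain the frequency the loop sorts by; that normalization is an assumption.

```typescript
// Re-declared from the proposal so the sketch is self-contained.
interface KnowledgeGap {
  query: string;
  timestamp: string;
  resultCount: number;
  maxScore: number;
  gapType: "no_results" | "low_confidence" | "partial_match";
}

// Hypothetical result of step 1 (RESEARCH).
interface Findings {
  sources: string[];
  snippet: string;
}

// Hypothetical result of step 2 (VALIDATE).
interface Validation {
  confidenceScore: number; // 0..1
  method: "code-crossref" | "web-confirm" | "multi-source" | "single-source";
}

interface Outcome {
  query: string;
  frequency: number;
  findings: Findings;
  validation: Validation;
  kept: boolean;
}

interface LoopOptions {
  research: (query: string) => Promise<Findings>;
  validate: (query: string, findings: Findings) => Promise<Validation>;
  confidenceThreshold: number; // 0.7 in the proposal
  maxGaps: number; // maxGapsPerNight
  deadlineMs: number; // epoch ms, e.g. start + maxResearchTimeMinutes * 60000
  now?: () => number; // injectable clock, defaults to Date.now
}

/** Groups raw gap events by normalized query, most frequent first. */
function rankGaps(gaps: KnowledgeGap[]): { query: string; frequency: number }[] {
  const counts = new Map<string, number>();
  for (const g of gaps) {
    const key = g.query.trim().toLowerCase();
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return Array.from(counts, ([query, frequency]) => ({ query, frequency })).sort(
    (a, b) => b.frequency - a.frequency,
  );
}

/** RESEARCH -> VALIDATE -> EVALUATE for each gap, within the nightly cap and time budget. */
async function runResearchLoop(gaps: KnowledgeGap[], opts: LoopOptions): Promise<Outcome[]> {
  const now = opts.now || Date.now;
  const outcomes: Outcome[] = [];
  for (const { query, frequency } of rankGaps(gaps).slice(0, opts.maxGaps)) {
    if (now() >= opts.deadlineMs) break; // out of time: don't start another gap
    const findings = await opts.research(query); // 1. RESEARCH
    const validation = await opts.validate(query, findings); // 2. VALIDATE
    const kept = validation.confidenceScore >= opts.confidenceThreshold; // 3. EVALUATE
    outcomes.push({ query, frequency, findings, validation, kept });
  }
  return outcomes;
}
```

Nothing here deletes entries from knowledge-gaps.json, so gaps skipped by the cap or the deadline remain available for a later night.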
Research Candidate Schema
```ts
interface ResearchCandidate {
  query: string; // the original gap
  sources: string[]; // where info was found
  snippet: string; // synthesized knowledge
  confidenceScore: number; // 0-1
  validationMethod:
    | "code-crossref" // confirmed against codebase
    | "web-confirm" // confirmed via web search
    | "multi-source" // multiple sources agree
    | "single-source"; // only one source (lower trust)
  kept: boolean;
  discardReason?: string; // why it was discarded
}
```
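The proposal lists the confidence scoring formula as deterministic but doesn't spell it out. One possible formula, under assumed weights: a base trust per `validationMethod`, a small bonus per additional distinct source (capped at +0.15), and a halving penalty when a hypothetical `contradicted` flag is set. The weights are chosen so `single-source` alone can never reach the 0.7 threshold, matching its "lower trust" label.

```typescript
type ValidationMethod = "code-crossref" | "web-confirm" | "multi-source" | "single-source";

// Hypothetical base trust per validation method.
const BASE_CONFIDENCE: Record<ValidationMethod, number> = {
  "code-crossref": 0.8,
  "multi-source": 0.75,
  "web-confirm": 0.65,
  "single-source": 0.4,
};

interface ScoringInput {
  validationMethod: ValidationMethod;
  sources: string[];
  contradicted: boolean; // hypothetical: at least one source disagrees with the snippet
}

/** Deterministic confidence in [0, 1], rounded to two decimals like the DREAMS.md output. */
function scoreConfidence(input: ScoringInput): number {
  // Count distinct sources so citing the same file twice doesn't inflate the score.
  const distinct = new Set(input.sources).size;
  const corroboration = Math.min(0.15, 0.05 * Math.max(0, distinct - 1));
  let score = BASE_CONFIDENCE[input.validationMethod] + corroboration;
  if (input.contradicted) score *= 0.5;
  return Math.round(Math.min(1, score) * 100) / 100;
}
```

Rounding to two decimals also avoids floating-point noise right at the 0.7 boundary when the score is compared against the threshold.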
Deep Phase Integration
Research candidates that pass validation feed into the existing Deep phase scoring alongside recall-based candidates. They get an additional signal:
| Signal | Weight | Description |
|---|---|---|
| Research confidence | bonus | Verified knowledge gets a scoring boost |
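A sketch of the "Research confidence" bonus, assuming research candidates reuse the existing Deep phase score and only add a bonus on top (one side of the first open question). `ScoredCandidate`, `baseScore`, and the 0.15 bonus weight are hypothetical; the real six-signal scoring in lib/dreaming.ts is not reproduced here.

```typescript
// Hypothetical: a Deep phase candidate after the existing scoring has run.
interface ScoredCandidate {
  id: string;
  baseScore: number; // output of the existing recall-based scoring, assumed 0..1
  origin: "recall" | "research";
  researchConfidence?: number; // set only for research candidates
}

const RESEARCH_BONUS_WEIGHT = 0.15; // hypothetical weight for the bonus signal
const CONFIDENCE_THRESHOLD = 0.7;

/** Adds the research-confidence bonus and returns candidates ranked for promotion. */
function applyResearchBonus(candidates: ScoredCandidate[]): { id: string; finalScore: number }[] {
  return candidates
    .map((c) => {
      const conf = c.researchConfidence === undefined ? 0 : c.researchConfidence;
      // Only verified research candidates earn the bonus, scaled by their confidence.
      const bonus =
        c.origin === "research" && conf >= CONFIDENCE_THRESHOLD ? RESEARCH_BONUS_WEIGHT * conf : 0;
      return { id: c.id, finalScore: Math.min(1, c.baseScore + bonus) };
    })
    .sort((a, b) => b.finalScore - a.finalScore);
}
```

Scaling by confidence gives a 0.92 finding a larger lift than one that barely clears 0.7, and the cap at 1.0 keeps scores in the same range as recall candidates.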
DREAMS.md Output
```markdown
## REM Research — 2026-04-13 03:00
### Investigated 3 knowledge gaps
1. ✅ "navitaire change flight endpoint" (confidence: 0.89)
   - Sources: codebase (bookingModification.service.ts), web docs
   - Learned: POST /api/nsk/v4/booking/flights with {journeyKey, fareKey}
   - Validation: code cross-reference confirmed
2. ✅ "409 error re-quote" (confidence: 0.92)
   - Sources: codebase (navitaire.service.ts), error logs
   - Learned: 409 occurs when PNR has active lock
   - Validation: code pattern confirmed
3. ❌ "ancillary pricing connections" (confidence: 0.45) — DISCARDED
   - Reason: insufficient sources, web results contradictory
   - Action: needs user clarification
```
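A minimal renderer that turns one night's `ResearchCandidate` records into the DREAMS.md layout shown above. The function name and the `runLabel` parameter are hypothetical; the sketch prints the raw `validationMethod` value and omits the "Action:" line, because the schema has no field for it.

```typescript
// Re-declared from the proposal so the sketch is self-contained.
interface ResearchCandidate {
  query: string;
  sources: string[];
  snippet: string;
  confidenceScore: number;
  validationMethod: "code-crossref" | "web-confirm" | "multi-source" | "single-source";
  kept: boolean;
  discardReason?: string;
}

/** Renders the "REM Research" section for one dream cycle. */
function renderRemResearch(runLabel: string, candidates: ResearchCandidate[]): string {
  const lines: string[] = [
    `## REM Research — ${runLabel}`,
    `### Investigated ${candidates.length} knowledge gaps`,
  ];
  candidates.forEach((c, i) => {
    const score = c.confidenceScore.toFixed(2);
    if (c.kept) {
      lines.push(`${i + 1}. ✅ "${c.query}" (confidence: ${score})`);
      lines.push(`   - Sources: ${c.sources.join(", ")}`);
      lines.push(`   - Learned: ${c.snippet}`);
      lines.push(`   - Validation: ${c.validationMethod}`);
    } else {
      lines.push(`${i + 1}. ❌ "${c.query}" (confidence: ${score}) — DISCARDED`);
      lines.push(`   - Reason: ${c.discardReason || "below confidence threshold"}`);
    }
  });
  return lines.join("\n");
}
```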
Implementation Considerations
What stays deterministic
- Gap detection (trackRecall extension)
- Confidence scoring formula
- Keep/discard threshold
- Deep phase integration
What requires LLM
- Research synthesis (generating snippets from sources)
- Validation reasoning (why something is/isn't trustworthy)
- Dream diary narrative
Configuration
```json
{
  "dreaming": {
    "autoresearch": {
      "enabled": false,
      "maxGapsPerNight": 5,
      "confidenceThreshold": 0.7,
      "sources": ["codebase", "memory", "web"],
      "maxResearchTimeMinutes": 10
    }
  }
}
```
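One way to load this block with the defaults above and basic range checks. `AutoresearchConfig` and `resolveAutoresearchConfig` are hypothetical names, and the clamping rules (threshold kept in 0..1, non-negative integer gap cap, unknown source names dropped) are assumptions rather than part of the proposal.

```typescript
type ResearchSource = "codebase" | "memory" | "web";

interface AutoresearchConfig {
  enabled: boolean;
  maxGapsPerNight: number;
  confidenceThreshold: number; // 0..1
  sources: ResearchSource[];
  maxResearchTimeMinutes: number;
}

// Defaults mirror the JSON above: opt-in, like dreaming itself.
const AUTORESEARCH_DEFAULTS: AutoresearchConfig = {
  enabled: false,
  maxGapsPerNight: 5,
  confidenceThreshold: 0.7,
  sources: ["codebase", "memory", "web"],
  maxResearchTimeMinutes: 10,
};

const KNOWN_SOURCES: ResearchSource[] = ["codebase", "memory", "web"];

/** Merges a partial user config over the defaults and clamps values into safe ranges. */
function resolveAutoresearchConfig(user: Partial<AutoresearchConfig> = {}): AutoresearchConfig {
  const merged = { ...AUTORESEARCH_DEFAULTS, ...user };
  return {
    enabled: merged.enabled === true, // anything but an explicit true stays off
    maxGapsPerNight: Math.max(0, Math.floor(merged.maxGapsPerNight)),
    confidenceThreshold: Math.min(1, Math.max(0, merged.confidenceThreshold)),
    // Parsed JSON can contain any string; drop unknown names instead of failing the dream cycle.
    sources: merged.sources.filter((s) => KNOWN_SOURCES.indexOf(s) !== -1),
    maxResearchTimeMinutes: Math.max(0, merged.maxResearchTimeMinutes),
  };
}
```

A caller would pass the parsed `dreaming.autoresearch` object; an absent block resolves to the disabled defaults, which preserves the opt-in behavior.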
Safety constraints
- Read-only by default — research never modifies code or external systems
- Confidence gating — only verified knowledge gets promoted
- Rate limiting — cap on API calls per dream cycle
- Transparency — everything logged in DREAMS.md for human review
- Opt-in — disabled by default, like dreaming itself
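The read-only and rate-limiting constraints could be enforced at a single choke point that every research tool call passes through. This is a sketch under assumptions: the allowlist reuses the tool names mentioned in the research loop (WebSearch, Grep, Read), while the `ResearchBudget` class and its per-cycle call cap are hypothetical, since the configuration above has no call-cap field yet.

```typescript
// Hypothetical allowlist built from the research tools named in the REM loop.
const READ_ONLY_TOOLS = ["WebSearch", "Grep", "Read"];

class ResearchBudget {
  private calls = 0;
  private readonly maxCalls: number; // hypothetical cap on tool/API calls per dream cycle
  private readonly deadlineMs: number; // e.g. start + maxResearchTimeMinutes * 60000
  private readonly now: () => number;

  constructor(maxCalls: number, deadlineMs: number, now: () => number = Date.now) {
    this.maxCalls = maxCalls;
    this.deadlineMs = deadlineMs;
    this.now = now;
  }

  /** Checks the allowlist, call cap, and deadline, then records the call. Throws if any check fails. */
  authorize(tool: string): void {
    if (READ_ONLY_TOOLS.indexOf(tool) === -1) {
      throw new Error(`tool "${tool}" is not on the read-only allowlist`);
    }
    if (this.calls >= this.maxCalls) {
      throw new Error(`call cap of ${this.maxCalls} reached for this dream cycle`);
    }
    if (this.now() >= this.deadlineMs) {
      throw new Error("research time budget exhausted");
    }
    this.calls += 1;
  }

  get remainingCalls(): number {
    return this.maxCalls - this.calls;
  }
}
```

Throwing rather than returning false lets the REM loop stop research in one place and record the reason in DREAMS.md for human review.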
Compound Effect
Day 1: Agent learns about endpoint X. Day 5: when researching related topic Y, it already has context from day 1. Knowledge compounds like interest — each night's research builds on previous nights.
Over time, the agent develops domain expertise autonomously, filling gaps the user never explicitly taught it.
References
- karpathy/autoresearch — the original propose→test→keep/discard loop
- Karpathy on the future of autoresearch — "emulate a research community"
- ClawCode's existing dreaming system (lib/dreaming.ts) — the foundation this builds on
- OpenClaw's dreaming docs (docs/concepts/dreaming.md) — the upstream reference
Open Questions
- Should research candidates go through the same 6-signal scoring as recall candidates, or have their own scoring pipeline?
- Should the agent be allowed to use subagents for parallel research (like OpenClaw's narrative generation)?
- How to handle research that contradicts existing MEMORY.md entries? (knowledge correction)
- Should there be a "research backlog" for gaps that couldn't be resolved, similar to IMPORT_BACKLOG.md?
Happy to contribute an implementation PR if there's interest. This feels like a natural evolution of the dreaming system.