Feature: Dreaming + AutoResearch — autonomous knowledge consolidation with verification

## Summary

Combine ClawCode's dreaming system with Karpathy's [autoresearch](https://github.com/karpathy/autoresearch) pattern to create a **self-improving knowledge agent** that autonomously researches gaps detected during the day, validates findings, and only promotes verified knowledge to long-term memory.

Current dreaming promotes memories based on recall frequency. This proposal adds an **active research loop** during the REM phase that fills knowledge gaps and verifies information before promotion — shifting from "remember what was searched often" to "learn what was missing and verify it's correct."

## Motivation

Today's dreaming system is **passive** — it only consolidates what the agent already encountered. But during daily use, the agent frequently hits knowledge gaps (low-score searches, unanswered questions, partial results). These gaps are detected but never acted upon.

Karpathy's autoresearch showed that a simple propose→test→keep/discard loop can run hundreds of validated improvements overnight with zero human intervention. The same pattern applies to knowledge consolidation:

| | AutoResearch (Karpathy) | Dreaming + AutoResearch |
|---|---|---|
| **Optimizes** | val_bpb (training metric) | Knowledge coverage & accuracy |
| **Modifies** | train.py (code) | MEMORY.md (knowledge) |
| **Validation** | Did loss improve? | Is it verifiable and correct? |
| **Keep/Discard** | git commit / git revert | promote / discard snippet |
| **Loop trigger** | Continuous (while true) | Nightly (3 AM, integrated with dreaming) |

## Proposed Design

### Gap Detection (during day)

Extend `trackRecall` to also track **gaps** — searches that returned zero or low-quality results:

```typescript
interface KnowledgeGap {
  query: string;
  timestamp: string;
  resultCount: number;
  maxScore: number;        // 0 if no results
  gapType: "no_results" | "low_confidence" | "partial_match";
}
```

Store in `memory/.dreams/knowledge-gaps.json`, alongside existing `short-term-recall.json`.
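
A minimal sketch of how a search result set might be turned into a gap record. The `SearchHit` shape, the `classifyGap` helper, and both score cut-offs are assumptions made for illustration; none of them come from the existing `trackRecall` code, and real thresholds would need tuning against actual recall scores.

```typescript
// Hypothetical sketch: helper names and thresholds are illustrative assumptions.
type GapType = "no_results" | "low_confidence" | "partial_match";

interface KnowledgeGap {
  query: string;
  timestamp: string;
  resultCount: number;
  maxScore: number; // 0 if no results
  gapType: GapType;
}

interface SearchHit {
  score: number; // assumed to be normalized to the 0..1 range
}

// Assumed cut-offs, not values taken from the dreaming system.
const LOW_CONFIDENCE_BELOW = 0.3;
const PARTIAL_MATCH_BELOW = 0.6;

// Returns a gap record when the search looks like a knowledge gap, else null.
function classifyGap(
  query: string,
  hits: SearchHit[],
  now: Date = new Date(),
): KnowledgeGap | null {
  const maxScore = hits.reduce((max, hit) => Math.max(max, hit.score), 0);
  let gapType: GapType;
  if (hits.length === 0) {
    gapType = "no_results";
  } else if (maxScore < LOW_CONFIDENCE_BELOW) {
    gapType = "low_confidence";
  } else if (maxScore < PARTIAL_MATCH_BELOW) {
    gapType = "partial_match";
  } else {
    return null; // the best hit is good enough, so nothing is recorded
  }
  return {
    query,
    timestamp: now.toISOString(),
    resultCount: hits.length,
    maxScore,
    gapType,
  };
}
```

Returning `null` for good results keeps the gap file small: only searches that actually fell short are appended.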

### Enhanced REM Phase — Research Loop

During the REM phase, for each detected gap:

```
for each gap in knowledge-gaps.json (most frequently recorded queries first, up to maxGapsPerNight):
  1. RESEARCH — gather information
     ├─ WebSearch (if available)
     ├─ Codebase search (Grep/Read relevant files)
     ├─ Documentation lookup
     └─ Cross-reference with existing memory
  
  2. VALIDATE — verify findings
     ├─ Multi-source confirmation
     ├─ Code cross-reference (if domain is code-related)
     └─ Assign confidence_score (0.0 - 1.0)
  
  3. EVALUATE — keep or discard
     ├─ confidence >= confidenceThreshold (default 0.7) → KEEP → stage as Deep phase candidate
     └─ confidence < confidenceThreshold                → DISCARD → log reason in DREAMS.md
```
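
The loop orders gaps by frequency, but `KnowledgeGap` carries no frequency field. One possible reading, assumed here, is that frequency means the number of recorded entries sharing the same query after trimming and lowercasing. The `rankGaps` helper and the `RecordedGap`/`RankedGap` types are hypothetical names used only for this sketch.

```typescript
// Hypothetical sketch: "frequency" is assumed to be the count of recorded
// gap entries that share the same normalized query.
interface RecordedGap {
  query: string;
  timestamp: string;
}

interface RankedGap {
  query: string;
  frequency: number;
}

function rankGaps(gaps: RecordedGap[], maxGapsPerNight: number): RankedGap[] {
  const counts = new Map<string, number>();
  for (const gap of gaps) {
    const key = gap.query.trim().toLowerCase();
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return Array.from(counts, ([query, frequency]) => ({ query, frequency }))
    // Most frequent first; ties are broken alphabetically so the order is stable.
    .sort((a, b) => b.frequency - a.frequency || a.query.localeCompare(b.query))
    .slice(0, maxGapsPerNight);
}
```

Sorting before slicing means the nightly cap drops the rarest gaps first, which matches the intent of researching what the agent missed most often.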

### Research Candidate Schema

```typescript
interface ResearchCandidate {
  query: string;                // the original gap
  sources: string[];            // where info was found
  snippet: string;              // synthesized knowledge
  confidenceScore: number;      // 0-1
  validationMethod: 
    | "code-crossref"           // confirmed against codebase
    | "web-confirm"             // confirmed via web search
    | "multi-source"            // multiple sources agree
    | "single-source";          // only one source (lower trust)
  kept: boolean;
  discardReason?: string;       // why it was discarded
}
```
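
To illustrate how the `kept` and `discardReason` fields could be filled in, here is a sketch of the keep/discard step. The `evaluateCandidate` function, the `UnevaluatedCandidate` type, and the wording of the discard reasons are assumptions, and the default threshold simply mirrors the 0.7 used in this proposal.

```typescript
// Hypothetical sketch of the EVALUATE step; names are illustrative assumptions.
type ValidationMethod =
  | "code-crossref"
  | "web-confirm"
  | "multi-source"
  | "single-source";

interface ResearchCandidate {
  query: string;
  sources: string[];
  snippet: string;
  confidenceScore: number; // 0-1
  validationMethod: ValidationMethod;
  kept: boolean;
  discardReason?: string;
}

// A candidate before the keep/discard decision has been made.
type UnevaluatedCandidate = Omit<ResearchCandidate, "kept" | "discardReason">;

function evaluateCandidate(
  draft: UnevaluatedCandidate,
  confidenceThreshold = 0.7,
): ResearchCandidate {
  const score = draft.confidenceScore;
  // Reject malformed scores instead of silently promoting them.
  if (!Number.isFinite(score) || score < 0 || score > 1) {
    return { ...draft, kept: false, discardReason: "confidence score outside 0-1" };
  }
  if (score >= confidenceThreshold) {
    return { ...draft, kept: true };
  }
  return {
    ...draft,
    kept: false,
    discardReason: `confidence ${score.toFixed(2)} below threshold ${confidenceThreshold.toFixed(2)}`,
  };
}
```

Because the decision is a pure function of the score and the threshold, it can stay deterministic even though the snippet itself is LLM-generated.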

### Deep Phase Integration

Research candidates that pass validation feed into the existing Deep phase scoring alongside recall-based candidates. They get an additional signal:

| Signal | Weight | Description |
|---|---|---|
| Research confidence | bonus | Verified knowledge gets a scoring boost |

### DREAMS.md Output

```markdown
## REM Research — 2026-04-13 03:00

### Investigated 3 knowledge gaps

1. ✅ "navitaire change flight endpoint" (confidence: 0.89)
   - Sources: codebase (bookingModification.service.ts), web docs
   - Learned: POST /api/nsk/v4/booking/flights with {journeyKey, fareKey}
   - Validation: code cross-reference confirmed

2. ✅ "409 error re-quote" (confidence: 0.92)  
   - Sources: codebase (navitaire.service.ts), error logs
   - Learned: 409 occurs when PNR has active lock
   - Validation: code pattern confirmed

3. ❌ "ancillary pricing connections" (confidence: 0.45) — DISCARDED
   - Reason: insufficient sources, web results contradictory
   - Action: needs user clarification
```

## Implementation Considerations

### What stays deterministic
- Gap detection (trackRecall extension)
- Confidence scoring formula
- Keep/discard threshold
- Deep phase integration
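
The proposal does not define the confidence scoring formula, so the following is only one possible deterministic shape for it: a base score per validation method plus a small, capped bonus for additional distinct sources. Every constant and the `scoreConfidence` name are assumptions chosen for illustration.

```typescript
// Hypothetical formula: every constant below is an illustrative assumption.
type ValidationMethod =
  | "code-crossref"
  | "web-confirm"
  | "multi-source"
  | "single-source";

const BASE_CONFIDENCE: Record<ValidationMethod, number> = {
  "code-crossref": 0.8,
  "multi-source": 0.7,
  "web-confirm": 0.6,
  "single-source": 0.4, // a lone source stays below the 0.7 threshold
};

const PER_EXTRA_SOURCE = 0.05;
const MAX_SOURCE_BONUS = 0.15;

function scoreConfidence(method: ValidationMethod, sources: string[]): number {
  const distinctSources = new Set(sources).size;
  const extraSources = Math.max(0, distinctSources - 1);
  const bonus = Math.min(MAX_SOURCE_BONUS, extraSources * PER_EXTRA_SOURCE);
  const raw = Math.min(1, BASE_CONFIDENCE[method] + bonus);
  // Round to two decimals so logged scores are stable and readable.
  return Math.round(raw * 100) / 100;
}
```

Keeping the formula free of LLM calls means the same inputs always produce the same score, which makes the keep/discard threshold auditable from the log alone.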

### What requires an LLM
- Research synthesis (generating snippets from sources)
- Validation reasoning (why something is/isn't trustworthy)
- Dream diary narrative

### Configuration

```json
{
  "dreaming": {
    "autoresearch": {
      "enabled": false,
      "maxGapsPerNight": 5,
      "confidenceThreshold": 0.7,
      "sources": ["codebase", "memory", "web"],
      "maxResearchTimeMinutes": 10
    }
  }
}
```
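
A sketch of how this block might be read with defaults and basic range checks. The `resolveAutoResearchConfig` function and the `AutoResearchConfig` type are hypothetical names; only the keys and default values come from the JSON above.

```typescript
// Hypothetical loader sketch; only the keys and defaults mirror the proposal.
interface AutoResearchConfig {
  enabled: boolean;
  maxGapsPerNight: number;
  confidenceThreshold: number;
  sources: string[];
  maxResearchTimeMinutes: number;
}

const DEFAULT_AUTORESEARCH: AutoResearchConfig = {
  enabled: false, // opt-in
  maxGapsPerNight: 5,
  confidenceThreshold: 0.7,
  sources: ["codebase", "memory", "web"],
  maxResearchTimeMinutes: 10,
};

type RawConfig = { dreaming?: { autoresearch?: Partial<AutoResearchConfig> } };

function resolveAutoResearchConfig(raw: unknown): AutoResearchConfig {
  const overrides = (raw as RawConfig | null | undefined)?.dreaming?.autoresearch ?? {};
  const merged: AutoResearchConfig = { ...DEFAULT_AUTORESEARCH, ...overrides };
  if (merged.confidenceThreshold < 0 || merged.confidenceThreshold > 1) {
    throw new RangeError("confidenceThreshold must be between 0 and 1");
  }
  if (!Number.isInteger(merged.maxGapsPerNight) || merged.maxGapsPerNight < 0) {
    throw new RangeError("maxGapsPerNight must be a non-negative integer");
  }
  if (merged.maxResearchTimeMinutes <= 0) {
    throw new RangeError("maxResearchTimeMinutes must be positive");
  }
  return merged;
}
```

Failing fast on an out-of-range threshold avoids a night of research in which every candidate is silently kept or discarded.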

### Safety constraints
- **Read-only by default** — research never modifies code or external systems
- **Confidence gating** — only verified knowledge gets promoted
- **Rate limiting** — cap on API calls per dream cycle
- **Transparency** — everything logged in DREAMS.md for human review
- **Opt-in** — disabled by default, like dreaming itself
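
The rate-limiting constraint could be enforced with a small budget object that every research call must pass through. This is a sketch under assumptions: the `ResearchBudget` class, its `tryConsume` method, and the per-cycle call cap are hypothetical, since the configuration above only defines `maxGapsPerNight` and `maxResearchTimeMinutes`.

```typescript
// Hypothetical sketch: the class, method, and call cap are illustrative assumptions.
class ResearchBudget {
  private callsUsed = 0;
  private readonly startedAt: number;
  private readonly maxCalls: number;
  private readonly maxMinutes: number;
  private readonly now: () => number;

  // `now` is injectable so the time limit can be tested without waiting.
  constructor(maxCalls: number, maxMinutes: number, now: () => number = Date.now) {
    this.maxCalls = maxCalls;
    this.maxMinutes = maxMinutes;
    this.now = now;
    this.startedAt = now();
  }

  // Returns true and records one call if both the call cap and the
  // time limit still allow it; returns false otherwise.
  tryConsume(): boolean {
    const elapsedMinutes = (this.now() - this.startedAt) / 60_000;
    if (this.callsUsed >= this.maxCalls || elapsedMinutes >= this.maxMinutes) {
      return false;
    }
    this.callsUsed += 1;
    return true;
  }
}
```

A research step would call `tryConsume()` before each external request and stop the cycle, logging the reason in DREAMS.md, as soon as it returns `false`.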

## Compound Effect

Day 1: the agent learns about endpoint X. Day 5: when it researches related topic Y, it already has the context from day 1. Knowledge compounds like interest: each night's research builds on what previous nights promoted.

Over time, the agent develops **domain expertise** autonomously, filling gaps the user never explicitly taught it.

## References

- [karpathy/autoresearch](https://github.com/karpathy/autoresearch) — the original propose→test→keep/discard loop
- [Karpathy on the future of autoresearch](https://x.com/karpathy/status/2030705271627284816) — "emulate a research community"
- ClawCode's existing dreaming system (`lib/dreaming.ts`) — the foundation this builds on
- OpenClaw's dreaming docs (`docs/concepts/dreaming.md`) — the upstream reference

## Open Questions

1. Should research candidates go through the same 6-signal scoring as recall candidates, or have their own scoring pipeline?
2. Should the agent be allowed to use subagents for parallel research (like OpenClaw's narrative generation)?
3. How should research that contradicts existing MEMORY.md entries be handled? (knowledge correction)
4. Should there be a "research backlog" for gaps that couldn't be resolved, similar to `IMPORT_BACKLOG.md`?

---

Happy to contribute an implementation PR if there's interest. This feels like a natural evolution of the dreaming system.
