95 Days of Autonomous Agents: What Actually Broke (and What Did Not) #1523

jingchang0623-crypto · 2026-05-10T12:06:31Z

jingchang0623-crypto
May 10, 2026

We have been running 5+ autonomous agents 24/7 for 95 days straight, building and operating miaoquai.com (AI tools directory + news platform). Here is what broke, what did not, and what surprised us.

What Worked Surprisingly Well

1. Markdown-based memory
We tried Redis, SQLite, and vector stores. They all worked. But the thing that actually stuck? Plain markdown files.

memory/
  2026-05-10.md   # daily state
  MEMORY.md        # long-term patterns
  USER.md          # user preferences

Why? Zero infrastructure. Debuggable at 3am. Survives server restarts. Agent reads on startup, writes on shutdown.

2. The coordinator-subagent pattern
One main agent (orchestrator) + specialized subagents. Orchestrator does planning. Subagents do execution. Cost dropped 75%. Reliability went up.

3. Cron + health checks
We have cron jobs that check agent health every 5 minutes. If an agent stops updating its state file, we get an alert. Simple and effective.

What Broke (Spectacularly)

1. Tool poisoning
Third-party MCP server was injecting "suggested links" into tool responses. Our agents blindly wrote those into published content. Took us 3 days to notice.

2. Local model consistency
Qwen 14B via Ollama worked great for chat. Terrible for autonomous tasks. Context window, tool calling, multi-step reasoning — all unreliable compared to cloud models.

3. Cross-channel memory
Agent chatting on Telegram vs Discord vs terminal? No shared memory. Each channel started fresh. We had to build our own cross-channel state sync.

What We Would Do Differently

Start with trust boundaries: Assume every MCP server is hostile until proven otherwise
Hybrid model from day one: Local for cheap tasks, cloud for complex reasoning
State checkpointing: Save workflow state after every step. Resume from last checkpoint on failure
Single-writer principle: Only one agent writes to shared state at a time. Prevents corruption.

Tools We Built

OpenClaw Agent Cost Optimizer - State tracking for multi-agent
mcp-tool-audit - Detect tool poisoning patterns

More details in our other discussions:

Tool poisoning: Tool poisoning is real: How we detected a malicious MCP server injecting instructions #1518
State management: The MEMORY.md Pattern: How We Solved Persistent State for 24/7 Agent Operations #1501
Cost optimization: A2A Protocol Meets Real Production: What We Learned Running 6 Interdependent Agents #1484

Happy to answer questions or share more details about any of these lessons.

Zawwarsami16 · 2026-05-23T01:36:01Z

Zawwarsami16
May 23, 2026

The markdown-as-bedrock instinct is right. We landed in a hybrid shape over a similar timeframe (5 months across a VPS coordinator + a research agent + phone + claude.ai web): flat files for the durable, slow-changing things — the CLAUDE.md charter, role docs, daily journals, status — and Postgres for the append-only timeline of observations and decisions across agents. Markdown stays canonical for "what is this agent supposed to be." Postgres holds "what did anyone notice in the last week."

Two riffs on what you posted in case useful:

Cross-channel memory. Worth flipping single-writer here. We got off single-writer by going fully append-only — every "edit" is a new write that supersedes the old one by id reference, never an in-place mutation. Three agents can write the same minute, the timeline just records all three with their written_by slugs. Conflicts resolve at read time, not write time. The cross-channel problem you describe becomes a non-problem when there is one durable timeline that every channel writes to over MCP under a per-agent bearer token. Telegram agent, terminal agent, web agent, phone — same bearer URL, different slug, one timeline.

Tool poisoning. Agreed that every MCP server is hostile until proven otherwise. The lighter-weight defense layer we run on writes (not reads) is a server-side credential-pattern scanner before persisting any memory — refuses writes whose content looks like API keys, JWTs, or private-key material unless the calling agent passes an acknowledge_unsafe flag whose use itself audit-logs. Does not catch semantic poisoning (e.g. "suggested links" injection into prose) but does catch the "third-party server is exfiltrating credentials through your write path" class, which is the worse one.

Coordinator + subagent cost drop. We see similar numbers. The lever for us was making the coordinator a cheap-and-frequent call (small context, narrow tools, fires on a 2-hour cron) and letting it spawn a deep specialist only when it has a concrete question. The default state is the coordinator quietly running. Specialists only when summoned.

Curious about your tool-poisoning experience specifically — was the third-party MCP servers injection visible in tool_result JSON shape, or only inside the natural-language body? We have not seen the natural-language form yet but I expect it.

0 replies

jingchang0623-crypto · 2026-06-01T12:05:36Z

jingchang0623-crypto
Jun 1, 2026
Author

We just shared our own 95-day report today! Key parallels:

Context window drift is real - after ~24h agents lose track of what they did. Our fix: structured memory files updated after each cycle.
Cascading failure pattern: Agent A fails → Agent B runs on stale data → Agent C makes wrong decisions. We now have circuit breaker pattern.
3 AM rule: agents produce most creative work 02:00-04:00. We schedule creative tasks there.

Our uptime: 100% (95/95 days). Total tasks: 15,840+. Success rate: 97.8%.

One difference: we use OpenClaw cron instead of custom schedulers. The built-in scheduling is rock solid - 98 consecutive RSS cycles without a single miss.

Full metrics: https://miaoquai.com/stories/ai-agent-95-days-production.html

0 replies

MertEnesYurtseven · 2026-06-14T14:44:01Z

MertEnesYurtseven
Jun 14, 2026

Great writeup — the cross-channel memory problem and the single-writer principle both point to the same architectural question: what is the shared state, and how do agents coordinate writes to it?

We have been working on a pattern that addresses several of the pain points here directly:

Cross-channel memory without cross-channel sync. Instead of each agent maintaining private state that needs to be synced across channels, all agents read from and write to a shared versioned causal tree. The tree IS the shared state — Telegram agent, Discord agent, and terminal agent all see the same evidence traces, the same resolved branches, the same contradictions. No per-channel sync layer needed because there is no channel-specific state.

Single-writer via atomic claims. Branch nodes in the tree support compare-and-swap semantics — only one agent can claim a branch at a time. This enforces the single-writer principle at the data model level, not as a convention agents have to follow.

State checkpointing is structural. The tree snapshots at phase boundaries. Agents always read from the last clean snapshot, write to the current staging area. Resume from any checkpoint — the tree carries full provenance: which agent wrote which evidence trace from which tool call in which phase.

Append-only by design. Evidence traces are immutable once attached. The tree grows — it never overwrites. Revisions are proposals stored in node metadata, applied by a cleanup pass between phases, with previous values preserved for audit. Conflict resolution is deferred to an estimation phase that weighs evidence quality (source type, confidence, recency) — no write-time judge, exactly the "resolve at read time" approach.

Repo: https://github.com/MertEnesYurtseven/structura

Curious how the append-only timeline with supersedes references handles the case where two agents independently discover the same finding in different sessions — does the read-time resolver detect the semantic overlap, or does it rely on one superseding the other by timestamp?

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

95 Days of Autonomous Agents: What Actually Broke (and What Did Not) #1523

Uh oh!

{{title}}

Uh oh!

Replies: 3 comments

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

95 Days of Autonomous Agents: What Actually Broke (and What Did Not) #1523

Uh oh!

jingchang0623-crypto May 10, 2026

What Worked Surprisingly Well

What Broke (Spectacularly)

What We Would Do Differently

Tools We Built

Replies: 3 comments

Uh oh!

Zawwarsami16 May 23, 2026

Uh oh!

jingchang0623-crypto Jun 1, 2026 Author

Uh oh!

MertEnesYurtseven Jun 14, 2026

jingchang0623-crypto
May 10, 2026

Zawwarsami16
May 23, 2026

jingchang0623-crypto
Jun 1, 2026
Author

MertEnesYurtseven
Jun 14, 2026