
snapshot export fails with "header field too long" on highly fragmented memory-ranges #24

@tonicmuroq

Description


Symptom

cocoon snapshot export aborts on the memory-ranges entry for snapshots taken from VMs whose guest memory is sufficiently fragmented:

$ sudo cocoon snapshot export <name> -o /tmp/probe.tar
INF exporting to /tmp/probe.tar ...
Error: write archive: write header memory-ranges: archive/tar: header field too long

Reproduced 2026-05-05 against a Windows 11 cocoon VM running simular-pro-agent-runtime 1.8.0 (Electron app, signed in with a live Firebase WebSocket connection). The same export against the equivalent VM with the agent not signed in (no live Firebase connection → fewer fragmented allocations) succeeds normally. The 1.5.0 build of the same agent also exports fine.

Root cause

utils/tar_sparse_linux.go:tarFileMaybeSparse packs the entire sparse-segment list into a single tar PAX record (COCOON.sparse.map):

hdr.PAXRecords = map[string]string{
    paxSparseMap:  string(mapJSON),                  // can be > 1MB
    paxSparseSize: strconv.FormatInt(size, 10),
}
if err := tw.WriteHeader(hdr); err != nil { ... }

Go's archive/tar caps the encoded PAX block at maxSpecialFileSize = 1<<20 (see archive/tar/format.go), and Writer.WriteHeader returns ErrFieldTooLong when the encoded records exceed it. For a guest with many small live allocations (V8 heap, IPC buffers, WebSocket pools, native-module mmaps), memory-ranges produces tens of thousands of sparse segments; the JSON-encoded segment list balloons past the 1MB cap.

Empirical limit (measured against Go 1.26 archive/tar):

segments   mapJSON size   tar.Writer.WriteHeader
1,000      ~22 KB         ok
10,000     ~239 KB        ok
30,000     ~736 KB        ok
50,000     ~1.2 MB        header field too long
100,000    ~2.5 MB        header field too long
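The cap can be reproduced with stock archive/tar alone, no cocoon code needed. A minimal sketch (the segment field names are assumptions for illustration; the real COCOON.sparse.map schema isn't shown in this issue, so exact byte counts will differ from the table above):

```go
package main

import (
	"archive/tar"
	"bytes"
	"encoding/json"
	"fmt"
)

// seg is a stand-in for one sparse-map entry; the field names are
// assumptions, not the real COCOON.sparse.map schema.
type seg struct {
	Offset int64 `json:"offset"`
	Length int64 `json:"length"`
}

// tryPAXMap JSON-encodes an n-segment map, packs it into a single PAX
// record, and reports the encoded size plus archive/tar's verdict.
func tryPAXMap(n int) (int, error) {
	segs := make([]seg, n)
	for i := range segs {
		segs[i] = seg{Offset: int64(i) * 8192, Length: 4096}
	}
	mapJSON, err := json.Marshal(segs)
	if err != nil {
		return 0, err
	}
	tw := tar.NewWriter(&bytes.Buffer{})
	hdr := &tar.Header{
		Name:       "memory-ranges",
		Mode:       0600,
		Typeflag:   tar.TypeReg,
		Format:     tar.FormatPAX,
		PAXRecords: map[string]string{"COCOON.sparse.map": string(mapJSON)},
	}
	// WriteHeader encodes all PAX records into one extended-header file;
	// past maxSpecialFileSize (1<<20) it returns ErrFieldTooLong.
	return len(mapJSON), tw.WriteHeader(hdr)
}

func main() {
	for _, n := range []int{1_000, 50_000} {
		size, err := tryPAXMap(n)
		fmt.Printf("segments=%d mapJSON=%dB err=%v\n", n, size, err)
	}
}
```

With ~33 bytes per encoded segment, the 50,000-segment map lands well past the 1 MiB cap while the 1,000-segment map sails through.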

Downstream impact

vk-cocoon's hibernate path (Save → Push → Remove) calls Pusher.PushSnapshot, which streams cocoon snapshot export -o - into epoch. When export fails on the memory-ranges entry, push only uploads the small metadata blobs (snapshot.json, config) before erroring; the memory layer never PUTs. vk-cocoon's workqueue then silently retries every ~30s with the same outcome, so:

  • vm-service's hibernate API never observes phase=Suspended and times out (was 300s, raising the budget does not help — the loop is permanent).
  • The CocoonSet stays in Running phase with pod=ProviderFailed.
  • Hibernating any VM whose agent has a live Firebase connection is currently unreachable in production.

vk-cocoon's Provider.UpdatePod does return the error from hibernate() to the workqueue, but neither vk-cocoon nor the cocoon CLI logs the wrapped error message at INF/WRN level; the only signal in operator logs is the absence of the expected vm rm call after the export step. That made this bug significantly harder to diagnose than it needed to be; the silent-retry behavior is separately worth fixing so the failing PushSnapshot error surfaces in the journal.

Proposed fix

PR #23 falls back to a non-sparse tar entry when len(mapJSON) exceeds ~800 KB (well below the 1MB cap, with margin for the size record + framing). Memory-ranges can be GB-scale, so the fallback gives up the sparse-export size win on the affected file — but a successful larger push beats an indefinite hung loop. The reader path is unchanged.

Open question: should there also be an INF/WRN log on the cocoon-CLI side when the fallback fires, so operators can see "this snapshot took the slow path because the segment map was too fragmented"? Right now the fallback is silent — that's friendly to existing tooling, but operators investigating slow snapshots won't know why a particular file was emitted full-size.
