feat(envd): support user-defined file metadata via xattrs#2732
feat(envd): support user-defined file metadata via xattrs#2732mishushakov wants to merge 22 commits into
Conversation
Adds an optional `metadata` query parameter on POST /files (deepObject style — `metadata[key]=value`) that envd persists as xattrs under the `user.` namespace. The same metadata is returned on every EntryInfo surfaced by the HTTP upload response and the filesystem gRPC service (Stat, ListDir, Move, MakeDir).
PR SummaryMedium Risk Overview Reviewed by Cursor Bugbot for commit 7dbc57c. Bugbot is set up for automated code reviews on this repo. Configure here. |
❌ 4 Tests Failed:
View the full list of 4 ❄️ flaky test(s)
To view more test analytics, go to the Test Analytics Dashboard |
There was a problem hiding this comment.
Code Review
The listxattr and getxattr functions in xattr_linux.go are susceptible to race conditions where the size of extended attributes can increase between the initial size check and the subsequent data retrieval. These calls should be wrapped in retry loops that handle unix.ERANGE by re-querying the size and re-allocating the buffer to ensure correctness in environments with concurrent file modifications.
|
@claude review |
- Retry listxattr/getxattr on ERANGE to tolerate concurrent xattr writers. - Short-circuit empty-value reads to avoid a slice-bounds panic when the kernel reinterprets a zero-length buffer as a size query. - Validate metadata keys/values (non-empty, no NUL, length caps) at the HTTP boundary and inside WriteMetadata so invalid input returns 400 instead of a bare 500. - Add unit tests for validation and xattr round-trip. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
envd writes metadata once at upload time and only reads it afterwards — there are no concurrent xattr writers in the design, so the size-then-fetch race the retry loops guarded against cannot happen in practice. Restore the simpler single-pass form. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces the deepObject `?metadata[key]=value` query parameter with `X-Metadata-<key>: <value>` request headers, matching the S3/GCS/Azure convention for object metadata. The header form works uniformly for raw and multipart uploads, avoids URL-encoding `[` / `]`, and sidesteps query string length caps. Keys are lowercased and the `X-Metadata-` prefix is stripped before they become `user.<key>` xattrs. Bumps envd to 0.5.27 and the OpenAPI spec to 0.1.3. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
revive's inefficient-map-lookup correctly flagged the loop — a direct `got[MetadataXattrPrefix+"keep"]` lookup expresses the assertion better. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a5fbaa9523
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Storage exhaustion during the metadata write was returning 500, breaking clients that key off the 507 the rest of /files already returns for ENOSPC. Map both ENOSPC and EDQUOT from filesystem.WriteMetadata to StatusInsufficientStorage so the same resource condition is reported consistently. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Uploads to /sys, /proc, and other filesystems without xattr support were succeeding for the file body but then returning 500 from the WriteMetadata call, breaking the supported case of writing to e.g. /sys/fs/cgroup that the upload tests already exercise. Match ReadMetadata's behavior: when WriteMetadata returns ENOTSUP / EOPNOTSUPP, log a warning and drop the in-flight metadata so the response EntryInfo doesn't falsely claim it was persisted, then complete the upload normally. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
processFile previously mutated the shared metadata map via clear() to hide it from the response when the FS didn't support xattrs. That broke multipart uploads: clearing on the first file's path also stripped metadata from every subsequent file in the request. Drop the clear() and the request-side maps.Copy; instead, read xattrs back from the file after writing and use that for EntryInfo.Metadata. The response always reflects what's actually on disk, regardless of FS support, partial writes, or pre-existing xattrs — matching Stat / ListDir semantics. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a878e36aff
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 79d1ea7e67
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
- Namespace xattr keys under user.e2b. to avoid clashes with foreign user.* xattrs - Lower MaxMetadataValueLen to 1 KiB to fit a single ext4 4 KiB block alongside other xattrs and the inode header - Enforce printable US-ASCII on metadata keys and values (matches the spec) - WriteMetadata: empty/nil = no-op; non-empty = replace the full user.e2b.* set so re-uploading with new metadata clears stale keys without affecting foreign xattrs - Simplify splitNullTerminated via bytes.Split - Drop the readEntryMetadata wrapper and call ReadMetadata directly - Document upload metadata semantics, size caps, and ASCII constraint in envd.yaml - Bump envd to 0.5.28 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replace the hand-rolled xattr syscall wrappers with the cross-platform github.com/pkg/xattr library: - Drop the custom listxattr/getxattr size-then-read helpers and the null-separator name parsing; the library handles them. - Unify the Linux-only implementation and the darwin no-op shims into a single cross-platform xattr.go, removing the //go:build linux constraint. Rename xattr_linux_test.go to xattr_metadata_test.go so the metadata tests run on all platforms. - Export IsXattrUnsupported so upload.go reuses it instead of duplicating the ENOTSUP/EOPNOTSUPP check. No behavioral change; the user.e2b.* namespacing, full-set replace semantics, and validation are unchanged. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The EntryInfo.metadata field comment said keys live in the `user.` namespace, but the implementation uses `user.e2b.` and filters out foreign `user.*` xattrs. Update the .proto and the two generated filesystem.pb.go files to match. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit c6e843f. Configure here.
- Log (instead of silently dropping) ReadMetadata errors when building upload responses and in GetEntryInfo, so xattr read failures are observable. Metadata stays best-effort — a read failure no longer hides itself but still doesn't fail the entry lookup. - Correct the metadataHeaderPrefix comment: the resulting xattr is user.e2b.<key>, not user.<key>. - Document in the OpenAPI spec that duplicate X-Metadata-<key> headers use the first value. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replace the per-value 1 KiB limit with a single 4 KiB cap on the combined size of all metadata on a file (stored name + value, summed across entries). ext4 keeps an inode's xattrs in one filesystem block (~4 KiB), so a large set would otherwise fail late inside WriteMetadata with ENOSPC/E2BIG; validating the total up front rejects it cleanly with HTTP 400. The per-key length cap stays (it's the hard VFS xattr-name limit). Spec and tests updated. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
GetEntryInfo is a pure shared helper with no logger in scope, and zap.L() isn't used anywhere in the shared module outside the logger package, so reaching for the global logger here introduced a non-idiomatic dependency on a hot path. Revert to the best-effort silent read; the upload handler still calls ReadMetadata directly and logs failures where it has a logger. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
| // Metadata is best-effort: a read failure shouldn't fail the entry lookup | ||
| // (Size/Mode/times are still valid), and this helper has no logger to | ||
| // report it through. Callers that need the error (e.g. the upload handler) | ||
| // call ReadMetadata directly. | ||
| entry.Metadata, _ = ReadMetadata(path) |
There was a problem hiding this comment.
🟡 The new entry.Metadata, _ = ReadMetadata(path) in GetEntryInfo (entry.go:82) issues path-based listxattr/getxattr syscalls in the caller's mount namespace. The orchestrator volume service's toEntry (packages/orchestrator/pkg/volumes/service.go:136) invokes this outside fs.act/ns.Do (unlike path_stat.go:35, which correctly goes through fs.GetEntry), so for every entry returned by dir_list.go:97 (depth up to 10), dir_create.go:85, file_create.go:115, and path_update.go:90 we issue an extra path-based xattr syscall against the host's view of the chroot-relative path — typically ENOENT, but pure waste in recursive listings. Today the result is silently discarded by fromEntryInfo because the orchestrator EntryInfo proto has no metadata field, so this is only wasted syscalls plus a latent footgun: if anyone later adds metadata to that proto without rewiring toEntry through fs.act, the service will return host-side xattrs labeled as the volume entry's. Fix is either to route toEntry through fs.GetEntry/fs.act like path_stat.go already does, or to give GetEntryInfo an opt-out for callers that don't surface metadata.
Extended reasoning...
What the bug is
GetEntryInfo (packages/shared/pkg/filesystem/entry.go:82) now unconditionally calls ReadMetadata(path), which performs path-based xattr.List/xattr.Get syscalls. Unlike every other field that GetEntryInfo populates (all derived from the passed-in fileInfo), these syscalls run in whichever mount namespace the calling goroutine is in, against whatever inode the kernel resolves path to in that namespace. That is fine when the caller has already arranged for the goroutine to be inside the right chroot, but it is wrong when the caller hands in a chroot-relative path string from the outside.
Where the call goes wrong
The orchestrator volume service does exactly that. toEntry (packages/orchestrator/pkg/volumes/service.go:136-140) calls filesystem.GetEntryInfo directly, with no fs.act wrapper:
func toEntry(fullVolumePath string, fileInfo os.FileInfo) *orchestrator.EntryInfo {
entryInfo := filesystem.GetEntryInfo(fullVolumePath, fileInfo)
return fromEntryInfo(fullVolumePath, entryInfo)
}The four callers all obtain fileInfo via fs.ReadDir/fs.Stat (which correctly enter the chroot via the pinned-thread goroutine in chrooted.Chrooted.act → mountns.Do), but then call toEntry from the outer goroutine, which lives in the host mount namespace:
- packages/orchestrator/pkg/volumes/dir_list.go:97 (per item in
listRecursive, depth up to 10) - packages/orchestrator/pkg/volumes/dir_create.go:85
- packages/orchestrator/pkg/volumes/file_create.go:115
- packages/orchestrator/pkg/volumes/path_update.go:90
Contrast with packages/orchestrator/pkg/volumes/path_stat.go:35, which uses fs.GetEntry(path). That helper wraps filesystem.GetEntryFromPath inside fs.act so the ReadMetadata syscalls run in the chrooted namespace. The exact same data on Stat versus ListDir therefore goes through two different code paths today, only one of which is correct.
Addressing the refutations
The refuters note that this is the same root cause as a separate bug_005 finding, that there is no current user-visible bug because the orchestrator EntryInfo proto (packages/shared/pkg/grpc/orchestrator/volume.pb.go) has no metadata field, and that the leak scenario requires a hypothetical future proto change. All three points are correct, and they are why I'm filing this as nit rather than normal:
fromEntryInfo(service.go:142-156) does not copyentryInfo.Metadata, and the orchestrator proto has no place to surface it. Whateverxattr.Listreturns for the host's view of the path string is silently discarded.- The leak-on-collision case (e.g.
/etc/passwdexists on both host and chroot) is currently harmless for the same reason. - The wasted-syscall framing alone is small per call — one extra
listxattrperFileInforeturned, typically failing with ENOENT in the host namespace.
What pushes this above zero-cost-to-flag is the latent footgun: the wiring in toEntry is wrong in a way that will not be visible to anyone who later adds a metadata field to the orchestrator proto because the existing fromEntryInfo → entryInfo.Metadata plumbing will appear to work end-to-end — except the xattrs it returns will be from the host, not the volume. That is the kind of mistake that escapes review precisely because the path looks symmetric to the envd side (where the goroutine running GetEntryInfo is the request goroutine inside the sandbox, so xattrs resolve correctly).
The refutation that "this is just a duplicate of bug_005" is also fine — they share a root cause. But the synthesis description here adds the structural framing (chroot/namespace mismatch + which call sites lose, contrasted with the correctly-wired path_stat.go) that a code reviewer needs in order to fix it once rather than playing whack-a-mole when the proto is extended.
Step-by-step proof for the latent leak (one-line proto change away)
- A sandbox volume is rooted at
/var/data/volumes/sandbox-Xon the host; inside the chroot the same data appears at/. - Someone adds
map<string,string> metadata = 11;to packages/shared/pkg/grpc/orchestrator/volume.proto'sEntryInfoand a one-lineMetadata: entryInfo.Metadatacopy tofromEntryInfo(mirroring what the envd-sidefilesystem.pb.goalready does). No other change. - A caller invokes
ListDir("/")on the volume service.dir_list.gorunsfs.ReadDir("/")insidefs.actand obtains correctFileInfoforfoo.txtfrom the chrooted view. - Back in the outer goroutine (host namespace),
toEntry("/foo.txt", item)callsfilesystem.GetEntryInfo→ReadMetadata("/foo.txt")→unix.Listxattr("/foo.txt", ...). The kernel resolves/foo.txtin the host namespace — usually ENOENT, but on path collision (e.g. a real/foo.txtexists on the host) it succeeds and returns the host file'suser.e2b.*xattrs. fromEntryInfocopies that map into the response. The client seesmetadataattributed to the volume entry that actually came from a host inode of the same path.
This is the future bug that the present code is one-line away from. The fix is to remove the divergence between Stat (correct via fs.GetEntry) and ListDir/MakeDir/CreateFile/UpdatePath (incorrect via direct GetEntryInfo) now, before the proto grows the field that exposes it.
How to fix
Two reasonable options:
- Rewire
toEntrythroughfs.GetEntry(which already doesfs.act(filesystem.GetEntryFromPath)). Callers would need to passfstotoEntryor split the work so the chrooted call happens inside the existingfs.actblock; the latter is simpler indir_list.go/dir_create.go/file_create.go/path_update.gobecause they already holdfs. - Add a
GetEntryInfoWithoutMetadata(or aboolargument) that skips theReadMetadatacall, and use it from the orchestrator volume service since the orchestrator proto does not surface metadata anyway. This keeps the orchestrator behavior unchanged and removes the wasted syscalls today; it would still require revisiting if the proto gains ametadatafield, but at least the silent discard is replaced with a visible opt-out.

Summary
POST /files. Any request header of the formX-Metadata-<key>: <value>is persisted as an extended attribute (xattr) on the uploaded file. TheX-Metadata-prefix is stripped and the remaining header name is lowercased to form the key. Multiple files in one multipart upload receive the same metadata.EntryInfo: the HTTP upload response, and the filesystem gRPC service (Stat,ListDir,Move,MakeDir, etc.) via a newmap<string, string> metadataproto field (field 11). The wiring is centralized infilesystem.GetEntryInfo, so all callers pick it up.user.e2b.namespace —user.is required by the Linux VFS for unprivileged xattrs, ande2b.namespaces our keys so they don't collide with other tooling writing touser.*. Foreignuser.*xattrs are filtered out and not surfaced.X-Metadata-*headers clears all existing metadata (O_TRUNCpreserves xattrs, so we always rewrite).0x20–0x7E), else HTTP 400. Keys are capped at 246 bytes (255-byte VFS xattr-name limit minus theuser.e2b.prefix); values at 1024 bytes./proc,/sys) are best-effort — the body is persisted and we log a warning rather than fail;ENOSPC/EDQUOTmap to HTTP 507. The responseEntryInforeads xattrs back from disk, so it never falsely claims metadata was persisted.Implementation
packages/shared/pkg/filesystem/xattr.go, backed by thegithub.com/pkg/xattrlibrary rather than hand-rolled syscalls. The library handles the size-probe/retry and name parsing that previously lived in our ownxattr_linux.go.ReadMetadata/WriteMetadatacarry the e2b-specific policy on top of the library:user.e2b.namespacing, full-set replace semantics, andValidateMetadata.IsXattrUnsupportedis exported soupload.gocan treat xattr-less filesystems as best-effort without duplicating the errno check.pkg/xattris cross-platform, so the metadata tests run on Linux and macOS alike (onlyextractStatTimeskeeps alinux/darwinsplit, forStat_tfield differences).0.5.28and the OpenAPI spec to0.1.3.Test plan
make lintclean for envd and sharedgo test ./...passes forpackages/envdandpackages/shared/pkg/filesystem(metadata tests now run on macOS too)-H 'X-Metadata-author: mish' -H 'X-Metadata-purpose: upload'and confirmStatreturns the same mapuser.e2b.*viagetfattr -d <path>inside a sandbox🤖 Generated with Claude Code