feat: Phase 1 video upload support (Blossom-compliant-ish) #285

Merged
tlongwell-block merged 33 commits into main from feat/video-support
Apr 10, 2026
@tlongwell-block (Collaborator) commented Apr 9, 2026

Video Upload Support — Full Stack

What

End-to-end video support for Sprout: Blossom-compliant relay upload/validation/serving + desktop client upload with ffmpeg transcode + inline video playback with poster frame thumbnails.

Relay — Upload, Validation, Serving

  • Upload: Streaming upload to S3 via temp file — incremental SHA-256, size enforcement mid-stream, magic-byte validation (4 KiB sniff buffer)
  • Validation: Codec allowlist (H.264), duration cap (600s), resolution cap (3840×2160), moov-before-mdat check with bounded atom scanner (MAX_ATOMS=1024), container brand check, zero-duration rejection, timescale=0 guard (prevents div-by-zero panic in mp4 crate)
  • Serving: S3-native range GET for 206 Partial Content (video seeking), streaming full downloads, suffix range support (bytes=-N) per RFC 9110, multi-range fallback to 200 (RFC 9110 §14.2)
  • Security: Content-Type spoofing prevention (each path rejects the other via magic bytes), auth-before-body, no internal details leaked in errors, moov scanner fails closed on budget exceeded
  • imeta: Full NIP-71 validation — duration/bitrate/image fields gated to video/mp4 only, hash cross-checks (thumb keyed by parent, poster frame independent), poster frame blob+sidecar verification with MIME/extension checks, duration cross-check against sidecar, defense-in-depth on poster hash extraction

Desktop Client — Upload + Playback

  • ffmpeg transcode: All video files (including MP4) transcoded to H.264/AAC/MP4/fast-start before upload. Handles HEVC, VP9, ProRes, non-faststart MP4, 10-bit, wrong pixel format, MOV containers
  • Poster frame extraction: After transcode, ffmpeg extracts a JPEG poster frame (-ss 1 with fallback to first frame for <1s videos, scale=640:-2, q:v 2). Poster is uploaded as a separate image blob (best-effort — failure does not block video upload). Poster URL returned in BlobDescriptor.image and emitted as NIP-71 image field in imeta tags
  • All upload paths: 📎 button (file dialog), drag-and-drop, paste — all support video with transcode + poster extraction
  • File picker: Accepts mp4, mov, mkv, webm, avi (+ existing image formats)
  • Inline playback: Videos render as <video poster={url} preload="metadata"> — browser fetches only moov atom initially, shows poster thumbnail, uses range requests for playback
  • Poster rendering on received messages: MessageRow parses imeta tags via parseImetaTags(), threads imetaByUrl map to Markdown component for poster lookup (falls back to thumb for compatibility)
  • Proxy: sprout-media:// custom protocol forwards Range headers for video seeking, propagates Content-Range/Accept-Ranges
  • Duration: End-to-end — BlobDescriptor → imeta tag builder → relay cross-check
  • Safety: spawn_blocking on all sync I/O paths, UUID temp files, RAII temp cleanup (closure guard ensures cleanup on all exit paths), OsStr path handling, find_ffmpeg() with platform-specific install instructions, ffmpeg stderr captured and logged for debugging
  • Requires: ffmpeg on PATH (not bundled)

Architecture

Desktop:
  📎/Drop/Paste → is_video_file()? → ffmpeg transcode → extract poster frame
                                   → upload video → upload poster (best-effort)
                                   → BlobDescriptor { image: poster_url }
                                   → direct upload (images, no poster)

Relay:
  PUT /media/upload → Content-Type branch →
    video/ → process_video_upload() → stream to temp → validate → put_file() to S3
    image/ → process_upload() → validate_content() → put() to S3

  GET /media/{hash}.mp4 →
    Range header? → HEAD for size → get_range() → 206 Partial Content
    No range?    → get_stream()   → Body::from_stream() → 200 OK

  Message send → validate_imeta_tags() → verify_imeta_blobs() →
    Checks: video blob exists, poster blob exists, poster is image MIME,
    poster extension matches sidecar, duration cross-check

  sprout-media:// proxy → forwards Range → propagates 200/206/416

Files Changed (25 files, ~3500 lines)

  • sprout-media: storage.rs, upload.rs, validation.rs, error.rs, config.rs, types.rs, auth.rs, lib.rs, Cargo.toml
  • sprout-relay: media.rs, messages.rs, config.rs, router.rs
  • sprout-test-client: e2e_media_video.rs (7 E2E tests including poster imeta accept/reject)
  • desktop/src-tauri: commands/media.rs (ffmpeg transcode + poster extraction), lib.rs (proxy Range forwarding)
  • desktop/src: useMediaUpload.ts, markdown.tsx, parseImeta.ts, MessageComposer.tsx, MessageRow.tsx, tauri.ts, mediaUrl.ts
  • desktop/scripts: check-file-sizes.mjs (media.rs limit bump for poster helpers)
  • scripts: test-video-upload.sh (15-case live test script including poster tests)

Test Coverage

  • 761 unit tests passing across workspace (sprout-media + sprout-relay + all crates)
  • 7 E2E integration tests (upload roundtrip, Content-Type spoofing, range 206/416, auth, video+poster imeta accepted via WS, video-as-poster rejected via WS)
  • 15-case live test script verified against running relay (including poster upload, blob coexistence, sidecar metadata, Content-Type checks)
  • Zero clippy warnings, all pre-push hooks green (biome, rustfmt, cargo check, desktop build)

Review Scores (after crossfire fix iterations)

  • Opus (Claude 4.6 Opus): 9/10 APPROVE_WITH_NOTES — all prior findings addressed, remaining notes are low-severity (proxy streaming deferred to v2)
  • Codex (GPT-5.4): 9/10 APPROVE — iterated through 4 review passes total; all findings addressed including scoped auth, TOCTOU fd-pinning, proxy OOM cap, ffmpeg timeout + stderr deadlock prevention
  • No merge blockers

Fix Commits (crossfire follow-ups)

  • c06e4f2 — scoped auth window (600s images / 3600s video), TOCTOU fd-pinning restored, proxy 20 MiB OOM cap, body-limit error robustness, client auth expiry scoped
  • 34a6f7b — ffmpeg wall-clock timeout (10min transcode, 30s poster extraction)
  • 3745637: -loglevel error on all ffmpeg calls to prevent stderr pipe deadlock in timeout wrapper

No Database Changes

Zero migrations, schema changes, or new tables. Video metadata stored in S3 sidecars (same pattern as images). Poster frames are independent content-addressed blobs linked only through the imeta tag.

V2 Roadmap

  • Smart skip/remux via ffprobe (~100x faster for compliant files)
  • Progress bar for ffmpeg transcode
  • Streaming upload from desktop (avoid RAM buffering for large files)
  • Streaming proxy response (avoid desktop RAM buffering on playback)
  • Bundled ffmpeg (zero-friction install)
  • Per-pubkey upload rate limiting + storage quotas
  • Cancellation for long transcodes
  • Blurhash generation for poster frames

…r tests

messages.rs:
- Add image_value tracking and hash cross-check against x field
  (same pattern as thumb — NIP-71 poster frame must reference same blob)
- image field already rejects .mp4 URLs (image-only extensions)
- 2 new tests: hash mismatch rejection, matching hash acceptance

validation.rs:
- Add MAX_ATOMS=1024 iteration limit to check_moov_before_mdat()
  (prevents DoS from crafted files with millions of tiny atoms)
- Handle extended atom size (compact_size==1): read 64-bit size and continue
  scanning instead of silently stopping
- Handle atom_size==0 (extends-to-EOF): check mdat before breaking
- 4 new tests: iteration limit, extended size, extended mdat-before-moov,
  EOF atom mdat-before-moov

upload.rs:
- build_descriptor already correctly filters empty strings to None
  (no code change needed — added 3 tests proving it)
- Tests verify JSON serialization omits empty thumb/blurhash for video

Duration validation now rejects d <= 0.0 instead of d < 0.0.
Zero-duration videos are semantically invalid — server-side
validate_video_file() also catches this via mvhd timescale,
but belt-and-suspenders at the imeta layer is cheap and safe.

Addresses Clove's re-review item #3 (low severity).

- get_range(key, start, end): S3-native range GET via bucket.get_object_range(),
  inclusive byte offsets, only transfers requested slice (never loads full blob)
- put_file(key, path, content_type): streaming upload from disk via 8 MiB BufReader,
  full file never held in RAM simultaneously
- duration_secs field on BlobMeta for video sidecar metadata
- Improved doc comments on put() method

config.rs:
- Add SPROUT_MAX_VIDEO_BYTES env var parsing (default 500 MB)
- Wires sprout-media's max_video_bytes into the relay Config

router.rs:
- Change media body limit from max_image_bytes to max(max_image_bytes, max_video_bytes)
- Ensures video uploads aren't rejected at the transport layer
- Per-MIME app-level limits still enforced in sprout-media validation

Replace full-blob load + in-memory slice with:
- HEAD to get total size (no blob data loaded)
- get_range(key, start, end) for the 206 path only
- get(key) preserved for 200 full-download path

Eliminates O(blob_size) RAM allocation per range request.
A 500 MB video range request now allocates at most 16 MiB.

Also includes rustfmt cleanup on pre-existing lines.

Closes C3 from code review.

Before: check_moov_before_mdat() returned Ok(()) when MAX_ATOMS was
exceeded, silently passing files with 1025+ junk atoms hiding mdat.

After: returns Err(MoovNotAtFront) — fail closed. A file with too many
top-level atoms is abnormal and cannot be verified as fast-start.

Updated test to assert the error instead of Ok.
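
The fail-closed scan described above can be sketched roughly as follows. This is a minimal illustration over an in-memory byte slice, not the code from validation.rs: `ScanError` and `be_uint` are hypothetical names, and treating a missing moov or a truncated atom as `MoovNotAtFront` is an assumption of this sketch.

```rust
/// Hypothetical stand-in for the PR's `MediaError::MoovNotAtFront`.
#[derive(Debug, PartialEq)]
enum ScanError {
    MoovNotAtFront,
}

/// Iteration budget named in the PR; exceeding it fails closed.
const MAX_ATOMS: usize = 1024;

/// Reads a big-endian unsigned integer from a 4- or 8-byte slice.
fn be_uint(bytes: &[u8]) -> u64 {
    bytes.iter().fold(0u64, |acc, &b| (acc << 8) | u64::from(b))
}

/// Walks top-level atoms and succeeds only if `moov` is seen before `mdat`.
fn check_moov_before_mdat(data: &[u8]) -> Result<(), ScanError> {
    let mut pos: usize = 0;
    for _ in 0..MAX_ATOMS {
        let remaining = &data[pos..];
        // Every atom needs a 4-byte size and a 4-byte type.
        if remaining.len() < 8 {
            return Err(ScanError::MoovNotAtFront);
        }
        let kind = &remaining[4..8];
        if kind == b"moov" {
            return Ok(());
        }
        if kind == b"mdat" {
            return Err(ScanError::MoovNotAtFront);
        }
        let (size, header_len) = match be_uint(&remaining[..4]) {
            // size == 0: atom extends to EOF, so no moov can follow it.
            0 => return Err(ScanError::MoovNotAtFront),
            // size == 1: the real size is a 64-bit value after the type.
            1 => {
                if remaining.len() < 16 {
                    return Err(ScanError::MoovNotAtFront);
                }
                (be_uint(&remaining[8..16]), 16u64)
            }
            n => (n, 8u64),
        };
        // Malformed or truncated atom (assumption: treat as failure).
        if size < header_len || size > remaining.len() as u64 {
            return Err(ScanError::MoovNotAtFront);
        }
        pos += size as usize;
    }
    // Budget exhausted without seeing moov: fail closed.
    Err(ScanError::MoovNotAtFront)
}

/// Test helper: builds an atom with a compact 32-bit size and zeroed payload.
fn atom(kind: &[u8; 4], payload_len: usize) -> Vec<u8> {
    let mut out = ((8 + payload_len) as u32).to_be_bytes().to_vec();
    out.extend_from_slice(kind);
    out.extend(std::iter::repeat(0u8).take(payload_len));
    out
}
```
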
The poster frame (image field) is an independent blob with its own
content hash — it cannot match the video's x hash. The cross-check
rejected all legitimate poster frames by construction.

Fix: remove the cross-check entirely. Keep URL format validation and
image-extension allowlist (jpg/png/gif/webp). The poster frame is
validated as a local media URL with an image extension only.

Also: update thumb cross-check comment to clarify it checks URL key
consistency (thumbnails are keyed by parent hash), not content identity.

Removed: image_value variable, hash cross-check block, 2 obsolete tests.
Updated: poster frame test now uses different hash to prove independence.

Remove video/mp4 from ALLOWED_MIME_TYPES in validate_content(). This
closes the Content-Type spoofing attack: an MP4 uploaded as image/jpeg
now hits the image path, infer::get() detects video/mp4, and
validate_content() rejects it as DisallowedContentType.

Video uploads use process_video_upload() which has its own independent
magic-byte check. Each path rejects the other's content — defense in depth.

Also removes dead video/mp4 branches in validate_content() (size cap,
image bomb skip) since video/mp4 can no longer reach that code.
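
As a rough illustration of how each path can reject the other's content, the sketch below uses a hand-rolled sniffer in place of `infer::get()`. The `Sniffed` and `UploadError` types and both validator names are hypothetical, and only JPEG, PNG, and the MP4 `ftyp` box are recognized here.

```rust
/// What the first bytes of a body look like (simplified stand-in for `infer`).
#[derive(Debug, PartialEq, Clone, Copy)]
enum Sniffed {
    Jpeg,
    Png,
    Mp4,
    Unknown,
}

/// Hypothetical error type; the PR names the variant `DisallowedContentType`.
#[derive(Debug, PartialEq)]
enum UploadError {
    DisallowedContentType,
}

/// Classifies a body by its magic bytes, ignoring the declared Content-Type.
fn sniff(head: &[u8]) -> Sniffed {
    if head.starts_with(&[0xFF, 0xD8, 0xFF]) {
        Sniffed::Jpeg
    } else if head.starts_with(b"\x89PNG\r\n\x1a\n") {
        Sniffed::Png
    } else if head.len() >= 8 && &head[4..8] == b"ftyp" {
        Sniffed::Mp4
    } else {
        Sniffed::Unknown
    }
}

/// Image path: accepts image magic only, so an MP4 sent as image/jpeg fails.
fn validate_image_body(head: &[u8]) -> Result<Sniffed, UploadError> {
    match sniff(head) {
        s @ Sniffed::Jpeg | s @ Sniffed::Png => Ok(s),
        _ => Err(UploadError::DisallowedContentType),
    }
}

/// Video path: accepts MP4 magic only, so a PNG sent as video/mp4 fails.
fn validate_video_body(head: &[u8]) -> Result<(), UploadError> {
    match sniff(head) {
        Sniffed::Mp4 => Ok(()),
        _ => Err(UploadError::DisallowedContentType),
    }
}
```
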
Closes the contract mismatch between validate_video_file() (accepted
duration <= 0.0) and validate_imeta_tags() (rejected duration <= 0.0).
A zero-duration video would pass upload validation but later fail imeta
validation — inconsistent behavior.

Now both paths agree: duration must be > 0.0 and <= 600.0.
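
One way to keep the two validators from drifting apart again is a single shared predicate. The sketch below is an assumption about how that could look; `MAX_DURATION_SECS` and `duration_is_valid` are hypothetical names, and rejecting NaN and infinite values is an extra guard not stated in the commit.

```rust
/// Duration cap from the PR description (600 s).
const MAX_DURATION_SECS: f64 = 600.0;

/// True when a duration is finite, strictly positive, and within the cap.
fn duration_is_valid(secs: f64) -> bool {
    secs.is_finite() && secs > 0.0 && secs <= MAX_DURATION_SECS
}
```
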
- get_stream(key): returns ByteStream (Pin<Box<dyn Stream<Item = Result<Bytes, MediaError>>>>)
- Wraps bucket.get_object_stream(), checks status_code for 404, maps S3 errors
- Full object never buffered in RAM — intended for Body::from_stream() responses
- ByteStream type alias exported from lib.rs for downstream use

When axum's RequestBodyLimitLayer rejects an oversized stream, the error
propagates as a 'length limit' error through the body stream. Previously
this was mapped to MediaError::Io → 500 Internal Server Error.

Now: detect 'length limit' / 'body limit' in the stream error message,
map to io::ErrorKind::WriteZero, catch in the read loop, and return
MediaError::FileTooLarge → 413 Payload Too Large.

This gives clients a proper 413 response instead of a confusing 500.
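
The error mapping described above might look roughly like this. It is a sketch with a simplified, hypothetical `MediaError` (the real one has more variants) and hypothetical function names; only the matched substrings, the `WriteZero` kind, and the 413/500 outcomes come from the commit message.

```rust
use std::io;

/// Simplified stand-in for the crate's error type.
#[derive(Debug, PartialEq)]
enum MediaError {
    FileTooLarge { size: u64 },
    Io(String),
}

/// Converts a body-stream error message into an io::Error, tagging
/// body-limit rejections with WriteZero so the read loop can spot them.
/// String matching like this is fragile and depends on upstream wording.
fn map_stream_error(msg: &str) -> io::Error {
    let lower = msg.to_lowercase();
    if lower.contains("length limit") || lower.contains("body limit") {
        io::Error::new(io::ErrorKind::WriteZero, msg.to_string())
    } else {
        io::Error::new(io::ErrorKind::Other, msg.to_string())
    }
}

/// Inside the read loop: turn the tagged error into FileTooLarge,
/// reporting how many bytes had arrived before the cutoff.
fn classify(err: &io::Error, bytes_received: u64) -> MediaError {
    if err.kind() == io::ErrorKind::WriteZero {
        MediaError::FileTooLarge { size: bytes_received }
    } else {
        MediaError::Io(err.to_string())
    }
}

/// HTTP status a handler would return for each error.
fn status_code(err: &MediaError) -> u16 {
    match err {
        MediaError::FileTooLarge { .. } => 413,
        MediaError::Io(_) => 500,
    }
}
```
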
New test file: e2e_media_video.rs with 5 integration tests:
1. test_video_upload_and_get — upload MP4, verify descriptor + GET
2. test_video_content_type_spoofing_rejected — MP4 as image/jpeg → rejected
3. test_video_range_request_206 — Range header → 206 + correct bytes
4. test_video_range_request_416 — out-of-range → 416
5. test_video_upload_no_auth_returns_401 — no auth → 401

Includes self-contained minimal MP4 builder (hand-crafted H.264 boxes).
Tests are #[ignore]: they require a running relay + MinIO.

Upload: swap Bytes extractor for axum::body::Body. Video path
streams directly to disk via into_data_stream() — never fully
buffered in RAM. Image path collects to bytes with explicit limit.
Removes futures_util::stream::once() workaround.

Download: 200 path uses get_stream() + Body::from_stream() instead
of get() — streams from S3 without loading full blob into RAM.
HEAD first for Content-Length (same pattern as 206 path).

Cleanup: remove stale streaming TODOs from media.rs, update
router.rs comment to reflect streaming reality.

Add image field HEAD check to verify_imeta_blobs, same pattern as thumb.
Key difference: poster frames are independent blobs, so the hash is
extracted from the image URL itself (via extract_hash_from_media_url),
not from x_value.

This closes the gap where clients could reference nonexistent poster
images in imeta tags and the relay would accept them.

Note: unit testing requires MediaStorage (S3 HEAD). Covered by E2E
tests in e2e_media_extended.rs (WebSocket imeta validation).

Add suffix range parsing to parse_byte_range(): bytes=-N returns
the last N bytes. Clamps to file start if N > total. Rejects
bytes=-0 and suffix on empty files.

4 new/updated tests. Removes known-deviation comment.
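
A minimal sketch of the range rules above, assuming a three-way outcome type. `RangeOutcome` and `parse_u64` are hypothetical, the real parse_byte_range() signature is not shown in this PR text, and returning 416 for syntactically malformed headers is this sketch's choice. The multi-range branch reflects the 200 fallback described in the PR summary.

```rust
/// What the handler should do with a Range header (hypothetical type).
#[derive(Debug, PartialEq)]
enum RangeOutcome {
    /// Serve 206 with inclusive byte offsets.
    Partial { start: u64, end: u64 },
    /// Ignore the header and serve 200 with the full body.
    Full,
    /// Serve 416 Range Not Satisfiable.
    Unsatisfiable,
}

/// Parses a non-empty run of ASCII digits (rejects signs and whitespace).
fn parse_u64(s: &str) -> Option<u64> {
    if s.is_empty() || !s.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    s.parse().ok()
}

fn parse_byte_range(header: &str, total: u64) -> RangeOutcome {
    // Unknown range units are ignored.
    let spec = match header.trim().strip_prefix("bytes=") {
        Some(s) => s.trim(),
        None => return RangeOutcome::Full,
    };
    // Multi-range is unsupported: fall back to a full 200 response.
    if spec.contains(',') {
        return RangeOutcome::Full;
    }
    let (first, last) = match spec.split_once('-') {
        Some(parts) => parts,
        None => return RangeOutcome::Unsatisfiable,
    };
    if first.is_empty() {
        // Suffix form bytes=-N: the last N bytes, clamped to the file start.
        return match parse_u64(last) {
            Some(n) if n > 0 && total > 0 => RangeOutcome::Partial {
                start: total.saturating_sub(n),
                end: total - 1,
            },
            _ => RangeOutcome::Unsatisfiable,
        };
    }
    let start = match parse_u64(first) {
        Some(s) if s < total => s,
        _ => return RangeOutcome::Unsatisfiable,
    };
    let end = if last.is_empty() {
        total - 1 // open-ended form bytes=N-
    } else {
        match parse_u64(last) {
            Some(e) => e.min(total - 1),
            None => return RangeOutcome::Unsatisfiable,
        }
    };
    if end < start {
        return RangeOutcome::Unsatisfiable;
    }
    RangeOutcome::Partial { start, end }
}
```
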
The first network read could be as small as 1 byte (proxy fragmentation),
which is too small for infer::get() to detect MP4 magic bytes (needs 12+
bytes for ftyp header). Previously we captured only the first chunk.

Now: accumulate up to 64 bytes across reads into a sniff buffer before
passing to infer::get(). This handles tiny initial chunks from proxies,
slow clients, or chunked transfer encoding.

4 KiB is the standard sniff buffer size: infer checks signatures at
various offsets, not just the first few bytes. 64 was sufficient for
MP4 ftyp but too small for robust format detection in general.
Per Hana's architecture recommendation.
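
The accumulation idea can be sketched with a synchronous reader. The relay itself reads an async body stream, so this is only an analogy: `fill_sniff_buffer` and `OneByteAtATime` are hypothetical names, and a real upload would also feed the sniffed bytes to the hasher and temp file.

```rust
use std::io::{self, Read};

/// Sniff buffer size from the PR (4 KiB).
const SNIFF_LEN: usize = 4096;

/// Keeps reading until SNIFF_LEN bytes are buffered or the source ends,
/// so a tiny first chunk cannot starve magic-byte detection.
fn fill_sniff_buffer<R: Read>(src: &mut R) -> io::Result<Vec<u8>> {
    let mut buf = vec![0u8; SNIFF_LEN];
    let mut filled = 0;
    while filled < SNIFF_LEN {
        match src.read(&mut buf[filled..]) {
            Ok(0) => break, // EOF before the buffer filled up
            Ok(n) => filled += n,
            Err(ref e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    buf.truncate(filled);
    Ok(buf)
}

/// Test reader that simulates proxy fragmentation: one byte per read call.
struct OneByteAtATime<'a>(&'a [u8]);

impl<'a> Read for OneByteAtATime<'a> {
    fn read(&mut self, out: &mut [u8]) -> io::Result<usize> {
        if self.0.is_empty() || out.is_empty() {
            return Ok(0);
        }
        out[0] = self.0[0];
        self.0 = &self.0[1..];
        Ok(1)
    }
}
```
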
Covers the full Blossom video upload flow:
- Upload MP4 with kind:24242 auth (nak + ffmpeg-generated test file)
- GET full blob (200, size match)
- HEAD with Accept-Ranges: bytes
- Range GET (206 Partial Content, exact byte count)
- Range GET past EOF (416 Range Not Satisfiable)
- Content-Type spoofing rejection (video/mp4 header, PNG body)
- Idempotent re-upload (same hash returns 200)

Requires: ffmpeg, nak, curl, jq, shasum. Works in dev mode.

Add dependencies: mp4, tempfile, tokio-util, futures-util, futures-core
Add video error variants: WrongCodec, DurationTooLong, ResolutionTooHigh,
  MoovNotAtFront, UnsupportedContainer, InvalidVideo, Io
Add max_video_bytes config field (default 500 MB)
Add duration field to BlobDescriptor
Bump Blossom auth window from 10min to 1hr for large uploads

messages.rs:
- Poster frame verification now loads sidecar (proves upload completed)
- Verifies sidecar MIME is image type (not video/other)
- HEADs canonical blob key using sidecar extension (matches serving path)
- Without sidecar check, a poster URL could pass verification but 404 on serve

e2e_media_video.rs:
- Add X-SHA-256 header to all 4 authenticated upload requests (BUD-11)
- Without this header, uploads would get 401 instead of testing the feature
…_blobs

1. Poster frame extension: extract ext from image URL, compare against
   sidecar's canonical extension. Mismatch means the URL would 404 on
   serve (GET resolves via sidecar ext, not URL ext).

2. Duration cross-check: if sidecar has duration_secs and client claims
   a duration in imeta, compare within 0.1s tolerance (float rounding
   from mvhd timescale). Prevents clients from lying about duration.
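
Both checks above reduce to small pure functions. The sketch below assumes the claimed and sidecar values are already parsed; all names are hypothetical, and the URL handling ignores query strings and fragments for brevity.

```rust
/// Allowed drift between the sidecar duration and the client's claim.
const DURATION_TOLERANCE_SECS: f64 = 0.1;

/// The cross-check only applies when both values are present.
fn duration_matches(sidecar: Option<f64>, claimed: Option<f64>) -> bool {
    match (sidecar, claimed) {
        (Some(s), Some(c)) => (s - c).abs() <= DURATION_TOLERANCE_SECS,
        _ => true,
    }
}

/// Extracts the extension from the last path segment of a URL, if any.
fn url_extension(url: &str) -> Option<&str> {
    let name = url.rsplit('/').next()?;
    let (_, ext) = name.rsplit_once('.')?;
    Some(ext)
}

/// A poster URL is servable only if its extension equals the sidecar's.
fn poster_extension_matches(url: &str, sidecar_ext: &str) -> bool {
    url_extension(url) == Some(sidecar_ext)
}
```
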
… poster defense-in-depth

- Reject video-only NIP-71 fields (duration, bitrate, image) on non-video
  imeta tags — previously accepted silently for image blobs
- Fall back to 200 full-body response for unsupported multi-range requests
  instead of returning 416 (per RFC 9110 §14.2: server MAY ignore Range)
- Return error instead of silently skipping poster frame verification when
  hash extraction fails (defense-in-depth; syntactic validation catches
  this upstream, but fail-closed is safer)
- Drop temp file immediately after S3 upload to free disk space eagerly
  instead of waiting for function return (matters for 500MB uploads)
- Add FRAGILE marker on body-limit string-matching error detection
- Add tests: .thumb.jpg rejected as poster frame, duration on image rejected
…line playback

Desktop client now supports uploading video files through all entry points:

📎 Button (pick_and_upload_media):
- File picker accepts mp4, mov, mkv, webm, avi (+ existing image formats)
- All video files transcoded to H.264/AAC/MP4/fast-start via ffmpeg
- Sniff magic bytes → transcode if video → upload
- All sync I/O in spawn_blocking to avoid async runtime starvation

Drag-and-drop / Paste (upload_media_bytes):
- Video bytes written to temp file → ffmpeg transcode → upload
- Accepts video/mp4, video/quicktime, video/x-matroska, video/webm, video/x-msvideo
- Temp files cleaned up after upload (or on failure)

Rendering:
- Videos use ![video](url) markdown syntax
- Markdown renderer detects .mp4 URLs → renders <video> with controls
- Images continue to render as <img> as before

Infrastructure:
- find_ffmpeg() distinguishes NotFound vs broken install vs other errors
- UUID-based temp file names (no collision under concurrent uploads)
- OsStr path passing to ffmpeg (handles non-UTF-8 paths on Unix)
- BlobDescriptor gains duration field (Tauri + TS types)
- imeta tag builder includes duration for video
- parseImeta parses duration from incoming tags

Requires ffmpeg on PATH. Clear error message with install instructions if missing.

TODO(v2): smart skip/remux via ffprobe, progress bar, streaming upload,
cancellation, bundled ffmpeg.

- Extract JPEG poster frame from transcoded MP4 via ffmpeg (-ss 1,
  fallback to first frame for <1s videos, scale=640:-2, q:v 2)
- Upload video first, then poster as separate image blob (best-effort:
  poster failure does not block video upload)
- Return poster URL in BlobDescriptor.image field
- Emit NIP-71 `image` field in imeta tags (server already validates it)
- Render with <video poster={url} preload="metadata"> — browser fetches
  only moov atom initially, uses range requests for playback
- Thread imetaByUrl from MessageRow through Markdown for received messages
- Parse `image` field in parseImetaTags for poster lookup on render
- Guard timescale=0 in MP4 validation (prevents div-by-zero panic in mp4 crate)
- Add E2E tests: video+poster imeta accepted via WS, video-as-poster rejected
- Extend test-video-upload.sh with poster upload, blob coexistence, sidecar checks
- Bump media.rs file size limit (550→650) for poster extraction helpers
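
For reference, the poster extraction arguments listed above could be assembled as in this sketch. `poster_args` is a hypothetical helper; `-ss 1`, `scale=640:-2`, and `-q:v 2` come from the commit message, while `-y` and `-frames:v 1` are assumptions about what a single-frame extraction needs. Paths stay as `OsStr` so non-UTF-8 names survive.

```rust
use std::ffi::{OsStr, OsString};

/// Builds ffmpeg arguments for a single JPEG poster frame.
/// `seek_secs` is Some(1) for the normal attempt and None for the
/// first-frame fallback used on clips shorter than one second.
fn poster_args(input: &OsStr, output: &OsStr, seek_secs: Option<u32>) -> Vec<OsString> {
    let mut args: Vec<OsString> = Vec::new();
    // Quiet logging and overwrite without prompting.
    args.extend(["-loglevel", "error", "-y"].iter().map(|s| OsString::from(*s)));
    if let Some(secs) = seek_secs {
        args.push(OsString::from("-ss"));
        args.push(OsString::from(secs.to_string()));
    }
    args.push(OsString::from("-i"));
    args.push(input.to_os_string());
    // One frame, 640 px wide with an even height, high JPEG quality.
    args.extend(
        ["-frames:v", "1", "-vf", "scale=640:-2", "-q:v", "2"]
            .iter()
            .map(|s| OsString::from(*s)),
    );
    args.push(output.to_os_string());
    args
}
```
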
… cap, error robustness

Server-side:
- auth: verify_blossom_auth_event takes max_age_secs parameter;
  images use 600s (10 min), video uses 3600s (1 hr). Previously all
  uploads shared the 1-hour window.
- upload: body-limit error detection adds LengthLimitError pattern
  for belt-and-suspenders robustness. FileTooLarge.size reports honest
  bytes-received-before-cutoff instead of nonsensical total+max sum.
- relay: AuthenticatedUpload extractor uses permissive 3600s window
  (content type unknown at extraction time); upload functions re-verify
  with the correct per-type window after body consumption.

Desktop:
- media: pick_and_upload_media restores TOCTOU-safe fd-pinning. File
  opened before spawn_blocking to pin inode; sniff header read from
  pinned fd; video path resolves fd_real_path for ffmpeg. Fd kept alive
  through entire ffmpeg transcode (drop only after completion).
- media: sign_blossom_upload_auth takes expiry_secs; do_upload derives
  it from MIME (3600s video, 300s images). Previously all uploads used
  300s, so video uploads >5 min would fail with expired auth.
- lib: proxy adds 20 MiB OOM defense cap for non-range GETs. Range
  requests (≤16 MiB from server) unaffected.
- media: enhanced TODO on do_upload with streaming fix guidance for v2.
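
The server-side two-phase auth window can be sketched as below. The function names are hypothetical and the real verify_blossom_auth_event checks much more than age. Rejecting events dated in the future is an assumption of this sketch.

```rust
/// Window used by the extractor before the content type is known.
const PERMISSIVE_WINDOW_SECS: u64 = 3600;

/// Per-type window from the PR: 1 hour for video, 10 minutes otherwise.
fn max_age_secs_for(mime: &str) -> u64 {
    if mime.starts_with("video/") {
        3600
    } else {
        600
    }
}

/// True when the auth event is not from the future and not older than the window.
fn auth_is_fresh(created_at: u64, now: u64, max_age_secs: u64) -> bool {
    created_at <= now && now - created_at <= max_age_secs
}

/// Phase 1 runs at extraction time; phase 2 re-verifies once the MIME is known.
fn verify_two_phase(created_at: u64, now: u64, mime: &str) -> bool {
    auth_is_fresh(created_at, now, PERMISSIVE_WINDOW_SECS)
        && auth_is_fresh(created_at, now, max_age_secs_for(mime))
}
```
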
@wesbillman (Collaborator) left a comment

SICK!

- New run_ffmpeg_with_timeout() helper: spawns child, polls try_wait()
  every 500ms, kills the process if the deadline is exceeded.
- Transcode: 10-minute timeout (FFMPEG_TIMEOUT). Generous for any
  reasonable video; pathological inputs get killed instead of blocking
  a Tokio worker thread indefinitely.
- Poster extraction: 30-second timeout. Single-frame decode should
  complete in seconds.
- All three ffmpeg invocations (transcode, poster seek-to-1s, poster
  fallback) now use the timeout wrapper.

Add -loglevel error to all three ffmpeg invocations (transcode, poster
seek-to-1s, poster fallback). Without this, ffmpeg's progress and
diagnostic output can fill the OS pipe buffer (~64 KiB), causing the
child to block on write() and never exit. The timeout wrapper only
reads stderr after exit, so a full pipe creates a deadlock that
manifests as a false timeout after 10 minutes.

-loglevel error suppresses progress spam while preserving actual error
messages (which are small and won't fill the buffer). Added a doc
comment on run_ffmpeg_with_timeout explaining the constraint.
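
A polling timeout wrapper along the lines described above might look like this. It is a sketch using only the standard library: `run_with_timeout` and `RunError` are hypothetical names, not the PR's run_ffmpeg_with_timeout(). Because stderr is read only after the child exits, the child must stay quiet enough not to fill the pipe, which is the constraint that `-loglevel error` addresses.

```rust
use std::io::Read;
use std::process::{Command, ExitStatus, Stdio};
use std::thread;
use std::time::{Duration, Instant};

/// Poll interval from the commit message.
const POLL_INTERVAL: Duration = Duration::from_millis(500);

#[derive(Debug)]
enum RunError {
    Io(std::io::Error),
    TimedOut,
    Failed { status: ExitStatus, stderr: String },
}

/// Spawns the command, polls try_wait() until it exits or the deadline
/// passes, and kills the child on timeout.
fn run_with_timeout(cmd: &mut Command, timeout: Duration) -> Result<(), RunError> {
    let mut child = cmd
        .stdin(Stdio::null())
        .stdout(Stdio::null())
        .stderr(Stdio::piped())
        .spawn()
        .map_err(RunError::Io)?;
    let deadline = Instant::now() + timeout;
    let status = loop {
        match child.try_wait().map_err(RunError::Io)? {
            Some(status) => break status,
            None if Instant::now() >= deadline => {
                let _ = child.kill();
                let _ = child.wait(); // reap the killed process
                return Err(RunError::TimedOut);
            }
            None => thread::sleep(POLL_INTERVAL),
        }
    };
    // Safe only if the child wrote less than the pipe buffer holds.
    let mut stderr = String::new();
    if let Some(mut pipe) = child.stderr.take() {
        let _ = pipe.read_to_string(&mut stderr);
    }
    if status.success() {
        Ok(())
    } else {
        Err(RunError::Failed { status, stderr })
    }
}
```

A caller would build the command with `Command::new("ffmpeg")` plus `-loglevel error` and pass a 10-minute or 30-second timeout, matching the limits above.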
@tlongwell-block changed the title from "feat: Phase 1 video upload support (Blossom-compliant)" to "feat: Phase 1 video upload support (Blossom-compliant-ish)" on Apr 10, 2026
@tlongwell-block merged commit 61efb88 into main on Apr 10, 2026
12 of 13 checks passed
@tlongwell-block deleted the feat/video-support branch on April 10, 2026 16:23
tlongwell-block added a commit that referenced this pull request on Apr 11, 2026
…ona-migration

* origin/main:
  feat(desktop): add Pulse social notes surface (#296)
  Fix flaky desktop smoke tests (#294)
  Add agent lifecycle controls to channel members sidebar (#291)
  Update nest_agents.md tagging info (#292)
  feat: add Sprout nest — persistent agent workspace at ~/.sprout (#290)
  Fix auth and SSRF vulns (#261)
  Add per-agent MCP toolset configuration to agent setup (#279)
  feat(desktop): team & persona import/edit flows (#288)
  Remove menu item subtitles and fix persona card overflow (#289)
  feat: Phase 1 video upload support (Blossom-compliant-ish) (#285)
  Add inline subtitles to menu items and field descriptions (#276)
  Improve ephemeral channel affordances and hide archived sidebar rows (#286)
  Fix @mention search to use word-boundary prefix matching (#278)
  Allow bot owners to remove their agents from any channel (#284)
  [codex] Polish agent selectors and settings layout (#283)

# Conflicts:
#	desktop/scripts/check-file-sizes.mjs