
Reduce private cloud sync costs: Opus encoding + batch GCS uploads + Filestore spool #5418

@beastoin

Description


Problem

Private cloud audio sync uploads raw PCM16 chunks (80KB each) directly to GCS every 5 seconds per active session. At current scale (200 concurrent sessions), this generates 3.5M GCS Class A write ops/day (~$518/month). At 100x scale this becomes $51,840/month in ops alone, and the approach doesn't scale further.
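For reference, the arithmetic behind those figures, assuming the standard ~$0.005 per 1,000 Class A operations (verify against current GCS pricing):

```python
# Back-of-the-envelope check of the quoted numbers.
sessions = 200
writes_per_day = sessions * 86_400 // 5            # one write per 5 s per session
monthly_cost = writes_per_day * 30 * 0.005 / 1_000 # ~$0.005 per 1,000 Class A ops
print(writes_per_day, round(monthly_cost))         # 3456000 518  -> ~3.5M ops/day, ~$518/mo
```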

Root Causes

  1. No compression: Raw PCM16 at 16KB/s stored as-is (or encrypted). Opus encoding would give ~10x reduction in storage and bandwidth.
  2. Per-chunk GCS writes: Every 5-second chunk = 1 GCS Class A write op. No batching.
  3. In-memory only queue: Audio chunks are buffered in memory. Pod crash = data loss.

Proposed Solution (Phased)

Phase 1: Opus Encoding (Immediate Win)

  • Add opuslib.Encoder(sample_rate, 1, APPLICATION_VOIP) in the pusher (see the sketch after this list)
  • Encode before encryption: PCM → Opus → Encrypt
  • ~10x storage reduction: 80KB → ~8KB per 5-second chunk
  • CPU cost: ~0.3-0.8% vCPU per session (32.5ms per 5s chunk)
  • New extensions: .opus (standard) / .opus.enc (enhanced)
  • Update storage.py list/delete/download to support new extensions
  • Update merge_conversations.py extension handling
  • Update conversations.py duration math (don't assume fixed +5s)
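A minimal sketch of the Phase 1 encode path, assuming opuslib with 20 ms frames and a simple length-prefixed packet framing; the frame size and framing format here are illustrative choices, not the project's actual on-disk layout:

```python
# Hedged sketch: encode one 5-second PCM16 chunk to Opus packets before encryption.
import struct
import opuslib

SAMPLE_RATE = 16000                              # PCM16 mono at 16 kHz (16 KB/s)
FRAME_MS = 20                                    # Opus frame duration
FRAME_SAMPLES = SAMPLE_RATE * FRAME_MS // 1000   # 320 samples per frame
FRAME_BYTES = FRAME_SAMPLES * 2                  # 640 bytes of PCM16 per frame

encoder = opuslib.Encoder(SAMPLE_RATE, 1, opuslib.APPLICATION_VOIP)

def encode_chunk(pcm: bytes) -> bytes:
    """Encode a raw PCM16 chunk into length-prefixed Opus packets (trailing partial frame dropped)."""
    out = bytearray()
    for off in range(0, len(pcm) - FRAME_BYTES + 1, FRAME_BYTES):
        packet = encoder.encode(pcm[off:off + FRAME_BYTES], FRAME_SAMPLES)
        out += struct.pack(">H", len(packet)) + packet   # 2-byte length prefix per packet
    return bytes(out)
```

Encryption would then run over the returned bytes, so the existing encrypt API stays unchanged, only its input shrinks ~10x.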

Phase 2: 60-Second Batch Upload

  • Accumulate 60s of chunks before a single GCS upload (12x fewer write ops); see the batching sketch after this list
  • Flush triggers: max_age=60s, max_bytes, conversation end, pod shutdown
  • Run batch sync in dedicated worker deployment (not on pusher hot path)
  • GCS upload idempotency via deterministic object names
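A rough sketch of the accumulate-and-flush logic, assuming the google-cloud-storage client; the ChunkBatcher class, its field names, and the object-naming scheme are hypothetical:

```python
# Hedged sketch of a 60-second batch accumulator with size/age flush triggers.
import time
from google.cloud import storage

class ChunkBatcher:
    def __init__(self, bucket: str, session_id: str,
                 max_age_s: float = 60.0, max_bytes: int = 1 << 20):
        self.bucket = storage.Client().bucket(bucket)
        self.session_id = session_id
        self.max_age_s = max_age_s
        self.max_bytes = max_bytes
        self.buf = bytearray()
        self.started_at = None
        self.batch_index = 0

    def add(self, chunk: bytes):
        if self.started_at is None:
            self.started_at = time.monotonic()
        self.buf += chunk
        if (len(self.buf) >= self.max_bytes
                or time.monotonic() - self.started_at >= self.max_age_s):
            self.flush()

    def flush(self):
        if not self.buf:
            return
        # Deterministic object name: a retried upload overwrites instead of duplicating.
        name = f"{self.session_id}/batch_{self.batch_index:06d}.opus.enc"
        self.bucket.blob(name).upload_from_string(bytes(self.buf))
        self.buf.clear()
        self.started_at = None
        self.batch_index += 1
```

Conversation end and pod shutdown would call flush() explicitly; the deterministic batch name is what makes retried uploads idempotent.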

Phase 3: Filestore Spool (If Durability SLO Requires)

  • Write chunks to Filestore NFS mount instead of memory
  • State machine: .open → .ready → .uploading → .done (see the sketch after this list)
  • Atomic rename + lease timeout for crash recovery
  • Filestore tier selection:
    • 1x: Basic HDD 100GiB on GKE (~$49/mo)
    • 100x: Basic SSD or Zonal (~$768/mo)
    • 10,000x: Must shard across multiple instances
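A sketch of the spool-file state machine using atomic renames on the NFS mount; the paths, suffixes, and lease timeout are illustrative assumptions:

```python
# Hedged sketch of the Filestore spool state machine and crash recovery.
import os
import time

SPOOL_DIR = "/mnt/filestore/spool"   # hypothetical NFS mount point
LEASE_TIMEOUT_S = 300                # reclaim .uploading files older than this

def close_segment(path_open: str) -> str:
    """Atomically promote an .open segment to .ready once writing is complete."""
    path_ready = path_open.replace(".open", ".ready")
    os.rename(path_open, path_ready)   # rename is atomic within one filesystem
    return path_ready

def claim_for_upload(path_ready: str) -> str:
    """Claim a .ready segment; the rename acts as a lease held by a single worker."""
    path_uploading = path_ready.replace(".ready", ".uploading")
    os.rename(path_ready, path_uploading)
    return path_uploading

def recover_stale_uploads():
    """After a crash, requeue .uploading segments whose lease has expired."""
    now = time.time()
    for name in os.listdir(SPOOL_DIR):
        if name.endswith(".uploading"):
            path = os.path.join(SPOOL_DIR, name)
            if now - os.path.getmtime(path) > LEASE_TIMEOUT_S:
                os.rename(path, path.replace(".uploading", ".ready"))
```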

Cost Impact

| Scale | Current (5s writes) | Phase 2 (60s batches) | Phase 3 (+ Filestore) |
|---|---|---|---|
| 1x (200 sessions) | $518/mo | $43/mo | ~$92/mo |
| 100x (20K sessions) | $51,840/mo | $4,320/mo | ~$5,088/mo |
| 10,000x (2M sessions) | $5,184,000/mo | $432,000/mo | + sharding cost |

Opus encoding reduces storage/bandwidth ~10x but does NOT reduce Class A ops (same number of objects unless batched).

Compatibility Requirements

  • Must handle mixed legacy (.bin/.enc) and new (.opus/.opus.enc) chunks; see the decoding sketch after this list
  • Backward compatible: existing recordings remain readable
  • download_audio_chunks_and_merge() must decode Opus back to PCM for playback
  • Feature-flagged rollout per phase
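A sketch of what mixed-format retrieval could look like inside download_audio_chunks_and_merge(); the decode_chunk helper and the length-prefixed Opus framing mirror the Phase 1 sketch above and are assumptions, not the existing storage API:

```python
# Hedged sketch: return raw PCM16 regardless of which chunk format was stored.
import struct
import opuslib

def decode_chunk(name: str, data: bytes, decrypt) -> bytes:
    if name.endswith(".enc"):                     # covers legacy .enc and new .opus.enc
        data = decrypt(data)
    if ".opus" in name:                           # new Opus formats -> decode back to PCM16
        decoder = opuslib.Decoder(16000, 1)
        pcm = bytearray()
        off = 0
        while off < len(data):
            (length,) = struct.unpack_from(">H", data, off)
            off += 2
            pcm += decoder.decode(data[off:off + length], 320)
            off += length
        return bytes(pcm)
    return data                                   # legacy .bin is already raw PCM16
```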

Files to Modify

  • backend/routers/pusher.py — Opus encoder init, batch accumulation
  • backend/utils/other/storage.py — new extensions, batch upload, Opus-aware list/download
  • backend/utils/encryption.py — encrypt Opus bytes (same API, different input)
  • backend/database/conversations.py — flexible duration math
  • backend/utils/conversations/merge_conversations.py — generic extension handling
  • Helm values — Filestore mount config (Phase 3)

Acceptance Criteria

  • Phase 1: Opus-encoded chunks uploaded to GCS, existing recordings unaffected
  • Phase 2: 60s batch uploads reduce Class A ops by ~12x
  • Phase 3: Filestore spool survives pod restart without data loss
  • All phases: mixed old/new chunk formats handled transparently
  • Tests: unit tests for Opus encode/decode, batch flush triggers, mixed-format retrieval
