Skip to content

Freebuff waiting room#509

Merged
jahooma merged 32 commits intomainfrom
freebuff-waiting-room
Apr 19, 2026
Merged

Freebuff waiting room#509
jahooma merged 32 commits intomainfrom
freebuff-waiting-room

Conversation

@jahooma
Copy link
Copy Markdown
Contributor

@jahooma jahooma commented Apr 18, 2026

No description provided.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 18, 2026

Greptile Summary

This PR introduces a freebuff waiting room — a queue-based admission system that gates free-tier CLI users before allowing them to send chat requests. It is implemented end-to-end across the server (new free_session table, admission ticker, /api/v1/freebuff/session REST endpoints, and a gate injected into the chat-completions hot path) and the CLI (a polling hook useFreebuffSession, zustand store, and two new terminal UI screens for the waiting room and the "superseded" terminal state).

Key design decisions:

  • Feature-flagged off by default (FREEBUFF_WAITING_ROOM_ENABLED=false), requiring explicit opt-in per environment.
  • Advisory lock + FOR UPDATE SKIP LOCKED in admitFromQueue prevents double-admission across pods.
  • Instance ID rotation on every POST /session enforces one-CLI-per-account.
  • Dependency injection throughout server code makes the layer fully unit-testable without DB.

Notable issues found:

  • No TTL/expiry on queued rows: If a CLI crashes without sending the cleanup DELETE, its queued row stays in the table indefinitely, inflating queue depth and wait-time estimates for all other users.
  • Null claimedInstanceIdsession_superseded: A missing instance ID in an active-session request returns the same 409 code as a genuine takeover, making the two conditions indistinguishable in logs.
  • Non-transactional position + depth reads: queuePositionFor and queueDepth() run in parallel without a transaction, so position can transiently exceed depth on the client display.
  • onDisableAds={() => {}} in WaitingRoomScreen: The no-op handler may render a dead affordance if AdBanner shows a disable button.

Confidence Score: 4/5

Safe to merge behind the feature flag; the queued-row TTL gap is real but won't manifest until the feature is enabled in production.

The implementation is thorough and well-tested with dependency injection throughout. The feature defaults to OFF so no existing request path is affected. The main gap — no TTL on queued rows — is a correctness concern once the feature goes live but does not break the primary admission or chat path. The two P2 issues are minor observability/cosmetic concerns.

web/src/server/free-session/store.ts (queued-row TTL), web/src/server/free-session/public-api.ts (null instanceId path and non-transactional reads)

Important Files Changed

Filename Overview
web/src/server/free-session/store.ts Core DB layer; uses advisory lock + FOR UPDATE SKIP LOCKED for safe concurrent admission. Main concern: no TTL on queued rows means stale entries accumulate indefinitely if CLIs crash without sending DELETE.
web/src/server/free-session/public-api.ts Session gate and state API; null claimedInstanceId is treated as session_superseded (same 409 as a rotated instance), conflating missing-field with takeover. Position + depth queries are non-transactional.
web/src/server/free-session/admission.ts Background admission ticker with overlap-prevention (inFlight guard) and Fireworks health gate. Well-structured with injectable deps for testing.
cli/src/hooks/use-freebuff-session.ts Polling hook managing POST/GET/DELETE lifecycle and bell-on-admission. Module-level activeRefreshHandle enables imperative refresh from gate-error handlers. Cleanup is well-guarded.
cli/src/components/waiting-room-screen.tsx Waiting room UI with elapsed timer, position display, and ad banner. onDisableAds is a silent no-op — acceptable if AdBanner hides the affordance, but worth verifying.
web/src/app/api/v1/chat/completions/_post.ts Gate check injected before rate-limiter; only runs for free-mode requests when waiting room is enabled. Correctly returns structured 4xx by error code.
cli/src/app.tsx AuthedSurface wrapper correctly gates WaitingRoomScreen and FreebuffSupersededScreen before rendering Chat.
packages/internal/src/db/schema.ts New free_session table with correct PK (user_id), cascade FK, and two targeted indexes for queue dequeue and expiry sweep.
web/src/server/free-session/session-view.ts Pure view-projection function with injected clock; well-tested. estimateWaitMs uses wave-based upper-bound formula.
cli/src/hooks/helpers/send-message.ts Gate errors mapped to user-readable messages; superseded is treated as terminal, others trigger a forcePost refresh.

Sequence Diagram

sequenceDiagram
    participant CLI
    participant SessionAPI as /api/v1/freebuff/session
    participant ChatAPI as /api/v1/chat/completions
    participant DB as free_session (Postgres)
    participant Ticker as AdmissionTicker

    CLI->>SessionAPI: POST /session (on startup)
    SessionAPI->>DB: joinOrTakeOver (UPSERT)
    DB-->>SessionAPI: row {status: queued, instanceId}
    SessionAPI-->>CLI: {status: queued, position, estimatedWaitMs}

    loop Poll every 5s while queued
        CLI->>SessionAPI: GET /session
        SessionAPI->>DB: getSessionRow
        DB-->>SessionAPI: row
        SessionAPI-->>CLI: {status: queued | active}
    end

    Ticker->>DB: sweepExpired() + admitFromQueue() every 5s
    Note over Ticker,DB: advisory lock prevents double-admission across pods
    DB-->>Ticker: admitted rows
    Note over CLI: bell plays on queued→active transition

    CLI->>ChatAPI: POST /completions + freebuff_instance_id
    ChatAPI->>DB: checkSessionAdmissible(userId, instanceId)
    alt session active and instance matches
        DB-->>ChatAPI: {ok: true}
        ChatAPI-->>CLI: 200 stream
    else queued / expired / superseded
        DB-->>ChatAPI: {ok: false, code}
        ChatAPI-->>CLI: 428/429/409/410
        CLI->>SessionAPI: refreshFreebuffSession() or markSuperseded()
    end

    CLI->>SessionAPI: DELETE /session (on exit)
    SessionAPI->>DB: endSession(userId)
Loading

Comments Outside Diff (1)

  1. cli/src/components/waiting-room-screen.tsx, line 386-388 (link)

    P2 onDisableAds is a no-op — may render a dead UI affordance

    onDisableAds={() => {}} passes a silent no-op to AdBanner. If AdBanner renders any "Disable ads" affordance based on this prop, clicking it will silently do nothing. Consider passing undefined if AdBanner hides the affordance in that case, or add a comment explaining why no-op is intentional here (e.g. "free users cannot disable ads in the waiting room").

Reviews (1): Last reviewed commit: "Merge branch 'main' into freebuff-waitin..." | Re-trigger Greptile

Comment thread web/src/server/free-session/store.ts Outdated
Comment on lines +155 to +163
sql`(${schema.freeSession.queued_at}, ${schema.freeSession.user_id}) <= (${params.queuedAt}, ${params.userId})`,
),
)
return Number(rows[0]?.n ?? 0)
}

/** Remove rows whose active session has expired. Safe to call repeatedly. */
export async function sweepExpired(now: Date): Promise<number> {
const deleted = await db
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 No TTL on queued rows — stale entries accumulate indefinitely

sweepExpired only removes rows where status = 'active' AND expires_at < now. Queued rows are never automatically removed. A user who joins the queue (POST /session) and then hard-crashes or loses network before their CLI can send the fire-and-forget DELETE will leave a stale status=queued row in the table forever. These phantom entries:

  1. Inflate queueDepth() for all other users, making wait-time estimates wrong.
  2. Keep would-be new users at artificially higher positions until a real admin runs cleanup.

Consider adding a queued_at + max_queue_wait expiry sweep alongside the active-session sweep, e.g.:

// In sweepExpired (store.ts): also remove queued rows idle for too long
await db.delete(schema.freeSession).where(
  and(
    eq(schema.freeSession.status, 'queued'),
    lt(schema.freeSession.queued_at, new Date(now.getTime() - MAX_QUEUE_IDLE_MS)),
  ),
)

Alternatively, track the last heartbeat time and sweep queued rows that haven't polled in N minutes.

Comment on lines +163 to +171
if (!row.expires_at || row.expires_at.getTime() <= now.getTime()) {
return {
ok: false,
code: 'session_expired',
message: 'Your free session has expired. Re-join the waiting room via POST /api/v1/freebuff/session.',
}
}

if (!params.claimedInstanceId || params.claimedInstanceId !== row.active_instance_id) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Null claimedInstanceId returns session_superseded instead of a distinct error

When claimedInstanceId is null or undefined (absent from the request), the code falls through to the session_superseded branch with the message "Another instance of freebuff has taken over this session." This conflates two separate conditions — a missing identifier vs. a rotated one — under the same 409 code.

In the current UI flow this is safe (the WaitingRoomScreen gate prevents users from sending messages before their instance ID is populated), but it makes server-side observability harder and would produce a confusing 409 for any future client that omits the field.

Consider splitting the two cases:

if (!params.claimedInstanceId) {
  return {
    ok: false,
    code: 'waiting_room_required',
    message: 'freebuff_instance_id missing from request. Call POST /api/v1/freebuff/session first.',
  }
}
if (params.claimedInstanceId !== row.active_instance_id) {
  return { ok: false, code: 'session_superseded', message: '...' }
}

Comment on lines +122 to +130
| { ok: false; code: 'session_superseded'; message: string }
| { ok: false; code: 'session_expired'; message: string }

/**
* Called from the chat/completions hot path for free-mode requests. Either
* returns `{ ok: true }` (request may proceed) or a structured rejection
* the caller translates into a 4xx response.
*
* Never trusts client timestamps. The caller supplies `claimedInstanceId`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 queuePositionFor and queueDepth are not fetched in a transaction

These two queries run in parallel outside a transaction. If users are admitted between the two reads, position could exceed depth (e.g. position=3, depth=2), which would be confusing on the client's waiting-room display. This is cosmetic rather than a correctness bug, but wrapping both in a single read-only transaction or a WITH CTE would eliminate the inconsistency.

jahooma and others added 13 commits April 18, 2026 15:52
Replicas=0 or no replicas metric at all (the deployment has been scaled
to zero or dropped from the scrape) now flips that deployment's health
to unhealthy unconditionally, so admission fails closed instead of
funneling users to a backend that cannot serve traffic. Also drop
generationQueueMs degraded 5000 -> 400 and ttftMs degraded 8000 -> 2000,
and comment out the kimi deployment since only glm-5.1 is in production.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
FREEBUFF_MAX_CONCURRENT_SESSIONS is gone. Admission now runs purely
as a drip (MAX_ADMITS_PER_TICK=1 every 15s) gated by the Fireworks
health monitor — utilisation ramps up slowly and pauses the moment
metrics degrade, so a static cap is redundant.

Renamed SessionDeps' getMaxConcurrentSessions/getSessionLengthMs to
getAdmissionTickMs/getMaxAdmitsPerTick (those are what the wait-time
estimate actually needs now). estimateWaitMs is rewritten from the
session-cycle model to the drip model:
  waitMs = ceil((position - 1) / maxAdmitsPerTick) * admissionTickMs
Dropped the 'full' branch of AdmissionTickResult and the full-capacity
admission test — the only reason admission skips now is health.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stdin is in raw mode on these screens, so SIGINT never fires — Ctrl+C
had no effect and users had to kill the process. Now both screens hook
Ctrl+C via OpenTUI's useKeyboard, flush analytics with a 1s cap, and
exit. The waiting-room screen additionally sends a best-effort DELETE
/api/v1/freebuff/session before exit so the seat frees up immediately
instead of waiting on the server-side expiry sweep.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
p50 TTFT degraded 1000 → 1500ms and p50 generation queue degraded 200 →
300ms, so a healthy deployment running at steady-state 1s TTFT does not
trip the admission gate.

scripts/scrape-check.ts pulls the live Fireworks metrics and prints the
same per-deployment health the admission gate sees — useful for tuning
thresholds without guessing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Keep admitting requests for FREEBUFF_SESSION_GRACE_MS (default 30m) after a
session's expires_at so in-flight agent runs can drain; hard cutoff past that.
Also: replicas=0 → unhealthy, hoist chat/completions gate status map, fix
stale threshold comment and a pre-existing free-mode test missing the
checkSessionAdmissible override.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When the server flips the seat into `draining` (past `expires_at`) or
past the hard cutoff, the chat input is replaced by a "Session ended"
banner. While an agent is still streaming under the grace window, Enter
is disabled and the banner shows "Agent is wrapping up. Rejoin the wait
room after it's finished." — Esc still interrupts. Once idle, Enter
re-POSTs /session to drop back into the waiting room.

Adds a small countdown to the far right of the status bar (muted, turning
soft warning in the final minute — no red) and schedules the next poll
just after expires_at / gracePeriodEndsAt so the draining transition
shows up promptly instead of stalling at 0 for a full interval.

Moves getFreebuffInstanceId onto the session hook's module handle and
deletes the now-vestigial freebuff-session-store.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Delete the 1.5k-LOC fireworks-monitor package (Prometheus scrape, health
computation, admin endpoint, CLI scripts) in favor of a single-function
reachability probe inline in free-session/admission.ts: GET the account
metrics endpoint with a 5s timeout and fail closed on non-OK. The
full-health-scoring machinery was load-bearing on nothing — admission only
ever read the boolean gate, and reachability is what actually matters for
halting during an outage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When the waiting-room gate rejects a free-mode request and no
freebuff_instance_id was sent, return 426 with a "please restart to
upgrade" message. Old CLI versions render the message verbatim in
their error banner; new clients still get the normal gate responses.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Collapse client-side draining/ended into a single ended state, move freebuff
session state into zustand (replacing the module-level handle singleton),
host the Fireworks probe inside admitFromQueue, and share the wire types
between server and CLI. Drops ~150 lines net.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jahooma and others added 6 commits April 18, 2026 18:45
The CLI used to layer two client-only states (`superseded`, `ended`) on
top of the server's response and translate `draining` → `ended` in a
mapper. Replaces all of that with a single union the server emits
directly:

  - GET /session now reads `X-Freebuff-Instance-Id` and returns
    `{ status: 'superseded' }` when the active row's id no longer
    matches. Previously this was inferred client-side from the chat
    gate's 409.
  - The wire's `draining` is renamed to `ended` and carries the same
    grace fields. The CLI's post-grace synthetic `ended` (no
    instanceId) reuses that shape.

Also drops the zustand `driver` indirection — imperative session
controls (refresh / mark superseded / mark ended) live as module-level
functions on the hook, talking to a private controller ref. Combines
`refreshFreebuffSession` and `rejoinFreebuffSession` into one with an
optional `{ resetChat }` flag. Inlines the constant getters in
`SessionDeps`/`AdmissionDeps` so tests pass plain numbers, drops the
`limit` parameter from `admitFromQueue` (always 1), and consolidates
the three 1-Hz UI tickers into a shared `useNow` hook.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ambient bar fills proportionally to time remaining so users
have a passive sense of session progress; a bold warning-colored
"X:XX" readout appears only for the final five minutes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jahooma jahooma merged commit 7e07b1a into main Apr 19, 2026
32 of 34 checks passed
@jahooma jahooma deleted the freebuff-waiting-room branch April 19, 2026 05:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants