
M016: Enterprise Scale #115

Merged
TerrifiedBug merged 78 commits into main from feat/m016-enterprise-scale
Mar 27, 2026

@TerrifiedBug
Owner

Summary

Enterprise-scale features for VectorFlow — enabling corporate platform teams to manage hundreds of pipelines across multi-environment fleets of 100+ nodes.

7 phases, 31 requirements, 19 plans executed:

  • Phase 1: Fleet Performance — Alert eval batched in poll loop (not per-heartbeat), SSE connection limit (1000 default), lazy catalog singleton
  • Phase 2: Fleet Organization — Node groups with label enforcement + auto-enrollment, nested pipeline groups (3-level), bulk tag add/remove
  • Phase 3: Fleet Health Dashboard — Group-level health cards with expand/collapse drill-down, filter by group/label/compliance, 30s polling
  • Phase 4: Outbound Webhooks — WebhookEndpoint + WebhookDelivery models, Standard-Webhooks HMAC-SHA256 signing, retry with dead-letter separation, management UI
  • Phase 5: Cross-Env Promotion (UI) — PromotionRequest model, secret pre-flight validation, 5-step promotion wizard with substitution diff, approval workflow, promotion history
  • Phase 6: OpenAPI Specification — 31-operation OpenAPI 3.1 spec (16 REST v1 + 15 tRPC), generation script, serving endpoint, API reference docs
  • Phase 7: Cross-Env Promotion (GitOps) — @octokit/rest PR creation, merge-triggered auto-deploy, setup wizard with connection validation

Stats: 69 files changed, ~11,000 lines added, 935 tests passing (up from 792)

Test plan

  • pnpm test — all 935 tests pass
  • pnpm lint — no new errors
  • pnpm build — production build succeeds
  • Fleet health dashboard renders at /fleet/health with node group cards
  • Pipeline promotion wizard opens from "Promote to..." dropdown
  • Webhook management accessible at /settings/webhooks
  • OpenAPI spec accessible at /api/v1/openapi.json
  • GitOps promotion mode appears in environment GitSync settings

…fold (TDD RED)

- Add MAX_SSE_CONNECTIONS constant (default 1000, configurable via SSE_MAX_CONNECTIONS env var)
- Return 503 + Retry-After: 30 header when limit is reached, before ReadableStream construction
- Create src/lib/vector/__tests__/catalog.test.ts with getVectorCatalog singleton tests (RED until Task 2)
- Delete evaluateAndDeliverAlerts function and its call from heartbeat route
- Remove dead imports: evaluateAlerts, deliverSingleWebhook, deliverToChannels, trackWebhookDelivery
- Update test to assert evaluateAlerts is NOT called (PERF-01 traceability)
- Annotate existing "keepalive removes dead connections" test with PERF-02 marker
- Confirm ghost connection eviction within the 30s keepalive interval is already covered
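
The admission check described in these notes can be sketched roughly as follows. This is a minimal illustration rather than the route's actual code: `resolveMaxConnections`, `admitConnection`, and the plain-object result shape are hypothetical. Only the `SSE_MAX_CONNECTIONS` variable, the default of 1000, and the 503 + `Retry-After: 30` response come from the commit notes.

```typescript
// Hypothetical sketch of the SSE admission check; helper names are assumptions.
const DEFAULT_MAX_SSE_CONNECTIONS = 1000;

// Read the limit from an env-like record, falling back to the default
// when the variable is missing or not a positive integer.
function resolveMaxConnections(env: Record<string, string | undefined>): number {
  const parsed = Number.parseInt(env.SSE_MAX_CONNECTIONS ?? "", 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : DEFAULT_MAX_SSE_CONNECTIONS;
}

type Admission =
  | { ok: true }
  | { ok: false; status: 503; headers: { "Retry-After": string } };

// Decide whether a new client may connect, before any stream is constructed.
function admitConnection(activeConnections: number, max: number): Admission {
  if (activeConnections >= max) {
    return { ok: false, status: 503, headers: { "Retry-After": "30" } };
  }
  return { ok: true };
}
```

Rejecting before the stream exists keeps the over-limit path cheap: no per-connection resources are allocated for a client that is about to be turned away.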
…g() (PERF-04)

- Replace eager export const VECTOR_CATALOG with lazy _catalog singleton
- Add getVectorCatalog() function: builds catalog on first access, returns same reference on repeat calls
- Update findComponentDef() to call getVectorCatalog() internally
- Update component-palette.tsx: import + 3 usages migrated to getVectorCatalog()
- Update library/shared-components/new/page.tsx: import + 2 usages migrated to getVectorCatalog()
- All 4 catalog tests pass (singleton reference equality, findComponentDef lookup)
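
A lazy singleton of the kind described here might look like the sketch below. The `ComponentDef` shape, the three sample entries, and the `catalogBuildCount` helper are assumptions added for illustration; only the names `_catalog`, `getVectorCatalog`, and `findComponentDef` come from the commit notes.

```typescript
// Hypothetical catalog entry shape; the real catalog's fields are not shown in this PR.
interface ComponentDef {
  type: string;
  kind: "source" | "transform" | "sink";
}

let buildCount = 0; // exists only to make the lazy behaviour observable

function buildCatalog(): ComponentDef[] {
  buildCount += 1;
  return [
    { type: "stdin", kind: "source" },
    { type: "remap", kind: "transform" },
    { type: "console", kind: "sink" },
  ];
}

// Module-level cache: undefined until the first caller asks for the catalog.
let _catalog: ComponentDef[] | undefined;

function getVectorCatalog(): ComponentDef[] {
  if (_catalog === undefined) {
    _catalog = buildCatalog();
  }
  return _catalog;
}

function findComponentDef(type: string): ComponentDef | undefined {
  return getVectorCatalog().find((def) => def.type === type);
}

function catalogBuildCount(): number {
  return buildCount;
}
```

Compared with an eager `export const`, nothing is built at import time, so modules that import the file but never read the catalog pay no construction cost.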
- Add NodeGroup model with criteria, labelTemplate, requiredLabels JSON fields
- Add parentId self-reference to PipelineGroup (GroupChildren relation)
- Remove PipelineGroup unique(environmentId, name) constraint
- Add @@index([parentId]) to PipelineGroup for efficient child queries
- Add nodeGroups NodeGroup[] relation to Environment model
- Create migration 20260326400000_phase2_fleet_organization
- Regenerate Prisma client with NodeGroup model
- Create nodeGroupRouter with list, create, update, delete operations
- All mutations use withTeamAccess(ADMIN) authorization
- Audit logging via withAudit for created/updated/deleted events
- Unique name validation per environment with CONFLICT error
- NOT_FOUND errors for missing groups on update/delete
- Register nodeGroupRouter in appRouter as trpc.nodeGroup.*
- 12 unit tests covering all CRUD behaviors including error cases
…ent + tests

- Add labelCompliant field to fleet.list response (NODE-02)
  - Queries all NodeGroup requiredLabels for the environment
  - Sets labelCompliant=true when node has all required label keys
  - Vacuously compliant when no NodeGroups have required labels
- Add NODE-03 label template auto-assignment in enrollment route
  - After node creation, finds matching NodeGroups by criteria
  - Merges labelTemplate fields from matching groups into node labels
  - Non-fatal: enrollment succeeds even if template application fails
- Add 3 new fleet.list label compliance tests
- Add 3 enrollment auto-assignment unit tests (match, non-match, empty)
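
The compliance check and template merge above can be illustrated with a small sketch. The type and function names are hypothetical, and two behaviours are assumptions not stated in the notes: criteria match by exact key/value equality, and template values overwrite labels the node already carries (with later groups winning).

```typescript
type Labels = Record<string, string>;

interface NodeGroupLike {
  criteria: Labels;         // labels a node must carry to match the group
  labelTemplate: Labels;    // labels merged into matching nodes at enrollment
  requiredLabels: string[]; // label keys nodes are expected to have
}

function hasKey(labels: Labels, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(labels, key);
}

// Assumed rule: every criteria key/value must be present on the node,
// so empty criteria match every node.
function matchesCriteria(nodeLabels: Labels, criteria: Labels): boolean {
  return Object.entries(criteria).every(([key, value]) => nodeLabels[key] === value);
}

// Compliant when the node has every required key from every group.
// With no required labels anywhere, the check is vacuously true.
function isLabelCompliant(nodeLabels: Labels, groups: NodeGroupLike[]): boolean {
  return groups.every((group) => group.requiredLabels.every((key) => hasKey(nodeLabels, key)));
}

// Merge templates from matching groups into a copy of the node's labels.
function applyLabelTemplates(nodeLabels: Labels, groups: NodeGroupLike[]): Labels {
  return groups
    .filter((group) => matchesCriteria(nodeLabels, group.criteria))
    .reduce<Labels>((merged, group) => ({ ...merged, ...group.labelTemplate }), { ...nodeLabels });
}
```

In the enrollment route the merge would sit inside a try/catch so that, as the notes say, a template failure never fails the enrollment itself.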
…pth guard

- Add parentId to create/update input schemas
- Replace findUnique compound key check with findFirst for application-layer uniqueness per (environmentId, name, parentId)
- Add depth guard: rejects nesting beyond 3 levels (BAD_REQUEST)
- Update list to include children count in _count
- Update update to support parentId changes with depth enforcement
- Add 11 new tests covering nesting, depth guard, and duplicate name scenarios
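
A depth guard like the one described can be sketched by walking parent links. `GroupRow`, `depthOf`, and `canCreateUnder` are hypothetical names, and the in-memory `Map` stands in for database lookups; only the 3-level limit comes from the commit notes.

```typescript
interface GroupRow {
  id: string;
  parentId: string | null;
}

const MAX_GROUP_DEPTH = 3;

// Depth of a group counted from the root (a root group has depth 1).
// The walk stops on cycles or dangling parents so bad data cannot loop forever.
function depthOf(groupId: string, groups: Map<string, GroupRow>): number {
  let depth = 0;
  const seen = new Set<string>();
  let current: string | null = groupId;
  while (current !== null && !seen.has(current)) {
    const row: GroupRow | undefined = groups.get(current);
    if (!row) break;
    seen.add(current);
    depth += 1;
    current = row.parentId;
  }
  return depth;
}

// True when a new child may be created under parentId without exceeding the limit.
function canCreateUnder(parentId: string | null, groups: Map<string, GroupRow>): boolean {
  const parentDepth = parentId === null ? 0 : depthOf(parentId, groups);
  return parentDepth + 1 <= MAX_GROUP_DEPTH;
}
```

This covers creation only. Moving an existing group to a new parent would also need to account for the height of the subtree being moved, which the sketch leaves out.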
…e router

- bulkAddTags: validates tags against team.availableTags before loop, deduplicates via Set, handles partial failures, max 100 pipelines
- bulkRemoveTags: filters specified tags from each pipeline, handles partial failures, max 100 pipelines
- Both procedures return { results, total, succeeded } summary
- 11 tests covering all behaviors including partial failures, deduplication, and validation
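
The bulk-add behaviour (validate once, deduplicate, tolerate per-pipeline failures, return a summary) could look roughly like this. The `Map` store and the row and result types are stand-ins for the database layer, and treating an empty `availableTags` list as "no restriction" is an assumption.

```typescript
interface PipelineRow {
  id: string;
  tags: string[];
}

interface BulkResult {
  pipelineId: string;
  success: boolean;
  error?: string;
}

interface BulkSummary {
  results: BulkResult[];
  total: number;
  succeeded: number;
}

const MAX_BULK_PIPELINES = 100;

function bulkAddTags(
  store: Map<string, PipelineRow>,
  pipelineIds: string[],
  tags: string[],
  availableTags: string[],
): BulkSummary {
  if (pipelineIds.length > MAX_BULK_PIPELINES) {
    throw new Error(`At most ${MAX_BULK_PIPELINES} pipelines per request`);
  }
  // Validate once, before touching any pipeline, so a bad tag fails the whole call.
  const invalid = availableTags.length > 0 ? tags.filter((tag) => !availableTags.includes(tag)) : [];
  if (invalid.length > 0) {
    throw new Error(`Unknown tags: ${invalid.join(", ")}`);
  }
  // Per-pipeline failures are recorded instead of aborting the loop.
  const results: BulkResult[] = pipelineIds.map((pipelineId) => {
    const row = store.get(pipelineId);
    if (!row) {
      return { pipelineId, success: false, error: "Pipeline not found" };
    }
    row.tags = Array.from(new Set([...row.tags, ...tags])); // Set removes duplicates
    return { pipelineId, success: true };
  });
  return { results, total: results.length, succeeded: results.filter((r) => r.success).length };
}
```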
- Create NodeGroupManagement card component with full CRUD (list/create/update/delete)
- Key-value pair editor for criteria and label template fields
- Tag input for required labels
- Warning banner when criteria is empty (matches all enrolling nodes)
- Delete confirmation via ConfirmDialog
- Add NodeGroupManagement section to fleet-settings.tsx
- Add Non-compliant amber badge to fleet node list when labelCompliant === false
- Fix pre-existing rawNodes useMemo dependency warning in fleet page
- Add "Node groups" section with field reference table and enrollment hint
- Add "Label compliance" section explaining Non-compliant badge behavior
…e-to-group menu

- Create PipelineGroupTree component with recursive collapsible tree, expand/collapse, folder icons, colored dots, pipeline counts
- Export buildGroupTree and buildBreadcrumbs helpers for reuse in pipelines page
- Add parent group selector to ManageGroupsDialog create form (filters eligible parents to depth < 3)
- Integrate PipelineGroupTree as sidebar in pipelines page with group selection
- Add breadcrumb navigation above pipeline list when a group is selected
- Replace flat move-to-group dropdown with recursive nested hierarchy via renderGroupMenuItems
- Add bulkAddTags and bulkRemoveTags mutations using the Plan 02 tRPC endpoints
- Show tag selection dialog for each operation (checkbox list when team has availableTags, text input otherwise)
- Loading toast during mutation via toast.loading, dismissed on settle
- Partial failure display reuses existing resultSummary dialog pattern
- Separate dialogs per plan decision: "Separate add-tags and remove-tags operations"
…seline

- NodeGroup Prisma model + PipelineGroup parentId migration
- NodeGroup tRPC router with CRUD + enrollment auto-assignment
- Fleet label compliance, node group management UI
- Pipeline group tree, bulk tags, nested groups
…ed nodeMatchesGroup util

- Extract nodeMatchesGroup to src/lib/node-group-utils.ts (shared util)
- Update enrollment route to use shared util instead of inline logic
- Add groupHealthStats procedure: per-group onlineCount/alertCount/complianceRate/totalNodes in 3 parallel queries
- Add nodesInGroup procedure: per-node drill-down sorted by status (worst first) with cpuLoad and labelCompliant
- Synthetic '__ungrouped__' entry for nodes matching no group criteria
- 27 tests passing: 15 for new procedures + 12 existing tests unchanged
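
A rough in-memory sketch of the per-group aggregation follows. The types are hypothetical, `alertCount` is omitted because it needs alert data not described here, and reporting `complianceRate` as a rounded percentage (100 for an empty group) is an assumption. The `nodeMatchesGroup` name and the `__ungrouped__` key come from the commit notes.

```typescript
type Labels = Record<string, string>;

interface NodeLike {
  id: string;
  labels: Labels;
  online: boolean;
  labelCompliant: boolean;
}

interface GroupLike {
  id: string;
  criteria: Labels;
}

interface GroupHealth {
  groupId: string;
  totalNodes: number;
  onlineCount: number;
  complianceRate: number; // assumed unit: percentage, 0 to 100
}

const UNGROUPED_ID = "__ungrouped__";

// Assumed rule: all criteria key/value pairs must be present; empty criteria match all nodes.
function nodeMatchesGroup(labels: Labels, criteria: Labels): boolean {
  return Object.entries(criteria).every(([key, value]) => labels[key] === value);
}

function summarize(groupId: string, members: NodeLike[]): GroupHealth {
  const compliant = members.filter((node) => node.labelCompliant).length;
  return {
    groupId,
    totalNodes: members.length,
    onlineCount: members.filter((node) => node.online).length,
    complianceRate: members.length === 0 ? 100 : Math.round((100 * compliant) / members.length),
  };
}

function groupHealthStats(nodes: NodeLike[], groups: GroupLike[]): GroupHealth[] {
  const stats = groups.map((group) =>
    summarize(group.id, nodes.filter((node) => nodeMatchesGroup(node.labels, group.criteria))),
  );
  // Nodes matching no group are reported under a synthetic entry.
  const ungrouped = nodes.filter(
    (node) => !groups.some((group) => nodeMatchesGroup(node.labels, group.criteria)),
  );
  return ungrouped.length > 0 ? [...stats, summarize(UNGROUPED_ID, ungrouped)] : stats;
}
```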
# Conflicts:
#	.planning/STATE.md
#	src/app/api/agent/enroll/route.ts
#	src/server/routers/__tests__/node-group.test.ts
#	src/server/routers/node-group.ts
…and filter toolbar

- Add FleetHealthDashboard: group-level summary cards with polling (30s)
- Add NodeGroupHealthCard: collapsible with online/alert/compliance metrics
- Add NodeGroupDetailTable: per-node drill-down with status, CPU, last seen, compliance
- Add FleetHealthToolbar: group filter, label filter, compliance toggle pills
- Wire URL query param state (group, label, compliance) for shareable links
- Add Health tab to fleet page navigation
- Create /fleet/health route page
- Document group summary cards (online, alerts, compliance metrics)
- Document drill-down per-node table and sort order
- Document group/label/compliance filter toolbar with URL param sharing
- Document Ungrouped card behavior
- New page at operations/outbound-webhooks.md covering setup,
  payload format, signature verification, retry schedule, delivery
  history, and endpoint management
- Add to docs/public/SUMMARY.md nav
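
For receivers, signature verification under the Standard Webhooks scheme can be sketched as below. This assumes the conventional layout: the signed content is `msgId.timestamp.body`, the secret is base64 with an optional `whsec_` prefix, and the header value is `v1,<base64 signature>`. Whether VectorFlow's secrets use the `whsec_` prefix is not stated in the PR, and a production verifier would also reject stale timestamps, which is left out here.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Compute the "v1,<base64>" signature over "<msgId>.<timestamp>.<body>".
function signWebhook(secret: string, msgId: string, timestamp: number, body: string): string {
  const key = Buffer.from(secret.replace(/^whsec_/, ""), "base64");
  const digest = createHmac("sha256", key)
    .update(`${msgId}.${timestamp}.${body}`)
    .digest("base64");
  return `v1,${digest}`;
}

// The header may carry several space-separated signatures (for example during
// secret rotation); any one match is sufficient.
function verifyWebhook(
  secret: string,
  msgId: string,
  timestamp: number,
  body: string,
  signatureHeader: string,
): boolean {
  const expected = Buffer.from(signWebhook(secret, msgId, timestamp, body));
  return signatureHeader.split(" ").some((candidate) => {
    const given = Buffer.from(candidate);
    // timingSafeEqual throws on length mismatch, so compare lengths first.
    return given.length === expected.length && timingSafeEqual(given, expected);
  });
}
```

Because the message ID is part of the signed content, the ID a receiver sees in the `webhook-id` header is cryptographically bound to the payload, which is why it matters that the sender records the same ID it signs.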
… tRPC router

- Add PromotionRequest Prisma model with PENDING/APPROVED/DEPLOYED/REJECTED/CANCELLED statuses
- Add migration 20260327000000_add_promotion_request with FK constraints and indexes
- Add relation fields to Pipeline, Environment, and User models
- Create promotion-service.ts: preflightSecrets, executePromotion, generateDiffPreview
- Create promotionRouter with 7 procedures: preflight, diffPreview, initiate, approve, reject, cancel, history
- Wire approval workflow: self-review guard, atomic updateMany race prevention
- executePromotion preserves SECRET[name] refs (no transformConfig stripping)
- Fire promotion_completed outbound webhook after execution
- Register promotionRouter on appRouter as "promotion"
- Add PromotionRequest team resolution in withTeamAccess middleware
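
The self-review guard and the atomic race prevention can be illustrated in memory. In the sketch, `transition` stands in for a conditional update (Prisma's `updateMany` with the expected status in its `where` clause returns a `count`); the row type, the `approve` helper, and its result strings are hypothetical.

```typescript
type PromotionStatus = "PENDING" | "APPROVED" | "DEPLOYED" | "REJECTED" | "CANCELLED";

interface PromotionRequestRow {
  id: string;
  promotedById: string;
  status: PromotionStatus;
}

// Stand-in for UPDATE ... WHERE id = ? AND status = ?. Returns the number of
// rows changed, so a caller that lost the race sees 0 instead of overwriting.
function transition(
  rows: Map<string, PromotionRequestRow>,
  id: string,
  from: PromotionStatus,
  to: PromotionStatus,
): number {
  const row = rows.get(id);
  if (!row || row.status !== from) return 0;
  row.status = to;
  return 1;
}

type ApproveOutcome = "approved" | "self_review" | "not_pending";

function approve(
  rows: Map<string, PromotionRequestRow>,
  requestId: string,
  approverId: string,
): ApproveOutcome {
  const row = rows.get(requestId);
  if (!row) return "not_pending";
  // Self-review guard: the promoter cannot approve their own request.
  if (row.promotedById === approverId) return "self_review";
  return transition(rows, requestId, "PENDING", "APPROVED") === 1 ? "approved" : "not_pending";
}
```

Putting the expected status in the update condition, rather than reading and then writing, means two concurrent approvers cannot both succeed.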
…l paths

- 22 tests across preflight, diffPreview, initiate, approve, reject, cancel, history, SECRET refs
- Tests: preflight blocks when secrets missing, passes when all present, passes with no refs
- Tests: initiate creates PENDING (approval required), auto-executes (no approval), same-env guard, cross-team guard, name collision, missing secrets
- Tests: approve self-review blocked, atomic race guard, succeeds for different user
- Tests: reject sets REJECTED with note, cancel only allows promoter
- Tests: history ordered by createdAt desc with take 20
- Tests: clone preserves SECRET refs (no stripping), diffPreview shows env var placeholders
…wizard

- 5-step state machine: target -> preflight -> diff -> confirm -> result
- Step 2: preflight check with missing secrets list, blocks promotion if canProceed=false
- Step 2: name collision warning with amber alert
- Step 3: ConfigDiff showing source vs target YAML with env var substitution note
- Step 4: fires promotion.initiate mutation with spinner
- Step 5: pending-approval (Clock) vs auto-deployed (CheckCircle) result messages
- Invalidates pipeline.list and promotion.history query caches on success
- Component export name and props interface unchanged - pipelines/page.tsx unaffected
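
The wizard's step flow can be modelled as a small state machine. The step names follow the commit notes; the `nextStep` helper and the `WizardContext` shape are hypothetical, and the real component may track more state.

```typescript
type WizardStep = "target" | "preflight" | "diff" | "confirm" | "result";

const STEP_ORDER: readonly WizardStep[] = ["target", "preflight", "diff", "confirm", "result"];

interface WizardContext {
  canProceed: boolean; // result of the preflight secrets check
}

// Advance one step, staying on preflight while required secrets are missing
// and staying on the final step once it is reached.
function nextStep(current: WizardStep, ctx: WizardContext): WizardStep {
  if (current === "preflight" && !ctx.canProceed) {
    return "preflight";
  }
  const index = STEP_ORDER.indexOf(current);
  return STEP_ORDER[Math.min(index + 1, STEP_ORDER.length - 1)] ?? current;
}
```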
…docs

- PromotionHistory component queries promotion.history, renders table with
  source env, target env, promoted by, date, and status badge
  (DEPLOYED=default, PENDING/APPROVED=secondary, REJECTED=destructive, CANCELLED=outline)
- Returns null when no promotion history to avoid empty section clutter
- Rendered at bottom of pipeline editor layout after logs panel
- Docs: added Cross-Environment Promotion section covering workflow,
  approval, secret pre-flight validation, and promotion history
… v1 endpoints

- Install @asteasolutions/zod-to-openapi 8.5.0 (Zod v4 compatible)
- Create generateOpenAPISpec() covering all 16 REST v1 operations
- BearerAuth security scheme registered; every operation references it
- Schemas match exact wire shapes from route handlers (dates as ISO strings)
- TDD: 7 tests verify structure, security, request/response schemas
…ion script

- Add GET /api/v1/openapi.json (public, no auth) with CORS headers
- Add OPTIONS preflight handler for CORS
- Create scripts/generate-openapi.ts that writes public/openapi.json
- Add generate:openapi script to package.json (tsx scripts/generate-openapi.ts)
- Spec generates 12 paths / 16 operations
- Register CookieAuth security scheme (apiKey in cookie)
- Add 15 tRPC procedures: pipeline.list/get/create/update/delete,
  deploy.agent/undeploy, fleet.list/get, environment.list,
  secret.list/create, alert.listRules, serviceAccount.list/create
- Queries map to GET with ?input= SuperJSON param, mutations to POST
  with {"json": <input>} body
- All tRPC ops tagged "tRPC" and secured with CookieAuth
- 8 new TDD tests verify tRPC paths, methods, tags, security, count
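
The query/mutation mapping described above can be sketched as a request builder. The `/api/trpc` base path and the `environmentId` input in the usage note are assumptions, not taken from the PR. The envelope here carries only the `json` part, so inputs needing SuperJSON metadata (for example `Date` values) are not handled.

```typescript
interface TrpcRequest {
  method: "GET" | "POST";
  url: string;
  body?: string;
}

// Assumption: procedures are served under /api/trpc; the PR does not state the base path.
function buildTrpcRequest(
  kind: "query" | "mutation",
  procedure: string,
  input: unknown,
  basePath = "/api/trpc",
): TrpcRequest {
  const envelope = JSON.stringify({ json: input ?? null });
  if (kind === "query") {
    // Queries carry the envelope URL-encoded in the ?input= parameter.
    return { method: "GET", url: `${basePath}/${procedure}?input=${encodeURIComponent(envelope)}` };
  }
  // Mutations carry the same envelope as the POST body.
  return { method: "POST", url: `${basePath}/${procedure}`, body: envelope };
}
```

For example, `buildTrpcRequest("query", "pipeline.list", { environmentId: "env_1" })` yields a GET whose `input` parameter decodes to `{"json":{"environmentId":"env_1"}}`.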
…ction

- generate-openapi.ts: log REST v1 vs tRPC operation counts separately,
  add duplicate operationId and empty-path validation checks
- docs/public/reference/api.md: add OpenAPI Specification section with
  fetch/import/client-generation instructions and surface comparison table
…ps promotion

- Install @octokit/rest 22.0.1 for GitHub API interactions
- Add prUrl and prNumber fields to PromotionRequest model
- Update gitOpsMode comment to document "promotion" as valid value
- Add AWAITING_PR_MERGE and DEPLOYING to status comment
- Create migration 20260327100000_add_gitops_promotion_fields
…sing @octokit/rest

- Implements createPromotionPR() that creates branch, commits YAML, opens PR
- Parses owner/repo from both HTTPS and SSH GitHub URL formats
- Embeds promotion request ID in PR body for merge webhook correlation
- Branch name includes requestId prefix to prevent collision
- Unit tests covering all PR creation steps and URL parsing (14 tests)
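
Parsing owner and repo from both URL forms might be done as in this sketch. `parseGitHubRepo` and `RepoRef` are hypothetical names, and the pattern only recognises `github.com`, so GitHub Enterprise hosts would need a broader rule.

```typescript
interface RepoRef {
  owner: string;
  repo: string;
}

// Accepts https://github.com/owner/repo(.git) and git@github.com:owner/repo(.git).
function parseGitHubRepo(url: string): RepoRef | null {
  const pattern = /^(?:https:\/\/github\.com\/|git@github\.com:)([^/\s]+)\/([^/\s]+?)(?:\.git)?\/?$/;
  const match = pattern.exec(url.trim());
  if (!match) return null;
  const [, owner, repo] = match;
  if (!owner || !repo) return null;
  return { owner, repo };
}
```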
- Load gitOpsMode, gitRepoUrl, gitToken, gitBranch from target environment
- When gitOpsMode=promotion: generate pipeline YAML, call createPromotionPR,
  update PromotionRequest with prUrl/prNumber/AWAITING_PR_MERGE status
- Existing UI path (Phase 5) unchanged when gitOpsMode != promotion
- Add 4 new tests: AWAITING_PR_MERGE return, prUrl/prNumber update, fallthrough
  to UI path for off and push modes (26 total tests pass)
- Handle X-GitHub-Event header: ping returns pong, pull_request routes to merge handler
- Update HMAC lookup to include both bidirectional and promotion gitOpsMode environments
- PR merge handler: checks action=closed AND merged=true (not-merged PRs ignored)
- Extracts VF promotion ID from PR body comment
- Atomic updateMany AWAITING_PR_MERGE->DEPLOYING for idempotency (GitHub retry safe)
- Calls executePromotion with original promoter as audit actor
- 11 unit tests covering merge, ignore cases, idempotency, HMAC validation
- Extend gitOpsMode Zod enum to include "promotion" value
- Auto-generate webhook secret when switching to "promotion" mode (same as bidirectional)
- Clear webhook secret when switching away from webhook-based modes
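
The merge-handling rules above (act only on merged PRs, read the promotion ID from the PR body, claim the request exactly once) can be sketched as follows. The HTML-comment marker format `<!-- vf-promotion-id: ... -->` is a hypothetical stand-in because the PR does not show the real format. The `action` and `pull_request.merged` fields are part of GitHub's `pull_request` webhook payload.

```typescript
interface PullRequestEvent {
  action: string;
  pull_request: { merged: boolean; body: string | null };
}

// Hypothetical marker; the actual comment format embedded in the PR body is not shown.
const PROMOTION_MARKER = /<!--\s*vf-promotion-id:\s*([A-Za-z0-9_-]+)\s*-->/;

// Returns the promotion ID only for PRs that were closed by merging.
function extractPromotionId(event: PullRequestEvent): string | null {
  if (event.action !== "closed" || !event.pull_request.merged) return null;
  const match = PROMOTION_MARKER.exec(event.pull_request.body ?? "");
  return match?.[1] ?? null;
}

type GitOpsStatus = "AWAITING_PR_MERGE" | "DEPLOYING";

// Compare-and-set stand-in for the atomic status update: only the first
// delivery of a merge event wins, so GitHub retries are harmless.
function claimForDeploy(statuses: Map<string, GitOpsStatus>, requestId: string): boolean {
  if (statuses.get(requestId) !== "AWAITING_PR_MERGE") return false;
  statuses.set(requestId, "DEPLOYING");
  return true;
}
```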
…uide

- Add "Promotion (PR-based)" option to gitOpsMode dropdown
- Show inline step-by-step setup guide when promotion mode is selected
- Display webhook URL and one-time webhook secret with copy buttons
- Guide explains GitHub webhook configuration: pull_request events, payload URL, secret
- Fix handleSave type cast to accept "promotion" value
- Add 07-03-PLAN.md describing GitOps promotion mode UI implementation
- Add 07-03-SUMMARY.md with implementation details and decisions
- Update STATE.md: advance plan counter, record metrics, add decisions
- Update ROADMAP.md: phase 07 progress updated
These are local planning artifacts that should not be committed.
The .planning/ directory is already in .gitignore.
@greptile-apps
Contributor

greptile-apps bot commented Mar 27, 2026

Greptile Summary

This PR delivers 7 phases of enterprise-scale features: fleet node groups with label enforcement, a fleet health dashboard, outbound webhooks (Standard-Webhooks signed), cross-environment pipeline promotion (UI + GitOps PR flow), an OpenAPI 3.1 spec, and several performance improvements. The scope is large (~11k lines) but well-structured — each feature has its own router, service, and tests. Three issues need attention before merge.

  • P1 — msgId mismatch in outbound webhook delivery (outbound-webhook.ts): dispatchWithTracking generates a msgId for the WebhookDelivery DB record, while deliverOutboundWebhook independently generates its own msgId for the webhook-id HTTP header and HMAC signing string. These are always different UUIDs, which breaks Standard-Webhooks deduplication for receivers and makes VectorFlow's delivery history UI uncorrelatable with received webhook IDs.
  • P1 — Cross-team node data exposure in nodesInGroup (node-group.ts): The withTeamAccess("VIEWER") check is scoped to input.environmentId, but the node group is fetched by groupId alone (no environment scope). Nodes are then queried against group.environmentId, which can point to a different team's environment if a crafted groupId is passed.
  • P1 — testDelivery mutation missing withAudit (webhook-endpoint.ts): Every mutation must carry withAudit per the project security policy; testDelivery is the only new mutation without it.
  • P1 (verify) — withTeamAccess may not resolve from requestId (promotion.ts): approve and reject pass { requestId } to withTeamAccess("EDITOR"). The middleware resolves team context from known field names (id, teamId, environmentId, pipelineId); confirm it handles requestId, or rename the field to align with the convention.

Confidence Score: 3/5

Not safe to merge as-is — cross-team node data exposure is a live security bug, and the msgId mismatch breaks the Standard-Webhooks contract on day one.

Three confirmed P1 issues: a cross-team data exposure vulnerability in the node-group router, a correctness bug in outbound webhook ID tracking that will manifest on every first delivery, and a missing audit trail on the test-delivery mutation. Additionally, withTeamAccess resolution from requestId needs explicit verification. The rest of the code is well-written and structurally sound.

src/server/routers/node-group.ts (cross-team exposure), src/server/services/outbound-webhook.ts (msgId mismatch), src/server/routers/webhook-endpoint.ts (missing audit), src/server/routers/promotion.ts (withTeamAccess field name)

Important Files Changed

| Filename | Overview |
|---|---|
| src/server/services/outbound-webhook.ts | New outbound webhook delivery service. Has a msgId mismatch bug where the DB record ID and the webhook-id HTTP header are always different UUIDs |
| src/server/routers/node-group.ts | New node group router. nodesInGroup fetches nodes using group.environmentId without verifying the group belongs to input.environmentId, enabling cross-team node data exposure |
| src/server/routers/webhook-endpoint.ts | New webhook endpoint management router. Solid auth/ownership checks throughout, but testDelivery mutation is missing withAudit middleware |
| src/server/routers/promotion.ts | New promotion router. Good atomic race-condition guards and self-review block; approve/reject use requestId as input, which may not be resolvable by withTeamAccess |
| src/server/services/gitops-promotion.ts | New GitOps PR-creation service. Correctly decrypts token, parses GitHub URLs, creates unique branch names, and embeds promotion ID in PR body for webhook correlation |
| src/app/api/webhooks/git/route.ts | Extended git webhook handler. Adds PR merge trigger for GitOps promotions with HMAC validation, idempotency guard, and atomic status transition; overall sound |
| src/server/services/retry-service.ts | Extended retry service to handle outbound webhook retries. Correctly excludes dead_letter records, uses atomic claim pattern, and handles disabled endpoints |
| src/app/api/sse/route.ts | Adds per-instance SSE connection limit (default 1000) with 503 + Retry-After response. Clean and correct |
| prisma/schema.prisma | Adds NodeGroup, PromotionRequest, WebhookEndpoint, WebhookDelivery models with correct cascade deletes, indexes, and enum additions |
| src/app/api/agent/enroll/route.ts | Adds auto-enrollment label template application from matching NodeGroups. Correctly non-fatal, well-scoped to the new node's environment |

Sequence Diagram

sequenceDiagram
    participant UI as Browser
    participant tRPC as tRPC Router
    participant Octokit as GitHub API
    participant GHWebhook as /api/webhooks/git
    participant PromSvc as promotion-service
    participant OutWH as outbound-webhook

    Note over UI,OutWH: Phase 5/7 – Pipeline Promotion (GitOps path)

    UI->>tRPC: promotion.initiate(pipelineId, targetEnvId)
    tRPC->>PromSvc: preflightSecrets()
    PromSvc-->>tRPC: {canProceed, missing[]}
    tRPC->>Octokit: createPromotionPR(encryptedToken, yaml)
    Octokit-->>tRPC: {prNumber, prUrl}
    tRPC-->>UI: {status: AWAITING_PR_MERGE, prUrl}

    Note over Octokit,GHWebhook: Developer merges PR on GitHub
    Octokit->>GHWebhook: POST pull_request (merged)
    GHWebhook->>GHWebhook: HMAC verify + parse requestId from PR body
    GHWebhook->>GHWebhook: atomic updateMany(AWAITING_PR_MERGE→DEPLOYING)
    GHWebhook->>PromSvc: executePromotion(requestId, promotedById)
    PromSvc-->>GHWebhook: {pipelineId, pipelineName}
    GHWebhook->>OutWH: fireOutboundWebhooks(promotion_completed)
    GHWebhook-->>Octokit: {deployed: true}

    Note over UI,OutWH: Phase 4 – Outbound Webhooks (event delivery)
    PromSvc->>OutWH: fireOutboundWebhooks(metric, teamId, payload)
    OutWH->>OutWH: dispatchWithTracking() — creates WebhookDelivery record
    OutWH->>OutWH: deliverOutboundWebhook() — HMAC sign + POST
    Note right of OutWH: ⚠ msgId in DB ≠ webhook-id header
    OutWH-->>PromSvc: success / retryable / dead_letter

Comments Outside Diff (4)

  1. src/server/services/outbound-webhook.ts, line 1476-1494 (link)

    P1 msgId stored in DB doesn't match webhook-id HTTP header

    dispatchWithTracking generates a msgId at line 1481 and stores it in the WebhookDelivery record. But deliverOutboundWebhook (called on line 1494) internally generates its own fresh msgId (via crypto.randomUUID()) at runtime, which is what actually goes into the webhook-id header and the HMAC signing string.

    These two UUIDs are always different. This breaks the Standard-Webhooks contract: the webhook-id that receivers use for idempotency/deduplication has no relationship to the msgId stored in WebhookDelivery. Anyone inspecting delivery history in the management UI cannot correlate it with the webhook-id they received.

    The fix is to generate msgId once in dispatchWithTracking and pass it into deliverOutboundWebhook as a parameter, so both the DB record and the HTTP header share the same ID.

    // In dispatchWithTracking — pass msgId to the delivery function
    const msgId = crypto.randomUUID();
    const delivery = await prisma.webhookDelivery.create({ data: { ..., msgId } });
    const result = await deliverOutboundWebhook(endpoint, payload, msgId);
    // In deliverOutboundWebhook — accept an optional msgId instead of always creating one
    export async function deliverOutboundWebhook(
      endpoint: EndpointLike,
      payload: OutboundPayload,
      msgId = crypto.randomUUID(), // default for one-off/test calls
    ): Promise<OutboundResult> {

    Rule Used: ## Security & Cryptography Review Rules

    When revi... (source)

  2. src/server/routers/node-group.ts, line 426-461 (link)

    P1 Cross-team node data exposure via unchecked groupId

    withTeamAccess("VIEWER") resolves the team check from input.environmentId. However, the group is fetched without verifying it belongs to that same environment:

    const group = await prisma.nodeGroup.findUnique({
      where: { id: groupId },  // no environmentId check
    });
    // ...
    const allNodes = await prisma.vectorNode.findMany({
      where: { environmentId: group.environmentId },  // uses the group's env, not input.environmentId
    });

    An attacker can pass their own valid environmentId (to pass the VIEWER check) alongside a groupId belonging to a different team's environment. The team auth check passes, but the node query returns data scoped to the other team's group.environmentId.

    Fix: scope the group lookup to input.environmentId:

    const group = await prisma.nodeGroup.findFirst({
      where: { id: groupId, environmentId: input.environmentId },
    });

    Rule Used: ## Security & Cryptography Review Rules

    When revi... (source)

  3. src/server/routers/webhook-endpoint.ts, line 1123-1152 (link)

    P1 testDelivery mutation is missing withAudit

    Every mutation must use withAudit(action, entityType) per the project's security policy. testDelivery is a mutation that triggers an outbound HTTP request and creates/updates a WebhookDelivery record, but it has no audit middleware:

    testDelivery: protectedProcedure
      .input(z.object({ id: z.string(), teamId: z.string() }))
      .use(withTeamAccess("ADMIN"))
      // ← missing: .use(withAudit("webhookEndpoint.testDelivery", "WebhookEndpoint"))
      .mutation(async ({ input }) => {

    Add .use(withAudit("webhookEndpoint.testDelivery", "WebhookEndpoint")) after withTeamAccess.

    Rule Used: ## Security & Cryptography Review Rules

    When revi... (source)

  4. src/server/routers/promotion.ts, line 784-788 (link)

    P1 `approve` and `reject` use `EDITOR` role; `withTeamAccess` may not resolve from `requestId`

    Both `approve` and `reject` declare `.use(withTeamAccess("EDITOR"))` with `requestId` as the sole input field. The architecture docs list the fields `withTeamAccess` recognises for team resolution: `teamId`, `environmentId`, `pipelineId`, and `id`. The field here is named `requestId`, not `id`.

    If `withTeamAccess` only matches on the exact field name `id`, the middleware won't be able to resolve a team context from `requestId`, which could mean the authorization check silently passes regardless of the caller's actual team membership.

    Check the `withTeamAccess` implementation to confirm it handles `requestId`, or rename the input field to `id` for consistency with the convention, or explicitly pass a `teamId` alongside it.
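
    The concern can be sketched as follows, assuming the resolver matches on exact field names. `findResolvableField` and `assertResolvable` are hypothetical helpers, and the real `withTeamAccess` implementation may behave differently:

    ```typescript
    // Hypothetical resolver; the real withTeamAccess implementation may differ.
    type Input = Record<string, unknown>;

    // Field names the review comment says the architecture docs list.
    const RECOGNISED_FIELDS = ["teamId", "environmentId", "pipelineId", "id"] as const;

    // Returns the first recognised field present on the input, or null if none match.
    function findResolvableField(input: Input): string | null {
      for (const field of RECOGNISED_FIELDS) {
        if (typeof input[field] === "string") return field;
      }
      return null;
    }

    // Fail closed: an input with no recognised field is rejected rather than waved through.
    function assertResolvable(input: Input): string {
      const field = findResolvableField(input);
      if (field === null) {
        throw new Error("FORBIDDEN: no team context could be resolved from input");
      }
      return field;
    }
    ```

    Under this assumption, `{ requestId: "..." }` matches nothing, so the safe behaviour is to reject the call rather than skip the team check.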

    Rule Used: ## Code Style & Conventions

    TypeScript Conven... (source)

Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/server/services/outbound-webhook.ts
Line: 1476-1494

Comment:
**`msgId` stored in DB doesn't match `webhook-id` HTTP header**

`dispatchWithTracking` generates a `msgId` at line 1481 and stores it in the `WebhookDelivery` record. But `deliverOutboundWebhook` (called on line 1494) internally generates its own fresh `msgId` (via `crypto.randomUUID()`) at runtime, which is what actually goes into the `webhook-id` header and the HMAC signing string.

These two UUIDs are always different. This breaks the Standard-Webhooks contract: the `webhook-id` that receivers use for idempotency/deduplication has no relationship to the `msgId` stored in `WebhookDelivery`. Anyone inspecting delivery history in the management UI cannot correlate it with the `webhook-id` they received.

The fix is to generate `msgId` once in `dispatchWithTracking` and pass it into `deliverOutboundWebhook` as a parameter, so both the DB record and the HTTP header share the same ID.

```typescript
// In dispatchWithTracking — pass msgId to the delivery function
const msgId = crypto.randomUUID();
const delivery = await prisma.webhookDelivery.create({ data: { ..., msgId } });
const result = await deliverOutboundWebhook(endpoint, payload, msgId);
```

```typescript
// In deliverOutboundWebhook — accept an optional msgId instead of always creating one
export async function deliverOutboundWebhook(
  endpoint: EndpointLike,
  payload: OutboundPayload,
  msgId = crypto.randomUUID(), // default for one-off/test calls
): Promise<OutboundResult> {
```
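
For context on why the shared ID matters, here is a minimal sketch of Standard-Webhooks style signing, in which the message ID is part of the signed string. `signWebhook` and `dispatchWithSharedId` are hypothetical names rather than the project's functions, and the secret is assumed to be the base64 key with any `whsec_` prefix already stripped:

```typescript
import { createHmac, randomUUID } from "node:crypto";

// Builds Standard-Webhooks style headers: the signature covers "<id>.<timestamp>.<body>".
function signWebhook(
  secretBase64: string,
  body: string,
  msgId: string = randomUUID(), // default for one-off calls
  timestampSeconds: number = Math.floor(Date.now() / 1000),
): Record<string, string> {
  const signedContent = `${msgId}.${timestampSeconds}.${body}`;
  const signature = createHmac("sha256", Buffer.from(secretBase64, "base64"))
    .update(signedContent)
    .digest("base64");
  return {
    "webhook-id": msgId,
    "webhook-timestamp": String(timestampSeconds),
    "webhook-signature": `v1,${signature}`,
  };
}

// Hypothetical tracking wrapper: the stored record and the header share one msgId.
function dispatchWithSharedId(secretBase64: string, body: string) {
  const msgId = randomUUID();
  const deliveryRecord = { msgId, body }; // stand-in for the WebhookDelivery row
  const headers = signWebhook(secretBase64, body, msgId);
  return { deliveryRecord, headers };
}
```

Because the ID feeds into the HMAC input, a stored `msgId` that differs from the `webhook-id` header cannot be used to reproduce or look up the signature the receiver saw.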

**Rule Used:** ## Security & Cryptography Review Rules

When revi... ([source](https://app.greptile.com/review/custom-context?memory=7cb20c56-ca6a-40aa-8660-7fa75e6e3db2))

How can I resolve this? If you propose a fix, please make it concise.


Reviews (1): Last reviewed commit: "chore: remove .planning files from track..."

TerrifiedBug and others added 2 commits March 27, 2026 10:23
- Fix cross-team node data exposure in nodesInGroup by scoping group
  lookup to input.environmentId (IDOR prevention)
- Fix msgId mismatch between WebhookDelivery record and webhook-id
  HTTP header by passing msgId through to deliverOutboundWebhook
- Add missing withAudit middleware to testDelivery mutation
@TerrifiedBug TerrifiedBug merged commit 6995ae5 into main Mar 27, 2026
2 of 3 checks passed
@TerrifiedBug TerrifiedBug deleted the feat/m016-enterprise-scale branch March 29, 2026 19:37