Skip to content

feat(tools): Langfuse trace-export tool — API integration + sanitised report synthesis [NES-1690]#9240

Open
jaco-brink wants to merge 20 commits into
mainfrom
jacobusbrink/nes-1690-langfuse-trace-export
Open

feat(tools): Langfuse trace-export tool — API integration + sanitised report synthesis [NES-1690]#9240
jaco-brink wants to merge 20 commits into
mainfrom
jacobusbrink/nes-1690-langfuse-trace-export

Conversation

@jaco-brink
Copy link
Copy Markdown
Collaborator

@jaco-brink jaco-brink commented May 21, 2026

Summary

Engineer-run TypeScript CLI at tools/langfuse-export/ that reads Apologist chat traces from the Langfuse Public API, scrubs free-text user PII, computes deterministic usage stats, and uses an OpenRouter LLM to synthesise a shareable HTML report (optional Playwright PDF). The engineer manually uploads the artifact to Google Drive for Aaron.

Implements the NES-1656 export-path spike. Plan: docs/plans/2026-05-21-001-feat-langfuse-trace-export-tool-plan.md.

Layout

src/ splits on the boundary that matters: pipeline/ (pure, unit-tested transforms) vs clients/ (external I/O, manual-verified). Only sanitised pipeline/ output crosses into clients/openrouter, enforced by a branded SanitisedConversation type.

Key decisions

  • Reads via raw fetch, not the SDK. The legacy list endpoints (/api/public/traces, /api/public/observations) that the langfuse SDK v3 wraps time out on Langfuse Cloud. So the tool pages the cursor-based v2 observations index to enumerate traceIds, then GET /api/public/traces/{id} per trace — a by-id read that returns trace context (sessionId, metadata, tags) and the full nested observations (input/output/usage/cost) in one call. Neither call scans a list, so neither times out. (Confirmed by curl against the live API; verified end-to-end — 265 traces fetched, load-test excluded, report rendered.)
  • LLM labels + groups; code computes numbers AND renders excerpt text verbatim — the model never emits figures or quotes, so it can't fabricate or leak. The branded SanitisedConversation type makes "scrubbed before any LLM call" a compile error if violated.
  • --environment is intentionally absent — the chat tracing code sets no environment on traces (NES-1688), so it can't filter anything; load-test exclusion is --discriminator instead.
  • Doppler home = journeys/dev (the chat app already has all four keys there); the tool fetches them into a gitignored tool-local .env. Output is gitignored (chat-derived content).

Test plan

  • 80 unit tests pass (npx vitest run --config tools/langfuse-export/vitest.config.mts --coverage=false).
  • Typecheck clean (npx tsc -p tools/langfuse-export/tsconfig.json --noEmit).
  • CLI smoke: no-args usage, --help, bad-arg, env-missing error, --throttle/--days validation.
  • Manual end-to-end against live Langfuse — 3-day window, v2 index paged (265 traces), per-trace detail fetched, default discriminator excluded 259/265 load-test turns, report rendered. The observation shape (input = [{role, content:[{type,text}]}], output = string) is now confirmed in code, not assumed.

Code review

Reviewed via ce-code-review (Tier 2, autofix) — correctness/security/adversarial/kieran-typescript/reliability/testing. Fixes applied: arg validation, debug-output PII echo removal, LLM resilience + timeouts, bounded pagination, dead-code removal, parseThemes extraction. Then a structure regroup (clients//pipeline/) and the v2 read-layer correction above.

Known residuals (accepted)

  • [P1, mitigated] ReDoS via engineer-supplied --discriminator message:<regex> — mitigated by capping the tested input to 2 KB + a comment; a full re2/worker-timeout sandbox is deferred. Bounded: default/named discriminators are linear, and a pathological custom regex is the same footgun as grep.
  • [P3] No 429/5xx retry-with-backoff — a transient rate-limit aborts the run with no partial save. Acceptable for a manual tool; add backoff + checkpoint/resume if runs prove painful.
  • [P3] regexScrub misses some PII classes (SSN, card numbers, surnames, lowercase names) — documented best-effort; deferred to NES-1562. --llm-scrub is the stronger pass.

Post-Deploy Monitoring & Validation

No additional operational monitoring required — this is a manual engineer-run CLI under tools/, not wired into any app, service, CI, or deploy. It only reads from Langfuse + OpenRouter (egress) and writes local gitignored files.

🤖 Generated with Claude Code — Opus 4.7 (1M context), compound-engineering /ce-plan → /ce-work

Summary by CodeRabbit

  • New Features

    • Engineer-run CLI to export chat traces, sanitize user text (regex + optional LLM scrub), compute usage statistics, and synthesize AI-generated theme groupings; renders an HTML report with optional PDF.
    • Configurable time windows, environment & load-test filtering, throttling, and debug output options.
  • Documentation

    • Detailed README and run/operational guidance, including env setup and output/upload workflow.
  • Tests

    • Extensive tests covering CLI, env parsing, normalization, sanitization, aggregation, theming, and report rendering.
  • Chores

    • Per-run tool outputs are excluded from source control.

Review Change Stack

@linear
Copy link
Copy Markdown

linear Bot commented May 21, 2026

NES-1690

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 21, 2026

Warning

Rate limit exceeded

@jaco-brink has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 10 minutes and 39 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 9ec33c4a-8f3a-4460-9215-a4feaf423410

📥 Commits

Reviewing files that changed from the base of the PR and between c551cda and 011a018.

📒 Files selected for processing (7)
  • tools/langfuse-export/fetch-env.sh
  • tools/langfuse-export/src/cli.spec.ts
  • tools/langfuse-export/src/cli.ts
  • tools/langfuse-export/src/clients/openrouter.spec.ts
  • tools/langfuse-export/src/clients/openrouter.ts
  • tools/langfuse-export/src/pipeline/aggregate.spec.ts
  • tools/langfuse-export/src/pipeline/aggregate.ts

Walkthrough

This PR adds a TypeScript CLI under tools/langfuse-export that fetches Langfuse traces, normalizes and sanitizes user PII, computes deterministic usage statistics, optionally synthesizes themes via OpenRouter, and renders shareable HTML (optionally PDF). It includes environment loading, CLI parsing, paginated fetching, test coverage, and per-run gitignored outputs.

Changes

Langfuse Trace Export Implementation

Layer / File(s) Summary
Engineering plan, operational documentation, and build scaffolding
docs/plans/2026-05-21-001-feat-langfuse-trace-export-tool-plan.md, tools/langfuse-export/README.md, tools/langfuse-export/.env.example, tools/langfuse-export/fetch-env.sh, tools/langfuse-export/tsconfig.json, tools/langfuse-export/vitest.config.mts, .gitignore
Adds the implementation plan and README, example env and fetch script, tool-local tsconfig and Vitest config, and gitignore entry to exclude per-run output artifacts.
Shared type definitions
tools/langfuse-export/src/types.ts
Defines TraceRecord/ObservationRecord, Conversation/ConversationTurn, branded SanitisedConversation, ReportStats, Theme, and related types.
Environment validation
tools/langfuse-export/src/env.ts, tools/langfuse-export/src/env.spec.ts
Zod schema and parseEnv/loadEnvFile with structured errors and DEFAULT_OPENROUTER_MODEL; tests for missing/invalid values and defaulting behavior.
CLI parsing and window/discriminator
tools/langfuse-export/src/cli.ts, tools/langfuse-export/src/cli.spec.ts
USAGE and parseArgs with --environment, resolveWindow (--days or --from/--to) and parseDiscriminator supporting default/none/message:<regex>/journey:/tag: forms; validation tests included.
Langfuse client and fetch orchestration
tools/langfuse-export/src/clients/langfuse.ts, tools/langfuse-export/src/clients/langfuse.spec.ts
Creates authenticated client, normalizes raw payloads, cursor-paginates observations index to list trace IDs, fetches per-trace details to avoid list timeouts, applies environment guard, and returns { traces, observations }; mapping tests added.
Normalization pipeline
tools/langfuse-export/src/pipeline/normalize.ts, tools/langfuse-export/src/pipeline/normalize.spec.ts
Converts traces/observations into Conversation/ConversationTurn, extracts user/assistant text across content shapes, groups by session/trace/orphan, supports exclusion by regex/journey/tag, and exposes helpers; tests validate grouping, exclusions, and edge cases.
PII scrubbing pipeline
tools/langfuse-export/src/pipeline/sanitize.ts, tools/langfuse-export/src/pipeline/sanitize.spec.ts
Ordered regex redaction rules for URLs/emails/phones/names, regexScrub, optional injected LLM scrub callback, and sanitize producing SanitisedConversation[]; tests for redaction and llmScrub behavior plus type-level guard.
OpenRouter integration and LLM helpers
tools/langfuse-export/src/clients/openrouter.ts, tools/langfuse-export/src/clients/openrouter.spec.ts
Model factory, extractJsonObject/parseThemes, synthesizeThemes to label/group conversations with JSON output parsing and validation, createLlmScrub with graceful degradation; tests for parsing/filtering.
PDF rendering helper
tools/langfuse-export/src/clients/pdf.ts
renderPdf using Playwright Chromium with targeted error messaging when Chromium is missing.
Aggregation and report rendering
tools/langfuse-export/src/pipeline/aggregate.ts, tools/langfuse-export/src/pipeline/aggregate.spec.ts, tools/langfuse-export/src/pipeline/report.ts
Deterministic buildStats (counts, costs, per-model/day, latency percentiles, top questions) and renderReport producing escaped HTML with conditional theme grouping or fallback excerpts; tests for metrics and HTML output.
CLI entrypoint and orchestration
tools/langfuse-export/run.ts
Main run.ts orchestrates arg parsing, env loading, fetch->normalize->sanitize->aggregate, optional theme synthesis and PDF, writes timestamped outputs and optional debug ndjson, prints upload instructions, and handles top-level errors.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Suggested reviewers

  • jianwei1
  • edmonday
  • csiyang
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.71% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: adding a Langfuse trace-export CLI tool with API integration and sanitized report synthesis, aligned with PR objectives and all file changes.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch jacobusbrink/nes-1690-langfuse-trace-export

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 21, 2026

Fails
🚫 Please ensure your PR title matches commitlint convention.
Warnings
⚠️ ❗ Big PR (3409 changes)

(change count - 3409): Pull Request size seems relatively large. If Pull Request contains multiple changes, split each into separate PR will helps faster, easier review.

(pr title - feat(tools): Langfuse trace-export tool — API integration + sanitised report synthesis [NES-1690]):

subject must not be sentence-case, start-case, pascal-case, upper-case

header must not be longer than 100 characters, current length is 105

Generated by 🚫 dangerJS against 011a018

@nx-cloud
Copy link
Copy Markdown

nx-cloud Bot commented May 21, 2026

View your CI Pipeline Execution ↗ for commit b1cad85

Command Status Duration Result
nx run journeys-admin-e2e:e2e ✅ Succeeded 27s View ↗
nx run-many --target=vercel-alias --projects=jo... ✅ Succeeded 1s View ↗
nx run-many --target=upload-sourcemaps --projec... ✅ Succeeded 13s View ↗
nx run-many --target=deploy --projects=journeys... ✅ Succeeded 31s View ↗

☁️ Nx Cloud last updated this comment at 2026-05-21 23:37:44 UTC

Comment thread tools/langfuse-export/src/cli.ts Fixed
@github-actions github-actions Bot temporarily deployed to Preview - journeys-admin May 21, 2026 05:27 Inactive
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 21, 2026

The latest updates on your projects.

Name Status Preview Updated (UTC)
journeys-admin ✅ Ready journeys-admin preview Fri May 22 11:35:26 NZST 2026

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (1)
tools/langfuse-export/src/cli.spec.ts (1)

44-60: ⚡ Quick win

Add a regression test for valued string-flag swallowing.

Please add a parseArgs(['--model', '--debug']) case asserting --model requires a value so this parse bug doesn’t regress.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/langfuse-export/src/cli.spec.ts` around lines 44 - 60, Add a regression
test that ensures a valued string flag cannot swallow the next flag: in the
existing spec near the other throttle tests, add an it(...) that calls
parseArgs(['--model', '--debug']) and expects it to throw matching the message
`--model requires a value` (e.g., expect(() =>
parseArgs(['--model','--debug'])).toThrow(/--model requires a value/)). This
uses the parseArgs function and mirrors the style of the other tests to prevent
regressions where a missing string value is interpreted as the next flag.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tools/langfuse-export/src/aggregate.ts`:
- Around line 13-16: The dayKey function currently slices any string with length
>= 10 which can produce bogus keys for malformed timestamps; update dayKey to
validate the input is a well-formed ISO timestamp (e.g., parseable by Date or
matches /^\d{4}-\d{2}-\d{2}/) and return 'unknown' for invalid or unparseable
values, otherwise return the YYYY-MM-DD prefix; modify the dayKey(iso: string)
implementation accordingly so callers get 'unknown' instead of arbitrary slices.

In `@tools/langfuse-export/src/cli.ts`:
- Around line 34-37: requireValue is treating any token (including flags like
"--debug") as a valid value, allowing option values to swallow the next CLI
flag; update requireValue(argv, index, flag) to validate that argv[index] exists
and does not start with '-' (treat tokens beginning with '-' as missing),
throwing the same Error(`${flag} requires a value`) when it does; apply the same
validation logic where similar parsing occurs (the other call sites around lines
noted) so functions that parse flags such as requireValue and the code paths
handling options at indices 68-70 and 82-88 all reject tokens that look like
flags rather than accepting them as values.

In `@tools/langfuse-export/src/langfuse.ts`:
- Around line 151-157: Wrap the paginated Langfuse API calls (the
client.fetchTraces(...) call at the shown block and the similar paginated call
at lines 187-193) with a bounded retry/backoff helper: implement a
retryWithBackoff(asyncFn) that retries on transient errors (HTTP 429, 5xx, and
network errors) using exponential backoff with jitter, a small maxRetries (e.g.,
3-5) and a maxDelay cap, and rethrows the error after max retries; replace the
direct client.fetchTraces(...) invocation with calls to retryWithBackoff(() =>
client.fetchTraces({...})) passing the same parameters (from: window.from, to:
window.to, orderBy, limit: pageSize, page) so page iteration logic is unchanged.
Ensure the helper only retries idempotent paginated GET-style requests and logs
each retry attempt for observability.

In `@tools/langfuse-export/src/normalize.ts`:
- Around line 127-131: The exclusion check uses
options.excludeMessageRegex.test(...) which is stateful for global/sticky
regexes; make it deterministic by resetting the regex before each test or
testing a cloned regex instance. Update the code where
options.excludeMessageRegex is used with turn.userMessage and
MAX_DISCRIMINATOR_TEST_CHARS to either set options.excludeMessageRegex.lastIndex
= 0 prior to calling test(...) or replace the call with
RegExp(options.excludeMessageRegex).test(...) so each iteration uses a fresh,
non-sticky regex.

In `@tools/langfuse-export/src/openrouter.ts`:
- Around line 58-69: When building the parsed theme objects in the map callback
(the block that creates label and sessionIds from record), deduplicate
sessionIds before returning so repeated IDs in the same theme are removed;
update the sessionIds assignment (which currently filters using validIds) to
produce a unique array (e.g., use a Set or Array.from(new Set(...))) while
preserving only ids that pass the existing (id): id is string &&
validIds.has(id) filter, then return that deduplicated sessionIds so the final
returned { themes } contains no duplicate IDs per theme.

---

Nitpick comments:
In `@tools/langfuse-export/src/cli.spec.ts`:
- Around line 44-60: Add a regression test that ensures a valued string flag
cannot swallow the next flag: in the existing spec near the other throttle
tests, add an it(...) that calls parseArgs(['--model', '--debug']) and expects
it to throw matching the message `--model requires a value` (e.g., expect(() =>
parseArgs(['--model','--debug'])).toThrow(/--model requires a value/)). This
uses the parseArgs function and mirrors the style of the other tests to prevent
regressions where a missing string value is interpreted as the next flag.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 1464379c-0538-4516-a60b-0668ade9f057

📥 Commits

Reviewing files that changed from the base of the PR and between 64b1ad2 and c07814b.

⛔ Files ignored due to path filters (1)
  • apps/journeys-admin/__generated__/NewCloudflareImage.ts is excluded by !**/__generated__/**
📒 Files selected for processing (25)
  • .gitignore
  • docs/plans/2026-05-21-001-feat-langfuse-trace-export-tool-plan.md
  • tools/langfuse-export/.env.example
  • tools/langfuse-export/README.md
  • tools/langfuse-export/fetch-env.sh
  • tools/langfuse-export/run.ts
  • tools/langfuse-export/src/aggregate.spec.ts
  • tools/langfuse-export/src/aggregate.ts
  • tools/langfuse-export/src/cli.spec.ts
  • tools/langfuse-export/src/cli.ts
  • tools/langfuse-export/src/env.spec.ts
  • tools/langfuse-export/src/env.ts
  • tools/langfuse-export/src/langfuse.spec.ts
  • tools/langfuse-export/src/langfuse.ts
  • tools/langfuse-export/src/normalize.spec.ts
  • tools/langfuse-export/src/normalize.ts
  • tools/langfuse-export/src/openrouter.spec.ts
  • tools/langfuse-export/src/openrouter.ts
  • tools/langfuse-export/src/pdf.ts
  • tools/langfuse-export/src/report.ts
  • tools/langfuse-export/src/sanitize.spec.ts
  • tools/langfuse-export/src/sanitize.ts
  • tools/langfuse-export/src/types.ts
  • tools/langfuse-export/tsconfig.json
  • tools/langfuse-export/vitest.config.mts

Comment thread tools/langfuse-export/src/pipeline/aggregate.ts
Comment thread tools/langfuse-export/src/cli.ts
Comment thread tools/langfuse-export/src/langfuse.ts Outdated
Comment thread tools/langfuse-export/src/pipeline/normalize.ts
Comment thread tools/langfuse-export/src/clients/openrouter.ts
…ads; verified-shape parsing [NES-1690]

Legacy list endpoints (/api/public/traces, /api/public/observations — what the
langfuse SDK v3 wraps) time out on Langfuse Cloud. Switch reads to raw fetch:
v2 observations index (cursor) enumerates traceIds, then GET traces/{id} per
trace returns context + full nested observations in one by-id call. Confirmed
input/output shape via curl, so the tolerant multi-shape parser is replaced with
a precise extractor. Doppler home corrected to journeys/dev. Verified end-to-end
against live data (265 traces, load-test excluded, report rendered).
…race-export' into jacobusbrink/nes-1690-langfuse-trace-export

# Conflicts:
#	tools/langfuse-export/README.md
#	tools/langfuse-export/src/langfuse.ts
#	tools/langfuse-export/src/pipeline/sanitize.spec.ts
#	tools/langfuse-export/src/pipeline/sanitize.ts
@github-actions github-actions Bot requested a deployment to Preview - journeys-admin May 21, 2026 20:11 Pending
@github-actions github-actions Bot temporarily deployed to Preview - journeys-admin May 21, 2026 20:15 Inactive
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tools/langfuse-export/src/clients/openrouter.ts (1)

58-67: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Guard parsed theme entries before reading properties.

theme can be null/non-object from model output. Accessing record.label then throws and drops the entire synthesis result instead of filtering just the bad entry.

Suggested patch
   const themes = parsed.themes
     .map((theme) => {
-      const record = theme as Record<string, unknown>
-      const label = typeof record.label === 'string' ? record.label : ''
-      const sessionIds = Array.isArray(record.sessionIds)
+      if (theme == null || typeof theme !== 'object') {
+        return { label: '', sessionIds: [] as string[] }
+      }
+      const record = theme as { label?: unknown; sessionIds?: unknown }
+      const label = typeof record.label === 'string' ? record.label : ''
+      const sessionIds = Array.isArray(record.sessionIds)
         ? record.sessionIds.filter(
             (id): id is string => typeof id === 'string' && validIds.has(id)
           )
         : []
       return { label, sessionIds }
     })
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/langfuse-export/src/clients/openrouter.ts` around lines 58 - 67, The
mapping over theme entries reads properties from possibly null/non-object
`theme`, causing a crash; update the mapper to first guard that `theme` is an
object (e.g. typeof theme === 'object' && theme !== null) before casting to
`Record<string, unknown>` and accessing `record.label`/`record.sessionIds`, and
skip or return a safe default for entries that fail the guard so only valid
theme objects are processed (refer to the existing mapper that uses `record`,
`label`, and `validIds` for guidance).
♻️ Duplicate comments (2)
tools/langfuse-export/src/cli.ts (1)

37-40: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Prevent option values from swallowing the next flag token.

At Line 39, requireValue only checks for null, so inputs like --model --debug are parsed as a model value and drop --debug entirely.

Suggested patch
 function requireValue(argv: string[], index: number, flag: string): string {
   const value = argv[index]
-  if (value == null) throw new Error(`${flag} requires a value`)
+  if (
+    value == null ||
+    value.length === 0 ||
+    value === '-h' ||
+    value === '--help' ||
+    value.startsWith('--')
+  ) {
+    throw new Error(`${flag} requires a value`)
+  }
   return value
 }

Also applies to: 71-77

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/langfuse-export/src/cli.ts` around lines 37 - 40, The requireValue
function currently only checks for null and thus will accept the next token even
if it is another flag (e.g., "--model --debug"); update requireValue to also
reject tokens that look like flags by checking if the candidate value starts
with "-" (or "--") and throw the same Error(`${flag} requires a value`) when it
does. Apply the same change to the duplicate parsing logic referenced (the code
block around the other option parsing at lines 71-77) so all option value checks
(use the function name requireValue or its duplicate logic) consistently
validate that a value is present and not another flag token.
tools/langfuse-export/src/pipeline/aggregate.ts (1)

13-16: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate day format before bucketing.

This still accepts any 10+ char string as a day key, which can create bogus buckets instead of falling back to unknown.

Suggested patch
 function dayKey(iso: string): string {
   // YYYY-MM-DD from an ISO timestamp; '' if unparseable.
-  return iso.length >= 10 ? iso.slice(0, 10) : ''
+  if (iso.length < 10) return ''
+  const day = iso.slice(0, 10)
+  return /^\d{4}-\d{2}-\d{2}$/.test(day) ? day : ''
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/langfuse-export/src/pipeline/aggregate.ts` around lines 13 - 16, The
dayKey helper currently accepts any string >=10 chars; update dayKey(iso:
string) to strictly validate the first 10 chars match the YYYY-MM-DD pattern
(four digits, hyphen, two digits, hyphen, two digits) and return that slice only
when it matches and represents a plausible date (e.g., month 01–12, day 01–31);
otherwise return the fallback "unknown". Locate and change the dayKey function
to perform a regex/date-component check before slicing so buckets are only
created for valid ISO date prefixes.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tools/langfuse-export/fetch-env.sh`:
- Around line 22-25: The current redirect using > "$OUT" truncates the target
before doppler runs and leaves the secrets file created with default umask; fix
by writing to a secure temporary file and only replacing $OUT after a successful
download: run doppler secrets download --no-file --format=env-no-quotes
--project journeys --config dev redirecting into a temp file (e.g., "$OUT.tmp"
or mktemp), set restrictive permissions (chmod 0600) on that temp file, verify
the command succeeded, then atomically move the temp into place with mv -f to
overwrite $OUT, and emit the existing message; ensure any temp file is removed
on failure.

---

Outside diff comments:
In `@tools/langfuse-export/src/clients/openrouter.ts`:
- Around line 58-67: The mapping over theme entries reads properties from
possibly null/non-object `theme`, causing a crash; update the mapper to first
guard that `theme` is an object (e.g. typeof theme === 'object' && theme !==
null) before casting to `Record<string, unknown>` and accessing
`record.label`/`record.sessionIds`, and skip or return a safe default for
entries that fail the guard so only valid theme objects are processed (refer to
the existing mapper that uses `record`, `label`, and `validIds` for guidance).

---

Duplicate comments:
In `@tools/langfuse-export/src/cli.ts`:
- Around line 37-40: The requireValue function currently only checks for null
and thus will accept the next token even if it is another flag (e.g., "--model
--debug"); update requireValue to also reject tokens that look like flags by
checking if the candidate value starts with "-" (or "--") and throw the same
Error(`${flag} requires a value`) when it does. Apply the same change to the
duplicate parsing logic referenced (the code block around the other option
parsing at lines 71-77) so all option value checks (use the function name
requireValue or its duplicate logic) consistently validate that a value is
present and not another flag token.

In `@tools/langfuse-export/src/pipeline/aggregate.ts`:
- Around line 13-16: The dayKey helper currently accepts any string >=10 chars;
update dayKey(iso: string) to strictly validate the first 10 chars match the
YYYY-MM-DD pattern (four digits, hyphen, two digits, hyphen, two digits) and
return that slice only when it matches and represents a plausible date (e.g.,
month 01–12, day 01–31); otherwise return the fallback "unknown". Locate and
change the dayKey function to perform a regex/date-component check before
slicing so buckets are only created for valid ISO date prefixes.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 37ba98bb-8db6-47f4-8e46-151ad522f7cc

📥 Commits

Reviewing files that changed from the base of the PR and between c07814b and 6820e40.

📒 Files selected for processing (17)
  • tools/langfuse-export/README.md
  • tools/langfuse-export/fetch-env.sh
  • tools/langfuse-export/run.ts
  • tools/langfuse-export/src/cli.ts
  • tools/langfuse-export/src/clients/langfuse.spec.ts
  • tools/langfuse-export/src/clients/langfuse.ts
  • tools/langfuse-export/src/clients/openrouter.spec.ts
  • tools/langfuse-export/src/clients/openrouter.ts
  • tools/langfuse-export/src/clients/pdf.ts
  • tools/langfuse-export/src/env.ts
  • tools/langfuse-export/src/pipeline/aggregate.spec.ts
  • tools/langfuse-export/src/pipeline/aggregate.ts
  • tools/langfuse-export/src/pipeline/normalize.spec.ts
  • tools/langfuse-export/src/pipeline/normalize.ts
  • tools/langfuse-export/src/pipeline/report.ts
  • tools/langfuse-export/src/pipeline/sanitize.spec.ts
  • tools/langfuse-export/src/pipeline/sanitize.ts
💤 Files with no reviewable changes (2)
  • tools/langfuse-export/src/clients/pdf.ts
  • tools/langfuse-export/src/clients/openrouter.spec.ts

Comment thread tools/langfuse-export/fetch-env.sh
…tion) [NES-1690]

NES-1688 now tags Langfuse traces with their deployment environment
(production | stage | preview | development), so reports can be scoped.

- --environment defaults to `production` (the share-worthy deliverable);
  `all` disables filtering and includes pre-NES-1688 untagged history.
- Filters server-side via the v2 observations index query param (fewer
  per-trace fetches) plus an authoritative re-check against each trace's
  own environment.
- Prints the selected env and a hint when a filter matches nothing.
- Also renames the report's share target from "Aaron" to "Leadership".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot temporarily deployed to Preview - journeys-admin May 21, 2026 22:44 Inactive
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tools/langfuse-export/src/clients/langfuse.ts (1)

22-27: ⚡ Quick win

Narrow FetchOptions.environment to a deployment-environment type.

On Line 26, environment?: string is too permissive and allows invalid values if this client is reused outside run.ts. Prefer a shared union type (for example, production | stage | preview | development) for compile-time safety.

As per coding guidelines, **/*.{ts,tsx}: Define a type if possible.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/langfuse-export/src/clients/langfuse.ts` around lines 22 - 27, Replace
the permissive environment?: string in the FetchOptions interface with a narrow
union type (e.g., DeploymentEnvironment = 'production' | 'stage' | 'preview' |
'development') and use that type for the environment property; either import the
shared DeploymentEnvironment union if it already exists or declare and export it
from this module so callers get compile-time safety when passing environment to
the functions that accept FetchOptions (look for the FetchOptions interface and
its environment property in langfuse.ts).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tools/langfuse-export/src/clients/langfuse.ts`:
- Around line 22-27: Replace the permissive environment?: string in the
FetchOptions interface with a narrow union type (e.g., DeploymentEnvironment =
'production' | 'stage' | 'preview' | 'development') and use that type for the
environment property; either import the shared DeploymentEnvironment union if it
already exists or declare and export it from this module so callers get
compile-time safety when passing environment to the functions that accept
FetchOptions (look for the FetchOptions interface and its environment property
in langfuse.ts).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 7172cfdb-5196-4b2a-88a2-15e8500ff915

📥 Commits

Reviewing files that changed from the base of the PR and between 6820e40 and c551cda.

📒 Files selected for processing (9)
  • tools/langfuse-export/README.md
  • tools/langfuse-export/run.ts
  • tools/langfuse-export/src/cli.spec.ts
  • tools/langfuse-export/src/cli.ts
  • tools/langfuse-export/src/clients/langfuse.spec.ts
  • tools/langfuse-export/src/clients/langfuse.ts
  • tools/langfuse-export/src/pipeline/normalize.spec.ts
  • tools/langfuse-export/src/pipeline/normalize.ts
  • tools/langfuse-export/src/types.ts
✅ Files skipped from review due to trivial changes (2)
  • tools/langfuse-export/src/types.ts
  • tools/langfuse-export/README.md

@jaco-brink jaco-brink self-assigned this May 21, 2026
@github-actions github-actions Bot temporarily deployed to Preview - journeys-admin May 21, 2026 23:09 Inactive
- cli: requireValue rejects flag-shaped tokens so `--model --debug` no longer
  swallows the next flag as a value
- cli: parseDiscriminator wraps RegExp construction with a friendly error
  (pattern is engineer-supplied, not untrusted input)
- aggregate: dayKey validates the YYYY-MM-DD shape so a malformed timestamp
  buckets under 'unknown' instead of producing a bogus day key
- openrouter: parseThemes trims labels and dedups sessionIds per theme
- fetch-env.sh: write secrets to a chmod 600 temp file then mv into place, so a
  failed download can't truncate .env and secrets aren't world-readable

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jaco-brink
Copy link
Copy Markdown
Collaborator Author

Review feedback addressed (2fbd5f7)

Fixed:

  • src/pipeline/aggregate.ts (dayKey) — validate the YYYY-MM-DD prefix shape so a malformed timestamp buckets under unknown instead of producing a bogus day key. (CodeRabbit)
  • src/cli.ts (requireValue) — reject flag-shaped tokens (---prefixed / -h / empty) as values, so --model --debug no longer swallows the next flag. (CodeRabbit)
  • src/clients/openrouter.ts (parseThemes) — trim labels and dedup sessionIds per theme so a repeated id can't render the same conversation twice. (CodeRabbit)
  • fetch-env.sh — write secrets to a chmod 600 temp file then mv into place: a failed download can't truncate an existing .env, and secrets never land in a world-readable file. (CodeRabbit)

Fixed (adjusted) + challenged:

  • src/cli.ts (parseDiscriminator) — CodeQL regex injection: the --discriminator message:<regex> pattern is engineer-supplied in an internal CLI tool, not untrusted input, so it isn't an injection vector; matched text is also length-bounded as defence-in-depth. Hardened anyway by wrapping new RegExp to emit a friendly error on a bad pattern. Thread resolved; the code-scanning alert can be dismissed as won't-fix if desired. (see inline reply)

Challenged (no change):

  • src/pipeline/normalize.ts — CodeRabbit's lastIndex reset is unreachable: the regex is only ever built with a fixed 'i' flag or the no-flag default literal, never g/y, so .test() is stateless. (see inline reply)

Outdated (won't-fix):

  • src/langfuse.ts retry/backoff — thread is stale (file rewritten to raw fetch/getJson; the patched SDK methods no longer exist). Deliberately no retry: this is an internal, manually-run, idempotent tool — a non-2xx signals a request-construction bug to surface loudly, and a genuine downstream incident is handled by re-running / waiting it out. (see inline reply)

All 7 threads resolved. Tool suite green: 89 Vitest tests pass, tsc --noEmit clean.

Note: the failing Danger check is pre-existing and unrelated to this feedback — it flags the PR title (105 chars > 100; subject not lowercase per commitlint) plus a big-PR warning.

Comment thread tools/langfuse-export/src/cli.ts Dismissed
@github-actions github-actions Bot requested a deployment to Preview - journeys-admin May 21, 2026 23:29 Pending
@jaco-brink
Copy link
Copy Markdown
Collaborator Author

Review feedback addressed — round 2

CodeQL regex-injection (recurring): dealt with at the alert level — dismissed alerts #340 + #341 (js/regex-injection) as won't fix with a documented rationale, and resolved the review thread.

Root cause of the repeat: resolving the PR review thread is cosmetic — CodeQL re-raises the alert on each scan (round 1's try/catch only shifted the line, spawning #341). Dismissing the alert is the durable fix.

Rationale: internal CLI tool; --discriminator message:<regex> is engineer-supplied (not untrusted input), so it's not an injection vector. Matched text is length-bounded (MAX_DISCRIMINATOR_TEST_CHARS) and new RegExp construction fails safely on a bad pattern. No code change this round.

@jaco-brink jaco-brink requested a review from edmonday May 21, 2026 23:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants