v2.0.7: Langfuse LLM Observability (Optional) + Unified Stack Management #37
Merged
Hidden-History merged 31 commits into main on Feb 24, 2026
…4 Session Tracing (PLAN-008)

Add self-hosted Langfuse v3 integration for LLM observability with two-tier tracing architecture. Phase 1 deploys 7 Docker services (langfuse-web, langfuse-worker, postgres, clickhouse, redis, minio, trace-flush-worker) as an opt-in profile. Phase 4 adds session-level Tier 1 tracing via Claude Code Stop hook with Parzival tagging and project_id multi-tenancy.

Key changes:
- docker-compose.langfuse.yml: 7 services with security hardening, health checks, Langfuse v3 headless auto-initialization
- langfuse_setup.sh: One-command setup with secret generation, MinIO bucket creation, health check, custom model registration (Basic Auth)
- config.py: 8 Langfuse config fields with validation (SecretStr for secret key, enabled+missing keys = error)
- langfuse_stop_hook.py: Session-level tracing with dual kill-switches, 2s flush timeout (SIGALRM), Parzival tagging, project_id scoping
- langfuse_config.py: Thread-safe client factory (no None caching)
- install.sh: Interactive Langfuse menu with RAM check (<32 GiB warning)
- generate_settings.py + merge_settings.py: TRACE_TO_LANGFUSE and 6 Langfuse env vars injected when enabled
- 18 unit tests (all pass), zero regressions on 1994-test suite

Reviewed by: 2 adversarial reviewers (Opus + Sonnet), 11 issues found and fixed across 2 review rounds, verified clean by Opus re-reviewer.

SPEC-019 v1.2 | SPEC-022 v1.2 (§2 only) | PLAN-008 v1.2.1

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
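The "thread-safe client factory (no None caching)" item can be illustrated with a small sketch. This is not the code in langfuse_config.py; `ClientFactory` and its `build` callable are hypothetical names, and the point is only that a failed build (returning `None`) is retried on the next call instead of being remembered.

```python
import threading
from typing import Callable, Optional


class ClientFactory:
    """Cache a successfully built client; never treat a failed build as final."""

    def __init__(self, build: Callable[[], Optional[object]]) -> None:
        self._build = build              # hypothetical builder; returns None on failure
        self._lock = threading.Lock()
        self._client: Optional[object] = None

    def get(self) -> Optional[object]:
        # The lock serializes concurrent callers so only one build runs at a time.
        with self._lock:
            if self._client is None:
                # If the build fails, _client stays None and the next call retries.
                self._client = self._build()
            return self._client
```

With this shape, a hook that runs before keys are available gets `None` once and a working client later, without a process restart.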
…Metrics (SPEC-020)

Implements SPEC-020 (Langfuse SDK Integration) for PLAN-008 v2.0.7:
- trace_buffer.py: Fire-and-forget atomic file-based trace event writer (<10ms) with incremental buffer size tracking and MB-based overflow guard (DEC-PLAN008-004)
- trace_flush_worker.py: Buffer-to-Langfuse flush daemon with SIGTERM graceful shutdown, oldest-first eviction, and Prometheus metrics push
- langfuse_config.py: Added is_langfuse_enabled() and is_hook_tracing_enabled() kill-switch helpers for hook subprocess contexts
- config.py: Added langfuse_trace_buffer_max_mb field (default=100, ge=10, le=1000)
- metrics_push.py: Added push_langfuse_buffer_metrics_async() with 4 Prometheus metrics (flush events, errors, buffer size, evictions)
- process_classification_queue.py: AnthropicInstrumentor auto-instrumentation
- requirements.txt: Added langfuse>=3.0 and opentelemetry-instrumentation-anthropic
- langfuse_stop_hook.py: Fixed token count estimate (SPEC-022 §2.6) and project_id default

Tests: 45/45 pass (6 buffer + 9 flush worker + 20 config + 11 client factory)

Review: 2 rounds (Opus + Sonnet), 15 issues found and fixed, verified CLEAN

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
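A rough sketch of an atomic, size-capped file buffer writer follows. It is an illustration under assumptions, not trace_buffer.py: `write_event` and `demo` are hypothetical names, the cap is given in bytes rather than MB, and the directory is rescanned on each write where the commit describes incremental size tracking.

```python
import json
import os
import tempfile
import uuid
from pathlib import Path


def write_event(buffer_dir: Path, event: dict, max_bytes: int) -> bool:
    """Write one event atomically; return False when the buffer would exceed its cap."""
    buffer_dir.mkdir(parents=True, exist_ok=True)
    payload = json.dumps(event).encode("utf-8")
    # Simplification: rescan the directory instead of tracking size incrementally.
    used = sum(p.stat().st_size for p in buffer_dir.glob("*.json"))
    if used + len(payload) > max_bytes:
        return False
    final = buffer_dir / f"{uuid.uuid4().hex}.json"
    tmp = final.with_suffix(".tmp")
    tmp.write_bytes(payload)
    # os.replace is atomic on the same filesystem, so readers never see partial files.
    os.replace(tmp, final)
    return True


def demo() -> int:
    """Write one small and one oversized event; return how many files landed."""
    with tempfile.TemporaryDirectory() as tmp:
        buf = Path(tmp) / "trace_buffer"
        write_event(buf, {"span": "1_capture"}, max_bytes=1024)
        write_event(buf, {"span": "x" * 4096}, max_bytes=1024)  # rejected: over cap
        return len(list(buf.glob("*.json")))
```

Writing to a `.tmp` name first means a flush worker that globs for `*.json` only ever picks up complete events.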
…across 10 hook scripts

Add emit_trace_event() instrumentation to all capture and store-async hook scripts, completing the full pipeline observability layer for Langfuse.

Capture hooks (4 files): 1_capture span with trace_id generation and env propagation to store-async subprocesses.

Store-async hooks (4 files): Full pipeline spans 2_log, 3_detect, 4_scan, 5_chunk, 6_embed, 7_store, 8_enqueue with accurate tracking:
- scan_actually_ran flag gates 4_scan span (no phantom events when disabled)
- scan_input_length captured before masking for accurate content_length
- classification_enqueued boolean tracks actual enqueue outcome (not hardcoded)
- BLOCKED path: 4_scan + pipeline_terminated in independent try/except blocks
- scan_action defaults to "skipped" (not "passed") when scanning disabled

Special hooks (2 files):
- pre_compact_save.py: Full pipeline 2_log through 8_enqueue. Phase 4 @observe() migration noted per SPEC-021 §3.2.
- context_injection_tier2.py: context_retrieval span on 3 paths (success, search failure, outer catch-all failure).

3-round Opus code review: 20 issues found and fixed (8 HIGH, 6 MEDIUM, 6 LOW), final verification CLEAN across all 6 reviewed files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
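The scan-gating behaviour described above might look roughly like the sketch below. The function name `scan_span_events`, the event dictionary shape, and the placeholder masking step are all assumptions for illustration; only the variable names `scan_actually_ran`, `scan_input_length`, and `scan_action` come from the commit message.

```python
def scan_span_events(scan_enabled: bool, content: str) -> list:
    """Return the 4_scan events a store-async hook would emit for one input."""
    events = []
    scan_actually_ran = False
    scan_action = "skipped"            # default when scanning is disabled
    scan_input_length = len(content)   # captured before masking changes the length

    if scan_enabled:
        # Placeholder masking step; the real scanner is not shown in this PR text.
        content = content.replace("secret", "[REDACTED]")
        scan_actually_ran = True
        scan_action = "passed"

    # Gate the span on the flag so a disabled scanner produces no phantom event.
    if scan_actually_ran:
        events.append(
            {
                "span": "4_scan",
                "scan_action": scan_action,
                "content_length": scan_input_length,
            }
        )
    return events
```

Recording the length before masking keeps `content_length` tied to what the user actually submitted, not to the redacted text.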
…classifier latency alert

Add Langfuse integration to Grafana dashboard and alerting:
- "LLM Observability" collapsed row with 3 link panels (Traces, Sessions, Filter by Project) in memory-overview.json
- $project_id template variable for Langfuse deeplink filtering
- Classifier p99 latency >5s alert rule with Langfuse trace deeplink annotation in new ai-memory-alerts.yaml provisioning file
- PromQL uses sum by (le) for correct multi-label histogram aggregation

8-agent BMAD team: 3 Sonnet devs, 2 reviewers (Opus+Sonnet), 2 Sonnet fixers, 1 Opus re-reviewer. 2 review rounds: 6 issues found (2M+2M+2L), 4 fixed, 2 accepted. Round 2: CLEAN (0 issues).

PLAN-008 / SPEC-022 §3 / AC-7, AC-8, AC-9, AC-10, AC-12

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tall setup-collections.py calls get_config() before Langfuse container starts, causing hard failure when LANGFUSE_ENABLED=true but API keys aren't set yet. Changed validator from raising ValueError to logging warning. Runtime code in langfuse_config.py already handles missing keys gracefully. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
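A minimal sketch of the "warn instead of raise" validation, using a plain dataclass rather than the project's actual config model. `LangfuseSettings`, its field names, and the `usable` property are hypothetical; the real config.py is described as using SecretStr, which this sketch does not reproduce.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("config_sketch")


@dataclass
class LangfuseSettings:
    enabled: bool = False
    public_key: str = ""
    secret_key: str = ""

    def __post_init__(self) -> None:
        # During install the keys may not exist yet, so log and continue
        # instead of raising ValueError and aborting the whole setup.
        if self.enabled and not (self.public_key and self.secret_key):
            logger.warning("Langfuse enabled but API keys are missing; tracing stays off")

    @property
    def usable(self) -> bool:
        """True only when tracing is enabled and both keys are present."""
        return self.enabled and bool(self.public_key and self.secret_key)
```

Runtime code can then check `usable` before building a client, which matches the commit's note that missing keys are handled gracefully downstream.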
docker-compose.langfuse.yml had build context but no dockerfile key, causing Docker to look for Dockerfile at repo root (doesn't exist). Reuses existing Dockerfile.worker which has identical dependencies. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…use, Redis These database/cache containers require CHOWN/SETUID/SETGID capabilities to switch from root to their service user on startup. cap_drop: ALL blocks this, causing restart loops. Security hardening kept on stateless app containers (langfuse-web, langfuse-worker, trace-flush-worker, minio). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…loyment Langfuse v3 defaults to ReplicatedMergeTree which requires Zookeeper/Keeper. Single-node self-hosted deployments must set CLICKHOUSE_CLUSTER_ENABLED=false on both langfuse-web and langfuse-worker per Langfuse docs. Without this, ClickHouse migrations fail with "no Zookeeper configuration" error. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Next.js 15 binds to the container hostname IP instead of 0.0.0.0, causing the healthcheck wget to localhost:3000 to fail with "Connection refused". Setting HOSTNAME=0.0.0.0 forces binding to all interfaces. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Alpine BusyBox wget resolves localhost to IPv6 ::1 first, but Node.js/Next.js only listens on IPv4 0.0.0.0. Changed all healthcheck URLs from localhost to 127.0.0.1 to avoid the IPv6 resolution issue. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- trace-flush-worker was missing LANGFUSE_ENABLED=true env var, causing get_langfuse_client() to return None and crash-loop
- Remove PLAN-008/SPEC-019 references from installer headers and docker-compose (internal planning refs don't belong in shipped code)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Worker resolves BUFFER_DIR from AI_MEMORY_INSTALL_DIR (defaults to ~/.ai-memory inside container → /home/classifier/.ai-memory) but the volume is mounted at /app/trace_buffer. Setting the env var to /app aligns the code path with the mount point. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Chainguard distroless MinIO image has no wget or curl. Use bash built-in /dev/tcp for TCP connectivity check instead. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Same Next.js binding issue as langfuse-web — worker binds to container IP instead of 0.0.0.0, causing healthcheck on 127.0.0.1:3030 to fail with "Connection refused". Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Langfuse v3 LANGFUSE_INIT_* env vars are silently ignored when the Postgres DB already exists from a previous install attempt. Added verify_bootstrap() that checks project existence via API after health check. On failure, auto-cleans volumes and restarts (max 1 retry). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
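The verify-then-retry flow can be sketched with injected callables. The actual verify_bootstrap() lives in a shell script; `bootstrap_with_retry`, `project_exists`, and `reset_volumes_and_restart` are hypothetical stand-ins for the API check and the volume cleanup.

```python
from typing import Callable


def bootstrap_with_retry(
    project_exists: Callable[[], bool],
    reset_volumes_and_restart: Callable[[], None],
    max_retries: int = 1,
) -> bool:
    """Check that bootstrap created the project; clean up and retry a bounded number of times."""
    for attempt in range(max_retries + 1):
        if project_exists():
            return True
        if attempt < max_retries:
            # A stale database makes the init env vars a no-op, so start clean.
            reset_volumes_and_restart()
    return False
```

Bounding the retries at one keeps a persistent failure from repeatedly wiping volumes.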
Auto-fixed lint violations: unused imports, unsorted imports, unused variables, contextlib.suppress pattern, black formatting. Resolves CI lint gate failure (10 violations in 4 Langfuse files). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Langfuse frontend rejects .local TLD emails and hex-only passwords. Changed admin email to admin@example.com and password generation to include uppercase prefix + special char (meets Langfuse complexity). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
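One way to produce a password with an uppercase prefix and a special character is sketched below. The exact complexity rules Langfuse enforces are not stated in this commit, so the character set and length here are assumptions, and `generate_admin_password` is a hypothetical name (the real generation happens in the setup script).

```python
import secrets
import string

SPECIALS = "!@#%^*"  # assumed set; the commit does not say which characters are used


def generate_admin_password(hex_bytes: int = 16) -> str:
    """Random hex body wrapped with one uppercase letter and one special character."""
    prefix = secrets.choice(string.ascii_uppercase)
    special = secrets.choice(SPECIALS)
    return f"{prefix}{secrets.token_hex(hex_bytes)}{special}"
```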
Langfuse INIT creates the admin user with email_verified=NULL and admin=false. The NULL email_verified blocks browser login even though API auth works. Add _fixup_init_user() to set both fields after bootstrap verification. Also fix volume names in recovery path (underscore separator, not dash). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
langfuse_setup.sh wrote LANGFUSE_ENABLED, PUBLIC_KEY, SECRET_KEY to .env but not BASE_URL, TRACE_HOOKS, or TRACE_SESSIONS. install.sh exported all 6 vars from .env, setting the missing ones to empty strings which overrode generate_settings.py defaults. Result: hooks installed with empty Langfuse config, so no traces were produced.

Fix:
(1) Write all 3 missing vars in langfuse_setup.sh setup_project_keys
(2) Only export non-empty values in install.sh loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
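The "only export non-empty values" rule is implemented in install.sh; the Python sketch below shows the same idea. `non_empty_env` is a hypothetical helper, and the full variable names used in the usage note (for example `LANGFUSE_BASE_URL`) are assumed expansions of the abbreviated names in the commit message.

```python
from typing import Dict, Iterable, Set


def non_empty_env(lines: Iterable[str], keys: Set[str]) -> Dict[str, str]:
    """Parse KEY=VALUE lines and keep only requested keys with non-empty values."""
    out: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        # Skipping empty values lets downstream defaults apply instead of "".
        if key in keys and value:
            out[key] = value
    return out
```

For example, given `LANGFUSE_ENABLED=true` and `LANGFUSE_BASE_URL=` in the file, only `LANGFUSE_ENABLED` would be exported, so a default base URL supplied elsewhere is not overwritten by an empty string.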
…roject.toml Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace removed v2 methods (trace(), trace.span()) with v3 equivalents (start_span(), span.update_trace(), span.end()) in trace_flush_worker.py and langfuse_stop_hook.py. Use Langfuse.create_trace_id(seed=...) for valid 32-hex trace IDs. Store historical start_time in metadata; convert end_time to nanoseconds for span.end(). Add child spans with parent_span_id for session turn hierarchy in stop hook. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
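Two details in this migration, a valid 32-hex trace ID derived from a seed and an end time expressed in nanoseconds, can be sketched with the standard library alone. This is a stand-in, not the SDK: `trace_id_from_seed` only approximates what `Langfuse.create_trace_id(seed=...)` provides (a deterministic 32-character hex ID), and the hashing scheme used here is an assumption. `to_epoch_ns` is likewise a hypothetical helper.

```python
import hashlib
from datetime import datetime, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def trace_id_from_seed(seed: str) -> str:
    """Deterministic 32-hex-character ID: same seed, same trace ID."""
    return hashlib.sha256(seed.encode("utf-8")).hexdigest()[:32]


def to_epoch_ns(ts: datetime) -> int:
    """Convert a datetime to integer nanoseconds since the Unix epoch."""
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # assumption: naive timestamps are UTC
    delta = ts - _EPOCH
    # Integer arithmetic avoids float rounding on large nanosecond values.
    seconds = delta.days * 86_400 + delta.seconds
    return seconds * 1_000_000_000 + delta.microseconds * 1_000
```

Seeding the ID from a stable value such as a session identifier lets buffered events written by different processes land in the same trace.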
…L to trace-flush-worker Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Apply stashed installer improvements from PM#79-87 testing rounds. Most stash changes (env-var reads, timeouts, arithmetic fixes, safe re-install, base-wins merge) were already in the langfuse branch. Only remaining delta: add -L symlink checks to deploy_parzival_commands to handle broken symlinks, plus conditional deployment logging. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add scripts/stack.sh v1.1.0 for start/stop/restart/status/nuke of the full AI Memory Docker stack. Handles both compose files in correct order (core first for start, Langfuse first for stop) to prevent the network conflict identified in BUG-148.

Two rounds of adversarial Opus code review, all findings resolved:
- Token masking (never leak GITHUB_TOKEN to stdout)
- Conditional --env-file (graceful when .env is absent)
- All profiles covered in stop/nuke (monitoring, github, testing)
- Non-interactive safety guard on nuke confirmation
- Partial-start user guidance when Langfuse fails

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
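The ordering and masking rules in stack.sh can be expressed compactly; the sketch below is a Python rendering of the idea, not the script itself. `compose_order` and `mask_token` are hypothetical helpers, and the number of characters left visible when masking is an assumption.

```python
from typing import List

# Core stack first, Langfuse second; file names as referenced in this PR.
COMPOSE_FILES = ["docker-compose.yml", "docker-compose.langfuse.yml"]


def compose_order(action: str) -> List[str]:
    """Start core before Langfuse; stop Langfuse before core."""
    if action == "start":
        return list(COMPOSE_FILES)
    if action == "stop":
        return list(reversed(COMPOSE_FILES))
    raise ValueError(f"unsupported action: {action}")


def mask_token(value: str, keep: int = 4) -> str:
    """Show at most the first `keep` characters of a secret, mask the rest."""
    if len(value) <= keep:
        return "*" * len(value)
    return value[:keep] + "*" * (len(value) - keep)
```

Stopping in reverse order means the stack that attached to the shared network last is detached first, which is the kind of conflict the ordering is meant to avoid.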
…USE_ENABLED

BUG-149: trace-flush-worker runs as UID 1001 (Dockerfile.worker USER classifier) but buffer files are written by host hooks as UID 1000. Add user: directive to match classifier-worker and github-sync pattern.

BUG-150: classifier-worker missing LANGFUSE_ENABLED env var. The emit_trace_event() kill-switch in trace_buffer.py defaults to false, so 9_classify spans (BUG-146 fix) never fire. Pass through host setting with false default for backward compatibility.

Reviewed by: Opus adversarial (0B/0H/0M/0L) + Sonnet functional (7/7 PASS)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
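A kill-switch that defaults to off explains why the missing env var silently suppressed spans. The sketch below is illustrative rather than the project's helper: it borrows the name `is_langfuse_enabled` from the PR, but the optional `env` parameter and the case-insensitive comparison are assumptions.

```python
import os
from typing import Mapping, Optional


def is_langfuse_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Tracing is on only when LANGFUSE_ENABLED is explicitly 'true'."""
    source = os.environ if env is None else env
    # Default is "false": a container without the variable emits nothing.
    return source.get("LANGFUSE_ENABLED", "false").strip().lower() == "true"
```

Because an unset variable and an explicit `false` behave the same way, a container that never receives the variable looks healthy while producing no spans.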
Tests 1, 2, 5 mocked old v2 API (client.trace()) but BUG-145 migrated production code to v3 (start_span/update_trace/end). Aligns test mocks with actual API usage in trace_flush_worker.py:145-164. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The minio/mc Docker image sets mc as its ENTRYPOINT, so passing sh -c "..." as arguments was silently failing. Added --entrypoint sh to override. Also removed grep -v pipe that caused false-positive warnings on successful bucket creation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
BUG-150 originally added emit_trace_event("9_classify") to
classification_worker.py, but the classifier container runs
process_classification_queue.py. This commit fixes 3 issues:
- Add emit_trace_event to process_classification_queue.py (the actual
entry point) covering success, skipped, and error code paths
- Add trace_buffer volume mount + AI_MEMORY_INSTALL_DIR env var to
classifier-worker in docker-compose.yml so it can write to the
shared trace buffer directory
- Fix langfuse_setup.sh model registration idempotency check to
paginate through all API pages instead of only reading page 1
Verified: 9_classify span confirmed in Langfuse API after classifier
processes a queue item. All 9 pipeline steps now flow end-to-end:
1_capture → 2_log → 3_detect → 4_scan → 5_chunk → 6_embed → 7_store
→ 8_enqueue → 9_classify
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
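The pagination fix for the model registration check can be sketched as a loop over pages. The real check is in langfuse_setup.sh; `model_registered` and `fetch_page` are hypothetical, and the response shape assumed here (a `data` list of models with a `modelName` field, plus `meta.totalPages`) should be verified against the Langfuse API before relying on it.

```python
from typing import Callable, Dict


def model_registered(fetch_page: Callable[[int], Dict], name: str) -> bool:
    """Return True if any page of the models listing contains the given model name."""
    page = 1
    while True:
        body = fetch_page(page)
        if any(m.get("modelName") == name for m in body.get("data", [])):
            return True
        # If the page count is absent, treat the current page as the last one.
        total_pages = body.get("meta", {}).get("totalPages", page)
        if page >= total_pages:
            return False
        page += 1
```

Reading only page 1 would report a model on a later page as missing and register it again, which is the idempotency failure this commit describes.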
… guide

- Update version badge to 2.0.7, add Langfuse badge
- Add LLM Observability to Key Features section
- Add V2.0.7 release section with 8 feature highlights
- Add Langfuse integration section with quick-start guide
- Update architecture diagram with Langfuse service tree
- Create docs/LANGFUSE-INTEGRATION.md with full setup, architecture, pipeline spans, session grouping, troubleshooting, and security docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…lity callouts

- CHANGELOG.md: v2.0.7 section with Langfuse phases, 20+ bug fixes, stack.sh
- INSTALL.md: stack.sh as primary stack management, Langfuse optional callouts, updated uninstall for Langfuse volumes
- README.md: stack.sh in Quick Start, Langfuse marked optional, service count fix
- docker/README.md: Langfuse services section, compose files docs, replaced stale "Monitoring (Future)" with current state

Adversarial review: 12 findings (1B+3H+3M+4L), all blocking/high/medium fixed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
scripts/stack.sh v1.1.0: start/stop/restart/status/nuke for the full 16-container stack with correct network ordering
Key stats
Langfuse is entirely optional
LANGFUSE_ENABLED=true|false kill-switch
Test plan
🤖 Generated with Claude Code