v2.0.7: Langfuse LLM Observability (Optional) + Unified Stack Management by Hidden-History · Pull Request #37 · Hidden-History/ai-memory

Hidden-History · 2026-02-23T10:10:04Z

Summary

Langfuse LLM Observability (optional): 9-step pipeline tracing, session grouping, file-based trace buffer (~5ms overhead), kill-switch control, custom model registration, Grafana integration — 4 specs (SPEC-019 through SPEC-022), 5 phases
Unified Stack Management: scripts/stack.sh v1.1.0 — start/stop/restart/status/nuke for the full 16-container stack with correct network ordering
20 bug fixes: BUG-131 through BUG-151 (8 install bugs, 3 auth/config, 3 runtime, 2 pipeline, 3 deployment, 1 stack management)
Documentation: CHANGELOG v2.0.7, LANGFUSE-INTEGRATION.md guide, stack.sh references across INSTALL/README/docker docs, Langfuse optionality callouts

Key stats

92 commits, 263 files changed, +49,557/-1,691 lines
Includes v2.0.6 foundation merge (18 specs, 42 bug fixes, Parzival integration)
0 open bugs, 0 active blockers
Multiple Opus adversarial review rounds throughout development

Langfuse is entirely optional

Controlled by LANGFUSE_ENABLED=true|false kill-switch
Core AI Memory system works fully without it (8 services, 16 GiB)
Adds 7 services when enabled (32 GiB recommended)
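
A minimal sketch of how an environment-driven kill-switch like this could be read. The helper name `is_langfuse_enabled` appears in the commits below, but this body, the set of accepted truthy strings, and the `env` parameter are assumptions for illustration, not the project's actual code:

```python
import os

# Assumption: which strings count as "on"; the PR only documents true|false.
_TRUTHY = {"1", "true", "yes", "on"}


def is_langfuse_enabled(env=None):
    """Return True only when LANGFUSE_ENABLED is explicitly truthy.

    Defaults to False so the core stack behaves the same when the
    variable is missing.
    """
    env = os.environ if env is None else env
    return env.get("LANGFUSE_ENABLED", "false").strip().lower() in _TRUTHY
```

Defaulting to disabled means a missing variable never turns tracing on by accident.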

Test plan

V207 comprehensive test: 153/195 PASS (PM #100)
BUG-149/150 fixes committed and verified (PM #101)
4 spec corrections applied to TESTING-SOURCE-OF-TRUTH.md
Adversarial doc review: 12 findings, all blocking/high/medium fixed
Agent 9 retest (trace flush + 9_classify) — deferred to v2.0.8
Gate 10 live Parzival round-trip — deferred to v2.0.8

🤖 Generated with Claude Code

…4 Session Tracing (PLAN-008)

Add self-hosted Langfuse v3 integration for LLM observability with two-tier tracing architecture. Phase 1 deploys 7 Docker services (langfuse-web, langfuse-worker, postgres, clickhouse, redis, minio, trace-flush-worker) as an opt-in profile. Phase 4 adds session-level Tier 1 tracing via Claude Code Stop hook with Parzival tagging and project_id multi-tenancy.

Key changes:
- docker-compose.langfuse.yml: 7 services with security hardening, health checks, Langfuse v3 headless auto-initialization
- langfuse_setup.sh: One-command setup with secret generation, MinIO bucket creation, health check, custom model registration (Basic Auth)
- config.py: 8 Langfuse config fields with validation (SecretStr for secret key, enabled+missing keys = error)
- langfuse_stop_hook.py: Session-level tracing with dual kill-switches, 2s flush timeout (SIGALRM), Parzival tagging, project_id scoping
- langfuse_config.py: Thread-safe client factory (no None caching)
- install.sh: Interactive Langfuse menu with RAM check (<32 GiB warning)
- generate_settings.py + merge_settings.py: TRACE_TO_LANGFUSE and 6 Langfuse env vars injected when enabled
- 18 unit tests (all pass), zero regressions on 1994-test suite

Reviewed by: 2 adversarial reviewers (Opus + Sonnet), 11 issues found and fixed across 2 review rounds, verified clean by Opus re-reviewer.

SPEC-019 v1.2 | SPEC-022 v1.2 (§2 only) | PLAN-008 v1.2.1

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
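
The "thread-safe client factory (no None caching)" item could look roughly like the sketch below. The names `get_client` and `factory` are hypothetical; the point is only that a failed construction (returning `None`) is not memoized, so a later call can retry once keys or the server become available:

```python
import threading

_lock = threading.Lock()
_client = None  # only ever holds a successfully built client


def get_client(factory):
    """Return a cached client, building it once under a lock.

    `factory` is a hypothetical zero-argument callable that returns a
    client object, or None when Langfuse is disabled or misconfigured.
    A None result is returned to the caller but never cached.
    """
    global _client
    if _client is not None:
        return _client
    with _lock:
        if _client is None:
            _client = factory()  # stays None on failure, so the next call retries
        return _client
```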

…Metrics (SPEC-020)

Implements SPEC-020 (Langfuse SDK Integration) for PLAN-008 v2.0.7:
- trace_buffer.py: Fire-and-forget atomic file-based trace event writer (<10ms) with incremental buffer size tracking and MB-based overflow guard (DEC-PLAN008-004)
- trace_flush_worker.py: Buffer-to-Langfuse flush daemon with SIGTERM graceful shutdown, oldest-first eviction, and Prometheus metrics push
- langfuse_config.py: Added is_langfuse_enabled() and is_hook_tracing_enabled() kill-switch helpers for hook subprocess contexts
- config.py: Added langfuse_trace_buffer_max_mb field (default=100, ge=10, le=1000)
- metrics_push.py: Added push_langfuse_buffer_metrics_async() with 4 Prometheus metrics (flush events, errors, buffer size, evictions)
- process_classification_queue.py: AnthropicInstrumentor auto-instrumentation
- requirements.txt: Added langfuse>=3.0 and opentelemetry-instrumentation-anthropic
- langfuse_stop_hook.py: Fixed token count estimate (SPEC-022 §2.6) and project_id default

Tests: 45/45 pass (6 buffer + 9 flush worker + 20 config + 11 client factory)

Review: 2 rounds (Opus + Sonnet), 15 issues found and fixed, verified CLEAN

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
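
To illustrate the atomic file-based buffer and oldest-first eviction described above, here is a simplified sketch. The function names, the one-JSON-file-per-event layout, and the byte-based limit are assumptions; the real trace_buffer.py tracks size incrementally rather than rescanning the directory:

```python
import json
import os
import tempfile
import uuid
from pathlib import Path


def write_event(buffer_dir, event):
    """Write one event atomically: temp file first, then rename into place."""
    buffer_dir = Path(buffer_dir)
    buffer_dir.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=buffer_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(event, fh)
    final = buffer_dir / f"{uuid.uuid4().hex}.json"
    os.replace(tmp, final)  # atomic on the same filesystem
    return final


def evict_oldest(buffer_dir, max_bytes):
    """Delete the oldest event files until the buffer fits in max_bytes."""
    files = sorted(Path(buffer_dir).glob("*.json"), key=lambda p: p.stat().st_mtime)
    total = sum(p.stat().st_size for p in files)
    evicted = 0
    for path in files:
        if total <= max_bytes:
            break
        total -= path.stat().st_size
        path.unlink()
        evicted += 1
    return evicted
```

Because a reader only ever sees fully written `.json` files, a flush worker can consume the directory without coordinating with the hooks that write to it.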

@observe

…across 10 hook scripts

Add emit_trace_event() instrumentation to all capture and store-async hook scripts, completing the full pipeline observability layer for Langfuse.

Capture hooks (4 files): 1_capture span with trace_id generation and env propagation to store-async subprocesses.

Store-async hooks (4 files): Full pipeline spans 2_log, 3_detect, 4_scan, 5_chunk, 6_embed, 7_store, 8_enqueue with accurate tracking:
- scan_actually_ran flag gates 4_scan span (no phantom events when disabled)
- scan_input_length captured before masking for accurate content_length
- classification_enqueued boolean tracks actual enqueue outcome (not hardcoded)
- BLOCKED path: 4_scan + pipeline_terminated in independent try/except blocks
- scan_action defaults to "skipped" (not "passed") when scanning disabled

Special hooks (2 files):
- pre_compact_save.py: Full pipeline 2_log through 8_enqueue. Phase 4 @observe() migration noted per SPEC-021 §3.2.
- context_injection_tier2.py: context_retrieval span on 3 paths (success, search failure, outer catch-all failure).

3-round Opus code review: 20 issues found and fixed (8 HIGH, 6 MEDIUM, 6 LOW), final verification CLEAN across all 6 reviewed files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
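
The scan-tracking rules above can be sketched as follows. The function `build_scan_events` and the shape of the `scanner` callable are hypothetical; only the flag names, the "skipped" default, and the "measure before masking" rule come from the commit message:

```python
def build_scan_events(content, scan_enabled, scanner=None):
    """Return (events, scan_action, content) for the scan step.

    `scanner` is a hypothetical callable returning (action, masked_content).
    """
    events = []
    scan_action = "skipped"            # not "passed": nothing was checked
    scan_actually_ran = False
    scan_input_length = len(content)   # captured before any masking

    if scan_enabled and scanner is not None:
        scan_action, content = scanner(content)
        scan_actually_ran = True

    # Emit a 4_scan event only if a scan really happened (no phantom spans).
    if scan_actually_ran:
        events.append(
            {
                "name": "4_scan",
                "scan_action": scan_action,
                "content_length": scan_input_length,
            }
        )
    return events, scan_action, content
```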

…classifier latency alert

Add Langfuse integration to Grafana dashboard and alerting:
- "LLM Observability" collapsed row with 3 link panels (Traces, Sessions, Filter by Project) in memory-overview.json
- $project_id template variable for Langfuse deeplink filtering
- Classifier p99 latency >5s alert rule with Langfuse trace deeplink annotation in new ai-memory-alerts.yaml provisioning file
- PromQL uses sum by (le) for correct multi-label histogram aggregation

8-agent BMAD team: 3 Sonnet devs, 2 reviewers (Opus+Sonnet), 2 Sonnet fixers, 1 Opus re-reviewer. 2 review rounds: 6 issues found (2M+2M+2L), 4 fixed, 2 accepted. Round 2: CLEAN (0 issues).

PLAN-008 / SPEC-022 §3 / AC-7, AC-8, AC-9, AC-10, AC-12

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…tall setup-collections.py calls get_config() before Langfuse container starts, causing hard failure when LANGFUSE_ENABLED=true but API keys aren't set yet. Changed validator from raising ValueError to logging warning. Runtime code in langfuse_config.py already handles missing keys gracefully. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
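
A sketch of the "warn instead of raise" change, using a plain dataclass rather than the project's actual config class. The class name, field names, and the `usable` property are hypothetical:

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("config_sketch")


@dataclass
class LangfuseSettings:
    enabled: bool = False
    public_key: str = ""
    secret_key: str = ""

    def __post_init__(self):
        # Previously this condition would raise ValueError; during install the
        # keys legitimately do not exist yet, so only a warning is logged.
        if self.enabled and not (self.public_key and self.secret_key):
            logger.warning("LANGFUSE_ENABLED=true but API keys are not set yet")

    @property
    def usable(self):
        """True only when tracing is enabled and both keys are present."""
        return self.enabled and bool(self.public_key and self.secret_key)
```

Runtime code can then check something like `usable` and skip tracing, which matches the note that langfuse_config.py already tolerates missing keys.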

docker-compose.langfuse.yml had build context but no dockerfile key, causing Docker to look for Dockerfile at repo root (doesn't exist). Reuses existing Dockerfile.worker which has identical dependencies. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…use, Redis These database/cache containers require CHOWN/SETUID/SETGID capabilities to switch from root to their service user on startup. cap_drop: ALL blocks this, causing restart loops. Security hardening kept on stateless app containers (langfuse-web, langfuse-worker, trace-flush-worker, minio). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…loyment Langfuse v3 defaults to ReplicatedMergeTree which requires Zookeeper/Keeper. Single-node self-hosted deployments must set CLICKHOUSE_CLUSTER_ENABLED=false on both langfuse-web and langfuse-worker per Langfuse docs. Without this, ClickHouse migrations fail with "no Zookeeper configuration" error. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Next.js 15 binds to the container hostname IP instead of 0.0.0.0, causing the healthcheck wget to localhost:3000 to fail with "Connection refused". Setting HOSTNAME=0.0.0.0 forces binding to all interfaces. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Alpine BusyBox wget resolves localhost to IPv6 ::1 first, but Node.js/Next.js only listens on IPv4 0.0.0.0. Changed all healthcheck URLs from localhost to 127.0.0.1 to avoid the IPv6 resolution issue. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- trace-flush-worker was missing LANGFUSE_ENABLED=true env var, causing get_langfuse_client() to return None and crash-loop
- Remove PLAN-008/SPEC-019 references from installer headers and docker-compose (internal planning refs don't belong in shipped code)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Worker resolves BUFFER_DIR from AI_MEMORY_INSTALL_DIR (defaults to ~/.ai-memory inside container → /home/classifier/.ai-memory) but the volume is mounted at /app/trace_buffer. Setting the env var to /app aligns the code path with the mount point. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
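
The path mismatch is easier to see in code. This sketch assumes the buffer lives in a `trace_buffer` subdirectory of the install dir, which is inferred from the `/app/trace_buffer` mount; the function name and `env` parameter are hypothetical:

```python
import os
from pathlib import Path


def resolve_buffer_dir(env=None):
    """Resolve the trace buffer directory from AI_MEMORY_INSTALL_DIR."""
    env = os.environ if env is None else env
    install_dir = Path(env.get("AI_MEMORY_INSTALL_DIR", "~/.ai-memory")).expanduser()
    return install_dir / "trace_buffer"
```

With the variable unset, the container user's home is used, which does not match the volume. With `AI_MEMORY_INSTALL_DIR=/app`, the result is `/app/trace_buffer`, the mount point.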

Chainguard distroless MinIO image has no wget or curl. Use bash built-in /dev/tcp for TCP connectivity check instead. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
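
The actual fix uses bash's built-in /dev/tcp. For readers more familiar with Python, an equivalent plain TCP connectivity check might look like this (the function name and default timeout are illustrative):

```python
import socket


def tcp_probe(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Like the /dev/tcp approach, this only confirms that something is listening on the port; it does not validate an HTTP health endpoint.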

Same Next.js binding issue as langfuse-web — worker binds to container IP instead of 0.0.0.0, causing healthcheck on 127.0.0.1:3030 to fail with "Connection refused". Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Langfuse v3 LANGFUSE_INIT_* env vars are silently ignored when the Postgres DB already exists from a previous install attempt. Added verify_bootstrap() that checks project existence via API after health check. On failure, auto-cleans volumes and restarts (max 1 retry). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
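
The verify-then-retry flow can be sketched with three injected callables. All names here are hypothetical stand-ins for the shell functions in langfuse_setup.sh; only the "max 1 retry" limit comes from the commit message:

```python
def bootstrap_with_retry(start, verify, clean_volumes, max_retries=1):
    """Start the stack, verify bootstrap, and retry after a volume cleanup.

    start, verify, and clean_volumes are hypothetical callables; verify
    returns True when the expected project exists.
    """
    attempts = 0
    while True:
        start()
        if verify():
            return True
        if attempts >= max_retries:
            return False  # give up instead of looping forever
        clean_volumes()   # stale DB state is what causes INIT vars to be ignored
        attempts += 1
```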

Auto-fixed lint violations: unused imports, unsorted imports, unused variables, contextlib.suppress pattern, black formatting. Resolves CI lint gate failure (10 violations in 4 Langfuse files). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Langfuse frontend rejects .local TLD emails and hex-only passwords. Changed admin email to admin@example.com and password generation to include uppercase prefix + special char (meets Langfuse complexity). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
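
One way to generate a password of the described shape (uppercase prefix, hex body, special character). This is an illustrative sketch: the function name, lengths, and character set are assumptions, and the exact complexity rules Langfuse enforces are not stated in this PR:

```python
import secrets
import string


def generate_admin_password(hex_bytes=16):
    """Build a password that is not hex-only: uppercase + hex + special."""
    prefix = secrets.choice(string.ascii_uppercase)
    body = secrets.token_hex(hex_bytes)   # 2 * hex_bytes hex characters
    special = secrets.choice("!@#%^*")
    return f"{prefix}{body}{special}"
```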

Langfuse INIT creates the admin user with email_verified=NULL and admin=false. The NULL email_verified blocks browser login even though API auth works. Add _fixup_init_user() to set both fields after bootstrap verification. Also fix volume names in recovery path (underscore separator, not dash). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

langfuse_setup.sh wrote LANGFUSE_ENABLED, PUBLIC_KEY, SECRET_KEY to .env but not BASE_URL, TRACE_HOOKS, or TRACE_SESSIONS. install.sh exported all 6 vars from .env, setting the missing ones to empty strings which overrode generate_settings.py defaults. Result: hooks installed with empty Langfuse config and no traces produced.

Fix:
(1) Write all 3 missing vars in langfuse_setup.sh setup_project_keys
(2) Only export non-empty values in install.sh loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
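
The second part of the fix, exporting only non-empty values, is done in shell in install.sh. The same idea in Python, as a sketch with a hypothetical function name and the variable names abbreviated the way the commit message abbreviates them:

```python
def non_empty_exports(env_text, names):
    """Return only the requested variables that have a non-empty value.

    Empty or missing entries are omitted so they cannot override
    downstream defaults with an empty string.
    """
    values = {}
    for line in env_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return {name: values[name] for name in names if values.get(name)}
```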

…roject.toml Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace removed v2 methods (trace(), trace.span()) with v3 equivalents (start_span(), span.update_trace(), span.end()) in trace_flush_worker.py and langfuse_stop_hook.py. Use Langfuse.create_trace_id(seed=...) for valid 32-hex trace IDs. Store historical start_time in metadata; convert end_time to nanoseconds for span.end(). Add child spans with parent_span_id for session turn hierarchy in stop hook. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
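
Two details from this migration, a seeded 32-hex trace ID and an end time in nanoseconds, can be illustrated without the SDK. The helpers below are stand-ins only: `seeded_trace_id` shows what a deterministic 32-hex ID looks like and is not claimed to produce the same value as `Langfuse.create_trace_id(seed=...)`; `to_unix_nanos` expects a timezone-aware datetime:

```python
import hashlib
from datetime import datetime, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def seeded_trace_id(seed):
    """Stand-in: derive a deterministic 32-character hex ID from a seed."""
    return hashlib.sha256(seed.encode("utf-8")).hexdigest()[:32]


def to_unix_nanos(dt):
    """Convert a timezone-aware datetime to integer nanoseconds since epoch.

    Uses integer arithmetic to avoid float rounding on large timestamps.
    """
    delta = dt.astimezone(timezone.utc) - _EPOCH
    seconds = delta.days * 86_400 + delta.seconds
    return seconds * 1_000_000_000 + delta.microseconds * 1_000
```

Seeding the ID from a stable key (for example a session identifier) lets separate processes attach spans to the same trace without sharing state.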

…L to trace-flush-worker Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Apply stashed installer improvements from PM#79-87 testing rounds. Most stash changes (env-var reads, timeouts, arithmetic fixes, safe re-install, base-wins merge) were already in the langfuse branch. Only remaining delta: add -L symlink checks to deploy_parzival_commands to handle broken symlinks, plus conditional deployment logging. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add scripts/stack.sh v1.1.0 for start/stop/restart/status/nuke of the full AI Memory Docker stack. Handles both compose files in correct order (core first for start, Langfuse first for stop) to prevent the network conflict identified in BUG-148.

Two rounds of adversarial Opus code review, all findings resolved:
- Token masking (never leak GITHUB_TOKEN to stdout)
- Conditional --env-file (graceful when .env is absent)
- All profiles covered in stop/nuke (monitoring, github, testing)
- Non-interactive safety guard on nuke confirmation
- Partial-start user guidance when Langfuse fails

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
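
The ordering rule is simple enough to express in a few lines. stack.sh is a shell script; this Python sketch only models the rule. The function name is hypothetical and the compose file names are given without their directory, which this PR does not spell out:

```python
CORE = "docker-compose.yml"
LANGFUSE = "docker-compose.langfuse.yml"


def compose_order(action, langfuse_enabled=True):
    """Return compose files in the order they should be processed.

    Core comes first on start; Langfuse comes first on stop.
    """
    files = [CORE] + ([LANGFUSE] if langfuse_enabled else [])
    if action == "start":
        return files
    if action == "stop":
        return list(reversed(files))
    raise ValueError(f"unsupported action: {action}")
```

Stopping in reverse order mirrors the start order, so the add-on stack is always torn down before the core stack it depends on.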

…USE_ENABLED

BUG-149: trace-flush-worker runs as UID 1001 (Dockerfile.worker USER classifier) but buffer files are written by host hooks as UID 1000. Add user: directive to match classifier-worker and github-sync pattern.

BUG-150: classifier-worker missing LANGFUSE_ENABLED env var. The emit_trace_event() kill-switch in trace_buffer.py defaults to false, so 9_classify spans (BUG-146 fix) never fire. Pass through host setting with false default for backward compatibility.

Reviewed by: Opus adversarial (0B/0H/0M/0L) + Sonnet functional (7/7 PASS)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Tests 1, 2, 5 mocked old v2 API (client.trace()) but BUG-145 migrated production code to v3 (start_span/update_trace/end). Aligns test mocks with actual API usage in trace_flush_worker.py:145-164. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The minio/mc Docker image sets mc as its ENTRYPOINT, so passing sh -c "..." as arguments was silently failing. Added --entrypoint sh to override. Also removed grep -v pipe that caused false-positive warnings on successful bucket creation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

BUG-150 originally added emit_trace_event("9_classify") to classification_worker.py, but the classifier container runs process_classification_queue.py. This commit fixes 3 issues:
- Add emit_trace_event to process_classification_queue.py (the actual entry point) covering success, skipped, and error code paths
- Add trace_buffer volume mount + AI_MEMORY_INSTALL_DIR env var to classifier-worker in docker-compose.yml so it can write to the shared trace buffer directory
- Fix langfuse_setup.sh model registration idempotency check to paginate through all API pages instead of only reading page 1

Verified: 9_classify span confirmed in Langfuse API after classifier processes a queue item. All 9 pipeline steps now flow end-to-end:

1_capture → 2_log → 3_detect → 4_scan → 5_chunk → 6_embed → 7_store → 8_enqueue → 9_classify

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
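
The pagination fix for the idempotency check can be sketched like this. The `fetch_page` callable is hypothetical, and the response shape (`data` list, `meta.totalPages`, `modelName` field) is an assumption about the API payload rather than something this PR documents:

```python
def model_registered(fetch_page, model_name):
    """Check every page of a paginated model list for model_name.

    fetch_page(page) is a hypothetical callable returning a dict like
    {"data": [{"modelName": ...}, ...], "meta": {"totalPages": N}}.
    """
    page = 1
    while True:
        payload = fetch_page(page)
        if any(m.get("modelName") == model_name for m in payload.get("data", [])):
            return True
        total_pages = payload.get("meta", {}).get("totalPages", page)
        if page >= total_pages:
            return False
        page += 1
```

Reading only page 1 would miss models listed on later pages, which could cause duplicate registrations on re-runs.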

… guide

- Update version badge to 2.0.7, add Langfuse badge
- Add LLM Observability to Key Features section
- Add V2.0.7 release section with 8 feature highlights
- Add Langfuse integration section with quick-start guide
- Update architecture diagram with Langfuse service tree
- Create docs/LANGFUSE-INTEGRATION.md with full setup, architecture, pipeline spans, session grouping, troubleshooting, and security docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…lity callouts

- CHANGELOG.md: v2.0.7 section with Langfuse phases, 20+ bug fixes, stack.sh
- INSTALL.md: stack.sh as primary stack management, Langfuse optional callouts, updated uninstall for Langfuse volumes
- README.md: stack.sh in Quick Start, Langfuse marked optional, service count fix
- docker/README.md: Langfuse services section, compose files docs, replaced stale "Monitoring (Future)" with current state

Adversarial review: 12 findings (1B+3H+3M+4L), all blocking/high/medium fixed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

WB Solutions and others added 29 commits February 23, 2026 02:09

fix: BUG-139 use bash TCP probe for MinIO healthcheck

109e06c

Chainguard distroless MinIO image has no wget or curl. Use bash built-in /dev/tcp for TCP connectivity check instead. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: BUG-136 add HOSTNAME=0.0.0.0 to langfuse-worker

d2b38b7

Same Next.js binding issue as langfuse-web — worker binds to container IP instead of 0.0.0.0, causing healthcheck on 127.0.0.1:3030 to fail with "Connection refused". Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: BUG-143 pre-create trace_buffer dir, BUG-144 add langfuse to pyp…

a3e6967

…roject.toml Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: BUG-146 add 9_classify Langfuse span, BUG-147 add PUSHGATEWAY_UR…

557a428

…L to trace-flush-worker Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: address 3 review findings for BUG-146 9_classify span

1c2f7e3

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Hidden-History changed the title ~~feat: Langfuse v3 Phase 1+4 — Infrastructure + Session Tracing~~ feat: v2.0.7 Langfuse LLM Observability — All 5 Phases + 22 Bug Fixes Feb 24, 2026

WB Solutions and others added 2 commits February 24, 2026 05:14

Hidden-History changed the title ~~feat: v2.0.7 Langfuse LLM Observability — All 5 Phases + 22 Bug Fixes~~ v2.0.7: Langfuse LLM Observability (Optional) + Unified Stack Management Feb 24, 2026

Hidden-History merged commit a68a6a0 into main Feb 24, 2026
11 checks passed

Hidden-History merged 31 commits into main from feature/langfuse-phase1-phase4
