
v2.0.7: Langfuse LLM Observability (Optional) + Unified Stack Management #37

Merged: Hidden-History merged 31 commits into main from feature/langfuse-phase1-phase4 on Feb 24, 2026


Conversation

Hidden-History (Owner) commented Feb 23, 2026

Summary

  • Langfuse LLM Observability (optional): 9-step pipeline tracing, session grouping, file-based trace buffer (~5ms overhead), kill-switch control, custom model registration, Grafana integration — 4 specs (SPEC-019 through SPEC-022), 5 phases
  • Unified Stack Management: scripts/stack.sh v1.1.0 — start/stop/restart/status/nuke for the full 16-container stack with correct network ordering
  • 20 bug fixes: BUG-131 through BUG-151 (8 install bugs, 3 auth/config, 3 runtime, 2 pipeline, 3 deployment, 1 stack management)
  • Documentation: CHANGELOG v2.0.7, LANGFUSE-INTEGRATION.md guide, stack.sh references across INSTALL/README/docker docs, Langfuse optionality callouts

Key stats

  • 92 commits, 263 files changed, +49,557/-1,691 lines
  • Includes v2.0.6 foundation merge (18 specs, 42 bug fixes, Parzival integration)
  • 0 open bugs, 0 active blockers
  • Multiple Opus adversarial review rounds throughout development

Langfuse is entirely optional

  • Controlled by LANGFUSE_ENABLED=true|false kill-switch
  • Core AI Memory system works fully without it (8 services, 16 GiB)
  • Adds 7 services when enabled (32 GiB recommended)
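A minimal sketch of how an environment-variable kill-switch of this kind can gate tracing. The set of accepted truthy strings, the `env` parameter, and the returned dict are assumptions for illustration; only the `LANGFUSE_ENABLED` variable and the false default come from this PR.

```python
import os

# Assumed set of values treated as "on"; the project may accept fewer.
_TRUTHY = {"1", "true", "yes", "on"}


def is_langfuse_enabled(env=None):
    """Return True only when LANGFUSE_ENABLED is explicitly truthy."""
    env = os.environ if env is None else env
    return env.get("LANGFUSE_ENABLED", "false").strip().lower() in _TRUTHY


def emit_trace_event(name, env=None):
    """Record an event, or do nothing when the kill-switch is off."""
    if not is_langfuse_enabled(env):
        return None  # core pipeline continues untouched
    return {"event": name}
```

Defaulting to false means a missing variable disables tracing, so the core system never depends on Langfuse being configured.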

Test plan

  • V207 comprehensive test: 153/195 PASS (PM #100)
  • BUG-149/150 fixes committed and verified (PM #101)
  • 4 spec corrections applied to TESTING-SOURCE-OF-TRUTH.md
  • Adversarial doc review: 12 findings, all blocking/high/medium fixed
  • Agent 9 retest (trace flush + 9_classify) — deferred to v2.0.8
  • Gate 10 live Parzival round-trip — deferred to v2.0.8

🤖 Generated with Claude Code

WB Solutions and others added 29 commits February 23, 2026 02:09
…4 Session Tracing (PLAN-008)

Add self-hosted Langfuse v3 integration for LLM observability with
two-tier tracing architecture. Phase 1 deploys 7 Docker services
(langfuse-web, langfuse-worker, postgres, clickhouse, redis, minio,
trace-flush-worker) as an opt-in profile. Phase 4 adds session-level
Tier 1 tracing via Claude Code Stop hook with Parzival tagging and
project_id multi-tenancy.

Key changes:
- docker-compose.langfuse.yml: 7 services with security hardening,
  health checks, Langfuse v3 headless auto-initialization
- langfuse_setup.sh: One-command setup with secret generation, MinIO
  bucket creation, health check, custom model registration (Basic Auth)
- config.py: 8 Langfuse config fields with validation (SecretStr for
  secret key, enabled+missing keys = error)
- langfuse_stop_hook.py: Session-level tracing with dual kill-switches,
  2s flush timeout (SIGALRM), Parzival tagging, project_id scoping
- langfuse_config.py: Thread-safe client factory (no None caching)
- install.sh: Interactive Langfuse menu with RAM check (<32 GiB warning)
- generate_settings.py + merge_settings.py: TRACE_TO_LANGFUSE and 6
  Langfuse env vars injected when enabled
- 18 unit tests (all pass), zero regressions on 1994-test suite

Reviewed by: 2 adversarial reviewers (Opus + Sonnet), 11 issues found
and fixed across 2 review rounds, verified clean by Opus re-reviewer.

SPEC-019 v1.2 | SPEC-022 v1.2 (§2 only) | PLAN-008 v1.2.1

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
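The "thread-safe client factory (no None caching)" idea can be sketched as follows. This is an illustration, not the code in langfuse_config.py: `get_client` and the `build` callable are hypothetical names, and `build` stands in for whatever constructs the real client.

```python
import threading

_lock = threading.Lock()
_client = None


def get_client(build):
    """Return a cached client, retrying build() until it succeeds.

    build is a hypothetical zero-argument callable that returns a client
    object, or None when keys are missing or the server is unreachable.
    """
    global _client
    if _client is not None:          # fast path once a client exists
        return _client
    with _lock:
        if _client is None:          # re-check: another thread may have won
            candidate = build()
            if candidate is not None:
                _client = candidate  # cache successes only, never None
        return _client
```

Because a failed build is never cached, a later call can still succeed once the keys or the server become available.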
…Metrics (SPEC-020)

Implements SPEC-020 (Langfuse SDK Integration) for PLAN-008 v2.0.7:

- trace_buffer.py: Fire-and-forget atomic file-based trace event writer (<10ms)
  with incremental buffer size tracking and MB-based overflow guard (DEC-PLAN008-004)
- trace_flush_worker.py: Buffer-to-Langfuse flush daemon with SIGTERM graceful
  shutdown, oldest-first eviction, and Prometheus metrics push
- langfuse_config.py: Added is_langfuse_enabled() and is_hook_tracing_enabled()
  kill-switch helpers for hook subprocess contexts
- config.py: Added langfuse_trace_buffer_max_mb field (default=100, ge=10, le=1000)
- metrics_push.py: Added push_langfuse_buffer_metrics_async() with 4 Prometheus
  metrics (flush events, errors, buffer size, evictions)
- process_classification_queue.py: AnthropicInstrumentor auto-instrumentation
- requirements.txt: Added langfuse>=3.0 and opentelemetry-instrumentation-anthropic
- langfuse_stop_hook.py: Fixed token count estimate (SPEC-022 §2.6) and
  project_id default

Tests: 45/45 pass (6 buffer + 9 flush worker + 20 config + 11 client factory)
Review: 2 rounds (Opus + Sonnet), 15 issues found and fixed, verified CLEAN

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
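A rough sketch of an atomic file-based event writer with oldest-first eviction, assuming one JSON file per event. The function names and file-naming scheme are hypothetical, and this version rescans the directory for its size check rather than tracking size incrementally as the commit describes.

```python
import json
import os
import tempfile
import time
import uuid
from pathlib import Path


def write_event(buffer_dir, event):
    """Write one event as its own JSON file using write-then-rename."""
    buffer_dir = Path(buffer_dir)
    buffer_dir.mkdir(parents=True, exist_ok=True)
    final = buffer_dir / f"{time.time_ns()}-{uuid.uuid4().hex}.json"
    fd, tmp = tempfile.mkstemp(dir=buffer_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(event, fh)
    os.replace(tmp, final)  # atomic when source and target share a filesystem
    return final


def evict_oldest(buffer_dir, max_bytes):
    """Delete the oldest events until the buffer fits within max_bytes."""
    # Names start with a nanosecond timestamp, so name order is oldest-first.
    files = sorted(Path(buffer_dir).glob("*.json"))
    total = sum(f.stat().st_size for f in files)
    evicted = 0
    for f in files:
        if total <= max_bytes:
            break
        total -= f.stat().st_size
        f.unlink()
        evicted += 1
    return evicted
```

A reader that only globs `*.json` never sees a half-written file, since the temporary file keeps a `.tmp` suffix until the rename.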
…across 10 hook scripts

Add emit_trace_event() instrumentation to all capture and store-async hook
scripts, completing the full pipeline observability layer for Langfuse.

Capture hooks (4 files): 1_capture span with trace_id generation and env
propagation to store-async subprocesses.

Store-async hooks (4 files): Full pipeline spans 2_log, 3_detect, 4_scan,
5_chunk, 6_embed, 7_store, 8_enqueue with accurate tracking:
- scan_actually_ran flag gates 4_scan span (no phantom events when disabled)
- scan_input_length captured before masking for accurate content_length
- classification_enqueued boolean tracks actual enqueue outcome (not hardcoded)
- BLOCKED path: 4_scan + pipeline_terminated in independent try/except blocks
- scan_action defaults to "skipped" (not "passed") when scanning disabled

Special hooks (2 files):
- pre_compact_save.py: Full pipeline 2_log through 8_enqueue. Phase 4
  @observe() migration noted per SPEC-021 §3.2.
- context_injection_tier2.py: context_retrieval span on 3 paths (success,
  search failure, outer catch-all failure).

3-round Opus code review: 20 issues found and fixed (8 HIGH, 6 MEDIUM,
6 LOW), final verification CLEAN across all 6 reviewed files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
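To illustrate trace ID generation with env propagation to a subprocess, here is a small sketch. `AI_MEMORY_TRACE_ID` is a hypothetical variable name, and `subprocess.run` is used so the result can be inspected; a real store-async hook would more likely start the child without waiting for it.

```python
import os
import subprocess
import sys
import uuid


def spawn_store_async(argv, trace_id=None):
    """Run a child process that inherits the parent's trace ID.

    AI_MEMORY_TRACE_ID is a hypothetical variable name for illustration.
    """
    trace_id = trace_id or uuid.uuid4().hex  # 32 hex characters
    env = dict(os.environ, AI_MEMORY_TRACE_ID=trace_id)
    proc = subprocess.run(
        argv, env=env, capture_output=True, text=True, check=True
    )
    return trace_id, proc.stdout.strip()


# Stand-in child that prints the trace ID it received.
child = [
    sys.executable,
    "-c",
    "import os; print(os.environ['AI_MEMORY_TRACE_ID'])",
]
```

Passing the ID through the environment lets every later span attach to the trace opened by the capture step without changing the child's command-line interface.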
…classifier latency alert

Add Langfuse integration to Grafana dashboard and alerting:

- "LLM Observability" collapsed row with 3 link panels (Traces, Sessions,
  Filter by Project) in memory-overview.json
- $project_id template variable for Langfuse deeplink filtering
- Classifier p99 latency >5s alert rule with Langfuse trace deeplink
  annotation in new ai-memory-alerts.yaml provisioning file
- PromQL uses sum by (le) for correct multi-label histogram aggregation

8-agent BMAD team: 3 Sonnet devs, 2 reviewers (Opus+Sonnet), 2 Sonnet fixers,
1 Opus re-reviewer. 2 review rounds: 6 issues found (2M+2M+2L), 4 fixed,
2 accepted. Round 2: CLEAN (0 issues).

PLAN-008 / SPEC-022 §3 / AC-7, AC-8, AC-9, AC-10, AC-12

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
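Why `sum by (le)` matters can be shown with a small Python model of the aggregation: bucket counts from several label sets are added per upper bound before a quantile is estimated. This is a simplified sketch of linear interpolation over cumulative buckets, not Prometheus's exact implementation, and the function names are hypothetical.

```python
from collections import defaultdict


def sum_by_le(series):
    """Add cumulative bucket counts across label sets, keyed by upper bound."""
    totals = defaultdict(float)
    for labels, count in series:
        totals[float(labels["le"])] += count
    return sorted(totals.items())


def histogram_quantile(q, buckets):
    """Estimate a quantile from sorted cumulative (upper_bound, count) pairs."""
    total = buckets[-1][1]
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if upper_bound == float("inf"):
                return lower_bound  # cannot interpolate into the +Inf bucket
            span = count - lower_count
            frac = (rank - lower_count) / span if span else 0.0
            return lower_bound + (upper_bound - lower_bound) * frac
        lower_bound, lower_count = upper_bound, count
    return lower_bound
```

Without the per-`le` sum, each label set would yield its own quantile and a single threshold such as "p99 above 5s" would be evaluated once per series instead of once for the classifier as a whole.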
…tall

setup-collections.py calls get_config() before Langfuse container starts,
causing hard failure when LANGFUSE_ENABLED=true but API keys aren't set yet.
Changed validator from raising ValueError to logging warning. Runtime code in
langfuse_config.py already handles missing keys gracefully.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
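The change from raising to warning might look like the sketch below. The function name, logger name, and message text are hypothetical; only the behavior (warn and continue when enabled but keys are missing) comes from the commit message.

```python
import logging

logger = logging.getLogger("config")


def validate_langfuse(enabled, public_key, secret_key):
    """Return True when tracing can start; warn instead of raising."""
    if enabled and not (public_key and secret_key):
        # Keys may not exist yet during install, so this must not be fatal.
        logger.warning("LANGFUSE_ENABLED=true but API keys are not set yet")
        return False
    return bool(enabled)
```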
docker-compose.langfuse.yml had build context but no dockerfile key,
causing Docker to look for Dockerfile at repo root (doesn't exist).
Reuses existing Dockerfile.worker which has identical dependencies.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…use, Redis

These database/cache containers require CHOWN/SETUID/SETGID capabilities
to switch from root to their service user on startup. cap_drop: ALL
blocks this, causing restart loops. Security hardening kept on stateless
app containers (langfuse-web, langfuse-worker, trace-flush-worker, minio).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…loyment

Langfuse v3 defaults to ReplicatedMergeTree which requires Zookeeper/Keeper.
Single-node self-hosted deployments must set CLICKHOUSE_CLUSTER_ENABLED=false
on both langfuse-web and langfuse-worker per Langfuse docs. Without this,
ClickHouse migrations fail with "no Zookeeper configuration" error.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Next.js 15 binds to the container hostname IP instead of 0.0.0.0,
causing the healthcheck wget to localhost:3000 to fail with
"Connection refused". Setting HOSTNAME=0.0.0.0 forces binding to
all interfaces.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Alpine BusyBox wget resolves localhost to IPv6 ::1 first, but
Node.js/Next.js only listens on IPv4 0.0.0.0. Changed all
healthcheck URLs from localhost to 127.0.0.1 to avoid the IPv6
resolution issue.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- trace-flush-worker was missing LANGFUSE_ENABLED=true env var,
  causing get_langfuse_client() to return None and crash-loop
- Remove PLAN-008/SPEC-019 references from installer headers and
  docker-compose (internal planning refs don't belong in shipped code)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Worker resolves BUFFER_DIR from AI_MEMORY_INSTALL_DIR (defaults to
~/.ai-memory inside container → /home/classifier/.ai-memory) but
the volume is mounted at /app/trace_buffer. Setting the env var
to /app aligns the code path with the mount point.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Chainguard distroless MinIO image has no wget or curl. Use bash
built-in /dev/tcp for TCP connectivity check instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Same Next.js binding issue as langfuse-web — worker binds to
container IP instead of 0.0.0.0, causing healthcheck on
127.0.0.1:3030 to fail with "Connection refused".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Langfuse v3 LANGFUSE_INIT_* env vars are silently ignored when the
Postgres DB already exists from a previous install attempt. Added
verify_bootstrap() that checks project existence via API after health
check. On failure, auto-cleans volumes and restarts (max 1 retry).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
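The verify-then-recover flow described above can be sketched with injected callables. `bootstrap_with_retry` and its three parameters are hypothetical names; the actual `verify_bootstrap()` is a shell function and may be structured differently.

```python
def bootstrap_with_retry(start, project_exists, reset_volumes, max_retries=1):
    """Start the stack and confirm the bootstrap project exists.

    All three callables are hypothetical hooks supplied by the caller.
    """
    for attempt in range(max_retries + 1):
        start()
        if project_exists():
            return True
        if attempt < max_retries:
            reset_volumes()  # stale database: the INIT variables were ignored
    return False
```

Capping retries at one keeps a persistent failure from looping through repeated volume wipes.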
Auto-fixed lint violations: unused imports, unsorted imports, unused
variables, contextlib.suppress pattern, black formatting. Resolves
CI lint gate failure (10 violations in 4 Langfuse files).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Langfuse frontend rejects .local TLD emails and hex-only passwords.
Changed admin email to admin@example.com and password generation to
include uppercase prefix + special char (meets Langfuse complexity).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
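A sketch of password generation with an uppercase prefix and a special character, using Python's `secrets` module. The special-character set and the length are assumptions; the installer itself is a shell script and Langfuse's exact complexity rules are not restated here.

```python
import secrets
import string

SPECIALS = "!@#$%^&*"  # assumed set; check Langfuse's actual rules


def generate_admin_password(hex_bytes=16):
    """Build a password: uppercase prefix + random hex + one special char."""
    prefix = secrets.choice(string.ascii_uppercase)
    body = secrets.token_hex(hex_bytes)  # 2 hex characters per byte
    suffix = secrets.choice(SPECIALS)
    return f"{prefix}{body}{suffix}"
```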
Langfuse INIT creates the admin user with email_verified=NULL and
admin=false. The NULL email_verified blocks browser login even though
API auth works. Add _fixup_init_user() to set both fields after
bootstrap verification. Also fix volume names in recovery path
(underscore separator, not dash).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
langfuse_setup.sh wrote LANGFUSE_ENABLED, PUBLIC_KEY, SECRET_KEY to
.env but not BASE_URL, TRACE_HOOKS, or TRACE_SESSIONS. install.sh
exported all 6 vars from .env, setting the missing ones to empty
strings which overrode generate_settings.py defaults. Result: hooks
installed with empty Langfuse config — no traces produced.

Fix: (1) Write all 3 missing vars in langfuse_setup.sh setup_project_keys
(2) Only export non-empty values in install.sh loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
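The second fix, exporting only non-empty values, amounts to the overlay rule sketched here. `effective_settings` is a hypothetical helper; the short variable names follow the commit message, and the example default values used with it are placeholders.

```python
def effective_settings(dotenv, defaults, names):
    """Overlay .env values on defaults, skipping empty or missing entries."""
    merged = dict(defaults)
    for name in names:
        value = dotenv.get(name, "")
        if value != "":  # an empty string must not override a default
            merged[name] = value
    return merged
```

For example, with `defaults = {"BASE_URL": "http://localhost:3000"}` and a `.env` that contains `BASE_URL=` with no value, the default survives instead of being replaced by an empty string.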
…roject.toml

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace removed v2 methods (trace(), trace.span()) with v3 equivalents
(start_span(), span.update_trace(), span.end()) in trace_flush_worker.py
and langfuse_stop_hook.py. Use Langfuse.create_trace_id(seed=...) for
valid 32-hex trace IDs. Store historical start_time in metadata; convert
end_time to nanoseconds for span.end(). Add child spans with parent_span_id
for session turn hierarchy in stop hook.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
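Two details of this migration can be illustrated with the standard library alone: deriving a deterministic 32-hex trace ID from a seed, and converting an end time to integer nanoseconds. Both helpers are hypothetical stand-ins; `trace_id_from_seed` is not claimed to match what `Langfuse.create_trace_id(seed=...)` computes internally.

```python
import hashlib
from datetime import datetime, timezone


def trace_id_from_seed(seed):
    """Derive a deterministic 32-hex-character ID from a seed string.

    Stand-in for illustration; not claimed to match the SDK's algorithm.
    """
    return hashlib.sha256(seed.encode("utf-8")).hexdigest()[:32]


def to_epoch_ns(dt):
    """Convert an aware datetime to integer nanoseconds since the epoch."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    delta = dt - epoch
    seconds = delta.days * 86_400 + delta.seconds
    return seconds * 1_000_000_000 + delta.microseconds * 1_000
```

Integer arithmetic avoids the rounding that `dt.timestamp() * 1e9` can introduce at nanosecond scale, and a seeded ID lets repeated flushes of the same session land on the same trace.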
…L to trace-flush-worker

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Apply stashed installer improvements from PM#79-87 testing rounds.
Most stash changes (env-var reads, timeouts, arithmetic fixes, safe
re-install, base-wins merge) were already in the langfuse branch.
Only remaining delta: add -L symlink checks to deploy_parzival_commands
to handle broken symlinks, plus conditional deployment logging.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add scripts/stack.sh v1.1.0 for start/stop/restart/status/nuke of the
full AI Memory Docker stack. Handles both compose files in correct order
(core first for start, Langfuse first for stop) to prevent the network
conflict identified in BUG-148.

Two rounds of adversarial Opus code review — all findings resolved:
- Token masking (never leak GITHUB_TOKEN to stdout)
- Conditional --env-file (graceful when .env is absent)
- All profiles covered in stop/nuke (monitoring, github, testing)
- Non-interactive safety guard on nuke confirmation
- Partial-start user guidance when Langfuse fails

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
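The ordering and safety rules listed above can be modeled in a few lines of Python. This is a sketch of the logic only, not a port of scripts/stack.sh: the function names are hypothetical, and using `up -d` and `down` as the underlying subcommands is an assumption.

```python
from pathlib import Path

CORE = "docker-compose.yml"
LANGFUSE = "docker-compose.langfuse.yml"


def compose_commands(action, env_file=".env", langfuse=True):
    """Return docker compose argument lists in a safe order."""
    files = [CORE] + ([LANGFUSE] if langfuse else [])
    if action == "stop":
        files.reverse()  # Langfuse first when stopping, core last
    # Pass --env-file only when the file exists, so a missing .env is not fatal.
    env_args = ["--env-file", env_file] if Path(env_file).is_file() else []
    tail = ["up", "-d"] if action == "start" else ["down"]
    return [["docker", "compose", "-f", f, *env_args, *tail] for f in files]


def mask(text, secret):
    """Hide a secret before printing; no-op when the secret is empty."""
    return text.replace(secret, "***") if secret else text
```

Starting core first and stopping it last means the network that core owns exists for the whole time Langfuse containers are attached to it.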
…USE_ENABLED

BUG-149: trace-flush-worker runs as UID 1001 (Dockerfile.worker USER
classifier) but buffer files are written by host hooks as UID 1000.
Add user: directive to match classifier-worker and github-sync pattern.

BUG-150: classifier-worker missing LANGFUSE_ENABLED env var. The
emit_trace_event() kill-switch in trace_buffer.py defaults to false,
so 9_classify spans (BUG-146 fix) never fire. Pass through host setting
with false default for backward compatibility.

Reviewed by: Opus adversarial (0B/0H/0M/0L) + Sonnet functional (7/7 PASS)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tests 1, 2, 5 mocked old v2 API (client.trace()) but BUG-145 migrated
production code to v3 (start_span/update_trace/end). Aligns test mocks
with actual API usage in trace_flush_worker.py:145-164.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The minio/mc Docker image sets mc as its ENTRYPOINT, so passing
sh -c "..." as arguments was silently failing. Added --entrypoint sh
to override. Also removed grep -v pipe that caused false-positive
warnings on successful bucket creation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
BUG-150 originally added emit_trace_event("9_classify") to
classification_worker.py, but the classifier container runs
process_classification_queue.py. This commit fixes 3 issues:

- Add emit_trace_event to process_classification_queue.py (the actual
  entry point) covering success, skipped, and error code paths
- Add trace_buffer volume mount + AI_MEMORY_INSTALL_DIR env var to
  classifier-worker in docker-compose.yml so it can write to the
  shared trace buffer directory
- Fix langfuse_setup.sh model registration idempotency check to
  paginate through all API pages instead of only reading page 1

Verified: 9_classify span confirmed in Langfuse API after classifier
processes a queue item. All 9 pipeline steps now flow end-to-end:
1_capture → 2_log → 3_detect → 4_scan → 5_chunk → 6_embed → 7_store
→ 8_enqueue → 9_classify

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
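The pagination fix for the idempotency check follows a common pattern, sketched here with an injected page fetcher. `fetch_page` and the response keys `data`, `total_pages`, and `name` are hypothetical; the real Langfuse API response shape may differ.

```python
def iter_all_models(fetch_page):
    """Yield every model across all pages.

    fetch_page(page) is a hypothetical callable returning a dict with
    "data" (list of models) and "total_pages" (int).
    """
    page = 1
    while True:
        body = fetch_page(page)
        yield from body["data"]
        if page >= body["total_pages"]:
            break
        page += 1


def model_registered(fetch_page, name):
    """Return True when a model with this name exists on any page."""
    return any(m.get("name") == name for m in iter_all_models(fetch_page))
```

Reading only page 1 would report a model on a later page as missing and register it again on every run.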
Hidden-History changed the title from "feat: Langfuse v3 Phase 1+4 — Infrastructure + Session Tracing" to "feat: v2.0.7 Langfuse LLM Observability — All 5 Phases + 22 Bug Fixes" on Feb 24, 2026
WB Solutions and others added 2 commits February 24, 2026 05:14
… guide

- Update version badge to 2.0.7, add Langfuse badge
- Add LLM Observability to Key Features section
- Add V2.0.7 release section with 8 feature highlights
- Add Langfuse integration section with quick-start guide
- Update architecture diagram with Langfuse service tree
- Create docs/LANGFUSE-INTEGRATION.md with full setup, architecture,
  pipeline spans, session grouping, troubleshooting, and security docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…lity callouts

- CHANGELOG.md: v2.0.7 section with Langfuse phases, 20+ bug fixes, stack.sh
- INSTALL.md: stack.sh as primary stack management, Langfuse optional callouts,
  updated uninstall for Langfuse volumes
- README.md: stack.sh in Quick Start, Langfuse marked optional, service count fix
- docker/README.md: Langfuse services section, compose files docs, replaced stale
  "Monitoring (Future)" with current state

Adversarial review: 12 findings (1B+3H+3M+4L), all blocking/high/medium fixed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Hidden-History changed the title from "feat: v2.0.7 Langfuse LLM Observability — All 5 Phases + 22 Bug Fixes" to "v2.0.7: Langfuse LLM Observability (Optional) + Unified Stack Management" on Feb 24, 2026
Hidden-History merged commit a68a6a0 into main on Feb 24, 2026
11 checks passed