
Enhance project and files flow with new features and fixes #2

Merged
DioCrafts merged 190 commits into DioCrafts:main from unnamedlab:feature/big-refactoring
Apr 30, 2026

Conversation

@unnamedlab
Contributor

This pull request introduces several infrastructure and CI improvements, most notably replacing Redis with Valkey, removing Qdrant in favor of Vespa, and adding new CI linting workflows for event bus, Kafka, Ceph, and Prometheus contracts. It also updates documentation and environment templates to reflect these changes and clarifies service ownership.

Infrastructure and Service Changes:

  • Redis replaced by Valkey: The Redis container is replaced with Valkey 8 (an open-source fork of Redis); the Compose service, volumes, and related environment variables are renamed accordingly. The Rust client remains compatible because Valkey speaks the same protocol. Migration instructions are provided in the changelog. (.env.example, .github/workflows/ci.yml, CHANGELOG.md)
  • Qdrant removal and Vespa adoption: Qdrant is removed due to OSS license restrictions, with Vespa (Apache-2.0) noted as the future replacement. Environment variables and documentation for Qdrant are removed, and Vespa variables are added. Pgvector is now used for embedded vector search. (.env.example, CHANGELOG.md)

CI and Linting Enhancements:

Documentation and Ownership Updates:

  • CODEOWNERS and environment documentation: Updates service ownership to reflect the split of the event-bus crates and clarifies environment variable usage for the search and vector DB services. (.github/CODEOWNERS)

These changes modernize the stack, improve compliance with OSS licenses, and strengthen CI contract enforcement for infrastructure and service dependencies.

Copilot AI and others added 30 commits April 29, 2026 10:08
Agent-Logs-Url: https://github.com/unnamedlab/OpenFoundry/sessions/1033f588-8056-44f0-bbaa-e42059940de8

Co-authored-by: unnamedlab <272794385+unnamedlab@users.noreply.github.com>
Agent-Logs-Url: https://github.com/unnamedlab/OpenFoundry/sessions/91406a4c-cf7c-4516-9ed4-61a84b4c8eb0

Co-authored-by: unnamedlab <272794385+unnamedlab@users.noreply.github.com>
Agent-Logs-Url: https://github.com/unnamedlab/OpenFoundry/sessions/64b5de12-e1be-4a31-a59f-0b523bd5d133

Co-authored-by: unnamedlab <272794385+unnamedlab@users.noreply.github.com>
Agent-Logs-Url: https://github.com/unnamedlab/OpenFoundry/sessions/5dc086c8-7275-4062-8436-72d3e0d0b8c5

Co-authored-by: unnamedlab <272794385+unnamedlab@users.noreply.github.com>
…-and-import-data

Fix root layout locale initialization warning
Agent-Logs-Url: https://github.com/unnamedlab/OpenFoundry/sessions/1dd177ad-1230-4a32-991d-30f067155022

Co-authored-by: unnamedlab <272794385+unnamedlab@users.noreply.github.com>
Agent-Logs-Url: https://github.com/unnamedlab/OpenFoundry/sessions/982e2a4f-9c63-47e2-bff5-48ddc70ddb32

Co-authored-by: unnamedlab <272794385+unnamedlab@users.noreply.github.com>
Agent-Logs-Url: https://github.com/unnamedlab/OpenFoundry/sessions/2d778c9f-d612-48d0-bcb2-9d4945a5043c

Co-authored-by: unnamedlab <272794385+unnamedlab@users.noreply.github.com>
…ionality

Persist project folders in the backend and enforce space-scoped project creation
Agent-Logs-Url: https://github.com/unnamedlab/OpenFoundry/sessions/fad07448-800c-49eb-a654-f3806b7ca2b2

Co-authored-by: unnamedlab <272794385+unnamedlab@users.noreply.github.com>
Agent-Logs-Url: https://github.com/unnamedlab/OpenFoundry/sessions/539cb9d2-bd68-4b4c-a68e-c81463021adf

Co-authored-by: unnamedlab <272794385+unnamedlab@users.noreply.github.com>
Agent-Logs-Url: https://github.com/unnamedlab/OpenFoundry/sessions/1f2a8b28-06ab-4301-91a9-c5f545436646

Co-authored-by: unnamedlab <272794385+unnamedlab@users.noreply.github.com>
…ionality-again

Align Ontology Manager with Foundry branch creation and datasource-backed object wizard flow
Bootstrap first-install onboarding and promote the first registered user to admin
Agent-Logs-Url: https://github.com/unnamedlab/OpenFoundry/sessions/69292802-9a5f-4311-94e2-0083f06ae5ad

Co-authored-by: unnamedlab <272794385+unnamedlab@users.noreply.github.com>
Agent-Logs-Url: https://github.com/unnamedlab/OpenFoundry/sessions/08ccec54-082a-4221-bc39-276cd4811414

Co-authored-by: unnamedlab <272794385+unnamedlab@users.noreply.github.com>
Copilot AI and others added 19 commits April 30, 2026 09:33
…g T17 audit gap

Agent-Logs-Url: https://github.com/unnamedlab/OpenFoundry/sessions/da69e7eb-5294-4f0f-8243-6eb251188a0d

Co-authored-by: unnamedlab <272794385+unnamedlab@users.noreply.github.com>
…sitory

Add 7 Grafana dashboards to close T17 observability gap
Agent-Logs-Url: https://github.com/unnamedlab/OpenFoundry/sessions/854f956d-c726-4be6-aaed-cf52389ce8e8

Co-authored-by: unnamedlab <272794385+unnamedlab@users.noreply.github.com>
infra(helm): wire remaining in-chart services to CNPG `*-pg-app` Secrets (T13)
…proof of concept

Agent-Logs-Url: https://github.com/unnamedlab/OpenFoundry/sessions/dd5cad1b-30db-4695-b6f4-37b33bee3ce2

Co-authored-by: unnamedlab <272794385+unnamedlab@users.noreply.github.com>
Add PoC documentation for Aviation/MRO proof of concept
…ng; es-ES wording

Agent-Logs-Url: https://github.com/unnamedlab/OpenFoundry/sessions/f1df61e8-6e8e-4057-8f8c-24db634cc50f

Co-authored-by: unnamedlab <272794385+unnamedlab@users.noreply.github.com>
Eliminate hardcoded JWT secrets; auto-generate and persist on first boot
…a-plane-another-one

Rook-Ceph: enforce mon/mgr AZ topology spread + add ceph-lint contract
DioCrafts merged commit 27faf4a into DioCrafts:main on Apr 30, 2026
9 of 19 checks passed
tant pushed a commit to tant/OpenFoundry that referenced this pull request May 18, 2026
…gMap (ADR-0045 Phase C.6)

Phase C.6 of ADR-0045 — the closing PR of Phase C. Migrates the dev
manifests in infra/dev/ from SparkApplication CRs that ran the
Scala pipeline-runner-spark JAR to batch/v1 Job + ConfigMap pairs
that run the Go pipeline-runner (Phase C.5) over a structured
pipelineplan.Plan (Phase C.1, executed by Phase C.2 against Iceberg).

Runner extension:

- internal/runner/args.go: new --plan-file flag. internal/runner/run.go
  resolves the Plan from --plan-file > PIPELINE_PLAN_FILE env var >
  PIPELINE_PLAN_B64 env var (the dispatcher's path). Dev YAMLs use
  the file path so the Plan JSON stays readable in `kubectl describe
  configmap`; the dispatcher (Phase C.4.a) keeps using the env var.
- Tests added for loadPlanFromFile happy path + missing file; one
  existing test updated for the new "no Plan source" error message.
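
The Plan-source priority described above can be sketched as a small resolver. This is a minimal illustration, not the runner's actual code: the function name `resolvePlan`, the injected `getenv`/`readFile` parameters, the use of standard (not URL-safe) base64, and the exact error text are all assumptions.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"os"
)

// resolvePlan returns the raw Plan JSON using the priority order from the
// commit message: --plan-file flag, then PIPELINE_PLAN_FILE, then
// PIPELINE_PLAN_B64. getenv and readFile are injected so the lookup can be
// tested without touching the real environment (hypothetical signature).
func resolvePlan(flagPath string, getenv func(string) string, readFile func(string) ([]byte, error)) ([]byte, error) {
	if flagPath != "" {
		return readFile(flagPath)
	}
	if p := getenv("PIPELINE_PLAN_FILE"); p != "" {
		return readFile(p)
	}
	if b64 := getenv("PIPELINE_PLAN_B64"); b64 != "" {
		// Assumption: the dispatcher uses standard base64 encoding.
		return base64.StdEncoding.DecodeString(b64)
	}
	return nil, errors.New("no Plan source: set --plan-file, PIPELINE_PLAN_FILE or PIPELINE_PLAN_B64")
}

func main() {
	plan, err := resolvePlan("", os.Getenv, os.ReadFile)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("loaded %d bytes of Plan JSON\n", len(plan))
}
```

Keeping the flag ahead of both environment variables means a developer can override whatever the pod spec sets without editing the manifest.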

infra/dev manifests:

- poc-pipeline-nodes.yaml fully rewritten. Three pipelines ship as
  Job + ConfigMap pairs:
  * online-retail-clean (read → filter quantity>0 AND price>0 → project
    + derived revenue → write_table create_or_replace)
  * online-retail-returns (same shape with WHERE quantity<0)
  * online-retail-cust (read transactions_clean → aggregate group_by
    customer_id with sum(revenue), count_distinct(invoice),
    count_distinct(country) → write_table create_or_replace)
- spark-smoke.yaml → pipeline-runner-smoke.yaml: trivial one-row
  read_table → limit 1 → write_table(append) for catalog
  connectivity validation.
- spark-ingest-online-retail.yaml deleted. CSV DataSource API is
  the Phase 0 inventory's flagged deferred case (re-route via
  connector-management-service); no runner-side replacement.
- spark-rbac.yaml deleted. Spark Operator goes away in Phase D; the
  pipeline-runner Jobs use the standard `pipeline-runner` service
  account.

online-retail-anomalies is intentionally NOT migrated:
the CTE + CROSS JOIN shape needs the v2 `join` operator
(libs/pipeline-plan only ships union in v1, per ADR-0045 § Migration
plan / Phase C). When `join` lands the pipeline becomes a two-stage
decomposition; the inventory doc tracks it.

Documentation:

- services/pipeline-runner/README.md fully rewritten for the Phase
  C surface (Plan-source priority order, CLI flags, env fallbacks,
  Phase A discoveries bundled with IcebergReader, distroless image
  size delta, dev manifest pointers).
- services/pipeline-runner-spark/README.md gains a SUPERSEDED
  header pointing at the new flow + migration map for the three
  Scala mains (PipelineRunner → pipeline-runner; IcebergToObjectStoreIndexer
  → iceberg-object-indexer (Phase A); ActionLogStreamSink →
  action-log-sink (Phase B)).
- docs/migration/pipeline-runner-spark-to-go-inventory.md gains a §G
  with the sub-PR map and the pre-requisite checklist showing what's
  done (#1, #3) vs what's still pending (#2 CSV re-route, #4 live
  smoke).

Test plan:
- go vet ./... clean across the full repo.
- go test -count=1 ./services/pipeline-runner/... green (all three
  packages; new loadPlanFromFile tests + the updated empty-env
  assertion).
- The dev YAMLs themselves are validated by kubectl apply against
  the dev k3s cluster — that smoke is the Phase D entry gate per
  ADR-0045 § Migration plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DioCrafts pushed a commit that referenced this pull request May 20, 2026
…lth probe

Closes B04 acceptance criteria #1, #2, #3 (backend half), #4, #6.
UI work (#3 dropdown, #5 admin view) deferred to a separate ticket.

Schema (Postgres + idempotent re-deploy):
- 20260520120000_llm_models_quotas_features.sql adds `quotas JSONB`
  + `enabled_for_features TEXT[]` with GIN indices on both, plus a
  GIN index on capabilities so the new filter is sub-millisecond.
- 20260520120100_llm_models_seed_demo.sql inserts ollama/llama-3.1-70b
  and azure/gpt-4o with deterministic UUIDs + ON CONFLICT DO NOTHING.
  Both are enabled for aip-chatbot, ai-analyst, document-ai so the
  Chatbot Studio dropdown can switch between them without admin
  intervention.

Models:
- New `CapabilityChat` enum value — the Chatbot Studio dropdown's
  discriminator. NormalizeCapability handles any-case input.
- New `ProviderAzure` — Azure OpenAI Service uses a different
  endpoint scheme than vanilla OPENAI.
- New `Quotas` struct (requests_per_minute, tokens_per_minute,
  max_concurrent_requests, daily_token_budget,
  daily_cost_budget_usd_cents) — flat JSONB column so adding a
  dimension never needs another migration.
- New `UpdateModelRequest` (all-optional fields) for PATCH.
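
As an illustration of why a flat JSONB column avoids further migrations, here is a hedged sketch of what the `Quotas` struct might look like. The JSON keys come from the commit message; the Go field names, the pointer types, and the `omitempty` tags are assumptions, as are the `encodeQuotas`/`decodeQuotas` helpers.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Quotas mirrors the dimensions listed in the commit message. Pointer fields
// with omitempty are an assumption: they let "not set" differ from zero.
type Quotas struct {
	RequestsPerMinute       *int   `json:"requests_per_minute,omitempty"`
	TokensPerMinute         *int   `json:"tokens_per_minute,omitempty"`
	MaxConcurrentRequests   *int   `json:"max_concurrent_requests,omitempty"`
	DailyTokenBudget        *int64 `json:"daily_token_budget,omitempty"`
	DailyCostBudgetUSDCents *int64 `json:"daily_cost_budget_usd_cents,omitempty"`
}

// encodeQuotas produces the JSON that would be stored in the JSONB column.
func encodeQuotas(q Quotas) (string, error) {
	b, err := json.Marshal(q)
	return string(b), err
}

// decodeQuotas parses a stored value. encoding/json ignores unknown keys by
// default, so rows written with a newer dimension still decode.
func decodeQuotas(raw string) (Quotas, error) {
	var q Quotas
	err := json.Unmarshal([]byte(raw), &q)
	return q, err
}

func main() {
	rpm := 60
	out, _ := encodeQuotas(Quotas{RequestsPerMinute: &rpm})
	fmt.Println(out) // {"requests_per_minute":60}
}
```

Adding a new quota dimension then only requires a new struct field; existing rows simply decode with that field left nil.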

Repo:
- New `ListFilter` carries provider + capability + feature +
  only_enabled. PgStore uses `$N = ANY(...)` against the GIN-indexed
  array columns; MemoryStore mirrors with helper containers.
- New `Update(rid, req)` builds the SET list dynamically so PATCH
  with absent fields never overwrites existing values.
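
One way to build the SET list dynamically is sketched below. It is an assumption-laden illustration: the table name `llm_models`, the column names, the `rid` key column, and the three fields shown on `UpdateModelRequest` are hypothetical stand-ins for whatever the real request carries.

```go
package main

import (
	"fmt"
	"strings"
)

// UpdateModelRequest uses pointers so that "absent" and "set to the zero
// value" are distinguishable. The concrete fields here are hypothetical.
type UpdateModelRequest struct {
	DisplayName        *string
	Enabled            *bool
	EnabledForFeatures *[]string
}

// buildUpdate emits an UPDATE statement that only touches fields present in
// the request. ok is false when there is nothing to update.
func buildUpdate(rid string, req UpdateModelRequest) (query string, args []any, ok bool) {
	var sets []string
	add := func(col string, val any) {
		args = append(args, val)
		sets = append(sets, fmt.Sprintf("%s = $%d", col, len(args)))
	}
	if req.DisplayName != nil {
		add("display_name", *req.DisplayName)
	}
	if req.Enabled != nil {
		add("enabled", *req.Enabled)
	}
	if req.EnabledForFeatures != nil {
		add("enabled_for_features", *req.EnabledForFeatures)
	}
	if len(sets) == 0 {
		return "", nil, false
	}
	args = append(args, rid)
	query = fmt.Sprintf("UPDATE llm_models SET %s WHERE rid = $%d", strings.Join(sets, ", "), len(args))
	return query, args, true
}

func main() {
	off := false
	q, args, _ := buildUpdate("model-1", UpdateModelRequest{Enabled: &off})
	fmt.Println(q, args)
}
```

Column names are fixed strings in the code and values go through placeholders, so the dynamic SQL never interpolates caller input.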

Handlers:
- `Catalog.UpdateModel` on PATCH /api/v1/llm/models/{rid} — flips
  enabled (and any other field) without a service restart.
- ListModels accepts ?capability=… and ?feature=…
- providersHint + capabilitiesHint updated to include AZURE/CHAT.

Provider health (B04 §AC#6):
- New `internal/providers.Prober` polls each configured upstream's
  liveness endpoint (Ollama /api/tags, OpenAI /v1/models, Azure
  /openai/models?api-version=..., Anthropic /v1/messages) every
  30 s. Status maps to ok / degraded (auth fail or latency > 2 s) /
  down / unknown.
- GET /api/v1/llm/providers/health returns the snapshot the UI
  badges off. Empty-upstream config leaves the route unmounted so
  the service stays single-process in CI.
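
The status mapping can be sketched as a pure function over one probe result. This is only an illustration: the `classify` name and signature are hypothetical, treating 403 like 401 is an assumption, and so is mapping other non-2xx responses to `down` (the commit message only names auth failure and latency as degraded causes).

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"time"
)

type Status string

const (
	StatusOK       Status = "ok"
	StatusDegraded Status = "degraded"
	StatusDown     Status = "down"
	StatusUnknown  Status = "unknown"
)

// errUnreachable stands in for any transport-level failure.
var errUnreachable = errors.New("upstream unreachable")

// classify maps one probe result to a status. degradeAfter corresponds to the
// latency threshold (2 s in the commit message).
func classify(baseURL string, statusCode int, latency, degradeAfter time.Duration, err error) Status {
	switch {
	case baseURL == "":
		return StatusUnknown // provider not configured
	case err != nil:
		return StatusDown
	case statusCode == http.StatusUnauthorized || statusCode == http.StatusForbidden:
		return StatusDegraded // reachable, but credentials are rejected
	case statusCode < 200 || statusCode > 299:
		return StatusDown // assumption: other non-2xx counts as down
	case latency > degradeAfter:
		return StatusDegraded
	default:
		return StatusOK
	}
}

func main() {
	limit := 2 * time.Second
	fmt.Println(classify("http://ollama:11434", 200, 40*time.Millisecond, limit, nil))
	fmt.Println(classify("http://ollama:11434", 0, 0, limit, errUnreachable))
	fmt.Println(classify("", 0, 0, limit, nil))
}
```

Keeping the classification separate from the HTTP call makes each of the listed test cases a one-line table entry.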

Config:
- New AZURE_OPENAI_API_KEY, AZURE_OPENAI_BASE_URL,
  LLM_PROVIDER_HEALTH_INTERVAL_SECONDS,
  LLM_PROVIDER_HEALTH_DEGRADE_AFTER_MS env knobs.

Edge gateway:
- /api/v1/llm/* now routes to LLMCatalog alongside the existing
  /api/v1/ai/providers rule.

Tests:
- catalog_test.go: 6 new cases (Azure provider, capability+feature
  filter, unknown-capability rejection, PATCH enabled flip,
  PATCH partial fields, PATCH 404).
- providers/health_test.go: 6 cases (reachable→ok, unreachable→down,
  auth-401→degraded, high-latency→degraded, missing-baseurl→unknown,
  snapshot sorted by provider).
- repo/models_test.go: 5 cases on MemoryStore (round-trip quotas +
  features, capability + feature filter, partial PATCH, quotas/
  features wholesale replace, 404 on PATCH).
- repo/models_integration_test.go (integration tag, testcontainers
  Postgres): 4 cases — round-trip on a real DB, GIN-indexed filters,
  PATCH persistence, seed-migration idempotency.
DioCrafts pushed a commit that referenced this pull request May 20, 2026
…ed.v1

Closes B05 acceptance #1, #2, #3, #4, #6 + the SLA escalation slice.
Workshop-side wiring (B01) is unblocked: any action.executed event now
fans out to subscription-based channels and into the Approvals inbox.

Backend (notification-alerting-service):
- New migration 0002 adds notification_subscriptions /
  notification_events / notification_event_deliveries with GIN-style
  partial indices for the worker pull (status IN (pending, retrying)
  AND scheduled_at <= now()) and SLA scan.
- New `internal/models/subscription.go` with Subscription /
  CreateSubscriptionRequest / UpdateModel… / Event / Delivery wire
  shapes. HMACSecret is never serialised — HasHMACSecret boolean
  surfaces the fact without the value.
- New `internal/repo/subscriptions.go` with PgStore CRUD +
  MatchingSubscriptions + InsertEvent/GetEvent + InsertDelivery/
  ListDeliveries + ClaimDueDeliveries (FOR UPDATE SKIP LOCKED so
  concurrent workers don't double-attempt) + ClaimSLABreaches +
  MarkDeliveryAttempt + MarkEscalated.
- New `internal/service/fanout.go`:
  - Dispatcher.Submit persists the event, creates one Delivery per
    matching enabled subscription, and eagerly writes in-app rows
    into the existing notifications inbox so the Approvals feed
    sees them without a worker tick.
  - Worker.Tick claims due deliveries and runs the per-channel
    adapter. Webhook channel signs the body with HMAC-SHA256 when
    hmac_secret is set (header X-OpenFoundry-Signature: sha256=<hex>)
    and retries with exponential backoff (200 ms → 30 s) up to
    max_attempts; final failure leaves status=failed (DLQ visible
    via GET /events/{id}/deliveries).
  - Worker.SLATick claims rows whose sla_due_at has passed without
    escalation and fires SLAEscalationHook. LoopbackEscalator
    re-submits the original event with `.escalated.v1` suffix so
    escalation_target subscribers (manager email, etc.) react.
- New handlers + routes:
  - GET/POST/DELETE /api/v1/notifications/subscriptions[/{id}]
  - POST /api/v1/notifications/events
  - GET /api/v1/notifications/events/{id}/deliveries
  - POST /internal/events (no-auth producer surface for
    ontology-actions-service / workflow-automation-service,
    restricted at the network layer like /internal/notifications).
- main.go wires the worker + LoopbackEscalator goroutine.
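
The webhook signing and retry schedule described above can be sketched as follows. The header format `sha256=<hex>` and the 200 ms to 30 s bounds come from the commit message; the function names, the doubling factor, and the receiver-side `verifySignature` helper are assumptions for illustration.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"time"
)

// signBody returns the value for the X-OpenFoundry-Signature header.
func signBody(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return "sha256=" + hex.EncodeToString(mac.Sum(nil))
}

// verifySignature is what a receiver might run; hmac.Equal compares in
// constant time to avoid leaking how many bytes matched.
func verifySignature(secret, body []byte, header string) bool {
	return hmac.Equal([]byte(signBody(secret, body)), []byte(header))
}

// backoffDelay returns the wait before the given attempt (1-based), doubling
// from 200 ms and clamping at 30 s. The doubling factor is an assumption.
func backoffDelay(attempt int) time.Duration {
	const (
		base     = 200 * time.Millisecond
		maxDelay = 30 * time.Second
	)
	d := base
	for i := 1; i < attempt; i++ {
		d *= 2
		if d >= maxDelay {
			return maxDelay
		}
	}
	return d
}

func main() {
	body := []byte(`{"event_type":"action.executed.v1"}`)
	fmt.Println("X-OpenFoundry-Signature:", signBody([]byte("demo-secret"), body))
	for attempt := 1; attempt <= 9; attempt++ {
		fmt.Println(attempt, backoffDelay(attempt))
	}
}
```

Under these assumptions the ninth attempt is the first to hit the 30 s clamp, so `max_attempts` values below that never reach the ceiling.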

Producer hook (libs/ontology-kernel/handlers/actions):
- execute.go calls emitActionEvent alongside emitActionNotifications
  on every successful action execute.
- side_effects.go::emitActionEvent POSTs to /internal/events with
  event_type=action.executed.v1 carrying the action + actor + target
  context. Subscriptions on action.executed.v1 (in-app, webhook to
  MRO, SLA escalator) receive the fan-out without per-action config.

UI (apps/web):
- New ApprovalsPage at /approvals (replaces the AuditPage squat —
  /audit now points at the real audit page) is a two-pane:
  inbox on the left with ✓ Acknowledge per row, per-event delivery
  audit on the right (webhook/in-app/email status, retry count,
  last_error, escalation_at — failed rows ARE the DLQ surface).
- lib/api/notifications.ts extended with Subscription / Delivery /
  Event types and list/create/delete/submit/listDeliveries helpers.

Tests:
- service/fanout_test.go: 6 cases (HMAC body match, no header when
  no secret, non-2xx propagates, retry-until-success, template
  precedence, backoff schedule clamps).
- service/fanout_integration_test.go (//go:build integration,
  testcontainers Postgres): 5 cases (fan-out by event_type,
  eager in-app row, worker delivers webhook with HMAC, retries
  until DLQ, SLA escalator re-emits .escalated.v1 to the listener).
- lib/api/notifications.test.ts: 5 cases on the new wire helpers.
- routes/approvals/ApprovalsPage.test.tsx: 4 cases (inbox rows,
  delivery audit on selection, legacy notification hint, ack).
DioCrafts pushed a commit that referenced this pull request May 20, 2026
…surface

Closes B06 acceptance #1, #2, #3, #5 + the Reader/Writer concrete
implementations the pipeline-runtime needed. Spark transactional
runs (AC#4) and the 90 s cold-time benchmark (AC#6) are
infra-level scope and out of this commit.

Pipeline runtime (libs/pipeline-runtime):
- iceberg_writer.go: IcebergHTTPWriter POSTs AppendBatch to
  /openfoundry/iceberg/v1/append. Schema fetched once per table via
  GET /iceberg/v1/namespaces/{ns}/tables/{t} and cached. Partition
  transform + sort order default to identity(id)/id ASC. ProjectRID
  + auth header propagated.
- iceberg_reader.go: IcebergHTTPReader pages through
  /openfoundry/iceberg/v1/scan, yielding rows lazily via RowStream.
  Default page size 10k; pinning SnapshotID forwards the catalog's
  resolver. Honours ctx cancellation between pages.
- lineage_writer.go: LineageWriter decorator wraps an inner Writer
  and POSTs an OpenLineage RunEvent
  (eventType=COMPLETE / FAIL) to lineage-service after each Write.
  Best-effort: emit failure invokes OnEmitError without propagating.
  WithInputs returns a copy so callers can attach upstream datasets
  per call without mutating the receiver.

Iceberg catalog (services/iceberg-catalog-service):
- migrations/20260520130000_iceberg_table_rows.sql: new
  iceberg_table_rows table backs the Phase-B append/scan path with
  GIN-style partial indices for the scan resolver. Production
  Parquet writers swap the InsertRowsForSnapshot helper.
- repo/table_rows.go: InsertRowsForSnapshot (bulk + idempotent via
  PRIMARY KEY) and ScanRows (resolves snapshot via main/master
  branch ref, falls back to latest by sequence_number).
- handlers/append.go now persists rows after CommitTable; response
  echoes the resulting snapshot_id.
- handlers/scan.go: new GET /openfoundry/iceberg/v1/scan with
  snapshot_id / limit / offset.
- Store interface + fakes updated.

Pipeline build service (services/pipeline-build-service):
- migrations/20260520140000_dataset_health.sql: new
  dataset_health_events table with per-(dataset, check_name) trend
  history.
- handler/dataset_health.go: RecordEvent + LatestPerCheck + Recent +
  Get/Record HTTP handlers. Overall status is degraded when ANY
  latest check is degraded.
- server.go now exposes:
  - GET /api/v1/datasets/{rid}/health/events — rollup
  - POST /internal/datasets/{rid}/health/events — producer surface
  Wired via the new NewWithDeps + BuildRouterWithDeps entry points
  so main.go threads the pgxpool.
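
The rollup rule (latest event per check, overall degraded when any latest check is degraded) can be sketched as below. This is illustrative only: the `CheckEvent` fields, the use of a sequence number in place of a timestamp, the status tokens `passing`/`degraded`, and the `unknown` result for a dataset with no events are all assumptions.

```go
package main

import "fmt"

// CheckEvent is a simplified health event; Seq orders events in time.
type CheckEvent struct {
	CheckName string
	Status    string
	Seq       int64
}

// latestPerCheck keeps only the newest event for each check name.
func latestPerCheck(events []CheckEvent) map[string]CheckEvent {
	latest := make(map[string]CheckEvent)
	for _, e := range events {
		if cur, ok := latest[e.CheckName]; !ok || e.Seq > cur.Seq {
			latest[e.CheckName] = e
		}
	}
	return latest
}

// overallStatus is degraded as soon as any latest check is degraded.
func overallStatus(latest map[string]CheckEvent) string {
	if len(latest) == 0 {
		return "unknown" // assumption: no events means no verdict
	}
	for _, e := range latest {
		if e.Status == "degraded" {
			return "degraded"
		}
	}
	return "passing"
}

func main() {
	events := []CheckEvent{
		{CheckName: "freshness", Status: "degraded", Seq: 1},
		{CheckName: "freshness", Status: "passing", Seq: 2},
		{CheckName: "row_count", Status: "passing", Seq: 1},
	}
	fmt.Println(overallStatus(latestPerCheck(events)))
}
```

Because only the latest event per check counts, a check that recovers stops dragging the dataset's overall status down while its history remains available for the trend view.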

Edge gateway (services/edge-gateway-service):
- /api/v1/datasets/*/health/events routes to PipelineBuild while
  the singular /health stays on DatasetVersioning — the two
  surfaces coexist without conflict.

UI (apps/web):
- lib/api/datasetHealthEvents.ts: typed client mirroring the new
  Go DatasetHealthSummary shape.
- lib/components/health/CheckEventsPanel.tsx: rolls up overall
  status, latest per check, and recent events feed. Refetches every
  30 s; renders nothing on fetch error so the existing HealthTab
  fallback stays visible.
- routes/datasets/DatasetDetailPage.tsx: panel wired into HealthTab
  beside the existing snapshot view.

Tests:
- libs/pipeline-runtime: 5 writer cases (POST shape, no-op on empty
  rows, surfaces catalog 4xx, schema cache, BaseURL guard), 5
  reader cases (paging, snapshot/project headers, 404 propagates,
  empty BaseURL guard, ctx cancellation), 5 lineage decorator cases
  (COMPLETE on success, FAIL on inner error, no-op when URL empty,
  best-effort emit failure, WithInputs copy).
- iceberg-catalog-service: append+scan handlers compile against the
  extended Store interface. 4 new integration scenarios against a
  real Postgres (insert+scan, new-snapshot-replaces-old visibility,
  limit/offset paging, unknown-snapshot sentinel).
- pipeline-build-service: severity/status validation rejects bad
  enum tokens.
- apps/web: 3 CheckEventsPanel cases (latest grid + recent feed,
  passing badge, fetch-error fallback).
DioCrafts pushed a commit that referenced this pull request May 20, 2026
Closes B07 acceptance #1, #2, #3, #4, #5, #6 + plumbs #7 (the
"schedule a B-check on N12345" demo collapses once the seed data
hits the demo). The agent runtime now has every seam in place:
budget-aware ReAct loop, real tool router with JWT pass-through,
catalog-resolved model selection, document upload + retrieval.

Agent runtime (services/agent-runtime-service):
- migrations/20260520150000_threads.sql adds threads,
  thread_messages, thread_traces with per-thread budgets
  (max_tool_calls=6, max_prompt_tokens=16000 by default) and a
  state machine for the trace kinds.
- internal/repo/threads.go owns CRUD + atomic AppendMessage
  (MAX(position)+1 inside a tx) + AppendTraceStep + ListTrace.
- internal/models/threads.go declares Thread / ThreadMessage /
  ThreadTraceStep / ToolDefinition / ToolKind (object_query,
  action, function, retrieval, command, request_clarification).
- internal/react/runner.go drives the loop: builds an
  OpenAI-compatible Invoke from the thread's history + tool
  manifest, dispatches tool calls via [ToolRouter] (propagating
  the caller's JWT verbatim per AC#6), records every plan / tool
  call / observation / final / error / budget_exhausted into
  thread_traces. Pre-checks the prompt-token budget and the
  tool-call counter; overshoot emits a graceful "budget exhausted"
  assistant message.
- internal/react/clients.go: HTTPLLMClient POSTs to llm-catalog
  /api/v1/llm/invoke with the thread's model_rid (AC#5) and
  re-interprets the response as either final or a tool_call (JSON
  shape). HTTPToolRouter dispatches per-kind to
  object-database-service /objects/query, ontology-actions-service
  /actions/{id}/execute, retrieval-context-service /retrieval/search.
  403/401 responses surface as observations, not errors (AC#6).
- internal/handlers/threads.go is the HTTP shell wired in
  server.go via NewWithDeps + BuildRouterWithDeps:
  - POST/GET/DELETE /api/v1/agent-runtime/threads[/{id}]
  - GET/POST /api/v1/agent-runtime/threads/{id}/messages
  - GET /api/v1/agent-runtime/threads/{id}/trace
- main.go wires the runner when LLM_CATALOG_SERVICE_URL is set;
  ObjectDatabaseURL / OntologyActionsURL / RetrievalURL enable
  the corresponding tool kinds.
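
A budget-aware ReAct loop of the kind described above might look roughly like this. It is a simplified sketch under stated assumptions: the `LLM` and `ToolRouter` function types, the `Reply` shape, and the `RunTurn` name are hypothetical; only the tool-call budget is modelled (the prompt-token pre-check is left out); and tool failures are folded into observations rather than aborting the turn.

```go
package main

import "fmt"

// Step is one recorded trace entry (kind names follow the commit message).
type Step struct{ Kind, Content string }

// Reply is either a final answer (ToolName empty) or a tool call.
type Reply struct {
	Final    string
	ToolName string
	ToolArgs string
}

type LLM func(history []string) (Reply, error)
type ToolRouter func(jwt, tool, args string) (string, error)

// RunTurn loops until the model answers, the LLM errors, or the tool-call
// budget is spent. The caller's JWT is handed to the router unchanged.
func RunTurn(llm LLM, tools ToolRouter, jwt, userMsg string, maxToolCalls int) (string, []Step) {
	history := []string{"user: " + userMsg}
	var trace []Step
	toolCalls := 0
	for {
		reply, err := llm(history)
		if err != nil {
			trace = append(trace, Step{Kind: "error", Content: err.Error()})
			return "", trace
		}
		if reply.ToolName == "" {
			trace = append(trace, Step{Kind: "final", Content: reply.Final})
			return reply.Final, trace
		}
		if toolCalls >= maxToolCalls {
			msg := "budget exhausted: tool-call limit reached"
			trace = append(trace, Step{Kind: "budget_exhausted", Content: msg})
			return msg, trace
		}
		toolCalls++
		trace = append(trace, Step{Kind: "tool_call", Content: reply.ToolName + " " + reply.ToolArgs})
		obs, err := tools(jwt, reply.ToolName, reply.ToolArgs)
		if err != nil {
			obs = "tool error: " + err.Error()
		}
		trace = append(trace, Step{Kind: "observation", Content: obs})
		history = append(history, "observation: "+obs)
	}
}

func main() {
	llm := func(history []string) (Reply, error) {
		if len(history) == 1 {
			return Reply{ToolName: "object_query", ToolArgs: `{"type":"Aircraft"}`}, nil
		}
		return Reply{Final: "Found 1 aircraft."}, nil
	}
	tools := func(jwt, tool, args string) (string, error) { return "1 row", nil }
	answer, trace := RunTurn(llm, tools, "caller-jwt", "how many aircraft?", 6)
	fmt.Println(answer, len(trace))
}
```

Checking the budget before dispatching (rather than after) guarantees the loop terminates with at most `maxToolCalls` tool invocations, whatever the model keeps asking for.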

Retrieval context (services/retrieval-context-service):
- migrations/0002_knowledge_documents.sql adds knowledge_documents
  + knowledge_document_chunks with a 15-dim PoC embedding column.
- internal/handlers/knowledge.go: POST /api/v1/retrieval/documents
  chunks on whitespace (≤1200 chars) and writes a hash-based BoW
  signature per chunk. POST /api/v1/retrieval/search scores with
  cosine + lexical-overlap boost. The shape is right; production
  swaps the embedder to libs/ai-kernel-go/embeddings without
  touching the surface or schema.
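
To make the PoC retrieval path concrete, here is a hedged sketch of whitespace chunking, a hash-based bag-of-words embedding, and cosine scoring. The 15 dimensions and the 1200-character limit come from the commit message; the FNV hash, lower-casing, byte-based length counting, and all function names are assumptions, and the lexical-overlap boost is not modelled.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math"
	"strings"
)

const dims = 15

// chunk splits text on whitespace into pieces of at most maxChars bytes.
// A single word longer than maxChars becomes its own oversized chunk.
func chunk(text string, maxChars int) []string {
	var chunks []string
	var cur strings.Builder
	for _, w := range strings.Fields(text) {
		if cur.Len() > 0 && cur.Len()+1+len(w) > maxChars {
			chunks = append(chunks, cur.String())
			cur.Reset()
		}
		if cur.Len() > 0 {
			cur.WriteByte(' ')
		}
		cur.WriteString(w)
	}
	if cur.Len() > 0 {
		chunks = append(chunks, cur.String())
	}
	return chunks
}

// embed hashes each word into one of dims buckets and L2-normalises the
// counts, giving a deterministic unit vector (zero vector for empty text).
func embed(text string) [dims]float64 {
	var v [dims]float64
	for _, w := range strings.Fields(strings.ToLower(text)) {
		h := fnv.New32a()
		h.Write([]byte(w))
		v[h.Sum32()%dims]++
	}
	var sum float64
	for _, x := range v {
		sum += x * x
	}
	if sum == 0 {
		return v
	}
	norm := math.Sqrt(sum)
	for i := range v {
		v[i] /= norm
	}
	return v
}

// cosine returns the cosine similarity, or 0 if either vector is all zeros.
func cosine(a, b [dims]float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
	doc := "Schedule a B-check for the aircraft before the next rotation"
	for _, c := range chunk(doc, 1200) {
		fmt.Printf("%.3f %q\n", cosine(embed("aircraft B-check"), embed(c)), c)
	}
}
```

With only 15 buckets, unrelated words collide often, which is why such a signature suits a demo and why swapping in a real embedder behind the same surface matters.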

Edge gateway:
- /api/v1/retrieval/* and /api/v1/document-intelligence/* route to
  RetrievalContext.

UI (apps/web):
- lib/api/threads.ts: typed wire client mirroring the Go shapes.
- routes/ai/ThreadsPage.tsx: replaces the previously-mocked page
  with a real three-pane layout: thread list (auto-selects, 30 s
  poll), message stream + composer, ReAct trace + document
  uploader (5 s poll on trace).

Tests (all green):
- agent-runtime react/runner_test.go: 6 cases (final-no-tool,
  tool-call-then-final + JWT propagation, step budget, prompt-token
  budget, unknown-tool fallthrough, LLM transport error).
- agent-runtime react/clients_test.go: 7 cases (LLM final, tool-call
  JSON, non-2xx, object-query path/auth, action path/auth,
  forbidden-as-observation, unconfigured-endpoint friendly response).
- retrieval-context handlers/knowledge_test.go: 7 cases (chunk
  splitting, deterministic + unit embedding, cosine identity,
  cosine separation, lexical boost, sort).
- apps/web threads.test.ts: 7 client method assertions.
- apps/web ThreadsPage.test.tsx: 4 UI cases (list + budgets, trace
  pane, empty state, "+ New" creates a thread).
seonghoaidencho pushed a commit to seonghoaidencho/OpenFoundry that referenced this pull request May 20, 2026
Closes G4 of B03 / acceptance criterion #2: the Ontology Manager
"Indexing" sub-tab can now ask "is Aircraft caught up?" without going
to the search backend.

- `internal/status.Tracker` keeps per-(tenant, object_type) counters
  (indexed/deleted, last_indexed_at, last_event_time) updated by the
  runtime projector whenever an `OutcomeIndexed`/`OutcomeDeleted`
  outcome commits. In-process only; Postgres-backed survivability is
  a follow-up if the demo needs it.
- `GET /api/v1/ontology-indexer/status?objectType=...&tenant=...`
  returns `{indexed_count, deleted_count, last_indexed_at,
  last_event_time, lag_seconds}` where lag = max(0, indexed_at -
  event_time) (ETL-style delay). Omitting `objectType` returns the
  full per-(tenant,type) list, sorted.
- edge-gateway routes `/api/v1/ontology-indexer/*` to the new
  `OntologyIndexer` upstream, registered ahead of the catch-all
  `/api/v1/ontology` rule.
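
The tracker and its lag formula can be sketched in a few lines. This is an in-memory illustration under assumptions: the `Tracker`, `Counters`, `Record`, and `Get` names and signatures are hypothetical, and a boolean stands in for the `OutcomeIndexed`/`OutcomeDeleted` outcomes.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Counters holds the per-(tenant, object_type) state named in the commit.
type Counters struct {
	Indexed, Deleted             int64
	LastIndexedAt, LastEventTime time.Time
}

// LagSeconds implements lag = max(0, indexed_at - event_time).
func (c Counters) LagSeconds() float64 {
	lag := c.LastIndexedAt.Sub(c.LastEventTime).Seconds()
	if lag < 0 {
		return 0
	}
	return lag
}

type key struct{ tenant, objectType string }

// Tracker is safe for concurrent Record calls from projector goroutines.
type Tracker struct {
	mu       sync.Mutex
	counters map[key]Counters
}

func NewTracker() *Tracker {
	return &Tracker{counters: make(map[key]Counters)}
}

// Record updates the counters after a projector outcome commits.
func (t *Tracker) Record(tenant, objectType string, deleted bool, eventTime, indexedAt time.Time) {
	t.mu.Lock()
	defer t.mu.Unlock()
	k := key{tenant, objectType}
	c := t.counters[k]
	if deleted {
		c.Deleted++
	} else {
		c.Indexed++
	}
	c.LastEventTime = eventTime
	c.LastIndexedAt = indexedAt
	t.counters[k] = c
}

// Get returns a copy of the counters (zero value if nothing was recorded).
func (t *Tracker) Get(tenant, objectType string) Counters {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.counters[key{tenant, objectType}]
}

func main() {
	tr := NewTracker()
	now := time.Now()
	tr.Record("acme", "Aircraft", false, now.Add(-3*time.Second), now)
	c := tr.Get("acme", "Aircraft")
	fmt.Printf("indexed=%d lag_seconds=%.0f\n", c.Indexed, c.LagSeconds())
}
```

Clamping at zero keeps clock skew between the event producer and the indexer from surfacing as a negative lag in the status response.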

Tests: tracker (incl. concurrent record), handler (happy path + zero
state + cross-tenant aggregation + list), runtime wiring proving
`RunWithOptionsAndTracker` increments counters after the projector
commits.

3 participants