Skip to content

feat/mcp audit response#7

Merged
Koopa0 merged 12 commits into
mainfrom
feat/mcp-audit-response
Apr 24, 2026
Merged

feat/mcp audit response#7
Koopa0 merged 12 commits into
mainfrom
feat/mcp-audit-response

Conversation

@Koopa0
Copy link
Copy Markdown
Owner

@Koopa0 Koopa0 commented Apr 24, 2026

No description provided.

Koopa0 and others added 12 commits April 24, 2026 16:21
VPS deploy dir is ~/koopa0.dev, not ~/koopa. Previous path would have
failed on any CI-triggered deploy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add three grouped-aggregate queries (stats, concept distribution,
category distribution) plus a LastEndedSession lookup to back the
session_progress MCP tool. Rowsets are per-session (typically 2-10
attempts, 5-30 observations) and covered by existing
idx_learning_attempts_session + idx_learning_attempt_observations_attempt;
no new indexes.

CategoryDist's CASE sort with ELSE 99 routes future signal_type enum
values to the sort tail rather than silently dropping them. The
observation_count alias reference in ORDER BY is documented inline
because it is a PostgreSQL non-standard extension that a reader
unfamiliar with Postgres would otherwise flag.

Store DTOs convert bigint counts to int64 end-to-end — no narrowing
at the Go boundary.
ReadOnly MCP tool returning in-session aggregate state for the
currently-active learning session. Coach invokes between attempts to
make interleaving / retrieval-practice decisions based on paradigm
distribution, concept slug distribution, and observation category
distribution.

{active: false} path returns last_ended_session_id + last_ended_at as
an affordance (not a fallback) so the caller can pivot to
attempt_history(session_id=...) without a separate lookup. Scope is
deliberately single-mode; session_delta remains the 24h pan-feature
tool (mcp-decision-policy §10 — distinct entity, distinct workflow).

Concept kind distribution intentionally omitted — kind is auto-assigned
'skill' for all session-created concepts and would produce trivial
noise. Tracked as HERMES W-10; tool description carries the disclaimer
so callers don't ask.

SessionID and StartedAt are pointers so {active: false} does not leak
zero-UUID or zero-time to JSON consumers. Other active-path fields
deliberately omit 'omitempty' so encoding/json emits every declared
key on the Active=true path — a JS consumer iterating
response.concept_slug_distribution must never hit undefined.
Four integration tests cover the plan's invariants:
- NoActive — {active: false} on a fresh DB; no last-ended affordance.
- ActiveEmpty — Active=true with zero attempts emits all distribution
  slices as empty (not nil), so the JSON wire format carries them as
  []. Adds a json.Marshal round-trip + key-presence probe guarding
  against omitempty regressions in SessionProgressOutput.
- WithAttempts — aggregates reflect SQL rollup; slug dist sorted by
  count DESC + slug ASC (all-tied ties break alphabetically); category
  dist emits weakness before mastery per the CASE sort.
- LastEndedSurfaced — after end_session, {active: false} carries the
  just-ended session id and ended_at as the affordance path, and
  aggregate fields stay zero (affordance, not fallback).
Application-assigned zero-based index per attempt. Enables coach-
insertion ordering on attempt_history reads — created_at ties within
a transaction (PG now() is txn-start-constant) and id is
gen_random_uuid (v4, random), so neither alone preserves insertion
order.

CHECK chk_learning_attempt_observations_position_nonneg rejects
negative values at the DB boundary. DEFAULT 0 is a schema-level
guard; application code always sets position explicitly from the
record_attempt request's array index.

Pre-production edit-in-place of migrations/001 per project memory
feedback_pre_production_edit_in_place — the VPS SQL patch to apply
on production lives at .agents/plan-b-production-sql.md (gitignored;
local reference for manual apply).
Remove the parallel MatchedObservation struct. Attempt now carries a
full Observations slice (ordered by position ASC) and, on concept-mode
queries, a MatchedObservationID pointer into that slice indicating
which observation drove the query match. Single authoritative shape —
no dual representation of the same entity.

SQL changes:
- AttemptsByConcept returns matched_observation_id instead of the
  four signal/category/severity/detail columns; the observation
  payload comes from the batched ObservationsByAttemptIDs fetch.
- ObservationsByAttemptIDs (new): batched fetch keyed by
  attempt_ids[], ordered by (attempt_id, position ASC). Single
  round-trip regardless of attempt count — IVL latency stays bounded.
- ObservationsByAttempt and CreateObservation both carry
  confidence + position end-to-end. Writer assigns position from
  the record_attempt array index.

Go changes:
- learning.Attempt gains Observations []Observation +
  MatchedObservationID *uuid.UUID; Matched + MatchedObservation
  struct deleted. Observation gains Position int32.
- Store.AttachObservations helper: batches the fetch and partitions
  by attempt_id, assigning empty (non-nil) slices to attempts with
  zero observations so downstream JSON consumers never hit nil.
- RecordObservation now takes a position parameter;
  processObservations in the MCP layer passes i (the array index
  in the request) so coach-written order survives the write.

Rationale (from learning-studio review on the shape decision):
dual representation of the same entity is a classic correctness
trap — if matched_observation.severity desynchronizes from
observations[i].severity the coach sees contradictory signal.
Pointer + single list guarantees one source of truth. The pointer
also exposes observation_id in the wire format, unblocking future
W-09 supersede_observation work.
Add include_observations *bool to AttemptHistoryInput (default true).
Handler branches on the flag to invoke the batched
Store.AttachObservations after the attempts query in all three modes.
concept_slug mode preserves matched_observation_id even when
include_observations=false — the pointer comes from the primary
query, not the observation fetch.

Tool description rewritten to (a) name all three modes crisply, (b)
declare the sort invariant per mode (target/concept DESC,
session_id ASC), (c) embed a concrete JSON example of the
include_observations=false behavior so a caller reading tools/list
sees the shape without digging into Go source.

Fixes the core learning-studio audit P0 #2: Improvement Verification
Loop's primary entry (target mode) was observation-blind; revisits
had to infer concept context from stuck_at text. Every mode now
carries the full observation list with the confidence label, closing
.claude/rules/mcp-decision-policy.md §5's read-side gap
('confidence is a label, not a gate').
Five tests cover the Plan B invariants:
- TargetMode_IncludesObservations records three observations with
  deliberately non-alphabetical concept slugs [z, a, m] to verify
  position-based sort returns them in coach-insertion order, not
  slug order. Also asserts confidence labels survive the round trip.
  Includes the B3 wall-time smoke guard (<150ms on a warm
  testcontainers DB) so IVL hot-path regressions surface quickly.
- ConceptMode_MatchedObservationID verifies the pointer points into
  the observations list at the queried concept's slot.
- SessionMode_Observations verifies the by-session path also
  attaches observations with confidence.
- IncludeObservationsFalse verifies the flag skips the batch fetch
  in both target and concept modes, AND that concept mode's
  matched_observation_id survives (the pointer dangles to an
  observation not in the response — documented in the tool
  description).
- SortInvariants records three attempts on one target and verifies
  target mode returns [3,2,1] (DESC) while session mode returns
  [1,2,3] (ASC) — the documented invariant learning-studio called
  out as needing an explicit test.
Three small-knife items from Plan A §3 small-knife scope, all
ship-together per Koopa's 'do updates thoroughly' direction:

1. ops.Meta.FieldEnums map[string][]string — structured enum
   advertising for fields the Go jsonschema tag mechanism cannot
   express. Applied in addTool post-processing: look up by tool name
   from ops.All(), walk schema.Properties, set .Enum on matching
   top-level properties. Clients reading tools/list now see valid
   values structurally without parsing Description prose (closes
   Plan A small-knife item 1 for v1 flat fields; nested observations[].*
   tracked as HERMES W-08).
   - record_attempt.outcome — canonical DB values plus every semantic
     synonym mapProblemSolving + mapImmersive accept.
   - learning_dashboard.view, learning_dashboard.confidence_filter.

2. record_attempt response echoes canonical_outcome. Caller sees the
   storage-form outcome (e.g. 'solved_with_hint') alongside their
   input (e.g. 'needed help') in one round-trip, so the coach doesn't
   have to reach into attempt.outcome to verify what got normalized.

3. learning_dashboard emits stable JSON shape across empty and
   populated views via a custom MarshalJSON. Every response is
   {view, total, <view_key>: [...]} — the view-specific array is
   always present (empty [] when the view has no data), other view
   keys are absent. Resolves the wire-shape divergence a JS consumer
   iterating response.mastery would hit as 'undefined'. Loud
   struct-level comment on LearningDashboardOutput names MarshalJSON
   as authoritative so future readers don't trust the json tags.
Seven new ReadOnly tools, one per entity type, each with a typed
input struct driving native JSON Schema required/description
metadata in tools/list:

  propose_goal
  propose_project
  propose_milestone
  propose_directive
  propose_hypothesis
  propose_learning_plan
  propose_learning_domain

Shared internal: proposeEntity(type, fields) extracted from the
Wave-1 propose_commitment handler. Every typed handler packs its
struct into the existing map[string]any shape and calls through —
zero duplication of resolve*Fields or signProposal. commit_proposal
stays unified since the signed token carries Type and the split is
propose-side only (discriminator-by-entity rule per
mcp-decision-policy §10).

propose_directive performs the SubmitTasks capability pre-check at
handler entry instead of at commit time, so unauthorized callers
fail fast without producing a signed token.

propose_commitment stays registered as a deprecation shim
(Stability: StabilityDeprecated) — forwards to proposeEntity, logs
a sunset warning, and prepends a caller-visible deprecation notice
to the Warnings array. Removal scheduled 2026-05-08; checklist at
.agents/shim-removal-checklist-2026-05-08.md (gitignored; local
reference for the removal PR).

propose_directive.priority advertises its enum via FieldEnums —
{high, medium, low} shows up in tools/list as structured schema,
not prose. Rationale per Koopa: 'machine-readable contract >
human-readable description'.
Four tests cover the Plan A flat-split invariants:

- ProposeGoal_CommitRoundTrip — propose_goal → commit_proposal
  round-trip persists the goal. Representative of the happy-path
  shape for all seven flat tools; the workhorse (proposeEntity) is
  shared so coverage of one is coverage of all on the structural
  dimension.

- ProposeDirective_CapabilityPreCheck — learning-studio (which
  lacks SubmitTasks) calling propose_directive receives a
  capability-denied error at propose time rather than at commit.
  Confirms the pre-check moved off commit_proposal.

- ProposeCommitmentShim_WarnsAndForwards — the deprecation shim
  still produces a valid token AND the response's Warnings array
  carries a notice naming the replacement tool (propose_goal) and
  the 2026-05-08 sunset date. Guards against silent shim breakage
  during the 2-week deprecation window.

- ToolsListAdvertisesEnums — structural probe against ops.All()
  confirming record_attempt.outcome and learning_dashboard.view
  carry non-empty FieldEnums. If the enum injection regresses the
  test fails at catalog layer before addTool runs.
golangci-lint v2 moved exclusion keys out of top-level `issues:` —
`exclude-dirs` and `exclude-rules` now live under
`linters.exclusions.paths` and `linters.exclusions.rules`
respectively. CI's `golangci-lint config verify` rejects the v1
shape under v2.11.4 with

  jsonschema: 'issues' does not validate with
    '/properties/issues/additionalProperties': additional properties
    'exclude-dirs', 'exclude-rules' not allowed

Move the two blocks without semantic change; `issues.max-issues-per-linter`
and `issues.max-same-issues` stay put (still valid in v2).

Tighter nolintlint semantics in v2 flag redundant `//nolint:gocognit`
and `//nolint:errcheck` directives on test files where the
exclusion rule already disables those linters. Strip the eight
now-redundant directives rather than loosen nolintlint — the
exclusion rule is blanket and specific directives for the same
linters add noise. Net behavior unchanged: `gocognit` /
`gocyclo` / `errcheck` stay disabled on `_test.go` via the
exclusion rule.
@Koopa0 Koopa0 merged commit 5b6a3d8 into main Apr 24, 2026
2 checks passed
@Koopa0 Koopa0 deleted the feat/mcp-audit-response branch April 24, 2026 14:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant