From cd18ca87ba1259213c351a60a8d65dbee7f1995d Mon Sep 17 00:00:00 2001
From: SyniRon <66834451+SyniRon@users.noreply.github.com>
Date: Thu, 28 May 2026 23:24:47 -0400
Subject: [PATCH] docs: seed CONTEXT.md, ADRs, and agent docs

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 CONTEXT.md                                    | 108 ++++++++++++++++++
 ...001-split-process-grpc-and-http-gateway.md |  31 +++++
 .../0002-intra-process-plaintext-grpc-dial.md |  24 ++++
 ...response-cache-with-update-time-polling.md |  37 ++++++
 docs/adr/0004-scope-based-bearer-auth.md      |  47 ++++++++
 docs/agents/domain.md                         |  51 +++++++++
 docs/agents/issue-tracker.md                  |  22 ++++
 docs/agents/triage-labels.md                  |  15 +++
 8 files changed, 335 insertions(+)
 create mode 100644 CONTEXT.md
 create mode 100644 docs/adr/0001-split-process-grpc-and-http-gateway.md
 create mode 100644 docs/adr/0002-intra-process-plaintext-grpc-dial.md
 create mode 100644 docs/adr/0003-redis-response-cache-with-update-time-polling.md
 create mode 100644 docs/adr/0004-scope-based-bearer-auth.md
 create mode 100644 docs/agents/domain.md
 create mode 100644 docs/agents/issue-tracker.md
 create mode 100644 docs/agents/triage-labels.md

diff --git a/CONTEXT.md b/CONTEXT.md
new file mode 100644
index 0000000..69f3957
--- /dev/null
+++ b/CONTEXT.md
@@ -0,0 +1,108 @@
+# Context
+
+Domain language used by the 7Cav API. The proto files
+(`proto/milpacs.proto`, `proto/tickets.proto`) are the contract; this
+document covers the concepts and any nuance that isn't obvious from
+reading the schema.
+
+## Source data
+
+The API is a read layer over a [XenForo](https://xenforo.com) forum's
+MySQL database, augmented by the `NF Rosters` add-on (which contributes
+the `xf_nf_rosters_*` tables that hold milpac records) and the
+`Cav7/ApiKeyManager` add-on (which contributes the API-key and scope
+tables). The API itself owns no schema; it queries upstream tables and
+maps them to its own proto types.
+
+## Milpac
+
+A "milpac" is a member's military-personnel-record entry: rank,
+position, awards, service record, and the identifiers used to look them
+up across the wider 7Cav stack. `MilpacService` is the surface that
+serves them.
+
+## Profile shapes
+
+A member's milpac is served in three shapes by different RPCs, each
+tuned to a known consumer:
+
+- **`Profile`** — full view: rank, positions, awards, records, and the
+  connected-account identifiers.
+- **`LiteProfile`** — slim view: enough to render a roster row, without
+  the per-member relational payload.
+- **`S1UniformsProfile`** — view for the uniforms tool; includes
+  uniform-relevant fields and omits the rest.
+
+The three are not subsets of one type; they are hand-mapped from the
+same upstream rows into distinct proto messages.
+
+## Roster and RosterType
+
+A `Roster` is a collection of members grouped by unit, course, or
+status. `RosterType` is an enum identifying which roster is requested;
+its numeric values are used as the `roster_id` foreign key in the
+upstream tables. The three roster RPCs (`GetRoster`, `GetLiteRoster`,
+`GetS1UniformsRoster`) return the same set of members in the
+corresponding profile shape above.
+
+## Rank, Position, PositionGroup
+
+A `Rank` is a pay-grade entry from the upstream rank catalog. A
+`Position` is an org-chart slot; positions are grouped into
+`PositionGroup`s for hierarchical browsing. `RankExpanded` and
+`PositionExpanded` are the variants that include relational fields the
+plain message omits.
+
+## Record and Award
+
+A `Record` is an entry on a member's service history (joins, promotions,
+transfers, etc.); `RecordType` enumerates the categories. An `Award` is
+a decoration entry.
+
+## AWOL
+
+An entry on the AWOL list — members flagged as absent without leave.
+Served by `GetAwol`, used by status-tracking consumers.
+
+## Connected accounts
+
+Members are looked up by 7Cav user id, by username, and by external
+account identifiers maintained by the forum's connected-account
+integrations:
+
+- **Discord** — Discord user id.
+- **Gamertag** — Xbox / PlayStation handle.
+- **Keycloak** — legacy SSO identifier. The Keycloak auth path has been
+  removed; the lookup RPC is on the chopping block and should not be
+  used in new code.
+
+## Tickets
+
+`TicketsService` exposes the forum's ticket system (powered by the
+`NF Tickets` add-on) as a read-only API.
+
+- **`Ticket`** — a thread: title, status, category, participants,
+  message count, timestamps. `forum_url` is populated when the API is
+  configured with the public forum base URL.
+- **`Message`** — one post within a ticket, addressed by `position`
+  (0-indexed within the thread).
+- **`Category`** — a top-level grouping for tickets; carries a current
+  ticket count.
+- **`TicketParticipant`** — a member-to-ticket association with a role.
+
+`ListTicketMessages` paginates with an opaque cursor that encodes the
+next `position` to include (an inclusive lower bound), so `position=0`
+is reachable.
+
+## API key and scope
+
+Clients authenticate with a `Bearer` token (case-insensitive prefix per
+RFC 7235). Tokens are issued by the forum admin UI, not by this
+service. Each token carries a set of named scopes. Current scopes:
+
+- **`read`** — gates the milpac surface (profiles, rosters, ranks,
+  positions, AWOL).
+- **`read:tickets`** — gates the tickets surface.
+
+Scope membership is checked per-handler; a token with only `read`
+cannot read tickets, and vice versa.
diff --git a/docs/adr/0001-split-process-grpc-and-http-gateway.md b/docs/adr/0001-split-process-grpc-and-http-gateway.md
new file mode 100644
index 0000000..f38902e
--- /dev/null
+++ b/docs/adr/0001-split-process-grpc-and-http-gateway.md
@@ -0,0 +1,31 @@
+# ADR 0001: Split-process gRPC server + HTTP/JSON gateway from a single proto
+
+## Context
+
+The API exposes the same surface over two transports: a gRPC service for
+machine clients that want typed RPCs, and an HTTP/JSON surface for clients
+that don't. We want one source of truth for the schema, the routes, and the
+generated client code, with no risk of the two surfaces drifting.
+
+## Decision
+
+Define the service in `proto/milpacs.proto` (and additional `.proto` files
+per service). The gRPC server is hand-implemented against the generated
+`*.pb.go`. The HTTP surface is generated by `grpc-gateway` via
+`google.api.http` annotations on each RPC, which produces a reverse-proxy
+handler (`*.pb.gw.go`) that translates REST → gRPC. The OpenAPI document
+is generated by `protoc-gen-openapiv2` from the same proto.
+
+Both surfaces run inside one binary as two TCP listeners — gRPC on one
+port, the gateway on another. The gateway dials the gRPC server in-process.
+
+## Consequences
+
+- Adding an endpoint is a single edit (proto annotation + handler method);
+  the HTTP route and OpenAPI entry fall out of code generation.
+- The gateway is a thin translator with no business logic; all behavior
+  lives in the gRPC handlers.
+- The OpenAPI document is always in sync with the gRPC service because
+  both are generated from the same source.
+- The two-listener layout assumes a reverse proxy terminates TLS in front
+  of the gateway; see ADR 0002.
diff --git a/docs/adr/0002-intra-process-plaintext-grpc-dial.md b/docs/adr/0002-intra-process-plaintext-grpc-dial.md
new file mode 100644
index 0000000..bfe1389
--- /dev/null
+++ b/docs/adr/0002-intra-process-plaintext-grpc-dial.md
@@ -0,0 +1,24 @@
+# ADR 0002: Intra-process plaintext gRPC dial; TLS terminates at the reverse proxy
+
+## Context
+
+The binary runs two listeners (see ADR 0001): the gRPC server on one port
+and the HTTP/JSON gateway on another. The gateway dials the gRPC server
+to translate each REST call into a gRPC call. Public traffic to the
+gateway arrives over HTTPS; we need to decide whether the in-process
+gateway → gRPC hop should also be TLS.
+
+## Decision
+
+The gateway dials the local gRPC server with plaintext credentials. TLS
+is terminated by a reverse proxy in front of the gateway; the API binary
+itself does not handle TLS.
+
+## Consequences
+
+- No certificate management inside the binary; deployment topology owns it.
+- The two listeners are co-located in one process by design — the
+  plaintext hop never traverses an untrusted network.
+- If end-to-end TLS were ever required (e.g., running the gRPC server in
+  a separate process or on a separate host), both the gateway dial and
+  the gRPC server setup would need to grow TLS credentials.
diff --git a/docs/adr/0003-redis-response-cache-with-update-time-polling.md b/docs/adr/0003-redis-response-cache-with-update-time-polling.md
new file mode 100644
index 0000000..9984d92
--- /dev/null
+++ b/docs/adr/0003-redis-response-cache-with-update-time-polling.md
@@ -0,0 +1,37 @@
+# ADR 0003: Redis response cache invalidated by polling MySQL UPDATE_TIME
+
+## Context
+
+The API is a read layer over a MySQL database it does not own. Almost
+every endpoint is a GET that runs joins and preloads against tables that
+change infrequently relative to read volume. We need a cache layer that
+keeps responses fast without serving data that has gone stale after a
+write — but the writes happen in the upstream application, not in this
+service, so we cannot bust the cache from the write path.
+
+## Decision
+
+Cache successful HTTP GET responses in Redis, keyed by request path, with
+a long TTL (currently 6h). The cache middleware sits inside the auth
+middleware so authentication still runs on every request, and serves
+cached bodies pre-gzipped.
+
+Invalidation runs in a background goroutine that polls
+`information_schema.tables` on a fixed interval (currently 10 minutes)
+for a fixed set of monitored tables. If any monitored table's
+`UPDATE_TIME` is newer than the cached snapshot, the goroutine issues a
+Redis `FlushAll`. There is no per-record bust and no per-endpoint bust.
+
+## Consequences
+
+- Reads are cheap and uniform; the long TTL is safe because the polling
+  goroutine flushes everything when upstream data changes.
+- Stale data is possible for up to one polling interval after an upstream
+  write — acceptable because the read surface is not transactional.
+- The blast radius of any write is one `FlushAll`; this is intentional
+  and keeps the invalidation logic trivial.
+- Adding a new endpoint that reads from a previously-unmonitored upstream
+  table requires registering that table in the monitored set, or stale
+  responses linger for up to one TTL.
+- Cache middleware logging is intentionally verbose at INFO (HIT / MISS
+  per request); it is the current substitute for request analytics.
diff --git a/docs/adr/0004-scope-based-bearer-auth.md b/docs/adr/0004-scope-based-bearer-auth.md
new file mode 100644
index 0000000..b5cc675
--- /dev/null
+++ b/docs/adr/0004-scope-based-bearer-auth.md
@@ -0,0 +1,47 @@
+# ADR 0004: Scope-based bearer auth validated against the upstream DB on every request
+
+## Context
+
+The API needs to authenticate machine clients and gate individual
+endpoints by capability (e.g., a key that can read profile data should
+not necessarily be able to read tickets). Keys are issued and managed in
+the upstream forum application via its admin UI, not by this service.
+We need an auth model where revocation and capability changes made in
+the upstream UI take effect immediately, without a deploy or a cache
+invalidation in this service.
+
+## Decision
+
+Clients send a `Bearer` token (case-insensitive prefix per RFC 7235),
+which is validated on every request by a single JOIN over three upstream
+tables: the key table, the key-to-scope mapping, and the scope catalog.
+A row is returned per scope granted to the key. Zero rows → 401.
+
+The validation result carries the set of granted scope names. Each
+handler self-gates by calling a `RequireScope(ctx, "<scope_name>")`
+helper at the top of its body; missing scope → permission-denied.
+
+The scope catalog (which scopes exist, what they mean) is owned by the
+upstream application's admin UI. This service treats the catalog as
+read-only and never writes to it.
+
+The bearer token has a hard length cap to bound the cost of the lookup
+on malformed input.
+
+## Consequences
+
+- Revocation, key creation, and scope changes are instant: the next
+  request hits the new state. No restart, no cache flush.
+- Every request pays for one DB round-trip on the auth path. This is
+  acceptable for the current load and keeps the model simple; if it
+  becomes a bottleneck, a short-TTL in-process cache keyed by token hash
+  is the obvious next step.
+- The gRPC `UnaryInterceptor` and the HTTP gateway middleware must both
+  extract and validate the bearer token; they share a parsing helper to
+  keep the prefix handling consistent across surfaces.
+- Adding a new endpoint requires picking the scope it gates on (or
+  introducing a new scope in the upstream admin UI first) and adding a
+  `RequireScope` call at the top of the handler.
+- Because auth touches the same DB as the data path, an outage in that
+  DB surfaces as a 401 from this service, not as a 5xx. Fault-injection
+  smoke tests of handler-level error mapping must account for this.
diff --git a/docs/agents/domain.md b/docs/agents/domain.md
new file mode 100644
index 0000000..c97d6a6
--- /dev/null
+++ b/docs/agents/domain.md
@@ -0,0 +1,51 @@
+# Domain Docs
+
+How the engineering skills should consume this repo's domain documentation when exploring the codebase.
+
+## Before exploring, read these
+
+- **`CONTEXT.md`** at the repo root, or
+- **`CONTEXT-MAP.md`** at the repo root if it exists — it points at one `CONTEXT.md` per context. Read each one relevant to the topic.
+- **`docs/adr/`** — read ADRs that touch the area you're about to work in. In multi-context repos, also check `src/<context>/docs/adr/` for context-scoped decisions.
+
+If any of these files don't exist, **proceed silently**. Don't flag their absence; don't suggest creating them upfront. The producer skill (`/grill-with-docs`) creates them lazily when terms or decisions actually get resolved.
+
+## File structure
+
+Single-context repo (most repos):
+
+```
+/
+├── CONTEXT.md
+├── docs/adr/
+│   ├── 0001-event-sourced-orders.md
+│   └── 0002-postgres-for-write-model.md
+└── src/
+```
+
+Multi-context repo (presence of `CONTEXT-MAP.md` at the root):
+
+```
+/
+├── CONTEXT-MAP.md
+├── docs/adr/                          ← system-wide decisions
+└── src/
+    ├── ordering/
+    │   ├── CONTEXT.md
+    │   └── docs/adr/                  ← context-specific decisions
+    └── billing/
+        ├── CONTEXT.md
+        └── docs/adr/
+```
+
+## Use the glossary's vocabulary
+
+When your output names a domain concept (in an issue title, a refactor proposal, a hypothesis, a test name), use the term as defined in `CONTEXT.md`. Don't drift to synonyms the glossary explicitly avoids.
+
+If the concept you need isn't in the glossary yet, that's a signal — either you're inventing language the project doesn't use (reconsider) or there's a real gap (note it for `/grill-with-docs`).
+
+## Flag ADR conflicts
+
+If your output contradicts an existing ADR, surface it explicitly rather than silently overriding:
+
+> _Contradicts ADR-0001 (event-sourced orders), but worth reopening because…_
diff --git a/docs/agents/issue-tracker.md b/docs/agents/issue-tracker.md
new file mode 100644
index 0000000..cce77ec
--- /dev/null
+++ b/docs/agents/issue-tracker.md
@@ -0,0 +1,22 @@
+# Issue tracker: GitHub
+
+Issues and PRDs for this repo live as GitHub issues. Use the `gh` CLI for all operations.
+
+## Conventions
+
+- **Create an issue**: `gh issue create --title "..." --body "..."`. Use a heredoc for multi-line bodies.
+- **Read an issue**: `gh issue view <number> --comments`. To filter comments with `jq` or to fetch labels, use `--json` with `--jq` instead.
+- **List issues**: `gh issue list --state open --json number,title,body,labels,comments --jq '[.[] | {number, title, body, labels: [.labels[].name], comments: [.comments[].body]}]'` with appropriate `--label` and `--state` filters.
+- **Comment on an issue**: `gh issue comment <number> --body "..."`
+- **Apply / remove labels**: `gh issue edit <number> --add-label "..."` / `--remove-label "..."`
+- **Close**: `gh issue close <number> --comment "..."`
+
+Infer the repo from `git remote -v` — `gh` does this automatically when run inside a clone.
+
+## When a skill says "publish to the issue tracker"
+
+Create a GitHub issue.
+
+## When a skill says "fetch the relevant ticket"
+
+Run `gh issue view <number> --comments`.
diff --git a/docs/agents/triage-labels.md b/docs/agents/triage-labels.md
new file mode 100644
index 0000000..b716855
--- /dev/null
+++ b/docs/agents/triage-labels.md
@@ -0,0 +1,15 @@
+# Triage Labels
+
+The skills speak in terms of five canonical triage roles. This file maps those roles to the actual label strings used in this repo's issue tracker.
+
+| Label in mattpocock/skills | Label in our tracker | Meaning                                  |
+| -------------------------- | -------------------- | ---------------------------------------- |
+| `needs-triage`             | `needs-triage`       | Maintainer needs to evaluate this issue  |
+| `needs-info`               | `needs-info`         | Waiting on reporter for more information |
+| `ready-for-agent`          | `ready-for-agent`    | Fully specified, ready for an AFK agent  |
+| `ready-for-human`          | `ready-for-human`    | Requires human implementation            |
+| `wontfix`                  | `wontfix`            | Will not be actioned                     |
+
+When a skill mentions a role (e.g. "apply the AFK-ready triage label"), use the corresponding label string from this table.
+
+Edit the middle column (`Label in our tracker`) to match whatever vocabulary you actually use.