You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Capture every "I don't have that in my corpus" response and route it into the content roadmap via the Partner Gaps kanban swim lane. This is the feedback flywheel that turns honest corpus gaps into content priorities.
Scope
1. Structured gap detection in AI proxy
Update the Partner system prompt to require a JSON envelope alongside the prose response:
Raw question text stored in D1 for Craig's reference
GitHub issue body uses only scrubbed_summary — a paraphrased version generated by Haiku at gap-capture time
One-click redaction endpoint: any D1 row can be hard-nuked and linked GitHub issue closed.
4. Semantic deduplication
Background worker embeds each new gap question and runs cosine similarity against existing open gaps. If >0.9 similarity → increment occurrence_count on existing gap rather than creating a new row.
individual (initial) — every gap → own GitHub issue
digest (triggered at 20K total users, ~58 gaps/day) — singletons aggregate into daily digest issues; clusters (≥3 occurrences) and thumbs-down still get dedicated issues
Switch is a config change, not a backend migration. D1 schema unchanged.
7. Resolution loop
Content PR merged → CI hook matches PR title/body for closes corpus-gap-XXXX refs
Worker marks D1 row status = content_shipped, sets linked_content_pr, closes GitHub issue
Acceptance criteria
System prompt updated; model reliably emits gap signal in ≥95% of gap cases (measured against test set)
D1 binding configured on Worker; gap rows persist
Semantic dedup working (verified via test queries)
Parent: #1446 (Epic: AI Study Partner)
Phase: 1 (Foundations)
Capture every "I don't have that in my corpus" response and route it into the content roadmap via the Partner Gaps kanban swim lane. This is the feedback flywheel that turns honest corpus gaps into content priorities.
Scope
1. Structured gap detection in AI proxy
Update the Partner system prompt to require a JSON envelope alongside the prose response:
{ "gap": true | false, "gap_type": "content" | "translation" | "out_of_scope", "topic": "one-line summary" }Cloudflare Worker inspects the stream, extracts the JSON envelope. If
gap: true, the proxy:Additional gap signals combined at proxy:
2. Cloudflare D1 schema
3. Privacy scrub (permissive)
Light regex scrub at capture:
scrubbed_summary— a paraphrased version generated by Haiku at gap-capture timeOne-click redaction endpoint: any D1 row can be hard-nuked and linked GitHub issue closed.
4. Semantic deduplication
Background worker embeds each new gap question and runs cosine similarity against existing open gaps. If >0.9 similarity → increment
occurrence_counton existing gap rather than creating a new row.5. GitHub issue auto-creation (
_tools/corpus_gap_sync.py)Scheduled Worker job (or periodic Cron trigger). On each new gap or
occurrence_countincrement:corpus-gap: <scrubbed summary>corpus-gap6. Scale trigger — 20K user switch
Config flag
GAP_SYNC_MODE=individual|digestindividual(initial) — every gap → own GitHub issuedigest(triggered at 20K total users, ~58 gaps/day) — singletons aggregate into daily digest issues; clusters (≥3 occurrences) and thumbs-down still get dedicated issues7. Resolution loop
closes corpus-gap-XXXXrefsstatus = content_shipped, setslinked_content_pr, closes GitHub issueAcceptance criteria
corpus-gapappliedindividual/digestmodes present and documentedSize: M
Labels: ai-partner, phase-1