Add capture event transport and server-side write classification#112
jspahn80134 wants to merge 49 commits into
Update the ignored PostgreSQL integration test to assert the rich events schema columns and fix timestamp/JSONB parameter casts used by the capture insert path. Verified against the AWS PostgreSQL database with event_capture_inserts_rich_schema_event_into_db.
macOS CI occasionally delivered the loadfile acknowledgement just after the old two-second harness timeout. Increase the shared browser message wait to five seconds so test_client does not fail on that timing edge.
The first CI rerun passed the original test_client wait but exposed the same timing issue in test_client_updates while waiting for the autosave content update. Use the client response window as the shared browser test wait budget.
The overall browser tests share one WebDriver endpoint and were running concurrently inside the same test binary. This was causing test_client_updates to miss its autosave content update on CI, especially macOS/Safari. Guard the harness with a shared async mutex so each browser session runs in isolation.
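A minimal sketch of the serialization idea described above, under some assumptions: the commit uses a shared async mutex, while this stand-in uses the blocking `std::sync::Mutex` so it runs without an async runtime. The names `BROWSER_SESSION_LOCK` and `with_browser_session` are hypothetical and do not come from the PR.

```rust
use std::sync::{Mutex, MutexGuard};

// Hypothetical process-wide lock. Every browser test takes it before touching
// the shared WebDriver endpoint, so only one session runs at a time.
static BROWSER_SESSION_LOCK: Mutex<()> = Mutex::new(());

/// Run `body` while holding the harness lock and return its result.
pub fn with_browser_session<R>(body: impl FnOnce() -> R) -> R {
    // A panicking test poisons the mutex. The guarded value is `()`, so there
    // is no state to corrupt and later tests can keep using the lock.
    let _guard: MutexGuard<'_, ()> = BROWSER_SESSION_LOCK
        .lock()
        .unwrap_or_else(|poisoned| poisoned.into_inner());
    body()
}
```

With an async harness the same shape would apply, with the guard held across the awaited test body instead of a synchronous closure.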
bjones1
left a comment
Here are some initial comments on the PR, mainly questions -- I'd like to hear your thoughts. I'll continue to review.
Use generated Rust-backed capture wire/status types in the VS Code extension. Restore the explanatory extension comments and the current-file update after LoadFile. Keep study lifecycle commands available for automation while removing them from the Command Palette.
Resolve conflicts in the VS Code extension, translation capture path, and overall test harness. Keep upstream CursorPosition/WebDriver updates while preserving capture instrumentation and serialized browser test timing.
bjones1
left a comment
Good progress!
If you answer a discussion or a question, don't resolve it -- this helps me find and read your responses. When everything's already resolved, it's hard for me to find and think about the discussions.
Add a code_external_insert_candidate capture event for code edits that look non-incremental but were not observed as paste operations. The classifier records only coarse metadata: basis, confidence, size band, block kind, source, and classification basis. Paste markers continue to take precedence so a single edit is not double-counted as both paste and heuristic external insertion. Include targeted code comments and unit coverage for multi-line, small single-line, and large-block classifier behavior.
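A rough sketch of how such a classifier might be shaped, covering the three cases the commit mentions (multi-line, small single-line, large block) and the rule that paste markers win. Every name and threshold here (`EditClass`, `SizeBand`, `Confidence`, `LARGE_BLOCK_CHARS`, `classify_insert`) is hypothetical; the actual classifier in the PR may use different bands and rules.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum SizeBand {
    Medium,
    Large,
}

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Confidence {
    Low,
    High,
}

#[derive(Debug, PartialEq, Eq)]
pub enum EditClass {
    /// A paste was observed directly; this always wins over the heuristic.
    Paste,
    /// The edit looks non-incremental, but no paste was observed.
    ExternalInsertCandidate { size_band: SizeBand, confidence: Confidence },
    /// An ordinary typing-sized edit.
    Incremental,
}

// Hypothetical threshold, chosen only for illustration.
const LARGE_BLOCK_CHARS: usize = 400;

pub fn classify_insert(inserted: &str, paste_observed: bool) -> EditClass {
    // Paste markers take precedence so one edit is never counted twice.
    if paste_observed {
        return EditClass::Paste;
    }
    let chars = inserted.chars().count();
    // Count lines with real content, so Enter plus auto-indent stays incremental.
    let content_lines = inserted.lines().filter(|l| !l.trim().is_empty()).count();
    if chars >= LARGE_BLOCK_CHARS {
        EditClass::ExternalInsertCandidate {
            size_band: SizeBand::Large,
            confidence: Confidence::High,
        }
    } else if content_lines >= 2 {
        EditClass::ExternalInsertCandidate {
            size_band: SizeBand::Medium,
            confidence: Confidence::Low,
        }
    } else {
        EditClass::Incremental
    }
}
```

Only coarse metadata leaves the function; the inserted text itself is never stored in the result, which matches the commit's stated intent.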
I noticed that {
"event": {
"client_tz_offset_min": -300,
"data": {
"classification_basis": "codemirror_doc_blocks",
"doc_block_count_after": 3,
"doc_block_count_before": 3,
"doc_block_diff": [
{
"Update": {
"contents": [
{
"from": 0,
"insert": "<p>Copyright (C) 2025 Bryan A. Jones.<p>This file is part of the CodeChat Editor.<p>The CodeChat Editor is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.<p>The CodeChat Editor is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.<p>You should have received a copy of the GNU General Public License along with the CodeChat Editor. If not, see <a href=http://www.gnu.org/licenses/>http://www.gnu.org/licenses/</a>.<h1><code>.gitignore</code> -- files for Git to ignore</h1><p>dist build output",
"to": 835
}
],
"from": 0
}
}
],
"mode": "python",
"source": "ide"
},
"event_id": "server-15356-1780568509808280-2",
"event_source": "vscode_extension",
"event_type": "write_doc",
"file_hash": "c22e65e3f32618653447a821e7e2c35ec4cea7a142f2edc1ec7915b9ca7b3821",
"language_id": null,
"schema_version": 2,
"sequence_number": null,
"session_id": "8eafb29c-9634-459b-9e4e-a6b55ef5808c",
"timestamp": "2026-06-04T10:21:49.808236300+00:00",
"user_id": "1234"
},
"fallback_timestamp": "2026-06-04T10:21:49.809424800+00:00"
}
I noticed that starting capture, quitting VSCode, and then restarting it leaves capture on. Turn capture off when the extension first starts up -- I expected to have to re-enable it after a restart.
When I open a second VSCode window, capture doesn't seem to work.
Edits to raw files (ones the CodeChat Editor doesn't support) are treated as copy/paste even when the edit is minor.
Any edit to a Markdown file puts the entire contents in the log, not just a diff. We probably need to think about the diff format that is sent.
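One simple option for a smaller payload, sketched here as an illustration only: trim the common prefix and suffix of the old and new text and send a single `from`/`to`/`insert` range, mirroring the field names in the record quoted above. The `RangeEdit` type and `single_range_diff` function are hypothetical, and offsets are counted in Unicode scalar values, which is an assumption (the editor may count UTF-16 code units instead).

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct RangeEdit {
    /// Start offset in the old text, in chars.
    pub from: usize,
    /// End offset in the old text (exclusive), in chars.
    pub to: usize,
    /// Text that replaces `old[from..to]`.
    pub insert: String,
}

/// Return the single smallest replaced range, or `None` if nothing changed.
pub fn single_range_diff(old: &str, new: &str) -> Option<RangeEdit> {
    let old_chars: Vec<char> = old.chars().collect();
    let new_chars: Vec<char> = new.chars().collect();

    // Length of the shared prefix.
    let prefix = old_chars
        .iter()
        .zip(&new_chars)
        .take_while(|(a, b)| a == b)
        .count();
    if prefix == old_chars.len() && prefix == new_chars.len() {
        return None;
    }

    // Length of the shared suffix, capped so it never overlaps the prefix.
    let max_suffix = old_chars.len().min(new_chars.len()) - prefix;
    let suffix = old_chars
        .iter()
        .rev()
        .zip(new_chars.iter().rev())
        .take(max_suffix)
        .take_while(|(a, b)| a == b)
        .count();

    Some(RangeEdit {
        from: prefix,
        to: old_chars.len() - suffix,
        insert: new_chars[prefix..new_chars.len() - suffix].iter().collect(),
    })
}
```

A single range is the cheapest format to compute and log, but two distant edits in one write would produce one large range spanning both; a multi-hunk diff would be needed to avoid that.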
Remove paste/external-insert heuristic capture rows, make recording session-local, add JSONL fallback capture without DB config, assign server-generated capture sequence numbers, and diff Markdown source writes.
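As an illustration of two pieces of this commit, here is a small sketch of a server-side sequence counter and a JSONL fallback writer. It assumes the fallback record wraps the event as `{"event": ..., "fallback_timestamp": ...}`, matching the record quoted earlier in this thread, and that sequence numbers start at 1; both are assumptions. `SequenceCounter` and `write_jsonl_fallback` are hypothetical names, and the event is passed in already serialized so the sketch needs no JSON library.

```rust
use std::io::{self, Write};
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical per-stream counter owned by the server.
pub struct SequenceCounter(AtomicU64);

impl SequenceCounter {
    pub const fn new() -> Self {
        Self(AtomicU64::new(0))
    }

    /// Return the next sequence number; the first call yields 1 (an assumption).
    pub fn next(&self) -> u64 {
        self.0.fetch_add(1, Ordering::Relaxed) + 1
    }
}

/// Append one event to a JSONL sink: exactly one JSON object per line.
///
/// `event_json` must already be valid single-line JSON, and
/// `fallback_timestamp` must not contain quotes or backslashes
/// (an RFC 3339 timestamp satisfies this).
pub fn write_jsonl_fallback<W: Write>(
    out: &mut W,
    event_json: &str,
    fallback_timestamp: &str,
) -> io::Result<()> {
    debug_assert!(!event_json.contains('\n'));
    writeln!(
        out,
        "{{\"event\":{},\"fallback_timestamp\":\"{}\"}}",
        event_json, fallback_timestamp
    )
}
```

Because each record is one line, a partially written file can still be read up to the last complete line, which is the usual reason to choose JSONL for a fallback log.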
Addressed the June 4 discussion notes in 3fd74f2:
I asked Claude to review. Code Review:
Serialize capture activity events, add closed_by to activity-ended doc sessions, skip activity classification while capture is off, tighten Markdown/RST code classification, normalize parsed CaptureStatus counters, make DomLocation cursor handling explicit, fix RST reflection prompt output, and reuse captureLog for failure messages.
|
Addressed the extension.ts Claude review findings in 0a9198d:
Verification:
Summary:
capture_config.example.json, redacted config summaries, PostgreSQL capture writes, and JSONL fallback capture when database capture is missing or unavailable.
server/scripts/capture_events_schema.sql and linked it from toc.md.
event_id, sequence_number, schema_version, user_id, session_id, event_source, language_id, file_hash, event_type, timestamp, client_tz_offset_min, and event-specific data.
event_source=server_translation and get stream-local sequence numbers.
closed_by to activity-ended doc sessions, avoided classification scans while capture is off, tightened Markdown/RST code-block classification, fixed RST reflection prompt output, normalized parsed capture-status counters, and made unsupported DOM cursor updates explicit.
query(...).wait(...) Mocha result wait.
Validation:
cargo test export_bindings
cargo clippy --manifest-path server/Cargo.toml --all-targets --all-features -- -Dwarnings
cargo test --manifest-path server/Cargo.toml --lib -- --test-threads=1
pnpm exec tsc -noEmit
pnpm exec tsc -noEmit
pnpm exec eslint src
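For readers who want the event fields from the summary in one place, they could map onto a struct roughly like the following. This is a sketch only: the field names come from the summary, but the struct name `CaptureEvent` and every type are guesses inferred from the sample record quoted earlier in the thread (for example, `sequence_number` and `language_id` appear as `null` there, so they are optional here). The real generated wire types in the PR may differ.

```rust
/// Hypothetical in-memory shape of one capture event.
#[derive(Debug, Clone, PartialEq)]
pub struct CaptureEvent {
    pub event_id: String,
    /// `null` in the quoted sample, so modeled as optional.
    pub sequence_number: Option<u64>,
    pub schema_version: u32,
    pub user_id: String,
    pub session_id: String,
    /// For example "vscode_extension" or "server_translation".
    pub event_source: String,
    /// `null` in the quoted sample, so modeled as optional.
    pub language_id: Option<String>,
    pub file_hash: String,
    pub event_type: String,
    /// RFC 3339 text, kept as a string to avoid a date-time dependency.
    pub timestamp: String,
    pub client_tz_offset_min: i32,
    /// Event-specific payload, kept as opaque JSON text in this sketch.
    pub data: String,
}
```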