
Add comprehensive end-to-end user test suite #24

Merged
BillJr99 merged 1 commit into main from claude/comprehensive-tests-docker-hksW3 on May 23, 2026


@BillJr99
Owner

Summary

This PR introduces a complete end-to-end user test suite (tests/user/) that exercises OSScreenObserver by spawning real python main.py subprocesses and driving them over the wire (REST HTTP and MCP stdio). This complements the existing in-process regression tests and catches threading, serialization, header, and protocol issues that in-process testing cannot expose.

Key Changes

  • Test infrastructure (conftest.py):

    • oso_server_factory fixture that spawns configurable OSO subprocesses with the mock adapter
    • oso_mcp_server fixture for MCP stdio mode testing
    • HttpJson helper class for urllib-based JSON HTTP requests over loopback
    • MCPClient helper for newline-delimited JSON-RPC 2.0 framing
    • Process lifecycle management with signal handling and health checks
  • REST API coverage (test_rest_full.py):

    • Health, capabilities, windows, monitors, structure endpoints
    • Find element and selector matching
    • Element actions (click, focus, set_value, right_click, double_click, etc.)
    • Snapshot lifecycle (create, get, diff, drop)
    • Observe diff tokens
    • Metrics endpoint in Prometheus format
  • MCP protocol (test_mcp_protocol.py):

    • JSON-RPC 2.0 handshake and initialization
    • tools/list and tools/call coverage of all 49 MCP tools
    • Error envelope round-tripping
    • stdout purity verification (logs must go to stderr)
  • Scenario and trace testing:

    • test_scenarios_user.py: Drives login.yaml end-to-end with reactions and oracles
    • test_trace_replay.py: Record/replay round-trip with divergence detection
  • Predicate and action coverage:

    • test_predicates_full.py: All 9 assert_state predicate kinds (element_exists, element_absent, value_equals, value_matches, text_visible, window_focused, window_exists, tree_hash_equals, AND combination)
    • test_element_actions_full.py: Focus, set_value, invoke, select_option, hover, drag, key_into, clear_text
  • Specialized tests:

    • test_ascii_render_snapshot.py: ASCII sketch renderer against stored snapshot
    • test_budget_redaction_audit.py: Budget caps, redaction, audit log enforcement
    • test_ocr_real_tesseract.py: Real Tesseract OCR against generated PIL images
    • test_vlm_real_ollama.py: Vision-LLM pipeline against a real Ollama daemon
    • test_setup_config_live.py: setup_config.py subprocess execution
    • test_xvfb_live.py: Live X11 adapter against a real Xvfb display
  • Test configuration:

    • pytest.ini: Marker definitions for user, slow_llm, slow_vlm, needs_display, needs_tesseract
    • CI integration: regression tests (-m "not user") run in a separate lane from user tests
  • Documentation: Updated README with testing tier explanation and user test coverage details
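The HttpJson helper listed under conftest.py is only named in the PR. A minimal sketch of what a urllib-based JSON client for loopback requests might look like follows; the constructor arguments, the request/get/post method names, and the (status, body) return shape are assumptions, not the actual conftest.py API. It bypasses any configured HTTP proxy, since loopback traffic should never be proxied, and returns the JSON error envelope of a 4xx/5xx response instead of raising.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Optional, Tuple


class HttpJson:
    """Hypothetical sketch of a JSON-over-HTTP helper for a loopback OSO server."""

    def __init__(self, base_url: str, timeout: float = 5.0) -> None:
        self.base_url = base_url.rstrip("/")
        self.timeout = timeout
        # An empty ProxyHandler ignores http_proxy/HTTPS_PROXY env vars,
        # so requests to 127.0.0.1 are never routed through a proxy.
        self._opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

    def request(self, method: str, path: str, body: Optional[Any] = None) -> Tuple[int, Any]:
        """Send one request; return (HTTP status, decoded JSON body or None)."""
        headers = {"Accept": "application/json"}
        data = None
        if body is not None:
            data = json.dumps(body).encode("utf-8")
            headers["Content-Type"] = "application/json"
        req = urllib.request.Request(self.base_url + path, data=data,
                                     headers=headers, method=method)
        try:
            with self._opener.open(req, timeout=self.timeout) as resp:
                status, raw = resp.status, resp.read()
        except urllib.error.HTTPError as err:
            # 4xx/5xx responses still carry a JSON error envelope to assert on.
            status, raw = err.code, err.read()
        return status, (json.loads(raw) if raw else None)

    def get(self, path: str) -> Tuple[int, Any]:
        return self.request("GET", path)

    def post(self, path: str, body: Optional[Any] = None) -> Tuple[int, Any]:
        return self.request("POST", path, body)
```

Returning the status alongside the body lets a test assert on the response envelope of failing calls as easily as successful ones.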

Implementation Details

  • Subprocesses are spawned with PYTHONUNBUFFERED=1 and stderr redirected to log files for debugging
  • Health checks use _wait_for_http() polling with a configurable timeout
  • Process cleanup uses SIGTERM → SIGKILL escalation with timeouts
  • Free port allocation via socket binding to avoid conflicts
  • oso_server_factory uses module scope to amortize subprocess startup cost across each test module
  • MCP framing reads one byte at a time to handle arbitrary line lengths
  • Mock adapter is the default; real adapter tests are marked needs_display and skipped when no X11 display is available
  • Ollama and Tesseract tests are marked slow_llm/slow_vlm/needs_tesseract and skipped when the daemon or binary is unavailable
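Free-port allocation and health-check polling can be sketched with the standard library alone. The names free_port and wait_for_http below are hypothetical stand-ins (the PR's helper is _wait_for_http(), whose signature is not shown). Any HTTP status, including an error code, counts as "up", because it proves the server is accepting connections.

```python
import socket
import time
import urllib.error
import urllib.request


def free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))
        return s.getsockname()[1]


def wait_for_http(url: str, timeout: float = 10.0, interval: float = 0.1,
                  attempt_timeout: float = 1.0) -> bool:
    """Poll url until it answers with any HTTP status; False after timeout seconds."""
    # Hypothetical helper; loopback polling should bypass any configured proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with opener.open(url, timeout=attempt_timeout):
                return True
        except urllib.error.HTTPError:
            return True  # server is listening, it just returned 4xx/5xx
        except OSError:  # URLError, connection refused, socket timeout
            time.sleep(interval)
    return False
```

Binding to port 0 has a small race: another process can take the port between the socket closing and the OSO subprocess binding it. The window is short, and the approach is common in test suites.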
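The SIGTERM → SIGKILL escalation amounts to a few lines of subprocess code. This is a minimal sketch, assuming a hypothetical stop_process helper and default grace periods of five seconds; the actual timeouts used in conftest.py are not stated in the PR.

```python
import subprocess


def stop_process(proc: subprocess.Popen, grace: float = 5.0, kill_wait: float = 5.0) -> int:
    """Stop proc politely, then forcefully; return its exit code (hypothetical helper)."""
    if proc.poll() is not None:
        return proc.returncode  # already exited, nothing to do
    proc.terminate()  # SIGTERM on POSIX
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX; cannot be caught or ignored
        return proc.wait(timeout=kill_wait)
```

On Windows, Popen.terminate() and Popen.kill() both end the process with TerminateProcess, so the escalation only changes behavior on POSIX.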
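Newline-delimited JSON-RPC 2.0 framing and the stdout-purity check fit in one small module. The sketch below uses hypothetical function names (rpc_request, send_message, read_message, check_stdout_purity) and is not the MCPClient implementation. Reading one byte at a time means no fixed-size buffer can truncate a long message, and the reader stops exactly at the terminating newline.

```python
import json
from typing import BinaryIO, Optional


def rpc_request(req_id: int, method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request object (e.g. "initialize", "tools/list")."""
    msg = {"jsonrpc": "2.0", "id": req_id, "method": method}
    if params is not None:
        msg["params"] = params
    return msg


def send_message(stream: BinaryIO, message: dict) -> None:
    """Write one message as exactly one newline-terminated line."""
    # json.dumps escapes newlines inside strings, so the only raw b"\n"
    # in the output is the frame terminator appended here.
    stream.write(json.dumps(message, separators=(",", ":")).encode("utf-8") + b"\n")
    stream.flush()


def read_message(stream: BinaryIO) -> Optional[dict]:
    """Read one byte at a time up to the next newline; return None at EOF."""
    buf = bytearray()
    while True:
        byte = stream.read(1)
        if not byte:  # EOF: the server closed its stdout
            return json.loads(buf) if buf.strip() else None
        if byte == b"\n":
            if buf.strip():
                return json.loads(buf)
            buf.clear()  # tolerate blank lines between frames
            continue
        buf += byte


def check_stdout_purity(raw: bytes) -> int:
    """Count JSON-RPC lines in captured stdout; raise ValueError on anything else."""
    count = 0
    for lineno, line in enumerate(raw.splitlines(), start=1):
        if not line.strip():
            continue
        try:
            msg = json.loads(line)
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            raise ValueError(f"stdout line {lineno} is not JSON: {line[:80]!r}") from exc
        if not isinstance(msg, dict) or msg.get("jsonrpc") != "2.0":
            raise ValueError(f"stdout line {lineno} is not a JSON-RPC 2.0 message")
        count += 1
    return count
```

The purity check is why logs must go to stderr: a single stray log line on stdout is not valid JSON and breaks any MCP client reading the same pipe.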

https://claude.ai/code/session_01Q7eSEmS8XK4wU5GsK5Ey1z

Adds tests/user/ with end-to-end subprocess-driven coverage:
- test_rest_full.py: every Flask endpoint, response envelopes, snapshot
  lifecycle, observe diff, Prometheus metrics.
- test_mcp_protocol.py: NDJSON framing, smoke tests for all 49 MCP tools, stdout
  purity (logs to stderr).
- test_predicates_full.py: all 9 assert_state predicate kinds plus AND.
- test_element_actions_full.py: focus/set_value/invoke/select/hover/drag/
  key_into/clear_text/propose-confirm flow.
- test_scenarios_user.py, test_trace_replay.py, test_ascii_render_snapshot.py,
  test_budget_redaction_audit.py, test_setup_config_live.py.
- Optional-deps tests (test_ocr_real_tesseract.py, test_vlm_real_ollama.py,
  test_ollama_setup_live.py, test_xvfb_live.py) skip gracefully without
  the underlying binaries or daemons.
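The graceful-skip behavior for optional dependencies reduces to cheap availability probes. This is a minimal sketch with hypothetical helper names; 11434 is Ollama's default port, but a TCP probe only shows that something is listening there, not that the required model has been pulled.

```python
import os
import shutil
import socket


def have_binary(name: str) -> bool:
    """True if an executable called name (e.g. "tesseract", "Xvfb") is on PATH."""
    return shutil.which(name) is not None


def have_display() -> bool:
    """True if an X11 DISPLAY is set (for example by an Xvfb wrapper)."""
    return bool(os.environ.get("DISPLAY"))


def port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def ollama_available(host: str = "127.0.0.1", port: int = 11434) -> bool:
    """Cheap probe for an Ollama daemon on its default port (hypothetical helper)."""
    # Only proves a listener exists; it does not check that a model is pulled.
    return port_open(host, port)
```

A test module could then set a module-level pytestmark to pytest.mark.skipif(not have_binary("tesseract"), reason="tesseract not installed"), so pytest reports those tests as skipped rather than failed.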
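Checking that the metrics endpoint speaks the Prometheus text exposition format does not require a Prometheus client library. The minimal parser below is an assumption about how such a check could be written, not the code in test_rest_full.py; it reads TYPE comments and sample lines and deliberately ignores edge cases such as a closing brace inside a quoted label value.

```python
import re
from typing import Dict, List, Tuple

_SAMPLE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"  # metric name
    r"(?P<labels>\{[^}]*\})?"               # optional {k="v",...}; no '}' inside values
    r"\s+(?P<value>\S+)"                    # sample value (float, NaN, +Inf, -Inf)
    r"(?:\s+(?P<ts>-?\d+))?\s*$"            # optional millisecond timestamp
)


def parse_prometheus_text(text: str) -> Tuple[Dict[str, str], List[Tuple[str, str, float]]]:
    """Return ({metric: type}, [(name, labels, value), ...]); raise ValueError on junk."""
    types: Dict[str, str] = {}
    samples: List[Tuple[str, str, float]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            parts = line.split(None, 3)
            if len(parts) >= 4 and parts[1] == "TYPE":
                types[parts[2]] = parts[3]
            continue  # HELP lines and free-form comments are skipped
        m = _SAMPLE.match(line)
        if m is None:
            raise ValueError(f"line {lineno} is not a valid sample: {line!r}")
        # float() accepts NaN/+Inf/-Inf and raises ValueError on anything else.
        samples.append((m["name"], m["labels"] or "", float(m["value"])))
    return types, samples
```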

Adds pytest.ini with markers (user, slow_llm, slow_vlm, needs_display,
needs_tesseract). Updates ci.yml to run the new tier alongside the regression tier.
Documents the test surface in README.md.
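pytest reads marker registrations from the markers option of the [pytest] section, one "name: description" entry per indented line. The sketch below shows that layout with placeholder descriptions (the real pytest.ini wording is not in the PR) and a hypothetical registered_markers helper that a meta-test could use to confirm every marker the suite relies on is declared.

```python
import configparser
from typing import Set

# Placeholder descriptions; the wording in the real pytest.ini is not shown in the PR.
EXAMPLE_PYTEST_INI = """\
[pytest]
markers =
    user: end-to-end tests that drive python main.py subprocesses
    slow_llm: needs a running Ollama daemon
    slow_vlm: needs an Ollama vision model
    needs_display: needs an X11 display such as Xvfb
    needs_tesseract: needs the tesseract binary on PATH
"""

REQUIRED_MARKERS = {"user", "slow_llm", "slow_vlm", "needs_display", "needs_tesseract"}


def registered_markers(ini_text: str) -> Set[str]:
    """Return the marker names declared under [pytest] markers (hypothetical helper)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    names = set()
    # Indented continuation lines are joined into one multi-line value.
    for entry in parser.get("pytest", "markers", fallback="").splitlines():
        entry = entry.strip()
        if entry:
            # Entries look like "name: description"; keep only the name.
            names.add(entry.split(":", 1)[0].strip())
    return names
```

With the markers registered, pytest -m "not user" deselects every test carrying the user marker, which is what lets CI run the regression lane and the user lane separately.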

BillJr99 merged commit baa8bed into main on May 23, 2026
4 checks passed