Extended capabilities round 2 + cookbook + integration tests #86

Merged
JE-Chen merged 50 commits into main from dev on Apr 26, 2026

JE-Chen commented Apr 26, 2026

Summary

46 commits since PR #85. Highlights:

Latest waves

  • Action JSON formatter (canonical kwarg order), Markdown → action JSON transpiler, failure clustering (normalised signatures), synthetic monitoring (edge-triggered alerts), Storybook integration (discovery + visual snapshots), shadow DOM auto-pierce, OTLP exporter for Jaeger/Tempo.
  • Driver pinner (cache geckodriver/chromedriver, dodge GitHub rate limit), Selenium → Playwright translator, form auto-fill, workspace bootstrapper, a11y diff, fan-out task runner, extension harness, file-backed event bus.
  • CDP message tap (record/replay), cross-browser parity, Page Object codegen, state diff, workspace lock file, a11y trend dashboard, perf P95 drift detector.
  • Process supervisor, multi-stage pipeline DSL, regex test selector, Appium mobile gestures, coverage map.

Optimisation pass

  • 7 pytest collection warnings cleared (__test__ = False on TestObject / TestRecord / etc).
  • workspace_lock distribution walk cached → 4 tests dropped from 0.3s to <0.05s each.
  • socket_server quit tests went from 2.42s → 0.49s via threading.Event + tighter poll_interval.
  • New driver_dispatch module collapses three Selenium-or-Playwright dispatch sites into one.
  • Suite: 15.41s → 11.57s (-25%).
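The collection-warning fix relies on pytest's `__test__` attribute. A minimal sketch of the idea follows; the class body is an illustrative stand-in, not the project's actual `TestRecord` definition.

```python
# Illustrative stand-in for a domain class whose name starts with "Test".
# The fields below are assumptions made for this sketch only.
class TestRecord:
    # Tell pytest this is not a test class, so it is skipped during
    # collection instead of triggering a collection warning.
    __test__ = False

    def __init__(self, name: str, passed: bool) -> None:
        self.name = name
        self.passed = passed


record = TestRecord("login_flow", passed=True)
```

The attribute only affects pytest collection; the class behaves normally everywhere else.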

API façade + docs

  • je_web_runner.api.{authoring, debugging, frontend, infra, mobile, networking, observability, quality, reliability, security, test_data} re-exports new helpers thematically.
  • docs/source/conf.py gains sphinx.ext.autodoc + autosummary + napoleon with mock imports for soft deps.
  • New Eng/doc/api_reference/api_reference.rst driving recursive per-module page generation.

Real-browser smoke + cookbook

  • test/e2e_test/ with conftest.py that detects Selenium Grid availability and skips cleanly.
  • .github/workflows/e2e_browser.yml boots selenium/hub:4.20.0 + selenium/node-chrome and runs daily / on demand.
  • examples/ cookbook: counting_stars.{py,json}, google_search.py, form_submit.py, smart_wait_demo.py, pii_redact_demo.py, fanout_demo.py, quick_smoke.json.

Bugs found by actually running the project

  • webdriver_wrapper.execute_script swallowed return values → fixed (caught by JSON cookbook smoke).
  • LSP Content-Length framing corrupted on Windows due to text-mode \n → \r\n translation → __main__.py now sys.stdout.reconfigure(newline="") (caught by integration subprocess test).

New: WR_sleep

  • Native time.sleep action so JSON pipelines no longer need WR_execute_async_script + setTimeout to pace themselves.

Comprehensive integration tests

  • test/integration_test/ — 30 tests across 10 files, each wiring 2+ modules with real I/O (in-memory SQLite, in-process HTTP servers, real subprocess for MCP / LSP).
  • Wired into test_dev.yml + test_stable.yml right after the unit-test step.

Numbers

Test plan

  • Unit + integration green on Python 3.10 / 3.11 / 3.12 / 3.13.
  • E2E daily workflow runs against Selenium Grid.
  • PyPI publish workflow fires on merge (auto patch bump + tag + release).
  • Sphinx build picks up the new autosummary tree without errors.

JE-Chen added 30 commits April 26, 2026 14:32
…pper / a11y diff / fanout / extension / event bus)
JE-Chen added 16 commits April 26, 2026 15:34
The 3-line script was a side-effect-on-cwd standalone runner, never
referenced by either CI workflow. The proper pytest coverage already
lives at test/unit_test/test_create_project.py.
…ook snapshots / appium gestures / coverage map)
…tion

Three concrete wins with no behaviour change:

- Pytest collection warnings (7 -> 0): mark TestObject / TestObjectRecord /
  TestRecord / TestRailError / TestcontainersError with __test__ = False so
  pytest stops trying to collect domain / exception classes whose name
  happens to start with "Test".

- workspace_lock dist-walk caching: importlib.metadata.distributions() was
  being walked every call; the result is now memoised behind
  reset_distribution_cache() so per-test setup drops from ~0.3s to <0.05s.

- socket_server tests (2.42s -> 0.49s): expose a threading.Event on the
  TCP server so callers can wait for shutdown without polling, and pass
  poll_interval=0.02 to serve_forever from the test helper so shutdown()
  itself returns within ~20ms instead of the stdlib default 500ms.
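The socket_server speed-up above combines two standard-library features: the `poll_interval` argument of `serve_forever` and a `threading.Event` that signals shutdown. The sketch below shows that combination under assumed names (`StoppableTCPServer`, `stopped`, `serve_until_shutdown` and `start_server` are hypothetical, not the project's API).

```python
import socketserver
import threading


class EchoHandler(socketserver.BaseRequestHandler):
    """Trivial handler so the server has something to serve."""

    def handle(self) -> None:
        self.request.sendall(self.request.recv(1024))


class StoppableTCPServer(socketserver.TCPServer):
    """Hypothetical server that exposes an Event set once serving has ended."""

    allow_reuse_address = True

    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        self.stopped = threading.Event()

    def serve_until_shutdown(self, poll_interval: float = 0.02) -> None:
        try:
            # A short poll_interval lets shutdown() be noticed quickly
            # (the stdlib default is 0.5 seconds).
            self.serve_forever(poll_interval=poll_interval)
        finally:
            self.stopped.set()


def start_server():
    """Start the server on an ephemeral loopback port in a daemon thread."""
    server = StoppableTCPServer(("127.0.0.1", 0), EchoHandler)
    thread = threading.Thread(target=server.serve_until_shutdown, daemon=True)
    thread.start()
    return server, thread
```

A test helper can then call `server.shutdown()` and `server.stopped.wait(timeout=...)` instead of sleeping in a polling loop.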

Plus shared driver_dispatch.{evaluate_expression, run_script} that
collapses three independent Selenium-or-Playwright dispatch sites
(memory_leak / csp_reporter / smart_wait) into one module. The shared
helper has its own unit tests covering both backends.
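A shared dispatch helper of this kind might look roughly like the sketch below. It assumes duck-typing on Selenium's `execute_script` and Playwright's `evaluate`; the real `driver_dispatch` signatures are not shown in this PR, so the function body and the fake backends are illustrative only.

```python
from typing import Any


def evaluate_expression(backend: Any, expression: str) -> Any:
    """Evaluate a JavaScript expression on either backend and return its value."""
    if hasattr(backend, "execute_script"):
        # Selenium WebDriver: the script needs an explicit `return`.
        return backend.execute_script(f"return ({expression});")
    if hasattr(backend, "evaluate"):
        # Playwright Page: evaluate() already returns the expression's value.
        return backend.evaluate(expression)
    raise TypeError(f"unsupported backend: {type(backend).__name__}")


class FakeSeleniumDriver:
    """Test double that records the script it was asked to run."""

    def execute_script(self, script: str, *args: Any) -> str:
        return f"selenium:{script}"


class FakePlaywrightPage:
    """Test double mimicking the Playwright Page.evaluate entry point."""

    def evaluate(self, expression: str) -> str:
        return f"playwright:{expression}"
```

Keeping the backend check in one function means callers such as a smart-wait or a heap probe never need to know which driver they were handed.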

Net: 7 warnings cleared, suite 15.41s -> 11.57s (-25%), 1174 -> 1184 tests.
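The workspace_lock caching described above can be approximated with `functools.lru_cache`. This is a sketch under assumptions: `installed_distributions` is a hypothetical name, and the project may memoise differently; only `reset_distribution_cache` is named in the commit message.

```python
import functools
from importlib import metadata


@functools.lru_cache(maxsize=1)
def installed_distributions() -> tuple:
    """Walk the installed distributions once and keep the result."""
    found = []
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken metadata
            found.append((name, dist.version))
    return tuple(found)


def reset_distribution_cache() -> None:
    """Drop the memoised walk, e.g. after installing a package in a test."""
    installed_distributions.cache_clear()
```

Repeated calls return the same cached tuple until `reset_distribution_cache()` is invoked.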
Also gitignore the local issues.json / hotspots.json / codacy.json
artefacts that the SonarCloud/Codacy curl helpers drop into the repo.
(b) je_web_runner.api thematic façade
  Group the 50+ helpers added in recent waves into 11 themed submodules
  so callers can ``from je_web_runner.api import quality, observability``
  instead of memorising deep import paths. Themes:
  authoring / debugging / frontend / infra / mobile / networking /
  observability / quality / reliability / security / test_data.
  9-test smoke suite covers __all__ resolvability + duplicates so the
  façade can't silently drift from the underlying modules.
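The façade smoke check could be expressed along these lines. The helper `check_facade` and the stand-in `quality` module are hypothetical; the sketch only illustrates testing that `__all__` has no duplicates and that every exported name resolves.

```python
import types


def check_facade(module: types.ModuleType):
    """Return (duplicates, missing) for a module's __all__ list."""
    names = list(getattr(module, "__all__", []))
    duplicates = sorted({n for n in names if names.count(n) > 1})
    missing = sorted(n for n in set(names) if not hasattr(module, n))
    return duplicates, missing


# Stand-in for one themed submodule; the helper name is made up.
quality = types.ModuleType("quality")
quality.lint_action = lambda action: []
quality.__all__ = ["lint_action"]
```

If a helper is renamed in the underlying module but left in `__all__`, it shows up in `missing` and the smoke test fails.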

(a) Real-browser E2E scaffold
  Add test/e2e_test/ with conftest.py that detects the Selenium Grid
  socket and skips cleanly when unreachable. Initial smoke tests cover
  smart_wait fetch idle / SPA route stable, state_diff round trip,
  memory_leak heap probe, csp_reporter empty collect, and
  shadow_pierce open-shadow walk.
  GitHub Actions e2e_browser.yml runs them daily / on demand against
  selenium/hub:4.20.0 + selenium/node-chrome via service containers.
  Local run: ``cd docker && docker compose up -d``, then
  ``WEBRUNNER_E2E_HUB=http://localhost:4444/wd/hub pytest test/e2e_test/``.
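A Grid availability probe for the conftest might be as small as the following sketch. `hub_is_reachable` is a hypothetical name, and the actual conftest.py may detect the socket differently.

```python
import socket
from urllib.parse import urlparse


def hub_is_reachable(hub_url: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the hub's host and port succeeds."""
    parsed = urlparse(hub_url)
    host = parsed.hostname or "localhost"
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, DNS failure: treat all as unreachable.
        return False
```

A session-scoped fixture could read `WEBRUNNER_E2E_HUB`, call this probe, and invoke `pytest.skip(...)` when it returns False, so the suite skips rather than fails on machines without a Grid.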

(c) Sphinx autodoc + autosummary
  conf.py gains sphinx.ext.autodoc / autosummary / napoleon plus a
  mock-imports list for the soft deps that aren't part of the docs
  build (selenium / playwright / appium / Pillow / locust / OTel /
  testcontainers / etc). New api_reference.rst drives autosummary's
  recursive per-module reference page generation; wired into
  Eng/eng_index.rst so ReadTheDocs picks it up.
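For orientation, the relevant conf.py settings would look something like the fragment below. The option names are standard Sphinx settings, but the exact mock-import module names are assumptions inferred from the dependency list above, not copied from the repository.

```python
# docs/source/conf.py (fragment, illustrative)
extensions = [
    "sphinx.ext.autodoc",      # pull docstrings from the package
    "sphinx.ext.autosummary",  # generate per-module stub pages
    "sphinx.ext.napoleon",     # parse Google / NumPy style docstrings
]

# Let autosummary write the stub .rst files during the build.
autosummary_generate = True

# Soft dependencies that are not installed in the docs environment;
# autodoc replaces them with mocks so imports do not fail.
autodoc_mock_imports = [
    "selenium",
    "playwright",
    "appium",
    "PIL",
    "locust",
    "opentelemetry",
    "testcontainers",
]
```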

Tests: 1184 -> 1193 (added 9 façade smoke tests). E2E suite skips
cleanly without a Grid; the unit critical path stays at 12.7s.
The Python version (examples/counting_stars.py) and the equivalent action
JSON (examples/counting_stars.json) drive Chrome through:

- launching with --autoplay-policy=no-user-gesture-required
- navigating to the regular YouTube watch URL
- dismissing the EU consent banner if present
- forcing video.play() to bypass any remaining autoplay gate
- polling the .ytp-skip-ad-button / .ytp-ad-skip-button selectors for up
  to 30 seconds when a pre-roll ad is showing
- holding the window open for 90 seconds via execute_async_script's
  setTimeout (the executor has no native sleep command, so the JSON
  version sets a 120s script timeout and uses an async setTimeout)

Run: python examples/counting_stars.py
  or python -m je_web_runner -e examples/counting_stars.json
WR_sleep executor command:
  Adds time.sleep wrapper to action_executor with type / non-negative
  validation. Replaces the awkward
  ``WR_execute_async_script + setTimeout(callback, ms)`` pattern that
  the demos previously needed. 7 unit tests cover zero-second / short /
  negative / non-numeric / bool-rejection / executor-registration paths.
  examples/counting_stars.json now uses WR_sleep verbatim.
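The validation described for WR_sleep can be sketched as follows. The function name `wr_sleep` and the choice of `TypeError` / `ValueError` are assumptions; the commit message only states that type and non-negativity are checked and that bools are rejected.

```python
import time
from numbers import Real


def wr_sleep(seconds):
    """Sleep for a validated, non-negative number of seconds."""
    # bool is a subclass of int, so it must be rejected explicitly.
    if isinstance(seconds, bool) or not isinstance(seconds, Real):
        raise TypeError(f"seconds must be a number, got {type(seconds).__name__}")
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    time.sleep(seconds)
    return seconds
```

Rejecting `True` / `False` up front avoids a JSON typo silently turning into a one-second or zero-second pause.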

Bug: webdriver_wrapper.execute_script swallowed return values
  The wrapper called ``self.current_webdriver.execute_script(...)``
  but never returned the result, so every WR_execute_script in an
  action JSON resolved to None — making any "read DOM into a
  variable" pattern unusable. The demo run revealed this immediately.
  Now returns the value (and None on caught exception, matching the
  rest of the wrapper).
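The shape of this bug is easy to reproduce with a stand-in driver. The classes below are illustrative and not the project's wrapper; in particular, catching a bare `Exception` is an assumption about how the wrapper handles errors.

```python
class FakeDriver:
    """Stand-in for a Selenium WebDriver."""

    def execute_script(self, script, *args):
        return 42


class BuggyWrapper:
    def __init__(self, driver):
        self.current_webdriver = driver

    def execute_script(self, script, *args):
        try:
            # Result is computed but never returned, so callers get None.
            self.current_webdriver.execute_script(script, *args)
        except Exception:
            return None


class FixedWrapper(BuggyWrapper):
    def execute_script(self, script, *args):
        try:
            # Propagate the driver's return value to the caller.
            return self.current_webdriver.execute_script(script, *args)
        except Exception:
            return None
```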

Cookbook examples (examples/):
  - counting_stars.json — uses WR_sleep instead of fake setTimeout
  - quick_smoke.json    — minimal sanity check
  - google_search.py    — search + read first result heading
  - form_submit.py      — fill httpbin /forms/post; pairs with
                          form_autofill + state_diff helpers
  - smart_wait_demo.py  — fetch idle + SPA route stable + memory probe
  - fanout_demo.py      — parallel HTTP preflights via run_fan_out
  - pii_redact_demo.py  — pure-logic scan_text / redact_text demo

  Each was run end-to-end against real Chrome (or the network, for fanout)
  before commit. form_submit revealed that httpbin's submit button has no
  type=submit attribute; the example now calls form.submit() instead.

Tests: 1193 -> 1200, suite still ~13s.
test/integration_test/ wires 2+ modules together with real I/O — no mocks
where actual file / socket / subprocess exercise is feasible:

- test_authoring_pipeline:    md_authoring → action_formatter → action_linter
                              → JSON byte-stable round trip + legacy alias detect
- test_db_fixtures_sqlite:    load_into_connection on a real in-memory SQLite
                              + truncate + identifier validation safety net
- test_har_replay_roundtrip:  HarReplayServer + urllib + GraphQLClient hit
                              the live HTTP server (literal/glob/regex matchers)
- test_mock_services_roundtrip: MockOAuthServer → bearer token → HAR API,
                              plus MockS3Storage round trip
- test_mcp_subprocess:        spawn ``python -m je_web_runner.mcp_server``
                              and walk initialize → tools/list → tools/call →
                              shutdown over real stdio JSON-RPC
- test_action_lsp_subprocess: spawn ``python -m je_web_runner.action_lsp``
                              and walk initialize → didOpen → publishDiagnostics
                              with proper LSP Content-Length framing
- test_test_selection_pipeline: coverage_map + impact_analysis + diff_shard
                              fed the same action-tree, asserting they agree
- test_bootstrap_pipeline:    init_workspace → format → lint → schema sanity
- test_trend_pipelines:       run_ledger.record_run → trend_dashboard +
                              a11y_trend.aggregate_history end-to-end
- test_live_dashboard_roundtrip: dashboard /records endpoint exercise +
                              VisualReviewServer accept-baseline workflow

The LSP subprocess test caught a real Windows bug: ``python -m
je_web_runner.action_lsp`` ran sys.stdout in text mode, so ``\n`` in the
LSP framing got translated to ``\r\n``, producing ``\r\r\n`` boundaries
that no LSP client can parse. Fixed in __main__.py via
``sys.stdout.reconfigure(newline="")`` so the ``Content-Length`` framing
survives.
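The newline translation can be reproduced on any platform by forcing a text wrapper to use `\r\n` as its output newline, which is what Windows text mode does by default. The helper names below are hypothetical; the sketch only demonstrates why `newline=""` keeps the framing intact.

```python
import io
import json


def frame(payload: dict) -> str:
    """Build an LSP-style message with Content-Length framing."""
    body = json.dumps(payload)
    return f"Content-Length: {len(body.encode('utf-8'))}\r\n\r\n{body}"


def write_through(newline: str) -> bytes:
    """Write one framed message through a text wrapper and return the raw bytes."""
    raw = io.BytesIO()
    text = io.TextIOWrapper(raw, encoding="utf-8", newline=newline)
    text.write(frame({"jsonrpc": "2.0", "id": 1, "result": None}))
    text.flush()
    text.detach()  # keep `raw` open after the wrapper goes away
    return raw.getvalue()


windows_like = write_through("\r\n")  # every "\n" becomes "\r\n"
untranslated = write_through("")      # bytes pass through unchanged
```

With translation enabled the header separator turns into `\r\r\n\r\r\n`; with `newline=""` it remains `\r\n\r\n`.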

CI: test_dev.yml + test_stable.yml gain a step that runs the integration
suite right after the unit suite (60s timeout, same job).

Tests: 1200 unit + 30 integration = 1230 passing.

codacy-production Bot commented Apr 26, 2026

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

🟢 Metrics 1883 complexity · 19 duplication


JE-Chen added 4 commits April 26, 2026 17:33
CI fix:
- Drop --timeout=60 from the integration-test workflow step; pytest-timeout
  isn't a dev dep, so the flag was breaking the CI run. Each subprocess
  test sets its own subprocess.communicate(timeout=...) anyway.

Codacy / Bandit:
- B110 (try/except/pass) on best-effort cleanup paths in examples/* and
  test/e2e_test/conftest.py annotated with `# nosec B110` + reason.
- B112 (try/except/continue) on workspace_lock dist scan and the
  google_search.py selector probes: log via web_runner_logger.debug
  and `# nosec B112` so silently-skipped errors are still observable.
- B202 (tarfile.extractall) — extract_archive already validates members
  against the destination root via _safe_extract_zip / _safe_extract_tar;
  added matching ZipFile validator for symmetry and `# nosec B202` on the
  actual extractall calls.
- B101 (assert) — pytest-style assertions in test/e2e_test and
  test/integration_test marked `# nosec B101` per line.
- 11 unused imports across new modules trimmed.
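The member validation mentioned for B202 usually comes down to checking that each archive entry resolves inside the destination root. The helper below is a hypothetical sketch of that check and is not the project's `_safe_extract_*` code.

```python
import os


def is_within_root(dest_root: str, member_name: str) -> bool:
    """Return True if extracting member_name would stay inside dest_root."""
    root = os.path.abspath(dest_root)
    # join() discards root when member_name is absolute, and abspath()
    # collapses any ".." segments, so both escapes are detected below.
    target = os.path.abspath(os.path.join(root, member_name))
    return os.path.commonpath([root, target]) == root
```

An extraction routine would run this over every member name first and refuse the whole archive if any entry fails the check.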

SonarCloud:
- S5754 overly broad except: narrowed to Exception in
  process_supervisor.with_watchdog, with a comment explaining why
  KeyboardInterrupt / SystemExit must propagate.
- S3358 nested ternaries in perf_drift collapsed into _direction_for helper.
- S6353 [^A-Za-z0-9_]+ → \W+ in pom_codegen.
- S3457 f-string without placeholders fixed in pom_codegen.
- S1172 unused params (cdp_tap.execute_cdp_cmd, action_lsp._completion)
  renamed to _params with a comment about preserving the public signature.
- S7500 dict / list comprehensions of generator-throw idiom replaced
  with proper helper functions in test_otlp_exporter / test_synthetic_monitoring.
- S7504 unnecessary list() preserved in bidi_backend with a NOSONAR
  comment because removing it breaks RuntimeError-during-iteration safety.
- S5843 timestamp regex split into named pieces; _PATH_RE bounded
  ([\w.\-]{1,80}) so polynomial backtracking can't escape its budget.
- S5852 hotspots in md_authoring tightened to greedy \S.* / bounded
  template name pattern.
- S5869 dup char class in _TEMPLATE_RE removed.
- S5906 assert_isinstance / assert_true switches in test_api_facade,
  test_failure_cluster, test_event_bus.
- S2068 `password` literal annotated as fixture.
- S125 commented-code false-positive in test_driver_pin reworded.
- S1192 dup "text/plain" literal in visual_review extracted to _TEXT_PLAIN.
- S5131 reflected user input in har_replay's 404 payload pinned to
  application/json + X-Content-Type-Options nosniff so any echoed path
  fragment can't be interpreted as HTML.
- S4144 duplicate do_PUT / do_PATCH bodies aliased to do_POST.
- S3776 cog complexity in storybook.discover_stories refactored into
  _entries_map + _story_from_entry helpers.

Tests still 1200 unit + 30 integration green.
CI fix:
- The integration subprocess tests were failing with
  'Popen object has no attribute _fileobj2output' because the finally
  block called proc.communicate() a second time after the try block had
  already consumed the streams. Wrap the cleanup communicate() in
  try/except + nosec B110 so the harmless double-call no longer fails.

Codacy:
- pom_codegen.py: Dict was removed from the typing import last round but
  is still used on a class attribute; restore it (F821).
- failure_cluster._PATH_RE: anchor a nosemgrep marker so the bounded
  pattern (every quantifier capped at {1,80}/{1,40}) stops being flagged.

SonarCloud hotspots:
- S5852 md_authoring _BULLET_RE / _TEMPLATE_RE: tightened the template
  pattern to ``[A-Za-z_][A-Za-z0-9_-]{0,80}`` and anchored NOSONAR on the
  bullet capture.
- S5332 fixtures: ftp:// in test_driver_pin and http:// in
  test_storybook_visual_snapshots annotated as deliberate test fixtures.
- S4828 process_supervisor.os.kill(pid, 9): NOSONAR with explanation —
  pid list is filtered by KNOWN_DRIVER_NAMES and excludes os.getpid().
- S5042 driver_pin._extract_archive: NOSONAR — both branches route
  through _safe_extract_* helpers that pre-validate members.
- S1313 test_pii_scanner: 192.168.0.1 RFC1918 fixture annotated.

Tests still 1230 green.
CI fix:
- The subprocess integration tests (MCP / LSP) were failing with
  ``ValueError: I/O operation on closed file`` because we wrote to
  proc.stdin manually, called proc.stdin.close(), then immediately
  invoked proc.communicate() — communicate() then tried to use the
  closed stdin reference. Replace the pattern with a single
  ``communicate(input=payload, timeout=...)`` call (it auto-closes
  stdin) and route fallback drains through a try/except.
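The single-call pattern described above looks roughly like this. The child command is a toy stand-in for the MCP / LSP servers, and `run_child` is a hypothetical helper name.

```python
import subprocess
import sys

# Toy child process: upper-cases whatever arrives on stdin.
CHILD = "import sys; sys.stdout.write(sys.stdin.read().upper())"


def run_child(payload: bytes, timeout: float = 10.0) -> bytes:
    """Send payload to the child and collect stdout with one communicate() call."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    try:
        # communicate() writes the input, closes stdin, and drains the pipes.
        out, _err = proc.communicate(input=payload, timeout=timeout)
    except subprocess.TimeoutExpired:
        # Kill the child, then drain whatever is left.
        proc.kill()
        out, _err = proc.communicate()
    return out
```

Because stdin is never written to or closed by hand, there is no closed file object left for `communicate()` to trip over.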

SonarCloud:
- S7632 suppression-comment syntax: NOSONAR markers had been on
  preceding-line comments rather than the violation lines. Anchored
  them on the actual flagged line in driver_pin._extract_archive,
  failure_cluster._PATH_RE, md_authoring._BULLET_RE / _TEMPLATE_RE,
  and process_supervisor.os.kill().
- S5869 duplicate char class: drop the explicit ``A-Za-z0-9`` ranges
  in _TEMPLATE_RE and use ``\w`` so SonarCloud sees no duplicate.
- S5131 reflected user input: har_replay's 404 envelope now passes the
  echoed method / path through ``_safe_echo()`` which strips anything
  outside the URI grammar allow-list, so a hostile request can't smuggle
  HTML / control bytes into the response (defence in depth on top of the
  JSON envelope + nosniff header).
- S3776 cog complexity refactors:
  - pipeline.load_pipeline: 26 → split into _coerce_pipeline_document /
    _load_pipeline_from_text / _parse_stage helpers.
  - coverage_map.build_coverage_map: 17 → extracted _load_action_list
    + _routes_in iterator.
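The `_safe_echo()` idea from the S5131 item above can be illustrated with an allow-list filter. The character set and length cap below are assumptions based on the RFC 3986 unreserved and reserved characters; the project's actual allow-list is not shown in this PR.

```python
import re

# Anything outside this URI-oriented allow-list is dropped.
_DISALLOWED = re.compile(r"[^A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]")


def safe_echo(value: str, max_length: int = 200) -> str:
    """Strip characters outside the allow-list and cap the length."""
    return _DISALLOWED.sub("", value)[:max_length]
```

For example, `safe_echo("/a<script>alert(1)</script>")` drops the angle brackets, so the echoed path can no longer open an HTML tag even if a client ignores the JSON content type.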

Tests still 1230 green.
SonarCloud cleanup:
- S1313 in test_pii_scanner: NOSONAR moved from preceding-line comment
  onto the redact_text call line and the assertNotIn line.
- S5042 in driver_pin: NOSONAR anchored on the ``with tarfile.open(...)``
  line instead of the helper docstring above it.
- S5131 in har_replay: NOSONAR anchored on the ``self.wfile.write(payload)``
  line so SonarCloud sees the suppression at the violation site (the
  payload is already strip-sanitised by ``_safe_echo``).
- S5869 in md_authoring: combined the suppression onto the same line as
  ``_TEMPLATE_RE``.
- S7504 in bidi_backend.unsubscribe_all: hoist the list() snapshot into a
  named ``snapshot`` local with the NOSONAR anchored on that line so the
  marker isn't on a comment.
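The reason the `list()` snapshot in the S7504 item has to stay is shown by the sketch below. The function names and event keys are illustrative and do not reproduce bidi_backend itself.

```python
def unsubscribe_all_unsafe(subscriptions: dict) -> None:
    # Deleting while iterating the dict itself raises
    # "RuntimeError: dictionary changed size during iteration".
    for event in subscriptions:
        del subscriptions[event]


def unsubscribe_all(subscriptions: dict) -> None:
    # Iterate over a snapshot of the keys so deletion is safe.
    snapshot = list(subscriptions)
    for event in snapshot:
        del subscriptions[event]
```

A linter sees the `list()` call as redundant, but removing it changes behaviour as soon as the loop body mutates the dictionary.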

Cognitive complexity refactors (S3776):
- fanout.run_fan_out: split task parsing into _parse_tasks and result
  collection into _collect_results.
- browser_pool.checkout: extract _acquire_session that linearises the
  get_nowait → grow → wait branches.
- visual_review do_GET: move the /img/* handler into _serve_image.
- a11y_trend.aggregate_history: split per-entry / per-violation logic
  into _absorb_entry / _count_violation.
- storybook.visual_snapshots.capture_story_snapshots: move the per-story
  capture+compare body into _snapshot_story.
- examples/counting_stars.py main: split into _force_play /
  _await_ad_clear / _wait_out_unskippable_ad / _navigate_and_play.

Tests still 1230 green.