…view / impact analysis / LSP)
…pper / a11y diff / fanout / extension / event bus)
…ic / OTLP / storybook / shadow pierce)
…en / lock / a11y trend / perf drift)
The 3-line script was a side-effect-on-cwd standalone runner, never referenced by either CI workflow. The proper pytest coverage already lives at test/unit_test/test_create_project.py.
…ook snapshots / appium gestures / coverage map)
…tion
Three concrete wins with no behaviour change:
- Pytest collection warnings (7 -> 0): mark TestObject / TestObjectRecord /
TestRecord / TestRailError / TestcontainersError with __test__ = False so
pytest stops trying to collect domain / exception classes whose name
happens to start with "Test".
- workspace_lock dist-walk caching: importlib.metadata.distributions() was
being walked every call; the result is now memoised behind
reset_distribution_cache() so per-test setup drops from ~0.3s to <0.05s.
- socket_server tests (2.42s -> 0.49s): expose a threading.Event on the
TCP server so callers can wait for shutdown without polling, and pass
poll_interval=0.02 to serve_forever from the test helper so shutdown()
itself returns within ~20ms instead of the stdlib default 500ms.
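The socket_server change can be pictured with a minimal sketch using only the standard library. The class name StoppableTCPServer, the stopped attribute, and the serve() helper are hypothetical stand-ins, not the project's actual names; only threading.Event and serve_forever(poll_interval=...) come from the description above.

```python
import socket
import socketserver
import threading


class EchoHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # Echo one chunk back so the demo can prove the server was live.
        self.request.sendall(self.request.recv(1024))


class StoppableTCPServer(socketserver.ThreadingTCPServer):
    """Hypothetical server that signals shutdown through an Event."""

    allow_reuse_address = True
    daemon_threads = True

    def __init__(self, address, handler):
        super().__init__(address, handler)
        self.stopped = threading.Event()

    def serve(self, poll_interval=0.02):
        # A short poll_interval lets shutdown() return quickly; the
        # stdlib default is 0.5 seconds.
        try:
            self.serve_forever(poll_interval=poll_interval)
        finally:
            self.stopped.set()


server = StoppableTCPServer(("127.0.0.1", 0), EchoHandler)
thread = threading.Thread(target=server.serve, daemon=True)
thread.start()

with socket.create_connection(server.server_address, timeout=2.0) as client:
    client.sendall(b"ping")
    reply = client.recv(1024)

server.shutdown()
# Callers block on the Event instead of polling a flag in a loop.
finished = server.stopped.wait(timeout=2.0)
server.server_close()
```

Waiting on the Event gives the test a single blocking call with a timeout, which is what removes the polling loop from the caller's side.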
Plus shared driver_dispatch.{evaluate_expression, run_script} that
collapses three independent Selenium-or-Playwright dispatch sites
(memory_leak / csp_reporter / smart_wait) into one module. The shared
helper has its own unit tests covering both backends.
Net: 7 warnings cleared, suite 15.41s -> 11.57s (-25%), 1174 -> 1184 tests.
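The first two wins can be sketched roughly as follows. This is an illustration under assumptions: installed_distributions() and the module-level cache variable are hypothetical names, and the TestRecord body is a placeholder; only reset_distribution_cache() and the __test__ = False marker are taken from the description above.

```python
import importlib.metadata

_distribution_cache = None  # hypothetical module-level memo


def installed_distributions():
    """Return {name: version}, walking the environment at most once."""
    global _distribution_cache
    if _distribution_cache is None:
        _distribution_cache = {
            dist.metadata.get("Name"): dist.metadata.get("Version")
            for dist in importlib.metadata.distributions()
        }
    return _distribution_cache


def reset_distribution_cache():
    """Drop the memo so the next call walks the environment again."""
    global _distribution_cache
    _distribution_cache = None


class TestRecord:
    """Domain class whose name merely happens to start with 'Test'."""

    # Tells pytest not to collect this class as a test case.
    __test__ = False

    def __init__(self, name):
        self.name = name


first = installed_distributions()
second = installed_distributions()   # served from the memo
reset_distribution_cache()
third = installed_distributions()    # rebuilt after the reset
```

Tests that install or remove packages would call reset_distribution_cache() in their setup so they never observe a stale snapshot.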
Also gitignore the local issues.json / hotspots.json / codacy.json
artefacts that the SonarCloud/Codacy curl helpers drop into the repo.
(b) je_web_runner.api thematic façade
Group the 50+ helpers added in recent waves into 11 themed submodules so
callers can ``from je_web_runner.api import quality, observability``
instead of memorising deep import paths. Themes: authoring / debugging /
frontend / infra / mobile / networking / observability / quality /
reliability / security / test_data. 9-test smoke suite covers __all__
resolvability + duplicates so the façade can't silently drift from the
underlying modules.
(a) Real-browser E2E scaffold
Add test/e2e_test/ with conftest.py that detects the Selenium Grid socket
and skips cleanly when unreachable. Initial smoke tests cover smart_wait
fetch idle / SPA route stable, state_diff round trip, memory_leak heap
probe, csp_reporter empty collect, and shadow_pierce open-shadow walk.
GitHub Actions e2e_browser.yml runs them daily / on demand against
selenium/hub:4.20.0 + selenium/node-chrome via service containers.
Local run: ``cd docker && docker compose up -d``, then
``WEBRUNNER_E2E_HUB=http://localhost:4444/wd/hub pytest test/e2e_test/``.
(c) Sphinx autodoc + autosummary
conf.py gains sphinx.ext.autodoc / autosummary / napoleon plus a
mock-imports list for the soft deps that aren't part of the docs build
(selenium / playwright / appium / Pillow / locust / OTel / testcontainers /
etc). New api_reference.rst drives autosummary's recursive per-module
reference page generation; wired into Eng/eng_index.rst so ReadTheDocs
picks it up.
Tests: 1184 -> 1193 (added 9 façade smoke tests). E2E suite skips cleanly
without a Grid; the unit critical path stays at 12.7s.
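The "detect the Grid socket and skip cleanly" step from (a) might look something like the sketch below. The helper name grid_reachable and the 1-second timeout are assumptions for illustration; only the WEBRUNNER_E2E_HUB variable and the hub URL shape come from the text above, and the fallback URL is an assumed default.

```python
import os
import socket
from urllib.parse import urlparse


def grid_reachable(hub_url, timeout=1.0):
    """Return True if a TCP connection to the hub's host:port succeeds."""
    parsed = urlparse(hub_url)
    host = parsed.hostname or "localhost"
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unresolvable: treat all as "no Grid".
        return False


# Assumed default; the real conftest.py may resolve the URL differently.
HUB_URL = os.environ.get("WEBRUNNER_E2E_HUB", "http://localhost:4444/wd/hub")
GRID_AVAILABLE = grid_reachable(HUB_URL)
```

In a conftest.py, a session-scoped fixture could call pytest.skip("Selenium Grid unreachable") whenever this check returns False, which keeps the suite green on machines without a Grid.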
The Python version (examples/counting_stars.py) and the equivalent action
JSON (examples/counting_stars.json) drive Chrome through:
- launching with --autoplay-policy=no-user-gesture-required
- navigating to the regular YouTube watch URL
- dismissing the EU consent banner if present
- forcing video.play() to bypass any remaining autoplay gate
- polling the .ytp-skip-ad-button / .ytp-ad-skip-button selectors for up to
30 seconds when a pre-roll ad is showing
- holding the window open for 90 seconds via execute_async_script's
setTimeout (the executor has no native sleep command, so the JSON version
sets a 120s script timeout and uses an async setTimeout)
Run:
python examples/counting_stars.py
or
python -m je_web_runner -e examples/counting_stars.json
WR_sleep executor command:
Adds time.sleep wrapper to action_executor with type / non-negative
validation. Replaces the awkward
``WR_execute_async_script + setTimeout(callback, ms)`` pattern that
the demos previously needed. 7 unit tests cover zero-second / short /
negative / non-numeric / bool-rejection / executor-registration paths.
examples/counting_stars.json now uses WR_sleep verbatim.
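A rough sketch of the validation described above, not the project's actual code: the function name wr_sleep, the choice of TypeError / ValueError, and the registry dict are all assumptions made for illustration.

```python
import time


def wr_sleep(seconds):
    """Sleep for a validated, non-negative number of seconds."""
    # bool is a subclass of int, so it has to be rejected explicitly.
    if isinstance(seconds, bool) or not isinstance(seconds, (int, float)):
        raise TypeError(f"seconds must be a number, got {type(seconds).__name__}")
    if seconds < 0:
        raise ValueError(f"seconds must be non-negative, got {seconds}")
    time.sleep(seconds)
    return seconds


# Hypothetical registration: the executor maps command names to callables.
executor_commands = {"WR_sleep": wr_sleep}


def rejects(value):
    """Return the exception type raised for value, or None if accepted."""
    try:
        wr_sleep(value)
    except (TypeError, ValueError) as error:
        return type(error)
    return None
```

The bool check matters because True would otherwise pass the isinstance test and sleep for one second without complaint.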
Bug: webdriver_wrapper.execute_script swallowed return values
The wrapper called ``self.current_webdriver.execute_script(...)``
but never returned the result, so every WR_execute_script in an
action JSON resolved to None — making any "read DOM into a
variable" pattern unusable. The demo run revealed this immediately.
Now returns the value (and None on caught exception, matching the
rest of the wrapper).
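The bug and its fix can be reproduced in miniature. FakeDriver and the simplified wrapper below are hypothetical stand-ins so the sketch runs without a browser; the real wrapper has more surrounding logic.

```python
import logging

logger = logging.getLogger("wrapper_sketch")


class FakeDriver:
    """Stand-in for a WebDriver; returns a canned value or fails."""

    def execute_script(self, script, *args):
        if "boom" in script:
            raise RuntimeError("script failed")
        return "Example Domain"


class WrapperSketch:
    def __init__(self, driver):
        self.current_webdriver = driver

    def execute_script_before(self, script, *args):
        # Old behaviour: the call happens but its result is dropped,
        # so the method implicitly returns None.
        try:
            self.current_webdriver.execute_script(script, *args)
        except Exception as error:
            logger.error("execute_script failed: %r", error)

    def execute_script(self, script, *args):
        # Fixed behaviour: propagate the driver's return value, and
        # return None explicitly when the call raised.
        try:
            return self.current_webdriver.execute_script(script, *args)
        except Exception as error:
            logger.error("execute_script failed: %r", error)
            return None


wrapper = WrapperSketch(FakeDriver())
before = wrapper.execute_script_before("return document.title")
after = wrapper.execute_script("return document.title")
failed = wrapper.execute_script("boom")
```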
Cookbook examples (examples/):
- counting_stars.json — uses WR_sleep instead of fake setTimeout
- quick_smoke.json — minimal sanity check
- google_search.py — search + read first result heading
- form_submit.py — fill httpbin /forms/post; pairs with
form_autofill + state_diff helpers
- smart_wait_demo.py — fetch idle + SPA route stable + memory probe
- fanout_demo.py — parallel HTTP preflights via run_fan_out
- pii_redact_demo.py — pure-logic scan_text / redact_text demo
Each was run end-to-end against real Chrome (or network for fanout)
before commit; form_submit revealed httpbin's submit button has no
type=submit attribute, fixed by switching to form.submit().
Tests: 1193 -> 1200, suite still ~13s.
test/integration_test/ wires 2+ modules together with real I/O — no mocks
where actual file / socket / subprocess exercise is feasible:
- test_authoring_pipeline: md_authoring → action_formatter → action_linter
→ JSON byte-stable round trip + legacy alias detect
- test_db_fixtures_sqlite: load_into_connection on a real in-memory SQLite
+ truncate + identifier validation safety net
- test_har_replay_roundtrip: HarReplayServer + urllib + GraphQLClient hit
the live HTTP server (literal/glob/regex matchers)
- test_mock_services_roundtrip: MockOAuthServer → bearer token → HAR API,
plus MockS3Storage round trip
- test_mcp_subprocess: spawn ``python -m je_web_runner.mcp_server``
and walk initialize → tools/list → tools/call →
shutdown over real stdio JSON-RPC
- test_action_lsp_subprocess: spawn ``python -m je_web_runner.action_lsp``
and walk initialize → didOpen → publishDiagnostics
with proper LSP Content-Length framing
- test_test_selection_pipeline: coverage_map + impact_analysis + diff_shard
fed the same action-tree, asserting they agree
- test_bootstrap_pipeline: init_workspace → format → lint → schema sanity
- test_trend_pipelines: run_ledger.record_run → trend_dashboard +
a11y_trend.aggregate_history end-to-end
- test_live_dashboard_roundtrip: dashboard /records endpoint exercise +
VisualReviewServer accept-baseline workflow
The LSP subprocess test caught a real Windows bug: ``python -m
je_web_runner.action_lsp`` ran sys.stdout in text mode, so ``\n`` in the
LSP framing got translated to ``\r\n``, producing ``\r\r\n`` boundaries
that no LSP client can parse. Fixed in __main__.py via
``sys.stdout.reconfigure(newline="")`` so the ``Content-Length`` framing
survives.
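The corruption is easy to reproduce on any platform by wrapping a byte buffer in a text stream with newline="\r\n", which mimics the translation Windows applies to text-mode stdout. The frame() helper below is a hypothetical illustration of LSP framing, not the project's function.

```python
import io


def frame(body):
    """Prefix a JSON body with the LSP Content-Length header."""
    length = len(body.encode("utf-8"))
    return f"Content-Length: {length}\r\n\r\n{body}"


def bytes_written(newline):
    """Write one frame through a text stream and return the raw bytes."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding="utf-8", newline=newline)
    stream.write(frame('{"jsonrpc":"2.0"}'))
    stream.flush()
    return raw.getvalue()


# newline="\r\n" rewrites every "\n" to "\r\n", as Windows text mode does.
translated = bytes_written("\r\n")
# newline="" disables translation, which is what reconfigure(newline="")
# achieves for sys.stdout.
untouched = bytes_written("")
```

With translation on, the header separator becomes ``\r\r\n\r\r\n`` and the byte count in Content-Length no longer lines up with where a client expects the body to start.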
CI: test_dev.yml + test_stable.yml gain a step that runs the integration
suite right after the unit suite (60s timeout, same job).
Tests: 1200 unit + 30 integration = 1230 passing.
Up to standards ✅🟢 Issues
| Metric | Results |
|---|---|
| Complexity | 1883 |
| Duplication | 19 |
CI fix:
- Drop --timeout=60 from the integration-test workflow step; pytest-timeout
isn't a dev dep, so the flag was breaking the CI run. Each subprocess
test sets its own subprocess.communicate(timeout=...) anyway.
Codacy / Bandit:
- B110 (try/except/pass) on best-effort cleanup paths in examples/* and
test/e2e_test/conftest.py annotated with `# nosec B110` + reason.
- B112 (try/except/continue) on workspace_lock dist scan and the
google_search.py selector probes: log via web_runner_logger.debug
and `# nosec B112` so silently-skipped errors are still observable.
- B202 (tarfile.extractall) — extract_archive already validates members
against the destination root via _safe_extract_zip / _safe_extract_tar;
added matching ZipFile validator for symmetry and `# nosec B202` on the
actual extractall calls.
- B101 (assert) — pytest-style assertions in test/e2e_test and
test/integration_test marked `# nosec B101` per line.
- 11 unused imports across new modules trimmed.
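For the B202 item, member validation against the destination root could be sketched as below. The names validated_members, build_archive, and extract are hypothetical; the project's _safe_extract_tar may check different properties.

```python
import io
import os
import tarfile
import tempfile


def validated_members(archive, destination):
    """Yield members only if each one resolves inside destination."""
    root = os.path.realpath(destination)
    for member in archive.getmembers():
        if member.issym() or member.islnk():
            raise ValueError(f"link member rejected: {member.name}")
        target = os.path.realpath(os.path.join(root, member.name))
        if os.path.commonpath([root, target]) != root:
            raise ValueError(f"member escapes destination: {member.name}")
        yield member


def build_archive(member_name, data=b"hello"):
    """Build a one-file tar archive in memory for the demo."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        info = tarfile.TarInfo(name=member_name)
        info.size = len(data)
        archive.addfile(info, io.BytesIO(data))
    buffer.seek(0)
    return buffer


def extract(buffer, destination):
    with tarfile.open(fileobj=buffer, mode="r") as archive:
        # list() forces every member to be checked before any is written.
        members = list(validated_members(archive, destination))
        archive.extractall(destination, members=members)  # nosec B202
```

The suppression comment sits on the extractall line because that is where the scanner reports the finding, even though the validation happens just above it.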
SonarCloud:
- S5754 broaden-except → narrow Exception in process_supervisor.with_watchdog
with a comment about why KeyboardInterrupt / SystemExit must propagate.
- S3358 nested ternaries in perf_drift collapsed into _direction_for helper.
- S6353 [^A-Za-z0-9_]+ → \W+ in pom_codegen.
- S3457 f-string without placeholders fixed in pom_codegen.
- S1172 unused params (cdp_tap.execute_cdp_cmd, action_lsp._completion)
renamed to _params with a comment about preserving the public signature.
- S7500 dict / list comprehensions of generator-throw idiom replaced
with proper helper functions in test_otlp_exporter / test_synthetic_monitoring.
- S7504 unnecessary list() preserved in bidi_backend with a NOSONAR
comment because removing it breaks RuntimeError-during-iteration safety.
- S5843 timestamp regex split into named pieces; _PATH_RE bounded
([\w.\-]{1,80}) so polynomial backtracking can't escape its budget.
- S5852 hotspots in md_authoring tightened to greedy \S.* / bounded
template name pattern.
- S5869 dup char class in _TEMPLATE_RE removed.
- S5906 assert_isinstance / assert_true switches in test_api_facade,
test_failure_cluster, test_event_bus.
- S2068 `password` literal annotated as fixture.
- S125 commented-code false-positive in test_driver_pin reworded.
- S1192 dup "text/plain" literal in visual_review extracted to _TEXT_PLAIN.
- S5131 reflected user input in har_replay's 404 payload pinned to
application/json + X-Content-Type-Options nosniff so any echoed path
fragment can't be interpreted as HTML.
- S4144 duplicate do_PUT / do_PATCH bodies aliased to do_POST.
- S3776 cog complexity in storybook.discover_stories refactored into
_entries_map + _story_from_entry helpers.
Tests still 1200 unit + 30 integration green.
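The S5754 point about letting KeyboardInterrupt / SystemExit propagate can be shown with a small sketch. run_guarded and its signature are hypothetical and are not meant to mirror process_supervisor.with_watchdog.

```python
def run_guarded(task, on_failure):
    """Run task, reporting ordinary errors but not interpreter exits."""
    try:
        return task()
    except Exception as error:
        # KeyboardInterrupt and SystemExit derive from BaseException,
        # not Exception, so they pass straight through this handler.
        on_failure(error)
        return None


failures = []


def broken_task():
    raise ValueError("ordinary failure")


def interrupted_task():
    raise KeyboardInterrupt


handled = run_guarded(broken_task, failures.append)

try:
    run_guarded(interrupted_task, failures.append)
    interrupt_propagated = False
except KeyboardInterrupt:
    interrupt_propagated = True
```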
CI fix:
- The integration subprocess tests were failing with
'Popen object has no attribute _fileobj2output' because the finally
block called proc.communicate() a second time after the try block had
already consumed the streams. Wrap the cleanup communicate() in
try/except + nosec B110 so the harmless double-call no longer fails.
Codacy:
- pom_codegen.py: Dict was removed from the typing import last round but
is still used on a class attribute; restore it (F821).
- failure_cluster._PATH_RE: anchor a nosemgrep marker so the bounded
pattern (every quantifier capped at {1,80}/{1,40}) stops being flagged.
SonarCloud hotspots:
- S5852 md_authoring _BULLET_RE / _TEMPLATE_RE: tightened the template
pattern to ``[A-Za-z_][A-Za-z0-9_-]{0,80}`` and anchored NOSONAR on the
bullet capture.
- S5332 fixtures: ftp:// in test_driver_pin and http:// in
test_storybook_visual_snapshots annotated as deliberate test fixtures.
- S4828 process_supervisor.os.kill(pid, 9): NOSONAR with explanation —
pid list is filtered by KNOWN_DRIVER_NAMES and excludes os.getpid().
- S5042 driver_pin._extract_archive: NOSONAR — both branches route
through _safe_extract_* helpers that pre-validate members.
- S1313 test_pii_scanner: 192.168.0.1 RFC1918 fixture annotated.
Tests still 1230 green.
CI fix:
- The subprocess integration tests (MCP / LSP) were failing with
``ValueError: I/O operation on closed file`` because we wrote to
proc.stdin manually, called proc.stdin.close(), then immediately
invoked proc.communicate() — communicate() then tried to use the
closed stdin reference. Replace the pattern with a single
``communicate(input=payload, timeout=...)`` call (it auto-closes
stdin) and route fallback drains through a try/except.
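The replacement pattern is the one the subprocess documentation recommends. In this sketch the child is a trivial inline script that upper-cases its stdin (a stand-in, not the MCP or LSP server), and the exchange() helper name is hypothetical.

```python
import subprocess
import sys

# Stand-in child: reads all of stdin and writes it back upper-cased.
CHILD = "import sys; sys.stdout.write(sys.stdin.read().upper())"


def exchange(payload, timeout=10.0):
    """Send payload to the child and return everything it printed."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    try:
        # communicate() writes the input, closes stdin, and drains both
        # pipes, so there is no manual write()/close() to get wrong.
        stdout, _stderr = proc.communicate(input=payload, timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        stdout, _stderr = proc.communicate()
    return stdout


result = exchange(b"ping")
```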
SonarCloud:
- S7632 suppression-comment syntax: NOSONAR markers had been on
preceding-line comments rather than the violation lines. Anchored
them on the actual flagged line in driver_pin._extract_archive,
failure_cluster._PATH_RE, md_authoring._BULLET_RE / _TEMPLATE_RE,
and process_supervisor.os.kill().
- S5869 duplicate char class: drop the explicit ``A-Za-z0-9`` ranges
in _TEMPLATE_RE and use ``\w`` so SonarCloud sees no duplicate.
- S5131 reflected user input: har_replay's 404 envelope now passes the
echoed method / path through ``_safe_echo()`` which strips anything
outside the URI grammar allow-list, so a hostile request can't smuggle
HTML / control bytes into the response (defence in depth on top of the
JSON envelope + nosniff header).
- S3776 cog complexity refactors:
- pipeline.load_pipeline: 26 → split into _coerce_pipeline_document /
_load_pipeline_from_text / _parse_stage helpers.
- coverage_map.build_coverage_map: 17 → extracted _load_action_list
+ _routes_in iterator.
Tests still 1230 green.
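The S5131 hardening combines three layers: an allow-list filter, a JSON envelope, and a nosniff header. The sketch below assumes an allow-list of RFC 3986 URI characters and a 200-character cap; both are illustrative guesses, and safe_echo / not_found_response are hypothetical names rather than har_replay's real helpers.

```python
import json
import re

# Anything outside the RFC 3986 unreserved / reserved set (plus "%").
_UNSAFE_CHARS = re.compile(r"[^A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]")


def safe_echo(value, limit=200):
    """Strip characters outside the allow-list and cap the length."""
    return _UNSAFE_CHARS.sub("", value)[:limit]


def not_found_response(method, path):
    """Build headers and body for a 404 that echoes sanitised input."""
    body = json.dumps(
        {"error": "not_found", "method": safe_echo(method), "path": safe_echo(path)}
    ).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "X-Content-Type-Options": "nosniff",
        "Content-Length": str(len(body)),
    }
    return headers, body


headers, body = not_found_response("GET", "/api/<script>alert(1)</script>\x00")
```

Angle brackets and control bytes are removed before the value reaches the envelope, so even a client that ignored the content type would not receive markup.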
SonarCloud cleanup:
- S1313 in test_pii_scanner: NOSONAR moved from preceding-line comment onto
the redact_text call line and the assertNotIn line.
- S5042 in driver_pin: NOSONAR anchored on the ``with tarfile.open(...)``
line instead of the helper docstring above it.
- S5131 in har_replay: NOSONAR anchored on the
``self.wfile.write(payload)`` line so SonarCloud sees the suppression at
the violation site (the payload is already strip-sanitised by
``_safe_echo``).
- S5869 in md_authoring: combined the suppression onto the same line as
``_TEMPLATE_RE``.
- S7504 in bidi_backend.unsubscribe_all: hoist the list() snapshot into a
named ``snapshot`` local with the NOSONAR anchored on that line so the
marker isn't on a comment.
Cognitive complexity refactors (S3776):
- fanout.run_fan_out: split task parsing into _parse_tasks and result
collection into _collect_results.
- browser_pool.checkout: extract _acquire_session that linearises the
get_nowait → grow → wait branches.
- visual_review do_GET: move the /img/* handler into _serve_image.
- a11y_trend.aggregate_history: split per-entry / per-violation logic into
_absorb_entry / _count_violation.
- storybook.visual_snapshots.capture_story_snapshots: move the per-story
capture+compare body into _snapshot_story.
- examples/counting_stars.py main: split into _force_play /
_await_ad_clear / _wait_out_unskippable_ad / _navigate_and_play.
Tests still 1230 green.
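Why the list() snapshot in unsubscribe_all has to stay can be seen in a small sketch. The Subscriptions class is a hypothetical stand-in for the bidi_backend bookkeeping, reduced to a dict of handlers.

```python
class Subscriptions:
    """Hypothetical registry mapping event names to handlers."""

    def __init__(self):
        self._handlers = {}

    def subscribe(self, event, handler):
        self._handlers[event] = handler

    def unsubscribe(self, event):
        self._handlers.pop(event, None)

    def unsubscribe_all_naive(self):
        # Mutating the dict while iterating over it raises RuntimeError.
        for event in self._handlers:
            self.unsubscribe(event)

    def unsubscribe_all(self):
        # Copy the keys first so removals cannot disturb the iteration.
        snapshot = list(self._handlers)
        for event in snapshot:
            self.unsubscribe(event)


def populated():
    registry = Subscriptions()
    registry.subscribe("log.entryAdded", print)
    registry.subscribe("network.responseCompleted", print)
    return registry


try:
    populated().unsubscribe_all_naive()
    naive_error = None
except RuntimeError as error:
    naive_error = error

safe = populated()
safe.unsubscribe_all()
```

A linter sees list() around an iterable as redundant, but here the copy is the whole point, which is why the suppression is justified rather than the call removed.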
This was referenced Apr 26, 2026
Summary
46 commits since PR #85. Highlights:
Latest waves
Optimisation pass
- Pytest collection warnings cleared (__test__ = False on TestObject / TestRecord / etc).
- workspace_lock distribution walk cached → 4 tests dropped from 0.3s to <0.05s each.
- socket_server quit tests went from 2.42s → 0.49s via threading.Event + tighter poll_interval.
- driver_dispatch module collapses three Selenium-or-Playwright dispatch sites into one.
API façade + docs
- je_web_runner.api.{authoring, debugging, frontend, infra, mobile, networking, observability, quality, reliability, security, test_data} re-exports new helpers thematically.
- docs/source/conf.py gains sphinx.ext.autodoc + autosummary + napoleon with mock imports for soft deps.
- Eng/doc/api_reference/api_reference.rst driving recursive per-module page generation.
Real-browser smoke + cookbook
- test/e2e_test/ with conftest.py that detects Selenium Grid availability and skips cleanly.
- .github/workflows/e2e_browser.yml boots selenium/hub:4.20.0 + selenium/node-chrome and runs daily / on demand.
- examples/ cookbook: counting_stars.{py,json}, google_search.py, form_submit.py, smart_wait_demo.py, pii_redact_demo.py, fanout_demo.py, quick_smoke.json.
Bugs found by actually running the project
- webdriver_wrapper.execute_script swallowed return values → fixed (caught by JSON cookbook smoke).
- LSP Content-Length framing corrupted on Windows due to text-mode \n → \r\n translation → __main__.py now calls sys.stdout.reconfigure(newline="") (caught by integration subprocess test).
New: WR_sleep
- time.sleep action so JSON pipelines no longer need WR_execute_async_script + setTimeout to pace themselves.
Comprehensive integration tests
- test/integration_test/: 30 tests across 10 files, each wiring 2+ modules with real I/O (in-memory SQLite, in-process HTTP servers, real subprocess for MCP / LSP).
- Wired into test_dev.yml + test_stable.yml right after the unit-test step.
Numbers
Test plan