feat: add display parameter and single-page capture support for interactive sessions#187
Conversation
Split `capture_page` into two helper functions to reduce duplication and make the capture lifecycle easier to follow: * `setup_page_capture()`: gathers per-page event wiring that was previously inline in `capture_page`. This includes download tracking, request-body storage for favicon extraction, dialog acceptance, and the headless-Chromium PDF workaround. Returns a `PageCaptureState` TypedDict so that the same state can be passed to finalization without relying on closure variables. * `_finalize_capture()`: consolidates post-navigation teardown formerly in `capture_page`’s `finally` block. This handles multiple-download deduplication/zip, cookie and storage collection, page/context shutdown, HAR loading, SOCKS5 IP resolution, and trusted-timestamp requests. `capture_page` now delegates to these helpers; observable behavior remains unchanged. The `PageCaptureState` TypedDict and the `Awaitable`/`Callable`/`Mapping` imports required for its annotations are added in this commit. No feature changes are included in this refactor.
`Capture.__init__` now accepts an optional `display` parameter (default `None`). When provided, `__aenter__` constructs a per-launch environment dictionary that overrides `DISPLAY`, allowing each concurrent interactive session to target its own X11 server without modifying the process-global `os.environ`. A new `capture_current_page()` method captures the current page state without navigating or recursing into child URLs. It reuses `setup_page_capture` (called by the caller before navigation) and the existing `_finalize_capture` workflow for cookies, storage, HAR, and trusted timestamps. This method serves as the final-capture step when an interactive session signals it is ready.
…nch signature The env parameter of BrowserType.launch() expects dict[str, str | float | bool] | None, not dict[str, str] | None. Widen the launch_env annotation to satisfy mypy --strict.
Align keyword-only parameter defaults with the rest of the codebase (no spaces around '=').
…_current_page When the bare 'except Exception' handler re-raises, _finalize_capture was skipped because it was called sequentially after the try/except rather than in a finally block. This left the page/context unclosed and the HAR file unflushed. Wrap the try/except in an outer try/finally, matching the pattern already used by capture_page.
Rafiot
left a comment
There was a problem hiding this comment.
It looks very clean, and that's pretty much mergeable as-is but I'm nitpicky and think we can make it easier to maintain long term.
If you have time/want to work on that, do not hesitate, but I can take it from there if you don't. The main question is the cookies, this one I want to make sure we do it cleanly from the get go.
playwrightcapture/capture.py
Outdated
| def cookies(self) -> list[Cookie]: | ||
| return self._cookies | ||
|
|
||
| def _coerce_cookie_mapping(self, cookie: object) -> Mapping[str, Any] | None: |
There was a problem hiding this comment.
I'm not totally sure about the format you're supporting there, before the pydantic models, the cookies were a dictionary (TypedDict), so it was not possible to access the content with cookie.name, and getattr(cookie, 'name', None) would always be None.
I attempted to do the coercing in the models because this way, it's automatically done the same everywhere. So if we need to support real objects, I'd probably move it to the models.
There was a problem hiding this comment.
I perhaps should have researched the dependency chain(s) a little more before starting; I had completely missed lookyloo-models until I was preparing the other PRs.
Right you are though, that getattr fallback path would never fire on the old TypedDict cookies, and now that the Pydantic Cookie model in lookyloo-models handles all the coercion and validation, this is redundant.
I could remove _coerce_cookie_mapping() entirely and simplify the cookies setter to just accept Cookie instances directly or pass dicts through Cookie.model_validate() - just consume validated objects.
There was a problem hiding this comment.
lookyloo-models are something I started to work on 2 weeks ago, it's brand new, and I jammed it in not expecting anyone else than me was knee deep in the architecture, so it's also barely documented. That's absolutely not on you.
But yes, I think _coerce_cookie_mapping() is not needed, and if we need something like that, we can add it in the models and it will automagically work everywhere.
playwrightcapture/capture.py
Outdated
|
|
||
| return True | ||
|
|
||
| async def setup_page_capture(self, page: Page, *, allow_tracking: bool=False) -> PageCaptureState: |
There was a problem hiding this comment.
Just a note before looking at _finalize_capture: looking at this method confuses me a lot. I get the concept, but my guts hate it. Not sure if I want to change it and how, but I'd probably prefer to avoid returning PageCaptureState and move the methods in the main class, for readability reasons.
There was a problem hiding this comment.
Passing a bag of closures around is sorta clunky. I (think I) was trying to avoid touching instance state on a class I was still learning
I could move multiple_downloads, store_request, and mark_favicons_done onto Capture instance attributes (initialised in setup_page_capture), drop the PageCaptureState TypedDict, and have _finalize_capture reference self.* directly. That would remove the page_capture_state is not None guards scattered around the code.
There was a problem hiding this comment.
Yes, I think it is the way to go.
The whole toolchain is a little bit tricky, so I'm very happy to see how you managed to implement the feature cleanly like that.
playwrightcapture/capture.py
Outdated
| self.logger.warning(f"Unable to get cookies: {e}") | ||
| errors.append(f'Unable to get the cookies: {e}') | ||
| self.should_retry = True | ||
| async def capture_current_page( |
There was a problem hiding this comment.
Looking at this method, I'm wondering is there is way to make it a setting of capture_page? with url made optional, and a boolean current_page_only.
That would de-duplicate some code and I like that.
Remove `_coerce_cookie_mapping()` and rely on the `lookyloo-models` `Cookie` model to handle normalization.
…nt_page - Replace `PageCaptureState` `TypedDict` with instance attributes (`_multiple_downloads`, `_store_request`, `_mark_favicons_done`) on `Capture`. - `setup_page_capture()` now returns `None` and `_finalize_capture()` reads from `self`. - Make `url` optional and add `current_page_only` flag to `capture_page()` - when `True`, the method snapshots the page as-is without navigation. - Delete the now-redundant `capture_current_page()` method.
|
Thank you for all the extra work to make it fit neatly! |
Add the low-level building blocks that let LacusCore run an interactive browser session inside a pre-existing X11 display (managed by xpra) and capture the current page state on demand.
Changes
New capabilities
Captureconstructor accepts an optionaldisplayparameter. When set, the browser launches on that X11 display instead of headless/default.setup_page_capture()- prepare a page for capture (event handlers, HAR recording, favicon collection) without navigating or finalizing. Used by LacusCore to set up the page at session start.capture_current_page()- capture the state of the already-loaded page (screenshot, HAR, downloads) without the navigation/retry loop. Used when the interactive user triggers "Finish Capture."Refactoring
capture_page()into focused internal helpers (_setup_page_for_capture,_perform_navigation_and_capture,_finalize_capture) so the new methods share logic rather than duplicating it.Bugfixes
_finalize_captureis now called in afinallyblock incapture_current_page, matchingcapture_page's pattern. Previously a re-raised exception would skip page/context cleanup and HAR flush.launch_envtype annotation to satisfymypy --strictagainst the PlaywrightBrowserType.launchsignature.Dependency impact
No new dependencies. No changes to the public
capture_page()API. The new methods are onCapturebut not exported in__all__.