fix(playwright): filter unsupported context options in persistent browser#1796
fix(playwright): filter unsupported context options in persistent browser#1796sushant-mutnale wants to merge 1 commit intoapify:masterfrom
Conversation
…wser This addresses issue apify#1784 by dynamically filtering options passed to launch_persistent_context and providing a warning log for ignored options like storage_state.
Pijukatel
left a comment
There was a problem hiding this comment.
Hello, thanks for the PR. Please see my comments; maybe we can use this approach on a different level.
| "scraping", | ||
| ] | ||
| dependencies = [ | ||
| "apify-fingerprint-datapoints>=0.11.0", |
There was a problem hiding this comment.
We have all these added dependencies in the optional dependencies group playwright. So please remove them from here.
| user_data_dir = tempfile.mkdtemp(prefix=self._TMP_DIR_PREFIX) | ||
| self._temp_dir = Path(user_data_dir) | ||
|
|
||
| launch_persistent_context_sig = inspect.signature(self._browser_type.launch_persistent_context) |
There was a problem hiding this comment.
This is a reasonable approach, but it has some drawbacks. If user has just typo ( in otherwise valid argument name), it will just show warning in log. Same for using some completely nonsensical argument. That should raise an error and not just log a warning.
For example, this should raise (typo in headles):
persist_browser = PlaywrightPersistentBrowser(
playwright.chromium, browser_launch_options={'headles': True}
)
Maybe this approach could be adopted one lever higher (not in PlaywrightPersistentBrowser - which always just calls launch_persistent_context), but in PlaywrightBrowserController - that is the class that decides about calling launch_persistent_context or new_context, but feeds them the same arguments.
It should properly raise exceptions for bad arguments, but it could just log a warning as per your suggestion for arguments at least valid in the other method. It would have to get 3 sets of arguments to be able to do such a distinction. Something like:
...
launch_persistent_context_sig = set(inspect.signature(BrowserType.launch_persistent_context).parameters)
new_context_sig = set(inspect.signature(Browser.new_context).parameters)
persistent_unique_options = launch_persistent_context_sig - new_context_sig
new_context_unique_options = new_context_sig - launch_persistent_context_sig
common_options = launch_persistent_context_sig & new_context_sig
...
And then raise an exception or just log based on the selected mode.
This PR fixes issue #1784, where PlaywrightCrawler would crash when passing context options (like storage_state) that are unsupported by Playwright's launch_persistent_context method.
Changes:
Implemented dynamic argument filtering in PlaywrightPersistentBrowser.new_context using inspect. signature.
Added a warning log to guide users when options are filtered out, suggesting the use of incognito pages as an alternative.
Added a unit test in
tests/unit/browsers/test_playwright_browser.py
to verify the fix and prevent regressions.
Fixes #1784