feat: Use specialized Playwright docker images in templates #1757
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #1757 +/- ##
==========================================
- Coverage 92.31% 92.28% -0.03%
==========================================
Files 156 156
Lines 10644 10645 +1
==========================================
- Hits 9826 9824 -2
- Misses 818 821 +3
024d605 to e8abc7c
e8abc7c to e360ccb
…_path` by debug log
E2E test from this branch: https://github.com/apify/crawlee-python/actions/runs/22134060453
Reason for changing the exception to debug log: https://apify.slack.com/archives/C07UBD2PZ5M/p1771406294667919?thread_ts=1771345683.896419&cid=C07UBD2PZ5M
Pull request overview
Updates Crawlee’s cookiecutter project templates to support browser-specific Playwright Docker images and adds dedicated Playwright template variants (Chrome/Firefox/WebKit), while relaxing PlaywrightBrowserPlugin behavior for browser_type='chrome' with an explicit executable_path.
Changes:
- Add new cookiecutter template options and corresponding `main_*.py` templates for `playwright-chrome`, `playwright-firefox`, and `playwright-webkit`.
- Update template routing/dependencies and Dockerfile base images to work consistently across all `playwright*` crawler types.
- Expand e2e template coverage and scheduled CI matrix to include the new crawler template variants; adjust `PlaywrightBrowserPlugin` to log instead of raising for `chrome` + `executable_path`.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/e2e/project_template/test_static_crawlers_templates.py | Adds e2e coverage for newly introduced Playwright crawler template types. |
| src/crawlee/project_template/{{cookiecutter.project_name}}/{{cookiecutter.__package_name}}/routes.py | Routes all playwright* templates to shared Playwright routes template. |
| src/crawlee/project_template/{{cookiecutter.project_name}}/requirements.txt | Ensures all playwright* templates select the crawlee[playwright] extra; keeps camoufox dependency conditional. |
| src/crawlee/project_template/{{cookiecutter.project_name}}/pyproject.toml | Aligns extras selection so all playwright* templates depend on crawlee[playwright]. |
| src/crawlee/project_template/{{cookiecutter.project_name}}/Dockerfile | Switches to browser-specialized Playwright base images for new template variants (and camoufox). |
| src/crawlee/project_template/templates/routes_playwright_camoufox.py | Removes redundant routes template (now shared via routes_playwright.py). |
| src/crawlee/project_template/templates/main_playwright_chrome.py | New template configuring PlaywrightCrawler(browser_type="chrome"). |
| src/crawlee/project_template/templates/main_playwright_firefox.py | New template configuring PlaywrightCrawler(browser_type="firefox"). |
| src/crawlee/project_template/templates/main_playwright_webkit.py | New template configuring PlaywrightCrawler(browser_type="webkit"). |
| src/crawlee/project_template/cookiecutter.json | Adds new crawler type choices for the specialized Playwright templates. |
| src/crawlee/browsers/_playwright_browser_plugin.py | Changes chrome + executable_path handling from raising to debug logging. |
| .github/workflows/on_schedule_tests.yaml | Expands scheduled e2e matrix to run against new crawler template variants. |
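The dependency rows above describe a prefix rule: every `playwright*` crawler type gets the `crawlee[playwright]` extra, and only the camoufox variant adds its extra dependency. A minimal Python sketch of that rule follows. It is an illustration only, not the template's actual Jinja logic; the function name `select_requirements`, the crawler type string `playwright-camoufox`, and the bare `camoufox` requirement are assumptions.

```python
def select_requirements(crawler_type: str) -> list[str]:
    """Return requirement lines for a template crawler type.

    Hypothetical helper: the real templates express this rule in Jinja
    inside requirements.txt and pyproject.toml.
    """
    requirements = []

    if crawler_type.startswith('playwright'):
        # All playwright* variants share the same crawlee extra.
        requirements.append('crawlee[playwright]')
    else:
        # Assumption: other crawler types map to an extra of the same name.
        requirements.append(f'crawlee[{crawler_type}]')

    if crawler_type == 'playwright-camoufox':
        # The camoufox dependency stays conditional on that one variant.
        requirements.append('camoufox')

    return requirements


for name in ('playwright-chrome', 'playwright-firefox', 'playwright-camoufox'):
    print(name, select_requirements(name))
```

Matching on the prefix rather than listing each variant means a future `playwright-*` template would pick up the right extra without another edit to the dependency files.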
...lee/project_template/{{cookiecutter.project_name}}/{{cookiecutter.__package_name}}/routes.py
vdusek left a comment
Nice, 4 comments from claude, 2 are nits
    if explicit_browser_launch_options.get(
        'executable_path', default_launch_browser_options.get('executable_path')
    ):
        logger.debug(
            f'Using browser executable from {default_launch_browser_options["executable_path"]},'
            f" which takes precedence over 'chrome' channel."
Bug: Debug log message reports wrong executable path
File: src/crawlee/browsers/_playwright_browser_plugin.py
The condition checks explicit_browser_launch_options.get('executable_path',
default_launch_browser_options.get('executable_path')) to determine if an
executable path is set, but the log message always formats
default_launch_browser_options["executable_path"]. When the user passes
executable_path via browser_launch_options, the log will display the wrong
path (the default one, not the user-provided one).
Suggestion:

    effective_path = explicit_browser_launch_options.get(
        'executable_path', default_launch_browser_options.get('executable_path')
    )
    if effective_path:
        logger.debug(
            f"Using browser executable from {effective_path},"
            f" which takes precedence over 'chrome' channel."
        )
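To make the difference concrete, here is a self-contained sketch of the suggested approach: resolve the path once and log that same value. The helper name `resolve_executable_path` and the example paths are hypothetical; in the plugin this logic sits inline, and the two dicts stand in for `explicit_browser_launch_options` and `default_launch_browser_options`.

```python
from __future__ import annotations

import logging

logger = logging.getLogger(__name__)


def resolve_executable_path(explicit_options: dict, default_options: dict) -> str | None:
    """Return the executable path that would actually be used (hypothetical helper)."""
    # The user-provided value wins; the default is only a fallback.
    effective_path = explicit_options.get(
        'executable_path', default_options.get('executable_path')
    )
    if effective_path:
        # Log the resolved value, not default_options['executable_path'].
        logger.debug(
            "Using browser executable from %s, which takes precedence over 'chrome' channel.",
            effective_path,
        )
    return effective_path


user_options = {'executable_path': '/opt/custom/chrome'}
defaults = {'executable_path': '/usr/bin/chrome'}
print(resolve_executable_path(user_options, defaults))
```

With the original code, the same inputs would log `/usr/bin/chrome` even though `/opt/custom/chrome` is the value that passes the condition. It would also raise a `KeyError` if only the explicit options contained `executable_path`.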
src/crawlee/project_template/{{cookiecutter.project_name}}/requirements.txt
    @@ -0,0 +1,14 @@
    # % extends 'main.py'
Nit: Closing parenthesis formatting
Files: templates/main_playwright_chrome.py,
templates/main_playwright_firefox.py, templates/main_playwright_webkit.py
The closing ) is on the same line as the last argument:
{{ self.http_client_instantiation() }})
Consider:
{{ self.http_client_instantiation() }}
)
vdusek left a comment
LGTM, but undo the "from -> rom" change
    @@ -1,4 +1,4 @@
    -from crawlee.crawlers import PlaywrightCrawlingContext
    +rom crawlee.crawlers import PlaywrightCrawlingContext
🙂
    -rom crawlee.crawlers import PlaywrightCrawlingContext
    +from crawlee.crawlers import PlaywrightCrawlingContext
Ohh, I remember deleting this file completely...
Description
- New template variants: `playwright-chrome`, `playwright-firefox`, `playwright-webkit`.
- Change `PlaywrightBrowserPlugin` to not raise an exception when `browser_type='chrome'` and an explicit `executable_path` is provided as well. It just adds a debug log instead.
Issues
Testing
Checklist