Closed
Labels
bug (Something isn't working.), t-tooling (Issues with this label are in the ownership of the tooling team.)
Description
Which package is this bug report for? If unsure which one to select, leave blank
@crawlee/puppeteer (PuppeteerCrawler)
Issue description
When I try to intercept requests (e.g., images, fonts) with PuppeteerCrawler, the requests made while the initial document loads are not captured. This suggests the page may already be fully loaded before the interception is set up.
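```typescript
import { PuppeteerCrawler, puppeteerRequestInterception } from '@crawlee/puppeteer';
import type { HTTPRequest } from 'puppeteer';

async function setupRequestInterception(interceptedRequest: HTTPRequest) {
    const resourceType = interceptedRequest.resourceType();
    const interceptedRequestUrl = interceptedRequest.url();
    console.log(resourceType, interceptedRequestUrl);
    // Crawlee chains interception handlers, so each one must continue,
    // abort, or respond; otherwise the request is left pending.
    await interceptedRequest.continue();
}

const crawler = new PuppeteerCrawler({
    async requestHandler({ page }) {
        // Attempt to intercept requests
        await puppeteerRequestInterception.addInterceptRequestHandler(page, setupRequestInterception);
    },
});

await crawler.addRequests([{ url: 'https://example.com' }]);
await crawler.run();
```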
Observed Behavior:
- Requests for initial document resources (HTML, images, fonts, etc.) are not intercepted or logged.
- Only later requests (e.g., XHR/fetch after page load) might be captured.
- Suggests interception setup occurs after the page has already started loading.
Does PuppeteerCrawler initialize the page and trigger navigation before entering requestHandler?
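The ordering question above can be illustrated with a toy model. It is a sketch under one assumption: that PuppeteerCrawler runs its `preNavigationHooks` before navigating and calls `requestHandler` only after navigation has finished, which is how Crawlee describes its lifecycle. `FakePage`, `Hook` and `crawl` are hypothetical stand-ins, not Crawlee or Puppeteer APIs. A listener registered from the handler sees none of the load-time requests, while the same listener registered from a hook sees all of them.

```typescript
// Toy model, not Crawlee code: FakePage, Hook and crawl are hypothetical names.
type RequestListener = (url: string) => void;

class FakePage {
  private listeners: RequestListener[] = [];

  // Stand-in for attaching a request interceptor to a page.
  onRequest(listener: RequestListener): void {
    this.listeners.push(listener);
  }

  // Stand-in for page.goto(): each resource request is delivered only to
  // listeners that are already registered at that moment.
  navigate(resourceUrls: string[]): void {
    for (const url of resourceUrls) {
      for (const listener of this.listeners) listener(url);
    }
  }
}

type Hook = (page: FakePage) => void;

// Assumed ordering: pre-navigation hooks -> navigation -> request handler.
function crawl(resourceUrls: string[], preNavigationHooks: Hook[], requestHandler: Hook): void {
  const page = new FakePage();
  for (const hook of preNavigationHooks) hook(page);
  page.navigate(resourceUrls);
  requestHandler(page);
}

const resourceUrls = [
  'https://example.com/',
  'https://example.com/logo.png',
  'https://example.com/font.woff2',
];

// Listener registered inside the request handler, as in the report.
const seenFromHandler: string[] = [];
crawl(resourceUrls, [], (page) => page.onRequest((url) => { seenFromHandler.push(url); }));

// The same listener registered from a pre-navigation hook instead.
const seenFromHook: string[] = [];
crawl(resourceUrls, [(page) => page.onRequest((url) => { seenFromHook.push(url); })], () => {});

console.log(seenFromHandler.length, seenFromHook.length);
```

If the assumed ordering holds for the real crawler, the corresponding change would likely be to register the interceptor from a pre-navigation hook rather than from `requestHandler`, for example `preNavigationHooks: [async ({ page }) => { await puppeteerRequestInterception.addInterceptRequestHandler(page, setupRequestInterception); }]`.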
Code sample
Package version
^3.13.0
Node.js version
v22.13.1
Operating system
macOS
Apify platform
- Tick me if you encountered this issue on the Apify platform
I have tested this on the next release
No response
Other context
No response