Request interception in PuppeteerCrawler fails to capture initial document source requests #2886

@wh5938316

Which package is this bug report for? If unsure which one to select, leave blank

@crawlee/puppeteer (PuppeteerCrawler)

Issue description

When trying to intercept requests (e.g., images, fonts) using PuppeteerCrawler, the requests made during the initial document load are not captured. This suggests that navigation has already started, or even finished, by the time interception is set up.

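import { PuppeteerCrawler, puppeteerRequestInterception } from 'crawlee';
import type { HTTPRequest } from 'puppeteer';

// Logs each intercepted request, then lets it proceed so the page does not stall.
async function setupRequestInterception(interceptedRequest: HTTPRequest) {
  const resourceType = interceptedRequest.resourceType();
  const interceptedRequestUrl = interceptedRequest.url();
  console.log(resourceType, interceptedRequestUrl);
  await interceptedRequest.continue();
}

const crawler = new PuppeteerCrawler({
  async requestHandler({ page }) {
    // Attempt to intercept requests
    await puppeteerRequestInterception.addInterceptRequestHandler(page, setupRequestInterception);
  },
});

await crawler.addRequests([{ url: 'https://example.com' }]);
await crawler.run();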

Observed Behavior:

  • Requests for initial document resources (HTML, images, fonts, etc.) are not intercepted or logged.
  • Only later requests (e.g., XHR/fetch after page load) might be captured.
  • Suggests interception setup occurs after the page has already started loading.

Does PuppeteerCrawler initialize the page and trigger navigation before entering requestHandler?
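For context on the question above: PuppeteerCrawler accepts a `preNavigationHooks` option whose hooks run before the page navigates, whereas `requestHandler` runs after the page has loaded. Registering the interception in such a hook may therefore be enough to see the initial document request. The following is only a minimal sketch under that assumption. The helper name `attachRequestLogger` and the `PageLike`/`RequestLike` interfaces are hypothetical, introduced so the snippet stands alone without Puppeteer installed; they are intended to be structurally compatible with Puppeteer's `Page` and `HTTPRequest`. The sketch uses Puppeteer's own `setRequestInterception` and `request` event rather than Crawlee's helper.

```typescript
// Hypothetical structural types covering only the members this sketch uses.
interface RequestLike {
  url(): string;
  resourceType(): string;
  continue(): Promise<void>;
}

interface PageLike {
  setRequestInterception(enabled: boolean): Promise<void>;
  on(event: 'request', listener: (request: RequestLike) => void): unknown;
}

// Hypothetical helper: enables interception and logs every request
// the page issues from this point on.
async function attachRequestLogger(
  page: PageLike,
  log: (line: string) => void,
): Promise<void> {
  await page.setRequestInterception(true);
  page.on('request', (request) => {
    log(`${request.resourceType()} ${request.url()}`);
    // An intercepted request stalls until it is continued, aborted, or answered.
    request.continue().catch(() => {
      // The request may already be handled or the page closed; ignored in this sketch.
    });
  });
}

// Assumed wiring in Crawlee (shown as a comment, not executed here):
//
//   const crawler = new PuppeteerCrawler({
//     preNavigationHooks: [
//       async ({ page }) => attachRequestLogger(page, console.log),
//     ],
//     async requestHandler({ page }) {
//       // The page has already been loaded by the time this runs.
//     },
//   });
```

Because the hook runs before navigation, the listener should already be attached when the document request is issued. If the logging needs to stay with Crawlee's own helper, the same hook could call `puppeteerRequestInterception.addInterceptRequestHandler` instead.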

Code sample

Package version

^3.13.0

Node.js version

v22.13.1

Operating system

macOS

Apify platform

  • Tick me if you encountered this issue on the Apify platform

I have tested this on the next release

No response

Other context

No response


Labels: bug (Something isn't working.), t-tooling (Issues with this label are in the ownership of the tooling team.)
