Request interception with multiple plugins enabled #91

Closed
ghost opened this issue Dec 5, 2019 · 13 comments · Fixed by #104
Labels: bug (Something isn't working), planned-feature (Will be added in a future release)

Comments

@ghost

ghost commented Dec 5, 2019

I think the underlying issue is similar to #90, but another error also comes up that I think applies to more cases.

const puppeteer = require('puppeteer-extra');

puppeteer.use(require('puppeteer-extra-plugin-anonymize-ua')());
puppeteer.use(require('puppeteer-extra-plugin-stealth')());
const blockResourcesPlugin = require('puppeteer-extra-plugin-block-resources')({
  blockedTypes: new Set(['stylesheet', 'image', 'media', 'font'])
});
puppeteer.use(blockResourcesPlugin);

(async function () {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();

  await page.goto('https://www.google.com');

  await browser.close();
})();

Running the code above produces two errors, each repeated multiple times (presumably once per request).

(node:1300) UnhandledPromiseRejectionWarning: Error: You set up a request listener but no interception. If you intend to modify requests you need to add: `await page.setRequestInterception(true)`.
    at Request.continue (I:\CAG Portal\NodeJS\node_modules\puppeteer-extra-plugin-stealth\evasions\accept-language\index.js:120:19)
    at Plugin.onRequest (I:\CAG Portal\NodeJS\node_modules\puppeteer-extra-plugin-block-resources\index.js:101:60)
    at Page.emit (events.js:215:7)
    at NetworkManager.<anonymous> (C:\Users\user\AppData\Roaming\nvm\v9.7.1\node_modules\puppeteer\lib\Page.js:114:68)
    at NetworkManager.emit (events.js:210:5)
    at NetworkManager._onRequest (C:\Users\user\AppData\Roaming\nvm\v9.7.1\node_modules\puppeteer\lib\NetworkManager.js:240:10)
    at NetworkManager._onRequestWillBeSent (C:\Users\user\AppData\Roaming\nvm\v9.7.1\node_modules\puppeteer\lib\NetworkManager.js:173:14)
    at CDPSession.emit (events.js:210:5)
    at CDPSession._onMessage (C:\Users\user\AppData\Roaming\nvm\v9.7.1\node_modules\puppeteer\lib\Connection.js:200:12)
    at Connection._onMessage (C:\Users\user\AppData\Roaming\nvm\v9.7.1\node_modules\puppeteer\lib\Connection.js:112:17)

(node:1300) UnhandledPromiseRejectionWarning: Error: Request is already handled!
    at assert (C:\Users\user\AppData\Roaming\nvm\v9.7.1\node_modules\puppeteer\lib\helper.js:231:11)
    at Request.abort (C:\Users\user\AppData\Roaming\nvm\v9.7.1\node_modules\puppeteer\lib\NetworkManager.js:500:5)
    at Request.<anonymous> (C:\Users\user\AppData\Roaming\nvm\v9.7.1\node_modules\puppeteer\lib\helper.js:111:23)
    at Plugin.onRequest (I:\CAG Portal\NodeJS\node_modules\puppeteer-extra-plugin-block-resources\index.js:101:34)
    at Page.emit (events.js:215:7)
    at NetworkManager.<anonymous> (C:\Users\user\AppData\Roaming\nvm\v9.7.1\node_modules\puppeteer\lib\Page.js:114:68)
    at NetworkManager.emit (events.js:215:7)
    at NetworkManager._onRequest (C:\Users\user\AppData\Roaming\nvm\v9.7.1\node_modules\puppeteer\lib\NetworkManager.js:240:10)
    at NetworkManager._onRequestWillBeSent (C:\Users\user\AppData\Roaming\nvm\v9.7.1\node_modules\puppeteer\lib\NetworkManager.js:173:14)
    at CDPSession.emit (events.js:210:5)
  -- ASYNC --
    at Request.<anonymous> (C:\Users\user\AppData\Roaming\nvm\v9.7.1\node_modules\puppeteer\lib\helper.js:110:27)
    at Plugin.onRequest (I:\CAG Portal\NodeJS\node_modules\puppeteer-extra-plugin-block-resources\index.js:101:34)
    at Page.emit (events.js:215:7)
    at NetworkManager.<anonymous> (C:\Users\user\AppData\Roaming\nvm\v9.7.1\node_modules\puppeteer\lib\Page.js:114:68)
    at NetworkManager.emit (events.js:215:7)
    at NetworkManager._onRequest (C:\Users\user\AppData\Roaming\nvm\v9.7.1\node_modules\puppeteer\lib\NetworkManager.js:240:10)
    at NetworkManager._onRequestWillBeSent (C:\Users\user\AppData\Roaming\nvm\v9.7.1\node_modules\puppeteer\lib\NetworkManager.js:173:14)
    at CDPSession.emit (events.js:210:5)
    at CDPSession._onMessage (C:\Users\user\AppData\Roaming\nvm\v9.7.1\node_modules\puppeteer\lib\Connection.js:200:12)
    at Connection._onMessage (C:\Users\user\AppData\Roaming\nvm\v9.7.1\node_modules\puppeteer\lib\Connection.js:112:17)
@ghost
Author

ghost commented Dec 5, 2019

In the above code, everything runs fine if I disable either "puppeteer-extra-plugin-stealth" or "puppeteer-extra-plugin-block-resources". I do not need to disable both to have it run fine.

@ghost ghost changed the title Request handling with multiple plugins enabled Request interception with multiple plugins enabled Dec 5, 2019
@berstend
Owner

berstend commented Dec 5, 2019

Hmm, damn - I actually went to great lengths to avoid that :-/

You have two options until I fix that (I'm busy for the next few days, so the fix won't be as quick as usual):

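const puppeteer = require("puppeteer-extra");
const PluginStealth = require("puppeteer-extra-plugin-stealth");
const pluginStealth = PluginStealth();

// Disable only the evasion that conflicts with other request interceptors
pluginStealth.enabledEvasions.delete("accept-language");

puppeteer.use(pluginStealth);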

Or use an older version (yarn add puppeteer-extra-plugin-stealth@2.4.0)

@berstend
Owner

berstend commented Dec 5, 2019

The issue is in the new accept-language plugin and the restriction that only one handler is allowed to modify requests.

I thought I could hack around that by monkey patching things, but it seems I need to do this more robustly.
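
The single-handler restriction can be illustrated with a small model. This is only a sketch: `FakeRequest` is a hypothetical stand-in written for this example, not Puppeteer's request class, and it is synchronous, whereas Puppeteer's `continue()` and `abort()` are asynchronous and reject instead of throwing. It keeps just the one rule visible in the stack traces above: once a request has been continued or aborted, a second attempt fails with "Request is already handled!".

```javascript
const { EventEmitter } = require('events');

// Hypothetical stand-in for a Puppeteer request (NOT the real class).
// It models a single rule: a request can be resolved only once.
class FakeRequest {
  constructor(resourceType) {
    this._resourceType = resourceType;
    this._handled = false;
    this.outcome = null;
  }
  resourceType() {
    return this._resourceType;
  }
  _resolve(outcome) {
    if (this._handled) throw new Error('Request is already handled!');
    this._handled = true;
    this.outcome = outcome;
  }
  continue() {
    this._resolve('continued');
  }
  abort() {
    this._resolve('aborted');
  }
}

const page = new EventEmitter();
const errors = [];

// Listener A: stands in for a resource-blocking plugin.
page.on('request', (req) => {
  if (req.resourceType() === 'image') req.abort();
  else req.continue();
});

// Listener B: stands in for a second plugin that always continues.
page.on('request', (req) => {
  try {
    req.continue();
  } catch (err) {
    errors.push(err.message);
  }
});

const request = new FakeRequest('image');
page.emit('request', request);

console.log(request.outcome); // decision made by listener A
console.log(errors); // listener B's attempt was rejected
```

Both listeners are individually correct; the failure only appears when they are registered on the same page, which matches the observation that disabling either plugin makes the errors go away.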

@ghost
Author

ghost commented Dec 5, 2019

Disabling accept-language worked. Thanks for the quick reply.

@berstend
Owner

berstend commented Dec 5, 2019

I just released a hotfix to disable accept-language by default until this is fixed.

New version: puppeteer-extra-plugin-stealth@2.4.5

@SBerkovic

Is there a way to use the adblocker plugin alongside my own list of URLs that I want to block?
Having both enabled produces this error:

(node:43915) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 18)
(node:43915) UnhandledPromiseRejectionWarning: Error: Request is already handled!

@berstend
Owner

berstend commented Dec 10, 2019

@SBerkovic Regular Puppeteer only allows a single interceptor to modify a request or to signal its continuation/abortion (which is bad for plugin systems like puppeteer-extra). I'm looking into a more generic solution that lets multiple event listeners modify this data in a chain.

If I cannot hook into high-level pptr functions, I'm thinking about intercepting the raw Chrome DevTools Protocol messages to make these modifications transparent to puppeteer :)

@berstend berstend self-assigned this Dec 10, 2019
@berstend berstend added the work-in-progress This is currently being worked on label Dec 10, 2019
@berstend
Owner

Another potential solution would be to patch the EventEmitter in pptr to run listeners in sequence.
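
As a rough illustration of what "run listeners in sequence" could mean, here is a sketch built on Node's standard `EventEmitter`. The helper `emitInSequence` is a hypothetical name introduced for this example; it is not part of Puppeteer or puppeteer-extra, and the thread does not say how such a patch would actually be implemented.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: calls each registered listener one at a time and
// waits for it to settle before calling the next one. A plain emit() would
// start all listeners synchronously without waiting for async ones.
async function emitInSequence(emitter, eventName, ...args) {
  for (const listener of emitter.listeners(eventName)) {
    await listener.apply(emitter, args);
  }
}

const page = new EventEmitter();
const order = [];

page.on('request', async (req) => {
  // Simulate asynchronous work, e.g. a blocklist lookup.
  await new Promise((resolve) => setTimeout(resolve, 20));
  order.push(`first saw ${req.url}`);
});

page.on('request', async (req) => {
  order.push(`second saw ${req.url}`);
});

// Resolves with the order in which the listeners finished.
const done = emitInSequence(page, 'request', { url: 'https://example.com/' })
  .then(() => order);

done.then((result) => console.log(result));
```

With a plain `page.emit('request', ...)`, the second listener would record its entry before the first one finishes its asynchronous work. Awaiting each listener in turn gives later listeners a chance to see what earlier ones decided.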

@berstend
Owner

berstend commented Dec 11, 2019

It's a bit more tricky than I thought.

My idea is to not require the user to do anything differently but keep their request interception code compatible.

There are some UX issues I need to figure out, for example:

  • The adblocker plugin blocks some requests
  • The user has added their own request interception: Are the blocked adblocker requests supposed to show up in the on("request") handlers or not?
  • If they show up then the user has to decide again to .continue() or to .abort() the request, which makes the adblocker plugin useless

I could add behavior so that when the user does not call .continue() or .abort() in their own listener, the "previous" action (e.g. from the adblocker) is used, with .continue() as a fallback. But this is not standard puppeteer behaviour, which REQUIRES you to decide what to do when intercepting requests; otherwise the request will stall.

Hmmmm.
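
The fallback behaviour described in this comment could look roughly like the sketch below. Everything here is hypothetical and written only for illustration: `DeferredRequest` and `dispatch` are invented names, and the real request object is replaced by a plain object. Listeners record a decision instead of resolving the request themselves, and the request is resolved exactly once after all listeners have run.

```javascript
// Hypothetical wrapper: records what listeners want instead of acting on it.
class DeferredRequest {
  constructor(realRequest) {
    this._real = realRequest;
    this.decision = null; // the most recently recorded decision wins
  }
  url() {
    return this._real.url();
  }
  continue(overrides) {
    this.decision = { action: 'continue', overrides };
  }
  abort(errorCode) {
    this.decision = { action: 'abort', errorCode };
  }
}

// Runs every listener, then resolves the real request exactly once.
function dispatch(realRequest, listeners) {
  const deferred = new DeferredRequest(realRequest);
  for (const listener of listeners) listener(deferred);

  // Fallback: if no listener decided anything, continue the request.
  const decision = deferred.decision || { action: 'continue' };
  if (decision.action === 'abort') realRequest.abort(decision.errorCode);
  else realRequest.continue(decision.overrides);
  return decision.action;
}

// Demo with plain objects standing in for real requests.
const calls = [];
const makeRequest = (url) => ({
  url: () => url,
  continue: () => calls.push(`continue ${url}`),
  abort: () => calls.push(`abort ${url}`),
});

// Stands in for the adblocker plugin.
const adblocker = (req) => {
  if (req.url().includes('ads.')) req.abort();
};
// Stands in for a user listener that only observes and decides nothing.
const userListener = (req) => {
  req.url();
};

dispatch(makeRequest('https://ads.example.com/x.js'), [adblocker, userListener]);
dispatch(makeRequest('https://example.com/'), [adblocker, userListener]);

console.log(calls);
```

In this sketch a later listener that does call `.continue()` would override the adblocker's `.abort()`. That is exactly the open UX question raised above, and the sketch picks "last decision wins" only for the sake of having a concrete rule.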

@ParadoxD

ParadoxD commented Dec 19, 2019

I'm running into the same issue, but I'm on v2.4.5. I'm running stealth, recaptcha, and adblocker plugins. If I disable the adblocker, the issues go away. If I enable adblocker and disable the others, the issue persists.

@berstend
Owner

berstend commented Jan 6, 2020

Yup, the adblocker is the only major plugin left using requestInterception (after merging in #104).

Unfortunately it's a bit tricky to fix, as puppeteer (and the underlying CDP message flow) doesn't expect multiple entities to be interested in using that functionality. 😄

My earlier optimistic monkey patching didn't hold up to scrutiny, so I need to do it properly. That would probably mean moving the event listener stuff from the base plugin to the puppeteer shim, as we need that bird's-eye view to add a listener that's always the last one (to handle .continue/.abort as mentioned in my earlier comment).

Definitely a bit more juicy of a challenge that will take a while longer to implement (though nothing that cannot be fixed). Realistically I won't have the time to tackle this within the next 2 weeks.

For the time being: If you need to intercept requests in your own code, I advise against using the adblocker plugin until this is fixed.
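
One way to keep "a listener that's always the last one" can be sketched with Node's standard `EventEmitter`. This is an assumption-laden illustration, not the planned puppeteer-extra implementation: `keepLast` is a hypothetical helper, it only patches `on` (not `addListener`, `once`, or `prependListener`), and Puppeteer's own emitter may behave differently.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: registers finalListener and patches emitter.on so
// that listeners added later are still called before finalListener.
function keepLast(emitter, eventName, finalListener) {
  const originalOn = emitter.on.bind(emitter);
  originalOn(eventName, finalListener);

  emitter.on = (name, listener) => {
    if (name !== eventName) return originalOn(name, listener);
    // Move the final listener back to the end of the list.
    emitter.removeListener(eventName, finalListener);
    originalOn(eventName, listener);
    originalOn(eventName, finalListener);
    return emitter;
  };
}

const page = new EventEmitter();
const seen = [];

// The final listener is registered first but must still run last.
keepLast(page, 'request', () => seen.push('final'));
page.on('request', () => seen.push('plugin'));
page.on('request', () => seen.push('user'));

page.emit('request');
console.log(seen);
```

A final listener placed this way is where a single `.continue()` or `.abort()` call could be made once every plugin and user listener has had its say.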

@berstend berstend added bug Something isn't working planned-feature Will be added in a future release and removed work-in-progress This is currently being worked on labels Jan 6, 2020
@gajus

gajus commented Feb 16, 2020

For the time being: If you need to intercept requests in your own code, I advise against using the adblocker plugin until this is fixed.

Is this advice still valid?

What is the correct way to intercept requests now?

@ptommasi

ptommasi commented May 1, 2021

In case someone has the same use case as me: removing the existing handlers first works (I also left a comment with more details on puppeteer issue 5334). Calling page.removeAllListeners("request") clears the listeners on the current page (so the ad and resource blockers are lost for that page); other pages, existing or future, are not affected.
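
A minimal sketch of that workaround follows. The function name `takeOverRequestHandling` and the `blockedTypes` parameter are hypothetical; the calls on `page` (`removeAllListeners`, `setRequestInterception`, `on`) and on the request (`resourceType`, `abort`, `continue`) are the ones named in this thread. It assumes `page` is a Puppeteer page that already has plugin listeners attached.

```javascript
// Hypothetical helper: makes the caller's handler the only 'request'
// listener on this page. blockedTypes is a Set of resource type names.
async function takeOverRequestHandling(page, blockedTypes) {
  // Drop every existing 'request' listener on this page, including the
  // ones registered by plugins such as the ad or resource blockers.
  page.removeAllListeners('request');

  await page.setRequestInterception(true);

  page.on('request', async (request) => {
    try {
      if (blockedTypes.has(request.resourceType())) {
        await request.abort();
      } else {
        await request.continue();
      }
    } catch (err) {
      // Avoid unhandled rejections if the request was resolved elsewhere.
      console.error('Request handling failed:', err.message);
    }
  });
}
```

It would be called once per page, for example `await takeOverRequestHandling(page, new Set(['image', 'font']))` right after `browser.newPage()`. The trade-off is the one stated above: plugin-provided blocking no longer applies to that page.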
