[Info] Beta versions available for the new puppeteer-extra & playwright-extra #454

Open
berstend opened this issue Mar 17, 2021 · 34 comments
Assignees
Labels
package: core Affecting a core package planned-feature Will be added in a future release plugin: automation-extra AutomationExtra Plugin related plugin: recaptcha 🏴 reCAPTCHA plugin related work-in-progress This is currently being worked on

Comments

@berstend
Owner

berstend commented Mar 17, 2021

The rewrite of puppeteer-extra is available for beta testing, to gather some final feedback before we make the switch. This issue is meant as a canonical reference on how to install those packages (also please report bugs/feedback here). 😄

edit: playwright-extra has landed: https://github.com/berstend/puppeteer-extra/tree/master/packages/playwright-extra


👉 We will follow a different approach than a full rewrite with a shared code base between puppeteer-extra and playwright-extra, more info can be found in this comment

(Click for previous (now outdated) info)

❌❌❌ The information below is outdated and does not apply anymore

Context

  • A major new version (rewrite) of puppeteer-extra is close to public release 🎉
  • The new plugin framework will support both Puppeteer and Playwright (adding playwright-extra)
  • Every existing puppeteer-extra-plugin-* should continue working with the new puppeteer-extra
  • In addition new plugins (@extra/*) are being released that support both Puppeteer and Playwright

More info can be found in the PR: #303

How to install (must read ⚡)

Important:

  • ⚡ The temporary tagged beta packages have issues with npm, please use yarn to install those.
  • The beta versions are published under the @next tag, you must add this tag when installing them.

Available packages

Important:

  • ⚡ The documentation links below point to the unreleased automation-extra branch, the installation instructions for the new packages there are written from the perspective of being released and don't mention the @next tag. Please install the packages as instructed in this issue.

puppeteer-extra

yarn add puppeteer@5 puppeteer-extra@next
  • Supports existing puppeteer-extra-plugin-* as well as the new @extra/* plugins

playwright-extra

yarn add playwright@1.8.0 playwright-extra@next
  • Supports Chrome, Firefox and Webkit and the new @extra/* plugins

New plugins

  • These plugins use the new base plugin and are compatible with both Playwright & Puppeteer.

@extra/recaptcha

yarn add @extra/recaptcha@next
  • A plugin for playwright-extra & puppeteer-extra to solve reCAPTCHAs and hCaptchas automatically.
  • Supports Playwright & Puppeteer, Chrome, Firefox and Webkit.

@extra/humanize

yarn add @extra/humanize@next
  • A plugin for playwright-extra & puppeteer-extra to humanize input (mouse movements, etc)
  • Supports Playwright & Puppeteer, Chrome, Firefox and Webkit.

Existing plugins

  • All existing puppeteer-extra plugins are meant to stay compatible with the new puppeteer-extra. Please report any issues you might experience.

Notes

  • Existing puppeteer-extra-plugin-* will work with puppeteer-extra, not playwright-extra.
  • An updated version of the popular stealth plugin with playwright support is not yet available.
  • The target audience of these beta packages is developers interested in testing them and providing feedback before the public release. I don't advise using them in production unless you really know what you're doing :-)
  • Puppeteer broke typings support in their latest releases, use puppeteer@5 when using TypeScript
@berstend berstend added work-in-progress This is currently being worked on planned-feature Will be added in a future release plugin: recaptcha 🏴 reCAPTCHA plugin related plugin: automation-extra AutomationExtra Plugin related package: core Affecting a core package labels Mar 17, 2021
@berstend berstend self-assigned this Mar 17, 2021
@berstend berstend pinned this issue Mar 17, 2021
@berstend berstend changed the title [Info/Umbrella] Beta versions available for the new puppeteer-extra & playwright-extra [Info] Beta versions available for the new puppeteer-extra & playwright-extra Mar 17, 2021
@j3lev

j3lev commented Apr 15, 2021

Hey @berstend, I'm having an issue with using versions of Playwright greater than 1.8.0. I ran into this when attempting to use Playwright 1.10.0 with playwright-extra inside a docker container. The browser launch fails because the library tries to use the 1.8 browser binary (chromium-844399) which is missing from a clean Playwright 1.10 install. When I swap out playwright-extra for the vanilla library, the browsers launch fine. I was not running into this issue locally because the 1.8 browser binaries are left over from a previous Playwright 1.8 install. I suspect this might have something to do with the version being locked here

"playwright-core": "1.8.0"
For reference, I am using the official Playwright docker image here https://github.com/microsoft/playwright/blob/master/utils/docker/Dockerfile.bionic. Thoughts?

@berstend
Owner Author

berstend commented Apr 15, 2021

@j3lev thanks for the feedback! are you using the regular playwright package as well? If so that one should take precedence over the "bundled" -core one.

The reason we're including the -core package as a dependency currently is:
a) typings (so non-TS VScode users get Intellisense automatically)
b) to re-export the top level stuff from the vanilla package (errors, selectors, devices):

/** Returns playwright specific errors */
export const errors = playwrightCore.errors
/** Selectors can be used to install custom selector engines. */
export const selectors = playwrightCore.selectors
/** Returns a list of devices to be used with browser.newContext([options]) or browser.newPage([options]). */
export const devices = playwrightCore.devices
export default {
  addExtra,
  chromium,
  firefox,
  webkit,
  errors,
  selectors,
  devices
}

Overall I'm not too happy to have -core as a regular (and especially version pinned) dependency and will overhaul that before we make the release. A few days ago I realized I should be able to export getters here and lazy load any installed -core or non-core playwright lib. Will give this a go soon. :-)
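The getter idea mentioned above could look roughly like the following. This is only a minimal sketch, not the actual playwright-extra code: `createLazyExports` is a hypothetical helper name, and loading `playwright` in the usage line is an assumption about which package is installed.

```javascript
// Hypothetical helper (not part of playwright-extra): expose selected members of a
// module through getters, so the module is only loaded when a member is first read.
function createLazyExports(loadModule, keys) {
  let cached
  const lazy = {}
  for (const key of keys) {
    Object.defineProperty(lazy, key, {
      enumerable: true,
      get() {
        // The loader runs on first access; its result is reused afterwards
        if (cached === undefined) cached = loadModule()
        return cached[key]
      }
    })
  }
  return lazy
}

// Assumption: the user has installed `playwright`. Nothing is required at this point;
// the package is only resolved when e.g. `lazyExports.devices` is read.
const lazyExports = createLazyExports(
  () => require('playwright'),
  ['errors', 'selectors', 'devices']
)
```

Because the lookup is deferred, the wrapper package would not need to pin a specific `playwright-core` version just to re-export these members.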

Thanks for reporting this issue (I suspected pinning the version would cause issues down the line) 👍

@j3lev

j3lev commented Apr 15, 2021

I am using playwright 1.10.0 alongside and it does not work. I also tried in the past with 1.9 and was having the same issue but didn't have time to look into it.

@berstend
Owner Author

berstend commented Apr 15, 2021

@j3lev oh you're correct - I was mistaken as we're currently trying to require -core prior to the regular one:

const packages = [driverName + '-core', driverName]
const launcher = requirePackages(packages)

I will make sure to change that behavior when I overhaul that aspect.
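One possible shape of the changed behavior, as a sketch only: the `requirePackages` below is a hypothetical re-implementation for illustration (the real helper may differ), and `candidatePackages` is an invented name. The point is just the ordering, with the regular package tried before the `-core` one.

```javascript
// Hypothetical re-implementation for illustration; the real requirePackages may differ.
// Tries each package name in order and returns the first one that loads.
function requirePackages(packageNames, requireFn = require) {
  const failures = []
  for (const name of packageNames) {
    try {
      return requireFn(name)
    } catch (err) {
      // Remember why this candidate failed, then try the next one
      failures.push(`${name}: ${err.message}`)
    }
  }
  throw new Error(`Unable to load any of:\n${failures.join('\n')}`)
}

// Regular package first, "-core" only as a fallback
function candidatePackages(driverName) {
  return [driverName, `${driverName}-core`]
}
```

With this order, a user-installed `playwright@1.10.0` would take precedence over a bundled `playwright-core@1.8.0`, e.g. `requirePackages(candidatePackages('playwright'))`.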

The automation-extra stuff is currently a beta version, if it's mission-critical for you to get this resolved asap let me know. ;-)

(Using playwright@1.8.0 for the time being would be a workaround of sorts)

@berstend
Owner Author

berstend commented Apr 15, 2021

I updated the installation instructions in this issue to install playwright@1.8.0 and save the next beta tester from the experience you had. :-) (This is of course just a temporary fix until I have time to resolve it properly)

@j3lev

j3lev commented Apr 15, 2021

Yeah for sure, only reason I bring it up is to be able to take advantage of new features that are coming out such as channels https://playwright.dev/docs/browsers#google-chrome--microsoft-edge, also some new selector syntax was introduced in 1.9.0 which is nice as well. Keep up the good work and I cannot wait to see this get released!

@windbridges

@berstend, could you tell me whether using playwright-extra with the stealth plugin solves this issue, or does the stealth plugin still not work with Playwright because it uses its own intermediate wire protocol instead of CDP?

@berstend
Owner Author

@windbridges there's currently no stealth plugin for playwright (and the existing one is not compatible). The main reason is time constraints on my end and playwright making it more difficult to hook into the CDP flow so porting the stuff over from the existing plugin isn't just copy paste but more involved. :-)

@opahopa

opahopa commented Apr 30, 2021

@windbridges you can use the minified version of the stealth plugin from the extract-stealth-evasions, works perfectly fine for me with playwright.

@berstend
Owner Author

@windbridges you can use the minified version of the stealth plugin from the extract-stealth-evasions, works perfectly fine for me with playwright.

Unfortunately that will only result in cursory fixes, quite a few things rely on CDP and are not part of the js evasions scripts.

@j3lev

j3lev commented Jun 3, 2021

hey @berstend! hope all is well, i was just wondering when we can expect to use newer versions of playwright with this, the only reason i ask is that 1.8 appears to be no longer listed in the official Playwright docs, so I'm guessing they may drop support for it quite soon

@terion-name

Existing puppeteer-extra-plugin-* will work with puppeteer-extra, not playwright-extra.

BTW, I have been using puppeteer-extra-plugin-stealth with playwright for a long time with this hack:

// Run inside an async function; `browserContext` is an existing Playwright BrowserContext
const enabledEvasions = [/* list of my required evasions */];
const evasions = enabledEvasions.map(e => require(`puppeteer-extra-plugin-stealth/evasions/${e}`));
// Stand-in for a Puppeteer page: it only records the scripts the evasions register
const stealth = {
  callbacks: [],
  async evaluateOnNewDocument(...args) {
    this.callbacks.push({ cb: args[0], a: args[1] });
  }
};
evasions.forEach(e => e().onPageCreated(stealth));
// Replay the recorded scripts as Playwright init scripts
for (const evasion of stealth.callbacks) {
  await browserContext.addInitScript(evasion.cb, evasion.a);
}

@maiux

maiux commented Sep 11, 2021

@berstend I don't know if it's dirty or not, but thanks to @terion-name I got it to work with Playwright@1.14. This is the code I used and the results via screenshots:

(async () => {
    const { chromium } = require("playwright");

    const browser = await chromium.launch({
        channel: "chrome",
        headless: true,
    });

    const originalUserAgent = await (await (await browser.newContext()).newPage()).evaluate(() => { return navigator.userAgent });

    const browserContext = await browser.newContext({
        userAgent: originalUserAgent.replace("Headless", ""),
    });

    const page = await browserContext.newPage();

    const enabledEvasions = [
        'chrome.app',
        'chrome.csi',
        'chrome.loadTimes',
        'chrome.runtime',
        'iframe.contentWindow',
        'media.codecs',
        'navigator.hardwareConcurrency',
        'navigator.languages',
        'navigator.permissions',
        'navigator.plugins',
        'navigator.webdriver',
        'sourceurl',
        // 'user-agent-override', // doesn't work since playwright has no page.browser()
        'webgl.vendor',
        'window.outerdimensions'
    ];
    const evasions = enabledEvasions.map(e => require(`puppeteer-extra-plugin-stealth/evasions/${e}`));
    const stealth = {
        callbacks: [],
        async evaluateOnNewDocument(...args) {
            this.callbacks.push({ cb: args[0], a: args[1] })
        }
    }
    evasions.forEach(e => e().onPageCreated(stealth));
    for (let evasion of stealth.callbacks) {
        await browserContext.addInitScript(evasion.cb, evasion.a);
    }

    await page.goto("https://bot.sannysoft.com");
    await page.waitForTimeout(1000);
    await page.screenshot({ path: "screenshot-sannysoft.jpg", fullPage: true });

    await page.goto("https://abrahamjuliot.github.io/creepjs/");
    await page.waitForTimeout(1000);
    await page.screenshot({ path: "screenshot-creepjs.jpg", fullPage: true });

    await page.goto("http://f.vision/");
    await page.waitForTimeout(1000);
    await page.screenshot({ path: "screenshot-fvision.jpg", fullPage: true });

    await page.goto("https://pixelscan.net/");
    await page.waitForTimeout(1000);
    await page.screenshot({ path: "screenshot-pixelscan.jpg", fullPage: true });

    // await browserContext.waitForEvent("close");
    await browser.close();
})();

[Screenshots: screenshot-creepjs, screenshot-fvision, screenshot-pixelscan, screenshot-sannysoft]

@floppabro1337

@maiux I've also been using this hack for my program since berstend doesn't seem to have time/interest in updating it.

@berstend
Owner Author

berstend commented Sep 13, 2021

hey @berstend! hope all is well, i was just wondering when we can expect to use newer versions of playwright with this

📙 TL;DR: Progress on the switch to the new codebase had stalled but we're back at it now.


A little more context:

Apologies for the delay on this - puppeteer unfortunately breaking TypeScript typings a while back took the wind out of the sails of the planned release of the new branch and I've been waiting a bit for the dust to settle. 😅

Given the project's popularity I'm a bit cautious about replacing the old versions until I'm satisfied it'll be a smooth and backwards compatible transition for everyone, hence we haven't made the switch yet :)

I haven't updated the @next packages in the meantime as the packaging/deployment of those is a bit brittle and cumbersome (our monorepo tool lerna unfortunately fails to resolve their dependencies automatically, which means I need to bump all internal dependencies manually)

I'm not a huge fan of the current limbo situation though and want us to switch to the new codebase as soon as possible.

Things I have on my shortlist in this regard:

  • Figure out the best way to deal with typings in our packages. peerDependencies are a mess: if we don't ship types as a dependency, regular JS users on pptr < v5 don't get Intellisense hints; if we do ship a specific version, TS users with a different pptr/pw version might run into conflicts; and puppeteer switched to TS/built-in types themselves a while ago.
  • Backport some recent changes made in the old recaptcha plugin to the new @extra/recaptcha
  • Optimize the plugin API to allow for easy script injection in workers as well
  • See if I can find usage numbers on older puppeteer versions, dropping support for some older versions would make the migration a lot easier

Regarding playwright + stealth: The "hacks" discussed here are fine 😄 Unfortunately they only cover JS based evasions and don't handle launch args or, more importantly, CDP commands, which is the main issue I ran into when working on the playwright stealth port. Playwright only allows creating a new CDP session whereas we need to hook into the existing one. I did however find a promising workaround I'm currently fleshing out, so a stealth plugin with full playwright support is on the horizon again. :)
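For reference, the route Playwright does offer looks roughly like the sketch below. It opens a separate, new CDP session for a page (Chromium only), which is exactly the limitation described above, so it is not the workaround being fleshed out. `overrideUserAgent` is a hypothetical helper name, and the chosen CDP command is just an example.

```javascript
// Sketch: send a CDP command through a *new* session (Chromium only).
// `overrideUserAgent` is a hypothetical helper, not part of any plugin.
async function overrideUserAgent(context, page, userAgent) {
  // Playwright creates a fresh CDP session attached to this page
  const session = await context.newCDPSession(page)
  // Network.setUserAgentOverride is a standard CDP method
  await session.send('Network.setUserAgentOverride', { userAgent })
  return session
}
```

Usage would be along the lines of `await overrideUserAgent(browserContext, page, customUserAgent)` after creating the page.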

@andrisi

andrisi commented Sep 22, 2021

@berstend have you tried to add a feature request to playwright? they're very responsive and open about their development and what could or couldn't be done. Access to CDP sessions or whatever else you miss.

@Osiris-Team

@berstend That's great news! Just wanted to say thank you in the name of all the people using this software! So yeah thanks for the great and open source work, we all appreciate it very much!

@j3lev

j3lev commented Dec 29, 2021

Hey there, is there any chance the playwright dependency can be moved up to the latest? The playwright-core dependency is 9 minor versions behind?

@ya-mouse

Would be great to bump playwright-core dependency to 1.18.0

@1nVitr0

1nVitr0 commented Feb 10, 2022

@berstend Judging by the NPM downloads of puppeteer, a large number of people seem to be staying on puppeteer@5 (and puppeteer@1 for some reason). I'm one of them, but for me this is only because puppeteer-extra is not compatible with puppeteer versions >=6.

[Image: puppeteer-versions]

I can't speak for anyone else, but I do think the majority of users would be fine with dropping support for puppeteer < 6, or using an older version of puppeteer-extra if they really need it (I've been using the current version of puppeteer-extra just fine, but I would love to update).

I realize that puppeteer breaking their typings must be really frustrating. And their issue mess is probably not helping. If we can help you with any specific tasks that need doing, let us know. I'm sure a few people would love to help (including me), but don't want to interfere with the upgrade process.

@jv1968

jv1968 commented Feb 21, 2022

@maiux thank you for sharing your code, it was quite helpful! That being said, the browser seems to have a Trust Score of 0% when visiting https://abrahamjuliot.github.io/creepjs/.
Do you know any ways to circumvent that?

@aus10code

What's the current status of stealth in playwright? Have the CSP issues been resolved? I've been digging to find the answer to no avail.

@paambaati

paambaati commented Mar 2, 2022

Playwright only allows creating a new CDP session whereas we need to hook into the existing one.

@berstend FWIW, their documentation includes a connectOverCDP method that seems to be doing what you describe.

@andrisi

andrisi commented Mar 3, 2022

@berstend you can patch the Playwright source, or fork it. It's quite easy to expose the CDP session for Chromium browsers. Are you really just stuck on this? Shall we help? It would be magical to have your extension for Playwright, which has a much friendlier API than Puppeteer.

@berstend berstend unpinned this issue Jun 20, 2022
@dilame

dilame commented Jun 20, 2022

Wow, seems like we have @berstend back! Can't wait to find out what "unpinned this issue" means 😄

@b5414

b5414 commented Jun 20, 2022

LETS GOOOOOOOOO

@berstend
Owner Author

berstend commented Jun 29, 2022

Wow, seems like we have @berstend back! Can't wait to find out what "unpinned this issue" means 😄

Quick update regarding playwright support 😄

I reflected on why I never finished the automation-extra branch and came to the following realizations:

  • A massive rewrite like this is a nightmare to merge in, especially with a project that's used in production by many
  • While the new code was in beta mode the regular plugin development did not stop and I had essentially doubled my workload by having to keep the old and the new plugins (supporting both playwright & puppeteer) in sync
  • Bad timing: Typings are already tricky for a version-agnostic plugin framework, it didn't help that puppeteer switched from @types/puppeteer to their built-in (and initially broken) types
  • Playwright's APIs kept diverging from puppeteer as time went on, in addition they made things less "hacker friendly" (client/server split, custom wire protocol, overzealous input validation, using exports in their package.json which prevents monkey patching, etc)

Instead I decided to follow a more iterative approach:

  • No complete rewrite of the whole project or sharing code with puppeteer-extra (for the moment); playwright-extra is its own thing, which makes rolling it out much easier
  • No new shared plugin base class for now
    • Looking at download numbers the main plugins of interest are stealth & recaptcha
    • I've worked out a "compatibility shim" that allows loading in these major puppeteer-extra plugins without changes into playwright-extra
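To illustrate what such a shim can mean in the simplest case, here is a minimal sketch. It is an assumption about the general idea only, not the actual playwright-extra implementation: `asPuppeteerPage` is a hypothetical name, and it maps just one Puppeteer method (`evaluateOnNewDocument`) onto Playwright's `addInitScript`.

```javascript
// Hypothetical shim: make a Playwright page answer to a Puppeteer-style method name.
// Only evaluateOnNewDocument is translated; everything else is passed through.
function asPuppeteerPage(playwrightPage) {
  return new Proxy(playwrightPage, {
    get(target, prop) {
      if (prop === 'evaluateOnNewDocument') {
        return (pageFunction, ...args) => {
          // Playwright's addInitScript takes a single argument, Puppeteer allows several
          if (args.length > 1) {
            throw new Error('addInitScript accepts a single argument')
          }
          return target.addInitScript(pageFunction, args[0])
        }
      }
      const value = Reflect.get(target, prop)
      // Bind methods so they keep the original page as `this`
      return typeof value === 'function' ? value.bind(target) : value
    }
  })
}
```

A plugin written against Puppeteer could then be handed `asPuppeteerPage(page)` in its page hook; anything that relies on CDP or other Puppeteer-only APIs would still need separate handling.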

While working on this I've also found solutions to quite a few long standing issues around types ("how can we use playwright types internally without imposing a specific version on the user", "how to re-export top-level module exports like playwright.devices without shipping with a specific version of it") and other things

The existing stealth and recaptcha plugins are already working well (even with Firefox & Webkit 🎉) and most of the explorative code is done. I'm now working on cleanup, tests and documentation and should be able to release this quite soon and without any potential side-effects (it's just a single new package: playwright-extra)

TL;DR: Instead of a complete rewrite with a new shared plugin framework we start with a playwright-extra version that is compatible with the majority of puppeteer-extra plugins 😄

image

playwright-extra using a puppeteer compatibility layer to load in puppeteer-extra-plugin-recaptcha to solve captchas in webkit 😁

@michelgammelgaard

@berstend Sounds great! Stealth for Playwright would be very useful (read: 100% necessary) in one of our projects.

Do you have any kind of ETA on this release? No pressure 😁

This was referenced Jul 1, 2022
@berstend
Owner Author

berstend commented Jul 3, 2022

Do you have any kind of ETA on this release? No pressure 😁

I'll do you one better (than an ETA) by just releasing it 😄

Successfully published:
 - playwright-extra@3.3.2

Readme: https://github.com/berstend/puppeteer-extra/tree/master/packages/playwright-extra

Feedback welcome!
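A rough sketch of how the released package can be used with the existing stealth plugin, assuming the `chromium.use(plugin)` pattern from the linked readme; please check the readme for the authoritative usage. `screenshotWithStealth` is a hypothetical helper, and the URL and file name in the usage comment are placeholders.

```javascript
// Sketch only; assumes: yarn add playwright playwright-extra puppeteer-extra-plugin-stealth
// `screenshotWithStealth` is a hypothetical helper, not part of playwright-extra.
async function screenshotWithStealth(chromium, stealthPlugin, url, path) {
  // Register the puppeteer-extra plugin on the playwright-extra launcher
  chromium.use(stealthPlugin)
  const browser = await chromium.launch({ headless: true })
  try {
    const page = await browser.newPage()
    await page.goto(url)
    await page.screenshot({ path, fullPage: true })
  } finally {
    // Always close the browser, even if navigation fails
    await browser.close()
  }
}

// Usage (assumes the packages above are installed):
// const { chromium } = require('playwright-extra')
// const stealth = require('puppeteer-extra-plugin-stealth')()
// screenshotWithStealth(chromium, stealth, 'https://bot.sannysoft.com', 'stealth.png')
```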

@eliassorensen

That's amazing @berstend ! Will test it out.

@NikolaiWaerpen

Hey, what's the state of development here? Love the work!

@xfm18901105

Is there a plan to support playwright java or python?

@andrisi

andrisi commented Jan 4, 2023

@xfm18901105 highly unlikely, as it would be a waste of its developer's time. It's a special purpose tool; if you want to use it, be grateful it exists, write the relevant code in JavaScript and the rest in whatever you like.

@xfm18901105

Thank you, I'll try to load the JavaScript through the Java bindings.
