Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

thoughts on a more modular design #2

Closed
transitive-bullshit opened this issue May 26, 2018 · 13 comments · Fixed by #81
Closed

thoughts on a more modular design #2

transitive-bullshit opened this issue May 26, 2018 · 13 comments · Fixed by #81
Labels
enhancement New feature or request idea

Comments

@transitive-bullshit
Copy link
Contributor

I'm definitely onboard with the core idea behind this module, and I have several additional use cases I'd like to add, but there are a few things currently stopping me from jumping in:

  1. Overriding puppeteer methods like newPage seems like it could lead to trouble; why not just listen to the built-in Browser event targetcreated which would do the same thing but in a less intrusive way?
  2. If we continue piling distinct functionality into this module, imho it'll become confusing and unwieldy pretty fast. I'm also hesitant to include code mods by default that I'm not going to make use of (even if they're not enabled).

This seems like a perfect use case for a plugin pattern, where the only addition to the public puppeteer API is a single use method that takes in a plugin and adds it to a private list of registered plugins. PuppeteerExtra.launch would then create a new browser and run it through each of the registered plugins to add whatever hooks / functionality that plugin wants to the browser instance.

The only special case here is plugins that want to affect the functionality / params of puppeteer.launch, so any registered plugin which defines a launch method would be allowed to edit the launch params before creating the browser instance.

const puppeteer = require('puppeteer-extra')
puppeteer.use(require('puppeteer-extra-plugin-stealth')())
puppeteer.use(require('puppeteer-extra-plugin-user-preferences')({ customPrefs: ... }))
puppeteer.use(require('puppeteer-extra-plugin-extension')('/path/to-extension/...'))

const browser = await puppeteer.launch({ ... })

This is a very similar pattern to how passport / express / babel work, and it scales really nicely once you get the base puppeteer-extra and puppeteer-extra-plugin interfaces defined.

Looking forward to hearing your thoughts :)

@berstend
Copy link
Owner

Thanks for the thoughtful feedback, much appreciated! :)

I agree on all points, this was basically the direction I wanted this module to go but then a deadline came up. 😅 But potential feedback like this was the reason I already published the current version, despite not being too happy with it's current form.

I must say I'm very impressed by how modular and clean your modules are (e.g. captcha-solver), a structure like this would be ideal to make puppeteer-extra very extensible and maintainable.

The only concern I have is that the event listeners exposed by Puppeteer might not be sufficient for all use cases and some monkey patching might still be required, but this can definitely be avoided in most common scenarios.

Let me look into this a little bit, will take inspiration from your packages and test out some concepts and make a new branch as a basis for discussions. :)

Thanks!

@berstend berstend added the enhancement New feature or request label May 26, 2018
@transitive-bullshit
Copy link
Contributor Author

Sounds good -- yeah, I hear you about the event listeners, so as long as we ensure that plugins are capable of monkey patching I think it should be fine.

Happy to help out with this btw once you have your mind set on a direction.

@berstend
Copy link
Owner

Alright, I gave it a first shot: https://github.com/berstend/puppeteer-extra/tree/rewrite

const puppeteer = require('puppeteer-extra')

puppeteer.use(require('puppeteer-extra-plugin-font-size')())
puppeteer.use(require('puppeteer-extra-plugin-anonymize-ua')({
  customFn: (ua) => 'MyCoolAgent/' + ua})
)

const browser = await puppeteer.launch({headless: false})

As you can tell the project structure is shamelessly stolen heavily inspired by your recent projects 😄 Also thanks for update-markdown-jsdoc, this saves quite a lot of keystrokes. 👍

There is currently no monkey patching involved, puppeteers event listeners where more powerful than I thought.

In terms of how things are set up:

Main module

  • The job of puppeteer-extra is to provide a drop in replacement for puppeteer, as well as to provide a clean interface for plugins to register themselves.
  • Other than that it should be kept as minimal as possible, besides some sanity/dependency checks and event dispatching to plugins

Plugins:

  • Whenever puppeteer.use() is called the plugin is being added to a local array
  • I wanted typical plugins to be as terse/DSL-like as possible, so for convenience they don't have to hook into puppeteer events themselves (they still can though) but it's sufficient if they expose specific methods, puppeteer-extra will call them when they exist (in sequence)
  • I looked into regular event emitters as well, as opposed to looping through properties, but given that everything is async, should be called in sequence and sometimes with return values they didn't seem like a good fit

Caveats:

  • Right now plugins extend the PuppeteerExtraPlugin base class, that doesn't do much currently. Could still be useful to have plugins derive from it in the future.
  • I'm not sure where the best place is to document the convenience methods/APIs that plugin developers can use separately, using JSDoc. I can't really define them in the base class or the check-if-property-exists approach wouldn't work 😄
  • Having a custom userDataDir is a pretty common plugin requirement, hence I added checks for that in the main module and implicitly require the user-data-dir plugin dynamically if needed and not already registered as plugin by the user
  • I had to add some logic regarding userPreferences to the main module, as all plugins can define some and they only should be merged/written once. I tried to make that as generic as possible, the idea of files (currently only consumed by the user-data-dir plugin) could be extended later to support arbitrary file operations through plugins

Happy to hear your thoughts on this, these fundamental concepts are hard to change later when others are developing plugins as well 😄

@transitive-bullshit
Copy link
Contributor Author

This is really awesome @berstend -- definitely in line with what I was thinking!

A few comments / thoughts:

  • Really like the use of the debug module to scope logging
  • I feel like a better approach for checkPluginDependencies would be for the plugin to check for dependencies upon being added via use. So you could move the current checkPluginDependencies to PuppeteerExtraPlugin base class and make it agnostic to take in a list of dependencies that the child specifies. This would make it so dependencies could require certain other dependencies and also make it not hard-coded to the user-data-dir scenario.
  • I don't think plugin instances should record the instance of 'puppeteer' or 'browser', keeping them as stateless as possible.
  • I'm guessing you're going to readd puppeteer-extra-plugin-stealth? I'm guessing this one'll be really popular :)

@berstend
Copy link
Owner

Thanks for the feedback @transitive-bullshit :)

  • I refactored things once more, as I wasn't too happy with how much the PuppeteerExtra class knew about the puppeteer internals (like events). 😄
  • I moved all the event handling stuff to the PuppeteerExtraPlugin class, which makes a whole lot more sense. Also now the JSDoc really shines and I'm quite happy with the how the plugin API looks.
  • I removed all puppeteer/instance instance passing. Everything is stateless and isolated now, with the small exception of the getDataFromPlugins method. Something like this was necessary to share data between plugins. But using that method requires a special permission and the interface is very clean and agnostic, so I think that's fine.
  • Dependency checking is now generic and mostly handled by the PuppeteerExtraPlugin class, the main class will only try to require the missing plugins (in an agnostic way). The plugin API interface to specify dependencies is also pretty clean (e.g. this plugin will require this one, which will require this one 😄 )

This was a fun exercise. It's interesting to balance the trade-offs between making the plugin framework as agnostic as possible and still being a little opinionated to make the typical 90% use cases very simple (while the other 10% still possible of course).

Would be great if you have time for another look, I think the current design should work well and stay maintainable but I'm happy about feedback from a potential plugin developer. :)

I'm guessing you're going to readd puppeteer-extra-plugin-stealth? I'm guessing this one'll be really popular :)

Yeah definitely, this was one of the main reasons to start the project 😄

@berstend
Copy link
Owner

The new puppeteer-extra is published. 😄🎉

Writing the remaining plugins was a breeze with the new modular design. Especially in the new stealth plugin this really shines, as it is itself just requiring separate plugins on demand using the agnostic dependency system.

Thanks again for the suggestion, I feel I improved as a developer through that exercise. 👍

Closing this PR for now, let me know when you have further feedback/suggestions. :)

@transitive-bullshit
Copy link
Contributor Author

Really happy with how this turned out as well :) I'll definitely be submitting some PRs for more plugins in the coming weeks.

Keep up the great work!

@m59peacemaker
Copy link

There are some really helpful plugins here. Definitely cool stuff. Isn't it the case that when you call puppeteer.use, you are applying that plugin to anything in the application that requires puppeteer-extra? Have you looked into having use return a puppeteer object that has the plugin applied?

const myPuppeteer = puppeteer
  .use(stealth)
  .use(anonymizeUA)

const myOtherPuppeteer = puppeteer
  .use(recaptcha)

const myBrowser = await myPuppeteer.launch()
const myOtherBrowser = await myOtherPuppeteer.launch()

@berstend
Copy link
Owner

Hey @m59peacemaker - thanks for using this relevant PR for your comment 😄👍

You are correct, the current behaviour is not really intuitive:

const puppeteerExtra1 = require("puppeteer-extra")
const puppeteerExtra2 = require("puppeteer-extra")
const puppeteerVanilla = require("puppeteer")

const checkForBeer = name => async browser => {
  const page = await browser.newPage()
  await page.goto("https://httpbin.org/headers", {
    waitUntil: "domcontentloaded"
  })
  const content = await page.content()
  console.log(`- ${name}: User agent includes beer:`, content.includes("Beer"))
  await browser.close()
}

const UserAgentPlugin = require("puppeteer-extra-plugin-anonymize-ua")
const uaPlugin = UserAgentPlugin({
  customFn: ua => ua.replace("Chrome", "Beer")
})

puppeteerExtra1.use(uaPlugin)

async function main() {
  await puppeteerExtra1
    .launch({ headless: true })
    .then(checkForBeer("puppeteerExtra1 (beer)"))
  await puppeteerExtra2
    .launch({ headless: true })
    .then(checkForBeer("puppeteerExtra2 (no beer?)"))
  await puppeteerVanilla
    .launch({ headless: true })
    .then(checkForBeer("puppeteerVanilla (no beer?)"))
}
main()

Output:

❯❯❯ node foo.js
- puppeteerExtra1 (beer): User agent includes beer: true
- puppeteerExtra2 (no beer?): User agent includes beer: true
- puppeteerVanilla (no beer?): User agent includes beer: false

It would be a good idea to return a new instance here, so composition of different puppeteer browser configs are possible.

The culprit for that behavior can be found here, we already instantiate a PuppeteerExtra instance for the user on require, which then gets cached.

The reason for that was to keep the usage as simpel as possible and to imitate how puppeteer is being used (being a "drop-in replacement" and all). For them that works as there's no modifiers on a puppeteer level, but rather the browser instance.

I'm wondering what would be the nicest API here (in terms of developer experience). I'm also growing a distaste of default exports the more I move to typescript, so this would be a good chance to overhaul the API signature of the whole project in general. 😄

Also in our pursuit to "hide" class constructors (and not having the developer to use the pesky new keyword) we use a function to instantiate classes, which makes it impossible to e.g. extend those classes.

Happy to hear thoughts on what the perfect API signature would look like to others and overhaul this in a major release.

@berstend berstend reopened this Feb 24, 2019
@berstend
Copy link
Owner

FWIW chaining .use() is already implemented. 😄

@berstend berstend added the idea label Feb 24, 2019
@m59peacemaker
Copy link

As far as the drop in replacement concern, I think instantiating the drop in replacement is reasonable for most people. Also having the instantiated version somewhere is an option.

// original:
const puppeteer = require('puppeteer')
puppeteer.launch()

// some drop-in possibilities:
const puppeteer = require('puppeteer-extra')()
const puppeteer = require('puppeteer-extra/drop-in')

I'm glad you're into the suggestion!

@berstend
Copy link
Owner

berstend commented Feb 25, 2019

Yeah definitely! This project is actively maintained and I'm always happy to improve things based on suggestions. If I had all the time in the world I would rewrite most things anyway (with now improved domain knowledge) 😄

To keep your suggestion backwards compatible (not a hard requirement for a future solution):

// Current behaviour, single instance (most common use-case?)
const puppeteer = require('puppeteer-extra')

// New addition to support instantiating multiple puppeteer-extra's
const puppeteer = require('puppeteer-extra/multi')()

To throw another possibility into the mix (that would emphasize the lightweight nature of this project):

const puppeteer = require('puppeteer')
const { makeExtra } = require('puppeteer-extra')

makeExtra(puppeteer) // or `enableExtras()`

puppeteer
  .use(stealth)
  .use(anonymizeUA)

or

const vanillaPuppeteer = require('puppeteer')
const { makeExtra } = require('puppeteer-extra')

const puppeteer = makeExtra(vanillaPuppeteer)
  .use(stealth)
  .use(anonymizeUA)

or

const { makePuppeteer } = require('puppeteer-extra')

const puppeteer = makePuppeteer()
  .use(stealth)
  .use(anonymizeUA)

I've noticed that certain cloud environments (apify, etc.) come with puppeteer pre-shipped and require it automatically, this approach would make it possible to still use puppeteer-extra then.

I'm gonna think about this for a little bit, also in regards to #21 and look at how similar projects approach that.

In case multi puppeteer-extra is desperately needed, here's a workaround. 😄

@berstend
Copy link
Owner

berstend commented Nov 29, 2019

This is now supported 😄

const { addExtra } = require('puppeteer-extra')
const puppeteer = addExtra(require('puppeteer-firefox'))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request idea
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants