adityawankhede5/browserless

 
 


browserless


A puppeteer-like Node.js library for interacting with Headless production scenarios.

Why

Although you might think puppeteer is enough, there is a set of use cases that make sense to build on top of puppeteer and that are necessary to support robust production scenarios, such as:

  • Sensible defaults, aborting unnecessary requests based on what you are doing (e.g., aborting image requests if you just want to get the HTML content).
  • Easily create a pool of instances (via @browserless/pool).
  • Built-in adblocker for aborting ad requests.
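To make the first point more concrete, here is a minimal sketch (not browserless's actual code) that decides whether to abort a request from its puppeteer resource type and the task at hand. The task names and the BLOCKED_RESOURCES table are hypothetical choices; the comment shows where such a function could plug into puppeteer's request interception.

```javascript
// Hypothetical table: which puppeteer resource types each task can skip.
// These are illustrative choices, not browserless's real defaults.
const BLOCKED_RESOURCES = {
  html: new Set(['image', 'media', 'font']),
  text: new Set(['image', 'media', 'font', 'stylesheet']),
  screenshot: new Set([]) // a screenshot needs every visual resource
}

// Returns true when a request of `resourceType` is unnecessary for `task`.
function shouldAbort (task, resourceType) {
  const blocked = BLOCKED_RESOURCES[task]
  return blocked !== undefined && blocked.has(resourceType)
}

// Possible wiring with puppeteer's request interception:
//   await page.setRequestInterception(true)
//   page.on('request', req =>
//     shouldAbort('html', req.resourceType()) ? req.abort() : req.continue())

console.log(shouldAbort('html', 'image')) // true
console.log(shouldAbort('screenshot', 'image')) // false
```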

Install

browserless is built on top of puppeteer, so you need to install it as well.

$ npm install puppeteer browserless --save

You can use browserless together with puppeteer, puppeteer-core or puppeteer-firefox.

Internally, the library is divided into different packages based on functionality.

Usage

The browserless API is similar to puppeteer's, but does a few more things under the hood (not too much, I promise).

For example, if you want to take a screenshot, just do:

const browserless = require('browserless')()

browserless
  .screenshot('http://example.com', { device: 'iPhone 6' })
  .then(buffer => {
    console.log(`your screenshot is here!`)
  })

You can see more common recipes at @browserless/examples.

Basic

All methods follow the same interface:

  • <url>: The target URL. It's required.
  • [options]: Specific settings for the method. It's optional.

Each method returns a Promise, or uses a Node.js-style callback if you pass a function as the last parameter.
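A common way to support both styles is to wrap a Promise-returning function and check whether the last argument is a function. The sketch below shows that pattern; withCallback and getText are hypothetical names, not part of browserless.

```javascript
// Hypothetical helper (not browserless's code): wraps a Promise-returning
// function so it also accepts an optional Node.js-style callback last.
function withCallback (fn) {
  return function (...args) {
    const last = args[args.length - 1]
    // No callback: behave like a normal Promise-returning function.
    if (typeof last !== 'function') return fn(...args)

    const callback = args.pop()
    fn(...args).then(
      // process.nextTick runs the callback outside the promise chain, so an
      // exception thrown inside the callback is not swallowed as a rejection.
      result => process.nextTick(callback, null, result),
      error => process.nextTick(callback, error)
    )
  }
}

// A fake async method standing in for something like browserless.text
const getText = withCallback(async url => `text of ${url}`)

getText('https://example.com').then(text => console.log('promise:', text))
getText('https://example.com', (error, text) => {
  if (error) throw error
  console.log('callback:', text)
})
```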

.constructor(options)

It creates the browser instance using the puppeteer.launch method.

// Creating a simple instance
const browserless = require('browserless')()

Or pass specific launcher options:

// Creating an instance for running it at AWS Lambda
const browserless = require('browserless')({
  ignoreHTTPSErrors: true,
  args: [
    '--disable-gpu',
    '--single-process',
    '--no-zygote',
    '--no-sandbox',
    '--hide-scrollbars'
  ]
})

options

See puppeteer.launch#options.

Additionally, you can set up:

timeout

type: number
default: 30000

It changes the default maximum navigation time, in milliseconds.

puppeteer

type: Puppeteer
default: puppeteer|puppeteer-core|puppeteer-firefox

It's automatically detected from your dependencies; the supported packages are puppeteer, puppeteer-core and puppeteer-firefox.

Alternatively, you can pass it explicitly.
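Detection like this usually means trying each candidate package in turn and using the first one that resolves. Here is a minimal sketch of that idea; the candidate order and the detectPuppeteer name are assumptions, not browserless's actual implementation. The loader is injectable so it can be exercised without any puppeteer installed.

```javascript
// Hypothetical sketch: pick the first puppeteer flavour that can be resolved.
// The candidate order is an assumption, not necessarily browserless's own.
const CANDIDATES = ['puppeteer', 'puppeteer-core', 'puppeteer-firefox']

function detectPuppeteer (load = require) {
  for (const name of CANDIDATES) {
    try {
      return { name, module: load(name) }
    } catch (error) {
      // Only "module not found" means "try the next one"; rethrow real errors.
      if (error.code !== 'MODULE_NOT_FOUND') throw error
    }
  }
  throw new Error(`None of ${CANDIDATES.join(', ')} is installed`)
}

// Usage in a project with one of them installed:
//   const { name, module: puppeteer } = detectPuppeteer()
```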

incognito

type: boolean
default: false

Every time a new page is created, it will be an incognito page.

An incognito page will not share cookies/cache with other browser pages.

.html(url, options)

It serializes the content from the target URL into HTML.

const browserless = require('browserless')()

;(async () => {
  const url = 'https://example.com'
  const html = await browserless.html(url)
  console.log(html)
})()

options

This method uses the following options by default:

{
  disableAnimations: false
}

See browserless.goto for all the supported options and values.

.text(url, options)

It serializes the content from the target URL into plain text.

const browserless = require('browserless')()

;(async () => {
  const url = 'https://example.com'
  const text = await browserless.text(url)
  console.log(text)
})()

options

This method uses the following options by default:

{
  disableAnimations: false
}

See browserless.goto for all the supported options and values.

.pdf(url, options)

It generates a PDF version of the website at the target URL.

const browserless = require('browserless')()

;(async () => {
  const url = 'https://example.com'
  const buffer = await browserless.pdf(url)
  console.log(`PDF generated!`)
})()

options

This method uses the following options by default:

{
  disableAnimations: true,
  margin: '0.35cm',
  media: 'print',
  printBackground: true,
  scale: 0.65
}

See browserless.goto for all the supported options and values.

Also, any page.pdf option is supported.

Additionally, you can set up:

margin

type: string | object
default: '0.35cm'

It sets the paper margins. Supported units are:

  • px for pixel.
  • in for inches.
  • cm for centimeters.
  • mm for millimeters.

You can pass an object specifying each side of the paper:

;(async () => {
  const buffer = await browserless.pdf(url.toString(), {
    margin: {
      top: '0.35cm',
      bottom: '0.35cm',
      left: '0.35cm',
      right: '0.35cm'
    }
  })
})()

Or, if you pass a string, it will be used for all sides:

;(async () => {
  const buffer = await browserless.pdf(url.toString(), {
    margin: '0.35cm'
  })
})()
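Both forms boil down to the per-side object that puppeteer's page.pdf accepts for margin ({ top, right, bottom, left }). A minimal sketch of that normalization, where normalizeMargin is a hypothetical helper and not a browserless export:

```javascript
// Hypothetical helper: normalise the `margin` option into the per-side object
// shape that puppeteer's page.pdf expects ({ top, right, bottom, left }).
function normalizeMargin (margin) {
  if (typeof margin === 'string') {
    // A single string applies to all four sides.
    return { top: margin, right: margin, bottom: margin, left: margin }
  }
  // An object is passed through; missing sides stay undefined.
  const { top, right, bottom, left } = margin
  return { top, right, bottom, left }
}

console.log(normalizeMargin('0.35cm'))
// { top: '0.35cm', right: '0.35cm', bottom: '0.35cm', left: '0.35cm' }
```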

.screenshot(url, options)

It takes a screenshot of the target URL.

const browserless = require('browserless')()

;(async () => {
  const url = 'https://example.com'
  const buffer = await browserless.screenshot(url)
  console.log(`Screenshot taken!`)
})()

options

This method uses the following options by default:

{
  disableAnimations: true,
  device: 'macbook pro 13'
}

See browserless.goto for all the supported options and values.

Also, any page.screenshot option is supported.

Additionally, you can set up:

click

type: string | string[]

Click the DOM element matching the given CSS selector.

element

type: string

Capture the DOM element matching the given CSS selector. It will wait for the element to appear in the page and to be visible.

hide

type: string | string[]

Hide DOM elements matching the given CSS selectors.

This can be useful for cleaning up the page.

;(async () => {
  const buffer = await browserless.screenshot(url.toString(), {
    hide: ['.crisp-client', '#cookies-policy']
  })
})()

This sets visibility: hidden on the matched elements.
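One way to achieve this is to build a single CSS rule from the selectors and inject it as a style tag. The sketch below only builds the rule; hideRule is a hypothetical name, and whether browserless injects CSS or sets inline styles is not stated here.

```javascript
// Hypothetical sketch: turn the `hide` option (a selector or a list of
// selectors) into a CSS rule that sets `visibility: hidden` on the matches.
function hideRule (hide) {
  const selectors = [].concat(hide) // accept string | string[]
  if (selectors.length === 0) return ''
  return `${selectors.join(', ')} { visibility: hidden; }`
}

// The rule could then be injected with puppeteer, e.g.:
//   await page.addStyleTag({ content: hideRule(['.crisp-client', '#cookies-policy']) })
console.log(hideRule(['.crisp-client', '#cookies-policy']))
// .crisp-client, #cookies-policy { visibility: hidden; }
```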

modules

type: string | string[]

Inject JavaScript modules into the page.

Accepts an array of inline code, absolute URLs, and local file paths (must have a .js extension).

;(async () => {
  const buffer = await browserless.screenshot(url.toString(), {
    modules: [
      'https://cdn.jsdelivr.net/npm/@microlink/mql@0.3.12/src/browser.js',
      'local-file.js',
      `document.body.style.backgroundColor = 'red'`
    ]
  })
})()
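Since a single array mixes three kinds of entries, each one has to be classified before injection. A minimal sketch of that classification, assuming an absolute http(s) URL wins first, then the file extension; classifyEntry and isAbsoluteUrl are hypothetical names. The comment maps each kind to the url, path and content options of puppeteer's page.addScriptTag.

```javascript
// Hypothetical sketch: decide how an entry of `modules`/`scripts`/`styles`
// should be injected. `ext` is '.js' for modules and scripts, '.css' for styles.
// With puppeteer, the kinds map to page.addScriptTag({ url }), ({ path }) or
// ({ content }), adding type: 'module' for the `modules` option.
function classifyEntry (value, ext) {
  if (isAbsoluteUrl(value)) return 'url'
  if (value.endsWith(ext)) return 'file'
  return 'inline'
}

function isAbsoluteUrl (value) {
  try {
    const { protocol } = new URL(value)
    return protocol === 'http:' || protocol === 'https:'
  } catch {
    return false // `new URL` throws on relative paths and plain code
  }
}

const entries = [
  'https://cdn.jsdelivr.net/npm/@microlink/mql@0.3.12/src/browser.js',
  'local-file.js',
  "document.body.style.backgroundColor = 'red'"
]
console.log(entries.map(entry => classifyEntry(entry, '.js')))
// [ 'url', 'file', 'inline' ]
```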
scripts

type: string | string[]

Same as the modules option, but injects the code as <script> instead of <script type="module">. Prefer the modules option whenever possible.

;(async () => {
  const buffer = await browserless.screenshot(url.toString(), {
    scripts: [
      'https://cdn.jsdelivr.net/npm/jquery@3.4.1/dist/jquery.min.js',
      'local-file.js',
      `document.body.style.backgroundColor = 'red'`
    ]
  })
})()
styles

type: string | string[]

Inject CSS styles into the page.

Accepts an array of inline code, absolute URLs, and local file paths (must have a .css extension).

;(async () => {
  const buffer = await browserless.screenshot(url.toString(), {
    styles: [
      'https://cdn.jsdelivr.net/npm/hack@0.8.1/dist/dark.css',
      'local-file.css',
      `body { background: red; }`
    ]
  })
})()
scrollTo

type: string | object

Scroll to the DOM element matching the given CSS selector.

overlay

type: object

After the screenshot has been taken, this option allows you to place the screenshot into a fancy overlay.

You can configure the overlay by specifying:

  • browser: the browser frame image to use for the overlay. Supported values are light and dark.

  • background: the background to use. Supported values are:

    • A hexadecimal/rgb/rgba color code, e.g. #c1c1c1.
    • A CSS gradient, e.g. linear-gradient(225deg, #FF057C 0%, #8D0B93 50%, #321575 100%).
    • An image URL, e.g. https://source.unsplash.com/random/1920x1080.

;(async () => {
  const buffer = await browserless.screenshot(url.toString(), {
    hide: ['.crisp-client', '#cookies-policy'],
    overlay: {
      browser: 'dark',
      background: 'linear-gradient(45deg, rgba(255,18,223,1) 0%, rgba(69,59,128,1) 66%, rgba(69,59,128,1) 100%)'
    }
  })
})()
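To illustrate how one background string can cover those three cases, here is a minimal classifier. The regular expressions are simplified assumptions (for instance, named colors such as red are not recognized), and backgroundKind is a hypothetical name rather than part of browserless.

```javascript
// Hypothetical sketch: classify the overlay `background` value into the three
// kinds listed above. The regular expressions are simplified assumptions.
function backgroundKind (background) {
  if (/^#([0-9a-f]{3}|[0-9a-f]{6})$/i.test(background)) return 'color'
  if (/^rgba?\(/i.test(background)) return 'color'
  if (/gradient\(/i.test(background)) return 'gradient'
  if (/^https?:\/\//i.test(background)) return 'image'
  throw new TypeError(`Unsupported overlay background: ${background}`)
}

console.log(backgroundKind('#c1c1c1')) // color
console.log(backgroundKind('linear-gradient(45deg, rgba(255,18,223,1) 0%, rgba(69,59,128,1) 100%)')) // gradient
console.log(backgroundKind('https://source.unsplash.com/random/1920x1080')) // image
```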

.devices

It contains all the available device presets, making it possible to load viewport and user agent settings from a device descriptor.

These devices are used for emulation purposes. It extends from puppeteer.devices.

.getDevice({ device, viewport, headers })

It gets the settings of a specific device descriptor by its name.

The device name is case-insensitive.

const browserless = require('browserless')

browserless.getDevice({ device: 'Macbook Pro 15' })
// {
//   userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.89 Safari/537.36',
//   viewport: {
//     width: 1440,
//     height: 900,
//     deviceScaleFactor: 2,
//     isMobile: false,
//     hasTouch: false,
//     isLandscape: false
//   }
// }
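Case-insensitive lookup is typically done by indexing the descriptors under a lower-cased name. A minimal sketch, where the single entry copies the 'Macbook Pro 15' values shown above; DESCRIPTORS and findDevice are hypothetical names, and a real table (puppeteer.devices plus extras) would be much larger.

```javascript
// Hypothetical descriptor table with one entry copied from the output above.
const DESCRIPTORS = {
  'Macbook Pro 15': {
    userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.89 Safari/537.36',
    viewport: { width: 1440, height: 900, deviceScaleFactor: 2, isMobile: false, hasTouch: false, isLandscape: false }
  }
}

// Index the table once by lower-cased name so lookups ignore case.
const byLowerName = new Map(
  Object.entries(DESCRIPTORS).map(([name, descriptor]) => [name.toLowerCase(), descriptor])
)

// Returns the descriptor, or undefined when the device is unknown.
function findDevice (name) {
  return byLowerName.get(String(name).toLowerCase())
}

console.log(findDevice('MACBOOK PRO 15').viewport.width) // 1440
```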

Advanced

The following methods are exposed for scenarios where you need more granular control and less magic.

.browser

It returns the internal browser instance, used as a singleton.

const browserless = require('browserless')()

;(async () => {
  const browserInstance = await browserless.browser
})()

.evaluate(fn, gotoOpts)

It exposes an interface for creating your own evaluate function, passing you the page and response.

The fn will receive page and response as arguments:

const browserless = require('browserless')()

const getUrlInfo = browserless.evaluate((page, response) => ({
  statusCode: response.status(),
  url: response.url(),
  redirectUrls: response.request().redirectChain()
}))

;(async () => {
  const url = 'https://example.com'
  const info = await getUrlInfo(url)

  console.log(info)
  // {
  //   "statusCode": 200,
  //   "url": "https://example.com/",
  //   "redirectUrls": []
  // }
})()

Note that you don't need to close the page; it is done under the hood.

Internally, the method performs a browserless.goto, so you can pass extra goto options as the second parameter:

const browserless = require('browserless')()

const getText = browserless.evaluate(
  page => page.evaluate(() => document.body.innerText), {
    waitUntil: 'domcontentloaded',
    disableAnimations: false
  })

;(async () => {
  const url = 'https://example.com'
  const text = await getText(url)

  console.log(text)
})()
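The lifecycle described above (open a page, navigate, run your function, always close the page) can be sketched as a higher-order function. This is an illustration of the pattern, not browserless's source: makeEvaluate is a hypothetical name, and createPage and goto are injected stand-ins so the sketch runs without a browser.

```javascript
// Hypothetical sketch of the evaluate pattern: open a page, navigate, run
// `fn(page, response)` and always close the page. `createPage` and `goto`
// are injected stand-ins, not browserless APIs.
function makeEvaluate ({ createPage, goto }) {
  return (fn, gotoOpts = {}) => async (url, opts = {}) => {
    const page = await createPage()
    try {
      // Per-call options override the defaults given when building `fn`.
      const { response } = await goto(page, { ...gotoOpts, ...opts, url })
      return await fn(page, response)
    } finally {
      await page.close() // closed even if navigation or `fn` throws
    }
  }
}
```

Using try/finally is what lets callers skip closing the page themselves, including when their function throws.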

.goto(page, options)

It performs a smart page.goto, using a built-in adblocker.

const browserless = require('browserless')()

;(async () => {
  const page = await browserless.page()
  const { response, device } = await browserless.goto(page, { url: 'http://example.com' })
})()

options

Any option passed here is forwarded to page.goto.

Additionally, you can set up:

adblock

type: boolean
default: true

It aborts requests detected as ads.
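As a simplified illustration of ad detection, the sketch below treats a request as an ad when its hostname, or any parent domain of it, is on a blocklist. The two blocklist entries and the isAdRequest name are hypothetical; production adblockers typically rely on full filter lists with much richer rules.

```javascript
// Simplified, hypothetical ad matcher based on a domain blocklist.
const BLOCKLIST = new Set(['doubleclick.net', 'googlesyndication.com'])

function isAdRequest (requestUrl, blocklist = BLOCKLIST) {
  let hostname
  try {
    hostname = new URL(requestUrl).hostname
  } catch {
    return false // not an absolute URL, so nothing to match
  }
  const labels = hostname.split('.')
  // Check 'a.b.example.com', then 'b.example.com', then 'example.com'.
  for (let i = 0; i < labels.length - 1; i++) {
    if (blocklist.has(labels.slice(i).join('.'))) return true
  }
  return false
}

console.log(isAdRequest('https://stats.g.doubleclick.net/r/collect')) // true
console.log(isAdRequest('https://example.com/')) // false
```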

device

type: string
default: 'macbook pro 13'

It specifies the device descriptor used to retrieve the userAgent and viewport.

disableAnimations

type: boolean
default: true

Disable CSS animations and transitions.

disableJavaScript

type: boolean
default: false

When true, JavaScript is disabled on the current page.

headers

type: object

An object containing additional HTTP headers to be sent with every request.

const browserless = require('browserless')()

;(async () => {
  const page = await browserless.page()
  await browserless.goto(page, {
    url: 'http://example.com',
    headers: {
      'user-agent': 'googlebot',
      cookie: 'foo=bar; hello=world'
    }
  })
})()
media

type: string
default: 'screen'

Changes the CSS media type of the page using page.emulateMedia.

url

type: string

The target URL.

viewport

It sets up a custom viewport using the page.setViewport method.

waitFor

type: string | function | number
default: 0

It waits for an amount of time, a selector or a function, using page.waitFor.
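Since a single option accepts three types, the value has to be dispatched by type. This sketch returns a plan instead of calling puppeteer so it runs standalone; planWaitFor is a hypothetical name, and the comments name the puppeteer calls each branch would correspond to.

```javascript
// Hypothetical sketch: dispatch a `waitFor` value by its type.
function planWaitFor (waitFor) {
  switch (typeof waitFor) {
    case 'number': // a plain delay in milliseconds (page.waitFor(ms) in older puppeteer)
      return { kind: 'timeout', ms: waitFor }
    case 'string': // a CSS selector (page.waitForSelector)
      return { kind: 'selector', selector: waitFor }
    case 'function': // a predicate evaluated in the page (page.waitForFunction)
      return { kind: 'function', fn: waitFor }
    default:
      throw new TypeError(`Unsupported waitFor value: ${waitFor}`)
  }
}

console.log(planWaitFor('#main').kind) // selector
```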

.page()

It returns a new standalone browser page.

const browserless = require('browserless')()

;(async () => {
  const page = await browserless.page()
})()

Pool of Instances

Internally, browserless uses a singleton browser instance.

If you want to keep multiple browsers open, you can use @browserless/pool package.

const createBrowserless = require('@browserless/pool')

const browserlessPool = createBrowserless({
  max: 2, // max browsers to keep open
  timeout: 30000 // max time a browser is considered fresh
})

You can still pass specific puppeteer options as second argument:

const createBrowserless = require('@browserless/pool')

const browserlessPool = createBrowserless({
  max: 2, // max browsers to keep open
  timeout: 30000 // max time a browser is considered fresh
}, {
  ignoreHTTPSErrors: true,
  args: [
    '--disable-gpu',
    '--single-process',
    '--no-zygote',
    '--no-sandbox',
    '--hide-scrollbars'
  ]
})

After that, the API is the same as browserless:

browserlessPool
  .screenshot('http://example.com', { device: 'iPhone 6' })
  .then(buffer => {
    console.log(`your screenshot is here!`)
  })

Every time you call the pool, it handles acquiring and releasing a browser instance from the pool ✨.
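The acquire/release cycle can be illustrated with a tiny generic pool. This is a minimal sketch, not @browserless/pool's implementation: createPool is a hypothetical name, create stands in for launching a browser, at most max resources are created, and callers that find every resource busy wait in FIFO order.

```javascript
// Minimal, hypothetical pool illustrating acquire/release around each call.
function createPool ({ max, create }) {
  const idle = [] // created resources that nobody is using
  const waiting = [] // resolvers of callers waiting for a resource
  let size = 0 // resources created so far

  async function acquire () {
    if (idle.length > 0) return idle.pop()
    if (size < max) {
      size++
      try {
        return await create()
      } catch (error) {
        size-- // creation failed, so free up the slot again
        throw error
      }
    }
    return new Promise(resolve => waiting.push(resolve))
  }

  function release (resource) {
    const next = waiting.shift()
    if (next) next(resource) // hand the resource straight to a waiter
    else idle.push(resource)
  }

  // Acquire, run `fn`, and always release, even if `fn` throws.
  async function use (fn) {
    const resource = await acquire()
    try {
      return await fn(resource)
    } finally {
      release(resource)
    }
  }

  return { use }
}
```

With max: 2, five concurrent use calls share two resources: the first two run immediately and the other three wait until one is released.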

Packages

browserless is internally divided into multiple packages, so you only use the minimum amount of code necessary for your use case:

  • browserless
  • @browserless/benchmark
  • @browserless/devices
  • @browserless/examples
  • @browserless/goto
  • @browserless/pdf
  • @browserless/pool
  • @browserless/screenshot
  • @browserless/stats

Benchmark

For testing different approaches, we include a tiny benchmark tool called @browserless/benchmark.

FAQ

Q: Why use browserless over Puppeteer?

browserless does not replace puppeteer; it complements it. It's just a syntactic sugar layer over the official Headless Chrome, oriented toward production scenarios.

Q: Why do you block ad scripts by default?

Headless navigation is expensive compared with just fetching the content of a website.

To speed up the process, we block ad scripts by default, because they add a lot of bloat.

Q: My output is different from the expected

Probably browserless was too smart and it blocked a request that you need.

You can activate debug mode using the DEBUG=browserless environment variable to see what is happening under the hood:

DEBUG=browserless node index.js

Consider opening an issue with the debug trace.

Q: Can I use browserless with my AWS Lambda-like project?

Yes, check chrome-aws-lambda to set up AWS Lambda with a compatible Chrome binary.

License

browserless © Microlink, Released under the MIT License.
Authored and maintained by Kiko Beats with help from contributors.

The logo has been designed by xinh studio.

microlink.io · GitHub @MicrolinkHQ · Twitter @microlinkhq


Languages

  • JavaScript 92.5%
  • HTML 7.5%