High level automation API for working with Headless Chrome

browserless


Features

  • High-level automation API on top of Headless Chrome.
  • Oriented toward production and performance scenarios.
  • Aborting unnecessary requests based on MIME types.
  • Pooling support to keep multiple browsers ready.
  • Blocking ad trackers by default.

Install

$ npm install puppeteer browserless --save

Usage

browserless is a high-level API that simplifies common actions.

For example, if you want to take a screenshot, just do:

const browserless = require('browserless')()

browserless
  .screenshot('http://example.com', { device: 'iPhone 6' })
  .then(tmpFile => {
    console.log(`your screenshot at ${tmpFile.path}`)
    tmpFile.cleanupSync()
  })

See more in the examples.

Basic

All methods follow the same interface:

  • url: The target URL (required).
  • options: Specific settings for the method (optional).
  • callback: Node.js callback. If you don't provide one, the method will return a promise.
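The dual interface described above (callback if provided, promise otherwise) can be illustrated with a small wrapper. This is a minimal sketch using a hypothetical `callbackOrPromise` helper, not browserless's internal code.

```javascript
// Hypothetical helper: wraps an async function so it supports both styles.
// If the last argument is a function, it is treated as a Node.js callback;
// otherwise the promise is returned to the caller.
function callbackOrPromise (fn) {
  return function (...args) {
    const last = args[args.length - 1]
    if (typeof last !== 'function') return fn(...args)

    const callback = args.pop()
    fn(...args).then(
      result => callback(null, result),
      err => callback(err)
    )
  }
}

// Usage with a stand-in async method (not a real browserless method)
const getTitle = callbackOrPromise(async url => `title of ${url}`)

// Promise style
getTitle('https://example.com').then(title => console.log(title))

// Callback style
getTitle('https://example.com', (err, title) => {
  if (err) return console.error(err)
  console.log(title)
})
```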

.constructor(options)

It creates the browser instance using the puppeteer.launch method.

// Creating a simple instance
const browserless = require('browserless')()

Or pass specific launcher options:

// Creating an instance for running it at AWS Lambda
const browserless = require('browserless')({
  ignoreHTTPSErrors: true,
  args: [
    '--disable-gpu',
    '--single-process',
    '--no-zygote',
    '--no-sandbox',
    '--hide-scrollbars'
  ]
})

options

See puppeteer.launch#options.

By default, the library passes a well-known list of flags, so you probably don't need any additional setup.

timeout

type:number
default: 30000

This setting changes the default maximum navigation time, in milliseconds.

incognito

type:boolean
default: true

Every time a new page is created, it will be an incognito page.

An incognito page will not share cookies/cache with other browser pages.

.pool(options)

The main browserless constructor exposes a singleton browser. This is enough for most scenarios, but if you need more, you can initialize a pool of instances.

const createBrowserless = require('browserless')
const browserless = createBrowserless.pool()

options

See puppeteer.launch#options.

It follows the same API as the constructor, but also accepts a parameter called poolOpts for setting up pool-specific options.

poolOpts

See generic-pool#options.
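To make the idea of pooling concrete, here is a minimal sketch of acquire/release semantics. `createPool` is a hypothetical helper, far simpler than generic-pool (no timeouts, validation or eviction), and it is not how browserless wires up its pool.

```javascript
// Simplified, hypothetical stand-in for a resource pool such as generic-pool.
// Resources are created up front; callers acquire one, use it, and release it.
function createPool (create, size) {
  const idle = Array.from({ length: size }, () => create())
  const waiting = [] // resolvers for callers waiting on a free resource

  return {
    acquire () {
      if (idle.length > 0) return Promise.resolve(idle.pop())
      return new Promise(resolve => waiting.push(resolve))
    },
    release (resource) {
      const next = waiting.shift()
      if (next) next(resource) // hand it straight to a waiting caller
      else idle.push(resource)
    },
    get available () {
      return idle.length
    }
  }
}

let id = 0
const pool = createPool(() => ({ id: ++id }), 2)

;(async () => {
  const a = await pool.acquire()
  const b = await pool.acquire()
  const pending = pool.acquire() // waits: both resources are busy
  pool.release(a)
  const c = await pending // receives the resource that was just released
  console.log(c.id === a.id) // true
  pool.release(b)
  pool.release(c)
})()
```

Keeping resources warm is the point of the design: the expensive creation step (launching a browser, in browserless's case) happens once, and later callers only pay for acquire/release.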

.html(url, options)

It returns the full HTML content from the target url.

const browserless = require('browserless')()

;(async () => {
  const url = 'https://example.com'
  const html = await browserless.html(url)
  console.log(html)
})()

options

See page.goto.

Additionally, you can setup:

waitFor

type:string|function|number
default: 0

Waits for an amount of time, a selector, or a function, using page.waitFor.
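A single waitFor value covers three cases depending on its type. The sketch below uses a hypothetical `describeWaitFor` helper to show that dispatch; it roughly mirrors what puppeteer's page.waitFor does (puppeteer additionally treats strings starting with `//` as XPath, which is omitted here).

```javascript
// Hypothetical helper: classifies a waitFor value by its type.
function describeWaitFor (waitFor) {
  if (typeof waitFor === 'number') return { kind: 'timeout', ms: waitFor }
  if (typeof waitFor === 'string') return { kind: 'selector', selector: waitFor }
  if (typeof waitFor === 'function') return { kind: 'function', fn: waitFor }
  throw new TypeError(`Unsupported waitFor value: ${typeof waitFor}`)
}

console.log(describeWaitFor(0).kind) // timeout
console.log(describeWaitFor('#content').kind) // selector
// The function is only stored here, never run, so `document` is not evaluated in Node
console.log(describeWaitFor(() => document.title !== '').kind) // function
```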

waitUntil

type:array
default: ['networkidle2', 'load', 'domcontentloaded']

Specifies the list of events to wait for before navigation is considered successful, using page.waitForNavigation.

userAgent

It sets up a custom user agent, using the page.setUserAgent method.

viewport

It sets up a custom viewport, using the page.setViewport method.

abortTypes

type: array
default: ['image', 'media', 'stylesheet', 'font', 'xhr']

A list of resourceType values whose requests are aborted in order to make the process faster.
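Aborting by resourceType can be sketched with puppeteer's request interception: after `page.setRequestInterception(true)`, each request emitted on `page.on('request', ...)` must be either aborted or continued. The handler below is a hypothetical illustration, not browserless's source; a fake request object stands in for puppeteer's Request so it runs without a browser.

```javascript
// Hypothetical sketch of aborting requests by resourceType.
// With puppeteer you would first call `await page.setRequestInterception(true)`
// and then register the handler via `page.on('request', onRequest)`.
function createRequestHandler (abortTypes) {
  const blocked = new Set(abortTypes)
  return function onRequest (req) {
    if (blocked.has(req.resourceType())) return req.abort()
    return req.continue()
  }
}

// Minimal stand-in for a puppeteer Request, so the sketch runs without a browser
function fakeRequest (type) {
  return {
    outcome: null,
    resourceType: () => type,
    abort () { this.outcome = 'aborted' },
    continue () { this.outcome = 'continued' }
  }
}

const onRequest = createRequestHandler(['image', 'media', 'stylesheet', 'font', 'xhr'])
const img = fakeRequest('image')
const doc = fakeRequest('document')
onRequest(img)
onRequest(doc)
console.log(img.outcome, doc.outcome) // aborted continued
```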

abortTrackers

type: boolean
default: true

It aborts requests coming from tracking domains.
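Deciding whether a request comes from a tracking domain usually means matching its hostname against a domain list, including subdomains. The sketch below assumes a hypothetical `isTracker` helper and a two-entry illustrative list; the list browserless actually ships is not reproduced here.

```javascript
const { URL } = require('url')

// Illustrative domain list only, not the list browserless uses
const TRACKER_DOMAINS = ['doubleclick.net', 'google-analytics.com']

// Hypothetical helper: true if the URL's host is a listed domain or a subdomain of one
function isTracker (requestUrl, trackerDomains = TRACKER_DOMAINS) {
  const { hostname } = new URL(requestUrl)
  return trackerDomains.some(domain =>
    hostname === domain || hostname.endsWith(`.${domain}`)
  )
}

console.log(isTracker('https://www.google-analytics.com/analytics.js')) // true
console.log(isTracker('https://example.com/')) // false
```

Matching on `.${domain}` rather than a bare suffix avoids false positives such as `notdoubleclick.net`.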

.text(url, options)

It returns the full text content from the target url.

const browserless = require('browserless')()

;(async () => {
  const url = 'https://example.com'
  const text = await browserless.text(url)
  console.log(text)
})()

options

They are the same as for the .html method.

.pdf(url, options)

It generates the PDF version of the website at the target url.

const { URL } = require('url')
const browserless = require('browserless')()

;(async () => {
  const url = 'https://example.com'
  const { hostname } = new URL(url) // `url` is a string, so parse it to read the hostname
  const tmpFile = await browserless.pdf(url, {
    tmpOpts: {
      path: './',
      name: `${hostname}.${Date.now()}`
    }
  })

  console.log(`PDF generated at '${tmpFile.path}'`)
  tmpFile.cleanupSync() // It removes the file!
})()

It returns a tmpFile object with the path where the temporary file lives and cleanup/cleanupSync methods for removing it.
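The shape of that returned object can be sketched with Node's fs module alone. `createTmpFile` is hypothetical and only mimics the `path` / `cleanup` / `cleanupSync` contract described above; browserless's own options point to tempy, which this sketch does not use.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')
const { promisify } = require('util')

const unlink = promisify(fs.unlink)
let counter = 0

// Hypothetical sketch of a tmpFile-like object: a path plus two cleanup helpers
function createTmpFile (contents, extension) {
  counter += 1
  const filePath = path.join(
    os.tmpdir(),
    `browserless-sketch-${process.pid}-${counter}.${extension}`
  )
  fs.writeFileSync(filePath, contents)
  return {
    path: filePath,
    cleanup: () => unlink(filePath), // async: returns a promise
    cleanupSync: () => fs.unlinkSync(filePath) // sync: blocks until the file is removed
  }
}

const tmpFile = createTmpFile(Buffer.from('%PDF-1.4'), 'pdf')
console.log(fs.existsSync(tmpFile.path)) // true
tmpFile.cleanupSync()
console.log(fs.existsSync(tmpFile.path)) // false
```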

options

See page.pdf.

Additionally, you can setup:

tmpOptions

See tempy#api.

media

Changes the CSS media type of the page using page.emulateMedia.

device

It generates the PDF using the settings of the named device descriptor, such as userAgent and viewport.

userAgent

It sets up a custom user agent, using the page.setUserAgent method.

viewport

It sets up a custom viewport, using the page.setViewport method.

.screenshot(url, options)

It takes a screenshot from the target url.

const { URL } = require('url')
const browserless = require('browserless')()

;(async () => {
  const url = 'https://example.com'
  const { hostname } = new URL(url) // `url` is a string, so parse it to read the hostname
  const tmpFile = await browserless.screenshot(url, {
    tmpOpts: {
      path: './',
      name: `${hostname}.${Date.now()}`
    }
  })

  console.log(`Screenshot taken at '${tmpFile.path}'`)
  tmpFile.cleanupSync() // It removes the file!
})()

It returns a tmpFile object with the temporary file path and cleanup/cleanupSync methods for easily removing it.

options

See page.screenshot.

Additionally, you can setup:

tmpOptions

See tempy#api.


device

It takes the screenshot using the settings of the named device descriptor, such as userAgent and viewport.

userAgent

It sets up a custom user agent, using the page.setUserAgent method.

viewport

It sets up a custom viewport, using the page.setViewport method.

.devices

List of all available devices preconfigured with deviceName, viewport and userAgent settings.

These devices are used for emulation purposes.

.getDevice(deviceName)

It gets the settings of a specific device descriptor by its name.

const browserless = require('browserless')()

browserless.getDevice('Macbook Pro 15')

// {
//   name: 'Macbook Pro 15',
//   userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X …',
//   viewport: {
//     width: 1440,
//     height: 900,
//     deviceScaleFactor: 1,
//     isMobile: false,
//     hasTouch: false,
//     isLandscape: false
//   }
// }
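Conceptually, a lookup like this is a search by name over the devices list. The sketch below uses a hypothetical `findDevice` helper with exact, case-sensitive matching (an assumption) and a one-entry list that reuses the viewport shown above with a placeholder userAgent.

```javascript
// Hypothetical one-entry devices list; the userAgent is a placeholder
const devices = [
  {
    name: 'Macbook Pro 15',
    userAgent: 'placeholder user agent',
    viewport: {
      width: 1440,
      height: 900,
      deviceScaleFactor: 1,
      isMobile: false,
      hasTouch: false,
      isLandscape: false
    }
  }
]

// Hypothetical helper: exact name match, undefined when nothing matches
function findDevice (deviceName, list = devices) {
  return list.find(device => device.name === deviceName)
}

console.log(findDevice('Macbook Pro 15').viewport.width) // 1440
console.log(findDevice('Unknown device')) // undefined
```

A descriptor with `userAgent` and `viewport` has the same shape that puppeteer's `page.emulate` accepts, which is why these two settings travel together.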

Advanced

The following methods are exposed for scenarios where you need more granular control and less magic.

.browser

It returns the internal browser instance used as a singleton.

const browserless = require('browserless')()

;(async () => {
  const browserInstance = await browserless.browser
})()

.evaluate(page, response)

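It exposes an interface for creating your own evaluate function: you provide a function that receives the page and response, and you get back a function that takes the target url.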

const browserless = require('browserless')()

const getUrlInfo = browserless.evaluate((page, response) => ({
  statusCode: response.status(),
  url: response.url(),
  redirectUrls: response.request().redirectChain()
}))

;(async () => {
  const url = 'https://example.com'
  const info = await getUrlInfo(url)

  console.log(info)
  // {
  //   "statusCode": 200,
  //   "url": "https://example.com/",
  //   "redirectUrls": []
  // }
})()

Internally, the method performs a .goto operation and passes you the resulting page and response.
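That flow (open a page, navigate, hand page and response to your function) can be sketched as a higher-order function. `createEvaluate`, `createPage` and `goto` below are hypothetical names injected so the sketch runs without a browser; closing the page in `finally` is a design choice of this sketch, not something the original documents.

```javascript
// Hypothetical sketch of an evaluate factory with injected dependencies
function createEvaluate ({ createPage, goto }) {
  return fn => async (url, opts = {}) => {
    const page = await createPage()
    try {
      const response = await goto(page, { ...opts, url })
      return await fn(page, response)
    } finally {
      await page.close() // release the page even if fn throws
    }
  }
}

// Stand-ins that mimic the small part of the puppeteer API used above
const fakeDeps = {
  createPage: async () => ({ closed: false, async close () { this.closed = true } }),
  goto: async (page, { url }) => ({ status: () => 200, url: () => url })
}

const evaluate = createEvaluate(fakeDeps)
const getStatus = evaluate((page, response) => response.status())

getStatus('https://example.com').then(status => console.log(status)) // 200
```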

.goto(page, options)

It performs a smart page.goto, blocking ad tracker requests and other requests based on their resourceType.

const browserless = require('browserless')()

;(async () => {
  const page = await browserless.page()
  await browserless.goto(page, {
    url: 'http://savevideo.me',
    abortTypes: ['image', 'media', 'stylesheet', 'font']
  })
})()

options

url

type: string

The target URL

abortTypes

type: array
default: []

A list of req.resourceType() values to be blocked.

abortTrackers

type: boolean
default: true

It aborts requests coming from tracking domains.

waitFor

type:string|function|number
default: 0

Waits for an amount of time, a selector, or a function, using page.waitFor.

waitUntil

type:array
default: ['networkidle2', 'load', 'domcontentloaded']

Specifies the list of events to wait for before navigation is considered successful, using page.waitForNavigation.

userAgent

It sets up a custom user agent, using the page.setUserAgent method.

viewport

It sets up a custom viewport, using the page.setViewport method.

args

type: object

The settings to be passed to page.goto.

.page()

It returns a new, standalone browser page.

const browserless = require('browserless')()

;(async () => {
  const page = await browserless.page()
})()

Benchmark

We include a tiny benchmark utility that makes it easier to test multiple configuration settings.

FAQ

Q: Why use browserless over Puppeteer?

browserless does not replace puppeteer; it complements it. It's just a syntactic sugar layer over the official Headless Chrome API.

Q: Why do you block ads scripts by default?

Headless navigation is expensive compared with just fetching the content from a website.

To speed up the process, we block ad scripts by default because they are so bloated.

Q: My output is different from the expected

Probably browserless was too smart and it blocked a request that you need.

You can activate debug mode using the DEBUG=browserless* environment variable to see what is happening behind the scenes:

DEBUG=browserless* node index.js

Consider opening an issue with the debug trace.

Q: Can I use browserless with my AWS Lambda-like project?

Yes, check aws-lambda-chrome to set up AWS Lambda with a compatible binary.

License

browserless © Kiko Beats, Released under the MIT License.
Authored and maintained by Kiko Beats with help from contributors.

logo designed by xinh studio.

kikobeats.com · GitHub Kiko Beats · Twitter @kikobeats