A puppeteer-like Node.js library for interacting with Headless Chrome in production scenarios.
Although puppeteer alone may seem to be enough, there is a set of use cases that make sense to build on top of it and that are necessary to support robust production scenarios, such as:
- Sensible defaults, aborting unnecessary requests based on what you are doing (e.g., aborting image requests if you just want to get .html content).
- Easily create a pool of instances (via @browserless/pool).
- Built-in adblocker for aborting ad requests.
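As a rough illustration of the first point, the decision to abort a request can be reduced to a lookup on the resource type. The sketch below is not browserless's actual implementation: `shouldAbort` and `UNNEEDED_FOR_HTML` are hypothetical names, and the type strings are the ones puppeteer reports from `request.resourceType()`.

```javascript
// Hypothetical sketch: resource types assumed to be unnecessary
// when the goal is only to serialize the page into HTML.
const UNNEEDED_FOR_HTML = new Set(['image', 'media', 'font'])

// Returns true when a request of the given resource type can be aborted.
const shouldAbort = resourceType => UNNEEDED_FOR_HTML.has(resourceType)

console.log(shouldAbort('image')) // true
console.log(shouldAbort('document')) // false
```

With puppeteer, a check like this would typically run inside a `request` event handler after calling `page.setRequestInterception(true)`, choosing between `request.abort()` and `request.continue()`.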
browserless is built on top of puppeteer, so you need to install it as well.
$ npm install puppeteer browserless --save
You can use browserless together with puppeteer, puppeteer-core or puppeteer-firefox.
Internally, the library is divided into different packages based on functionality.
The browserless API is like puppeteer, but doing more things under the hood (not too much, I promise).
For example, if you want to take a screenshot, just do:
const browserless = require('browserless')()
browserless
.screenshot('http://example.com', { device: 'iPhone 6' })
.then(buffer => {
console.log(`your screenshot is here!`)
})
You can see more common recipes at @browserless/examples.
All methods follow the same interface:
- <url>: The target URL. It's required.
- [options]: Specific settings for the method. It's optional.
The methods return a Promise, or invoke a Node.js callback if you pass an additional function as the last parameter.
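To make the dual Promise/callback interface concrete, here is a minimal sketch of how such a wrapper could be written. It is an assumption about the general technique, not the library's code: `withCallback` and `fakeMethod` are hypothetical names, and `fakeMethod` does not open a browser.

```javascript
// Hypothetical helper: wraps a promise-returning function so that it also
// accepts a Node.js-style callback as its last argument.
const withCallback = fn => (...args) => {
  const last = args[args.length - 1]
  // No trailing function: behave as a plain promise-returning method.
  if (typeof last !== 'function') return fn(...args)
  const cb = args.pop()
  fn(...args).then(value => cb(null, value), error => cb(error))
}

// Stand-in for a browserless method, following the <url>, [options] interface.
const fakeMethod = withCallback(async (url, options = {}) => `visited ${url}`)

// Promise style
fakeMethod('https://example.com').then(value => console.log(value))

// Callback style
fakeMethod('https://example.com', (error, value) => console.log(error, value))
```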
It creates the browser instance, using the puppeteer.launch method.
// Creating a simple instance
const browserless = require('browserless')()
or passing specific launcher options:
// Creating an instance for running it at AWS Lambda
const browserless = require('browserless')({
ignoreHTTPSErrors: true,
args: [
'--disable-gpu',
'--single-process',
'--no-zygote',
'--no-sandbox',
'--hide-scrollbars'
]
})
Additionally, you can set up:
type: number
default: 30000
This setting will change the default maximum navigation time.
type: Puppeteer
default: puppeteer|puppeteer-core|puppeteer-firefox
It's automatically detected based on your dependencies; puppeteer, puppeteer-core and puppeteer-firefox are supported.
Alternatively, you can pass it explicitly.
type: boolean
default: false
Every time a new page is created, it will be an incognito page.
An incognito page will not share cookies/cache with other browser pages.
It serializes the content from the target url into HTML.
const browserless = require('browserless')()
;(async () => {
const url = 'https://example.com'
const html = await browserless.html(url)
console.log(html)
})()
This method uses the following options by default:
{
disableAnimations: false
}
See browserless.goto to know all the options and values supported.
It serializes the content from the target url into plain text.
const browserless = require('browserless')()
;(async () => {
const url = 'https://example.com'
const text = await browserless.text(url)
console.log(text)
})()
This method uses the following options by default:
{
disableAnimations: false
}
See browserless.goto to know all the options and values supported.
It generates the PDF version of a website behind a URL.
const browserless = require('browserless')()
;(async () => {
const url = 'https://example.com'
const buffer = await browserless.pdf(url)
console.log(`PDF generated!`)
})()
This method uses the following options by default:
{
disableAnimations: true,
margin: '0.35cm',
media: 'print',
printBackground: true,
scale: 0.65
}
See browserless.goto to know all the options and values supported.
Also, any page.pdf option is supported.
Additionally, you can set up:
type: string|string[]
default: '0.35cm'
It sets paper margins. All possible units are:
- px for pixels.
- in for inches.
- cm for centimeters.
- mm for millimeters.
You can pass an object specifying each side of the paper:
;(async () => {
const buffer = await browserless.pdf(url.toString(), {
margin: {
top: '0.35cm',
bottom: '0.35cm',
left: '0.35cm',
right: '0.35cm'
}
})
})()
Or, if you pass a string, it will be used for all sides:
;(async () => {
const buffer = await browserless.pdf(url.toString(), {
margin: '0.35cm'
})
})()
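The relationship between the two margin forms can be sketched as a small normalization step. This is only an illustration of the behavior described above; `normalizeMargin` is a hypothetical helper, not a function exported by browserless.

```javascript
// Hypothetical helper: expands a string shorthand into one value per side,
// and returns an object margin unchanged.
const normalizeMargin = margin =>
  typeof margin === 'string'
    ? { top: margin, right: margin, bottom: margin, left: margin }
    : margin

console.log(normalizeMargin('0.35cm'))
console.log(normalizeMargin({ top: '1cm', bottom: '1cm' }))
```

The resulting object has the same shape as the `margin` option accepted by puppeteer's `page.pdf`.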
It takes a screenshot from the target url.
const browserless = require('browserless')()
;(async () => {
const url = 'https://example.com'
const buffer = await browserless.screenshot(url)
console.log(`Screenshot taken!`)
})()
This method uses the following options by default:
{
disableAnimations: true,
device: 'macbook pro 13'
}
See browserless.goto to know all the options and values supported.
Also, any page.screenshot option is supported.
Additionally, you can set up:
type: string|string[]
Click the DOM element matching the given CSS selector.
type: string
Capture the DOM element matching the given CSS selector. It will wait for the element to appear in the page and to be visible.
type: string|string[]
Hide DOM elements matching the given CSS selectors.
Can be useful for cleaning up the page.
;(async () => {
const buffer = await browserless.screenshot(url.toString(), {
hide: ['.crisp-client', '#cookies-policy']
})
})()
This sets visibility: hidden on the matched elements.
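One way to picture the hiding step is as a single CSS rule generated from the selectors. The sketch below is an assumption about the approach rather than the library's code: `hideCss` is a hypothetical helper, and the `!important` flag is added here only so the rule wins over page styles.

```javascript
// Hypothetical helper: builds one CSS rule that hides every matched element.
// Accepts a single selector or an array of selectors.
const hideCss = selectors =>
  `${[].concat(selectors).join(', ')} { visibility: hidden !important; }`

console.log(hideCss(['.crisp-client', '#cookies-policy']))
// .crisp-client, #cookies-policy { visibility: hidden !important; }
```

With puppeteer, a rule like this could be injected through `page.addStyleTag({ content })` before the screenshot is taken.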
type: string|string[]
Inject JavaScript modules into the page.
Accepts an array of inline code, absolute URLs, and local file paths (must have a .js extension).
;(async () => {
const buffer = await browserless.screenshot(url.toString(), {
modules: ['https://cdn.jsdelivr.net/npm/@microlink/mql@0.3.12/src/browser.js', 'local-file.js', `document.body.style.backgroundColor = 'red'`]
})
})()
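Since a single array can mix inline code, absolute URLs and local file paths, each entry has to be classified before injection. The following is a minimal sketch under stated assumptions: `toScriptTag` is a hypothetical helper, and the classification rules (an `http(s)://` prefix for URLs, a `.js` suffix for files) are inferred from the description above, not taken from the library.

```javascript
// Hypothetical helper: maps one `modules` entry to the option shape accepted
// by puppeteer's page.addScriptTag ({ url }, { path } or { content }).
const toScriptTag = value => {
  if (/^https?:\/\//i.test(value)) return { url: value, type: 'module' }
  if (value.endsWith('.js')) return { path: value, type: 'module' }
  return { content: value, type: 'module' }
}

const entries = [
  'https://cdn.jsdelivr.net/npm/@microlink/mql@0.3.12/src/browser.js',
  'local-file.js',
  `document.body.style.backgroundColor = 'red'`
]

console.log(entries.map(toScriptTag))
```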
type: string|string[]
Same as the modules option, but it injects the code as <script> instead of <script type="module">. Prefer the modules option whenever possible.
;(async () => {
const buffer = await browserless.screenshot(url.toString(), {
scripts: ['https://cdn.jsdelivr.net/npm/jquery@3.4.1/dist/jquery.min.js', 'local-file.js', `document.body.style.backgroundColor = 'red'`]
})
})()
type: string|string[]
Inject CSS styles into the page.
Accepts an array of inline code, absolute URLs, and local file paths (must have a .css extension).
;(async () => {
const buffer = await browserless.screenshot(url.toString(), {
styles: ['https://cdn.jsdelivr.net/npm/hack@0.8.1/dist/dark.css', 'local-file.css', `body { background: red; }`]
})
})()
type: string|object
Scroll to the DOM element matching the given CSS selector.
type: object
After the screenshot has been taken, this option allows you to place the screenshot into a fancy overlay.
You can configure the overlay specifying:
- browser: It sets the browser image overlay to use; light and dark are the supported values.
- background: It sets the background to use. The supported values are:
  - A hexadecimal/rgb/rgba color code, e.g. #c1c1c1.
  - A CSS gradient, e.g. linear-gradient(225deg, #FF057C 0%, #8D0B93 50%, #321575 100%).
  - An image URL, e.g. https://source.unsplash.com/random/1920x1080.
;(async () => {
const buffer = await browserless.screenshot(url.toString(), {
hide: ['.crisp-client', '#cookies-policy'],
overlay: {
browser: 'dark',
background: 'linear-gradient(45deg, rgba(255,18,223,1) 0%, rgba(69,59,128,1) 66%, rgba(69,59,128,1) 100%)'
}
})
})()
It has all the device presets available, making it possible to load viewport and user agent settings based on a device descriptor.
These devices are used for emulation purposes. It extends puppeteer.devices.
Get the settings of a specific device descriptor by descriptor name.
The device name is case-insensitive.
const browserless = require('browserless')()
browserless.getDevice({ device: 'Macbook Pro 15' })
// {
// userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.89 Safari/537.36',
// viewport: {
// width: 1440,
// height: 900,
// deviceScaleFactor: 2,
// isMobile: false,
// hasTouch: false,
// isLandscape: false
// }
// }
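A case-insensitive lookup like the one described can be sketched in a few lines. This is not the library's implementation: `devices` here is a tiny stand-in table using the viewport values shown above, and `findDevice` is a hypothetical helper that takes the name directly instead of an options object.

```javascript
// Stand-in device table; the real list extends puppeteer's device descriptors.
const devices = {
  'Macbook Pro 15': {
    viewport: { width: 1440, height: 900, deviceScaleFactor: 2 }
  }
}

// Hypothetical helper: finds a descriptor regardless of letter case.
const findDevice = name => {
  const wanted = name.toLowerCase()
  const key = Object.keys(devices).find(k => k.toLowerCase() === wanted)
  return key === undefined ? undefined : devices[key]
}

console.log(findDevice('macbook pro 15').viewport.width) // 1440
console.log(findDevice('unknown device')) // undefined
```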
The following methods are exposed for scenarios where you need more granular control and less magic.
It returns the internal browser instance, used as a singleton.
const browserless = require('browserless')()
;(async () => {
const browserInstance = await browserless.browser
})()
It exposes an interface for creating your own evaluate function, passing you the page and response.
The fn will receive page and response as arguments:
const browserless = require('browserless')()
const getUrlInfo = browserless.evaluate((page, response) => ({
statusCode: response.status(),
url: response.url(),
redirectUrls: response.request().redirectChain()
}))
;(async () => {
const url = 'https://example.com'
const info = await getUrlInfo(url)
console.log(info)
// {
// "statusCode": 200,
// "url": "https://example.com/",
// "redirectUrls": []
// }
})()
Note you don't need to close the page; it will be done under the hood.
Internally, the method performs a browserless.goto, so you can pass extra arguments as the second parameter:
const browserless = require('browserless')()
const getText = browserless.evaluate(
page => page.evaluate(() => document.body.innerText), {
waitUntil: 'domcontentloaded',
disableAnimations: false
})
;(async () => {
const url = 'https://example.com'
const text = await getText(url)
console.log(text)
})()
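The lifecycle described above (create a page, navigate, run the function, close the page) can be sketched with injected stand-ins so that it runs without a browser. Everything here is hypothetical: `createEvaluate`, `createPage` and the stubbed `goto` are illustration names, and the real method may differ in its details.

```javascript
// Hypothetical sketch of an evaluate-style wrapper. `createPage` and `goto`
// are injected so the example does not need a real browser.
const createEvaluate = ({ createPage, goto }) => (fn, gotoOpts = {}) => async (url, opts = {}) => {
  const page = await createPage()
  try {
    const { response } = await goto(page, { url, ...gotoOpts, ...opts })
    return await fn(page, response)
  } finally {
    // The page is always closed, even if `fn` throws.
    await page.close()
  }
}

// Stubs standing in for a real page and a real navigation.
const closedPages = []
const evaluate = createEvaluate({
  createPage: async () => ({ close: async () => closedPages.push(true) }),
  goto: async (page, { url }) => ({ response: { status: () => 200, url: () => url } })
})

const getStatus = evaluate((page, response) => response.status())

getStatus('https://example.com').then(status => {
  console.log(status, closedPages.length)
})
```

Closing the page in a `finally` block is the design choice that lets callers skip the cleanup themselves.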
It performs a smart page.goto, using a built-in adblocker.
const browserless = require('browserless')()
;(async () => {
const page = await browserless.page()
const { response, device } = await browserless.goto(page, { url: 'http://example.com' })
})()
Any option passed here will be passed through to page.goto.
Additionally, you can set up:
type: boolean
default: true
It will abort requests detected as ads.
type: string
default: 'macbook pro 13'
It specifies the device descriptor to use in order to retrieve the userAgent and viewport.
type: boolean
default: true
Disable CSS animations and transitions.
type: boolean
default: false
When it's true, it sets JavaScript as disabled on the current page.
type: object
An object containing additional HTTP headers to be sent with every request.
const browserless = require('browserless')()
;(async () => {
const page = await browserless.page()
await browserless.goto(page, {
url: 'http://example.com',
headers: {
'user-agent': 'googlebot',
cookie: 'foo=bar; hello=world'
}
})
})()
type: string
default: 'screen'
Changes the CSS media type of the page using page.emulateMedia.
type: string
The target URL.
It will set up a custom viewport, using the page.setViewport method.
type: string|function|number
default: 0
Wait for a quantity of time, a selector, or a function using page.waitFor.
It returns a new standalone browser page.
const browserless = require('browserless')()
;(async () => {
const page = await browserless.page()
})()
browserless internally uses a singleton browser instance.
If you want to keep multiple browsers open, you can use the @browserless/pool package.
const createBrowserless = require('@browserless/pool')
const browserlessPool = createBrowserless({
max: 2, // max browsers to keep open
timeout: 30000 // max time a browser is considered fresh
})
You can still pass specific puppeteer options as the second argument:
const createBrowserless = require('@browserless/pool')
const browserlessPool = createBrowserless({
max: 2, // max browsers to keep open
timeout: 30000 // max time a browser is considered fresh
}, {
ignoreHTTPSErrors: true,
args: [
'--disable-gpu',
'--single-process',
'--no-zygote',
'--no-sandbox',
'--hide-scrollbars'
]
})
After that, the API is the same as browserless:
browserlessPool
.screenshot('http://example.com', { device: 'iPhone 6' })
.then(buffer => {
console.log(`your screenshot is here!`)
})
Every time you call the pool, it handles acquiring a browser instance from the pool and releasing it afterwards ✨.
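The acquire/release cycle around each call can be illustrated with a small generic pool. This is a sketch of the general technique only: `createPool`, `use` and the fake `create` function are hypothetical, no browser is launched, and the `timeout` freshness option is not modeled.

```javascript
// Hypothetical sketch: a pool that creates at most `max` resources and
// queues callers when all of them are busy.
const createPool = ({ max, create }) => {
  const idle = []
  const waiting = []
  let size = 0

  const acquire = async () => {
    if (idle.length > 0) return idle.pop()
    if (size < max) {
      size += 1
      return create()
    }
    // All resources are busy: wait until one is released.
    return new Promise(resolve => waiting.push(resolve))
  }

  const release = resource => {
    const next = waiting.shift()
    if (next) next(resource)
    else idle.push(resource)
  }

  // Runs `task` with a pooled resource and always gives it back.
  const use = async task => {
    const resource = await acquire()
    try {
      return await task(resource)
    } finally {
      release(resource)
    }
  }

  return { use }
}

let created = 0
const pool = createPool({ max: 2, create: async () => ({ id: ++created }) })

// Three concurrent calls share at most two fake browsers.
Promise.all([1, 2, 3].map(n => pool.use(async browser => n * 10))).then(results => {
  console.log(results, created)
})
```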
browserless is internally divided into multiple packages to ensure you use only the minimum amount of code necessary for your use case.
For testing different approaches, we included a tiny benchmark tool called @browserless/benchmark.
Q: Why use browserless over Puppeteer?
browserless does not replace puppeteer; it complements it. It's just a syntactic sugar layer over official Headless Chrome, oriented toward production scenarios.
Q: Why do you block ads scripts by default?
Headless navigation is expensive compared with just fetching the content of a website.
In order to speed up the process, we block ad scripts by default because they add a lot of bloat.
Q: My output is different from the expected
Probably browserless was too smart and it blocked a request that you need.
You can activate debug mode using the DEBUG=browserless environment variable in order to see what is happening behind the scenes:
DEBUG=browserless node index.js
Consider opening an issue with the debug trace.
Q: Can I use browserless with my AWS Lambda-like project?
Yes, check chrome-aws-lambda to set up AWS Lambda with a compatible binary.
browserless © Microlink, Released under the MIT License.
Authored and maintained by Kiko Beats with help from contributors.
The logo has been designed by xinh studio.
microlink.io · GitHub @MicrolinkHQ · Twitter @microlinkhq