Get all links from HTML markup. It's based on the W3C link checker.
$ npm install html-urls --save
const got = require('got')
const getLinks = require('html-urls')

;(async () => {
  const url = process.argv[2]
  if (!url) throw new TypeError('Need to provide a URL as the first argument.')

  // Fetch the page, then extract every link found in its markup.
  const { body: html } = await got(url)
  const links = getLinks({ html, url })

  // Print one normalized URL per line.
  links.forEach(({ normalizedUrl }) => console.log(normalizedUrl))
  // => https://microlink.io/component---src-layouts-index-js-86b5f94dfa48cb04ae41.js
  //    https://microlink.io/component---src-pages-index-js-a302027ab59365471b7d.js
  //    https://microlink.io/path---index-709b6cf5b986a710cc3a.js
  //    https://microlink.io/app-8b4269e1fadd08e6ea1e.js
  //    https://microlink.io/commons-8b286eac293678e1c98c.js
  //    https://microlink.io
  //    ...
})()
See examples.
Type: string
Default: ''
The HTML markup.
Type: string
Default: ''
The URL associated with the HTML markup.
It is used to resolve relative links that may be present in the HTML markup.
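
To illustrate what resolving relative links against a base URL means, here is a minimal sketch using the standard WHATWG `URL` constructor. It is not necessarily how html-urls resolves links internally; `resolveLink` and the example URLs are hypothetical names chosen for illustration.

```javascript
// Base URL of the page the markup came from (illustrative value, not from the library).
const baseUrl = 'https://example.com/blog/post'

// Hypothetical helper: resolve an href against a base URL
// using the standard WHATWG URL constructor.
const resolveLink = (href, base) => new URL(href, base).toString()

const resolved = ['/about', 'img/logo.png', 'https://other.example/page'].map(
  href => resolveLink(href, baseUrl)
)

console.log(resolved)
// => [
//   'https://example.com/about',
//   'https://example.com/blog/img/logo.png',
//   'https://other.example/page'
// ]
```

Root-relative and path-relative links are both resolved against the base, while absolute links pass through unchanged.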
Type: array
Default: []
A list of links to be excluded from the final output. It supports regex patterns.
See [matcher](https://github.com/sindresorhus/matcher) to learn more.
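
The effect of an exclusion list can be sketched as follows. This is a self-contained approximation that supports only the `*` wildcard; it does not use the matcher package, and `toRegExp` and `excludeLinks` are hypothetical helpers, not part of html-urls.

```javascript
// Hypothetical helper: turn a wildcard pattern such as '*.js' into a RegExp.
// Every regex metacharacter except '*' is escaped; '*' then matches any run of characters.
const toRegExp = pattern => {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&')
  return new RegExp(`^${escaped.replace(/\*/g, '.*')}$`)
}

// Hypothetical helper: drop every URL that matches at least one pattern.
const excludeLinks = (urls, patterns) => {
  const matchers = patterns.map(toRegExp)
  return urls.filter(url => !matchers.some(re => re.test(url)))
}

const urls = [
  'https://example.com/app.js',
  'https://example.com/about',
  'https://cdn.example.com/logo.png'
]

const kept = excludeLinks(urls, ['*.js', 'https://cdn.example.com/*'])
console.log(kept)
// => [ 'https://example.com/about' ]
```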
Type: boolean
Default: true
Remove duplicate links detected across all HTML tags.
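
As a rough sketch of what removing duplicates could look like, the snippet below keeps the first link seen for each `normalizedUrl`. Comparing on `normalizedUrl` is an assumption made for illustration, and `dedupeLinks` is a hypothetical helper; the library's own comparison may differ.

```javascript
// Hypothetical helper: keep only the first link for each normalized URL.
const dedupeLinks = links => {
  const seen = new Set()
  return links.filter(({ normalizedUrl }) => {
    if (seen.has(normalizedUrl)) return false
    seen.add(normalizedUrl)
    return true
  })
}

// Illustrative input shaped like the { url, normalizedUrl } objects in the usage example.
const links = [
  { url: 'https://example.com', normalizedUrl: 'https://example.com' },
  { url: 'https://example.com/', normalizedUrl: 'https://example.com' },
  { url: 'https://example.com/about', normalizedUrl: 'https://example.com/about' }
]

const unique = dedupeLinks(links)
console.log(unique.map(link => link.normalizedUrl))
// => [ 'https://example.com', 'https://example.com/about' ]
```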
html-urls © Kiko Beats, released under the MIT License.
Authored and maintained by Kiko Beats with help from contributors.
kikobeats.com · GitHub @Kiko Beats · Twitter @Kikobeats