html-urls


Get all links from HTML markup. It's based on the W3C link checker.

Install

$ npm install html-urls --save

Usage

const got = require('got')
const getLinks = require('html-urls')

;(async() => {
  const url = process.argv[2]
  if (!url) throw new TypeError('Need to provide a URL as the first argument.')
  const {body: html} = await got(url)
  const links = getLinks({html, url})

  links.forEach(({ normalizedUrl }) => console.log(normalizedUrl))

  // => [
  //   'https://microlink.io/component---src-layouts-index-js-86b5f94dfa48cb04ae41.js',
  //   'https://microlink.io/component---src-pages-index-js-a302027ab59365471b7d.js',
  //   'https://microlink.io/path---index-709b6cf5b986a710cc3a.js',
  //   'https://microlink.io/app-8b4269e1fadd08e6ea1e.js',
  //   'https://microlink.io/commons-8b286eac293678e1c98c.js',
  //   'https://microlink.io',
  //   ...
  // ]
})()

See examples.

API

htmlUrls([options])

options

html

Type: string
Default: ''

The HTML markup.

url

Type: string
Default: ''

The URL associated with the HTML markup.

It is used to resolve relative links present in the HTML markup.
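
As an illustration of what resolving a relative link against a base URL means, here is a minimal sketch using Node's built-in `URL` class. It is not html-urls' internal code, and the base URL and `hrefs` values are hypothetical examples.

```javascript
// Illustration only: standard URL resolution with Node's built-in URL class.
// This is NOT html-urls' internal implementation.
const base = 'https://example.com/blog/post'

// Hypothetical href values as they might appear in HTML markup.
const hrefs = [
  '/about', // root-relative
  'images/logo.png', // relative to the current directory
  '../contact', // relative to the parent directory
  'https://other.example/page' // already absolute, left unchanged
]

// Resolve every href against the base URL.
const resolved = hrefs.map(href => new URL(href, base).href)

console.log(resolved)
```

Without a base URL, relative values such as `/about` cannot be turned into absolute links, which is why passing `url` alongside `html` matters.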

whitelist

Type: array
Default: []

A list of links to exclude from the final output. It supports matching patterns.

See [matcher](https://github.com/sindresorhus/matcher) to learn more.
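
To make the exclusion behaviour concrete, the sketch below filters a list of links against `*` wildcard patterns. The helpers `toRegExp` and `excludeWhitelisted` are hypothetical names written for this illustration. They only approximate pattern matching and are not how html-urls or matcher implement it.

```javascript
// Hypothetical helper: turns a '*' wildcard pattern into an anchored RegExp.
// An approximation for illustration, not matcher's implementation.
const toRegExp = pattern =>
  new RegExp(
    '^' +
      pattern
        .split('*')
        .map(part => part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')) // escape regex metacharacters
        .join('.*') +
      '$'
  )

// Hypothetical helper: drops every link that matches at least one pattern.
const excludeWhitelisted = (links, whitelist = []) => {
  const patterns = whitelist.map(toRegExp)
  return links.filter(link => !patterns.some(re => re.test(link)))
}

const links = [
  'https://example.com/',
  'https://example.com/app.js',
  'https://cdn.example.com/lib.js'
]

const kept = excludeWhitelisted(links, ['https://cdn.example.com/*', '*app.js'])

console.log(kept)
```

Note that, despite the option's name, matching links are removed from the output rather than kept.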

removeDuplicates

Type: boolean
Default: true

Remove duplicate links detected across all HTML tags.
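
The following sketch shows one way duplicate removal can work, assuming links are compared by a normalized form. The `normalize` and `removeDuplicates` functions here are hypothetical stand-ins built on Node's `URL` class. The library's actual normalization and comparison rules may differ.

```javascript
// Hypothetical normalizer: the URL class lowercases the host and drops the default port.
// html-urls may normalize differently.
const normalize = link => new URL(link).href

// Keep only the first occurrence of each normalized link, preserving order.
const removeDuplicates = links => {
  const seen = new Set()
  return links.filter(link => {
    const key = normalize(link)
    if (seen.has(key)) return false
    seen.add(key)
    return true
  })
}

const unique = removeDuplicates([
  'https://example.com/a',
  'https://EXAMPLE.com/a', // same link, different host casing
  'https://example.com:443/a', // same link, explicit default port
  'https://example.com/b'
])

console.log(unique)
```

This matters because the same link often appears in several tags (for example an anchor and a script) with small spelling differences.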

License

html-urls © Kiko Beats, released under the MIT License.
Authored and maintained by Kiko Beats with help from contributors.

kikobeats.com · GitHub @Kiko Beats · Twitter @Kikobeats