Get all URLs from HTML markup

html-urls


Get all links from HTML markup. It's based on the W3C link checker.

Install

$ npm install html-urls --save

Usage

const got = require('got') // got v12+ is ESM-only; require() needs got v11 or earlier
const htmlUrls = require('html-urls')

;(async () => {
  const url = process.argv[2]
  if (!url) throw new TypeError('Need to provide a URL as the first argument.')

  // Download the page, then extract its links; relative links resolve against `url`
  const { body: html } = await got(url)
  const links = htmlUrls({ html, url })

  // Each link exposes `url` and `normalizedUrl`; print the normalized ones
  console.log(links.map(({ normalizedUrl }) => normalizedUrl))

  // => [
  //   'https://microlink.io/component---src-layouts-index-js-86b5f94dfa48cb04ae41.js',
  //   'https://microlink.io/component---src-pages-index-js-a302027ab59365471b7d.js',
  //   'https://microlink.io/path---index-709b6cf5b986a710cc3a.js',
  //   'https://microlink.io/app-8b4269e1fadd08e6ea1e.js',
  //   'https://microlink.io/commons-8b286eac293678e1c98c.js',
  //   'https://microlink.io',
  //   ...
  // ]
})().catch(error => {
  // Report failures (missing argument, network errors) and exit non-zero
  console.error(error)
  process.exit(1)
})

See examples.

API

htmlUrls([options])

options

html

Type: string
Default: ''

The HTML markup.
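For intuition about what kind of values count as links in the markup, here is a rough sketch that pulls raw `href` and `src` attribute values out of an HTML string with a regular expression. It is a hypothetical illustration only: `extractRawLinks` is not part of html-urls, the library follows the W3C link checker rather than a regex, and a real implementation should use an HTML parser.

```javascript
// Hypothetical helper, not part of html-urls: collects raw href/src values.
// A regex cannot handle every HTML edge case (comments, scripts, entities);
// it only shows the kind of values that end up being extracted.
const LINK_ATTRIBUTE = /\b(?:href|src)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))/gi

function extractRawLinks (html) {
  const links = []
  for (const match of html.matchAll(LINK_ATTRIBUTE)) {
    // Exactly one capture group is set, depending on the quoting style used
    links.push(match[1] ?? match[2] ?? match[3])
  }
  return links
}
```

The values returned here are still unresolved (for example `/about`), which is where the `url` option comes in.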

url

Type: string
Default: ''

The URL associated with the HTML markup.

It is used to resolve relative links present in the HTML markup.
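Resolving relative links against a base URL can be sketched with the WHATWG `URL` class built into Node.js. This is an assumption about the general technique, not the library's internal code; the helper name `resolveLinks` is hypothetical.

```javascript
// Hypothetical helper: resolves raw href values against the page URL.
// Relies on the global WHATWG URL class (available in Node.js 10+).
function resolveLinks (hrefs, baseUrl) {
  const resolved = []
  for (const href of hrefs) {
    try {
      // An empty baseUrl is treated as "no base", so only absolute links resolve
      resolved.push(new URL(href, baseUrl || undefined).href)
    } catch {
      // Skip values that are not valid URLs even after resolution
    }
  }
  return resolved
}
```

For example, `resolveLinks(['/about', 'logo.png'], 'https://example.com/blog/post')` yields `https://example.com/about` and `https://example.com/blog/logo.png`. With the empty default base, only absolute links survive in this sketch.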

whitelist

Type: array
Default: []

A list of links to be excluded from the final output. It supports wildcard patterns.

See [matcher](https://github.com/sindresorhus/matcher) to learn more.
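Excluding links by wildcard pattern can be approximated as below. This is a simplified, hypothetical stand-in for matcher, not matcher itself: `wildcardToRegExp` and `excludeLinks` are illustrative names, only `*` is understood, matching is case-sensitive, and matcher's negation and options are not modeled.

```javascript
// Hypothetical helper: turns a wildcard pattern into an anchored RegExp.
// Regex metacharacters are escaped first, then each `*` becomes `.*`.
function wildcardToRegExp (pattern) {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&')
  return new RegExp(`^${escaped.replace(/\*/g, '.*')}$`)
}

// Hypothetical helper: drops every link that matches at least one pattern.
function excludeLinks (links, patterns) {
  const regexes = patterns.map(wildcardToRegExp)
  return links.filter(link => !regexes.some(regex => regex.test(link)))
}
```

With `['*.js', 'https://cdn.*']` as patterns, script files and anything hosted under `https://cdn.` are removed while other links pass through.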

removeDuplicates

Type: boolean
Default: true

Remove duplicate links found across all HTML tags.
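Deduplication of this kind can be sketched by keeping the first occurrence of each link. The sketch assumes links are compared by their `normalizedUrl` field (one of the two fields destructured in the usage example); the helper name `dedupeLinks` is hypothetical and the library's real comparison key may differ.

```javascript
// Hypothetical helper: keeps the first link seen for each normalizedUrl,
// preserving the original document order.
function dedupeLinks (links) {
  const seen = new Set()
  return links.filter(({ normalizedUrl }) => {
    if (seen.has(normalizedUrl)) return false
    seen.add(normalizedUrl)
    return true
  })
}
```

A `Set` keeps each membership check constant-time on average, so the pass stays linear in the number of links.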

License

html-urls © Kiko Beats, released under the MIT License.
Authored and maintained by Kiko Beats with help from contributors.

kikobeats.com · GitHub @Kiko Beats · Twitter @Kikobeats