fast-wasm-scraper

A fast alternative to JavaScript-based scraping tools, for use on both the frontend and the backend. fast-wasm-scraper is essentially a wrapper around scraper, a Rust crate for parsing HTML and querying it with CSS selectors, compiled to WebAssembly.

Installation

$ yarn add fast-wasm-scraper

Examples

Loading

const { Document } = require('fast-wasm-scraper');
const doc = new Document('<html>Hello world!</html>');

doc.root.inner_html;
// => <html>Hello world!</html>

Querying

const { Document } = require('fast-wasm-scraper');
const html = `
<html>
  <body>
    <div>
      <ul>
        <li>One</li>
        <li>Two</li>
        <li>Three</li>
      </ul>
    </div>
  </body>
</html>
`;
const doc = new Document(html);

doc.root.query('li');
// => [
//      Element { name: 'li', inner_html: 'One',   ... },
//      Element { name: 'li', inner_html: 'Two',   ... },
//      Element { name: 'li', inner_html: 'Three', ... },
//    ]
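The array returned by `query` can be post-processed with ordinary array methods. The sketch below assumes only that each element exposes an `inner_html` string, as listed under Types. So that it runs without the package installed, `sampleItems` is a hypothetical hand-written stand-in shaped like the output above, not a real `query` result. With the package installed, you would pass `doc.root.query('li')` instead.

```javascript
// Collect the trimmed inner HTML of every matched element.
// `elements` is expected to look like the result of `doc.root.query(...)`:
// an array of objects that each have an `inner_html` string property.
function listItemTexts(elements) {
  return elements.map((el) => el.inner_html.trim());
}

// Hypothetical stand-in for `doc.root.query('li')`, so this runs without the package.
const sampleItems = [
  { name: 'li', inner_html: 'One' },
  { name: 'li', inner_html: 'Two' },
  { name: 'li', inner_html: 'Three' },
];

console.log(listItemTexts(sampleItems));
```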

Types

Document

Property	Type	Description
`constructor`	`(html: string) => Document`	Takes the raw HTML as a string and returns a new Document object
`root`	`Element`	Returns the root element of the Document

Element

Property	Type	Description
`name`	`string`	Returns the name of the element as a string, e.g. 'div'
`html`	`string`	Returns a string representation of this Element and its descendants
`inner_html`	`string`	Returns the inner content of this Element as a string
`attributes`	`Map<string, string>`	Returns the attributes as a Map<string, string>
`query`	`(query_str: string) => Array<Element>`	Returns an array of the Elements matched by the query
`text`	`() => Array<string>`	Returns an array of strings from the descendant text nodes
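Because `attributes` is a standard JavaScript `Map` and `text()` returns an array of strings, an element can be summarized with plain `Map` and array operations. In this sketch, `summarizeElement` is a hypothetical helper and `link` is a hand-written stand-in for a parsed `<a href="/docs" class="link">Read <b>the docs</b></a>` element; exactly how the real `text()` splits that content into strings is an assumption.

```javascript
// Summarize an element using only the documented Element fields:
// `name` (string), `attributes` (Map<string, string>) and `text()` (array of strings).
function summarizeElement(el) {
  return {
    tag: el.name,
    // Map#get returns undefined for a missing key, so normalize to null.
    href: el.attributes.has('href') ? el.attributes.get('href') : null,
    // Join the descendant text nodes into a single string.
    text: el.text().join(''),
    // Map keys iterate in insertion order.
    attributeNames: [...el.attributes.keys()],
  };
}

// Hypothetical stand-in for an element parsed from
// <a href="/docs" class="link">Read <b>the docs</b></a>
const link = {
  name: 'a',
  attributes: new Map([
    ['href', '/docs'],
    ['class', 'link'],
  ]),
  text: () => ['Read ', 'the docs'],
};

console.log(summarizeElement(link));
```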

Benchmark

	fast-wasm-scraper	cheerio	JsDOM
Runtime	WebAssembly (from Rust)	JavaScript	JavaScript

Task: parse a document containing 100 list items, then query it with the `li` selector

Sample size (#)	87	74	52
Speed (ops/s)	539 (+/- 1.37%)	318 (+/- 4.75%)	38.2 (+/- 11.25%)
Speedup	1.69x compared to cheerio, and 14x to JsDOM	-	-
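The speedup row follows directly from the throughput figures: the speedup of one tool over another is the ratio of their ops/s. The sketch below recomputes it from the numbers in the table and assumes nothing beyond them. The JsDOM ratio is about 14.1, which the table rounds to 14x.

```javascript
// Throughput figures (ops/s) from the benchmark table.
const throughput = {
  'fast-wasm-scraper': 539,
  cheerio: 318,
  JsDOM: 38.2,
};

// Speedup of tool `a` over tool `b` is the ratio of their throughputs.
function speedup(a, b) {
  return throughput[a] / throughput[b];
}

console.log(speedup('fast-wasm-scraper', 'cheerio').toFixed(2)); // 1.69
console.log(speedup('fast-wasm-scraper', 'JsDOM').toFixed(1)); // 14.1
```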

This benchmark was run on a modest dual-core CPU with Node.js v12.20.0. You can also run the benchmarks locally by cloning the GitHub repository.
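For a quick, informal measurement outside the repository's own benchmark suite, an ops/s figure can be approximated as sketched below. `buildListHtml` and `measureOpsPerSec` are hypothetical helpers, not part of fast-wasm-scraper, and the regular-expression workload is only a placeholder: replace it with the call you want to measure, for example `new Document(html).root.query('li')`. A single timed loop like this has none of the statistical handling behind the sample sizes and error margins reported in the table.

```javascript
const { performance } = require('perf_hooks');

// Build an HTML document with `count` list items, similar in shape to the benchmark input.
function buildListHtml(count) {
  const items = Array.from({ length: count }, (_, i) => `<li>Item ${i + 1}</li>`);
  return `<html><body><ul>${items.join('')}</ul></body></html>`;
}

// Call `fn` repeatedly for at least `durationMs` milliseconds
// and return the observed operations per second.
function measureOpsPerSec(fn, durationMs = 500) {
  let ops = 0;
  let elapsed = 0;
  const start = performance.now();
  while (elapsed < durationMs) {
    fn();
    ops += 1;
    elapsed = performance.now() - start;
  }
  return ops / (elapsed / 1000);
}

const html = buildListHtml(100);

// Placeholder workload: counts <li> tags with a regular expression.
// Swap in the library call you actually want to benchmark.
const placeholderOps = measureOpsPerSec(() => html.match(/<li>/g).length);
console.log(`${placeholderOps.toFixed(1)} ops/s`);
```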


julleboi/fast-wasm-scraper

Folders and files

Latest commit

History

Repository files navigation

fast-wasm-scraper

Installation

Examples

Loading

Querying

Types

Document

Element

Benchmark

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages