Sponge

Hassle-free web scraping service.

FEATURES

Core Features

  • Supports client-side-rendered web pages
  • Automatically extracts metadata and article content
  • Extracts DOM elements via CSS selectors
  • Blocks domains (when the BLOCKLIST_URL environment variable is provided)
  • Routes requests through an HTTP proxy (when the HTTP_PROXY environment variable is provided)
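
To illustrate the domain-blocking feature, here is a minimal sketch of how a blocklist check could work. It is not taken from Sponge's source. It assumes the blocklist is plain text with one domain per line and `#` comments, and the names `parseBlocklist` and `isBlocked` are hypothetical.

```typescript
// Hypothetical sketch: NOT Sponge's actual implementation.
// Assumption: the blocklist is plain text, one domain per line, "#" starts a comment.

/** Parse blocklist text into a set of lowercase domains. */
function parseBlocklist(text: string): Set<string> {
  const domains = new Set<string>();
  for (const rawLine of text.split(/\r?\n/)) {
    // Drop anything after a "#" comment marker, then normalize.
    const hashIndex = rawLine.indexOf("#");
    const content = hashIndex === -1 ? rawLine : rawLine.slice(0, hashIndex);
    const domain = content.trim().toLowerCase();
    if (domain !== "") {
      domains.add(domain);
    }
  }
  return domains;
}

/** Return true if the URL's hostname, or any of its parent domains, is listed. */
function isBlocked(url: string, blocklist: Set<string>): boolean {
  let hostname: string;
  try {
    hostname = new URL(url).hostname.toLowerCase();
  } catch {
    // Design choice of this sketch: refuse URLs that cannot be parsed.
    return true;
  }
  // Check "a.b.example.com", then "b.example.com", "example.com", "com".
  const labels = hostname.split(".");
  for (let i = 0; i < labels.length; i++) {
    if (blocklist.has(labels.slice(i).join("."))) {
      return true;
    }
  }
  return false;
}
```

In a setup like the one described above, the text passed to `parseBlocklist` would presumably be downloaded once at startup from the address in BLOCKLIST_URL. Each requested URL would then be checked with `isBlocked` before the page is fetched. Matching on parent domains means that listing `example.com` also blocks `ads.example.com`.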

Live Version Features

  • Bundled with a blocklist of over 57,000 adware and malware domains
  • Built-in user-agent pool
  • Built-in rotating proxies

INSTALLATION

Requirements

  • Node.js >= 14
  • Environment variables specified in .env.example

Instructions

Without Docker (dev environment)

$ npm i             # or: yarn install
$ npm run start:dev # or: yarn start:dev

With Docker (prod environment)

$ npm run docker:build:app  # or: yarn docker:build:app
$ npm run docker:start:prod # or: yarn docker:start:prod

USAGE

Start the app, then open /docs for the interactive API documentation.

CHANGELOG

Read more here.

TODO

Read more here.