Node.js backend for web scraping and automation, with scheduled jobs, session management, and a REST API.
- Scraping – Browser automation via Playwright and HTML parsing with Cheerio
- Scheduled jobs – Cron-based task scheduling
- Session management – Login and session handling for target sites
- REST API – Express API with validation (Joi) and optional API key auth
- Database – MySQL with Sequelize (e.g. scraping cache)
- Logging – Winston-based logging
- Node.js (v18+ recommended)
- MySQL
- (Optional) Playwright browsers, if not already installed:

  ```bash
  npx playwright install
  ```

Install dependencies:

```bash
npm install
```

Copy your environment file and set variables (see Configuration):

```bash
cp .env.example .env
```

Configure via environment variables (e.g. in `.env`):
| Variable | Description |
|---|---|
| `SERVER_PORT` | API server port (default: 3000) |
| `TARGET_URL` | Base URL of the site to scrape |
| `OPENLANCE_USERNAME` | Username for target site login |
| `OPENLANCE_PASSWORD` | Password for target site login |
| `API_KEY` | Optional API key for protecting endpoints |
| `DB_HOST` | MySQL host (default: localhost) |
| `DB_PORT` | MySQL port (default: 3306) |
| `DB_NAME` | Database name (default: openlane_scraping) |
| `DB_USER` | Database user |
| `DB_PASSWORD` | Database password |
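Putting the variables together, a minimal `.env` might look like this (all values are placeholders):

```
SERVER_PORT=3000
TARGET_URL=https://example.com
OPENLANCE_USERNAME=your-username
OPENLANCE_PASSWORD=your-password
API_KEY=change-me
DB_HOST=localhost
DB_PORT=3306
DB_NAME=openlane_scraping
DB_USER=scraper
DB_PASSWORD=change-me
```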
Development (with auto-reload):

```bash
npm run dev
```

Production:

```bash
npm start
```

The server listens on `http://127.0.0.1:<SERVER_PORT>`. Use a reverse proxy (e.g. nginx) if you need external access.
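If you put nginx in front, a minimal reverse-proxy block might look like the following (the server name is a placeholder, and port 3000 assumes the default `SERVER_PORT`):

```nginx
server {
    listen 80;
    server_name scraper.example.com;

    location / {
        # Forward requests to the locally bound Node process.
        proxy_pass http://127.0.0.1:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```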
Routes are mounted under `/api`. See `routes/` for available endpoints (e.g. scraping and status).
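Assuming the default port and an `API_KEY` set in the environment, a request might look like this. The `/api/status` path and the `x-api-key` header name are assumptions for illustration; check `routes/` for the real endpoint names.

```shell
# Hypothetical status check against the local server.
curl -H "x-api-key: $API_KEY" http://127.0.0.1:3000/api/status
```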
ISC – see LICENSE.