v1.0.0-beta.3

Pre-release

@developer-rakeshpaul developer-rakeshpaul released this 02 Jan 12:02
· 15 commits to main since this release
e3e38e9

scrapex v1.0.0-beta.3

Stability and test coverage release with safer result merging and improved LLM/embedding flow ordering.

Highlights

  • LLM enhancements now run before embeddings, so summaries and entities are available as embedding inputs.
  • Safer merge behavior in extraction context prevents undefined values from overwriting prior results.
  • Expanded end-to-end coverage with local HTTP mocks and real-world fixtures.
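
The ordering change in the first highlight can be illustrated with a small, self-contained sketch. Everything here is hypothetical: the `Context` shape, the `enhance` and `embed` stand-ins, and `runPipeline` are illustrative names and are not scrapex's API. The steps are also synchronous to keep the example short, whereas real LLM and embedding calls would be asynchronous.

```typescript
// Hypothetical context shape, for illustration only (not scrapex's types).
interface Context {
  content: string;
  summary?: string;
  entities?: string[];
  embeddingInput?: string;
}

// A step reads the current context and returns the fields it produced.
type Step = (ctx: Context) => Partial<Context>;

// Stand-in for an LLM enhancement step: derives a summary and entities.
const enhance: Step = (ctx) => ({
  summary: `Summary: ${ctx.content.split(" ").slice(0, 3).join(" ")}`,
  entities: ["scrapex"],
});

// Stand-in for an embedding step: builds its input from whatever
// earlier steps left in the context, falling back to the raw content.
const embed: Step = (ctx) => ({
  embeddingInput: [ctx.summary ?? ctx.content, ...(ctx.entities ?? [])].join("\n"),
});

// Run steps in order, folding each step's output into the context.
function runPipeline(initial: Context, steps: Step[]): Context {
  let ctx = initial;
  for (const step of steps) {
    ctx = { ...ctx, ...step(ctx) };
  }
  return ctx;
}

const content = "scrapex is a web scraping library for Node.js.";

// Enhancement first: the embedding input is built from the summary and entities.
const ordered = runPipeline({ content }, [enhance, embed]);

// Embedding first: no summary exists yet, so the input falls back to raw content.
const reversed = runPipeline({ content }, [embed, enhance]);

console.log(ordered.embeddingInput);
console.log(reversed.embeddingInput);
```

With the enhancement step first, the embedding input is built from the summary and entities. With the steps reversed, it can only fall back to the raw content.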

Improvements

  • LLM enhancement and extraction now run ahead of embeddings, so embedding inputs can include their output.
  • Merge logic filters undefined values to avoid accidental data loss.
  • E2E tests now cover:
    • Scraping with redirects, robots.txt handling, and non-HTML responses
    • RSS/Atom parsing and discovery utilities
    • URL utilities (tracking removal, protocol-relative URLs)
    • Markdown parsing behavior
    • LLM HTTP provider and Ollama embeddings via local mocks (no external dependencies)
  • Added realistic HTML/RSS fixtures to mirror production inputs.
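
As a rough illustration of the merge change, the sketch below shows one way to filter `undefined` values before merging. The `Fields` type and the `mergeDefined` helper are hypothetical names used for this example only; scrapex's internal merge code may look different.

```typescript
// Hypothetical result shape, for illustration only (not scrapex's types).
type Fields = Record<string, unknown>;

// Shallow-merge `next` onto `prev`, skipping keys whose value is undefined
// so that an extractor returning "nothing" cannot erase an earlier result.
function mergeDefined(prev: Fields, next: Fields): Fields {
  const defined = Object.entries(next).filter(([, value]) => value !== undefined);
  return { ...prev, ...Object.fromEntries(defined) };
}

const prev: Fields = { title: "Example Domain", summary: "First pass" };
const next: Fields = { title: undefined, summary: "Refined summary" };

// A plain object spread copies the undefined title over the earlier value.
const naive: Fields = { ...prev, ...next };

// The filtered merge keeps the earlier title and takes the new summary.
const merged = mergeDefined(prev, next);

console.log(naive.title, merged.title, merged.summary);
```

Only `undefined` is filtered here. Values such as `null`, `0`, or an empty string still overwrite earlier results, which is an assumption of this sketch and not a documented guarantee.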

Documentation

  • Updated embeddings, utilities, RSS parsing, and extractor docs for accuracy.

Installation

npm install scrapex@beta

Notes

  • Requires Node.js 20+.

Full Changelog: v1.0.0-beta.2...v1.0.0-beta.3