v1.0.0-beta.3
Pre-release
scrapex v1.0.0-beta.3
Stability and test coverage release with safer result merging and improved LLM/embedding flow ordering.
Highlights
- LLM enhancements now run before embeddings, so summaries/entities are available for embedding inputs.
- Safer merge behavior in extraction context prevents undefined values from overwriting prior results.
- Expanded end-to-end coverage with local HTTP mocks and real-world fixtures.
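The ordering change in the first highlight can be pictured with a small sketch. Everything here is hypothetical: the `Page` type and the `enhanceWithLlm`, `embeddingInput`, and `runPipeline` functions are illustrative stand-ins, not scrapex APIs, and the steps are kept synchronous for brevity where real LLM and embedding calls would be asynchronous.

```typescript
// Hypothetical types and steps for illustration only; not scrapex's API.
type Page = { content: string; summary?: string };

// Stand-in for an LLM enhancement step that attaches a summary.
function enhanceWithLlm(page: Page): Page {
  return { ...page, summary: `Summary: ${page.content.slice(0, 24)}` };
}

// Stand-in for choosing the embedding input: prefer the summary when present.
function embeddingInput(page: Page): string {
  return page.summary ?? page.content;
}

// Running the enhancement first means the embedding step can see the summary.
function runPipeline(page: Page): string {
  const enhanced = enhanceWithLlm(page);
  return embeddingInput(enhanced);
}

console.log(runPipeline({ content: "hello world" }));
```

If the two steps ran in the opposite order, `embeddingInput` would only ever see the raw content, which is the situation this release is described as avoiding.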
Improvements
- LLM enhancement + extraction flow ordered ahead of embeddings for better downstream inputs.
- Merge logic filters undefined values to avoid accidental data loss.
- E2E tests now cover:
  - Scraping with redirects, robots.txt handling, and non-HTML responses
  - RSS/Atom parsing and discovery utilities
  - URL utilities (tracking removal, protocol-relative URLs)
  - Markdown parsing behavior
  - LLM HTTP provider and Ollama embeddings via local mocks (no external deps)
- Added realistic HTML/RSS fixtures to mirror production inputs.
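As a rough illustration of the undefined-filtering merge described above, here is a minimal sketch. The `PageResult` type and the `mergeDefined` helper are assumptions made for this example, not the library's actual implementation or API.

```typescript
// Hypothetical result shape for illustration; not scrapex's real type.
type PageResult = { title?: string; summary?: string };

// Hypothetical helper: copy only keys whose value is not undefined,
// so a later partial result cannot erase an earlier value.
function mergeDefined<T extends object>(base: T, patch: Partial<T>): T {
  const merged: Record<string, unknown> = { ...base };
  for (const [key, value] of Object.entries(patch)) {
    if (value !== undefined) {
      merged[key] = value;
    }
  }
  return merged as T;
}

const previous: PageResult = { title: "Example page", summary: "Short summary" };
const next: PageResult = { title: "Example page (updated)", summary: undefined };

const merged = mergeDefined<PageResult>(previous, next);
console.log(merged);
```

A plain object spread such as `{ ...previous, ...next }` would instead set `summary` to `undefined`, which is the kind of accidental data loss the filtering is meant to prevent.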
Documentation
- Updated embeddings, utilities, RSS parsing, and extractor docs for accuracy.
Installation
npm install scrapex@beta
Notes
- Requires Node.js 20+.
Full Changelog: v1.0.0-beta.2...v1.0.0-beta.3