v0.5.6

@Oaklight Oaklight released this 04 Mar 23:24
· 189 commits to master since this release

What's Changed

✨ New Features

  • Web Fetch Improvements
    • Add Cloudflare content negotiation strategy for markdown extraction and increase the default timeout
    • Add strategy tracking logs to _extract() for better observability and debugging
    • Improve Jina Reader fallback: fix headers, use POST with JSON body and structured JSON response parsing
    • Add _is_content_sufficient() to evaluate BeautifulSoup content quality (minimum length threshold + SPA shell indicator detection)
    • Improve _extract() fallback logic: low-quality BS4 content triggers Jina Reader; if Jina also fails, fall back to BS4 result
    • Enhance logging with strategy reasons and content quality metrics
    • Add comprehensive unit tests (43 test cases)
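
The quality check and fallback order described above can be sketched roughly as follows. This is an illustrative sketch only, not the project's actual code: the names `is_content_sufficient`, `extract`, `MIN_CONTENT_LENGTH`, and `SPA_SHELL_INDICATORS`, the threshold value, and the indicator strings are all assumptions, and the Jina Reader call is stood in for by a plain callable.

```python
from typing import Callable, Optional, Tuple

# Hypothetical values; the real threshold and indicators may differ.
MIN_CONTENT_LENGTH = 200
SPA_SHELL_INDICATORS = (
    "you need to enable javascript",
    "please enable javascript",
)


def is_content_sufficient(text: str, min_length: int = MIN_CONTENT_LENGTH) -> bool:
    """Return True if extracted text looks like real page content."""
    stripped = text.strip()
    # Too short to be a meaningful article body.
    if len(stripped) < min_length:
        return False
    # Text typical of an unrendered single-page-app shell.
    lowered = stripped.lower()
    return not any(marker in lowered for marker in SPA_SHELL_INDICATORS)


def extract(
    bs4_text: str,
    jina_fetch: Callable[[], Optional[str]],
) -> Tuple[str, str]:
    """Pick a result and report which strategy produced it."""
    if is_content_sufficient(bs4_text):
        return bs4_text, "bs4"
    # Low-quality BS4 content: try the reader service next.
    try:
        jina_text = jina_fetch()
    except Exception:
        jina_text = None
    if jina_text:
        return jina_text, "jina"
    # Reader also failed: fall back to whatever BS4 produced.
    return bs4_text, "bs4_fallback"
```

Returning the strategy name next to the content is one simple way to feed the strategy tracking logs; the real `_extract()` may record this differently.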

Full Changelog: v0.5.5...v0.5.6