Merged
Conversation
Update all SDK usage to match the new v2 API from ScrapeGraphAI/scrapegraph-py#82:

- smartscraper() → extract(url=, prompt=)
- searchscraper() → search(query=)
- markdownify() → scrape(url=)
- Bump dependency to scrapegraph-py>=2.0.0

BREAKING CHANGE: requires scrapegraph-py v2.0.0+

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
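The rename map above can be expressed as a thin v1-to-v2 adapter. This is a sketch against a stand-in client, not the real SDK: only the method names (extract/search/scrape and their url/prompt/query keywords) come from the commit message; `FakeV2Client` and `LegacyAdapter` are illustrative.

```python
class FakeV2Client:
    """Stand-in for the scrapegraph-py v2 client surface."""
    def extract(self, url, prompt):
        return {"op": "extract", "url": url, "prompt": prompt}
    def search(self, query):
        return {"op": "search", "query": query}
    def scrape(self, url):
        return {"op": "scrape", "url": url}

class LegacyAdapter:
    """Forwards the v1 method names onto a v2-style client."""
    def __init__(self, client):
        self._client = client
    def smartscraper(self, website_url, user_prompt):
        return self._client.extract(url=website_url, prompt=user_prompt)
    def searchscraper(self, user_prompt):
        return self._client.search(query=user_prompt)
    def markdownify(self, website_url):
        return self._client.scrape(url=website_url)

adapter = LegacyAdapter(FakeV2Client())
print(adapter.smartscraper("https://example.com", "List titles")["op"])  # extract
```

Callers migrating in place would replace each legacy call with the v2 name directly; the adapter form is only useful as a transition shim.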
feat: add PlasmateLoader as lightweight scraping backend (no Chrome needed)

Closes #1055

Plasmate (https://github.com/plasmate-labs/plasmate) is an open-source Rust browser engine that outputs a Structured Object Model (SOM) instead of raw HTML. It requires no Chrome process, uses ~64MB RAM per session vs ~300MB, and delivers 10-100x fewer tokens per page.

Changes:

- Add scrapegraphai/docloaders/plasmate.py: PlasmateLoader
  - Implements BaseLoader (lazy_load + alazy_load)
  - Calls the plasmate binary via subprocess (pip install plasmate)
  - Supports output_format: 'text' (default), 'som', 'markdown', 'links'
  - Supports --selector, --header, --timeout flags
  - Optional fallback_to_chrome=True for JS-heavy SPAs
  - Async-safe: runs the subprocess in an executor thread pool
- Update scrapegraphai/docloaders/__init__.py: export PlasmateLoader
- Update scrapegraphai/nodes/fetch_node.py: support a plasmate config dict in FetchNode (alongside browser_base and scrape_do)
- Add tests/test_plasmate.py: 25 unit tests (init, cmd building, lazy_load, alazy_load, fallback, error handling)

Usage:

    from scrapegraphai.docloaders import PlasmateLoader

    loader = PlasmateLoader(
        urls=['https://docs.python.org/3/library/json.html'],
        output_format='text',
        timeout=30,
        fallback_to_chrome=True,  # optional: retry with Chrome for SPAs
    )
    docs = loader.load()

    # Or via FetchNode config:
    graph_config = {
        'plasmate': {
            'output_format': 'text',
            'timeout': 30,
            'fallback_to_chrome': False,
        }
    }
- Pass output_schema to extract() so Pydantic schemas are forwarded to the v2 API
- Use the context-manager pattern (with Client(...) as client) for proper resource cleanup
- Simplify examples to match the v2 SDK style from scrapegraph-py
- Remove the unused sgai_logger import (the v2 client handles its own logging)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
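The two behavioral changes above (schema forwarding and context-manager cleanup) can be sketched with a stub client. Everything here except the extract()/output_schema names is an assumption; the real scrapegraph-py v2 Client takes an API key and a real Pydantic model.

```python
class StubClient:
    """Mimics a v2-style context-manager client."""
    def __init__(self, api_key):
        self.api_key = api_key
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.closed = True  # connection/session released on exit
        return False

    def extract(self, url, prompt, output_schema=None):
        # v2 forwards the schema to the API instead of silently dropping it
        return {"url": url, "schema": getattr(output_schema, "__name__", None)}

class Article:  # stand-in for a Pydantic model
    pass

with StubClient(api_key="sgai-...") as client:
    result = client.extract(
        url="https://example.com",
        prompt="Extract the article",
        output_schema=Article,  # forwarded, not ignored
    )
print(result["schema"])  # Article
```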
Support both the v2 Client API (PR #82) and the newer ScrapeGraphAI API (PR #84), which uses Pydantic request models and ApiResult[T] wrappers.

- Add a scrapegraph_py_compat helper with runtime API detection
- Route smart_scraper_graph through the compat layer
- Add v3-style examples for extract, search, and scrape

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
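Runtime API detection of this kind usually probes the installed module for a distinguishing attribute. A minimal sketch in the spirit of the scrapegraph_py_compat helper; the attribute names probed (ScrapeGraphAI, Client) are taken from the PR description, but the exact detection logic in the repo may differ.

```python
import types

def detect_api_flavor(module):
    """Return 'v3' if the module exposes the newer ScrapeGraphAI entry
    point, 'v2' if it exposes the Client API, else raise ImportError."""
    if hasattr(module, "ScrapeGraphAI"):
        return "v3"
    if hasattr(module, "Client"):
        return "v2"
    raise ImportError("unsupported scrapegraph-py API surface")

# Simulated module objects standing in for different installed versions
fake_v2 = types.SimpleNamespace(Client=object)
fake_v3 = types.SimpleNamespace(ScrapeGraphAI=object, Client=object)
print(detect_api_flavor(fake_v2))  # v2
print(detect_api_flavor(fake_v3))  # v3
```

Checking for the newer surface first matters: a v3 install that also keeps the Client name for backward compatibility should still be routed to the v3 code path.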
scrapegraph-py 2.0.0 requires Python >=3.12, so bump the project's requires-python to match. Simplify the test workflow to a single unit-test job on Python 3.12 / ubuntu-latest.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
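The packaging side of this commit amounts to two pyproject.toml fields. An illustrative fragment (the repo's actual file has many more entries):

```toml
[project]
requires-python = ">=3.12"
dependencies = [
    "scrapegraph-py>=2.0.0",
]
```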
Removed CodeQL badge from the README.
Removed the hero image section from the README.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ph-py-v2 feat!: migrate to scrapegraph-py v2 API surface
    cmd = loader._build_cmd("https://example.com")
    assert "plasmate" in cmd[0]
    assert "fetch" in cmd
    assert "https://example.com" in cmd
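A _build_cmd consistent with the assertions above could look like the sketch below. The flag names (--selector, --timeout) and the 'fetch' subcommand come from the PR description; the class shape and defaults are assumptions, not the merged implementation.

```python
class PlasmateLoaderSketch:
    """Illustrative subset of a subprocess-backed loader."""
    def __init__(self, urls, output_format="text", timeout=30, selector=None):
        self.urls = urls
        self.output_format = output_format
        self.timeout = timeout
        self.selector = selector

    def _build_cmd(self, url):
        # argv list for subprocess: no shell, so no quoting concerns
        cmd = [
            "plasmate", "fetch", url,
            "--format", self.output_format,
            "--timeout", str(self.timeout),
        ]
        if self.selector:
            cmd += ["--selector", self.selector]
        return cmd

loader = PlasmateLoaderSketch(urls=["https://example.com"])
cmd = loader._build_cmd("https://example.com")
print(cmd)
```

Building argv as a list (rather than a shell string) keeps URLs with query strings or ampersands safe to pass through.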
    docs = asyncio.run(run())
    assert len(docs) == 2
    sources = {d.metadata["source"] for d in docs}
    assert "https://a.com" in sources
    assert "https://b.com" in sources
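The "async-safe: runs subprocess in executor thread pool" behavior the PR describes follows a standard pattern: push each blocking call onto the default executor so the event loop stays responsive. A self-contained sketch with a stubbed fetch function (the real loader would invoke the plasmate binary here):

```python
import asyncio
from functools import partial

def blocking_fetch(url):
    # Stand-in for subprocess.run([...plasmate cmd...], capture_output=True)
    return f"<content of {url}>"

async def afetch_all(urls):
    loop = asyncio.get_running_loop()
    # One executor job per URL; gather preserves the input order
    return await asyncio.gather(
        *(loop.run_in_executor(None, partial(blocking_fetch, u)) for u in urls)
    )

docs = asyncio.run(afetch_all(["https://a.com", "https://b.com"]))
print(len(docs))  # 2
```

Using run_in_executor (rather than calling subprocess directly in the coroutine) is what lets alazy_load run concurrently with other tasks on the same loop.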
Comment on lines +12 to +34
```diff
     name: Unit Tests
     runs-on: ubuntu-latest

-    strategy:
-      fail-fast: false
-      matrix:
-        test-group: [smart-scraper, multi-graph, file-formats]
-
     steps:
       - name: Checkout code
         uses: actions/checkout@v4

       - name: Set up Python
         uses: actions/setup-python@v5
         with:
-          python-version: '3.11'
+          python-version: '3.12'

       - name: Install uv
         uses: astral-sh/setup-uv@v4

       - name: Install dependencies
-        run: |
-          uv sync
+        run: uv sync

       - name: Install Playwright browsers
-        run: |
-          uv run playwright install chromium
+        run: uv run playwright install chromium

-      - name: Run integration tests
-        env:
-          OPENAI_APIKEY: ${{ secrets.OPENAI_APIKEY }}
-          ANTHROPIC_APIKEY: ${{ secrets.ANTHROPIC_APIKEY }}
-          GROQ_APIKEY: ${{ secrets.GROQ_APIKEY }}
-        run: |
-          uv run pytest tests/integration/ -m integration --integration -v
-
-      - name: Upload test results
-        uses: actions/upload-artifact@v4
-        if: always()
-        with:
-          name: integration-test-results-${{ matrix.test-group }}
-          path: |
-            htmlcov/
-            benchmark_results/
-
-  benchmark-tests:
-    name: Performance Benchmarks
-    runs-on: ubuntu-latest
-    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
-
-    steps:
-      - name: Checkout code
-        uses: actions/checkout@v4
-
-      - name: Set up Python
-        uses: actions/setup-python@v5
-        with:
-          python-version: '3.11'
-
-      - name: Install uv
-        uses: astral-sh/setup-uv@v4
-
-      - name: Install dependencies
-        run: |
-          uv sync
-
-      - name: Run performance benchmarks
-        env:
-          OPENAI_APIKEY: ${{ secrets.OPENAI_APIKEY }}
-        run: |
-          uv run pytest tests/ -m benchmark --benchmark -v
-
-      - name: Upload benchmark results
-        uses: actions/upload-artifact@v4
-        with:
-          name: benchmark-results
-          path: benchmark_results/
-
-      - name: Compare with baseline
-        if: github.event_name == 'pull_request'
-        run: |
-          # Download baseline from main branch
-          # Compare and comment on PR if regression detected
-          echo "Benchmark comparison would run here"
-
-  code-quality:
-    name: Code Quality Checks
-    runs-on: ubuntu-latest
-    if: github.event_name == 'push'
-
-    steps:
-      - name: Checkout code
-        uses: actions/checkout@v4
-
-      - name: Set up Python
-        uses: actions/setup-python@v5
-        with:
-          python-version: '3.11'
-
-      - name: Install uv
-        uses: astral-sh/setup-uv@v4
-
-      - name: Install dependencies
-        run: |
-          uv sync
-
-      - name: Run Ruff linting
-        run: |
-          uv run ruff check scrapegraphai/ tests/
-
-      - name: Run Black formatting check
-        run: |
-          uv run black --check scrapegraphai/ tests/
-
-      - name: Run isort check
-        run: |
-          uv run isort --check-only scrapegraphai/ tests/
-
-      - name: Run type checking with mypy
-        run: |
-          uv run mypy scrapegraphai/
-        continue-on-error: true
-
-  test-coverage-report:
-    name: Test Coverage Report
-    needs: [unit-tests, integration-tests]
-    runs-on: ubuntu-latest
-    if: always()
-
-    steps:
-      - name: Checkout code
-        uses: actions/checkout@v4
-
-      - name: Download coverage artifacts
-        uses: actions/download-artifact@v4
-
-      - name: Generate coverage report
-        run: |
-          echo "Coverage report generation would run here"
-
-      - name: Comment coverage on PR
-        if: github.event_name == 'pull_request'
-        uses: py-cov-action/python-coverage-comment-action@v3
-        with:
-          GITHUB_TOKEN: ${{ github.token }}
-
-  test-summary:
-    name: Test Summary
-    needs: [unit-tests, integration-tests, code-quality]
-    runs-on: ubuntu-latest
-    if: always()
-
-    steps:
-      - name: Check test results
-        run: |
-          echo "All test jobs completed"
-          echo "Unit tests: ${{ needs.unit-tests.result }}"
-          echo "Integration tests: ${{ needs.integration-tests.result }}"
-          echo "Code quality: ${{ needs.code-quality.result }}"
+      - name: Run unit tests
+        run: uv run pytest tests/ -m "unit or not integration"
```
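The `-m "unit or not integration"` expression in the final step selects every test marked `unit` plus any test not marked `integration`, so unmarked tests still run while live-API tests are skipped. Custom markers need to be registered to avoid pytest warnings; an illustrative ini-style registration (the repo may declare these in pyproject.toml instead):

```ini
# pytest.ini (illustrative)
[pytest]
markers =
    unit: fast, hermetic tests
    integration: tests that hit live APIs
    benchmark: performance benchmarks
```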
## [2.0.0](v1.76.0...v2.0.0) (2026-04-19)

### ⚠ BREAKING CHANGES

* requires scrapegraph-py v2.0.0+

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

### Features

* add scrapegraph-py PR [#84](#84) SDK compatibility ([e8b2a28](e8b2a28)), closes [#82](#82)
* align with scrapegraph-py v2 API surface from PR [#82](#82) ([c0f5fd5](c0f5fd5))
* migrate to scrapegraph-py v2 API surface ([fd23bb0](fd23bb0)), closes [ScrapeGraphAI/scrapegraph-py#82](ScrapeGraphAI/scrapegraph-py#82)

### CI

* bump min Python to 3.12 and trim test suite ([5fda03f](5fda03f))
🎉 This PR is included in version 2.1.0-beta.1 🎉 The release is available on:

Your semantic-release bot 📦🚀
No description provided.