Merged
Conversation
Update all SDK usage to match the new v2 API from ScrapeGraphAI/scrapegraph-py#82:

- smartscraper() → extract(url=, prompt=)
- searchscraper() → search(query=)
- markdownify() → scrape(url=)
- Bump dependency to scrapegraph-py>=2.0.0

BREAKING CHANGE: requires scrapegraph-py v2.0.0+

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
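The rename map above can be expressed as a thin v1-to-v2 adapter. This is a sketch against a stand-in client, not the real SDK: only the method names (extract/search/scrape and their url/prompt/query keywords) come from the commit message; `FakeV2Client` and `LegacyAdapter` are illustrative.

```python
class FakeV2Client:
    """Stand-in for the scrapegraph-py v2 client surface."""
    def extract(self, url, prompt):
        return {"op": "extract", "url": url, "prompt": prompt}
    def search(self, query):
        return {"op": "search", "query": query}
    def scrape(self, url):
        return {"op": "scrape", "url": url}

class LegacyAdapter:
    """Forwards the v1 method names onto a v2-style client."""
    def __init__(self, client):
        self._client = client
    def smartscraper(self, website_url, user_prompt):
        return self._client.extract(url=website_url, prompt=user_prompt)
    def searchscraper(self, user_prompt):
        return self._client.search(query=user_prompt)
    def markdownify(self, website_url):
        return self._client.scrape(url=website_url)

adapter = LegacyAdapter(FakeV2Client())
print(adapter.smartscraper("https://example.com", "List titles")["op"])  # extract
```

Callers migrating in place would replace each legacy call with the v2 name directly; the adapter form is only useful as a transition shim.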
feat: add PlasmateLoader as lightweight scraping backend (no Chrome needed)

Closes #1055

Plasmate (https://github.com/plasmate-labs/plasmate) is an open-source Rust browser engine that outputs a Structured Object Model (SOM) instead of raw HTML. It requires no Chrome process, uses ~64MB RAM per session vs ~300MB, and delivers 10-100x fewer tokens per page.

Changes:

- Add scrapegraphai/docloaders/plasmate.py: PlasmateLoader
  - Implements BaseLoader (lazy_load + alazy_load)
  - Calls the plasmate binary via subprocess (pip install plasmate)
  - Supports output_format: 'text' (default), 'som', 'markdown', 'links'
  - Supports --selector, --header, --timeout flags
  - Optional fallback_to_chrome=True for JS-heavy SPAs
  - Async-safe: runs the subprocess in an executor thread pool
- Update scrapegraphai/docloaders/__init__.py: export PlasmateLoader
- Update scrapegraphai/nodes/fetch_node.py: support a plasmate config dict in FetchNode (alongside browser_base and scrape_do)
- Add tests/test_plasmate.py: 25 unit tests (init, cmd building, lazy_load, alazy_load, fallback, error handling)

Usage:

    from scrapegraphai.docloaders import PlasmateLoader

    loader = PlasmateLoader(
        urls=['https://docs.python.org/3/library/json.html'],
        output_format='text',
        timeout=30,
        fallback_to_chrome=True,  # optional: retry with Chrome for SPAs
    )
    docs = loader.load()

    # Or via FetchNode config:
    graph_config = {
        'plasmate': {
            'output_format': 'text',
            'timeout': 30,
            'fallback_to_chrome': False,
        }
    }
- Pass output_schema to extract() so Pydantic schemas are forwarded to the v2 API
- Use the context-manager pattern (with Client(...) as client) for proper resource cleanup
- Simplify examples to match the v2 SDK style from scrapegraph-py
- Remove the unused sgai_logger import (the v2 client handles its own logging)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
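The two behavioral changes above (schema forwarding and context-manager cleanup) can be sketched with a stub client. Everything here except the extract()/output_schema names is an assumption; the real scrapegraph-py v2 Client takes an API key and a real Pydantic model.

```python
class StubClient:
    """Mimics a v2-style context-manager client."""
    def __init__(self, api_key):
        self.api_key = api_key
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.closed = True  # connection/session released on exit
        return False

    def extract(self, url, prompt, output_schema=None):
        # v2 forwards the schema to the API instead of silently dropping it
        return {"url": url, "schema": getattr(output_schema, "__name__", None)}

class Article:  # stand-in for a Pydantic model
    pass

with StubClient(api_key="sgai-...") as client:
    result = client.extract(
        url="https://example.com",
        prompt="Extract the article",
        output_schema=Article,  # forwarded, not ignored
    )
print(result["schema"])  # Article
```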
Support both the v2 Client API (PR #82) and the newer ScrapeGraphAI API (PR #84), which uses Pydantic request models and ApiResult[T] wrappers.

- Add a scrapegraph_py_compat helper with runtime API detection
- Route smart_scraper_graph through the compat layer
- Add v3-style examples for extract, search, and scrape

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
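Runtime API detection of this kind usually probes the installed module for a distinguishing attribute. A minimal sketch in the spirit of the scrapegraph_py_compat helper; the attribute names probed (ScrapeGraphAI, Client) are taken from the PR description, but the exact detection logic in the repo may differ.

```python
import types

def detect_api_flavor(module):
    """Return 'v3' if the module exposes the newer ScrapeGraphAI entry
    point, 'v2' if it exposes the Client API, else raise ImportError."""
    if hasattr(module, "ScrapeGraphAI"):
        return "v3"
    if hasattr(module, "Client"):
        return "v2"
    raise ImportError("unsupported scrapegraph-py API surface")

# Simulated module objects standing in for different installed versions
fake_v2 = types.SimpleNamespace(Client=object)
fake_v3 = types.SimpleNamespace(ScrapeGraphAI=object, Client=object)
print(detect_api_flavor(fake_v2))  # v2
print(detect_api_flavor(fake_v3))  # v3
```

Checking for the newer surface first matters: a v3 install that also keeps the Client name for backward compatibility should still be routed to the v3 code path.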
scrapegraph-py 2.0.0 requires Python >=3.12, so bump the project's requires-python to match. Simplify the test workflow to a single unit-test job on Python 3.12 / ubuntu-latest.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
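The packaging side of this commit amounts to two pyproject.toml fields. An illustrative fragment (the repo's actual file has many more entries):

```toml
[project]
requires-python = ">=3.12"
dependencies = [
    "scrapegraph-py>=2.0.0",
]
```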
Removed CodeQL badge from the README.
Removed the hero image section from the README.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ph-py-v2 feat!: migrate to scrapegraph-py v2 API surface
    cmd = loader._build_cmd("https://example.com")
    assert "plasmate" in cmd[0]
    assert "fetch" in cmd
    assert "https://example.com" in cmd
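A _build_cmd consistent with the assertions above could look like the sketch below. The flag names (--selector, --timeout) and the 'fetch' subcommand come from the PR description; the class shape and defaults are assumptions, not the merged implementation.

```python
class PlasmateLoaderSketch:
    """Illustrative subset of a subprocess-backed loader."""
    def __init__(self, urls, output_format="text", timeout=30, selector=None):
        self.urls = urls
        self.output_format = output_format
        self.timeout = timeout
        self.selector = selector

    def _build_cmd(self, url):
        # argv list for subprocess: no shell, so no quoting concerns
        cmd = [
            "plasmate", "fetch", url,
            "--format", self.output_format,
            "--timeout", str(self.timeout),
        ]
        if self.selector:
            cmd += ["--selector", self.selector]
        return cmd

loader = PlasmateLoaderSketch(urls=["https://example.com"])
cmd = loader._build_cmd("https://example.com")
print(cmd)
```

Building argv as a list (rather than a shell string) keeps URLs with query strings or ampersands safe to pass through.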
    docs = asyncio.run(run())
    assert len(docs) == 2
    sources = {d.metadata["source"] for d in docs}
    assert "https://a.com" in sources
    assert "https://b.com" in sources
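The "async-safe: runs subprocess in executor thread pool" behavior the PR describes follows a standard pattern: push each blocking call onto the default executor so the event loop stays responsive. A self-contained sketch with a stubbed fetch function (the real loader would invoke the plasmate binary here):

```python
import asyncio
from functools import partial

def blocking_fetch(url):
    # Stand-in for subprocess.run([...plasmate cmd...], capture_output=True)
    return f"<content of {url}>"

async def afetch_all(urls):
    loop = asyncio.get_running_loop()
    # One executor job per URL; gather preserves the input order
    return await asyncio.gather(
        *(loop.run_in_executor(None, partial(blocking_fetch, u)) for u in urls)
    )

docs = asyncio.run(afetch_all(["https://a.com", "https://b.com"]))
print(len(docs))  # 2
```

Using run_in_executor (rather than calling subprocess directly in the coroutine) is what lets alazy_load run concurrently with other tasks on the same loop.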
Comment on lines +12 to +34
```diff
     name: Unit Tests
     runs-on: ubuntu-latest

-    strategy:
-      fail-fast: false
-      matrix:
-        test-group: [smart-scraper, multi-graph, file-formats]
-
     steps:
       - name: Checkout code
         uses: actions/checkout@v4

       - name: Set up Python
         uses: actions/setup-python@v5
         with:
-          python-version: '3.11'
+          python-version: '3.12'

       - name: Install uv
         uses: astral-sh/setup-uv@v4

       - name: Install dependencies
-        run: |
-          uv sync
+        run: uv sync

       - name: Install Playwright browsers
-        run: |
-          uv run playwright install chromium
+        run: uv run playwright install chromium

-      - name: Run integration tests
-        env:
-          OPENAI_APIKEY: ${{ secrets.OPENAI_APIKEY }}
-          ANTHROPIC_APIKEY: ${{ secrets.ANTHROPIC_APIKEY }}
-          GROQ_APIKEY: ${{ secrets.GROQ_APIKEY }}
-        run: |
-          uv run pytest tests/integration/ -m integration --integration -v
-
-      - name: Upload test results
-        uses: actions/upload-artifact@v4
-        if: always()
-        with:
-          name: integration-test-results-${{ matrix.test-group }}
-          path: |
-            htmlcov/
-            benchmark_results/
-
-  benchmark-tests:
-    name: Performance Benchmarks
-    runs-on: ubuntu-latest
-    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
-
-    steps:
-      - name: Checkout code
-        uses: actions/checkout@v4
-
-      - name: Set up Python
-        uses: actions/setup-python@v5
-        with:
-          python-version: '3.11'
-
-      - name: Install uv
-        uses: astral-sh/setup-uv@v4
-
-      - name: Install dependencies
-        run: |
-          uv sync
-
-      - name: Run performance benchmarks
-        env:
-          OPENAI_APIKEY: ${{ secrets.OPENAI_APIKEY }}
-        run: |
-          uv run pytest tests/ -m benchmark --benchmark -v
-
-      - name: Upload benchmark results
-        uses: actions/upload-artifact@v4
-        with:
-          name: benchmark-results
-          path: benchmark_results/
-
-      - name: Compare with baseline
-        if: github.event_name == 'pull_request'
-        run: |
-          # Download baseline from main branch
-          # Compare and comment on PR if regression detected
-          echo "Benchmark comparison would run here"
-
-  code-quality:
-    name: Code Quality Checks
-    runs-on: ubuntu-latest
-    if: github.event_name == 'push'
-
-    steps:
-      - name: Checkout code
-        uses: actions/checkout@v4
-
-      - name: Set up Python
-        uses: actions/setup-python@v5
-        with:
-          python-version: '3.11'
-
-      - name: Install uv
-        uses: astral-sh/setup-uv@v4
-
-      - name: Install dependencies
-        run: |
-          uv sync
-
-      - name: Run Ruff linting
-        run: |
-          uv run ruff check scrapegraphai/ tests/
-
-      - name: Run Black formatting check
-        run: |
-          uv run black --check scrapegraphai/ tests/
-
-      - name: Run isort check
-        run: |
-          uv run isort --check-only scrapegraphai/ tests/
-
-      - name: Run type checking with mypy
-        run: |
-          uv run mypy scrapegraphai/
-        continue-on-error: true
-
-  test-coverage-report:
-    name: Test Coverage Report
-    needs: [unit-tests, integration-tests]
-    runs-on: ubuntu-latest
-    if: always()
-
-    steps:
-      - name: Checkout code
-        uses: actions/checkout@v4
-
-      - name: Download coverage artifacts
-        uses: actions/download-artifact@v4
-
-      - name: Generate coverage report
-        run: |
-          echo "Coverage report generation would run here"
-
-      - name: Comment coverage on PR
-        if: github.event_name == 'pull_request'
-        uses: py-cov-action/python-coverage-comment-action@v3
-        with:
-          GITHUB_TOKEN: ${{ github.token }}
-
-  test-summary:
-    name: Test Summary
-    needs: [unit-tests, integration-tests, code-quality]
-    runs-on: ubuntu-latest
-    if: always()
-
-    steps:
-      - name: Check test results
-        run: |
-          echo "All test jobs completed"
-          echo "Unit tests: ${{ needs.unit-tests.result }}"
-          echo "Integration tests: ${{ needs.integration-tests.result }}"
-          echo "Code quality: ${{ needs.code-quality.result }}"
+      - name: Run unit tests
+        run: uv run pytest tests/ -m "unit or not integration"
```
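The `-m "unit or not integration"` expression in the final step selects every test marked `unit` plus any test not marked `integration`, so unmarked tests still run while live-API tests are skipped. Custom markers need to be registered to avoid pytest warnings; an illustrative ini-style registration (the repo may declare these in pyproject.toml instead):

```ini
# pytest.ini (illustrative)
[pytest]
markers =
    unit: fast, hermetic tests
    integration: tests that hit live APIs
    benchmark: performance benchmarks
```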
## [2.0.0](v1.76.0...v2.0.0) (2026-04-19)

### ⚠ BREAKING CHANGES

* requires scrapegraph-py v2.0.0+

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

### Features

* add scrapegraph-py PR [#84](#84) SDK compatibility ([e8b2a28](e8b2a28)), closes [#82](#82)
* align with scrapegraph-py v2 API surface from PR [#82](#82) ([c0f5fd5](c0f5fd5))
* migrate to scrapegraph-py v2 API surface ([fd23bb0](fd23bb0)), closes [ScrapeGraphAI/scrapegraph-py#82](ScrapeGraphAI/scrapegraph-py#82)

### CI

* bump min Python to 3.12 and trim test suite ([5fda03f](5fda03f))
🎉 This PR is included in version 2.1.0-beta.1 🎉 The release is available on:

Your semantic-release bot 📦🚀
No description provided.