feat!: migrate to scrapegraph-py v2 API surface#1058
Merged
VinciGit00 merged 5 commits intomainfrom Apr 19, 2026
Merged
Conversation
Update all SDK usage to match the new v2 API from ScrapeGraphAI/scrapegraph-py#82: - smartscraper() → extract(url=, prompt=) - searchscraper() → search(query=) - markdownify() → scrape(url=) - Bump dependency to scrapegraph-py>=2.0.0 BREAKING CHANGE: requires scrapegraph-py v2.0.0+ Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Dependency ReviewThe following issues were found:
Snapshot WarningsEnsure that dependencies are being submitted on PR branches and consider enabling retry-on-snapshot-warnings. See the documentation for more information and troubleshooting advice. License Issuespyproject.toml
OpenSSF Scorecard
Scanned Files
|
- Pass output_schema to extract() so Pydantic schemas are forwarded to the v2 API - Use context manager pattern (with Client(...) as client) for proper resource cleanup - Simplify examples to match the v2 SDK style from scrapegraph-py - Remove unused sgai_logger import (v2 client handles its own logging) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Support both the v2 Client API (PR #82) and the newer ScrapeGraphAI API (PR #84) which uses Pydantic request models and ApiResult[T] wrappers. - Add scrapegraph_py_compat helper with runtime API detection - Route smart_scraper_graph through the compat layer - Add v3-style examples for extract, search, and scrape Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
scrapegraph-py 2.0.0 requires Python >=3.12, so bump the project's requires-python to match. Simplify the test workflow to a single unit-test job on Python 3.12 / ubuntu-latest. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment on lines
+12
to
+34
| name: Unit Tests | ||
| runs-on: ubuntu-latest | ||
|
|
||
| strategy: | ||
| fail-fast: false | ||
| matrix: | ||
| test-group: [smart-scraper, multi-graph, file-formats] | ||
|
|
||
| steps: | ||
| - name: Checkout code | ||
| uses: actions/checkout@v4 | ||
|
|
||
| - name: Set up Python | ||
| uses: actions/setup-python@v5 | ||
| with: | ||
| python-version: '3.11' | ||
| python-version: '3.12' | ||
|
|
||
| - name: Install uv | ||
| uses: astral-sh/setup-uv@v4 | ||
|
|
||
| - name: Install dependencies | ||
| run: | | ||
| uv sync | ||
| run: uv sync | ||
|
|
||
| - name: Install Playwright browsers | ||
| run: | | ||
| uv run playwright install chromium | ||
|
|
||
| - name: Run integration tests | ||
| env: | ||
| OPENAI_APIKEY: ${{ secrets.OPENAI_APIKEY }} | ||
| ANTHROPIC_APIKEY: ${{ secrets.ANTHROPIC_APIKEY }} | ||
| GROQ_APIKEY: ${{ secrets.GROQ_APIKEY }} | ||
| run: | | ||
| uv run pytest tests/integration/ -m integration --integration -v | ||
|
|
||
| - name: Upload test results | ||
| uses: actions/upload-artifact@v4 | ||
| if: always() | ||
| with: | ||
| name: integration-test-results-${{ matrix.test-group }} | ||
| path: | | ||
| htmlcov/ | ||
| benchmark_results/ | ||
|
|
||
| benchmark-tests: | ||
| name: Performance Benchmarks | ||
| runs-on: ubuntu-latest | ||
| if: github.event_name == 'push' && github.ref == 'refs/heads/main' | ||
|
|
||
| steps: | ||
| - name: Checkout code | ||
| uses: actions/checkout@v4 | ||
|
|
||
| - name: Set up Python | ||
| uses: actions/setup-python@v5 | ||
| with: | ||
| python-version: '3.11' | ||
|
|
||
| - name: Install uv | ||
| uses: astral-sh/setup-uv@v4 | ||
|
|
||
| - name: Install dependencies | ||
| run: | | ||
| uv sync | ||
|
|
||
| - name: Run performance benchmarks | ||
| env: | ||
| OPENAI_APIKEY: ${{ secrets.OPENAI_APIKEY }} | ||
| run: | | ||
| uv run pytest tests/ -m benchmark --benchmark -v | ||
|
|
||
| - name: Upload benchmark results | ||
| uses: actions/upload-artifact@v4 | ||
| with: | ||
| name: benchmark-results | ||
| path: benchmark_results/ | ||
|
|
||
| - name: Compare with baseline | ||
| if: github.event_name == 'pull_request' | ||
| run: | | ||
| # Download baseline from main branch | ||
| # Compare and comment on PR if regression detected | ||
| echo "Benchmark comparison would run here" | ||
|
|
||
| code-quality: | ||
| name: Code Quality Checks | ||
| runs-on: ubuntu-latest | ||
| if: github.event_name == 'push' | ||
|
|
||
| steps: | ||
| - name: Checkout code | ||
| uses: actions/checkout@v4 | ||
|
|
||
| - name: Set up Python | ||
| uses: actions/setup-python@v5 | ||
| with: | ||
| python-version: '3.11' | ||
|
|
||
| - name: Install uv | ||
| uses: astral-sh/setup-uv@v4 | ||
|
|
||
| - name: Install dependencies | ||
| run: | | ||
| uv sync | ||
|
|
||
| - name: Run Ruff linting | ||
| run: | | ||
| uv run ruff check scrapegraphai/ tests/ | ||
|
|
||
| - name: Run Black formatting check | ||
| run: | | ||
| uv run black --check scrapegraphai/ tests/ | ||
|
|
||
| - name: Run isort check | ||
| run: | | ||
| uv run isort --check-only scrapegraphai/ tests/ | ||
|
|
||
| - name: Run type checking with mypy | ||
| run: | | ||
| uv run mypy scrapegraphai/ | ||
| continue-on-error: true | ||
| run: uv run playwright install chromium | ||
|
|
||
| test-coverage-report: | ||
| name: Test Coverage Report | ||
| needs: [unit-tests, integration-tests] | ||
| runs-on: ubuntu-latest | ||
| if: always() | ||
|
|
||
| steps: | ||
| - name: Checkout code | ||
| uses: actions/checkout@v4 | ||
|
|
||
| - name: Download coverage artifacts | ||
| uses: actions/download-artifact@v4 | ||
|
|
||
| - name: Generate coverage report | ||
| run: | | ||
| echo "Coverage report generation would run here" | ||
|
|
||
| - name: Comment coverage on PR | ||
| if: github.event_name == 'pull_request' | ||
| uses: py-cov-action/python-coverage-comment-action@v3 | ||
| with: | ||
| GITHUB_TOKEN: ${{ github.token }} | ||
|
|
||
| test-summary: | ||
| name: Test Summary | ||
| needs: [unit-tests, integration-tests, code-quality] | ||
| runs-on: ubuntu-latest | ||
| if: always() | ||
|
|
||
| steps: | ||
| - name: Check test results | ||
| run: | | ||
| echo "All test jobs completed" | ||
| echo "Unit tests: ${{ needs.unit-tests.result }}" | ||
| echo "Integration tests: ${{ needs.integration-tests.result }}" | ||
| echo "Code quality: ${{ needs.code-quality.result }}" | ||
| - name: Run unit tests | ||
| run: uv run pytest tests/ -m "unit or not integration" |
|
🎉 This PR is included in version 2.0.0 🎉 The release is available on:
Your semantic-release bot 📦🚀 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
scrapegraph-pySDK usage to the new v2 API surface (see feat!: migrate Python SDK to v2 API surface scrapegraph-py#82)scrapegraph-py>=1.44.0to>=2.0.0SmartScraperGraphand all 3 example scriptsoutput_schematoextract()so Pydantic schemas are forwarded to the v2 APIwith Client(...) as client) for proper resource cleanupAPI mapping
smartscraper(website_url=, user_prompt=)extract(url=, prompt=, output_schema=)/api/v2/extractsearchscraper(user_prompt=)search(query=)/api/v2/searchmarkdownify(website_url=)scrape(url=)/api/v2/scrapeget_credits()credits()/api/v2/creditsgenerate_schema()crawl()/get_crawl()crawl.start()/crawl.status()/.stop()/.resume()/api/v2/crawlmonitor.create()/.list()/.pause()/.resume()/.delete()/api/v2/monitorhistory()/api/v2/historyOther v2 changes (from scrapegraph-py)
Authorization: BearerandSGAI-APIKEYheadersFetchConfig(withFetchModeenum: auto/fast/js/direct+stealth/js+stealth),LlmConfigscrape()supports format: markdown, html, screenshot, brandingextract()andsearch()acceptoutput_schema(dict or Pydantic BaseModel)with Client(...) as client:)markdownify,agenticscraper,sitemap,healthz,feedback, all scheduled job methodsBreaking Change
Requires
scrapegraph-py>=2.0.0.Test plan
SmartScraperGraphwithllm_model="scrapegraphai/smart-scraper"works against v2 APIoutput_schema🤖 Generated with Claude Code