Skip to content

feat!: migrate to scrapegraph-py v2 API surface#1058

Merged
VinciGit00 merged 5 commits intomainfrom
feat/migrate-to-scrapegraph-py-v2
Apr 19, 2026
Merged

feat!: migrate to scrapegraph-py v2 API surface#1058
VinciGit00 merged 5 commits intomainfrom
feat/migrate-to-scrapegraph-py-v2

Conversation

@VinciGit00
Copy link
Copy Markdown
Member

@VinciGit00 VinciGit00 commented Mar 31, 2026

Summary

  • Migrate all scrapegraph-py SDK usage to the new v2 API surface (see feat!: migrate Python SDK to v2 API surface scrapegraph-py#82)
  • Bump dependency from scrapegraph-py>=1.44.0 to >=2.0.0
  • Update core integration in SmartScraperGraph and all 3 example scripts
  • Pass output_schema to extract() so Pydantic schemas are forwarded to the v2 API
  • Use context manager pattern (with Client(...) as client) for proper resource cleanup

API mapping

v1 Method v2 Method Endpoint
smartscraper(website_url=, user_prompt=) extract(url=, prompt=, output_schema=) POST /api/v2/extract
searchscraper(user_prompt=) search(query=) POST /api/v2/search
markdownify(website_url=) scrape(url=) POST /api/v2/scrape
get_credits() credits() GET /api/v2/credits
generate_schema() (removed)
crawl() / get_crawl() crawl.start() / crawl.status() / .stop() / .resume() /api/v2/crawl
scheduled jobs monitor.create() / .list() / .pause() / .resume() / .delete() /api/v2/monitor
history() GET /api/v2/history

Other v2 changes (from scrapegraph-py)

  • Auth now sends both Authorization: Bearer and SGAI-APIKEY headers
  • New shared models: FetchConfig (with FetchMode enum: auto/fast/js/direct+stealth/js+stealth), LlmConfig
  • scrape() supports format: markdown, html, screenshot, branding
  • extract() and search() accept output_schema (dict or Pydantic BaseModel)
  • Context manager support (with Client(...) as client:)
  • Removed: markdownify, agenticscraper, sitemap, healthz, feedback, all scheduled job methods

Breaking Change

Requires scrapegraph-py>=2.0.0.

Test plan

  • Verify SmartScraperGraph with llm_model="scrapegraphai/smart-scraper" works against v2 API
  • Verify Pydantic schema is correctly forwarded via output_schema
  • Run example scripts against live API
  • Existing tests pass (no other code paths affected)

🤖 Generated with Claude Code

Update all SDK usage to match the new v2 API from ScrapeGraphAI/scrapegraph-py#82:
- smartscraper() → extract(url=, prompt=)
- searchscraper() → search(query=)
- markdownify() → scrape(url=)
- Bump dependency to scrapegraph-py>=2.0.0

BREAKING CHANGE: requires scrapegraph-py v2.0.0+

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Mar 31, 2026
@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 31, 2026

Dependency Review

The following issues were found:
  • ✅ 0 vulnerable package(s)
  • ✅ 0 package(s) with incompatible licenses
  • ✅ 0 package(s) with invalid SPDX license definitions
  • ⚠️ 1 package(s) with unknown licenses.
See the Details below.

Snapshot Warnings

⚠️: No snapshots were found for the head SHA 5fda03f.
Ensure that dependencies are being submitted on PR branches and consider enabling retry-on-snapshot-warnings. See the documentation for more information and troubleshooting advice.

License Issues

pyproject.toml

PackageVersionLicenseIssue Type
scrapegraph-py>= 2.0.0NullUnknown License

OpenSSF Scorecard

PackageVersionScoreDetails
pip/scrapegraph-py >= 2.0.0 UnknownUnknown

Scanned Files

  • .github/workflows/test-suite.yml
  • pyproject.toml

@dosubot dosubot bot added dependencies Pull requests that update a dependency file enhancement New feature or request labels Mar 31, 2026
VinciGit00 and others added 4 commits April 9, 2026 11:46
- Pass output_schema to extract() so Pydantic schemas are forwarded to the v2 API
- Use context manager pattern (with Client(...) as client) for proper resource cleanup
- Simplify examples to match the v2 SDK style from scrapegraph-py
- Remove unused sgai_logger import (v2 client handles its own logging)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Support both the v2 Client API (PR #82) and the newer ScrapeGraphAI API
(PR #84) which uses Pydantic request models and ApiResult[T] wrappers.

- Add scrapegraph_py_compat helper with runtime API detection
- Route smart_scraper_graph through the compat layer
- Add v3-style examples for extract, search, and scrape

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
scrapegraph-py 2.0.0 requires Python >=3.12, so bump the project's
requires-python to match. Simplify the test workflow to a single
unit-test job on Python 3.12 / ubuntu-latest.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment on lines +12 to +34
name: Unit Tests
runs-on: ubuntu-latest

strategy:
fail-fast: false
matrix:
test-group: [smart-scraper, multi-graph, file-formats]

steps:
- name: Checkout code
uses: actions/checkout@v4

- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: '3.11'
python-version: '3.12'

- name: Install uv
uses: astral-sh/setup-uv@v4

- name: Install dependencies
run: |
uv sync
run: uv sync

- name: Install Playwright browsers
run: |
uv run playwright install chromium

- name: Run integration tests
env:
OPENAI_APIKEY: ${{ secrets.OPENAI_APIKEY }}
ANTHROPIC_APIKEY: ${{ secrets.ANTHROPIC_APIKEY }}
GROQ_APIKEY: ${{ secrets.GROQ_APIKEY }}
run: |
uv run pytest tests/integration/ -m integration --integration -v

- name: Upload test results
uses: actions/upload-artifact@v4
if: always()
with:
name: integration-test-results-${{ matrix.test-group }}
path: |
htmlcov/
benchmark_results/

benchmark-tests:
name: Performance Benchmarks
runs-on: ubuntu-latest
if: github.event_name == 'push' && github.ref == 'refs/heads/main'

steps:
- name: Checkout code
uses: actions/checkout@v4

- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: '3.11'

- name: Install uv
uses: astral-sh/setup-uv@v4

- name: Install dependencies
run: |
uv sync

- name: Run performance benchmarks
env:
OPENAI_APIKEY: ${{ secrets.OPENAI_APIKEY }}
run: |
uv run pytest tests/ -m benchmark --benchmark -v

- name: Upload benchmark results
uses: actions/upload-artifact@v4
with:
name: benchmark-results
path: benchmark_results/

- name: Compare with baseline
if: github.event_name == 'pull_request'
run: |
# Download baseline from main branch
# Compare and comment on PR if regression detected
echo "Benchmark comparison would run here"

code-quality:
name: Code Quality Checks
runs-on: ubuntu-latest
if: github.event_name == 'push'

steps:
- name: Checkout code
uses: actions/checkout@v4

- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: '3.11'

- name: Install uv
uses: astral-sh/setup-uv@v4

- name: Install dependencies
run: |
uv sync

- name: Run Ruff linting
run: |
uv run ruff check scrapegraphai/ tests/

- name: Run Black formatting check
run: |
uv run black --check scrapegraphai/ tests/

- name: Run isort check
run: |
uv run isort --check-only scrapegraphai/ tests/

- name: Run type checking with mypy
run: |
uv run mypy scrapegraphai/
continue-on-error: true
run: uv run playwright install chromium

test-coverage-report:
name: Test Coverage Report
needs: [unit-tests, integration-tests]
runs-on: ubuntu-latest
if: always()

steps:
- name: Checkout code
uses: actions/checkout@v4

- name: Download coverage artifacts
uses: actions/download-artifact@v4

- name: Generate coverage report
run: |
echo "Coverage report generation would run here"

- name: Comment coverage on PR
if: github.event_name == 'pull_request'
uses: py-cov-action/python-coverage-comment-action@v3
with:
GITHUB_TOKEN: ${{ github.token }}

test-summary:
name: Test Summary
needs: [unit-tests, integration-tests, code-quality]
runs-on: ubuntu-latest
if: always()

steps:
- name: Check test results
run: |
echo "All test jobs completed"
echo "Unit tests: ${{ needs.unit-tests.result }}"
echo "Integration tests: ${{ needs.integration-tests.result }}"
echo "Code quality: ${{ needs.code-quality.result }}"
- name: Run unit tests
run: uv run pytest tests/ -m "unit or not integration"
@VinciGit00 VinciGit00 merged commit 90de842 into main Apr 19, 2026
7 of 8 checks passed
@VinciGit00 VinciGit00 deleted the feat/migrate-to-scrapegraph-py-v2 branch April 19, 2026 08:01
@github-actions
Copy link
Copy Markdown

🎉 This PR is included in version 2.0.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dependencies Pull requests that update a dependency file enhancement New feature or request released on @stable size:L This PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants