Migrate from open-source scrapegraphai to API-based scrapegraph-py SDK #3
Replaces the open-source `scrapegraphai` library (requiring local LLM setup) with the `scrapegraph-py` SDK for cloud-based API scraping.
Changes
Dependencies & Configuration
- `requirements.txt`: `scrapegraphai>=1.0.0` → `scrapegraph-py>=1.0.0`
- `.env.example`: `OPENAI_API_KEY`/`SCRAPEGRAPHAI_API_KEY` → `SGAI_API_KEY`
- `config.py`: Removed `openai_api_key`, added `sgai_api_key`
Core Implementation (`scraper.py`)
- Replaced `SmartScraperGraph` with `Client` from `scrapegraph_py`
- `scrape_product()`: Uses `client.smartscraper()` with structured output schema
- `scrape_search_results()`: Extracts multiple products with response validation
- Added `close()` method for resource cleanup
Examples & Tests
- Added `scraper.close()` calls to all examples and quickstart
- Updated `quickstart.py` messaging for API key awareness
Documentation
Usage
Mock data fallback preserved for testing without API key. All 12 tests pass, 0 security vulnerabilities.
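The mock-data fallback described above could be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `DemoScraper`, `MOCK_PRODUCT`, the injected `client` parameter, and the `"result"` response key are all assumptions made so the sketch runs without the SDK installed.

```python
import os
from typing import Any, Dict, Optional

# Hypothetical mock record; the demo's real mock data is not shown here.
MOCK_PRODUCT: Dict[str, Any] = {"name": "Mock Laptop", "price": 999.0, "currency": "USD"}


class DemoScraper:
    """Illustrative scraper that serves mock data when no API key is set."""

    def __init__(self, api_key: Optional[str] = None, client: Any = None) -> None:
        # Fall back to the SGAI_API_KEY environment variable when no key is passed.
        self.api_key = api_key if api_key is not None else os.environ.get("SGAI_API_KEY", "")
        # `client` stands in for scrapegraph_py.Client; it is injected so the
        # sketch runs offline. Without a key the client is ignored.
        self.client = client if self.api_key else None

    def scrape_product(self, url: str) -> Dict[str, Any]:
        if self.client is None:
            # No API key (or no client): return mock data tagged with its origin.
            return {**MOCK_PRODUCT, "url": url, "source": "mock"}
        response = self.client.smartscraper(
            website_url=url,
            user_prompt="Extract the product name, price and currency.",
        )
        # Assumption: extracted fields live under a "result" key.
        return {**(response.get("result") or {}), "url": url, "source": "api"}

    def close(self) -> None:
        # Release the client connection if one was created.
        if self.client is not None:
            self.client.close()


scraper = DemoScraper(api_key="")  # an empty key forces the mock path
try:
    product = scraper.scrape_product("https://example.com/item/1")
finally:
    scraper.close()
```

Wrapping the call in `try`/`finally` mirrors the `scraper.close()` calls this PR adds to the examples, so cleanup happens even if scraping raises.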
Original prompt
Objective
Migrate the ScrapeGraphAI Elasticsearch Demo from using the open-source `scrapegraphai` library to the API-based `scrapegraph-py` SDK.
Background
Currently, the repository uses the open-source ScrapeGraphAI library, which requires local LLM setup and OpenAI API keys. We need to transition to the `scrapegraph-py` SDK (installed via `pip install scrapegraph-py`), which provides a simpler API-based approach for scraping.
Reference SDK repository: https://github.com/ScrapeGraphAI/scrapegraph-sdk
Required Changes
1. Update `requirements.txt`
Replace `scrapegraphai>=1.0.0` with `scrapegraph-py` and keep all other dependencies:
2. Update `.env.example`
Replace OpenAI/ScrapeGraphAI API key variables with the SDK's expected format:
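As an illustration of how application code might consume the renamed variable, here is a minimal sketch of a config loader that reads `SGAI_API_KEY`. The `Config` shape, the optional `env` parameter, and the `validate()` helper are assumptions for illustration, not the repository's actual code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class Config:
    """Hypothetical config holder; only `sgai_api_key` is named in the prompt."""

    sgai_api_key: Optional[str] = None

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "Config":
        # Read from the given mapping, or from the process environment by default.
        source = os.environ if env is None else env
        # Treat an empty string the same as a missing variable.
        return cls(sgai_api_key=source.get("SGAI_API_KEY") or None)

    def validate(self) -> bool:
        # True only when a non-empty key is present.
        return bool(self.sgai_api_key)


config = Config.from_env({"SGAI_API_KEY": "sgai-example-key"})
missing = Config.from_env({})
```

Passing the environment as a mapping keeps the loader easy to test without mutating `os.environ`.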
3. Update `src/scrapegraph_demo/config.py`
Modify the Config class to use `SGAI_API_KEY` instead of `OPENAI_API_KEY` or `SCRAPEGRAPHAI_API_KEY`:
- Rename `openai_api_key` or `scrapegraphai_api_key` to `sgai_api_key`
- Update the `from_env()` method to read `SGAI_API_KEY` from environment
- Validate `sgai_api_key` presence
4. Rewrite `src/scrapegraph_demo/scraper.py`
Complete rewrite to use the scrapegraph-py SDK instead of the open-source library:
Key SDK Methods to Use:
- `Client(api_key="...")` or `Client.from_env()` - Initialize the SDK client
- `client.smartscraper(website_url, user_prompt, output_schema)` - Extract structured product data
- `client.scrape(website_url)` - Get raw HTML content
- `client.close()` - Close the client connection
Implementation Requirements:
- Initialize `Client` from scrapegraph-py in `__init__`
- `scrape_product()`: Use `client.smartscraper()` with a detailed prompt to extract:
- `scrape_search_results()`: Use `client.smartscraper()` or `client.searchscraper()` to extract multiple products
- `close()` method that calls `self.client.close()`
- Helper methods `_extract_product_id()` and `_extract_price()`
SDK Response Format:
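A rough sketch of how a scraper might consume such a response is shown below. It assumes the response is a dict carrying the extracted data under a `"result"` key; that key, the `ProductScraper` name, the prompt text, and the `FakeClient` stand-in are assumptions for illustration. The `FakeClient` replaces `scrapegraph_py.Client` so the example runs offline.

```python
import re
from typing import Any, Dict, Optional


class ProductScraper:
    """Hypothetical wrapper around an SDK-style client (duck-typed)."""

    def __init__(self, client: Any) -> None:
        # In real code this would be scrapegraph_py.Client; any object with
        # smartscraper() and close() works for this sketch.
        self.client = client

    @staticmethod
    def _extract_price(value: Any) -> Optional[float]:
        # Numbers pass through; strings such as "$1,299.99" are parsed.
        if isinstance(value, (int, float)):
            return float(value)
        match = re.search(r"\d[\d,]*(?:\.\d+)?", str(value))
        return float(match.group().replace(",", "")) if match else None

    def scrape_product(self, url: str) -> Dict[str, Any]:
        response = self.client.smartscraper(
            website_url=url,
            user_prompt="Extract the product name and price.",
        )
        # Assumption: the response is a dict with the data under "result".
        result = response.get("result") or {}
        return {
            "url": url,
            "name": result.get("name"),
            "price": self._extract_price(result.get("price")),
        }

    def close(self) -> None:
        # Delegate cleanup to the underlying client.
        self.client.close()


class FakeClient:
    """Stand-in for the SDK client so the sketch runs without network access."""

    def __init__(self) -> None:
        self.closed = False

    def smartscraper(self, website_url: str, user_prompt: str) -> Dict[str, Any]:
        return {"result": {"name": "Example Phone", "price": "$1,299.99"}}

    def close(self) -> None:
        self.closed = True


fake = FakeClient()
scraper = ProductScraper(fake)
try:
    product = scraper.scrape_product("https://example.com/p/42")
finally:
    scraper.close()
```

Injecting the client rather than constructing it inside `__init__` is a design choice made here only to keep the sketch testable; the prompt itself asks for the client to be initialized in `__init__`.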
5. Update Example Files
Update all files in the `examples/` directory to work with the new SDK-based scraper:
- `examples/basic_usage.py`
- `examples/product_comparison.py`
- `examples/advanced_search.py`
Ensure they:
- Call `scraper.close()` when done
6. Update `quickstart.py`
Modify the quickstart script to:
- Call `scraper.close()`
7. Update `README.md`
Update documentation to reflect the SDK-based approach:
- `scrapegraph-py`
8. Update Tests
Update test files in `tests/` to work with the SDK-based implementation:
Technical Specifications
SDK Documentation Reference:
- `from scrapegraph_py import Client`
- `client = Client.from_env()` (reads `SGAI_API_KEY`)
- `AsyncClient` (optional for future)
Key Differences:
Testing Requirements
After implementation:
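One way to exercise SDK-based scraping logic without network access or an API key is to stub the client with `unittest.mock`. The sketch below is only an illustration: `scrape_names` is a hypothetical helper, and the `{"result": {"products": [...]}}` response shape is an assumption rather than the SDK's documented format.

```python
import unittest
from unittest.mock import MagicMock


def scrape_names(client, url):
    """Hypothetical helper: return product names found on a search page."""
    response = client.smartscraper(
        website_url=url,
        user_prompt="List every product on the page with its name.",
    )
    # Assumption: results arrive as {"result": {"products": [...]}}.
    products = (response.get("result") or {}).get("products") or []
    return [p["name"] for p in products if "name" in p]


class ScrapeNamesTest(unittest.TestCase):
    def test_returns_names_from_stubbed_response(self):
        client = MagicMock()
        client.smartscraper.return_value = {
            "result": {"products": [{"name": "A"}, {"name": "B"}, {"price": 3}]}
        }
        self.assertEqual(scrape_names(client, "https://example.com/s?q=x"), ["A", "B"])
        client.smartscraper.assert_called_once()

    def test_handles_empty_response(self):
        client = MagicMock()
        client.smartscraper.return_value = {}
        self.assertEqual(scrape_names(client, "https://example.com"), [])


def run_tests():
    # Build and run the suite explicitly so importing this module never exits.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ScrapeNamesTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_tests()` returns a `unittest.TestResult`, whose `wasSuccessful()` method reports whether every case passed.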
This pull request was created as a result of the following prompt from Copilot chat.