docs: migrate chat-webpage-simple-rag cookbook to v2 API (5/N)#95

Draft
Vikrant-Khedkar wants to merge 3 commits into main from docs/cookbook-chat-rag-v2

@Vikrant-Khedkar
Collaborator

⚠️ DO NOT MERGE — Draft

Fifth of N PRs (follows #91, #92, #93, #94) that restore the cookbook notebooks removed in 1f3b123 and migrate them to the v2 SDK API. This PR migrates cookbook/chat-webpage-simple-rag/scrapegraph_burr_lancedb.ipynb, the heavy one (a RAG pipeline with Burr + LanceDB + OpenAI + OpenTelemetry).

Migration

| Old | New |
| --- | --- |
| from scrapegraph_py import Client | from scrapegraph_py import ScrapeGraphAI, MarkdownFormatConfig |
| from scrapegraph_py.logger import sgai_logger | removed (module no longer exists in v2) |
| sgai_logger.set_logging(level="INFO") | logging.basicConfig(level=logging.INFO) (stdlib) |
| Client() | ScrapeGraphAI() |
| sgai_client.markdownify(website_url=url) | sgai_client.scrape(url, formats=[MarkdownFormatConfig()]) |
| response["result"] (str) | "\n\n".join(response.data.results["markdown"]["data"]) (list[str] → joined str) + .status check |
| https://dashboard.scrapegraphai.com/ | https://scrapegraphai.com/dashboard |
| Old banner (143 KB inline base64) | New ScrapeGraphAI banner (77 KB inline base64) |

Notable v2 response shape change

In v2, scrape() exposes the markdown at response.data.results["markdown"]["data"] as a list[str] (one element per page), not as a single string like the old markdownify(). Cell 15's fetch_webpage action now joins the elements with \n\n before passing the result into the chunking pipeline.
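
For reviewers who want to see the shape change in isolation, here is a minimal sketch of the join, assuming only the response layout stated in this PR. The helper name markdown_from_response and the SimpleNamespace stand-in are hypothetical, used so the snippet runs without an API key. The .status check is left out because the accepted status values are not listed here.

```python
from types import SimpleNamespace


def markdown_from_response(response) -> str:
    """Join per-page markdown from a v2-shaped scrape() response into one string.

    Hypothetical helper: it relies only on the layout described in this PR,
    response.data.results["markdown"]["data"] being a list[str].
    """
    pages = response.data.results["markdown"]["data"]
    return "\n\n".join(pages)


# Stand-in object that mimics the described layout; not a real SDK response.
fake_response = SimpleNamespace(
    data=SimpleNamespace(
        results={"markdown": {"data": ["# Page one", "# Page two"]}}
    )
)

joined = markdown_from_response(fake_response)
print(joined)
```

In the notebook, the same join would presumably be applied to the object returned by sgai_client.scrape(url, formats=[MarkdownFormatConfig()]) before the text reaches the chunker.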

Validation

  • Tested headlessly via jupyter nbconvert --execute — migrated cells (11, 14, 15) all run clean. Verified scrape() returns proper markdown; chunking + embedding (cell 22) consumes the joined string correctly. Full app.run() pipeline (cell 29) takes >10 min due to OpenAI embedding all chunks of scrapegraphai.com — that's pre-existing notebook behavior, not a migration regression.

Follow-ups (separate PRs, not blocking)

This completes all 5 direct-SDK notebooks. Remaining cookbook work:

  • LangChain integration notebooks (4) — via langchain-scrapegraph wrapper, need separate verification
  • LlamaIndex integration notebooks (5) — via llama-index-tools-scrapegraph wrapper, need separate verification
  • LangGraph integration notebooks (3)
  • CrewAI integration notebook (1)

🤖 Generated with Claude Code

- Client -> ScrapeGraphAI (auto-reads SGAI_API_KEY)
- markdownify(website_url=) -> scrape(url, formats=[MarkdownFormatConfig()])
- Drop scrapegraph_py.logger.sgai_logger (no longer exists) -> stdlib logging
- Response shape: response['result'] (str) -> "\n\n".join(response.data.results["markdown"]["data"]) (list[str])
- Update dashboard URL: dashboard.scrapegraphai.com -> scrapegraphai.com/dashboard
- Swap outdated banner image

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented May 11, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

VinciGit00 and others added 2 commits May 11, 2026 17:15

2 participants