feat: add documentation crawler for OpenZeppelin docs #42
PR Summary: feat: add documentation crawler for OpenZeppelin docs
📜 High-Level Summary
This PR replaces the AsciiDoc-based OpenZeppelin documentation ingestion with a simpler approach using pre-crawled markdown files. The crawler fetches documentation from websites using sitemaps and converts HTML content to clean markdown, making it easier to maintain up-to-date documentation sources for the Cairo Coder RAG pipeline.
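The sitemap-then-convert flow described above can be illustrated with a minimal sketch. This is not the crawler from this PR: the function names (`urls_from_sitemap`, `html_to_markdown`) are hypothetical, it uses only the Python standard library, and the converter handles only headings and paragraphs, where a real crawler would likely use a dedicated HTML-to-markdown library and add fetching, retries, and filtering.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(sitemap_xml: str) -> list[str]:
    """Return every <loc> URL listed in a sitemap document (hypothetical helper)."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]


class _MarkdownExtractor(HTMLParser):
    """Illustrative converter: keeps h1-h3 headings and paragraphs, drops the rest."""

    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[str] = []
        self._prefix: str | None = None  # non-None while inside a kept block
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._prefix = self.HEADINGS[tag]
            self._buffer = []
        elif tag == "p":
            self._prefix = ""
            self._buffer = []

    def handle_data(self, data):
        # Text outside a kept block (scripts, navigation) is ignored.
        if self._prefix is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._prefix is not None and (tag in self.HEADINGS or tag == "p"):
            # Collapse runs of whitespace left over from the HTML layout.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.blocks.append(self._prefix + text)
            self._prefix = None


def html_to_markdown(html: str) -> str:
    """Convert a page to markdown blocks separated by blank lines (hypothetical helper)."""
    parser = _MarkdownExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.blocks)
```

A crawler built this way would fetch the sitemap, call `urls_from_sitemap` on its body, download each listed page, and write the result of `html_to_markdown` to a markdown file for the ingester to read.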
Changeset 1: Replace AsciiDoc with Pre-Crawled Markdown for OpenZeppelin Docs
Files Affected:
packages/ingester/src/ingesters/OpenZeppelinDocsIngester.ts
packages/ingester/src/utils/RecursiveMarkdownSplitter.ts
packages/ingester/asciidoc/oz-playbook.yml (deleted)
packages/ingester/asciidoc/playbook.yml
Summary of Changes:
[TRIAGE]: NEEDS_REVIEW
Changeset 3: Add Pre-Crawled OpenZeppelin Documentation
Files Affected:
python/scripts/summarizer/generated/openzeppelin_docs_summary.md
Summary of Changes:
[TRIAGE]: NEEDS_REVIEW