feat: Scrapy llms.txt crawler + Pydantic models + CRUD skills + persona subagents #1
Merged
alex-jadecli merged 8 commits into main on Apr 12, 2026
Set up a Python package with Scrapy to crawl Claude Code documentation pages discovered from llms.txt. Uses rbloom for URL deduplication, orjson for fast JSONL output, and a Claudebot/2.1.104 user agent with AutoThrottle concurrency tuning. Includes a SessionStart hook for cloud environment setup.

https://claude.ai/code/session_01SR15X9ZzoNJdV3qo3fTdmB
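The discovery and output steps named in this commit can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the spider's code: `extract_md_urls` and `write_jsonl` are hypothetical names, the sample page paths are made up, a plain `set` stands in for the rbloom Bloom filter, and `json` stands in for orjson.

```python
import io
import json
import re
from urllib.parse import urljoin

# Matches markdown links of the form [title](target.md).
MD_LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+\.md)\)")


def extract_md_urls(llms_txt: str, base_url: str) -> list[str]:
    """Return unique absolute .md URLs in first-seen order."""
    seen: set[str] = set()  # stand-in for a Bloom filter: exact, but memory grows with the URL count
    urls: list[str] = []
    for match in MD_LINK.finditer(llms_txt):
        url = urljoin(base_url, match.group(1))  # resolves relative links against the index URL
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def write_jsonl(records, stream) -> int:
    """Write one JSON object per line and return the number of lines written."""
    count = 0
    for record in records:
        stream.write(json.dumps(record, ensure_ascii=False) + "\n")
        count += 1
    return count


# Hypothetical llms.txt content; the real index may be laid out differently.
SAMPLE = """# Docs
- [Overview](https://code.claude.com/docs/en/overview.md): intro
- [Hooks](/docs/en/hooks.md)
- [Overview again](https://code.claude.com/docs/en/overview.md)
"""

urls = extract_md_urls(SAMPLE, "https://code.claude.com/docs/llms.txt")
buffer = io.StringIO()
written = write_jsonl(({"url": u} for u in urls), buffer)
```

A Bloom filter trades a small false-positive rate for bounded memory, so a real crawl that swaps it in for the `set` could occasionally skip a URL it has not actually seen.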
Apply patterns from Anthropic engineering blog posts to improve the crawler system:
- CLAUDE.md: project conventions under 200 lines (context as finite resource)
- Skills: /crawl-audit, /think, /tool-design-checklist (just-in-time retrieval)
- Subagents: page-analyzer, crawl-reviewer (isolated context, condensed summaries)
- Hooks: PostToolUse/Edit runs ruff lint, SessionStart installs deps
- Spider: errback error handling, structured heading extraction, crawl stats
- StatsValidatorPipeline: evaluator-grader pattern for page quality scoring
- Tests: 18 tests covering spider extraction and pipeline behavior
- claude-progress.txt: cross-session handoff for incremental progress

https://claude.ai/code/session_01SR15X9ZzoNJdV3qo3fTdmB
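Structured heading extraction and a rule-based page quality check might look roughly like the sketch below. The PR text does not show the rules StatsValidatorPipeline applies, so `extract_headings`, `Grade`, `grade_page`, the item keys, and the three checks are all hypothetical.

```python
import re
from dataclasses import dataclass, field

# ATX headings only: 1-6 '#' characters, at least one space or tab, then the text.
# Simplification: lines inside fenced code blocks are not excluded.
HEADING = re.compile(r"^(#{1,6})[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def extract_headings(markdown: str) -> list[dict]:
    """Return headings as {"level": int, "text": str} in document order."""
    return [
        {"level": len(hashes), "text": text}
        for hashes, text in HEADING.findall(markdown)
    ]


@dataclass
class Grade:
    score: float
    issues: list[str] = field(default_factory=list)


def grade_page(item: dict, min_chars: int = 200) -> Grade:
    """Score a crawled page from 0.0 to 1.0; each failed check costs one third."""
    issues: list[str] = []
    if not item.get("title"):
        issues.append("missing title")
    if len(item.get("body", "")) < min_chars:
        issues.append("body too short")
    if not item.get("headings"):
        issues.append("no headings")
    return Grade(score=round(1 - len(issues) / 3, 2), issues=issues)


page = "# Title\n\nIntro text.\n\n## Setup \nSteps.\n###NoSpace\n"
headings = extract_headings(page)
good = grade_page({"title": "Title", "body": "x" * 300, "headings": headings})
bad = grade_page({"title": "", "body": "short"})
```

Returning the list of failed checks alongside the score lets a pipeline log why a page was graded down instead of only dropping it.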
- Reusable colorlog-based logger (agentwarehouses.log) with Scrapy-compatible format and OTEL config reference for Claude Code 2.1.104 telemetry
- 10 emotion-aware persona subagents modeled on Anthropic's emotion-concept research: SHANNON (reframing), THORP (verification), SIMONS (strategy), BEZOS (operations), JOBS (usability), AMODEI (AI vision), CHERNY (quality), MUSK (kaizen), BROWN (reliability), SU (team dynamics)
- /advisors skill with composition patterns for persona orchestration
- CLAUDE.md updated with emotional calibration rules
- 32 tests passing, all lint clean

https://claude.ai/code/session_01SR15X9ZzoNJdV3qo3fTdmB
19 model files covering 125 typed symbols across all 9 resource types:
- permissions, tools (37 built-in), hooks (25 events), subagents, mcps
- skills (with AgentSkills.io eval types), plugins, connectors
- sessions, memories, agent-teams, channels, checkpoints
- env-vars, commands, sdk (ClaudeAgentOptions, messages), otel

Aligned with claude-agent-sdk Python and modelcontextprotocol/sdk-python v2. Pydantic 3.0-ready patterns (model_config, model_validate, ConfigDict). SemVer tracking for upstream dependency bumps via conventional-commits. 72 tests passing (32 existing + 40 model tests), all lint clean.

https://claude.ai/code/session_01SR15X9ZzoNJdV3qo3fTdmB
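The three Pydantic names called out above (`model_config`, `model_validate`, `ConfigDict`) fit together roughly as follows. This is a sketch assuming Pydantic v2 is installed; `HookConfig` and its fields are hypothetical and are not taken from the package's actual schema.

```python
from pydantic import BaseModel, ConfigDict, Field, ValidationError


class HookConfig(BaseModel):
    """Hypothetical model; the field names are illustrative only."""

    # Class-level configuration replaces the older inner `class Config`.
    model_config = ConfigDict(extra="forbid", frozen=True)

    event: str = Field(min_length=1)
    command: str
    timeout_seconds: int = Field(default=60, gt=0)


# model_validate builds and validates an instance from plain data.
hook = HookConfig.model_validate(
    {"event": "SessionStart", "command": "make install-dev"}
)

# extra="forbid" turns unknown keys into a validation error instead of ignoring them.
try:
    HookConfig.model_validate({"event": "SessionStart", "command": "x", "unknown": 1})
    rejected = False
except ValidationError:
    rejected = True
```

Forbidding extra keys is one way to catch upstream schema drift early: a field added by a dependency bump fails validation rather than being ignored.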
…se-please

Generator script produces 40 SKILL.md + 36 evals.json from resource profiles:
- 4 interfaces (cli, sdk, api, graphql) × 9 resources (skills, plugins, connectors, mcps, subagents, hooks, sessions, memories, agent-teams)
- 4 router skills for interface-level routing
- Per-skill evals following AgentSkills.io specification
- Release-please config for conventional-commits + semver versioning
- 80 tests passing (32 crawler + 40 models + 8 eval schema)

https://claude.ai/code/session_01SR15X9ZzoNJdV3qo3fTdmB
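The file counts follow from the interface × resource grid: 4 × 9 = 36 CRUD skills, plus 4 routers, gives 40 SKILL.md files. The sketch below reproduces that arithmetic. The directory naming scheme and the reading that router skills carry no evals.json are assumptions made to match the stated counts, not details confirmed by the generator script.

```python
from itertools import product

INTERFACES = ["cli", "sdk", "api", "graphql"]
RESOURCES = [
    "skills", "plugins", "connectors", "mcps", "subagents",
    "hooks", "sessions", "memories", "agent-teams",
]


def plan_outputs() -> dict[str, list[str]]:
    """List the files a generator run would write, without touching disk."""
    # One CRUD skill per (interface, resource) pair; the naming is hypothetical.
    crud = [f"crud-{iface}-{res}" for iface, res in product(INTERFACES, RESOURCES)]
    # One router skill per interface; assumed to have no evals of its own.
    routers = [f"crud-{iface}" for iface in INTERFACES]
    return {
        "skill_md": [f"{name}/SKILL.md" for name in crud + routers],
        "evals_json": [f"{name}/evals.json" for name in crud],
    }


plan = plan_outputs()
```

Planning the output paths as data before writing anything makes the expected counts easy to assert in a test.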
- Makefile with install/install-dev/test/test-cov/lint/crawl targets using uv for fast package management
- pytest-xdist parallel workers (auto-detect CPUs, 16 on this machine)
- pytest-cov with 90% fail-under threshold (actual: 99.47%)
- Return type annotations on all spider, pipeline, and log functions
- pytest markers: unit, integration, models, evals
- conftest.py with auto-marker application
- Comprehensive spider tests covering parse(), parse_doc_page(), handle_error(), closed() with Scrapy TextResponse mocking
- SessionStart hook updated for local+remote via make install-dev
- 95 tests passing, all lint clean

https://claude.ai/code/session_01SR15X9ZzoNJdV3qo3fTdmB
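Auto-marker application in conftest.py is typically done in pytest's `pytest_collection_modifyitems` hook. The sketch below assigns one of the four markers from the test file name. The filename-to-marker mapping and the `marker_for` helper are hypothetical; the repository's actual rules are not shown in this PR.

```python
# conftest.py (sketch)

# Hypothetical mapping from a substring of the test file name to a marker name.
MARKER_BY_FILENAME_HINT = {
    "models": "models",
    "evals": "evals",
    "integration": "integration",
}


def marker_for(nodeid: str) -> str:
    """Pick a marker from the test file name; fall back to "unit"."""
    # A node id looks like "tests/test_models.py::test_name".
    filename = nodeid.split("::", 1)[0].rsplit("/", 1)[-1]
    for hint, marker in MARKER_BY_FILENAME_HINT.items():
        if hint in filename:
            return marker
    return "unit"


def pytest_collection_modifyitems(config, items):
    """pytest hook: called after collection, before any test runs."""
    for item in items:
        # add_marker accepts a marker name as a plain string.
        item.add_marker(marker_for(item.nodeid))
```

With markers applied this way, a subset can be selected with `pytest -m models`. The marker names still need to be registered in the pytest configuration, otherwise pytest warns about unknown marks (or fails under `--strict-markers`).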
- CONTRIBUTING.md with setup, workflow, code standards, commit conventions, and guides for adding models, skills, and subagents
- .claude/sessions/ with full session transcript including all 10 user prompts

https://claude.ai/code/session_01SR15X9ZzoNJdV3qo3fTdmB
Summary
Transforms the repo from a single README.md into a full Python package: a Scrapy crawler for Claude Code documentation, Pydantic 2.0 data models for all Claude Code resources, 36 CRUD skills with AgentSkills.io evals, 10 emotion-calibrated persona subagents, and a modern test/CI surface.
Scrapy Crawler (src/agentwarehouses/)
- Fetches https://code.claude.com/docs/llms.txt, extracts all .md URLs, deduplicates with rbloom Bloom filter, downloads each page
- Writes output/docs.jsonl as newline-delimited JSON via orjson
- ROBOTSTXT_OBEY=True, AutoThrottle with CONCURRENT_REQUESTS=16

Pydantic 2.0 Data Models (src/agentwarehouses/models/)
- Aligned with claude-agent-sdk Python and modelcontextprotocol/sdk-python v2
- Pydantic 3.0-ready patterns (model_config, model_validate, ConfigDict)

36 CRUD Skills (.claude/skills/crud-*)
- Produced by the generator script (scripts/generate_crud_skills.py)

10 Persona Subagents (.claude/agents/)
- /advisors skill with composition patterns

Makefile + Modern Testing
- make install, make install-dev, make test, make test-cov, make lint, make crawl, make ci
- pytest markers: unit, integration, models, evals
- SessionStart hook runs make install-dev on all devices (local + remote)

Release-Please + Conventional Commits
- .release-please-manifest.json + release-please-config.json
- SemVer tracking for upstream dependencies (claude-agent-sdk, mcp)

Stats

Test plan
- make install-dev installs cleanly via uv
- make lint: all lint clean (ruff E,F,I,W)
- make test-cov: 95 tests pass, 99.47% coverage (threshold: 90%)
- make test-unit / make test-models / make test-evals: marker filtering works
- python -c "from agentwarehouses.models import *": 125 symbols import
- make generate-skills: produces 40 SKILL.md + 36 evals.json
- scrapy list discovers the llmstxt spider
- make crawl: full crawl against live docs + make crawl-audit

https://claude.ai/code/session_01SR15X9ZzoNJdV3qo3fTdmB