feat: Scrapy llms.txt crawler + Pydantic models + CRUD skills + persona subagents by alex-jadecli · Pull Request #1 · agenttasks/agentwarehouses

alex-jadecli · 2026-04-12T12:49:13Z

Summary

Transforms the repo from a single README.md into a full Python package: a Scrapy crawler for Claude Code documentation, Pydantic 2.0 data models for all Claude Code resources, 36 CRUD skills with AgentSkills.io evals, 10 emotion-calibrated persona subagents, and a modern test/CI surface.

Scrapy Crawler (`src/agentwarehouses/`)

llmstxt spider: fetches https://code.claude.com/docs/llms.txt, extracts all .md URLs, deduplicates with rbloom Bloom filter, downloads each page
OrjsonWriterPipeline: writes output/docs.jsonl as newline-delimited JSON via orjson
StatsValidatorPipeline: evaluator-grader pattern scoring pages on 4 criteria
Claudebot/2.1.104 user agent, ROBOTSTXT_OBEY=True, AutoThrottle with CONCURRENT_REQUESTS=16
colorlog logger with OTEL telemetry config reference
Full return type annotations on all functions
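
As a rough illustration of the discovery step, the sketch below pulls `.md` URLs out of an llms.txt body and drops repeats. It uses only the standard library: a plain `set` stands in for the rbloom Bloom filter, and the regex, the `extract_md_urls` name, and the example.com URLs are assumptions made for illustration, not the spider's actual code.

```python
import re

# Assumed pattern: absolute http(s) URLs that end in ".md".
MD_URL = re.compile(r"https?://[^\s()\[\]<>]+\.md\b")


def extract_md_urls(llms_txt: str) -> list[str]:
    """Return unique .md URLs in first-seen order."""
    seen: set[str] = set()  # exact stand-in for the Bloom filter
    urls: list[str] = []
    for match in MD_URL.finditer(llms_txt):
        url = match.group(0)
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Made-up llms.txt content with one repeated link and one non-markdown link.
sample = """\
# Docs
- [Hooks](https://example.com/docs/hooks.md): lifecycle hooks
- [Skills](https://example.com/docs/skills.md)
- [Hooks, again](https://example.com/docs/hooks.md)
- [Not markdown](https://example.com/docs/page.html)
"""
print(extract_md_urls(sample))
```

A Bloom filter trades exactness for memory: it never reports a seen URL as new, but it can occasionally report a new URL as already seen, so a real crawl may skip a small number of pages.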

Pydantic 2.0 Data Models (`src/agentwarehouses/models/`)

19 modules, 125 typed symbols covering all Claude Code resource types
Aligned with claude-agent-sdk Python and modelcontextprotocol/sdk-python v2
Pydantic 3.0-ready patterns (model_config, model_validate, ConfigDict)
SemVer tracking with upstream dependency version management
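
The Pydantic v2 idioms named above (`model_config`, `ConfigDict`, `model_validate`) look roughly like this in practice. `HookConfig` and its fields are hypothetical and are not taken from the PR's models; the sketch assumes Pydantic 2.x is installed.

```python
from pydantic import BaseModel, ConfigDict, Field, ValidationError


class HookConfig(BaseModel):
    """Hypothetical resource model; field names are illustrative only."""

    # v2 style: model_config replaces the v1 inner `class Config`.
    model_config = ConfigDict(extra="forbid", frozen=True)

    event: str = Field(min_length=1)
    command: str
    timeout_seconds: int = Field(default=60, gt=0)


# v2 style: model_validate replaces the v1 `parse_obj`.
hook = HookConfig.model_validate(
    {"event": "SessionStart", "command": "make install-dev"}
)
print(hook.model_dump())

# extra="forbid" turns unknown keys into validation errors.
try:
    HookConfig.model_validate(
        {"event": "SessionStart", "command": "x", "unknown": 1}
    )
except ValidationError as exc:
    print(f"rejected: {exc.error_count()} error(s)")
```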

36 CRUD Skills (`.claude/skills/crud-*`)

4 interfaces (cli, sdk, api, graphql) x 9 resources (skills, plugins, connectors, mcps, subagents, hooks, sessions, memories, agent-teams)
4 router skills + 36 per-skill evals following AgentSkills.io spec
Generator script (scripts/generate_crud_skills.py)
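
The 4 x 9 matrix, and how it relates to the 40 SKILL.md files counted under Stats, can be sketched as below. Only the `crud-` prefix comes from the PR; the `crud-<interface>-<resource>` naming and the router names are assumptions, and this is not the code in scripts/generate_crud_skills.py.

```python
from itertools import product

INTERFACES = ["cli", "sdk", "api", "graphql"]
RESOURCES = [
    "skills", "plugins", "connectors", "mcps", "subagents",
    "hooks", "sessions", "memories", "agent-teams",
]

# Hypothetical naming scheme: one skill per (interface, resource) pair.
crud_skills = [f"crud-{i}-{r}" for i, r in product(INTERFACES, RESOURCES)]

# Hypothetical naming scheme: one router skill per interface.
router_skills = [f"crud-{i}" for i in INTERFACES]

# 36 per-resource skills + 4 routers = 40 SKILL.md files.
print(len(crud_skills), len(router_skills), len(crud_skills) + len(router_skills))
```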

10 Persona Subagents (`.claude/agents/`)

Core three (emotion-calibrated): SHANNON, THORP, SIMONS
Strategic layer: BEZOS, JOBS, AMODEI
Execution layer: CHERNY, MUSK, BROWN, SU
/advisors skill with composition patterns

Makefile + Modern Testing

Makefile control surface: make install, make install-dev, make test, make test-cov, make lint, make crawl, make ci
uv for fast package management
pytest-xdist parallel test execution (auto-detect CPUs)
pytest-cov with 90% fail-under threshold
pytest markers: unit, integration, models, evals
conftest.py with auto-marker application
SessionStart hook runs make install-dev on all devices (local + remote)
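
One common way to get auto-marker application in conftest.py is pytest's `pytest_collection_modifyitems` hook. The sketch below assumes tests live under directories named after the markers (for example `tests/models/`); that layout and the `marker_for_path` helper are assumptions, not the PR's actual conftest.py.

```python
from pathlib import PurePosixPath
from typing import Optional

# Marker names from the PR; the directory-based mapping is an assumption.
KNOWN_MARKERS = ("unit", "integration", "models", "evals")


def marker_for_path(path: str) -> Optional[str]:
    """Return the first known marker that appears as a path component."""
    parts = PurePosixPath(path).parts
    for name in KNOWN_MARKERS:
        if name in parts:
            return name
    return None


def pytest_collection_modifyitems(config, items):
    """pytest hook: tag each collected test based on its directory."""
    for item in items:
        marker = marker_for_path(item.path.as_posix())
        if marker is not None:
            item.add_marker(marker)
```

With a hook like this in place, `pytest -m models` selects only the tests that received the `models` marker. The markers still need to be registered in the pytest configuration to avoid unknown-marker warnings.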

Release-Please + Conventional Commits

.release-please-manifest.json + release-please-config.json
Version bumps on upstream dependency changes (claude-agent-sdk, mcp)

Stats

| Metric | Count |
| --- | --- |
| Files changed | 148 |
| Lines added | ~7,300 |
| Pydantic model symbols | 125 |
| CRUD skills (36 per-resource + 4 router) | 40 |
| Eval files | 36 |
| Persona subagents | 10 |
| Tests passing | 95 |
| Code coverage | 99.47% |

Test plan

make install-dev installs cleanly via uv
make lint — all lint clean (ruff E,F,I,W)
make test-cov — 95 tests pass, 99.47% coverage (threshold: 90%)
make test-unit / make test-models / make test-evals — marker filtering works
python -c "from agentwarehouses.models import *" — 125 symbols import
make generate-skills — produces 40 SKILL.md + 36 evals.json
scrapy list discovers llmstxt spider
make crawl — full crawl against live docs + make crawl-audit

https://claude.ai/code/session_01SR15X9ZzoNJdV3qo3fTdmB

Set up a Python package with Scrapy to crawl Claude Code documentation pages discovered from llms.txt. Uses rbloom for URL deduplication, orjson for fast JSONL output, and Claudebot/2.1.104 user agent with autothrottle concurrency tuning. Includes SessionStart hook for cloud environment setup. https://claude.ai/code/session_01SR15X9ZzoNJdV3qo3fTdmB
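
A minimal sketch of the JSONL output step described in this commit, assuming one JSON object per line in output/docs.jsonl. `JsonlWriterPipeline` is a hypothetical simplified class, not the PR's OrjsonWriterPipeline; it follows Scrapy's `open_spider` / `process_item` / `close_spider` method names and falls back to the standard `json` module when orjson is not installed.

```python
import json
from pathlib import Path

try:
    import orjson  # optional dependency; the PR uses orjson

    def dumps(obj) -> bytes:
        return orjson.dumps(obj)

except ImportError:  # fallback so the sketch runs with only the stdlib

    def dumps(obj) -> bytes:
        return json.dumps(obj, separators=(",", ":")).encode("utf-8")


class JsonlWriterPipeline:
    """Hypothetical pipeline that writes each item as one JSON line."""

    def __init__(self, path: str = "output/docs.jsonl") -> None:
        self.path = Path(path)
        self.file = None

    def open_spider(self, spider) -> None:
        # Create the output directory on first use, then open in binary mode.
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.file = self.path.open("wb")

    def process_item(self, item, spider):
        # One serialized object per line, newline-terminated.
        self.file.write(dumps(dict(item)) + b"\n")
        return item

    def close_spider(self, spider) -> None:
        if self.file is not None:
            self.file.close()
```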

Apply patterns from Anthropic engineering blog posts to improve the crawler system:
- CLAUDE.md: project conventions under 200 lines (context as finite resource)
- Skills: /crawl-audit, /think, /tool-design-checklist (just-in-time retrieval)
- Subagents: page-analyzer, crawl-reviewer (isolated context, condensed summaries)
- Hooks: PostToolUse/Edit runs ruff lint, SessionStart installs deps
- Spider: errback error handling, structured heading extraction, crawl stats
- StatsValidatorPipeline: evaluator-grader pattern for page quality scoring
- Tests: 18 tests covering spider extraction and pipeline behavior
- claude-progress.txt: cross-session handoff for incremental progress

https://claude.ai/code/session_01SR15X9ZzoNJdV3qo3fTdmB
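
The evaluator-grader idea behind StatsValidatorPipeline can be sketched as a function that runs a fixed set of pass/fail checks on each page and reports the fraction passed. The PR does not list its four criteria, so the four checks, the 200-character threshold, and the `PageGrade` / `grade_page` names below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PageGrade:
    url: str
    checks: dict[str, bool]

    @property
    def score(self) -> float:
        """Fraction of checks that passed, between 0.0 and 1.0."""
        return sum(self.checks.values()) / len(self.checks)


def grade_page(item: dict) -> PageGrade:
    """Score a crawled page on four hypothetical criteria."""
    body = item.get("body", "")
    checks = {
        "has_title": bool(item.get("title")),
        "has_headings": len(item.get("headings", [])) > 0,
        "min_length": len(body) >= 200,  # assumed threshold
        "url_is_markdown": item.get("url", "").endswith(".md"),
    }
    return PageGrade(url=item.get("url", ""), checks=checks)


# Made-up page that fails only the length check.
page = {
    "url": "https://example.com/docs/hooks.md",
    "title": "Hooks",
    "headings": ["Overview"],
    "body": "short",
}
grade = grade_page(page)
failed = [name for name, ok in grade.checks.items() if not ok]
print(grade.score, failed)  # prints: 0.75 ['min_length']
```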

- Reusable colorlog-based logger (agentwarehouses.log) with Scrapy-compatible format and OTEL config reference for Claude Code 2.1.104 telemetry
- 10 emotion-aware persona subagents modeled on Anthropic's emotion-concept research: SHANNON (reframing), THORP (verification), SIMONS (strategy), BEZOS (operations), JOBS (usability), AMODEI (AI vision), CHERNY (quality), MUSK (kaizen), BROWN (reliability), SU (team dynamics)
- /advisors skill with composition patterns for persona orchestration
- CLAUDE.md updated with emotional calibration rules
- 32 tests passing, all lint clean

https://claude.ai/code/session_01SR15X9ZzoNJdV3qo3fTdmB

19 model files covering 125 typed symbols across all 9 resource types:
- permissions, tools (37 built-in), hooks (25 events), subagents, mcps
- skills (with AgentSkills.io eval types), plugins, connectors
- sessions, memories, agent-teams, channels, checkpoints
- env-vars, commands, sdk (ClaudeAgentOptions, messages), otel

Aligned with claude-agent-sdk Python and modelcontextprotocol/sdk-python v2. Pydantic 3.0-ready patterns (model_config, model_validate, ConfigDict). SemVer tracking for upstream dependency bumps via conventional-commits. 72 tests passing (32 existing + 40 model tests), all lint clean.

https://claude.ai/code/session_01SR15X9ZzoNJdV3qo3fTdmB

…se-please

Generator script produces 40 SKILL.md + 36 evals.json from resource profiles:
- 4 interfaces (cli, sdk, api, graphql) × 9 resources (skills, plugins, connectors, mcps, subagents, hooks, sessions, memories, agent-teams)
- 4 router skills for interface-level routing
- Per-skill evals following AgentSkills.io specification
- Release-please config for conventional-commits + semver versioning
- 80 tests passing (32 crawler + 40 models + 8 eval schema)

https://claude.ai/code/session_01SR15X9ZzoNJdV3qo3fTdmB

- Makefile with install/install-dev/test/test-cov/lint/crawl targets using uv for fast package management
- pytest-xdist parallel workers (auto-detect CPUs, 16 on this machine)
- pytest-cov with 90% fail-under threshold (actual: 99.47%)
- Return type annotations on all spider, pipeline, and log functions
- pytest markers: unit, integration, models, evals
- conftest.py with auto-marker application
- Comprehensive spider tests covering parse(), parse_doc_page(), handle_error(), closed() with Scrapy TextResponse mocking
- SessionStart hook updated for local+remote via make install-dev
- 95 tests passing, all lint clean

https://claude.ai/code/session_01SR15X9ZzoNJdV3qo3fTdmB

https://claude.ai/code/session_01SR15X9ZzoNJdV3qo3fTdmB

- CONTRIBUTING.md with setup, workflow, code standards, commit conventions, and guides for adding models, skills, and subagents
- .claude/sessions/ with full session transcript including all 10 user prompts

https://claude.ai/code/session_01SR15X9ZzoNJdV3qo3fTdmB

claude added 5 commits April 12, 2026 11:34

alex-jadecli changed the title ~~Claude/python package setup j zrx c~~ feat: Scrapy llms.txt crawler + Pydantic models + CRUD skills + persona subagents Apr 12, 2026

claude added 3 commits April 12, 2026 12:55

fix: add .coverage to .gitignore

6c54ec6

https://claude.ai/code/session_01SR15X9ZzoNJdV3qo3fTdmB

alex-jadecli merged commit d302644 into main Apr 12, 2026

alex-jadecli deleted the claude/python-package-setup-JZrxC branch April 12, 2026 13:05

alex-jadecli merged 8 commits into main from claude/python-package-setup-JZrxC
