Automated extraction and documentation of coding standards from real codebases, optimized for RAG retrieval (BGE-large-en-v1.5 + Qdrant).
Next.js (3):
- /Users/oppodeldoc/code/helix-dot-com-next
- /Users/oppodeldoc/code/kariusdx-next
- /Users/oppodeldoc/code/policy-node
Sanity.js (3):
- /Users/oppodeldoc/code/helix-dot-com-sanity
- /Users/oppodeldoc/code/kariusdx-sanity
- /Users/oppodeldoc/code/ripplecom-nextjs (Sanity patterns only)
WordPress (2):
- /Users/oppodeldoc/code/thekelsey-wp
- /Users/oppodeldoc/code/airbnb
Existing Documentation:
/Users/oppodeldoc/code/aleph-docs
aleph-code-mine/
├── analysis/                    # Raw findings and comparison matrices
├── docs/                        # RAG-optimized output documentation
│   ├── js-nextjs/
│   ├── sanity/
│   ├── php-wordpress/
│   └── cross-stack/
└── tooling/                     # Enforcement and validation tools
    ├── semgrep/                 # Custom Semgrep rules
    ├── validate-docs/           # Doc quality validation
    └── generate-linter-docs/    # Auto-generate linter docs
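
A quick way to confirm a checkout matches this layout is a small script like the one below. It is only a sketch: the directory names come from the tree above, but `missing_dirs` and `EXPECTED_DIRS` are hypothetical names and not part of the repository's tooling.

```python
from pathlib import Path

# Directory names copied from the tree above.
# The helper itself is hypothetical, not an existing tool in this repo.
EXPECTED_DIRS = [
    "analysis",
    "docs/js-nextjs",
    "docs/sanity",
    "docs/php-wordpress",
    "docs/cross-stack",
    "tooling/semgrep",
    "tooling/validate-docs",
    "tooling/generate-linter-docs",
]


def missing_dirs(repo_root: str) -> list[str]:
    """Return the expected directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    import sys

    # Default to the current directory when no path is given.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for d in missing_dirs(root):
        print(f"missing: {d}")
```

Run from the repository root, it prints one line per missing directory and nothing when the layout is complete.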
Follows guides in:
- codebase_mining_guide.md - Analysis methodology
- rag_optimized_techdocs_guide.md - Output format specification
Phase 1: Structural Reconnaissance ✅ COMPLETE
- 8 structural analysis files
- Cross-project insights documented
Phase 2: Domain-Targeted Deep Dives 🎯 IN PROGRESS (31% complete)
- ✅ Domain 1: Component Patterns (8 docs + 4 Semgrep rules)
- ✅ Domain 2: Data Fetching (8 docs + 4 Semgrep rules)
- ⏳ Domains 3-9: Remaining Next.js patterns
- Documentation: 16 RAG-optimized markdown files
- Enforcement: 8 Semgrep rules
- Analysis: 10 comparison/findings files
- Total Lines: ~5,000 lines of production-ready documentation
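
Counts like these can be recomputed from the working tree instead of being maintained by hand. The sketch below makes several assumptions not stated in this README: documentation files are `*.md` under `docs/`, Semgrep rules live in `.yml` or `.yaml` files under `tooling/semgrep/` with one rule per file, and the line total covers documentation files only. The `summarize` function is a hypothetical helper.

```python
from pathlib import Path


def summarize(repo_root: str) -> dict[str, int]:
    """Count docs, Semgrep rule files, and documentation lines.

    Assumptions (not confirmed by the README): docs are *.md under docs/,
    and each .yml/.yaml file under tooling/semgrep/ holds a single rule.
    """
    root = Path(repo_root)
    docs = sorted((root / "docs").rglob("*.md"))
    rule_files = [
        p
        for p in (root / "tooling" / "semgrep").rglob("*")
        if p.is_file() and p.suffix in {".yml", ".yaml"}
    ]
    doc_lines = sum(
        len(p.read_text(encoding="utf-8").splitlines()) for p in docs
    )
    return {
        "docs": len(docs),
        "semgrep_rules": len(rule_files),
        "doc_lines": doc_lines,
    }


if __name__ == "__main__":
    for key, value in summarize(".").items():
        print(f"{key}: {value}")
```

If a rule file bundles several rules, the `semgrep_rules` figure will undercount, so treat it as a lower bound.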
See PROGRESS.md for detailed status and next steps.