Generate LLM-optimized snapshots of your code repositories for efficient sharing with AI assistants.
These scripts implement a practical approach to sharing code context with LLMs. This is a form of context engineering for AI-assisted software development, directly addressing the research question from "Context Engineering 2.0": "How can machines better understand our situations and purposes?"
In the evolving landscape of human-AI interaction, context engineering is the practice of structuring information to enable machines to better understand human situations, intentions, and environments. Our scripts apply this principle to software development by:
- Transforming raw repositories into AI-optimized context
- Preserving essential information while removing noise
- Enabling efficient human-AI collaboration on complex codebases
- Bridging the gap between unstructured development artifacts and AI comprehension
Time Savings: Instead of sharing entire repositories or manually copying files, developers get single, optimized files ready for LLM consumption.
Enhanced AI Assistance: Clean, structured context leads to better code reviews, debugging help, documentation generation, and architectural guidance.
Workflow Integration: Scripts can be integrated into CI/CD pipelines, pre-commit hooks, or VS Code tasks for seamless operation.
Security-Conscious: Automatic redaction protects sensitive information while maintaining context utility.
Framework Agnostic: Works across JavaScript, Python, Java, and other tech stacks.
These scripts represent Phase 2 context engineering (human-agent interaction) as outlined in the context engineering research:
- Phase 1 (1990s): Basic human-computer interaction frameworks
- Phase 2 (Current): Human-agent interaction paradigms ← our scripts fit here
- Phase 3 (Future): Human-level/superhuman intelligence
By optimizing how developers share project context with AI agents, these scripts contribute to the broader field of context engineering in AI systems.
The context engineering field suggests expanding beyond static snapshots toward:
- Dynamic Context: AI agents requesting specific additional context
- Context Memory: Building on previous interaction history
- Multi-modal Context: Including diagrams, architecture docs, and visual elements
- Interactive Context Engineering: Real-time context adaptation based on AI needs
These scripts create consolidated text files containing all relevant code, documentation, and configuration from your project. Instead of sharing entire repositories or multiple files, you get a single, well-formatted file that Large Language Models (LLMs) can easily process for:
- Code reviews and analysis
- Architecture discussions
- Debugging assistance
- Documentation generation
- Refactoring suggestions
- Learning and onboarding
The scripts intelligently capture:
- Source code (`.js`, `.ts`, `.py`, `.java`, etc.)
- Documentation (`.md`, README files)
- Configuration (`.json`, `.yaml`, `.toml`, `package.json`, etc.)
- Schemas (`.sql` database files)
- Notebooks (`.ipynb` Jupyter files, formatted as JSON)
- Scripts (`.sh`, deployment scripts)
To keep snapshots focused and secure:
- Dependencies (`node_modules/`, lock files)
- Build artifacts (`dist/`, `.next/`, etc.)
- Secrets (`.env` files; API keys automatically redacted)
- Large data (`data/`, `models/` directories)
- Git history (`.git/`)
- IDE files (`.vscode/`, `.idea/`)
- OS files (`.DS_Store`, thumbnails)
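The include/exclude rules above can be sketched as a simple filtered directory walk. This is a minimal illustration of the selection logic, not the scripts' exact implementation; the constant values shown are abbreviated examples.

```python
from pathlib import Path

# Illustrative values; see the scripts for the full lists.
ALLOWED_EXTENSIONS = {'.md', '.js', '.ts', '.py', '.json', '.yaml', '.sql', '.sh'}
EXCLUDE_DIR_NAMES = {'node_modules', '.git', 'dist', '.next', 'data', 'models',
                     '.vscode', '.idea'}

def collect_files(root: str) -> list[Path]:
    """Walk the tree, skipping excluded directories and non-allowed extensions."""
    selected = []
    for path in sorted(Path(root).rglob('*')):
        # Skip anything located inside an excluded directory.
        if any(part in EXCLUDE_DIR_NAMES for part in path.parts):
            continue
        if path.is_file() and path.suffix in ALLOWED_EXTENSIONS:
            selected.append(path)
    return selected
```

Checking every path component against the exclusion set means nested artifacts (e.g. `packages/a/node_modules/`) are skipped too.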
Recommended for most projects
`node capture_repo_snapshot.mjs`

Features:
- ✅ Advanced security redaction
- ✅ Comprehensive framework support
- ✅ Modern ES modules
- ✅ Fallback token counting
- ✅ Special `.env.example` handling

Output: `repo_snapshot_llm_distilled.txt`
Great for Python-focused projects
`python create_llm_snapshot.py`

Features:
- ✅ Precise token counting with tiktoken
- ✅ Clean Python implementation
- ✅ Excellent notebook support
- ✅ pathlib-based file handling

Output: `knowledge_weaver_snapshot.txt`
1. Download the script:

   ```bash
   curl -O https://raw.githubusercontent.com/YOUR_USERNAME/YOUR_REPO/main/capture_repo_snapshot.mjs
   ```

2. Make it executable:

   ```bash
   chmod +x capture_repo_snapshot.mjs
   ```

3. Run it:

   ```bash
   node capture_repo_snapshot.mjs
   ```

4. Share the generated file:
   - `repo_snapshot_llm_distilled.txt` will be created
   - Upload to your LLM conversation or share via pastebin/git gist
1. Install dependencies:

   ```bash
   pip install tiktoken
   ```

2. Download the script:

   ```bash
   curl -O https://raw.githubusercontent.com/YOUR_USERNAME/YOUR_REPO/main/create_llm_snapshot.py
   ```

3. Run it:

   ```bash
   python create_llm_snapshot.py
   ```

4. Share the generated file:
   - `knowledge_weaver_snapshot.txt` will be created
Both scripts generate files with this structure:
```text
# Repository Snapshot (LLM-Optimized)
Generated On: 2025-11-05T21:21:25.433Z
# Mnemonic Weight (Token Count): ~7,818 tokens

# Directory Structure (relative to project root)
./README.md
./src/app.js
./config.json
...

--- START OF FILE ./README.md ---
[File content here]
--- END OF FILE ---

--- START OF FILE ./src/app.js ---
[File content here]
--- END OF FILE ---
```
- Automatic redaction of API keys, tokens, and secrets
- Pattern matching for common secret formats:
  - `OPENAI_API_KEY=value` → `OPENAI_API_KEY=[REDACTED]`
  - `Bearer sk-abc123...` → `Bearer [REDACTED]`
  - Long secret keys starting with `sk-`
- Clean output with no sensitive data included
- Exclusion of `.env` files and common secret locations
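The redaction step amounts to running a list of substitution patterns over each file's content. A simplified sketch of this approach; the patterns below are illustrative, not the scripts' exact rules:

```python
import re

# Illustrative patterns: env-style secrets, bearer tokens, and long sk- keys.
REDACTION_PATTERNS = [
    (re.compile(r'(?m)^([A-Z0-9_]*(?:KEY|TOKEN|SECRET)[A-Z0-9_]*)=.*$'),
     r'\1=[REDACTED]'),
    (re.compile(r'Bearer\s+\S+'), 'Bearer [REDACTED]'),
    (re.compile(r'\bsk-[A-Za-z0-9]{16,}\b'), '[REDACTED]'),
]

def redact(text: str) -> str:
    """Apply each redaction pattern in order and return the cleaned text."""
    for pattern, replacement in REDACTION_PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```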
Edit the `allowedExtensions` array to include/exclude file types:

```javascript
// In capture_repo_snapshot.mjs
const allowedExtensions = ['.md', '.js', '.ts', '.py', '.sql', '.json'];
```

```python
# In create_llm_snapshot.py
ALLOWED_EXTENSIONS = ['.py', '.md', '.json', '.yaml', '.sql']
```

Add directories or files to exclude:
```javascript
// In capture_repo_snapshot.mjs
const excludeDirNames = new Set([
  'node_modules', '.git', 'dist', 'custom_exclude_dir'
]);
```

```python
# In create_llm_snapshot.py
EXCLUDE_DIR_NAMES = {
  'node_modules', '.git', 'dist', 'custom_exclude_dir'
}
```

Both scripts estimate token counts to help you understand LLM context usage:
- JavaScript: estimates ~4 characters per token (fallback when `gpt-tokenizer` is unavailable)
- Python: precise counting using OpenAI's `tiktoken` library
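The two strategies can be combined in one helper: try exact counting, fall back to the rough estimate. A minimal sketch, assuming the ~4 characters/token heuristic described above (the function names are illustrative):

```python
def estimate_tokens(text: str) -> int:
    """Rough fallback estimate: roughly 4 characters per token."""
    return max(1, len(text) // 4)

def count_tokens(text: str, model: str = 'gpt-4') -> int:
    """Prefer exact counting with tiktoken when installed; otherwise estimate."""
    try:
        import tiktoken  # optional dependency
        return len(tiktoken.encoding_for_model(model).encode(text))
    except ImportError:
        return estimate_tokens(text)
```

The estimate is deliberately conservative for prose-heavy files; dense code or non-English text can deviate noticeably from 4 characters per token.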
Add to your CI/CD pipeline:
```yaml
# .github/workflows/snapshot.yml
name: Generate Repository Snapshot
on: [push, pull_request]
jobs:
  snapshot:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
        with:
          node-version: '18'
      - run: node capture_repo_snapshot.mjs
      - uses: actions/upload-artifact@v3
        with:
          name: repo-snapshot
          path: repo_snapshot_llm_distilled.txt
```

Or add a pre-commit hook:

```bash
#!/bin/bash
# .git/hooks/pre-commit
node capture_repo_snapshot.mjs
git add repo_snapshot_llm_distilled.txt
```

Or a VS Code task:

```jsonc
// .vscode/tasks.json
{
  "version": "2.0.0",
  "tasks": [
    {
      "label": "Generate Snapshot",
      "type": "shell",
      "command": "node",
      "args": ["capture_repo_snapshot.mjs"],
      "group": "build"
    }
  ]
}
```

"Can you review this authentication system?" (attach snapshot)
"Help me understand the overall structure of this project" (attach snapshot)
"I'm getting this error, can you help debug?" (attach snapshot + error)
"Generate API documentation for these endpoints" (attach snapshot)
"Explain how this data processing pipeline works" (attach snapshot)
- Fork the repository
- Create a feature branch
- Add your improvements
- Test with the sample project
- Submit a pull request
MIT License - feel free to use in your own projects!
Inspired by the need for better context sharing with AI assistants. Special thanks to the open-source community for tokenizer libraries and file processing utilities.