Local AI coding assistant. No cloud. No leaks. No trust required.
AndesCode runs Gemma 4 26B entirely on your hardware. It indexes your codebase, understands your project structure, and answers questions about it – all locally, through its own native desktop interface. Your code is never uploaded anywhere.
Every cloud coding assistant has the same architecture: your code leaves your machine, hits someone else's server, and comes back as a suggestion. For most developers, that's a fine trade-off.
For some, it isn't.
| | AndesCode | GitHub Copilot | Cursor | Claude |
|---|---|---|---|---|
| Code stays on your machine | ✅ | ❌ | ❌ | ❌ |
| Works fully offline | ✅ | ❌ | ❌ | ❌ |
| No token bills | ✅ | ❌ | ❌ | ❌ |
| Local audit log | ✅ | ❌ | ❌ | ❌ |
| Frontier-class model | ✅ | ✅ | ✅ | ✅ |
| Deterministic / no outages | ✅ | ❌ | ❌ | ❌ |
AndesCode is built for developers who work with client code under NDA, operate in regulated industries (healthcare, legal, finance, defense), or simply believe their code is their own.
- Teams working with sensitive or proprietary code (NDA, IP-heavy projects)
- Companies in regulated environments (finance, healthcare, legal)
- Developers who want full control over their AI tooling and data flow
- 🧠 Gemma 4 26B – high-capability open-weight model running entirely on your hardware
- 📚 Codebase-aware – indexes your project, builds a project map, injects relevant context automatically
- 🗺️ Project intelligence – detects language, stack, entry points, domain, and key symbols on indexing
- 🔍 Smart retrieval – two-step planning (the model selects relevant files first), query routing by filename/symbol/intent, and 4-axis re-ranking
- ⚠️ Coverage warnings – the model is told when it has a partial view of a file, so it never pretends to have context it doesn't
- 🔒 Local inference – offline flags set at process level before any library loads; your code never leaves the machine
- ⚡ Fast – KV cache warm-up on startup, 30–40 tokens/second on Apple Silicon, streaming responses
- 🖥️ Native desktop app – runs as a native window on macOS and Windows via the built-in web UI
- 📋 Audit log – every request logged locally with metadata only; proof of isolation for compliance
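The coverage-warning feature can be illustrated with a small sketch. All names here are hypothetical, chosen for illustration; this is not AndesCode's actual code:

```python
def coverage_note(filename: str, loaded_chunks: int, total_chunks: int) -> str:
    """Build a note telling the model whether its view of a file is partial.

    Illustrative sketch of the coverage-warning idea: if only some of a
    file's indexed chunks made it into the prompt, say so explicitly.
    """
    if loaded_chunks >= total_chunks:
        return f"[{filename}: full file in context]"
    pct = 100 * loaded_chunks // total_chunks
    return (f"[WARNING: only {loaded_chunks}/{total_chunks} chunks "
            f"(~{pct}%) of {filename} are in context; do not assume "
            f"knowledge of the rest of this file]")
```

A note like this would be appended to the injected context for each partially retrieved file, so the model states what it cannot see instead of guessing.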
| Platform | Hardware | RAM / VRAM |
|---|---|---|
| Apple Silicon Mac | M1 / M2 / M3 / M4 | 32GB unified memory |
| Windows / Linux | NVIDIA RTX 3090, 4090, 5090 | 24–32GB VRAM |
- Python 3.10+
- ~18GB free disk space
1. Clone

```bash
git clone https://github.com/yourusername/andescode
cd andescode
```

2. Run the launcher

```bash
python3 launch.py
```

That's it. On first run the launcher:
- Detects your hardware (Apple Silicon β Metal, NVIDIA β CUDA)
- Installs all dependencies with the correct GPU flags
- Opens the AndesCode native window, which automatically:
  - Downloads Gemma 4 26B (~16GB) from Hugging Face – progress shown on screen, resumes if interrupted
  - Loads the model into memory
  - Starts the local server
From there, the app guides you through indexing your project, and you can start asking questions immediately. On subsequent runs, `python3 launch.py` just starts the app – model already cached, ready in seconds.
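The launcher's hardware detection can be sketched in a few lines. This is a minimal illustration of the Apple Silicon → Metal / NVIDIA → CUDA decision described above, not the actual `launch.py` logic:

```python
import platform
import shutil


def detect_backend() -> str:
    """Pick a GPU backend the way a launcher might:
    Apple Silicon -> Metal, an NVIDIA driver on PATH -> CUDA,
    otherwise plain CPU."""
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "metal"
    if shutil.which("nvidia-smi"):  # NVIDIA driver present
        return "cuda"
    return "cpu"
```

The detected backend would then drive the build flags passed when installing the inference library (e.g. Metal vs. CUDA compilation).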
```
Index your project
        ↓
Files are chunked with language-aware boundary detection
Embeddings stored in ChromaDB (local)
Project map built: language, stack, domain, entry points, symbol index
        ↓
You ask a question in the AndesCode window
        ↓
Step 1 – Planning: the model scans your project map and identifies
the most relevant files for your question
        ↓
Step 2 – Retrieval: those files are loaded in full, plus
semantic search fills any gaps the planner missed
        ↓
Project map + code context injected into system prompt
Coverage warnings added if any file is only partially retrieved
        ↓
Gemma 4 generates a response grounded in your actual codebase
Streams to the UI with timing metadata
        ↓
Everything logged locally. Code never uploaded.
```
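The two retrieval steps above can be sketched as follows. Every function name and signature here is a hypothetical stand-in for illustration, not the AndesCode internals:

```python
def build_context(question, project_map, index, semantic_search, ask_model):
    """Two-step retrieval sketch.

    Step 1 - planning: the model picks relevant files from the project map.
    Step 2 - retrieval: planned files are loaded in full, then semantic
    search fills in chunks the planner missed.

    `index` maps filename -> full file text; `semantic_search` yields
    (filename, chunk_text) pairs; `ask_model` returns the model's reply.
    """
    # Step 1 - planning
    planned = [f.strip() for f in ask_model(
        f"Project map:\n{project_map}\n\n"
        f"Question: {question}\n"
        "List the most relevant files, one per line."
    ).splitlines()]

    # Step 2 - retrieval: full text of the planned files...
    context = {f: index[f] for f in planned if f in index}
    # ...plus semantic hits the planner missed
    for fname, text in semantic_search(question):
        context.setdefault(fname, text)
    return context
```

In the real pipeline the resulting context is then re-ranked, trimmed to the configured chunk budget, and injected into the system prompt along with any coverage warnings.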
- Your source code (never read by any external server)
- ChromaDB vector embeddings of your code
- Every query and every response
- The audit log at `audit.log`
- Project map, symbol index, and file hash cache
Offline environment flags are set at process startup before model libraries initialize, preventing outbound network calls during inference.
```python
import os

os.environ["TRANSFORMERS_OFFLINE"] = "1"
os.environ["HF_DATASETS_OFFLINE"] = "1"
os.environ["HF_HUB_OFFLINE"] = "1"
```

| Item | Size | Source |
|---|---|---|
| Gemma 4 26B Q4 model | ~16 GB | Hugging Face |
| all-MiniLM-L6-v2 embeddings | ~90 MB | Hugging Face |

Both are cached permanently after first run.
The audit log records metadata only – no code content, no query text, no responses. Absolute paths and usernames are stripped from all log entries.
```
2026-04-08 09:15:33 | REQUEST d24024dd | tokens=1024 | messages=1
2026-04-08 09:15:34 | CONTEXT d24024dd | planned=['server.py', 'indexer.py'] | loaded=['server.py', 'indexer.py'] | chunks=14
2026-04-08 09:15:42 | STREAM_DONE d24024dd | context=1.1s | think=2.3s | ttft=2.1s | total=8.4s | chunks=47
```
Logged: request ID, token count, file names of retrieved chunks, timing.
Never logged: query text, response text, code content, file paths, usernames.
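The path-and-username scrubbing can be sketched like this. The function name and regex are assumptions for illustration, not the actual AndesCode logger:

```python
import getpass
import re


def scrub(entry: str, username: str = "") -> str:
    """Strip absolute paths and the username from a log line before it
    is written, mirroring the metadata-only policy described above."""
    # Reduce absolute paths to their basename: /home/x/proj/a.py -> a.py
    entry = re.sub(r"(/[\w.\-]+)+/", "", entry)
    # Remove the username anywhere else it might appear
    username = username or getpass.getuser()
    return entry.replace(username, "<user>")
```

Applied to every entry before it reaches `audit.log`, this keeps file names (useful for debugging retrieval) while dropping anything that identifies the machine or user.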
| Phase | Network | Notes |
|---|---|---|
| First-run model download | Once | ~16GB from Hugging Face |
| First-run embedding download | Once | ~90MB from Hugging Face |
| Indexing | Never | Fully local |
| Answering queries | Never | Fully local |
| Hardware | Model | Speed |
|---|---|---|
| Apple M1/M2 Pro 32GB | Gemma 4 26B Q4 | ~20–30 t/s |
| Apple M3/M4 Pro 32GB | Gemma 4 26B Q4 | ~30–40 t/s |
| Apple M2/M3 Max 64GB | Gemma 4 31B Q4 | ~25–35 t/s |
| NVIDIA RTX 3090/4090 24GB | Gemma 4 26B Q4 | ~35–50 t/s |
| NVIDIA RTX 5090 32GB | Gemma 4 31B Q4 | ~50–70 t/s |
All configuration lives in `.env`:

```ini
MODEL_PATH=models/gemma-4-26B-A4B-it-UD-Q4_K_XL.gguf
PORT=8080
CONTEXT_CHUNKS=5      # code chunks injected per query
CACHE_SIZE_GB=2.0     # KV cache size allocated at startup
TRANSFORMERS_OFFLINE=1
HF_DATASETS_OFFLINE=1
HF_HUB_OFFLINE=1
TOKENIZERS_PARALLELISM=false
```

For large projects or architectural questions, increase `CONTEXT_CHUNKS` to 7–10. The retrieval pipeline automatically widens its candidate pool for broad queries – this setting controls how many final chunks land in the prompt.
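Reading `.env` needs no external dependency. This stdlib-only sketch shows the idea; the actual launcher may load configuration differently (e.g. via python-dotenv):

```python
import os


def load_env(path: str = ".env") -> None:
    """Minimal .env loader: KEY=VALUE lines become environment
    variables; comments and blank lines are skipped. Existing
    environment variables are not overwritten."""
    with open(path) as f:
        for line in f:
            line = line.split("#", 1)[0].strip()  # drop inline comments
            if "=" in line:
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())
```

After loading, typed access with a fallback looks like `int(os.environ.get("CONTEXT_CHUNKS", "5"))`.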
Python, JavaScript, TypeScript, JSX/TSX, Go, Rust, Java, Kotlin, Swift, C, C++, Ruby, PHP, C# – with language-aware chunking that respects function and class boundaries for each.
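Boundary-aware chunking can be sketched with per-language regexes. The patterns below are an illustrative subset (the real chunker covers more languages and constructs):

```python
import re

# Top-level definition starts per extension (illustrative subset)
BOUNDARIES = {
    ".py": re.compile(r"^(def |class |async def )", re.M),
    ".go": re.compile(r"^(func |type )", re.M),
    ".rs": re.compile(r"^(fn |pub fn |struct |impl )", re.M),
}


def chunk(source: str, ext: str) -> list[str]:
    """Split source at function/class boundaries so no chunk cuts a
    definition in half; unknown extensions fall back to one chunk."""
    pat = BOUNDARIES.get(ext)
    if not pat:
        return [source]
    starts = [m.start() for m in pat.finditer(source)] or [0]
    if starts[0] != 0:
        starts.insert(0, 0)  # keep imports/header as the first chunk
    return [source[a:b] for a, b in zip(starts, starts[1:] + [len(source)])]
```

Because each chunk begins at a definition boundary, embeddings describe whole functions or classes rather than arbitrary line windows, which improves retrieval quality.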
- File watcher – automatic incremental re-index on save
- AST-aware chunking – deeper boundary detection beyond regex
- KVTC context compression – fit larger codebases in context
- Private tunnel (Tailscale/WireGuard) for mobile access
- iOS/Android chat client
- Cryptographic egress proof for SOC 2 compliance
- Pre-configured hardware bundle (Mac Mini)
AndesCode is designed to run fully locally and offline during inference.
However, users are responsible for validating their own environment and dependencies for compliance requirements. AndesCode does not claim formal certification (e.g., SOC 2, ISO) at this stage.
Does any code leave my machine?
No. Inference is entirely local. The only outbound connections are the one-time model download (~16GB) and embedding weights (~90MB) from Hugging Face on first run. Both are cached permanently. Offline flags are set as environment variables before any model library loads, so no library can phone home during inference.
Does it integrate with VS Code, Cursor, or other IDEs?
Not at this time. AndesCode is a standalone desktop app with its own interface. IDE plugin integration is on the roadmap but not currently supported.
Can I use a different model?
Yes – any GGUF model compatible with llama.cpp. Update `MODEL_PATH` in `.env`.
Does it work on Windows or Linux?
Yes, with an NVIDIA GPU. `launch.py` detects `nvidia-smi` and compiles `llama-cpp-python` with CUDA automatically. Metal acceleration is Apple Silicon only.
Answers seem generic or miss important files. What's wrong?
Check that indexing completed – you should see `✅ Done – X files`. For large projects, increase `CONTEXT_CHUNKS` in `.env`. You can also reference a specific file by name in your question – AndesCode will load all indexed chunks from that file directly.
How do I re-index after changing files?
Run `python3 indexer.py /path/to/your/project` again. MD5 hashing ensures only changed files are re-processed – unchanged files are reused from the existing index instantly.
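The hash-based skip can be sketched as follows. Function and file names here are assumptions for illustration, not the actual indexer code:

```python
import hashlib
import json
from pathlib import Path


def changed_files(files: list[str], cache_path: str = "hash_cache.json") -> list[str]:
    """Return only files whose MD5 differs from the cached digest,
    updating the cache - the skip-unchanged idea described above."""
    cache = {}
    if Path(cache_path).exists():
        cache = json.loads(Path(cache_path).read_text())
    dirty = []
    for f in files:
        digest = hashlib.md5(Path(f).read_bytes()).hexdigest()
        if cache.get(f) != digest:  # new or modified file
            dirty.append(f)
            cache[f] = digest
    Path(cache_path).write_text(json.dumps(cache))
    return dirty
```

Only the files returned by `changed_files` would then be re-chunked and re-embedded; everything else keeps its existing entries in the vector store.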
Is there a hosted version?
No. That would defeat the purpose.
AndesCode is source-available.
- Free for personal use and internal company use
- Commercial redistribution, resale, or offering AndesCode as a service requires a commercial license
See LICENSE for full terms.
This licensing model allows teams to use AndesCode freely inside their organization, while preventing third parties from reselling or hosting it as a competing service.
PRs welcome.
Highest-value contributions right now:
- Windows / Linux setup testing and documentation
- AST-aware chunking (deeper than current regex-based boundary detection)
- File watcher for automatic incremental re-indexing
AndesCode is built by an independent developer from Latin America. It exists because some teams require full control over their code, infrastructure, and data flow.
Source-available. Free to use internally. Commercial use requires a license.
Your AI runs at home. Your code never leaves.