Local AI coding assistant. No cloud. No leaks. No trust required.
AndesCode runs Gemma 4 26B entirely on your hardware. It indexes your codebase, understands your project structure, and answers questions about it – all locally, through its own native desktop interface. Your code is never uploaded anywhere.
Every cloud coding assistant has the same architecture: your code leaves your machine, hits someone else's server, and comes back as a suggestion. For most developers, that's a fine trade-off.
For some, it isn't.
| | AndesCode | GitHub Copilot | Cursor | Claude |
|---|---|---|---|---|
| Code stays on your machine | ✅ | ❌ | ❌ | ❌ |
| Works fully offline | ✅ | ❌ | ❌ | ❌ |
| No token bills | ✅ | ❌ | ❌ | ❌ |
| Local audit log | ✅ | ❌ | ❌ | ❌ |
| Frontier-class model | ✅ | ✅ | ✅ | ✅ |
| Deterministic / no outages | ✅ | ❌ | ❌ | ❌ |
AndesCode is built for developers who work with client code under NDA, operate in regulated industries (healthcare, legal, finance, defense), or simply believe their code is their own.
- Teams working with sensitive or proprietary code (NDA, IP-heavy projects)
- Companies in regulated environments (finance, healthcare, legal)
- Developers who want full control over their AI tooling and data flow
- 🧠 Gemma 4 26B – high-capability open-weight model running entirely on your hardware
- 📚 Codebase-aware – indexes your project, builds a project map, injects relevant context automatically
- 🗺️ Project intelligence – detects language, stack, entry points, domain, and key symbols on indexing
- 🔍 Smart retrieval – two-step planning (the model selects relevant files first), query routing by filename/symbol/intent, and 4-axis re-ranking
- ⚠️ Coverage warnings – the model is told when it has a partial view of a file, so it never pretends to have context it doesn't
- 🔒 Local inference – offline flags set at process level before any library loads; your code never leaves the machine
- ⚡ Fast – KV cache warm-up on startup, 30–40 tokens/second on Apple Silicon, streaming responses
- 🖥️ Native desktop app – runs as a native window on macOS and Windows via the built-in web UI
- 📋 Audit log – every request logged locally with metadata only; proof of isolation for compliance
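The coverage-warning feature can be illustrated with a small sketch. All names here are hypothetical, chosen for illustration; this is not AndesCode's actual code:

```python
def coverage_note(filename: str, loaded_chunks: int, total_chunks: int) -> str:
    """Build a note telling the model whether its view of a file is partial.

    Illustrative sketch of the coverage-warning idea: if only some of a
    file's indexed chunks made it into the prompt, say so explicitly.
    """
    if loaded_chunks >= total_chunks:
        return f"[{filename}: full file in context]"
    pct = 100 * loaded_chunks // total_chunks
    return (f"[WARNING: only {loaded_chunks}/{total_chunks} chunks "
            f"(~{pct}%) of {filename} are in context; do not assume "
            f"knowledge of the rest of this file]")
```

A note like this would be appended to the injected context for each partially retrieved file, so the model states what it cannot see instead of guessing.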
| Platform | Hardware | RAM / VRAM |
|---|---|---|
| Apple Silicon Mac | M1 / M2 / M3 / M4 | 32GB unified memory |
| Windows / Linux | NVIDIA RTX 3090, 4090, 5090 | 24–32GB VRAM |
- Python 3.10+
- ~18GB free disk space
1. Clone

```bash
git clone https://github.com/yourusername/andescode
cd andescode
```

2. Run the launcher

```bash
python3 launch.py
```

That's it. On first run the launcher:
- Detects your hardware (Apple Silicon β Metal, NVIDIA β CUDA)
- Installs all dependencies with the correct GPU flags
- Opens the AndesCode native window, which automatically:
  - Downloads Gemma 4 26B (~16GB) from Hugging Face – progress shown on screen, resumes if interrupted
  - Loads the model into memory
  - Starts the local server
From there, the app guides you through indexing your project, and you can start asking questions immediately. On subsequent runs, `python3 launch.py` just starts the app – model already cached, ready in seconds.
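The launcher's hardware detection can be sketched in a few lines. This is a minimal illustration of the Apple Silicon → Metal / NVIDIA → CUDA decision described above, not the actual `launch.py` logic:

```python
import platform
import shutil


def detect_backend() -> str:
    """Pick a GPU backend the way a launcher might:
    Apple Silicon -> Metal, an NVIDIA driver on PATH -> CUDA,
    otherwise plain CPU."""
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "metal"
    if shutil.which("nvidia-smi"):  # NVIDIA driver present
        return "cuda"
    return "cpu"
```

The detected backend would then drive the build flags passed when installing the inference library (e.g. Metal vs. CUDA compilation).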
```
Index your project
        ↓
Files are chunked with language-aware boundary detection
Embeddings stored in ChromaDB (local)
Project map built: language, stack, domain, entry points, symbol index
        ↓
You ask a question in the AndesCode window
        ↓
Step 1 – Planning: the model scans your project map and identifies
the most relevant files for your question
        ↓
Step 2 – Retrieval: those files are loaded in full, plus
semantic search fills any gaps the planner missed
        ↓
Project map + code context injected into system prompt
Coverage warnings added if any file is only partially retrieved
        ↓
Gemma 4 generates a response grounded in your actual codebase
Streams to the UI with timing metadata
        ↓
Everything logged locally. Code never uploaded.
```
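The two retrieval steps above can be sketched as follows. Every function name and signature here is a hypothetical stand-in for illustration, not the AndesCode internals:

```python
def build_context(question, project_map, index, semantic_search, ask_model):
    """Two-step retrieval sketch.

    Step 1 - planning: the model picks relevant files from the project map.
    Step 2 - retrieval: planned files are loaded in full, then semantic
    search fills in chunks the planner missed.

    `index` maps filename -> full file text; `semantic_search` yields
    (filename, chunk_text) pairs; `ask_model` returns the model's reply.
    """
    # Step 1 - planning
    planned = [f.strip() for f in ask_model(
        f"Project map:\n{project_map}\n\n"
        f"Question: {question}\n"
        "List the most relevant files, one per line."
    ).splitlines()]

    # Step 2 - retrieval: full text of the planned files...
    context = {f: index[f] for f in planned if f in index}
    # ...plus semantic hits the planner missed
    for fname, text in semantic_search(question):
        context.setdefault(fname, text)
    return context
```

In the real pipeline the resulting context is then re-ranked, trimmed to the configured chunk budget, and injected into the system prompt along with any coverage warnings.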
- Your source code (never read by any external server)
- ChromaDB vector embeddings of your code
- Every query and every response
- The audit log at `audit.log`
- Project map, symbol index, and file hash cache
Offline environment flags are set at process startup before model libraries initialize, preventing outbound network calls during inference.
```python
import os

os.environ["TRANSFORMERS_OFFLINE"] = "1"
os.environ["HF_DATASETS_OFFLINE"] = "1"
os.environ["HF_HUB_OFFLINE"] = "1"
```

| Item | Size | Source |
|---|---|---|
| Gemma 4 26B Q4 model | ~16 GB | Hugging Face |
| all-MiniLM-L6-v2 embeddings | ~90 MB | Hugging Face |

Both are cached permanently after first run.
The audit log records metadata only – no code content, no query text, no responses. Absolute paths and usernames are stripped from all log entries.
```
2026-04-08 09:15:33 | REQUEST d24024dd | tokens=1024 | messages=1
2026-04-08 09:15:34 | CONTEXT d24024dd | planned=['server.py', 'indexer.py'] | loaded=['server.py', 'indexer.py'] | chunks=14
2026-04-08 09:15:42 | STREAM_DONE d24024dd | context=1.1s | think=2.3s | ttft=2.1s | total=8.4s | chunks=47
```
Logged: request ID, token count, file names of retrieved chunks, timing.
Never logged: query text, response text, code content, file paths, usernames.
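The path-and-username scrubbing can be sketched like this. The function name and regex are assumptions for illustration, not the actual AndesCode logger:

```python
import getpass
import re


def scrub(entry: str, username: str = "") -> str:
    """Strip absolute paths and the username from a log line before it
    is written, mirroring the metadata-only policy described above."""
    # Reduce absolute paths to their basename: /home/x/proj/a.py -> a.py
    entry = re.sub(r"(/[\w.\-]+)+/", "", entry)
    # Remove the username anywhere else it might appear
    username = username or getpass.getuser()
    return entry.replace(username, "<user>")
```

Applied to every entry before it reaches `audit.log`, this keeps file names (useful for debugging retrieval) while dropping anything that identifies the machine or user.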
| Phase | Network | Notes |
|---|---|---|
| First-run model download | Once | ~16GB from Hugging Face |
| First-run embedding download | Once | ~90MB from Hugging Face |
| Indexing | Never | Fully local |
| Answering queries | Never | Fully local |
| Hardware | Model | Speed |
|---|---|---|
| Apple M1/M2 Pro 32GB | Gemma 4 26B Q4 | ~20–30 t/s |
| Apple M3/M4 Pro 32GB | Gemma 4 26B Q4 | ~30–40 t/s |
| Apple M2/M3 Max 64GB | Gemma 4 31B Q4 | ~25–35 t/s |
| NVIDIA RTX 3090/4090 24GB | Gemma 4 26B Q4 | ~35–50 t/s |
| NVIDIA RTX 5090 32GB | Gemma 4 31B Q4 | ~50–70 t/s |
All configuration lives in `.env`:

```ini
MODEL_PATH=models/gemma-4-26B-A4B-it-UD-Q4_K_XL.gguf
PORT=8080
CONTEXT_CHUNKS=5      # code chunks injected per query
CACHE_SIZE_GB=2.0     # KV cache size allocated at startup
TRANSFORMERS_OFFLINE=1
HF_DATASETS_OFFLINE=1
HF_HUB_OFFLINE=1
TOKENIZERS_PARALLELISM=false
```

For large projects or architectural questions, increase `CONTEXT_CHUNKS` to 7–10. The retrieval pipeline automatically widens its candidate pool for broad queries – this setting controls how many final chunks land in the prompt.
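Reading `.env` needs no external dependency. This stdlib-only sketch shows the idea; the actual launcher may load configuration differently (e.g. via python-dotenv):

```python
import os


def load_env(path: str = ".env") -> None:
    """Minimal .env loader: KEY=VALUE lines become environment
    variables; comments and blank lines are skipped. Existing
    environment variables are not overwritten."""
    with open(path) as f:
        for line in f:
            line = line.split("#", 1)[0].strip()  # drop inline comments
            if "=" in line:
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())
```

After loading, typed access with a fallback looks like `int(os.environ.get("CONTEXT_CHUNKS", "5"))`.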
Python, JavaScript, TypeScript, JSX/TSX, Go, Rust, Java, Kotlin, Swift, C, C++, Ruby, PHP, C# – with language-aware chunking that respects function and class boundaries for each.
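Boundary-aware chunking can be sketched with per-language regexes. The patterns below are an illustrative subset (the real chunker covers more languages and constructs):

```python
import re

# Top-level definition starts per extension (illustrative subset)
BOUNDARIES = {
    ".py": re.compile(r"^(def |class |async def )", re.M),
    ".go": re.compile(r"^(func |type )", re.M),
    ".rs": re.compile(r"^(fn |pub fn |struct |impl )", re.M),
}


def chunk(source: str, ext: str) -> list[str]:
    """Split source at function/class boundaries so no chunk cuts a
    definition in half; unknown extensions fall back to one chunk."""
    pat = BOUNDARIES.get(ext)
    if not pat:
        return [source]
    starts = [m.start() for m in pat.finditer(source)] or [0]
    if starts[0] != 0:
        starts.insert(0, 0)  # keep imports/header as the first chunk
    return [source[a:b] for a, b in zip(starts, starts[1:] + [len(source)])]
```

Because each chunk begins at a definition boundary, embeddings describe whole functions or classes rather than arbitrary line windows, which improves retrieval quality.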
- File watcher – automatic incremental re-index on save
- AST-aware chunking – deeper boundary detection beyond regex
- KVTC context compression – fit larger codebases in context
- Private tunnel (Tailscale/WireGuard) for mobile access
- iOS/Android chat client
- Cryptographic egress proof for SOC 2 compliance
- Pre-configured hardware bundle (Mac Mini)
AndesCode is designed to run fully locally and offline during inference.
However, users are responsible for validating their own environment and dependencies for compliance requirements. AndesCode does not claim formal certification (e.g., SOC 2, ISO) at this stage.
Does any code leave my machine?
No. Inference is entirely local. The only outbound connections are the one-time model download (~16GB) and embedding weights (~90MB) from Hugging Face on first run. Both are cached permanently. Offline flags are set as environment variables before any model library loads, so no library can phone home during inference.
Does it integrate with VS Code, Cursor, or other IDEs?
Not at this time. AndesCode is a standalone desktop app with its own interface. IDE plugin integration is on the roadmap but not currently supported.
Can I use a different model?
Yes – any GGUF model compatible with llama.cpp. Update `MODEL_PATH` in `.env`.
Does it work on Windows or Linux?
Yes, with an NVIDIA GPU. `launch.py` detects `nvidia-smi` and compiles `llama-cpp-python` with CUDA automatically. Metal acceleration is Apple Silicon only.
Answers seem generic or miss important files. What's wrong?
Check that indexing completed – you should see `✅ Done – X files`. For large projects, increase `CONTEXT_CHUNKS` in `.env`. You can also reference a specific file by name in your question – AndesCode will load all indexed chunks from that file directly.
How do I re-index after changing files?
Run `python3 indexer.py /path/to/your/project` again. MD5 hashing ensures only changed files are re-processed – unchanged files are reused from the existing index instantly.
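The hash-based skip can be sketched as follows. Function and file names here are assumptions for illustration, not the actual indexer code:

```python
import hashlib
import json
from pathlib import Path


def changed_files(files: list[str], cache_path: str = "hash_cache.json") -> list[str]:
    """Return only files whose MD5 differs from the cached digest,
    updating the cache - the skip-unchanged idea described above."""
    cache = {}
    if Path(cache_path).exists():
        cache = json.loads(Path(cache_path).read_text())
    dirty = []
    for f in files:
        digest = hashlib.md5(Path(f).read_bytes()).hexdigest()
        if cache.get(f) != digest:  # new or modified file
            dirty.append(f)
            cache[f] = digest
    Path(cache_path).write_text(json.dumps(cache))
    return dirty
```

Only the files returned by `changed_files` would then be re-chunked and re-embedded; everything else keeps its existing entries in the vector store.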
Is there a hosted version?
No. That would defeat the purpose.
AndesCode is source-available.
- Free for personal use and internal company use
- Commercial redistribution, resale, or offering AndesCode as a service requires a commercial license
See LICENSE for full terms.
This licensing model allows teams to use AndesCode freely inside their organization, while preventing third parties from reselling or hosting it as a competing service.
PRs welcome.
Highest-value contributions right now:
- Windows / Linux setup testing and documentation
- AST-aware chunking (deeper than current regex-based boundary detection)
- File watcher for automatic incremental re-indexing
AndesCode is built by an independent developer from Latin America. It exists because some teams require full control over their code, infrastructure, and data flow.
Source-available. Free to use internally. Commercial use requires a license.
Your AI runs at home. Your code never leaves.