πŸ”οΈ AndesCode

Local AI coding assistant. No cloud. No leaks. No trust required.


AndesCode runs Gemma 4 26B entirely on your hardware. It indexes your codebase, understands your project structure, and answers questions about it – all locally, through its own native desktop interface. Your code is never uploaded anywhere.


Why local AI for code?

Every cloud coding assistant has the same architecture: your code leaves your machine, hits someone else's server, and comes back as a suggestion. For most developers, that's a fine trade-off.

For some, it isn't.

|  | AndesCode | GitHub Copilot | Cursor | Claude |
|---|---|---|---|---|
| Code stays on your machine | ✅ | ❌ | ❌ | ❌ |
| Works fully offline | ✅ | ❌ | ❌ | ❌ |
| No token bills | ✅ | ❌ | ❌ | ❌ |
| Local audit log | ✅ | ❌ | ❌ | ❌ |
| Frontier-class model | ✅ | ✅ | ✅ | ✅ |
| Deterministic / no outages | ✅ | ❌ | ❌ | ❌ |

AndesCode is built for developers who work with client code under NDA, operate in regulated industries (healthcare, legal, finance, defense), or simply believe their code is their own.


Who this is for

  • Teams working with sensitive or proprietary code (NDA, IP-heavy projects)
  • Companies in regulated environments (finance, healthcare, legal)
  • Developers who want full control over their AI tooling and data flow

Features

  • 🧠 Gemma 4 26B – high-capability open-weight model running entirely on your hardware
  • 🔍 Codebase-aware – indexes your project, builds a project map, injects relevant context automatically
  • 🗺️ Project intelligence – detects language, stack, entry points, domain, and key symbols on indexing
  • 🔎 Smart retrieval – two-step planning (model selects relevant files first), query routing by filename/symbol/intent, and 4-axis re-ranking (sketched below)
  • ⚠️ Coverage warnings – the model is told when it has a partial view of a file, so it never pretends to have context it doesn't
  • 🔒 Local inference – offline flags set in the process environment before any model library loads; your code never leaves the machine
  • ⚡ Fast – KV cache warm-up on startup, 30–40 tokens/second on Apple Silicon, streaming responses
  • 🖥️ Native desktop app – runs as a native window on macOS and Windows via the built-in web UI
  • 📋 Audit log – every request logged locally with metadata only; proof of isolation for compliance
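
To make the retrieval feature concrete, here is a minimal sketch of the two-step idea in Python. Every helper name (llm_complete, parse_file_list, read_file, semantic_search, build_system_prompt) is a hypothetical stand-in, not AndesCode's actual internals:

```python
# Sketch only: all helpers are hypothetical stand-ins, not AndesCode's API.

def answer(question: str, project_map: str, index) -> str:
    # Step 1 - Planning: the model sees only the lightweight project map
    # and names the files that matter for this question.
    plan = llm_complete(
        f"Project map:\n{project_map}\n\nQuestion: {question}\n"
        "List the most relevant files."
    )
    planned_files = parse_file_list(plan)

    # Step 2 - Retrieval: load planned files in full, then let semantic
    # search over the local vector index fill gaps the planner missed.
    context = [read_file(p) for p in planned_files]
    context += index.semantic_search(question, exclude=planned_files)

    # Generate grounded in the injected context.
    system_prompt = build_system_prompt(project_map, context)
    return llm_complete(question, system=system_prompt)
```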

Requirements

| Platform | Hardware | RAM / VRAM |
|---|---|---|
| Apple Silicon Mac | M1 / M2 / M3 / M4 | 32GB unified memory |
| Windows / Linux | NVIDIA RTX 3090, 4090, 5090 | 24–32GB VRAM |
  • Python 3.10+
  • ~18GB free disk space

Quick Start

1. Clone

```bash
git clone https://github.com/yourusername/andescode
cd andescode
```

2. Run the launcher

```bash
python3 launch.py
```

That's it. On first run the launcher:

  • Detects your hardware (Apple Silicon → Metal, NVIDIA → CUDA); a sketch of this check appears below
  • Installs all dependencies with the correct GPU flags
  • Opens the AndesCode native window, which automatically:
    • Downloads Gemma 4 26B (~16GB) from Hugging Face – progress shown on screen, resumes if interrupted
    • Loads the model into memory
    • Starts the local server

From there, the app guides you through indexing your project and you can start asking questions immediately. On subsequent runs, python3 launch.py just starts the app – the model is already cached, so it's ready in seconds.
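
For the curious, the hardware check in the first step can be sketched in a few lines of Python (illustrative logic only; launch.py's actual detection and build flags may differ):

```python
import platform
import shutil

def detect_backend() -> str:
    """Pick a GPU backend: Metal on Apple Silicon, CUDA where nvidia-smi exists."""
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "metal"
    if shutil.which("nvidia-smi"):
        return "cuda"
    return "cpu"

# The result selects the llama-cpp-python build flags, e.g. (illustrative)
# CMAKE_ARGS="-DGGML_METAL=on" or CMAKE_ARGS="-DGGML_CUDA=on".
print(detect_backend())
```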


How It Works

```
Index your project
        ↓
Files are chunked with language-aware boundary detection
Embeddings stored in ChromaDB (local)
Project map built: language, stack, domain, entry points, symbol index
        ↓
You ask a question in the AndesCode window
        ↓
Step 1 – Planning: model scans your project map and identifies
         the most relevant files for your question
        ↓
Step 2 – Retrieval: those files are loaded in full, plus
         semantic search fills any gaps the planner missed
        ↓
Project map + code context injected into system prompt
Coverage warnings added if any file is only partially retrieved
        ↓
Gemma 4 generates a response grounded in your actual codebase
Streams to the UI with timing metadata
        ↓
Everything logged locally. Code never uploaded.
```
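
The indexing half of this pipeline can be illustrated with a minimal sketch using chromadb and the all-MiniLM-L6-v2 embedder mentioned in the Privacy Model section. The collection name, chunk IDs, and pre-chunked input are placeholders; the real indexer adds language-aware chunking, the project map, and re-ranking:

```python
import chromadb
from sentence_transformers import SentenceTransformer

client = chromadb.PersistentClient(path="./index")        # local, on-disk store
collection = client.get_or_create_collection("codebase")  # name is illustrative
embedder = SentenceTransformer("all-MiniLM-L6-v2")

def index_file(path: str, chunks: list[str]) -> None:
    """Embed a file's code chunks and store them locally with metadata."""
    collection.add(
        ids=[f"{path}:{i}" for i in range(len(chunks))],
        documents=chunks,
        embeddings=embedder.encode(chunks).tolist(),
        metadatas=[{"file": path}] * len(chunks),
    )

def semantic_search(query: str, k: int = 5) -> list[str]:
    """Return the k nearest chunks for a query - the gap-filling step above."""
    hits = collection.query(
        query_embeddings=embedder.encode([query]).tolist(), n_results=k
    )
    return hits["documents"][0]
```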

Privacy Model

Always local

  • Your source code (never read by any external server)
  • ChromaDB vector embeddings of your code
  • Every query and every response
  • The audit log at audit.log
  • Project map, symbol index, and file hash cache

Offline enforcement

Offline environment flags are set at process startup before model libraries initialize, preventing outbound network calls during inference.

os.environ["TRANSFORMERS_OFFLINE"] = "1"
os.environ["HF_DATASETS_OFFLINE"]  = "1"
os.environ["HF_HUB_OFFLINE"]       = "1"

Downloaded once, then cached

| Item | Size | Source |
|---|---|---|
| Gemma 4 26B Q4 model | ~16 GB | Hugging Face |
| all-MiniLM-L6-v2 embeddings | ~90 MB | Hugging Face |

Both are cached permanently after first run.

Audit log format

The audit log records metadata only – no code content, no query text, no responses. Absolute paths and usernames are stripped from all log entries.

```
2026-04-08 09:15:33 | REQUEST d24024dd | tokens=1024 | messages=1
2026-04-08 09:15:34 | CONTEXT d24024dd | planned=['server.py', 'indexer.py'] | loaded=['server.py', 'indexer.py'] | chunks=14
2026-04-08 09:15:42 | STREAM_DONE d24024dd | context=1.1s | think=2.3s | ttft=2.1s | total=8.4s | chunks=47
```

Logged: request ID, token count, file names of retrieved chunks, timing.
Never logged: query text, response text, code content, file paths, usernames.
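
Reproducing a metadata-only line like the REQUEST entry above takes very little code. A minimal sketch (the writer itself is illustrative; only the field layout comes from the samples):

```python
import time
import uuid

def log_request(n_tokens: int, n_messages: int, path: str = "audit.log") -> str:
    """Append a REQUEST line with metadata only - no query text, no code."""
    request_id = uuid.uuid4().hex[:8]
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(path, "a") as f:
        f.write(f"{stamp} | REQUEST {request_id} | "
                f"tokens={n_tokens} | messages={n_messages}\n")
    return request_id
```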

Network access summary

| Phase | Network | Notes |
|---|---|---|
| First-run model download | ✅ Once | ~16GB from Hugging Face |
| First-run embedding download | ✅ Once | ~90MB from Hugging Face |
| Indexing | ❌ Never | Fully local |
| Answering queries | ❌ Never | Fully local |

Hardware Guide

| Hardware | Model | Speed |
|---|---|---|
| Apple M1/M2 Pro 32GB | Gemma 4 26B Q4 | ~20–30 t/s |
| Apple M3/M4 Pro 32GB | Gemma 4 26B Q4 | ~30–40 t/s |
| Apple M2/M3 Max 64GB | Gemma 4 31B Q4 | ~25–35 t/s |
| NVIDIA RTX 3090/4090 24GB | Gemma 4 26B Q4 | ~35–50 t/s |
| NVIDIA RTX 5090 32GB | Gemma 4 31B Q4 | ~50–70 t/s |

Configuration

All configuration lives in .env:

```ini
MODEL_PATH=models/gemma-4-26B-A4B-it-UD-Q4_K_XL.gguf
PORT=8080
CONTEXT_CHUNKS=5        # code chunks injected per query
CACHE_SIZE_GB=2.0       # KV cache size allocated at startup
TRANSFORMERS_OFFLINE=1
HF_DATASETS_OFFLINE=1
HF_HUB_OFFLINE=1
TOKENIZERS_PARALLELISM=false
```

For large projects or architectural questions, increase CONTEXT_CHUNKS to 7–10. The retrieval pipeline automatically widens its candidate pool for broad queries – this setting controls how many final chunks land in the prompt.
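
As a sketch of how these values might be consumed (assuming python-dotenv; the actual server may parse .env differently):

```python
import os
from dotenv import load_dotenv  # pip install python-dotenv (assumed here)

load_dotenv()  # reads .env from the working directory into os.environ

MODEL_PATH = os.getenv("MODEL_PATH", "models/model.gguf")
PORT = int(os.getenv("PORT", "8080"))
CONTEXT_CHUNKS = int(os.getenv("CONTEXT_CHUNKS", "5"))    # final chunks per prompt
CACHE_SIZE_GB = float(os.getenv("CACHE_SIZE_GB", "2.0"))  # KV cache allocation
```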


Supported Languages

Python, JavaScript, TypeScript, JSX/TSX, Go, Rust, Java, Kotlin, Swift, C, C++, Ruby, PHP, C# – with language-aware chunking that respects function and class boundaries for each.
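
To make "language-aware chunking" concrete, here is a minimal regex-based sketch for Python source (illustrative only; the real boundary detection covers every language listed above, and deeper AST-aware chunking is on the roadmap):

```python
import re

# Split at top-level def/class boundaries so a chunk never cuts
# through the middle of a function or class body.
BOUNDARY = re.compile(r"^(?:async def |def |class )", re.MULTILINE)

def chunk_python(source: str) -> list[str]:
    starts = [m.start() for m in BOUNDARY.finditer(source)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # module header becomes its own chunk
    bounds = starts + [len(source)]
    return [source[a:b] for a, b in zip(bounds, bounds[1:]) if source[a:b].strip()]
```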


Roadmap

  • File watcher – automatic incremental re-index on save
  • AST-aware chunking – deeper boundary detection beyond regex
  • KVTC context compression – fit larger codebases in context
  • Private tunnel (Tailscale/WireGuard) for mobile access
  • iOS/Android chat client
  • Cryptographic egress proof for SOC 2 compliance
  • Pre-configured hardware bundle (Mac Mini)

Security Model

AndesCode is designed to run fully locally and offline during inference.

However, users are responsible for validating their own environment and dependencies for compliance requirements. AndesCode does not claim formal certification (e.g., SOC 2, ISO) at this stage.


FAQ

Does any code leave my machine?
No. Inference is entirely local. The only outbound connections are the one-time model download (~16GB) and embedding weights (~90MB) from Hugging Face on first run. Both are cached permanently. Offline flags are set in the process environment before model libraries load, so no library can phone home during inference.

Does it integrate with VS Code, Cursor, or other IDEs?
Not at this time. AndesCode is a standalone desktop app with its own interface. IDE plugin integration is on the roadmap but not currently supported.

Can I use a different model?
Yes – any GGUF model compatible with llama.cpp. Update MODEL_PATH in .env.
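
For reference, loading a GGUF model with llama-cpp-python looks roughly like this (parameter values are illustrative; in normal use, editing MODEL_PATH in .env is all that's required):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="models/your-model.Q4_K_M.gguf",  # any llama.cpp-compatible GGUF
    n_ctx=8192,       # context window, illustrative
    n_gpu_layers=-1,  # offload all layers to Metal/CUDA when available
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What does indexer.py do?"}]
)
print(out["choices"][0]["message"]["content"])
```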

Does it work on Windows or Linux?
Yes, with an NVIDIA GPU. launch.py detects nvidia-smi and compiles llama-cpp-python with CUDA automatically. Metal acceleration is Apple Silicon only.

Answers seem generic or miss important files. What's wrong?
Check that indexing completed – you should see ✅ Done – X files. For large projects, increase CONTEXT_CHUNKS in .env. You can also reference a specific file by name in your question – AndesCode will load all indexed chunks from that file directly.

How do I re-index after changing files?
Run python3 indexer.py /path/to/your/project again. MD5 hashing ensures only changed files are re-processed – unchanged files are reused from the existing index instantly.
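
A minimal sketch of that hash check (the cache file name and layout are hypothetical):

```python
import hashlib
import json
from pathlib import Path

def changed_files(root: str, cache_file: str = "hash_cache.json") -> list[Path]:
    """Return files whose MD5 differs from the cache, updating the cache."""
    cache_path = Path(cache_file)
    cache = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    changed = []
    for path in Path(root).rglob("*.py"):  # real indexer covers all supported languages
        digest = hashlib.md5(path.read_bytes()).hexdigest()
        if cache.get(str(path)) != digest:
            changed.append(path)
            cache[str(path)] = digest
    cache_path.write_text(json.dumps(cache))
    return changed
```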

Is there a hosted version?
No. That would defeat the purpose.


License

AndesCode is source-available.

  • Free for personal use and internal company use
  • Commercial redistribution, resale, or offering AndesCode as a service requires a commercial license

See LICENSE for full terms.

This licensing model allows teams to use AndesCode freely inside their organization, while preventing third parties from reselling or hosting it as a competing service.


Contributing

PRs welcome.

Highest-value contributions right now:

  • Windows / Linux setup testing and documentation
  • AST-aware chunking (deeper than current regex-based boundary detection)
  • File watcher for automatic incremental re-indexing

Built in the Andes. Runs everywhere.

AndesCode is built by an independent developer from Latin America. It exists because some teams require full control over their code, infrastructure, and data flow.

Source-available. Free to use internally. Commercial use requires a license.


Your AI runs at home. Your code never leaves.
