A persistent, daemon-like AI agent that lives on your device.
Codey-v2 transforms Codey (https://github.com/Ishabdullah/Codey) from a session-based CLI tool into a continuous AI agent: maintaining state, managing background tasks, and adapting to work without constant supervision, all while running locally on your Android device with dual-model hot-swap for thermal and memory efficiency.
v2.4.0 · Learning AI Agent · Termux
- Export Interaction Data: Curate high-quality examples from your history
- Unsloth Colab Notebooks: Ready-to-run fine-tuning on free T4 GPU
- LoRA Adapter Import: Import trained adapters back to Codey-v2
- Off-device Training: Heavy compute on Colab, not your phone
Run `codey2 --finetune` to export and `codey2 --import-lora` to import.
- User Preference Learning: Automatically learns your coding style (test framework, naming, imports)
- Error Pattern Database: Remembers errors and fixes - suggests solutions for similar errors
- Strategy Tracking: Learns which recovery strategies work best over time
- Adaptive Behavior: Gets smarter with each interaction
Use the `/learning` command to view learned preferences and statistics.
- Shell Injection Prevention: Blocks `;`, `&&`, `||`, `|`, backticks, `$()`, `${}`, `<()`, `>()`
- Self-Modification Opt-In: Requires `--allow-self-mod` flag or `ALLOW_SELF_MOD=1` env var
- Checkpoint Enforcement: Auto-creates checkpoint before modifying core files
- Workspace Boundaries: Files outside workspace blocked unless self-mod enabled
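The metacharacter blocking above can be illustrated with a small validator. This is a hedged sketch of the idea only; the function name `is_safe_command` and the matching approach are invented for illustration and are not Codey's actual implementation:

```python
# Metacharacters the README lists as blocked: ; && || | backticks $() ${} <() >()
_BLOCKED = [";", "&&", "||", "|", "`", "$(", "${", "<(", ">("]

def is_safe_command(cmd: str) -> bool:
    """Return False if the command contains any blocked shell metacharacter."""
    return not any(tok in cmd for tok in _BLOCKED)
```

A real validator would also need to handle quoting and escaping, which this sketch ignores.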
Codey-v2 is a persistent, autonomous coding agent that runs as a background daemon in Termux (or Linux), maintains long-term memory, executes shell commands via tools, supports self-modification (opt-in), and loads/runs local LLMs. These capabilities make it powerful but introduce non-trivial security risks compared to simple chat-based local LLMs.
This is early-stage open-source software; use with caution, especially on devices with sensitive data. Always review generated code/commands before execution, keep your device physically secure, and consider running in a restricted environment (e.g., a dedicated Termux instance or container).
- Persistent Daemon & Background Execution
  - The daemon (`codeyd2`) runs continuously with a Unix socket for IPC (`codey-v2.sock`).
  - Risk: If the socket file has permissive permissions or is in a shared location, unauthorized local processes could potentially send commands.
  - Mitigations: Socket created with 0600 permissions (owner-only); daemon runs under your Termux/Linux user (no root required). Stop the daemon when not in use (`codeyd2 stop`). Monitor with `codeyd2 status` or `ps`.
  - Recommendation: Only start on trusted devices; avoid public/multi-user environments.
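A quick way to verify the socket-permission mitigation yourself is to check that the file is accessible only by its owner. The helper below is illustrative and not part of Codey's codebase:

```python
import os
import stat

def socket_is_private(path: str) -> bool:
    """True if the file's group/other permission bits are all clear (e.g. mode 0600)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0
```

Run it against `~/.codey-v2/codey-v2.sock` while the daemon is up to confirm the 0600 claim.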
- Shell Command Execution & Tool Use
  - Tools can execute shell commands (e.g., file ops, git, etc.) based on agent decisions.
  - Risk: Prompt injection or hallucinated/malicious output could lead to unintended commands (e.g., `rm -rf`, data exfiltration if network tools are added later).
  - Mitigations: Aggressive shell injection prevention (blocks `;`, `&&`, `||`, `|`, backticks, `$()`, `${}`, `<()`, `>()`, etc.); commands run in user context only. User must confirm high-risk actions in most flows (expandable).
  - Recommendation: Always review `--plan` output before execution; use the `--no-execute` flag for dry runs.
- Self-Modification & Code Alteration
  - Opt-in feature allows the agent to patch its own code/files.
  - Risk: If enabled and tricked (via clever prompts or bugs), it could introduce backdoors, delete data, or escalate damage persistently.
  - Mitigations:
    - Requires explicit `--allow-self-mod` flag or `ALLOW_SELF_MOD=1` env var.
    - Auto-creates checkpoints + full backups before core changes.
    - Git integration for versioning/rollback.
    - Workspace boundaries enforced (outside files blocked unless self-mod active).
  - Recommendation: Keep disabled by default. Only enable for experimentation; review diffs/checkpoints immediately after any mod.
- Memory & State Persistence
  - Hierarchical memory (SQLite for episodic/project state, embeddings for long-term).
  - Risk: Sensitive code snippets, API keys (if you add tools), or personal data could be stored and potentially leaked if the device is compromised or backups are mishandled.
  - Mitigations: Data stored in Termux app-private dirs (`~/.codey-v2/`); no encryption yet (planned). No automatic exfiltration.
  - Recommendation: Avoid feeding sensitive info; periodically review/delete state (`codey2 memory clear` or manual `rm`).
- Model Loading & Fine-Tuning
  - Loads external GGUF files; supports importing LoRA adapters from fine-tuning.
  - Risk: Malicious/poisoned models/adapters could cause denial-of-service (OOM), unexpected behavior, or (theoretically) exploits if GGUF parsing has vulnerabilities.
  - Mitigations: Models downloaded manually by the user; no auto-download. Use trusted sources (Hugging Face official).
  - Recommendation: Verify model hashes; run on isolated devices when testing untrusted adapters.
- General Android/Termux Risks
  - Runs with Termux permissions (storage, potentially network if tools are expanded).
  - Risk: Device-wide compromise if the agent is exploited (e.g., via generated malware code). Thermal/resource abuse possible on long runs.
  - Mitigations: CPU-only inference; built-in thermal throttling (warnings + thread reduction). No root needed.
  - Recommendation: Use on a secondary/test device first; monitor battery/CPU with `top` or Android settings.
- Shell metacharacter blocking
- Opt-in self-mod with checkpoints/git/rollback
- Workspace/file boundary enforcement
- Observability (`codey2 status`, health checks)
- No network calls by default (fully local)
- Audit report example in repo (`audit_report_2026-03-09_12-00-00.md`)
- Encrypted memory/state storage
- Runtime sandboxing (e.g., bubblewrap/seccomp on Linux, better Termux isolation)
- Command confirmation prompts for more actions
- Model signature/hash verification
- Audit logs + anomaly detection
Transparency is key: the full source is open; feel free to audit, open issues, or submit PRs for hardening. If you spot vulnerabilities, report responsibly (DM or an issue with the security label). Contributions to security features are especially welcome!
Use at your own risk. This project is experimental and comes with no warranties. Start small, monitor closely, and disable risky features until you're comfortable.
- Runs continuously in the background
- Unix socket for instant CLI communication
- Graceful shutdown and hot-reload support
- State persists across restarts
- Working Memory: Currently edited files (evicted after task)
- Project Memory: Key files like CODEY.md (never evicted)
- Long-term Memory: Embeddings for semantic search
- Episodic Memory: Complete action history
- Primary: Qwen2.5-Coder-7B for complex tasks
- Secondary: Qwen2.5-1.5B for simple queries
- Automatic routing based on input complexity
- LRU Cache: SIGSTOP/SIGCONT for quick restart (reduces 2-3s swap delay)
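The complexity-based routing can be sketched using the thresholds this README documents in `ROUTER_CONFIG` (`simple_max_chars`, `simple_keywords`). The function below is a simplified illustration, not the project's actual router:

```python
SIMPLE_MAX_CHARS = 50                              # from ROUTER_CONFIG
SIMPLE_KEYWORDS = {"hello", "hi", "thanks", "bye"}  # from ROUTER_CONFIG

def pick_model(prompt: str) -> str:
    """Route short or conversational prompts to the 1.5B model, the rest to 7B."""
    text = prompt.strip().lower()
    if len(text) <= SIMPLE_MAX_CHARS or any(k in text.split() for k in SIMPLE_KEYWORDS):
        return "secondary"  # Qwen2.5-1.5B
    return "primary"        # Qwen2.5-Coder-7B
```

The real router also applies a swap cooldown (`swap_cooldown_sec`) to avoid thrashing between models.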
- Native task queue with dependency tracking
- Automatic task breakdown for complex requests
- Conversational Filters: Q&A queries don't trigger unnecessary planning
- Strategy adaptation on failure
- Background task scheduling
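Dependency-tracked task scheduling of the kind described above can be sketched in a few lines. This is a toy model, not the planner's real data structures:

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    id: int
    description: str
    depends_on: list = field(default_factory=list)
    done: bool = False

class TaskQueue:
    """Toy dependency-tracking queue: a task is ready once all its dependencies are done."""
    def __init__(self):
        self.tasks = {}

    def add(self, tid, description, depends_on=()):
        self.tasks[tid] = Task(tid, description, list(depends_on))

    def next_ready(self):
        # Return the first not-done task whose dependencies are all complete.
        for t in self.tasks.values():
            if not t.done and all(self.tasks[d].done for d in t.depends_on):
                return t
        return None

    def complete(self, tid):
        self.tasks[tid].done = True
```

The real planner additionally retries failed tasks and swaps in alternative strategies (see the error recovery section).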
- Checkpoint before any core file modification
- Git integration for version control
- Rollback to any checkpoint
- Full file backup system
- `/status` command for full system state
- Health monitoring (CPU, memory, uptime)
- Token usage tracking
- Task queue visibility
- Tracks continuous inference duration
- Warns after 5 minutes
- Reduces threads after 10 minutes
- Optimized for mobile devices (S24 Ultra)
- Strategy switching on failures
- `write_file` fails → try `patch_file`
- Import error → suggest installation
- Test failure → debug with targeted fixes
- JSON Parser: Better escape sequence handling (`\n`, `\t`, `\"`, `\\`)
- Hallucination Detection: Past/future tense analysis reduces false positives
- Context Budget: 4000-token cap prevents context overflow on large projects
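The context budget behaves like a simple eviction loop: drop the oldest context until the total fits the cap. The sketch below approximates token counting by whitespace splitting (an assumption; the real agent presumably counts with the model tokenizer):

```python
CONTEXT_BUDGET = 4000  # token cap mentioned above

def trim_context(chunks, budget=CONTEXT_BUDGET, count=lambda s: len(s.split())):
    """Drop oldest chunks until the (approximate) token total fits the budget."""
    chunks = list(chunks)
    while chunks and sum(count(c) for c in chunks) > budget:
        chunks.pop(0)  # evict the oldest entry first
    return chunks
```

Passing a real tokenizer as `count` would make the budget exact rather than approximate.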
Run `./install.sh`. This single command installs everything:
- Python dependencies
- llama.cpp binary
- Both models (7B primary + 1.5B secondary)
- PATH configuration
After installation, restart your terminal and run:
codeyd2 start # Start the daemon
codey2 "create hello.py" # Send your first task
codey2 status # Check status anytime

If you prefer to install components separately, follow these steps:
| Requirement | Specification |
|---|---|
| OS | Termux on Android (or Linux) |
| RAM | 6GB+ available (dual-model support) |
| Storage | ~10GB (7B model + 1.5B model + Codey) |
| Python | 3.12+ |
| Packages | rich, sentence-transformers, numpy, watchdog |
pkg install cmake ninja clang python
pip install rich sentence-transformers numpy watchdog

git clone https://github.com/ggerganov/llama.cpp ~/llama.cpp
cd ~/llama.cpp
cmake -B build -DLLAMA_CURL=OFF
cmake --build build --config Release -j4

# Primary model (7B) - ~4.7GB
mkdir -p ~/models/qwen2.5-coder-7b
cd ~/models/qwen2.5-coder-7b
wget https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct-GGUF/resolve/main/qwen2.5-coder-7b-instruct-q4_k_m.gguf
# Secondary model (1.5B) - ~2GB
mkdir -p ~/models/qwen2.5-1.5b
cd ~/models/qwen2.5-1.5b
wget https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct-GGUF/resolve/main/qwen2.5-1.5b-instruct-q8_0.gguf

git clone https://github.com/Ishabdullah/Codey.git ~/codey-v2
cd ~/codey-v2
chmod +x codey2 codeyd2

# Add to your shell config
echo 'export PATH="$HOME/codey-v2:$PATH"' >> ~/.bashrc
source ~/.bashrc
# Or run the setup script
./setup.sh

codey2 --version
codeyd2 status

| Command | Description |
|---|---|
| `codeyd2 start` | Start the daemon in background |
| `codeyd2 stop` | Stop the running daemon |
| `codeyd2 status` | Show daemon status |
| `codeyd2 restart` | Restart the daemon |
| `codeyd2 reload` | Send reload signal (SIGUSR1) |
| `codeyd2 config` | Create default config file |
| Command | Description |
|---|---|
| `codey2 "prompt"` | Send a task to the daemon |
| `codey2 status` | Show full daemon status |
| `codey2 task list` | List recent tasks |
| `codey2 task <id>` | Get details of a specific task |
| `codey2 cancel <id>` | Cancel a pending/running task |
| `codey2 --daemon` | Run in foreground daemon mode |
| Flag | Description |
|---|---|
| `--yolo` | Skip all confirmations |
| `--allow-self-mod` | Enable self-modification (with checkpoint enforcement) |
| `--threads N` | Override thread count |
| `--ctx N` | Override context window size |
| `--read <file>` | Pre-load file into context |
| `--init` | Generate CODEY.md and exit |
| `--fix <file>` | Run file, auto-fix any errors |
| `--tdd <file>` | TDD mode with test file |
| `--no-resume` | Start fresh (ignore saved session) |
| `--plan` | Enable plan mode for complex tasks |
| `--no-plan` | Disable orchestration/planning |
| Variable | Description |
|---|---|
| `ALLOW_SELF_MOD=1` | Enable self-modification (alternative to `--allow-self-mod`) |
| `CODEY_MODEL` | Override model path |
| `CODEY_THREADS` | Override thread count |
./codey2 "What is 2+2?"

./codey2 "Create a REST API with user authentication and JWT tokens"

./codey2 status

Output:
==================================================
Codey-v2 Status
==================================================
Version: 2.0.0
PID: 12345
Uptime: 3600s
Model:
Active: primary
Temperature: 0.2
Context: 4096
Tasks:
Pending: 0
Running: 0
Memory:
RSS: 45.2 MB
Files: 3
Health:
CPU: 2.5%
Model: Loaded
==================================================
./codey2 task list

Output:
Tasks:
[5] ✓ Create Flask hello world app
[4] ✓ Set up project structure
[3] ✓ Write tests
[2] ⏳ Running: Install dependencies
[1] ○ Pending: Create requirements.txt
./codey2 cancel 2

Codey-v2 supports personalizing the underlying model using your interaction data. Heavy training happens off-device (Google Colab free tier), while your phone only handles lightweight data export and file management.
# Export last 30 days with default quality threshold
codey2 --finetune
# Customize export
codey2 --finetune --ft-days 60 --ft-quality 0.6 --ft-model 7b

Options:
| Flag | Default | Description |
|---|---|---|
| `--ft-days` | 30 | Days of history to include |
| `--ft-quality` | 0.7 | Minimum quality (0.0-1.0) |
| `--ft-model` | both | Model variant: 1.5b, 7b, or both |
| `--ft-output` | ~/Downloads/codey-finetune | Output directory |
Output:
- `codey-finetune-1.5b.jsonl` - Dataset for 1.5B model
- `codey-finetune-7b.jsonl` - Dataset for 7B model
- `codey-finetune-qwen-coder-1.5b.ipynb` - Colab notebook
- `codey-finetune-qwen-coder-7b.ipynb` - Colab notebook
- Go to https://colab.research.google.com
- Upload the generated notebook (`codey-finetune-*.ipynb`)
- Run all cells (takes 1-4 hours on free T4)
- Download the `codey-lora-adapter.zip` when complete
Note: Training uses Unsloth for 2x speed and 70% less VRAM.
# Extract the downloaded adapter
unzip codey-lora-adapter.zip
# Import to Codey-v2
codey2 --import-lora /path/to/codey-lora-adapter --lora-model primary

Options:
| Flag | Default | Description |
|---|---|---|
| `--lora-model` | primary | primary (7B) or secondary (1.5B) |
| `--lora-quant` | q4_0 | Quantization: q4_0, q5_0, q8_0, f16 |
| `--lora-merge` | false | Merge on-device (requires llama.cpp) |
If you have llama.cpp installed and want to merge on-device:
# Merge adapter with base model (requires ~8GB RAM for 7B)
codey2 --import-lora /path/to/adapter --lora-model primary --lora-merge
# Or manually with llama.cpp:
python ~/llama.cpp/convert-lora.py \
--base-model ~/models/qwen2.5-coder-7b/model.gguf \
--lora-adapter /path/to/adapter \
--output merged.gguf
./quantize merged.gguf merged-q4.gguf q4_0

If the fine-tuned model performs worse:
# Backup is created automatically before import
# Restore original model
codey2 --rollback --lora-model primary

- Higher `--ft-quality` (0.8+): Only best examples, smaller dataset
- Lower `--ft-quality` (0.5-0.6): More examples, noisier training
- More `--ft-days`: More diverse data, longer training
- 1.5B model: Faster training (~1 hour), good for style adaptation
- 7B model: Better reasoning (~4 hours), handles complex tasks
~/.codey-v2/config.json
./codeyd2 config

{
"daemon": {
"pid_file": "~/.codey-v2/codey-v2.pid",
"socket_file": "~/.codey-v2/codey-v2.sock",
"log_file": "~/.codey-v2/codey-v2.log",
"log_level": "INFO"
},
"tasks": {
"max_concurrent": 1,
"task_timeout": 1800,
"max_retries": 3
},
"health": {
"check_interval": 60,
"max_memory_mb": 1500,
"stuck_task_threshold": 1800
},
"state": {
"db_path": "~/.codey-v2/state.db",
"cleanup_old_actions_hours": 24
}
}

Edit ~/codey-v2/utils/config.py:
MODEL_CONFIG = {
"n_ctx": 4096, # Context window
"n_threads": 4, # CPU threads
"n_gpu_layers": 0, # GPU offload (0 = CPU only)
"temperature": 0.2, # Lower = more deterministic
"max_tokens": 1024, # Max response length
"repeat_penalty": 1.1,
}
ROUTER_CONFIG = {
    "simple_max_chars": 50,           # Under this → 1.5B model
"simple_keywords": ["hello", "hi", "thanks", "bye"],
"swap_cooldown_sec": 30, # Cooldown before swapping
}
THERMAL_CONFIG = {
"enabled": True,
    "warn_after_sec": 300,            # 5 min → warning
    "reduce_threads_after_sec": 600,  # 10 min → reduce threads
}

┌───────────────────────────────────────────────────────────┐
│                    CLI Client (codey2)                    │
│   └─ User commands, flags, task queries, /status          │
└───────────────────────────────────────────────────────────┘
                            │
                            ▼
┌───────────────────────────────────────────────────────────┐
│                   Daemon Core (codeyd2)                   │
│   ├─ Main event loop (asyncio)                            │
│   ├─ Signal handlers (SIGTERM, SIGUSR1)                   │
│   └─ Unix socket listener                                 │
└───────────────────────────────────────────────────────────┘
                            │
           ┌────────────────┼────────────────┐
           ▼                ▼                ▼
 ┌────────────────┐ ┌────────────────┐ ┌────────────────┐
 │    Planner     │ │     Memory     │ │     Tools      │
 │ • Task queue   │ │ • Working      │ │ • Filesystem   │
 │ • Dependencies │ │ • Project      │ │ • Shell        │
 │ • Adaptation   │ │ • Long-term    │ │ • Search       │
 │ • Background   │ │ • Episodic     │ │                │
 └────────────────┘ └────────────────┘ └────────────────┘
                            │
                            ▼
┌───────────────────────────────────────────────────────────┐
│                         LLM Layer                         │
│   ├─ Model router (7B ↔ 1.5B hot-swap)                    │
│   ├─ Hybrid backend (v2.4.0):                             │
│   │    • Direct llama-cpp-python (if available)           │
│   │    • Unix socket HTTP (fallback)                      │
│   │    • TCP localhost HTTP (fallback)                    │
│   ├─ Thermal management                                   │
│   └─ Backend auto-selection with graceful fallback        │
└───────────────────────────────────────────────────────────┘
                            │
                            ▼
┌───────────────────────────────────────────────────────────┐
│                    State Store (SQLite)                   │
│   ├─ Persistent memory, task queue, episodic log          │
│   └─ Model state, embeddings, checkpoints                 │
└───────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────┐
│  Working Memory (in-memory, evicted)    │
│  - Currently edited files               │
│  - Fast access, token-limited           │
│  - Cleared after task completes         │
└─────────────────────────────────────────┘
                    │
┌─────────────────────────────────────────┐
│  Project Memory (persistent)            │
│  - Key files: CODEY.md, README.md       │
│  - Never evicted                        │
│  - Loaded at daemon start               │
└─────────────────────────────────────────┘
                    │
┌─────────────────────────────────────────┐
│  Long-term Memory (embeddings)          │
│  - sentence-transformers (all-MiniLM)   │
│  - Semantic search via similarity       │
│  - Stored in SQLite                     │
└─────────────────────────────────────────┘
                    │
┌─────────────────────────────────────────┐
│  Episodic Memory (action log)           │
│  - Append-only log of all actions       │
│  - "What did I do last week?"           │
│  - SQLite via state store               │
└─────────────────────────────────────────┘
from core.memory_v2 import get_memory
memory = get_memory()
# Working memory (temporary)
memory.add_to_working("file.py", content, tokens)
memory.clear_working() # After task
# Project memory (persistent)
memory.add_to_project("CODEY.md", content, is_protected=True)
# Long-term memory (semantic search)
memory.store_in_longterm("file.py", content)
results = memory.search("find authentication code", limit=5)
# Episodic memory (log)
memory.log_action("file_modified", "auth.py")

Instead of fixed retries, Codey adapts its approach on failure:
| Error Type | Fallback Strategy | Confidence |
|---|---|---|
| `write_file` fails | Try `patch_file` | 0.9 |
| File not found | Create file first | 0.95 |
| Shell command fails | Search for solution | 0.8 |
| Import error | Install package | 0.9 |
| Syntax error | Fix syntax | 0.85 |
| Test failure | Debug test | 0.85 |
| Permission error | Fix permissions | 0.85 |
Error: "Failed to write file: permission denied"
→ Recovery: Trying "use_patch" - Use patch instead of full write
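The strategy table can be read as a lookup from error text to a (strategy, confidence) pair. The sketch below uses substring matching as a simplification; the rule names and matching scheme are illustrative, not Codey's `core/recovery.py`:

```python
# A few rules adapted from the fallback table above (illustrative names).
RECOVERY_RULES = [
    ("permission denied", ("use_patch", 0.9)),
    ("no such file", ("create_file_first", 0.95)),
    ("modulenotfounderror", ("install_package", 0.9)),
    ("syntaxerror", ("fix_syntax", 0.85)),
]

def pick_recovery(error: str):
    """Return (strategy, confidence) for the first rule matching the error text."""
    msg = error.lower()
    for needle, plan in RECOVERY_RULES:
        if needle in msg:
            return plan
    return ("retry", 0.5)  # default fallback when nothing matches
```

The real system additionally tracks which strategies actually succeed over time (strategy effectiveness tracking, v2.2.0) and reorders accordingly.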
Codey-v2 is optimized for mobile devices:
| Threshold | Action |
|---|---|
| 5 min continuous inference | Log warning |
| 10 min continuous inference | Reduce threads (4→2) |
| Cooldown period | Restore original threads |
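The thresholds above map naturally onto a small state tracker. This is a sketch using the documented 300s/600s limits, with an injectable clock for testing; it is not Codey's actual `core/thermal.py`:

```python
import time

WARN_AFTER_SEC = 300             # 5 min → warning
REDUCE_THREADS_AFTER_SEC = 600   # 10 min → reduce threads

class ThermalTracker:
    """Track continuous inference time and decide when to warn or shed threads."""
    def __init__(self, threads=4, clock=time.monotonic):
        self.threads = threads
        self._clock = clock
        self._start = None

    def inference_started(self):
        if self._start is None:
            self._start = self._clock()

    def check(self):
        if self._start is None:
            return "ok"
        elapsed = self._clock() - self._start
        if elapsed >= REDUCE_THREADS_AFTER_SEC:
            self.threads = max(2, self.threads // 2)  # 4 → 2
            return "throttled"
        if elapsed >= WARN_AFTER_SEC:
            return "warn"
        return "ok"

    def cooldown(self):
        # Cooldown period: restore the original thread count.
        self._start = None
        self.threads = 4
```

Injecting the clock keeps the logic testable without waiting ten real minutes.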
./codey2 status

Output includes:
Health:
CPU: 2.5%
Model: Loaded
Throttled: No
Before modifying any core file, Codey creates a checkpoint:
# Create checkpoint
from core.checkpoint import create_checkpoint
cp_id = create_checkpoint("Adding new feature")
# List checkpoints
from core.checkpoint import list_checkpoints
cps = list_checkpoints(limit=10)
# Rollback
from core.checkpoint import rollback
rollback(cp_id)

~/.codey-v2/checkpoints/
├── 1772934678/          # Timestamp ID
│   ├── core/
│   │   ├── agent.py
│   │   ├── daemon.py
│   │   └── ...
│   ├── tools/
│   │   └── ...
│   └── main.py
└── 1772934700/
    └── ...
~/codey-v2/
├── codey2                 # CLI client script
├── codeyd2                # Daemon manager script
├── main.py                # Main entrypoint
├── codey-v2.md            # This implementation plan
├── core/
│   ├── daemon.py          # Daemon core with socket server
│   ├── daemon_config.py   # Configuration manager
│   ├── state.py           # SQLite state store
│   ├── task_executor.py   # Task execution with recovery
│   ├── planner_v2.py      # Internal task planner
│   ├── background.py      # Background tasks & file watches
│   ├── filesystem.py      # Direct filesystem access
│   ├── memory_v2.py       # Four-tier memory system
│   ├── embeddings.py      # Sentence-transformers integration
│   ├── router.py          # Model routing heuristic
│   ├── loader_v2.py       # Model loading/hot-swap
│   ├── inference_v2.py    # Dual-model inference
│   ├── checkpoint.py      # Self-modification safety
│   ├── observability.py   # Self-state queries
│   ├── recovery.py        # Error recovery strategies
│   └── thermal.py         # Thermal management
├── tools/
│   └── file_tools.py      # Refactored file operations
├── utils/
│   ├── config.py          # All configuration
│   └── logger.py          # Logging with levels
└── prompts/
    └── system_prompt.py   # System prompt
from core.daemon import daemon_status, daemon_health, daemon_ping
# Check if daemon is running
status = daemon_status()
# Get health metrics
health = daemon_health()
# Ping daemon
pong = daemon_ping()

from core.state import get_state_store
state = get_state_store()
# Key-value operations
state.set("key", "value")
value = state.get("key")
state.delete("key")
# Task operations
task_id = state.add_task("description")
state.start_task(task_id)
state.complete_task(task_id, "result")
state.fail_task(task_id, "error")
# Episodic log
state.log_action("action", "details")
actions = state.get_recent_actions(limit=50)

from core.planner_v2 import get_planner
planner = get_planner()
# Add tasks
task_id = planner.add_task("Build a REST API")
task_ids = planner.add_tasks([
"Set up project structure",
"Create main application",
"Write tests",
])
# Get next ready task
task = planner.get_next_task()
# Break down complex task
subtasks = planner.breakdown_complex_task("Build a Flask app")
# Adapt on failure
alternative = planner.adapt(task_id, "Permission denied")

from core.observability import get_state
state = get_state()
# Query properties
tokens = state.tokens_used
memory = state.memory_loaded
pending = state.tasks_pending
model = state.model_active
temp = state.temperature
# Full status
status = state.get_full_status()

| Metric | Value |
|---|---|
| Primary Model | Qwen2.5-Coder-7B-Instruct Q4_K_M |
| Secondary Model | Qwen2.5-1.5B-Instruct Q8_0 |
| RAM Usage (idle) | ~200MB |
| RAM Usage (7B) | ~4.4GB |
| RAM Usage (1.5B) | ~1.2GB |
| Context Window | 4096 tokens |
| Threads | 4 (reducible to 2) |
| Speed (7B) | ~7-8 t/s |
| Speed (1.5B) | ~20-25 t/s |
| Hot-swap Delay | 2-3 seconds (LRU cached) |
| Backend | Overhead per Call | Availability |
|---|---|---|
| Direct binding | ~50-100ms | Attempted first; fails on Termux/Android |
| Unix socket HTTP | ~200-300ms | Available if llama-server supports --socket |
| TCP HTTP | ~400-600ms | Always available (fallback) |
Note: Actual latency depends on prompt length, model size, and device thermal state. Multi-turn agent loops benefit most from reduced overhead (e.g., 5 calls × 500ms = 2.5s saved with direct binding).
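The priority chain in the table can be sketched as a generic first-working-probe selector. The probe functions here are placeholders, not real backend checks:

```python
def select_backend(candidates):
    """Try each (name, probe) pair in priority order; return the first that works.

    A probe may raise (e.g. ImportError on Termux for the direct binding) or
    return False (server not reachable); either way we fall through to the next.
    """
    for name, probe in candidates:
        try:
            if probe():
                return name
        except Exception:
            continue
    raise RuntimeError("no inference backend available")
```

In Codey's terms the candidate list would be direct binding, then Unix socket HTTP, then TCP HTTP, matching the documented fallback order.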
# Check for stale PID file
rm -f ~/.codey-v2/codey-v2.pid
# Check logs
cat ~/.codey-v2/codey-v2.log
# Restart
./codeyd2 restart

# Verify daemon is running
./codeyd2 status
# Check socket exists
ls -la ~/.codey-v2/codey-v2.sock

# Verify model paths
ls -la ~/models/qwen2.5-coder-7b/
ls -la ~/models/qwen2.5-1.5b/

# Check status
./codey2 status
# Restart daemon (clears working memory)
./codeyd2 restart

| Limitation | Impact | Workaround / Status |
|---|---|---|
| Direct binding on Termux | llama-cpp-python fails with "Unsupported platform" on Android | Hybrid backend (v2.4.0) attempts direct binding, falls back to HTTP automatically |
| HTTP API overhead | ~400-600ms per inference call (TCP), ~200-300ms (Unix socket) | Unix socket backend reduces latency by ~50%; direct binding (~50-100ms) attempted first |
| File watches require `watchdog` | Background file monitoring disabled if not installed | Install with `pip install watchdog` (optional) |
| No NPU acceleration | CPU-only inference (~3-5 t/s at 4 threads) | Thermal management prevents throttling |
| Single-device only | State not synced across devices | Intentional design for local-only privacy |
Hybrid Backend Architecture:
- Priority 1: Direct `llama_cpp.Llama` binding (~50-100ms overhead)
  - Attempted first on every startup
  - Falls back gracefully if import fails (Termux/Android: "Unsupported platform")
- Priority 2: Unix domain socket HTTP (~200-300ms overhead)
  - Uses `llama-server --socket` instead of TCP
  - Reduces latency by ~50% vs TCP localhost
- Priority 3: TCP localhost HTTP (~400-600ms overhead)
  - Original reliable fallback
  - Always available if the llama-server binary exists
Backend Selection:
- Automatic with graceful degradation
- Logs backend used and latency metrics
- Use the `/status` command or `get_backend_info()` to check the active backend
Why not direct binding on Termux?
- `llama-cpp-python` uses pybind11, which requires platform-specific wheels
- Termux/Android lacks prebuilt wheels; compilation often fails due to:
- Missing CMake toolchain for Android NDK
- Incompatible libc vs glibc expectations
- Shared library loading issues (`ctypes.CDLL` fails on Android)
- Hybrid approach ensures functionality while attempting optimization
Migration Notes (v2.3.x → v2.4.0):
- No breaking changes; HTTP backend remains default fallback
- Existing `--finetune`, `--import-lora`, and learning features unchanged
- To check backend: `codey2 --backend-info` (new command in development)
| Version | Highlights |
|---|---|
| v2.4.0 | Hybrid Inference Backend - Direct llama-cpp-python + Unix socket HTTP + TCP HTTP fallback; accurate architecture diagram; documented Termux constraints |
| v2.3.0 | Fine-tuning Support - Export interaction data, Unsloth Colab notebooks, LoRA adapter import, off-device training workflow |
| v2.2.0 | Machine Learning - User preference learning, error pattern database, strategy effectiveness tracking, adaptive behavior |
| v2.1.0 | Security & Reliability Hardening - Shell injection prevention, self-mod opt-in, LRU model cache, JSON parser improvements, hallucination detection, orchestration filters, context budget |
| v2.0.0 | Complete 7-phase implementation - Daemon, Memory, Dual-Model, Planner, Checkpoints, Observability, Recovery |
| v1.0.0 | Original Codey - Session-based CLI with ReAct agent |
Codey-v2 includes a comprehensive test suite:
# Run all tests
pytest tests/ -v
# Run security tests
pytest tests/security/ -v
# Run specific test modules
pytest tests/test_shell_injection.py -v
pytest tests/test_hallucination.py -v
pytest tests/test_orchestration.py -v
pytest tests/test_json_parser.py -v
pytest tests/test_self_modification.py -v

| Module | Tests | Coverage |
|---|---|---|
| Shell Injection | 16 | Command validation, metacharacter blocking |
| Self-Modification | 8 | Opt-in enforcement, checkpoint creation |
| JSON Parser | 16 | Escape handling, malformed input recovery |
| Hallucination Detection | 18 | Past/future tense analysis |
| Orchestration | 24 | Conversational filters, complexity heuristics |
| Learning Systems | 25 | Preferences, error database, strategy tracking |
MIT License - See LICENSE file for details.
- Fork the repository
- Create a feature branch
- Make your changes
- Run tests
- Submit a pull request
- llama.cpp for efficient LLM inference
- Qwen for the excellent code models
- sentence-transformers for embeddings