Sharpness of perception, keenness of mind.
A fully local AI assistant platform built for precision, depth, and extensibility — no cloud, no compromise.
Acuity bridges the gap between a polished consumer chat interface and a powerful developer-centric local AI orchestrator. It runs entirely on your machine — ensuring absolute privacy, zero cloud latency, and full control over your data and models.
Under the hood, Acuity orchestrates llama.cpp via a robust child-process manager, uses SQLite (WAL mode) for fast local state, and integrates LanceDB for native, local-first Retrieval-Augmented Generation (RAG). It's designed to be a serious alternative to proprietary tools — built open, built local.
- 🔒 100% Local & Private — No cloud, no telemetry. Your models, your data, your rules.
- 👁️ Native Multimodal Support — Automatic detection and binding of vision projectors (`mmproj`) for seamless image analysis.
- 📚 Built-in RAG & Long-Term Memory — LanceDB vector storage automatically indexes conversations for semantic search. The AI autonomously learns and persists your preferences across sessions.
- 🛠️ Extensible Tooling via AcuitySDK
- Sandboxed Python Execution — Safe data analysis and math, right in the chat.
- Web Browsing — DuckDuckGo integration with smart page extraction.
- Custom JS/TS Tools — Write, save, and hot-reload your own tools directly in the UI using the Monaco Editor and the unified `AcuitySDK`.
- 🔀 Non-Linear Conversations — In-place message editing, versioning, and branching. Never lose a train of thought.
- ⚙️ Dynamic Prompting — Inject real-time context with slots like `{{datetime}}`, `{{memory}}`, and `{{semantic_context}}`.
- 🎨 Polished UI — Built with Tailwind v4, featuring custom typography, smart interruptible auto-scroll, and dynamic theming (Dark, Light, OLED).
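As an illustration, a slot-injection step like the one above can be as simple as a regex substitution. This is a sketch, not Acuity's actual implementation — the `fillSlots` name is invented here; only the slot syntax comes from the feature list:

```javascript
// Replace {{slot}} placeholders with values from a context object,
// leaving unknown slots untouched so they remain visible in the prompt.
function fillSlots(template, context) {
  return template.replace(/\{\{(\w+)\}\}/g, (match, key) =>
    key in context ? context[key] : match
  );
}

const prompt = fillSlots(
  "Current time: {{datetime}}. Known preferences: {{memory}}.",
  { datetime: "2024-05-01 12:00", memory: "prefers concise answers" }
);
// prompt → "Current time: 2024-05-01 12:00. Known preferences: prefers concise answers."
```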
- Node.js v20+
- llama.cpp binaries compiled for your system (`llama-server` / `llama-server.exe`)
- At least one GGUF model (e.g., Llama 3, Mistral, Phi, Qwen)
Acuity itself is lightweight. Performance depends on the model you load:
| Setup | Minimum | Recommended |
|---|---|---|
| RAM (CPU inference) | 16 GB | 32 GB+ |
| VRAM (GPU inference) | 6 GB (7B Q4) | 12 GB+ (13B+) |
| Storage | 5 GB + model size | SSD strongly recommended |
GPU offloading via llama.cpp is supported and strongly recommended for any model above 7B.
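For reference, enabling full GPU offload when starting `llama-server` by hand looks like this (paths and model filename are placeholders; `-ngl` is llama.cpp's shorthand for `--n-gpu-layers` — Acuity normally launches the server for you):

```bash
# Offload up to 99 layers to the GPU; llama.cpp caps this at the model's actual layer count.
./llama-server -m ./models/your-model.Q4_K_M.gguf --port 8080 -ngl 99
```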
- Clone the repository:

  ```bash
  git clone https://github.com/Biscotto58/Acuity.git
  cd acuity
  ```

- Install dependencies:

  ```bash
  npm install
  ```

- Start the development server:

  ```bash
  npm run dev
  ```

- Initial configuration:
  - Open `http://localhost:3000` in your browser.
  - Go to Settings → Server and point Acuity to your `llama.cpp` binary and models directory.
  - Go to Settings → AI and select your default inference and embedding models.
Custom tools are written directly in the browser using the Monaco Editor. They hot-reload instantly and run in a sandboxed Node.js VM. Every tool receives the AcuitySDK — giving it direct access to the LLM, the vector database, and long-term memory.
```javascript
module.exports = {
  name: "my_custom_tool",
  uiDescription: "Example Tool",
  iconName: "Wrench",
  isAutonomous: false,
  schema: {
    name: "my_custom_tool",
    description: "Instructions for the LLM on when to use this tool.",
    parameters: {
      type: "object",
      properties: {
        query: { type: "string", description: "What to process" }
      },
      required: ["query"]
    }
  },
  execute: async (args, toolSettings, acuity) => {
    // Generate an embedding
    const vector = await acuity.vector.getEmbedding(args.query);

    // Ask the LLM a sub-query
    const analysis = await acuity.llm.chat([
      { role: "user", content: `Analyze this: ${args.query}` }
    ]);

    // Persist a preference to long-term memory
    acuity.memory.savePreference("last_analyzed", args.query);

    return `Analysis complete: ${analysis}`;
  }
};
```

| Layer | Tech |
|---|---|
| Frontend | Next.js 16 App Router, React 19, Tailwind CSS v4 |
| Backend | Next.js API Routes (proxy + orchestrator) |
| Process Manager | Custom Node.js event emitter — manages llama.cpp lifecycle, port allocation, and health polling |
| Relational DB | better-sqlite3 in WAL mode (sessions, messages, settings) |
| Vector DB | @lancedb/lancedb (embeddings, semantic search, memory) |
| Model Parsing | Custom GGUF metadata reader — extracts context size and model type without loading weights |
There is no fixed roadmap. Acuity is built iteratively based on what's actually useful.
Have a feature idea or found something missing? Open an issue or start a discussion — suggestions from the community are always welcome and considered. What gets implemented is ultimately my call, but good ideas get built.
Contributions are welcome. If you want to add a core tool, improve the RAG pipeline, optimize the UI, or fix something that's been bugging you — go for it.
- Fork the project
- Create a feature branch (`git checkout -b feature/your-feature`)
- Commit your changes (`git commit -m 'Add your feature'`)
- Push and open a Pull Request
If you're unsure whether something fits the project's direction, open an issue first to discuss it.
Acuity provides a platform for running AI models and executing tools locally on your machine. You are solely responsible for any tools you run, code you execute, and actions taken by the AI on your behalf. This includes — but is not limited to — custom tools written via the AcuitySDK, web browsing actions, file system access, and any Python code executed within the sandboxed environment.
The author(s) of Acuity assume no liability for any damage, data loss, security breaches, or unintended consequences resulting from the use of this software or any tools run through it. Use at your own risk.
Distributed under the PolyForm Noncommercial License 1.0.0. Free for personal, educational, and non-commercial use. See LICENSE for details.
