Ambi is a flexible, highly customizable AI Agent framework built entirely in Rust. It empowers you to create production‑grade agents with minimal boilerplate, trait‑first design, and zero‑cost abstractions.
- Dual‑engine architecture – Seamlessly switch between local inference (via `llama.cpp` with GPU acceleration) and cloud APIs (OpenAI‑compatible endpoints) without changing your agent code.
- Advanced tool system – Parallel multi‑tool execution, per‑tool timeouts and retries, and automatic JSON Schema generation from Rust structs.
- Intelligent context management – Safe eviction algorithm that preserves conversation logic, preventing token overflow while keeping your agent focused.
- Rust native – Memory safety, async/await everywhere, minimal dependencies, and fast compilation times.
The best way to learn Ambi is to write an agent. The `examples/` directory contains complete, runnable examples covering basic chat, custom tools, local GPU inference, streaming, and multi‑tool parallel execution.
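For instance, assuming an example named `basic_chat` exists under `examples/` (the actual names may differ; check the directory listing):

```bash
cargo run --example basic_chat
```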
Add this to your `Cargo.toml`:

```toml
[dependencies]
ambi = "0.3"
```

For cloud‑only usage (faster compilation, no `llama.cpp` dependency):

```toml
ambi = { version = "0.3", default-features = false, features = ["openai-api"] }
```

Ambi is built on the Tokio async runtime. Ensure your project uses Tokio with `rt-multi-thread` enabled. Without this, `Agent::make` and all async methods will not function.
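For reference, a minimal Tokio dependency that satisfies this (the `macros` feature is only needed for `#[tokio::main]`):

```toml
tokio = { version = "1", features = ["rt-multi-thread", "macros"] }
```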
Ambi also provides native bindings for other languages:
Python – Install the pre-built wheel from PyPI:
```bash
pip install ambi-python
```

```python
from ambi import Agent, AgentState, Pipeline, LLMEngineConfig
```

Node.js – Install the npm package with prebuilt binaries:

```bash
npm install ambi-node
```

```js
const { Engine, Agent, AgentState, ChatRunner } = require('ambi-node');
```

Prebuilt binaries are available for Windows, Linux (glibc & musl), and macOS on x64 & arm64 architectures. No Rust toolchain is required on the consuming machine.
```rust
use ambi::{Agent, AgentState, ChatRunner, LLMEngineConfig};
use std::sync::Arc;
use tokio::sync::RwLock;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // 1. Pick an engine configuration
    let config = LLMEngineConfig::OpenAI(ambi::OpenAIEngineConfig {
        api_key: std::env::var("OPENAI_API_KEY")?,
        base_url: "https://api.openai.com/v1".into(),
        model_name: "gpt-4o".into(),
        temp: 0.7,
        top_p: 0.95,
    });

    // 2. Build an agent
    let agent = Agent::make(config).await?
        .preamble("You are a helpful assistant.")
        .template(ambi::ChatTemplateType::Chatml);

    // 3. Create a shared state with a unique session ID
    let state = Arc::new(RwLock::new(AgentState::new("session-001")));

    // 4. Run the chat pipeline
    let runner = ChatRunner::default();
    let response = runner.chat(&agent, &state, "Hello, world!").await?;
    println!("{}", response);
    Ok(())
}
```

Enable the `llama-cpp` feature and optionally a GPU backend:
ambi = { version = "0.3", features = ["llama-cpp", "cuda"] }Then swap the engine configuration:
```rust
let config = LLMEngineConfig::Llama(ambi::LlamaEngineConfig {
    model_path: "./models/llama-3-8b.gguf".into(),
    max_tokens: 4096,
    buffer_size: 32,
    use_gpu: true,
    n_gpu_layers: 100,
    n_ctx: 8192,
    n_tokens: 512,
    n_seq_max: 1,
    penalty_last_n: 64,
    penalty_repeat: 1.1,
    penalty_freq: 0.0,
    penalty_present: 0.0,
    temp: 0.7,
    top_p: 0.9,
    seed: 42,
    min_keep: 1,
});
```

Define a tool by implementing the `Tool` trait. Ambi automatically generates the JSON Schema for you.
```rust
use ambi::{Tool, ToolDefinition, ToolErr};
use async_trait::async_trait;
use serde::{Deserialize, Serialize};

#[derive(Deserialize)]
struct WeatherArgs {
    city: String,
}

#[derive(Serialize)]
struct WeatherResult {
    temperature: f64,
    condition: String,
}

struct WeatherTool;

#[async_trait]
impl Tool for WeatherTool {
    const NAME: &'static str = "get_weather";
    type Args = WeatherArgs;
    type Output = WeatherResult;

    fn definition(&self) -> ToolDefinition {
        ToolDefinition {
            name: "get_weather".into(),
            description: "Get current weather for a city".into(),
            parameters: serde_json::json!({
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name"
                    }
                },
                "required": ["city"]
            }),
            timeout_secs: Some(10),
            max_retries: Some(2),
            is_idempotent: true,
        }
    }

    async fn call(&self, args: WeatherArgs) -> Result<WeatherResult, ToolErr> {
        // Your implementation here
        Ok(WeatherResult {
            temperature: 22.5,
            condition: "Sunny".into(),
        })
    }
}
```

Attach the tool to your agent:
```rust
let agent = Agent::make(config).await?
    .preamble("You are a weather assistant.")
    .tool(WeatherTool)?;
```

Now the agent can seamlessly invoke `get_weather` when the user asks about the weather. Ambi handles retries, timeouts, and parallel execution automatically.
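As a sketch of the parallel path, register multiple tools the same way; `NewsTool` below is a hypothetical second tool implemented like `WeatherTool`:

```rust
// Hypothetical: a second tool registered alongside the first.
// Independent calls emitted in one model turn can then run in parallel.
let agent = Agent::make(config).await?
    .preamble("You are a research assistant.")
    .tool(WeatherTool)?
    .tool(NewsTool)?;
```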
Responses can also be streamed chunk by chunk:

```rust
use futures::StreamExt;

let mut stream = runner.chat_stream(&agent, &state, "Tell me a story").await?;
while let Some(chunk) = stream.next().await {
    match chunk {
        Ok(text) => print!("{}", text),
        Err(e) => eprintln!("Stream error: {}", e),
    }
}
```

WASM targets (browser) support the same streaming API natively via fetch and ReadableStream – see examples/webAssembly for a live demo.
Ambi's context management automatically evicts old messages when the token budget is exceeded, while completely decoupling system instructions from the eviction FIFO queue for maximum KV Cache hit rates.
Volatile background knowledge like RAG results or environment variables can be injected safely into AgentState
without touching the static system_prompt:
```rust
// Inject RAG results for the current turn
state.write().await.set_dynamic_context("Relevant docs: ...");

// Or stack multiple sources
state.write().await.append_dynamic_context("Current time: 2025-01-01");
```

Use `clear_dynamic_context()` to reset between turns.
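A minimal turn-boundary sketch, reusing the same `state` handle as above:

```rust
// Between turns: drop the previous turn's volatile context,
// then inject fresh retrieval results for the next one
state.write().await.clear_dynamic_context();
state.write().await.set_dynamic_context("Relevant docs for the next turn: ...");
```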
To control when eviction triggers, configure the token budget:

```rust
use ambi::config::EvictionStrategy;

let agent = Agent::make(config).await?
    .with_eviction_strategy(EvictionStrategy { max_safe_tokens: 4096 });
```

The eviction callback receives `&AgentState`, giving you safe access to identifiers and connection pools from state extensions for async database archiving:
```rust
let agent = Agent::make(config).await?
    .on_evict(|state: &AgentState, evicted: Vec<Arc<Message>>| {
        // Clone owned data: the spawned task must be 'static,
        // so it cannot hold a borrow of `state`
        let session_id = state.session_id.clone();
        tokio::spawn(async move {
            // persist the evicted messages to the DB under `session_id`
            let _ = (session_id, evicted);
        });
    });
```

The chat history also exposes search and inspection helpers:

```rust
// Find messages containing a keyword
let results = state.read().await.chat_history.search_by_keyword("weather");

// Get the last user message
if let Some(msg) = state.read().await.chat_history.last_user_message() {
    // inspect the user's latest intent
}

// Get the last assistant message
if let Some(msg) = state.read().await.chat_history.last_assistant_message() {
    // inspect the latest response
}
```

By default Ambi uses `[TOOL_CALL] ... [/TOOL_CALL]` tags. You can bring your own parser:
```rust
use ambi::tool::{ToolCallParser, DefaultToolParser};
use ambi::types::StreamFormatter;

struct MyParser;

impl ToolCallParser for MyParser {
    fn format_instruction(&self, tools_json: &str) -> String {
        // instruct the model how to call tools
        format!("Use tools: {}", tools_json)
    }

    fn parse(&self, text: &str) -> Vec<(String, serde_json::Value)> {
        // extract tool calls from the model's output
        vec![]
    }

    fn create_stream_formatter(&self) -> Box<dyn StreamFormatter> {
        Box::new(ambi::agent::processor::PassThroughFormatter)
    }
}

let agent = Agent::make(config).await?
    .with_tool_parser(MyParser);
```

Ambi uses `thiserror` to provide clear, actionable error types:
```rust
pub enum AmbiError {
    EngineError(String),
    AgentError(String),
    ToolError(String),
    ContextError(String),
    PipelineError(String),
    MaxIterationsReached(usize),
    Other(anyhow::Error),
}
```

All public APIs return `Result<T, AmbiError>`, making it easy to pattern‑match or propagate errors.
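As a sketch, assuming `AmbiError` is re-exported at the crate root, you can branch on specific variants:

```rust
// Handle one failure mode specifically; propagate everything else
match runner.chat(&agent, &state, "Hello").await {
    Ok(reply) => println!("{}", reply),
    Err(ambi::AmbiError::MaxIterationsReached(n)) => {
        eprintln!("agent hit the iteration cap ({} rounds)", n);
    }
    Err(e) => return Err(e.into()),
}
```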
Ambi comes with comprehensive unit and integration tests. We recommend running `cargo test` during development. When testing agents, consider using a mock engine to avoid real API calls:

```rust
struct MockEngine;

#[async_trait]
impl LLMEngineTrait for MockEngine {
    async fn chat(&self, _: LLMRequest) -> Result<String> {
        Ok("Hello, I am a mock.".into())
    }
    // ...
}

let agent = Agent::make(LLMEngineConfig::Custom(Box::new(MockEngine))).await?;
```
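A minimal test sketch building on the mock, reusing the imports from the quickstart (how the agent post-processes engine output is an assumption here, so the assertion is deliberately loose):

```rust
#[tokio::test]
async fn chat_against_mock_engine() {
    let agent = Agent::make(LLMEngineConfig::Custom(Box::new(MockEngine)))
        .await
        .expect("mock engine should initialize");
    let state = Arc::new(RwLock::new(AgentState::new("test-session")));
    let reply = ChatRunner::default()
        .chat(&agent, &state, "ping")
        .await
        .expect("mock chat should succeed");
    // Assumes the agent passes the engine reply through largely unmodified
    assert!(reply.contains("mock"));
}
```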
Ambi uses Cargo features to keep compile times low:

- `openai-api` (enabled by default) – OpenAI‑compatible cloud backend powered by `async-openai`.
- `llama-cpp` – Local inference via `llama.cpp` (supports `cuda`, `vulkan`, `metal`, `rocm` sub‑features).
- `cuda`, `vulkan`, `metal`, `rocm` – GPU acceleration for the local engine (choose exactly one).
- `macro` – Enables the `#[tool]` attribute macro for zero-boilerplate tool definitions with `params(...)` support.
- `mtmd` – Multimodal (vision) support for local VLM models (implies `llama-cpp`).
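For example, a plausible feature combination for local inference on Apple Silicon (adjust the GPU backend to your platform):

```toml
ambi = { version = "0.3", features = ["llama-cpp", "metal", "macro"] }
```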
Licensed under the Apache License, Version 2.0.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this crate by you, as defined in the Apache-2.0 license, shall be licensed as above, without any additional terms or conditions.