Course: ECE1724 - Special Topics in Software Engineering - Performant Software Systems with Rust
Project Name: Simple LLM-Powered CLI
Date: December 2025
| Name | Student Number | Preferred Email |
|---|---|---|
| Peixuan Li | 1006719464 | adampeixuan.li@mail.utoronto.ca |
| Zhengyang Li | 1012373977 | zhengyang.li@mail.utoronto.ca |
| Yanchi Wang | 1006085264 | yanchi.wang@mail.utoronto.ca |
In recent years, the intersection of Artificial Intelligence and software engineering has exploded. Large Language Models (LLMs) like GPT-4 and Claude 3 have transformed how developers write code. However, the current landscape of AI tools for developers is polarized. On one end, we have web-based chatbots (ChatGPT, Claude.ai) that are powerful but isolated—they cannot see your local file system, check your git status, or run your tests. On the other end, we have desktop applications wrapped in web technologies (Electron), such as VS Code extensions or standalone AI clients. While functional, these applications are notoriously resource-heavy. A simple chat client can easily consume 500MB+ of RAM simply because it bundles an entire Chromium browser and Node.js runtime.
As students of system engineering, we asked ourselves: Why must we sacrifice performance for intelligence? The terminal is the native habitat of developers—it is lightweight, fast, and composable. Yet, most "CLI AI" tools today are merely thin wrappers around Python scripts that pipe text to an API, lacking state management, interactivity, or the ability to perform complex, multi-step tasks autonomously.
This project was born from the desire to build a "System 2" thinker for the command line using Rust. We chose Rust not just for its hype, but for specific technical advantages that align perfectly with an AI agent's needs:
- Memory Safety without GC: AI agents often run long-lived background processes (monitoring, waiting for tokens). Rust’s ownership model ensures we don't leak memory or suffer from garbage collection pauses that ruin the TUI experience.
- Fearless Concurrency: An effective agent needs to stream LLM tokens, query a database, and execute shell commands simultaneously. Rust’s `tokio` runtime allows us to handle thousands of async tasks with a tiny footprint.
- Type-Driven Robustness: Dealing with LLM outputs is messy (hallucinated JSON, malformed strings). Rust’s strong type system (Serde, enums) forces us to handle every edge case at compile time, resulting in a system that rarely crashes in production.
Our motivation is to prove that high-performance systems engineering principles can be applied to AI tooling, creating an assistant that feels like a native extension of the operating system rather than a bloated web page.
The overarching goal of this project was to engineer a Simple LLM-Powered CLI that brings "Agentic" capabilities to the local terminal environment. We broke this down into four concrete technical objectives:
- Target: Achieve a startup time of under 200ms and a resting memory footprint of less than 50MB.
- Implementation: Eliminate heavy runtimes. Use a compiled binary with zero external dependencies (other than the OS and a database).
- Target: Move beyond simple "Chat" to "Action". The system must be able to plan a sequence of actions to solve a goal.
- Challenge: LLMs are stateless predictors. We needed to build a "Planner" engine that can translate a user's high-level intent (e.g., "Refactor this module") into a Directed Acyclic Graph (DAG) of atomic steps (read file -> analyze -> write file -> run tests).
- Target: Give the AI "hands" to interact with the OS.
- Scope: Implement a secure "Tool Registry" that allows the LLM to call filesystem APIs, shell commands, and git operations. Critically, this must be secure, preventing the AI from running dangerous commands like `rm -rf /` without explicit user oversight.
- Target: Persistent context awareness.
- Implementation: Unlike simple CLIs that lose context when the window closes, our system must use an embedded or local database (MySQL) to store conversation history, allowing the user to resume complex tasks days later.
Our final deliverable is a comprehensive CLI application that is more than just a chatbot. It is a workspace assistant with the following key features:
The defining feature of our project is its agentic loop. When a user inputs a complex request, the system does not immediately stream a text response. Instead:
- Planning Phase: The Planner module prompts the LLM to generate a JSON-structured plan. For example, if asked to "Summarize the README," it creates a plan: `[Step 1: fs_read(README.md), Step 2: summarize(content)]`. It automatically fills tool parameters from natural-language hints.
- Execution Phase: The Executor processes these steps. It handles dependencies: Step 2 cannot start until Step 1 succeeds. It essentially acts as a task scheduler for the AI.
- Verification Phase: A Verifier analyzes the output of the execution. Did the tool return an error code? Did the file actually get created? Only after verification does the system formulate a final response to the user.
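The plan-execute-verify loop above can be sketched in plain Rust. The `Step` shape, tool names, and the emptiness check below are illustrative stand-ins, not the project's actual types:

```rust
// A single atomic step in the plan; `depends_on` indexes an earlier step
// whose output feeds into this one.
struct Step {
    tool: &'static str,
    arg: &'static str,
    depends_on: Option<usize>,
}

// Hypothetical tool dispatch: returns Ok(output) or Err(message).
fn invoke_tool(tool: &str, arg: &str) -> Result<String, String> {
    match tool {
        "fs_read" => Ok(format!("<contents of {arg}>")),
        "summarize" => Ok(format!("summary of {arg}")),
        _ => Err(format!("unknown tool: {tool}")),
    }
}

fn run_plan(plan: &[Step]) -> Result<Vec<String>, String> {
    let mut outputs: Vec<String> = Vec::new();
    for (i, step) in plan.iter().enumerate() {
        // Execution phase: a step runs only after its dependency succeeded.
        let arg = match step.depends_on {
            Some(dep) => outputs[dep].clone(), // feed previous output forward
            None => step.arg.to_string(),
        };
        let out = invoke_tool(step.tool, &arg)?; // Err short-circuits dependents
        // Verification phase: reject suspicious results before continuing.
        if out.is_empty() {
            return Err(format!("step {i} produced no output"));
        }
        outputs.push(out);
    }
    Ok(outputs)
}

fn main() {
    // Plan for "Summarize the README": fs_read -> summarize.
    let plan = [
        Step { tool: "fs_read", arg: "README.md", depends_on: None },
        Step { tool: "summarize", arg: "", depends_on: Some(0) },
    ];
    let outputs = run_plan(&plan).expect("plan failed");
    println!("{}", outputs.last().unwrap());
}
```

Only after every step passes verification does the final output reach the user; a failed step aborts the remaining dependent steps.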
We implemented a modular "Tool" trait system, enabling us to easily plug in new capabilities. The system currently ships with:
- Filesystem Tools: `filesystem`, `fs_ls`, `fs_cat`, `fs_write`, `fs_mkdir`, `fs_rm`. These provide a safe abstraction over `std::fs`.
- Developer Tools: `git` (for status checks and logs), `shell` (for safe, non-interactive command execution).
- Network & Data: `web_fetch` (for retrieving documentation or web pages), `database` (for querying its own chat history).
- Extensions: `editor` (ACP-like local file ops), `mcp` (Model Context Protocol proxy), and `text_writer`.
- Security: A hardcoded "Deny List" middleware intercepts every tool call. If the LLM attempts to invoke a blacklisted command (e.g., `delete`, `format`), the execution is halted immediately with a security warning.
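A minimal sketch of how a trait-based registry with a deny-list gate can work. The type names and the single example tool here are hypothetical; the project's real modules differ in detail:

```rust
use std::collections::HashMap;

// Every capability implements this trait, making tools pluggable.
trait Tool {
    fn name(&self) -> &str;
    fn invoke(&self, args: &str) -> Result<String, String>;
}

// Example tool; a real fs_ls would wrap std::fs::read_dir.
struct FsLs;
impl Tool for FsLs {
    fn name(&self) -> &str { "fs_ls" }
    fn invoke(&self, args: &str) -> Result<String, String> {
        Ok(format!("listing of {args}"))
    }
}

struct Registry {
    tools: HashMap<String, Box<dyn Tool>>,
    deny_list: Vec<&'static str>,
}

impl Registry {
    fn new() -> Self {
        Registry { tools: HashMap::new(), deny_list: vec!["delete", "format", "rm -rf"] }
    }
    fn register(&mut self, tool: Box<dyn Tool>) {
        let key = tool.name().to_string();
        self.tools.insert(key, tool);
    }
    // Deny-list middleware: every call is screened before dispatch.
    fn call(&self, name: &str, args: &str) -> Result<String, String> {
        if self.deny_list.iter().any(|&bad| name.contains(bad) || args.contains(bad)) {
            return Err(format!("security: blocked call to '{name}'"));
        }
        match self.tools.get(name) {
            Some(t) => t.invoke(args),
            None => Err(format!("unknown tool '{name}'")),
        }
    }
}

fn main() {
    let mut reg = Registry::new();
    reg.register(Box::new(FsLs));
    assert!(reg.call("fs_ls", "src").is_ok());
    assert!(reg.call("shell", "rm -rf /").is_err()); // blocked by deny list
}
```

Because dispatch goes through `Box<dyn Tool>`, adding a capability is just one new `impl Tool` plus a `register` call; the deny-list check applies to all of them uniformly.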
We recognized that different contexts require different interfaces, so we built three:
- TUI (Text User Interface): Built with `Ratatui`. This is the primary mode. It features:
  - Async Event Loop: The UI remains responsive (scrollable, resizable) even while the LLM is streaming tokens or a heavy database query is running in the background.
  - Rich Layout: Includes panels for the conversation list, message log, execution-plan visualization, and tool output status.
  - History Sidebar: Allows navigation through past conversations stored in MySQL.
- CLI Mode: A standard REPL (Read-Eval-Print Loop) for users who prefer a raw terminal experience or want to pipe input/output (`cargo run -- --cli`).
- One-Shot Mode: A "fire and forget" mode (e.g., `cargo run -- --once "Check git status"`). This is designed to be integrated into shell scripts or other tools.
- Streaming LLM Client: We implemented a robust HTTP client to communicate with OpenAI-compatible endpoints (`/v1/chat/completions`). It supports full streaming response parsing for responsive output, as well as health checks.
- Local MCP Server: An opt-in feature (`ENABLE_LOCAL_MCP_SERVER=true`) in which our CLI acts as a server, exposing its local tools via `POST /tools/{name}/invoke` to other AI clients (like Claude Desktop).
- Persistence: MySQL-backed storage for `conversations` and `messages`, ensuring seamless resumption of tasks across sessions.
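The token-streaming piece can be illustrated with a deliberately naive, std-only parser for OpenAI-style SSE chunks. A real client, ours included, would use an HTTP library and a proper JSON parser rather than string scanning, so treat this purely as a sketch of the data flow:

```rust
// Extract the `delta.content` token from one SSE line of an
// OpenAI-style streaming response. Naive: assumes no escaped quotes.
fn extract_delta(line: &str) -> Option<String> {
    let payload = line.strip_prefix("data: ")?;
    if payload.trim() == "[DONE]" {
        return None; // end-of-stream sentinel
    }
    let key = "\"content\":\"";
    let start = payload.find(key)? + key.len();
    let rest = &payload[start..];
    let end = rest.find('"')?;
    Some(rest[..end].to_string())
}

fn main() {
    // Simulated chunks as they would arrive over the wire.
    let stream = [
        r#"data: {"choices":[{"delta":{"content":"Hel"}}]}"#,
        r#"data: {"choices":[{"delta":{"content":"lo"}}]}"#,
        "data: [DONE]",
    ];
    let mut answer = String::new();
    for line in stream {
        if let Some(tok) = extract_delta(line) {
            answer.push_str(&tok); // in the TUI, each token renders immediately
        }
    }
    assert_eq!(answer, "Hello");
    println!("{answer}");
}
```

Rendering each delta as it arrives, instead of waiting for the full completion, is what makes the TUI feel instant even on slow local models.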
Since this is a Rust binary, installation is straightforward.
- Environment: Ensure you have a MySQL server running and an OpenAI-compatible LLM endpoint (like LM Studio or Ollama).
- Config: Create a `.env` file (copy from this `.env`) with `DATABASE_URL` and `LLM_BASE_URL`.
- Run: `cargo run --release`.
- Navigation: Use the `Up`/`Down` keys to scroll the conversation list, `PageUp`/`PageDown` to scroll messages.
- Input: The input box supports multi-line editing. Press `Enter` to send.
- Shortcuts: `Ctrl+N` to start a new conversation, `Ctrl+D` to delete the current one, `q` to quit.
- Workflow Visualization: When the agent enters "Agentic Mode," you will see a collapsible section showing the "Plan," "Execution," and "Tool Results" in real time.
You don't need to explicitly invoke tools. Just ask in natural language:
- "What files are in the src directory?" -> The Agent automatically calls `fs_ls`.
- "Check my git status and tell me what's changed." -> The Agent calls `git status`.
- "Fetch the content of rust-lang.org." -> The Agent calls `web_fetch`.
- "Database Connection Failed": Check whether your MySQL user has sufficient permissions. The app tries to auto-migrate the schema, but it needs `CREATE TABLE` privileges.
- "LLM returns nonsense": Ensure you are using a model capable of tool calling (e.g., Mistral-Instruct or Llama 3). Smaller models may struggle to generate valid JSON plans.
- "Connection Refused": Check that your local LLM server (e.g., LM Studio) is running and that the port in `.env` matches.
To ensure full reproducibility, we have strictly defined our dependencies and build process.
- OS: Tested on Ubuntu 22.04 LTS and macOS Sonoma 14.x. Windows is supported but requires a properly configured terminal (Windows Terminal) for TUI rendering.
- Rust: Version 1.75.0 or later.
- Database: MySQL 8.0 or MariaDB 10.5.
- LLM: Any server compliant with the OpenAI Chat Completions API. We verified reproducibility using Ollama v0.1.30 running `mistral:latest` and LM Studio running `qwen2.5-7b-instruct`.
- Clone Source:
  ```bash
  git clone <repo_url>
  cd ece1724Rust
  ```
- Setup Database:
  ```bash
  mysql -u root -p -e "CREATE DATABASE IF NOT EXISTS llm_cli; CREATE USER IF NOT EXISTS 'llmuser'@'%' IDENTIFIED BY 'llmpassword'; GRANT ALL ON llm_cli.* TO 'llmuser'@'%'; FLUSH PRIVILEGES;"
  ```
- Compile: We use `cargo` for reproducible builds. The `Cargo.lock` file ensures you get the exact same dependency versions we used.
  ```bash
  cargo build --release
  ```
  Note: The first build pulls crates like `tokio` and `sqlx`, which may take 2-5 minutes depending on network speed.
- Verification: Run the integration test suite. We included a specific test that mocks an agent workflow.
  ```bash
  cargo test
  ```
  If all tests pass, the system is correctly set up.
Our team adopted a modular development strategy. We defined the Agent and Tool traits early on, allowing us to work in parallel on different components without stepping on each other's toes.
Primary Roles: Database layer, agent planner, MCP server wiring.
- Database Architecture: Designed the MySQL schema (`conversations`, `messages`) and implemented the persistence layer using `sqlx` with connection pooling.
- Planner Logic: Implemented the prompt-engineering strategy that forces the LLM to output valid JSON plans. He wrote the robust parsing logic that handles malformed LLM responses and wired the planner into the agent loop.
- MCP Server: Implemented the `mcp_server.rs` module, creating a lightweight HTTP endpoint that exposes local tools to external clients.
Primary Roles: Tool registry, executor, LLM client integration.
- Executor Engine: Built the DAG execution logic. He handled the dependency resolution (ensuring steps wait for prerequisites) and error propagation (short-circuiting dependent steps on failure).
- Tool Registry: Implemented the dynamic dispatch system for tools and the base set of executors (`Shell`, `Git`, `Filesystem`). He also implemented the "Deny List" safety mechanism.
- LLM Client: Developed the async HTTP client (`src/websocket`) to handle OpenAI-compatible chat completions, implementing the streaming response parser and health checks.
Primary Roles: Ratatui UI/UX, verifier, CLI/TUI glue.
- TUI Implementation: Built the `Ratatui` interface (`src/tui`), handling complex layouts (conversation list, message pane, status bar) and async user-input events.
- Verification Layer: Implemented the `Verifier` module to ensure all steps succeed before the agent responds.
- Integration: Acted as the glue between the backend agent pipeline and the frontend. He ensured that tool execution results and streaming tokens were correctly rendered in the UI without blocking the main thread.
- The "Async" Learning Curve: Coming from Python/JS, we initially underestimated the complexity of async Rust. We struggled with "future cannot be sent between threads safely" errors. We learned that in Rust, data shared across `.await` points must be `Send` and `Sync`. This forced us to restructure our application state to be more thread-safe, using `Arc` and Tokio channels (`mpsc`) effectively.
- Structured Concurrency: We learned that spawning threads (or tasks) blindly is dangerous. We adopted patterns to manage groups of tasks (e.g., executing parallel tool calls) and ensure they are all cleaned up properly on exit.
- LLM Reliability: We found that letting the agent auto-fill tool parameters from natural language hints significantly reduced planner brittleness compared to expecting perfect JSON every time. We also learned to guard schema creation and seeding to keep startup reliable on fresh databases.
This project demonstrated that Rust is an exceptional choice for building AI agents. The strict compiler saved us countless times from concurrency bugs that would have plagued a Python or Node.js implementation. The resulting binary is snappy, instant-on, and memory-efficient.
We successfully achieved our goal of building a "System 2" CLI agent. It's not just a chatbot; it's a tool that respects the developer's environment. By exposing a local tool registry via a small HTTP endpoint, we've also opened the door for future integrations with other AI clients, making our tool a versatile part of the developer toolkit.