Course: ECE1724 - Special Topics in Software Engineering - Performant Software Systems with Rust
Project Name: Simple LLM-Powered CLI
Date: December 2025
| Name | Student Number | Preferred Email |
|---|---|---|
| Peixuan Li | 1006719464 | adampeixuan.li@mail.utoronto.ca |
| Zhengyang Li | 1012373977 | zhengyang.li@mail.utoronto.ca |
| Yanchi Wang | 1006085264 | yanchi.wang@mail.utoronto.ca |
In recent years, the intersection of Artificial Intelligence and software engineering has exploded. Large Language Models (LLMs) like GPT-4 and Claude 3 have transformed how developers write code. However, the current landscape of AI tools for developers is polarized. On one end, we have web-based chatbots (ChatGPT, Claude.ai) that are powerful but isolated—they cannot see your local file system, check your git status, or run your tests. On the other end, we have desktop applications wrapped in web technologies (Electron), such as VS Code extensions or standalone AI clients. While functional, these applications are notoriously resource-heavy. A simple chat client can easily consume 500MB+ of RAM simply because it bundles an entire Chromium browser and Node.js runtime.
As students of system engineering, we asked ourselves: Why must we sacrifice performance for intelligence? The terminal is the native habitat of developers—it is lightweight, fast, and composable. Yet, most "CLI AI" tools today are merely thin wrappers around Python scripts that pipe text to an API, lacking state management, interactivity, or the ability to perform complex, multi-step tasks autonomously.
This project was born from the desire to build a "System 2" thinker for the command line using Rust. We chose Rust not just for its hype, but for specific technical advantages that align perfectly with an AI agent's needs:
- Memory Safety without GC: AI agents often run long-lived background processes (monitoring, waiting for tokens). Rust’s ownership model ensures we don't leak memory or suffer from garbage collection pauses that ruin the TUI experience.
- Fearless Concurrency: An effective agent needs to stream LLM tokens, query a database, and execute shell commands simultaneously. Rust’s `tokio` runtime allows us to handle thousands of async tasks with a tiny footprint.
- Type-Driven Robustness: Dealing with LLM outputs is messy (hallucinated JSON, malformed strings). Rust’s strong type system (Serde, enums) forces us to handle every edge case at compile time, resulting in a system that rarely crashes in production.
Our motivation is to prove that high-performance systems engineering principles can be applied to AI tooling, creating an assistant that feels like a native extension of the operating system rather than a bloated web page.
The overarching goal of this project was to engineer a Simple LLM-Powered CLI that brings "Agentic" capabilities to the local terminal environment. We broke this down into four concrete technical objectives:
- Target: Achieve a startup time of under 200ms and a resting memory footprint of less than 50MB.
- Implementation: Eliminate heavy runtimes. Use a compiled binary with zero external dependencies (other than the OS and a database).
- Target: Move beyond simple "Chat" to "Action". The system must be able to plan a sequence of actions to solve a goal.
- Challenge: LLMs are stateless predictors. We needed to build a "Planner" engine that can translate a user's high-level intent (e.g., "Refactor this module") into a Directed Acyclic Graph (DAG) of atomic steps (read file -> analyze -> write file -> run tests).
- Target: Give the AI "hands" to interact with the OS.
- Scope: Implement a secure "Tool Registry" that allows the LLM to call filesystem APIs, shell commands, and git operations. Critically, this must be secure, preventing the AI from running dangerous commands like `rm -rf /` without explicit user oversight.
- Target: Persistent context awareness.
- Implementation: Unlike simple CLIs that lose context when the window closes, our system must use an embedded or local database (MySQL) to store conversation history, allowing the user to resume complex tasks days later.
Our final deliverable is a comprehensive CLI application that is more than just a chatbot. It is a workspace assistant with the following key features:
The defining feature of our project is its agentic loop. When a user inputs a complex request, the system does not immediately stream a text response. Instead:
- Planning Phase: The Planner module prompts the LLM to generate a JSON-structured plan. For example, if asked to "Summarize the README," it creates a plan: `[Step 1: fs_read(README.md), Step 2: summarize(content)]`. It automatically fills tool parameters from natural-language hints.
- Execution Phase: The Executor processes these steps. It handles dependencies: Step 2 cannot start until Step 1 succeeds. It essentially acts as a task scheduler for the AI.
- Verification Phase: A Verifier analyzes the output of the execution. Did the tool return an error code? Did the file actually get created? Only after verification does the system formulate a final response to the user.
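The plan-execute-verify loop above can be sketched in plain Rust. The `Step` shape, tool names, and the emptiness check below are illustrative stand-ins, not the project's actual types:

```rust
// A single atomic step in the plan; `depends_on` indexes an earlier step
// whose output feeds into this one.
struct Step {
    tool: &'static str,
    arg: &'static str,
    depends_on: Option<usize>,
}

// Hypothetical tool dispatch: returns Ok(output) or Err(message).
fn invoke_tool(tool: &str, arg: &str) -> Result<String, String> {
    match tool {
        "fs_read" => Ok(format!("<contents of {arg}>")),
        "summarize" => Ok(format!("summary of {arg}")),
        _ => Err(format!("unknown tool: {tool}")),
    }
}

fn run_plan(plan: &[Step]) -> Result<Vec<String>, String> {
    let mut outputs: Vec<String> = Vec::new();
    for (i, step) in plan.iter().enumerate() {
        // Execution phase: a step runs only after its dependency succeeded.
        let arg = match step.depends_on {
            Some(dep) => outputs[dep].clone(), // feed previous output forward
            None => step.arg.to_string(),
        };
        let out = invoke_tool(step.tool, &arg)?; // Err short-circuits dependents
        // Verification phase: reject suspicious results before continuing.
        if out.is_empty() {
            return Err(format!("step {i} produced no output"));
        }
        outputs.push(out);
    }
    Ok(outputs)
}

fn main() {
    // Plan for "Summarize the README": fs_read -> summarize.
    let plan = [
        Step { tool: "fs_read", arg: "README.md", depends_on: None },
        Step { tool: "summarize", arg: "", depends_on: Some(0) },
    ];
    let outputs = run_plan(&plan).expect("plan failed");
    println!("{}", outputs.last().unwrap());
}
```

Only after every step passes verification does the final output reach the user; a failed step aborts the remaining dependent steps.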
We implemented a modular "Tool" trait system, enabling us to easily plug in new capabilities. The system currently ships with:
- Filesystem Tools: `filesystem`, `fs_ls`, `fs_cat`, `fs_write`, `fs_mkdir`, `fs_rm`. These provide a safe abstraction over `std::fs`.
- Developer Tools: `git` (for status checks and logs), `shell` (for safe, non-interactive command execution).
- Network & Data: `web_fetch` (for retrieving documentation or web pages), `database` (for querying its own chat history).
- Extensions: `editor` (ACP-like local file ops), `mcp` (Model Context Protocol proxy), and `text_writer`.
- Security: A hardcoded "Deny List" middleware intercepts every tool call. If the LLM attempts to invoke a blacklisted command (e.g., `delete`, `format`), the execution is halted immediately with a security warning.
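A minimal sketch of how a trait-based registry with a deny-list gate can work. The type names and the single example tool here are hypothetical; the project's real modules differ in detail:

```rust
use std::collections::HashMap;

// Every capability implements this trait, making tools pluggable.
trait Tool {
    fn name(&self) -> &str;
    fn invoke(&self, args: &str) -> Result<String, String>;
}

// Example tool; a real fs_ls would wrap std::fs::read_dir.
struct FsLs;
impl Tool for FsLs {
    fn name(&self) -> &str { "fs_ls" }
    fn invoke(&self, args: &str) -> Result<String, String> {
        Ok(format!("listing of {args}"))
    }
}

struct Registry {
    tools: HashMap<String, Box<dyn Tool>>,
    deny_list: Vec<&'static str>,
}

impl Registry {
    fn new() -> Self {
        Registry { tools: HashMap::new(), deny_list: vec!["delete", "format", "rm -rf"] }
    }
    fn register(&mut self, tool: Box<dyn Tool>) {
        let key = tool.name().to_string();
        self.tools.insert(key, tool);
    }
    // Deny-list middleware: every call is screened before dispatch.
    fn call(&self, name: &str, args: &str) -> Result<String, String> {
        if self.deny_list.iter().any(|&bad| name.contains(bad) || args.contains(bad)) {
            return Err(format!("security: blocked call to '{name}'"));
        }
        match self.tools.get(name) {
            Some(t) => t.invoke(args),
            None => Err(format!("unknown tool '{name}'")),
        }
    }
}

fn main() {
    let mut reg = Registry::new();
    reg.register(Box::new(FsLs));
    assert!(reg.call("fs_ls", "src").is_ok());
    assert!(reg.call("shell", "rm -rf /").is_err()); // blocked by deny list
}
```

Because dispatch goes through `Box<dyn Tool>`, adding a capability is just one new `impl Tool` plus a `register` call; the deny-list check applies to all of them uniformly.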
We recognized that different contexts require different interfaces, so we built three:
- TUI (Text User Interface): Built with `Ratatui`. This is the primary mode. It features:
  - Async Event Loop: The UI remains responsive (scrollable, resizable) even while the LLM is streaming tokens or a heavy database query is running in the background.
  - Rich Layout: Includes panels for the conversation list, message log, execution-plan visualization, and tool output status.
  - History Sidebar: Allows navigation through past conversations stored in MySQL.
- CLI Mode: A standard REPL (Read-Eval-Print Loop) for users who prefer a raw terminal experience or want to pipe input/output (`cargo run -- --cli`).
- One-Shot Mode: A "fire and forget" mode (e.g., `cargo run -- --once "Check git status"`). This is designed to be integrated into shell scripts or other tools.
- Streaming LLM Client: We implemented a robust HTTP client to communicate with OpenAI-compatible endpoints (`/v1/chat/completions`). It supports full streaming response parsing for responsive output, as well as health checks.
- Local MCP Server: An opt-in feature (`ENABLE_LOCAL_MCP_SERVER=true`) in which our CLI acts as a server, exposing its local tools via `POST /tools/{name}/invoke` to other AI clients (like Claude Desktop).
- Persistence: MySQL-backed storage for `conversations` and `messages`, ensuring seamless resumption of tasks across sessions.
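The token-streaming piece can be illustrated with a deliberately naive, std-only parser for OpenAI-style SSE chunks. A real client, ours included, would use an HTTP library and a proper JSON parser rather than string scanning, so treat this purely as a sketch of the data flow:

```rust
// Extract the `delta.content` token from one SSE line of an
// OpenAI-style streaming response. Naive: assumes no escaped quotes.
fn extract_delta(line: &str) -> Option<String> {
    let payload = line.strip_prefix("data: ")?;
    if payload.trim() == "[DONE]" {
        return None; // end-of-stream sentinel
    }
    let key = "\"content\":\"";
    let start = payload.find(key)? + key.len();
    let rest = &payload[start..];
    let end = rest.find('"')?;
    Some(rest[..end].to_string())
}

fn main() {
    // Simulated chunks as they would arrive over the wire.
    let stream = [
        r#"data: {"choices":[{"delta":{"content":"Hel"}}]}"#,
        r#"data: {"choices":[{"delta":{"content":"lo"}}]}"#,
        "data: [DONE]",
    ];
    let mut answer = String::new();
    for line in stream {
        if let Some(tok) = extract_delta(line) {
            answer.push_str(&tok); // in the TUI, each token renders immediately
        }
    }
    assert_eq!(answer, "Hello");
    println!("{answer}");
}
```

Rendering each delta as it arrives, instead of waiting for the full completion, is what makes the TUI feel instant even on slow local models.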
Since this is a Rust binary, installation is straightforward.
- Environment: Ensure you have a MySQL server running and an OpenAI-compatible LLM endpoint (like LM Studio or Ollama).
- Config: Create a `.env` file (copy from this `.env`) with `DATABASE_URL` and `LLM_BASE_URL`.
- Run: `cargo run --release`.
- Navigation: Use the `Up`/`Down` keys to scroll the conversation list, `PageUp`/`PageDown` to scroll messages.
- Input: The input box supports multi-line editing. Press `Enter` to send.
- Shortcuts: `Ctrl+N` to start a new conversation, `Ctrl+D` to delete the current one, `q` to quit.
- Workflow Visualization: When the agent enters "Agentic Mode," you will see a collapsible section showing the "Plan," "Execution," and "Tool Results" in real time.
You don't need to explicitly invoke tools. Just ask in natural language:
- "What files are in the src directory?" -> The Agent automatically calls `fs_ls`.
- "Check my git status and tell me what's changed." -> The Agent calls `git status`.
- "Fetch the content of rust-lang.org." -> The Agent calls `web_fetch`.
- "Database Connection Failed": Check whether your MySQL user has sufficient permissions. The app tries to auto-migrate the schema, but it needs `CREATE TABLE` privileges.
- "LLM returns nonsense": Ensure you are using a model capable of tool calling (e.g., Mistral-Instruct or Llama 3). Smaller models may struggle to generate valid JSON plans.
- "Connection Refused": Check that your local LLM server (e.g., LM Studio) is running and that the port in `.env` matches.
To ensure full reproducibility, we have strictly defined our dependencies and build process.
- OS: Tested on Ubuntu 22.04 LTS and macOS Sonoma 14.x. Windows is supported but requires a properly configured terminal (Windows Terminal) for TUI rendering.
- Rust: Version 1.75.0 or later.
- Database: MySQL 8.0 or MariaDB 10.5.
- LLM: Any server compliant with the OpenAI Chat Completions API. We verified reproducibility using Ollama v0.1.30 running `mistral:latest` and LM Studio running `qwen2.5-7b-instruct`.
- Clone Source:
  ```bash
  git clone <repo_url>
  cd ece1724Rust
  ```
- Setup Database:
  ```bash
  mysql -u root -p -e "CREATE DATABASE IF NOT EXISTS llm_cli; CREATE USER IF NOT EXISTS 'llmuser'@'%' IDENTIFIED BY 'llmpassword'; GRANT ALL ON llm_cli.* TO 'llmuser'@'%'; FLUSH PRIVILEGES;"
  ```
- Compile: We use `cargo` for reproducible builds. The `Cargo.lock` file ensures you get the exact same dependency versions we used.
  ```bash
  cargo build --release
  ```
  Note: The first build pulls crates like `tokio` and `sqlx`, which may take 2-5 minutes depending on network speed.
- Verification: Run the integration test suite. We included a specific test that mocks an agent workflow.
  ```bash
  cargo test
  ```
  If all tests pass, the system is correctly set up.
Our team adopted a modular development strategy. We defined the Agent and Tool traits early on, allowing us to work in parallel on different components without stepping on each other's toes.
Primary Roles: Database layer, agent planner, MCP server wiring.
- Database Architecture: Designed the MySQL schema (`conversations`, `messages`) and implemented the persistence layer using `sqlx` with connection pooling.
- Planner Logic: Implemented the prompt-engineering strategy that forces the LLM to output valid JSON plans. He wrote the robust parsing logic that handles malformed LLM responses and wired the planner into the agent loop.
- MCP Server: Implemented the `mcp_server.rs` module, creating a lightweight HTTP endpoint that exposes local tools to external clients.
Primary Roles: Tool registry, executor, LLM client integration.
- Executor Engine: Built the DAG execution logic. He handled the dependency resolution (ensuring steps wait for prerequisites) and error propagation (short-circuiting dependent steps on failure).
- Tool Registry: Implemented the dynamic dispatch system for tools and the base set of executors (`Shell`, `Git`, `Filesystem`). He also implemented the "Deny List" safety mechanism.
- LLM Client: Developed the async HTTP client (`src/websocket`) to handle OpenAI-compatible chat completions, implementing the streaming response parser and health checks.
Primary Roles: Ratatui UI/UX, verifier, CLI/TUI glue.
- TUI Implementation: Built the `Ratatui` interface (`src/tui`), handling complex layouts (conversation list, message pane, status bar) and async user-input events.
- Verification Layer: Implemented the `Verifier` module to ensure all steps succeed before the agent responds.
- Integration: Acted as the glue between the backend agent pipeline and the frontend. He ensured that tool execution results and streaming tokens were correctly rendered in the UI without blocking the main thread.
- The "Async" Learning Curve: Coming from Python/JS, we initially underestimated the complexity of async Rust. We struggled with "future cannot be sent between threads safely" errors. We learned that in Rust, data shared across `.await` points must be `Send` and `Sync`. This forced us to restructure our application state to be more thread-safe, using `Arc` and Tokio channels (`mpsc`) effectively.
- Structured Concurrency: We learned that spawning threads (or tasks) blindly is dangerous. We adopted patterns to manage groups of tasks (e.g., executing parallel tool calls) and ensure they are all cleaned up properly on exit.
- LLM Reliability: We found that letting the agent auto-fill tool parameters from natural language hints significantly reduced planner brittleness compared to expecting perfect JSON every time. We also learned to guard schema creation and seeding to keep startup reliable on fresh databases.
This project demonstrated that Rust is an exceptional choice for building AI agents. The strict compiler saved us countless times from concurrency bugs that would have plagued a Python or Node.js implementation. The resulting binary is snappy, instant-on, and memory-efficient.
We successfully achieved our goal of building a "System 2" CLI agent. It's not just a chatbot; it's a tool that respects the developer's environment. By exposing a local tool registry via a small HTTP endpoint, we've also opened the door for future integrations with other AI clients, making our tool a versatile part of the developer toolkit.