BotSmith

A Governed Multi-Agent System for Building Bots from Natural Language

BotSmith is a modular, production-oriented multi-agent framework that converts natural language requests into planned, validated, governed, and executable workflows. It is designed to demonstrate how autonomous agents can be built safely, with strong separation of concerns, deterministic execution, and full observability.

🎥 See It In Action

▶️ Watch Full Demo (YouTube) - See BotSmith orchestrate a multi-agent workflow from natural language to executable code.

What's demonstrated:

Natural language parsing & intent extraction
Multi-agent orchestration (Router → Planner → Validator → Executor)
Governance gates (cost estimation, security scanning)
Real-time workflow visualization
Memory persistence & session management

Quick Start

git clone https://github.com/blexyyyyy/botsmith.git
cd botsmith

# Set environment
export PYTHONPATH="$(pwd)"

# Run integration tests
pytest tests/integration

# Run a sample workflow
python main.py

BotSmith is designed to be deterministic, observable, and safe by default.

Key Capabilities

Natural Language Interface

Parses user intent from plain English with confidence and ambiguity handling.
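One way to picture confidence and ambiguity handling is a small decision function over the parsed result. This is a minimal sketch: `ParsedIntent`, `resolve`, and the 0.7 threshold are hypothetical names and values chosen for illustration, not taken from the BotSmith codebase.

```python
from dataclasses import dataclass

# Hypothetical threshold; this README does not state the real value.
CONFIDENCE_THRESHOLD = 0.7


@dataclass(frozen=True)
class ParsedIntent:
    """Illustrative result of parsing one user request."""
    intent: str
    confidence: float
    alternatives: tuple[str, ...] = ()  # competing readings, if any


def resolve(parsed: ParsedIntent) -> str:
    """Return 'proceed', 'clarify', or 'reject' for a parsed intent."""
    if not 0.0 <= parsed.confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if parsed.alternatives:
        # More than one plausible reading: ask the user instead of guessing.
        return "clarify"
    if parsed.confidence < CONFIDENCE_THRESHOLD:
        return "reject"
    return "proceed"
```

Returning a plain decision rather than acting on it keeps the parser free of side effects, so a downstream router can decide what each outcome means.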

Planner–Compiler–Executor Architecture

Deterministically translates intent into a plan, the plan into a workflow, and the workflow into execution.

Hybrid Agent System

Logic-first agents for planning, validation, routing, execution
LLM-assisted agents where language reasoning is useful
Model-agnostic design via LLM abstraction

Governance Built In

Validation gates
Cost estimation
Security scanning
Workflow optimization
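The gates listed above can be sketched as functions that each return a list of violations, where an empty list means the workflow may run. The `Step` shape, the budget, and the blocked fragments are assumptions made for this example only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """Hypothetical workflow step: a name, an estimated cost, and a command."""
    name: str
    estimated_cost: float
    command: str


# Illustrative values only; not BotSmith's real limits or rules.
COST_BUDGET = 1.0
BLOCKED_FRAGMENTS = ("rm -rf", "sudo ", "curl | sh")


def cost_gate(steps: list[Step]) -> list[str]:
    """Return a violation if the summed estimate exceeds the budget."""
    total = sum(s.estimated_cost for s in steps)
    if total > COST_BUDGET:
        return [f"estimated cost {total:.2f} exceeds budget {COST_BUDGET:.2f}"]
    return []


def security_gate(steps: list[Step]) -> list[str]:
    """Return one violation per blocked fragment found in a step's command."""
    return [
        f"step '{s.name}' contains blocked fragment '{frag}'"
        for s in steps
        for frag in BLOCKED_FRAGMENTS
        if frag in s.command
    ]


def run_gates(steps: list[Step]) -> list[str]:
    """Run every gate; an empty result means the workflow may execute."""
    violations: list[str] = []
    for gate in (cost_gate, security_gate):
        violations.extend(gate(steps))
    return violations
```

Collecting all violations, rather than stopping at the first, gives a complete report in one pass. Substring matching is only a placeholder for a real security scan.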

[NEW] Dynamic Tool Generation Engine

Automatically plans, builds, and validates safe, immutable, and test-verified custom tools during the build phase.

End-to-End Tested

Full integration tests covering the complete lifecycle, including the dynamic tool pipeline.

3-Layer Memory System (State + Preferences + Persistence)

Execution Context: Ephemeral, step-level state for tools and reasoning.
Session Memory: Short-term, workflow-scoped coordination across agents.
Long-Term Memory: Persistent, policy-gated storage for user preferences and project knowledge.

High-Level Architecture

Note

For a deep dive into our design philosophy, compiler-style lifecycle, and safety invariants, see architecture.md.

graph TD
    A[Natural Language Input] --> B[NLP Interpreter]
    B --> C[Router Agent]
    C --> D[Planner Agent]
    D --> E[Validator Agent]
    
    subgraph "Dynamic Tool Pipeline (v1.1)"
        E --> T1[Tool Plan Agent]
        T1 --> T2[Tool Builder Agent]
        T2 --> T3[Tool Validator Agent]
    end
    
    T3 --> F[Workflow Compiler]
    F --> G[Optimizer / Cost / Security]
    G --> H[Workflow Executor]
    H --> I[Memory Manager]
    I --> J[(SQLite Persistence)]

Each stage is explicit, testable, and replaceable.

Core Design Principles

SOLID and Clean Architecture
Dependency Inversion (interfaces over implementations)
Deterministic execution
No blind trust in LLMs
Governed autonomy over raw autonomy

Agent Types

Core Logic Agents

RouterAgent: selects the appropriate workflow
PlannerAgent: generates a structured execution plan
ValidatorAgent: validates plans and invariants
WorkflowCompilerAgent: compiles plans into executable workflows
WorkflowExecutor: executes workflows step by step
CostEstimatorAgent: estimates and gates execution cost
SecurityAgent: blocks unsafe operations
WorkflowOptimizerAgent: reorders and deduplicates steps
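As an illustration of the last agent in this list, reordering and deduplication can be done deterministically with a seen-set and a stable sort. The `Step` fields and the `priority` convention below are hypothetical; the actual optimizer rules are not described in this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """Hypothetical plan step."""
    action: str
    target: str
    priority: int = 0  # assumption: lower values run first


def optimize(steps: list[Step]) -> list[Step]:
    """Drop repeated (action, target) pairs, then order by priority."""
    seen: set[tuple[str, str]] = set()
    unique: list[Step] = []
    for step in steps:
        key = (step.action, step.target)
        if key not in seen:
            seen.add(key)
            unique.append(step)
    # sorted() is stable, so steps with equal priority keep their plan order.
    return sorted(unique, key=lambda s: s.priority)
```

Because the function has no randomness and no hidden state, the same plan always yields the same workflow, which is what makes the result testable.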

Tool Generation Agents (NEW)

ToolPlanAgent: plans tool specifications from requirements
ToolBuilderAgent: generates isolated Python tool code and unit tests
ToolValidatorAgent: validates generated tools via test execution and AST scanning
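The AST scanning step can be approximated with Python's standard `ast` module. The sketch below assumes simple deny-lists for imports and calls; `scan_tool_source` and both lists are hypothetical and are not the rules ToolValidatorAgent actually applies.

```python
import ast

# Hypothetical deny-lists; the README does not specify the real rules.
BLOCKED_IMPORTS = {"os", "subprocess", "socket"}
BLOCKED_CALLS = {"eval", "exec", "__import__"}


def scan_tool_source(source: str) -> list[str]:
    """Return findings for blocked imports and calls; empty means clean."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]

    findings: list[str] = []
    for node in ast.walk(tree):
        # Collect the top-level package name of every import statement.
        if isinstance(node, ast.Import):
            names = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [(node.module or "").split(".")[0]]
        else:
            names = []
        findings += [f"blocked import: {n}" for n in names if n in BLOCKED_IMPORTS]

        # Flag direct calls to blocked builtins such as eval(...).
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in BLOCKED_CALLS
        ):
            findings.append(f"blocked call: {node.func.id}")
    return findings
```

A static scan like this catches obvious cases only. It does not replace running the generated tests, and it is not a sandbox.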

NLP Agent

NLPInterpreterAgent: LLM-assisted semantic parsing, schema validation, and confidence and ambiguity handling

LLM Support

Gated Memory: Multi-layer scoped storage with policy-enforced writes.
Local inference via Ollama
Cloud-ready design (Groq, Gemini, OpenAI supported via abstraction)
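Model-agnostic design usually means that agents depend on a narrow interface instead of a vendor SDK. The following sketch assumes a hypothetical `LLMClient` protocol with a single `complete` method; BotSmith's real abstraction may look different, and the two backends are stand-ins that make no network calls.

```python
from typing import Protocol


class LLMClient(Protocol):
    """Hypothetical interface; the real abstraction may differ."""

    def complete(self, prompt: str) -> str: ...


class EchoLLM:
    """Stand-in backend; a real one would call Ollama or a cloud API."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


class CannedLLM:
    """Stand-in backend that always returns a fixed reply."""

    def __init__(self, reply: str) -> None:
        self._reply = reply

    def complete(self, prompt: str) -> str:
        return self._reply


def summarize(llm: LLMClient, text: str) -> str:
    """Depends only on the interface, so any backend can be swapped in."""
    return llm.complete(f"Summarize: {text}")
```

Swapping providers then becomes a wiring decision, and tests can use a canned backend to keep agent behavior reproducible.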

Example End-to-End Flow

User: "Build a Python weather bot"

NLP extracts intent
Router selects bot_creation_workflow
Planner generates steps
Validator enforces correctness
Compiler builds workflow
Cost and security checks pass
Executor runs steps
Execution persisted to database
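The flow above can be sketched as an ordered list of stage functions that pass a context along. Every stage body here is a stub and every key name is an assumption; only the stage order and the workflow name `bot_creation_workflow` come from this README. Persistence is omitted for brevity.

```python
from typing import Any, Callable

Context = dict[str, Any]  # shared state handed from stage to stage
Stage = Callable[[Context], Context]


def interpret(ctx: Context) -> Context:
    return {**ctx, "intent": "create_bot", "language": "python"}


def route(ctx: Context) -> Context:
    return {**ctx, "workflow": "bot_creation_workflow"}


def plan(ctx: Context) -> Context:
    return {**ctx, "steps": ["scaffold_project", "write_bot", "write_tests"]}


def validate(ctx: Context) -> Context:
    if not ctx["steps"]:
        raise ValueError("plan has no steps")
    return ctx


def compile_workflow(ctx: Context) -> Context:
    return {**ctx, "compiled": list(enumerate(ctx["steps"]))}


def govern(ctx: Context) -> Context:
    # Stand-in for cost and security checks: cap the number of steps.
    if len(ctx["compiled"]) > 10:
        raise PermissionError("workflow exceeds step budget")
    return {**ctx, "approved": True}


def execute(ctx: Context) -> Context:
    if not ctx.get("approved"):
        raise PermissionError("workflow was not approved")
    return {**ctx, "results": [f"ran {name}" for _, name in ctx["compiled"]]}


PIPELINE: list[tuple[str, Stage]] = [
    ("nlp", interpret), ("router", route), ("planner", plan),
    ("validator", validate), ("compiler", compile_workflow),
    ("governance", govern), ("executor", execute),
]


def run(request: str) -> Context:
    """Run every stage in order and record which stages completed."""
    ctx: Context = {"request": request, "trace": []}
    for name, stage in PIPELINE:
        ctx = {**stage(ctx), "trace": ctx["trace"] + [name]}
    return ctx
```

Calling `run("Build a Python weather bot")` returns a context whose `trace` lists the stages in order, which is a simple form of the observability the project aims for.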

Running Tests

Create and activate a virtual environment

python -m venv .venv
source .venv/bin/activate
# Windows: .venv\Scripts\activate

Install dependencies

pip install -r requirements.txt

Run integration tests

pytest tests/integration

Run full end-to-end test

pytest tests/integration/test_end_to_end_creation.py

Current Status

3-Layer Memory System: Execution, Session, and Long-Term Persistent memory implemented.
Hybrid Multi-Agent Core: Fully implemented and end-to-end tested.
Workflow Governance: Cost estimation, security scanning, and optimization gates active.
[UPGRADE] Dynamic Tool Generation: Fully integrated Plan-Build-Validate engine for custom tools.
[UPGRADE] Dependency Injection Architecture: Refactored AgentFactory as a production-grade Composition Root for better testability and decoupled agent logic.
API & UI: FastAPI implementation and React/Vite visualization (botsmith-ui) in progress.
Workflow Persistence: SQLite-backed audit trails and session history functional.
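For readers unfamiliar with the Composition Root pattern mentioned in the status list, here is a minimal sketch. The class names `AgentFactory` and `PlannerAgent` appear in this README, but the methods, fields, and the `Store` interface shown are hypothetical.

```python
from dataclasses import dataclass
from typing import Protocol


class Store(Protocol):
    """Hypothetical storage interface that agents depend on."""

    def save(self, key: str, value: str) -> None: ...


class InMemoryStore:
    """Test double; a production store might be SQLite-backed."""

    def __init__(self) -> None:
        self.data: dict[str, str] = {}

    def save(self, key: str, value: str) -> None:
        self.data[key] = value


@dataclass
class PlannerAgent:
    store: Store  # injected, never constructed inside the agent

    def plan(self, intent: str) -> list[str]:
        steps = [f"{intent}:scaffold", f"{intent}:implement"]
        self.store.save(intent, ",".join(steps))
        return steps


class AgentFactory:
    """Composition root: the only place concrete dependencies are chosen."""

    def __init__(self, store: Store) -> None:
        self._store = store

    def create_planner(self) -> PlannerAgent:
        return PlannerAgent(store=self._store)
```

Because agents never construct their own dependencies, a test can pass an in-memory store to the factory and inspect exactly what the agent wrote.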

Planned

Dynamic Agent Synthesis: Auto-generation of specialized agents based on task complexity.
Human-in-the-Loop: Interactive control gates for high-risk operations.
Adaptive Execution: Real-time workflow adjustment based on tool feedback.
Advanced visualization: Enhanced pipeline and agent state monitoring.

Architecture & Verification

BotSmith follows a strictly decoupled, local-first architecture designed for stability and auditability.

🛠 Refined Structure

Core Abstractions: Foundational interfaces remain in botsmith/core/.
Concrete Packages: Driver-level logic is promoted to botsmith/llm/, botsmith/memory/, and botsmith/utils/.
Standardized Imports: All imports use absolute paths for reliable module resolution.

Advanced Memory Contract

Policy-Gated: Agents only propose state changes; MemoryManager enforces MemoryPolicy.
Multi-Layer Persistence:
- EXECUTION: Ephemeral step state.
- SESSION: Workflow coordination.
- PROJECT/USER: SQLite-backed long-term storage (verified to survive restarts).
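The contract above can be sketched as follows: agents call `propose`, and the manager either applies or refuses the write. `MemoryManager` and `MemoryPolicy` are named in this README, but the method names, the `Scope` enum, and the table layout are assumptions made for this example.

```python
import sqlite3
from enum import Enum
from typing import Optional


class Scope(Enum):
    EXECUTION = "execution"
    SESSION = "session"
    PROJECT = "project"
    USER = "user"


class MemoryPolicy:
    """Hypothetical policy: which agents may write to which scopes."""

    def __init__(self, allowed: dict[str, set[Scope]]) -> None:
        self._allowed = allowed

    def permits(self, agent: str, scope: Scope) -> bool:
        return scope in self._allowed.get(agent, set())


class MemoryManager:
    def __init__(self, policy: MemoryPolicy, db_path: str = ":memory:") -> None:
        self._policy = policy
        # Ephemeral layers live in process memory only.
        self._volatile: dict[Scope, dict[str, str]] = {
            Scope.EXECUTION: {}, Scope.SESSION: {},
        }
        # Long-term layers go to SQLite (in-memory here; use a file to persist).
        self._db = sqlite3.connect(db_path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS memory "
            "(scope TEXT, key TEXT, value TEXT, PRIMARY KEY (scope, key))"
        )

    def propose(self, agent: str, scope: Scope, key: str, value: str) -> bool:
        """Apply the write only if the policy permits it."""
        if not self._policy.permits(agent, scope):
            return False
        if scope in self._volatile:
            self._volatile[scope][key] = value
        else:
            with self._db:  # commits on success
                self._db.execute(
                    "INSERT OR REPLACE INTO memory VALUES (?, ?, ?)",
                    (scope.value, key, value),
                )
        return True

    def read(self, scope: Scope, key: str) -> Optional[str]:
        if scope in self._volatile:
            return self._volatile[scope].get(key)
        row = self._db.execute(
            "SELECT value FROM memory WHERE scope = ? AND key = ?",
            (scope.value, key),
        ).fetchone()
        return row[0] if row else None
```

Passing a file path as `db_path` instead of `":memory:"` would let PROJECT and USER entries survive a restart, while EXECUTION and SESSION state is discarded with the process.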

Verification Baseline

All core systems are verified via automated integration suites:

Persistence: test_agent_memory_persistence.py confirms that interaction logs are written to disk.
Workflow: test_workflow_execution.py validates the full Factory -> Executor pipeline.
Governance: test_governance_agents.py verifies cost/security gates.

Codebase frozen at v1.1.0-tool-generation-upgrade

Project Structure

botsmith/
├── core/               # Interfaces, base classes, utilities
├── agents/             # Specialized agents
├── workflows/          # Workflow compiler and executor
├── nlp/                # NLP parsing and intent normalization
├── persistence/        # SQLite persistence layer
├── factory/            # Agent and workflow factories
├── tests/              # Unit, integration, end-to-end tests
├── botsmith-ui/        # React + Vite + Tailwind CSS Frontend
└── main.py

Why This Project Exists

Most AI agent projects focus on prompting. BotSmith focuses on systems design.

The goal is to demonstrate how autonomous systems can be structured, governed, tested, and safely extended.

This repository is intended as a portfolio-grade systems project, not a product demo.

License

MIT
