HKUDS/FastAgent

FastAgent: Simple, Fast, and Strong LLM Agents

🎯 FastAgent's Mission

FastAgent is designed to tackle complex tasks that require both DeepResearch and Computer Use capabilities. While DeepResearch excels at web search, knowledge summarization, and reasoning, and Computer Use focuses on operating complex software applications, many real-world tasks demand both capabilities, such as:

📊 Business Intelligence Reports:

  • Research industry trends and competitor insights across multiple data sources.
  • Generate automated PowerPoint reports and dynamic dashboards with real-time analytics and visual insights.

📅 Event Planning & Management:

  • Research and compare venues and vendors based on cost and requirements.
  • Create Excel budgets with cost tracking and manage project schedules.

🛒 Smart Shopping & Price Optimization:

  • Compare products, reviews, and prices across e-commerce platforms.
  • Automate shopping, apply discounts, and track delivery schedules.

FastAgent bridges this gap by seamlessly integrating intelligent research capabilities with sophisticated computer operation, enabling users to complete end-to-end workflows that span from information gathering and analysis to practical software manipulation, all within a Unified, Simple, and Fast framework.


💡 Current Challenges in Multi-Agent Systems

Current agent frameworks face significant challenges when tackling complex, multi-step, real-world tasks:

⚑ Performance Bottlenecks:

  • Slow & Unreliable GUI Operations. Complex tasks that humans complete in dozens of steps require hundreds of observe-decide-execute cycles. GUI grounding is particularly slow, error-prone, and frequently freezes when operating real-world interfaces.
  • Limited Task Scope. Current approaches constrain agents to narrow, predefined use cases rather than generalizing across diverse scenarios.

❌ High Failure Rates:

  • Error Accumulation. Multi-step workflows accumulate cascading errors across execution stages.
  • Fragile Cross-Application Transitions. Transitions between heterogeneous software and data sources often cause complete task breakdowns.

🧩 Complex Context Management:

  • Overwhelming Multi-Context Load. Multi-faceted tasks generate overwhelming knowledge and tool contexts beyond current systems' capacity.
  • Lack of Unified Processing. Systems cannot handle heterogeneous contexts from DeepResearch and Computer Use paradigms, or dynamically integrate diverse user-provided tools, severely limiting extensibility.

🔒 Limited Orchestration & Adaptability:

  • Lack of Unified Coordination. Current frameworks lack generalized coordination mechanisms and cannot dynamically adjust to varying task requirements. Each workflow type demands different coordination patterns.
  • Manual Workflow Design. Existing systems require complex, task-specific workflow engineering for each scenario, making generalized multi-agent orchestration extremely difficult.

🚀 FastAgent's Key Innovations

FastAgent addresses these critical challenges through three Efficient and Effective solutions focusing on Memory Mechanism, Tool Integration, and Multi-Agent Coordination:

🧠 Advanced Memory & Context Management:

Solving Complex Context Handling Challenges

  • Adaptive Multi-Tier Memory: Maintains step-, agent-, task-, and response-level stores and exposes only the granularity each reasoning hop needs, which guards against knowledge dilution in long workflows.
  • Intelligent Context Switching: Bridges the heterogeneous contexts of DeepResearch and Computer Use by auto-classifying outputs, so agents can switch between data mining and GUI actions without losing earlier findings.
  • Smart Compression with Dynamic Budgeting: An LLM-powered summarizer compresses context according to its purpose: it keeps structured data and key decisions for planning, preserves execution details for validation, and prunes repetitive operations. Compression triggers when a token threshold is reached, so large multi-faceted contexts stay within budget.
  • Incremental Cross-Task Knowledge: Promotes vector embeddings and structured findings to a shared pool that is updated on every tool call, so later tasks can query prior findings instead of recomputing them.
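
The threshold-triggered compression described above can be illustrated with a minimal sketch. It is not FastAgent's summarizer: `estimate_tokens`, `compress_if_needed`, the whitespace token count, and the `keep_recent` parameter are all hypothetical names and simplifications chosen for illustration, and the `summarize` callable stands in for an LLM call.

```python
from typing import Callable, List


def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: count whitespace-separated words.
    return len(text.split())


def compress_if_needed(
    messages: List[str],
    budget: int,
    summarize: Callable[[List[str]], str],
    keep_recent: int = 2,
) -> List[str]:
    """Summarize older messages only when the total exceeds the token budget."""
    total = sum(estimate_tokens(m) for m in messages)
    if total <= budget or len(messages) <= keep_recent:
        return messages  # under budget, or nothing old enough to compress
    split = len(messages) - keep_recent
    older, recent = messages[:split], messages[split:]
    # One summary replaces all older messages; recent ones stay verbatim.
    return [summarize(older)] + recent


if __name__ == "__main__":
    history = ["a b c d", "e f g h", "i j", "k l"]
    fake_llm = lambda ms: f"summary of {len(ms)} messages"
    print(compress_if_needed(history, budget=5, summarize=fake_llm))
```

Keeping the most recent messages verbatim is one possible policy; the purpose-aware behavior described in the list would need a summarizer that knows whether the consumer is planning or validating.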

🔧 Effortless Tool Integration with Smart Orchestration:

Eliminating Integration Difficulties & Performance Bottlenecks

  • Smart Tool RAG System: Precisely retrieves relevant tools from hundreds of available options across all backends (Shell, GUI, MCP, Web), enabling efficient tool selection and reducing context overhead
  • Unified Backend Architecture: Provides consistent interface for Shell, GUI, MCP, and Web. Dynamically integrates diverse user-provided tools without manual adaptation, solving extensibility challenges through generalized tool abstraction and lifecycle management
  • Zero-Config MCP Support: Plug-and-play MCP server integration without complex setup. Just declare servers in config and FastAgent handles connections, protocol negotiation, and session pooling

🎯 Dynamic Multi-Agent Coordination

Overcoming Limited Orchestration & High Failure Rates

  • Event-Driven Kanban Architecture: Provides generalized coordination mechanisms with rule-based workflow routing that dynamically adjusts to varying task requirements. Enables seamless agent communication through shared task states, eliminating manual workflow engineering for every use case
  • Selective Quality Assurance: Dedicated EvalAgent activates only when needed (e.g., critical operations, error-prone backends) rather than every step, preventing error accumulation while maintaining efficiency. Ensures reliable transitions between software applications and data sources


📊 System Overview

FastAgent Framework

FastAgent employs a simple and fast multi-agent event-driven architecture that seamlessly coordinates research and execution capabilities through five core components:

🏗️ Core Architecture

1. 🤝 Multi-Agent Coordination

Dynamic orchestration with intelligent task management

Specialized Agents:

  • HostAgent: High-level planning and task decomposition with dependency tracking
  • GroundingAgent: Cross-backend execution (Shell, GUI, MCP, Web) with smart tool selection
  • EvalAgent: Selective quality assurance and automatic replanning

Event-Driven Workflow:

  • State Management: TODO → IN_PROGRESS → DONE/BLOCKED lifecycle
  • Dependency Tracking: Automatic execution ordering based on task dependencies
  • Rule-Based Routing: Dynamic agent triggering based on task type and state
  • Failure Recovery: Automatic replanning when tasks fail validation
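
The lifecycle and dependency ordering listed above can be sketched as a small state machine. This is an illustration under stated assumptions, not the contents of `kanban/`: the `Status`, `Card`, and `Board` names are hypothetical, and the sketch treats DONE and BLOCKED as terminal, so the replanning path is not modeled.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Iterable, List


class Status(Enum):
    TODO = "TODO"
    IN_PROGRESS = "IN_PROGRESS"
    DONE = "DONE"
    BLOCKED = "BLOCKED"


# Allowed transitions for the TODO -> IN_PROGRESS -> DONE/BLOCKED lifecycle.
ALLOWED = {
    Status.TODO: {Status.IN_PROGRESS},
    Status.IN_PROGRESS: {Status.DONE, Status.BLOCKED},
    Status.DONE: set(),
    Status.BLOCKED: set(),
}


@dataclass
class Card:
    name: str
    depends_on: List[str] = field(default_factory=list)
    status: Status = Status.TODO


class Board:
    def __init__(self, cards: Iterable[Card]) -> None:
        self.cards: Dict[str, Card] = {c.name: c for c in cards}

    def ready(self) -> List[str]:
        """Names of TODO cards whose dependencies are all DONE."""
        return [
            c.name
            for c in self.cards.values()
            if c.status is Status.TODO
            and all(self.cards[d].status is Status.DONE for d in c.depends_on)
        ]

    def move(self, name: str, new_status: Status) -> None:
        card = self.cards[name]
        if new_status not in ALLOWED[card.status]:
            raise ValueError(f"{name}: {card.status.value} -> {new_status.value} not allowed")
        card.status = new_status


if __name__ == "__main__":
    board = Board([Card("research"), Card("report", depends_on=["research"])])
    board.move("research", Status.IN_PROGRESS)
    board.move("research", Status.DONE)
    print(board.ready())
```

A rule-based router could poll `ready()` and hand each returned card to the agent that matches its type and state.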

2. 🔧 Unified Tool Integration

Seamless plug-and-play ecosystem with 100+ tools

  • Multi-Backend Architecture: Unified interface across Shell, GUI, MCP, and Web
  • Smart Tool RAG: Semantic search retrieves relevant tools from hundreds of options
  • Zero-Config MCP: Add servers to the config file; FastAgent handles discovery and routing
  • Session Pooling: Efficient resource management and connection reuse
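
To make the retrieval idea behind Smart Tool RAG concrete, here is a toy sketch of ranking tools by similarity to a query. It is an assumption-laden illustration: the real system is described as using semantic embeddings, whereas this sketch substitutes bag-of-words vectors so it runs with the standard library only. The function names and the tool names in the example are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    # Bag-of-words stand-in for a learned embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k_tools(query: str, tools: Dict[str, str], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k tool names whose descriptions are most similar to the query."""
    q = embed(query)
    scored = [(name, cosine(q, embed(desc))) for name, desc in tools.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    catalog = {
        "shell.run": "run a shell command",
        "web.search": "search the web for pages",
        "ppt.create": "create a powerpoint slide deck",
    }
    print(top_k_tools("search the web", catalog, k=1))
```

Only the top-ranked tools would then be placed in the agent's prompt, which is how retrieval reduces context overhead when hundreds of tools are registered.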

3. 🧠 Advanced Memory & Context

Intelligent compression and cross-phase awareness

  • Multi-Level Storage: Task context, agent history, and processed results
  • Smart Compression: Automatically manages overwhelming multi-faceted contexts
  • Context Switching: Seamless transitions between research and operation phases
  • Historical Awareness: Agents access and learn from previous execution results

4. 🔍 Intelligent Search

Multi-source information retrieval and aggregation

  • Web Integration: Built-in search and browsing capabilities
  • Tool Discovery: Semantic matching across all available tools and MCP servers
  • Knowledge Retrieval: Context-aware search through execution history
  • Source Fusion: Combines web, local files, APIs, and databases

5. 🛡️ Security & Control

Enterprise-grade safety with comprehensive audit

  • Command Filtering: Whitelist/blacklist with pattern matching
  • User Approval: Optional confirmation for sensitive operations
  • Access Control: Granular permissions per backend and operation
  • Complete Audit: Full logging with trajectory recording and video capture

🎯 Quick Start

1. Environment Setup

# Clone repository
git clone https://github.com/HKUDS/FastAgent.git
cd FastAgent

# Create and activate conda environment
conda create -n fastagent python=3.12 -y
conda activate fastagent

# Install dependencies
pip install -r requirements.txt

Note

Create a .env file and add your API keys (refer to fastagent/.env.example).

2. Launch FastAgent

Start Local Server (Required for Computer Control)

The local server is a lightweight Flask service that enables FastAgent to interact with your computer (GUI automation, Python/Bash execution, file operations, screen capture, etc.).

Note

See fastagent/local_server/README.md for complete API documentation and advanced configuration.

Important

Platform-specific setup required: Different operating systems need different dependencies for desktop control. Please install the required dependencies for your OS before starting the local server:

macOS Setup
# Install macOS-specific dependencies
pip install pyobjc-core pyobjc-framework-cocoa pyobjc-framework-quartz atomacos

Permissions Required: macOS will automatically prompt for permissions when you first run the local server. Grant the following:

  • Accessibility (for GUI control)
  • Screen Recording (for screenshots and video capture)

If prompts don't appear, manually grant permissions in System Settings → Privacy & Security.

Linux Setup
# Install Linux-specific dependencies
pip install python-xlib pyatspi numpy

# Install system packages
sudo apt install at-spi2-core python3-tk scrot

Windows Setup
# Install Windows-specific dependencies
pip install pywinauto pywin32 PyGetWindow

After installing the platform-specific dependencies, start the local server:

python -m fastagent.local_server.main

Tip

Local server is required for GUI automation and Python/Bash execution. Without it, only MCP servers and web research capabilities are available.

Start FastAgent

Then, launch the FastAgent main process in another terminal:

python -m fastagent

3. Execute Any Task You Want 🤗

Simply type your task in natural language. FastAgent seamlessly combines DeepResearch and Computer Use to handle complex end-to-end workflows:

Tip

MCP Server Configuration: For tasks requiring specific tools, add relevant MCP servers to fastagent/config/config_mcp.json. Unsure which servers to add? Add all potentially useful ones; FastAgent's Smart Tool RAG will automatically select the appropriate tools for your task. See MCP Configuration for details.

Simple Example:

>>> Please help me search HKUDS on Google.

Complex Example - AI Coding Assistants Competitive Analysis

>>> Create a competitive analysis report for AI coding assistants.

Research these 3 products: GitHub Copilot, Cursor, Claude Code.

For each product, find and verify:
1. Supported programming languages and IDE integrations
2. Pricing tiers (in USD) and token limits
3. Key differentiating features (refactoring, test generation, chat capabilities)
4. Security and privacy guarantees

Then create an Excel workbook named "AI_Coding_Assistants_Analysis.xlsx" with:
- Sheet 1: Feature comparison matrix (products as rows, capabilities as columns)
- Sheet 2: Pricing breakdown with cost analysis

Also create an "executive_summary.md" file highlighting the best choice and why.

🏗️ Code Structure

📖 Quick Overview

Legend: ⚡ Core modules | 🔧 Supporting modules

FastAgent/
├── fastagent/
│   ├── __init__.py                       # Package exports
│   ├── __main__.py                       # CLI entry point
│   ├── fastagent.py                      # Main FastAgent class
│   │
│   ├── ⚡ agents/                         # Multi-Agent System
│   ├── ⚡ workflow/                       # Event-Driven Workflow
│   ├── ⚡ kanban/                         # Task Management System
│   ├── ⚡ grounding/                      # Unified Backend System
│   │   ├── core/                         # Core abstractions
│   │   └── backends/                     # Backend implementations
│   │       ├── shell/                    # Shell command execution
│   │       ├── gui/                      # Anthropic Computer Use
│   │       ├── mcp/                      # Model Context Protocol
│   │       └── web/                      # Web search & browsing
│   │
│   ├── ⚡ memory/                         # Memory & Storage
│   ├── 🔧 llm/                           # LLM Integration
│   ├── 🔧 config/                        # Configuration System
│   ├── 🔧 local_server/                  # GUI Backend Server
│   ├── 🔧 recording/                     # Execution Recording
│   ├── 🔧 platform/                      # Platform Integration
│   └── 🔧 utils/                         # Utilities
│
├── .fastagent/                           # Runtime cache
│   └── embedding_cache/                  # Tool embeddings for Smart Tool RAG
│
├── logs/                                 # Execution logs and recordings
│   ├── __main__/                         # Application logs
│   └── recordings/                       # Complete execution audit trail
│
├── requirements.txt                      # Python dependencies
└── README.md

📂 Detailed Module Structure

⚡ agents/ - Multi-Agent System (Core Architecture)
agents/
├── __init__.py
├── base.py                         # Base agent class with common functionality
├── coordinator.py                  # Agent coordination & resource management
├── host_agent.py                   # High-level planning and task decomposition
├── grounding_agent.py              # Cross-backend task execution
├── eval_agent.py                   # Automatic evaluation and quality assurance
├── content_processor.py            # Intelligent content processing
└── agent_data_manager.py           # Agent data storage and retrieval

Key Responsibilities: Task planning, execution, evaluation, and inter-agent coordination.

⚑ workflow/ - Event-Driven Workflow Engine
workflow/
├── __init__.py
├── engine.py                       # Event-driven workflow orchestration
├── rules.py                        # Workflow rule definitions and routing
└── context_manager.py              # Cross-execution context management

Key Responsibilities: Event processing, rule-based routing, state transitions, and context preservation.

⚑ kanban/ - Task Management System
kanban/
├── __init__.py
├── kanban.py                       # Task board and state management
└── enums.py                        # Task types, states, and events

Key Responsibilities: Task lifecycle management, dependency tracking, and state transitions (TODO → IN_PROGRESS → DONE/BLOCKED).

⚑ grounding/ - Unified Backend System (Core Integration Layer)

Core Abstractions

grounding/core/
├── grounding_client.py             # Unified interface across all backends
├── provider.py                     # Abstract provider base class
├── session.py                      # Session lifecycle management
├── search_tools.py                 # Smart Tool RAG for semantic search
├── exceptions.py                   # Custom exception definitions
├── types.py                        # Shared type definitions
│
├── tool/                           # Tool abstraction layer
│   ├── base.py                     # Tool base class
│   ├── local_tool.py               # Local tool implementation
│   └── remote_tool.py              # Remote tool implementation
│
├── security/                       # Security & sandboxing 🔧
│   ├── policies.py                 # Security policy enforcement
│   ├── sandbox.py                  # Sandbox abstraction
│   └── e2b_sandbox.py              # E2B sandbox integration
│
├── system/                         # System-level provider
│   ├── provider.py
│   └── tool.py
│
└── transport/                      # Transport layer abstractions 🔧
    ├── connectors/
    │   ├── base.py
    │   └── aiohttp_connector.py
    └── task_managers/
        ├── base.py
        ├── async_ctx.py
        ├── aiohttp_connection_manager.py
        └── placeholder.py

Backend Implementations

Shell Backend - Command execution via local server
backends/shell/
├── provider.py                     # Shell provider implementation
├── session.py                      # Shell session management
└── transport/
    └── connector.py                # HTTP connector to local server

GUI Backend - Anthropic Computer Use integration
backends/gui/
├── provider.py                     # GUI provider implementation
├── session.py                      # GUI session management
├── tool.py                         # GUI-specific tools
├── anthropic_client.py             # Anthropic API client wrapper
├── anthropic_utils.py              # Utility functions
├── config.py                       # GUI configuration
└── transport/
    ├── connector.py                # Computer Use API connector
    └── actions.py                  # Action execution logic

MCP Backend - Model Context Protocol servers
backends/mcp/
├── provider.py                     # MCP provider implementation
├── session.py                      # MCP session management
├── client.py                       # MCP client
├── config.py                       # MCP configuration loader
├── installer.py                    # MCP server installer
├── tool_converter.py               # Convert MCP tools to unified format
└── transport/
    ├── connectors/                 # Multiple transport types
    │   ├── base.py
    │   ├── stdio.py                # Standard I/O connector
    │   ├── http.py                 # HTTP connector
    │   ├── websocket.py            # WebSocket connector
    │   ├── sandbox.py              # Sandboxed connector
    │   └── utils.py
    └── task_managers/              # Protocol-specific managers
        ├── stdio.py
        ├── sse.py
        ├── streamable_http.py
        └── websocket.py

Web Backend - Search and browsing
backends/web/
├── provider.py                     # Web provider implementation
└── session.py                      # Web session management

Key Responsibilities: Unified tool abstraction, backend routing, session pooling, and Smart Tool RAG.

⚑ memory/ - Memory & Storage
memory/
├── __init__.py
├── storage_manager.py              # Unified storage interface
├── memory.py                       # Memory management and context
└── summarizer.py                   # Content compression and summarization

Key Responsibilities: Multi-level storage, context compression, and historical awareness.

🔧 llm/ - LLM Integration
llm/
├── __init__.py
└── client.py                       # LiteLLM wrapper with retry logic

🔧 config/ - Configuration System
config/
├── __init__.py
├── loader.py                       # Configuration file loader
├── constants.py                    # System constants
├── grounding.py                    # Grounding configuration dataclasses
├── utils.py                        # Configuration utilities
│
├── config_grounding.json           # Backend-specific settings
├── config_agents.json              # Agent configurations
├── config_workflow.json            # Workflow engine settings
├── config_mcp.json                 # MCP server definitions
├── config_security.json            # Security policies
└── config_dev.json.example         # Development config template

Key Configuration Files:

config_mcp.json - MCP Server Registry

{
  "mcpServers": {
    "ppt": {
      "command": "uvx",
      "args": ["--from", "office-powerpoint-mcp-server", "ppt_mcp_server"],
      "env": {}
    }
  }
}

Defines all MCP servers with connection details (command/args for stdio, url for HTTP/WebSocket). Supports environment variable substitution via ${VAR_NAME}.
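
The `${VAR_NAME}` substitution mentioned above can be sketched as a recursive walk over the parsed JSON. This is a hypothetical helper, not the loader in `config/`: `substitute_env` is an illustrative name, and leaving unset variables untouched is an assumption about the desired behavior.

```python
import re
from typing import Any, Mapping

# Matches ${VAR_NAME} where VAR_NAME is a conventional environment variable name.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_env(value: Any, env: Mapping[str, str]) -> Any:
    """Replace ${VAR_NAME} placeholders in every string of a nested config."""
    if isinstance(value, str):
        # Assumption: unknown variables are left as-is rather than raising.
        return _VAR.sub(lambda m: env.get(m.group(1), m.group(0)), value)
    if isinstance(value, dict):
        return {k: substitute_env(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [substitute_env(v, env) for v in value]
    return value


if __name__ == "__main__":
    import os

    server = {"command": "uvx", "env": {"EXAMPLE_TOKEN": "${EXAMPLE_TOKEN}"}}
    print(substitute_env(server, os.environ))
```

Passing the environment in as a mapping, rather than reading `os.environ` inside the function, keeps the helper easy to test.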

config_agents.json - Agent Definitions

{
  "agents": [
    {
      "name": "HostAgent",
      "class_name": "HostAgent",
      "backend_scope": []
    },
    {
      "name": "GroundingAgent",
      "class_name": "GroundingAgent",
      "backend_scope": ["gui", "shell", "mcp", "system", "web"],
      "max_iterations": 20
    },
    {
      "name": "EvalAgent",
      "class_name": "EvalAgent",
      "backend_scope": ["shell"]
    }
  ]
}

Configures agent roles, backend access scope, and execution limits.

config_security.json - Security Policies

{
  "security_policies": {
    "global": {
      "allow_shell_commands": true,
      "blocked_commands": {
        "common": ["rm", "-rf", "shutdown", "reboot"],
        "linux": ["mkfs", "dd", "iptables"],
        "darwin": ["diskutil", "dd", "pfctl"],
        "windows": ["del", "format", "rd"]
      },
      "sandbox_enabled": false
    },
    "backend": {
      "shell": {
        "blocked_commands": { /* platform-specific */ }
      },
      "mcp": {
        "sandbox_enabled": true
      }
    }
  }
}

Defines command whitelists/blacklists, sandbox settings, and access control per backend.

config_dev.json.example - Development Override Template

{
  "shell": {
    "working_dir": "/path/to/your/workspace"
  },
  "debug": true,
  "log_level": "DEBUG"
}

Optional development overrides (copy to config_dev.json and customize). Merged with base config for local development.

🔧 local_server/ - GUI Backend Server
local_server/
├── __init__.py
├── main.py                         # Flask application entry point
├── config.json                     # Server configuration
├── feature_checker.py              # Platform feature detection
├── health_checker.py               # Server health monitoring
├── platform_adapters/              # OS-specific implementations
│   ├── macos_adapter.py            # macOS automation (atomacos, pyobjc)
│   ├── linux_adapter.py            # Linux automation (pyatspi, xlib)
│   ├── windows_adapter.py          # Windows automation (pywinauto)
│   └── pyxcursor.py                # Custom cursor handling
├── utils/
│   ├── accessibility.py            # Accessibility tree utilities
│   └── screenshot.py               # Screenshot capture
└── README.md

Purpose: Lightweight Flask service enabling computer control (GUI, Shell, Files, Screen capture).

🔧 recording/ - Execution Recording & Analysis
recording/
├── __init__.py
├── recorder.py                     # Main recording manager
├── manager.py                      # Recording lifecycle management
├── action_recorder.py              # Action-level logging
├── kanban_recorder.py              # Kanban state recording
├── video.py                        # Video capture integration
├── viewer.py                       # Trajectory viewer and analyzer
└── utils.py                        # Recording utilities

Purpose: Full execution audit with trajectory recording and video capture.

🔧 platform/ - Platform Integration
platform/
├── __init__.py
├── config.py                       # Platform-specific configuration
├── recording.py                    # Recording integration
├── screenshot.py                   # Screenshot utilities
└── system_info.py                  # System information gathering

🔧 utils/ - Shared Utilities
utils/
├── logging.py                      # Structured logging system
├── ui.py                           # Terminal UI components
├── display.py                      # Display formatting utilities
├── cli_display.py                  # CLI-specific display
├── ui_integration.py               # UI integration helpers
└── telemetry/                      # Usage analytics (opt-in)
    ├── __init__.py
    ├── events.py
    ├── telemetry.py
    └── utils.py

📊 logs/ - Execution Logs & Recordings
logs/
├── __main__/                              # Main application logs
│   └── fastagent_YYYY-MM-DD_HH-MM-SS.log  # Timestamped log files
│
└── recordings/                            # Execution recordings
    └── init_YYYYMMDD_HHMMSS_YYYYMMDD_HHMMSS/  # Individual recording session
        ├── metadata.json                  # Session metadata (task, timestamps, config)
        ├── traj.jsonl                     # Complete execution trajectory
        ├── agent_actions.jsonl            # Agent action history
        ├── kanban_events.jsonl            # Kanban state changes
        ├── screenshots/                   # Visual execution record
        │   ├── init.png                   # Initial screenshot
        │   ├── step_001.png               # Screenshot after step 1
        │   ├── step_002.png               # Screenshot after step 2
        │   └── ...                        # Sequential screenshots
        ├── workspace/                     # Task workspace
        │   └── [generated files]          # Files created during execution
        └── screen_recording.mp4           # Video recording (if enabled)

Log Files Structure:

  • Main Logs (__main__/): Detailed application logs with initialization, configuration, agent coordination, and execution details
  • Recording Sessions (recordings/): Complete audit trail for each execution with:
    • metadata.json: Task description, start/end times, success status, configuration
    • traj.jsonl: Line-delimited JSON with full execution trajectory (tool calls, results, timestamps)
    • agent_actions.jsonl: Agent-level actions (planning, grounding, evaluation steps)
    • kanban_events.jsonl: Task state transitions (TODO → IN_PROGRESS → DONE/BLOCKED)
    • screenshots/: Visual snapshots at each execution step
    • workspace/: All files created/modified during task execution
    • screen_recording.mp4: Full video capture (optional)
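
Because the trajectory and event files are line-delimited JSON, they can be read one record at a time with the standard library. The sketch below is a generic JSONL reader, not part of FastAgent's `recording/viewer.py`; `iter_jsonl` is a hypothetical name, the session path in the usage example is a placeholder, and no particular record fields are assumed.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator


def iter_jsonl(path: Path) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON object per non-empty line of a .jsonl file."""
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank lines between records
                yield json.loads(line)


if __name__ == "__main__":
    # Placeholder path: substitute a real session directory under logs/recordings/.
    session = Path("logs/recordings/SESSION_DIR")
    traj = session / "traj.jsonl"
    if traj.exists():
        for record in iter_jsonl(traj):
            print(sorted(record.keys()))
```

Streaming line by line avoids loading a long trajectory into memory at once.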

🔧 Advanced Usage

CLI Arguments

# Use different model
python -m fastagent --model "anthropic/claude-sonnet-4-5"

# Single query mode
python -m fastagent --query "Your task"

# Debug mode
python -m fastagent --log-level DEBUG

# Fast mode (disable evaluation)
python -m fastagent --no-eval

Tip

See Evaluation Control for detailed evaluation configuration options.

All Options

| Argument | Description |
| --- | --- |
| --model | LLM model name |
| --query | Single query mode |
| --timeout | Max execution time (seconds) |
| --log-level | DEBUG/INFO/WARNING/ERROR |
| --no-eval | Disable evaluation |
| --no-workflow | Disable workflow |
| --max-iterations | Max iterations |
| --no-ui | Disable UI |

Configuration Overview

FastAgent uses a layered configuration system:

  • config_dev.json (highest priority): Local development overrides. Overrides all other configurations.
  • config_agents.json: Agent definitions and backend access control
  • config_mcp.json: MCP server registry
  • config_grounding.json: Backend-specific settings
  • config_workflow.json: Workflow engine and execution control
  • config_security.json: Security policies with runtime user confirmation for sensitive operations

Agent Configuration

Path: fastagent/config/config_agents.json

Purpose: Define agent roles, control backend access scope, and set execution limits to prevent infinite loops.

Example configuration:

{
  "agents": [
    {
      "name": "HostAgent",
      "class_name": "HostAgent",
      "backend_scope": []
    },
    {
      "name": "GroundingAgent",
      "class_name": "GroundingAgent",
      "backend_scope": ["gui", "shell", "mcp", "system", "web"],
      "max_iterations": 20
    },
    {
      "name": "EvalAgent",
      "class_name": "EvalAgent",
      "backend_scope": ["shell"]
    }
  ]
}

Key Fields:

| Field | Description | Options/Example |
| --- | --- | --- |
| name | Agent identifier | "HostAgent", "GroundingAgent", "EvalAgent" |
| backend_scope | Accessible backends | [] (none) or any combination of ["gui", "shell", "mcp", "system", "web"] |
| max_iterations | Maximum execution cycles | Any integer (e.g., 20, 50) or null (unlimited) |
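
A small validation pass can catch typos in these fields before the agents start. The sketch below is hypothetical and not part of FastAgent's config loader: `validate_agents` and `KNOWN_BACKENDS` are illustrative names, the backend list is taken from the table above, and requiring `max_iterations` to be positive is an added assumption.

```python
from typing import Any, Dict, List

# Backend names as listed for backend_scope in the table above.
KNOWN_BACKENDS = {"gui", "shell", "mcp", "system", "web"}


def validate_agents(config: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the config looks sane."""
    problems: List[str] = []
    for agent in config.get("agents", []):
        name = agent.get("name", "<unnamed>")
        unknown = set(agent.get("backend_scope", [])) - KNOWN_BACKENDS
        if unknown:
            problems.append(f"{name}: unknown backends {sorted(unknown)}")
        limit = agent.get("max_iterations")
        # null (None) means unlimited; bool is excluded because it subclasses int.
        if limit is not None and (
            isinstance(limit, bool) or not isinstance(limit, int) or limit <= 0
        ):
            problems.append(f"{name}: max_iterations must be a positive integer or null")
    return problems


if __name__ == "__main__":
    sample = {
        "agents": [
            {"name": "HostAgent", "class_name": "HostAgent", "backend_scope": []},
            {"name": "GroundingAgent", "class_name": "GroundingAgent",
             "backend_scope": ["gui", "shel"], "max_iterations": 20},
        ]
    }
    print(validate_agents(sample))
```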

MCP Configuration

Path: fastagent/config/config_mcp.json

Purpose: Register MCP servers with connection details. FastAgent automatically discovers tools from all registered servers and makes them available through Smart Tool RAG.

Example configuration:

{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "${GITHUB_TOKEN}"
      }
    }
  }
}

Other Configuration Files

Backend Configuration

Path: fastagent/config/config_grounding.json

Purpose: Configure backend-specific behaviors, timeouts, and Smart Tool RAG system for efficient tool selection.

Key Fields:

| Backend | Field | Description | Options/Default |
| --- | --- | --- | --- |
| shell | timeout | Command timeout (seconds) | Any integer (default: 60) |
| shell | conda_env | Auto-activate conda environment | Environment name or null (default: "fastagent") |
| shell | working_dir | Working directory for command execution | Any valid path (default: current directory) |
| shell | default_shell | Shell to use | "/bin/bash", "/bin/zsh", etc. |
| gui | timeout | Operation timeout (seconds) | Any integer (default: 90) |
| gui | screenshot_on_error | Capture screenshot on failure | true or false (default: true) |
| gui | driver_type | GUI automation driver | "pyautogui" or other supported drivers |
| mcp | timeout | Request timeout (seconds) | Any integer (default: 30) |
| mcp | sandbox | Run in E2B sandbox | true or false (default: false) |
| mcp | eager_sessions | Pre-connect all servers at startup | true or false (default: false, lazy connection) |
| tool_search | search_mode | Tool retrieval strategy | "semantic", "hybrid" (semantic + LLM filter), or "llm" (default: "hybrid") |
| tool_search | max_tools | Maximum tools to index | Any integer (default: 300) |
| tool_search | enable_cache_persistence | Persist embedding cache | true or false (default: true) |

Workflow Configuration

Path: fastagent/config/config_workflow.json

Purpose: Control workflow engine behavior, evaluation policies, and global execution limits. Determines when and how agents are triggered.

Key Fields:

| Section | Field | Description | Options/Default |
| --- | --- | --- | --- |
| workflow | enable | Enable event-driven workflow engine | true or false (default: true) |
| workflow | auto_evaluate | Enable EvalAgent for quality assurance | true or false (default: true) |
| workflow | poll_interval | Workflow polling interval (seconds) | Any float (default: 1.0) |
| workflow | task_default_timeout | Per-task timeout (seconds) | Any float (default: 1800) |
| execution | max_execution_time | Global timeout for entire task (seconds) | Any float or null (default: 3600) |
| execution | max_iterations | Maximum total iterations across all agents | Any integer or null for unlimited (default: null) |

Security Configuration

Path: fastagent/config/config_security.json

Purpose: Define security policies with command filtering and access control. When require_user_approval is enabled and a sensitive operation is detected, FastAgent prompts for user confirmation at runtime before executing it.

Key Fields:

| Section | Field | Description | Options |
| --- | --- | --- | --- |
| global | allow_shell_commands | Enable shell command execution | true or false (default: true) |
| global | blocked_commands | Platform-specific command blacklist | Object with common, linux, darwin, windows arrays |
| global | sandbox_enabled | Enable sandboxing for all operations | true or false (default: false) |
| global | require_user_approval | Prompt user before sensitive operations | true or false (default: false) |
| backend | shell, mcp, gui | Per-backend security overrides | Same fields as global, backend-specific |

Example blocked commands: rm -rf, shutdown, reboot, mkfs, dd, format, iptables

Behavior:

  • Blocked commands are rejected automatically
  • When require_user_approval is true, sensitive operations pause execution and prompt for user confirmation
  • Sandbox mode isolates operations in secure environments (E2B sandbox for MCP)
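
The blacklist behavior described above can be pictured with a small, self-contained sketch. This is not FastAgent's implementation: the `BLOCKED_COMMANDS` contents and the `is_blocked` helper are hypothetical, and the matching strategy (whole-token comparison) is just one reasonable choice.

```python
import platform
import shlex
from typing import Optional

# Hypothetical blacklist in the shape described above: a "common" list plus
# one list per platform. The entries are illustrative only.
BLOCKED_COMMANDS = {
    "common": ["rm -rf", "mkfs", "dd"],
    "linux": ["shutdown", "reboot", "iptables"],
    "darwin": ["shutdown", "reboot"],
    "windows": ["format", "shutdown"],
}


def is_blocked(command: str, system: Optional[str] = None) -> bool:
    """Return True if the command contains a blocked entry for this platform."""
    # platform.system() returns "Linux", "Darwin", or "Windows".
    system = (system or platform.system()).lower()
    patterns = BLOCKED_COMMANDS["common"] + BLOCKED_COMMANDS.get(system, [])
    tokens = shlex.split(command)
    for pattern in patterns:
        wanted = pattern.split()
        n = len(wanted)
        # Match the pattern as a contiguous run of whole tokens,
        # so that "dd" does not match "add".
        if any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1)):
            return True
    return False


print(is_blocked("rm -rf /tmp/build", "linux"))
print(is_blocked("git add file.txt", "linux"))
```

Token-based matching is deliberately conservative: it flags a blocked entry anywhere in the command line (for example after `sudo`), at the cost of occasional false positives.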

Development Configuration

Path: fastagent/config/config_dev.json (copy from config_dev.json.example)

Purpose: Highest-priority local development overrides. This file is git-ignored and overrides all other configuration files, which makes it suitable for testing, debugging, and personal workspace customization without affecting the repository.

Loading Priority: config_grounding.json → config_security.json → config_dev.json (config_dev.json overrides the earlier files)
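
To make the override order concrete, here is a minimal sketch of layered configuration merging. It assumes a recursive (deep) merge in which later files win; whether FastAgent merges nested sections or replaces them wholesale is not stated here, and `merge_configs` and the sample dictionaries are hypothetical.

```python
from copy import deepcopy


def merge_configs(*configs: dict) -> dict:
    """Merge dicts left to right; later values win, nested dicts merge recursively."""
    merged: dict = {}
    for config in configs:
        for key, value in config.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = merge_configs(merged[key], value)
            else:
                merged[key] = deepcopy(value)
    return merged


# Hypothetical contents standing in for the three files, in loading order.
grounding = {"mcp": {"timeout": 30, "sandbox": False}}
security = {"global": {"require_user_approval": False}}
dev = {"mcp": {"timeout": 120}, "global": {"require_user_approval": True}}

effective = merge_configs(grounding, security, dev)
print(effective)
```

In this sketch the dev layer changes only the keys it names, so `mcp.sandbox` keeps its earlier value while `mcp.timeout` is overridden.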


Evaluation Control

from fastagent import FastAgentConfig, EvaluationConfig

# Disable evaluation (fastest)
config = FastAgentConfig(
    auto_evaluate=False
)

# Evaluate all steps
config = FastAgentConfig(
    auto_evaluate=True,
    evaluation_config=EvaluationConfig.all()
)

# Evaluate last step only
config = FastAgentConfig(
    auto_evaluate=True,
    evaluation_config=EvaluationConfig.last_only()
)

# Selective evaluation (recommended)
config = FastAgentConfig(
    auto_evaluate=True,
    evaluation_config=EvaluationConfig.selective(
        backends=["gui", "mcp"],  # Only eval these backends
        always_eval_last=True     # Always eval final step
    )
)
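
The selective policy can be read as a simple per-step decision. The sketch below is an interpretation of the two parameters shown above, not the library's code; `should_evaluate` is a hypothetical helper.

```python
def should_evaluate(
    backend: str,
    is_last_step: bool,
    backends: list[str],
    always_eval_last: bool = True,
) -> bool:
    """Decide whether a step is evaluated under a selective policy (assumed semantics)."""
    # The final step is evaluated regardless of backend when requested.
    if always_eval_last and is_last_step:
        return True
    # Otherwise only steps on the listed backends are evaluated.
    return backend in backends


selected = ["gui", "mcp"]
print(should_evaluate("gui", False, selected))
print(should_evaluate("shell", False, selected))
print(should_evaluate("shell", True, selected))
```

Under this reading, shell steps skip evaluation unless they are the last step of the task, which keeps the cost of evaluation focused on the less predictable backends.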

Custom Workflow Rules

Add custom rules to control agent triggers and workflow behavior:

Basic Rule:

from fastagent import FastAgent, FastAgentConfig, WorkflowRule, CardType, CardStatus

# Create custom rule
rule = WorkflowRule(
    name="custom_rule",
    card_type=CardType.EXECUTION,
    card_status=CardStatus.TODO,
    agent_name="GroundingAgent",
    priority=100,  # Higher priority = executed first
)

# Add to workflow (run this inside an async function, e.g. via asyncio.run)
config = FastAgentConfig()
agent = FastAgent(config)
await agent.initialize()
agent.workflow_engine.add_rule(rule)

Conditional Rule:

# Reuses the imports and the initialized `agent` from the Basic Rule example
def check_backend(card):
    """Only trigger for MCP backend tasks."""
    return card.metadata.get("backend") == "mcp"

rule = WorkflowRule(
    name="mcp_only_rule",
    card_type=CardType.EXECUTION,
    card_status=CardStatus.TODO,
    agent_name="GroundingAgent",
    condition=check_backend,  # Custom condition function
    timeout=300.0
)
agent.workflow_engine.add_rule(rule)
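
One plausible reading of how priority and condition interact is sketched below with stand-in classes. `Card`, `Rule`, and `select_rule` are hypothetical and do not mirror the real workflow engine; the sketch assumes a single winning rule per card, which the engine may or may not do.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Card:
    """Hypothetical stand-in for a workflow card."""
    card_type: str
    status: str
    metadata: dict


@dataclass
class Rule:
    """Hypothetical stand-in mirroring the WorkflowRule fields used above."""
    name: str
    card_type: str
    card_status: str
    agent_name: str
    priority: int = 0
    condition: Optional[Callable[[Card], bool]] = None


def select_rule(card: Card, rules: list[Rule]) -> Optional[Rule]:
    """Return the highest-priority rule whose type, status, and condition match."""
    candidates = [
        r for r in rules
        if r.card_type == card.card_type
        and r.card_status == card.status
        and (r.condition is None or r.condition(card))
    ]
    return max(candidates, key=lambda r: r.priority, default=None)


rules = [
    Rule("default", "execution", "todo", "GroundingAgent", priority=10),
    Rule("mcp_only", "execution", "todo", "GroundingAgent", priority=100,
         condition=lambda c: c.metadata.get("backend") == "mcp"),
]
mcp_card = Card("execution", "todo", {"backend": "mcp"})
gui_card = Card("execution", "todo", {"backend": "gui"})
print(select_rule(mcp_card, rules).name)
print(select_rule(gui_card, rules).name)
```

Here the conditional rule wins for MCP cards because of its higher priority, while cards from other backends fall through to the unconditional rule.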

Python API

Basic Usage & Batch Processing

Single Task:

import asyncio
from fastagent import FastAgent, FastAgentConfig

async def main():
    config = FastAgentConfig(
        llm_model="anthropic/claude-sonnet-4-5",
        max_execution_time=1800.0
    )
    
    agent = FastAgent(config)
    await agent.initialize()
    
    result = await agent.run("Your task")
    print(result["user_response"])
    
    await agent.cleanup()

asyncio.run(main())

Batch Processing:

import asyncio
from fastagent import FastAgent, FastAgentConfig

async def batch_process(tasks: list[str]):
    """Process multiple tasks in the same session"""
    agent = FastAgent(FastAgentConfig())
    await agent.initialize()  # Create sessions once
    
    results = []
    for i, task in enumerate(tasks, 1):
        print(f"\nProcessing task {i}/{len(tasks)}: {task}")
        result = await agent.run(task)
        results.append(result)
    
    await agent.cleanup()  # Clean up sessions once
    return results

# Example usage
tasks = ["Task 1", "Task 2", "Task 3"]
results = asyncio.run(batch_process(tasks))

💡 Note: Tasks in batch processing share the same sessions (GUI, Shell, MCP servers, etc.) and are executed sequentially. This is efficient for related tasks that need to maintain context across executions.
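
Because the sessions are shared, a failure in one task should not leave them open. The following sketch shows one way to guarantee cleanup with try/finally. It uses a hypothetical `StubAgent` so it runs without FastAgent installed, and the `"error"` key in the result is an assumption made for illustration.

```python
import asyncio


class StubAgent:
    """Hypothetical stand-in exposing the same three methods used above."""

    def __init__(self):
        self.cleaned_up = False

    async def initialize(self):
        pass

    async def run(self, task: str) -> dict:
        if "fail" in task:
            raise RuntimeError(f"could not complete: {task}")
        return {"user_response": f"done: {task}"}

    async def cleanup(self):
        self.cleaned_up = True


async def batch_process_safely(agent, tasks: list[str]) -> list[dict]:
    """Run tasks sequentially; record failures and always release sessions."""
    await agent.initialize()
    results = []
    try:
        for task in tasks:
            try:
                results.append(await agent.run(task))
            except Exception as exc:  # keep going after a failed task
                results.append({"user_response": None, "error": str(exc)})
    finally:
        await agent.cleanup()  # runs even if the loop is interrupted
    return results


stub = StubAgent()
results = asyncio.run(
    batch_process_safely(stub, ["Task 1", "fail Task 2", "Task 3"])
)
print(results)
```

The same wrapper should work with a real agent object, provided it exposes `initialize`, `run`, and `cleanup` as coroutines in the way the examples above use them.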


🔗 Related Projects

FastAgent builds upon excellent open-source projects. We sincerely thank their authors and contributors:

  • OSWorld: Comprehensive benchmark for evaluating computer-use agents across diverse operating system tasks, providing standardized evaluation environments and metrics.
  • mcp-use: Platform that simplifies MCP agent development with client SDKs, hosted gateway for routing/authentication, and single-endpoint aggregation of multiple MCP servers.

🌟 If this project helps you, please give us a Star!

🤖 Experience AI's full potential in unified research, analysis, and computer automation!


❤️ Thanks for visiting ✨ FastAgent!
