100rabhkr/AgentGuard

AgentGuard

PyPI License: MIT

Deadlock prevention for multi-AI-agent systems -- beyond the textbook.

AgentGuard is a C++17 library with first-class Python bindings that started as a clean implementation of the Banker's Algorithm (Dijkstra, 1965) for preventing deadlocks when multiple AI agents compete for shared resources. Use it from C++ or pip install agentguard-ai into your Python project -- with native LangGraph integration. But classical Banker's has real gaps when applied to AI agents. We identified three, and built solutions for each:

  1. Agents get stuck with no one noticing. Infinite loops are a recurring complaint across LangGraph, CrewAI, and AutoGen. The usual remedy is a step counter (kill after N steps), which knows nothing about whether the agent is actually making progress. AgentGuard's Progress Monitor detects stuck agents using progress invariants and auto-releases their resources.

  2. Authority deadlocks are invisible. Agent A delegates to B, B delegates to C, C delegates back to A -- everyone is politely waiting, no resource is held, and no existing tool detects it. AgentGuard's Delegation Tracker maintains a directed graph of active delegations and detects cycles in real time.

  3. AI agents don't know their max resource needs. Banker's requires every process to declare its maximum needs upfront. That's fine for OS processes, but an LLM agent has no idea how many API calls it will make. AgentGuard's Demand Estimator learns resource patterns at runtime and runs a probabilistic Banker's Algorithm -- no upfront declarations needed.

The core guarantee still holds: before granting any resource request, AgentGuard checks whether doing so would leave the system in a safe state. If granting would risk deadlock, the request is queued or denied. This works with agents joining and leaving at runtime, concurrent multi-threaded access, and multiple resource types requested atomically.

The Problem

When multiple AI agents operate concurrently, they share limited resources:

| Resource | Example |
|---|---|
| API rate limits | 60 OpenAI requests/minute shared across 5 agents |
| Tool access | A code interpreter that only 1 agent can use at a time |
| Token budgets | 100K tokens/minute pooled across agents |
| Database connections | 10 connections shared by 20 agents |
| GPU compute | 4 GPU slots shared by 8 training jobs |

Without coordination, agents deadlock. Agent A holds the code interpreter and waits for the browser. Agent B holds the browser and waits for the code interpreter. Neither can proceed.

AgentGuard eliminates this class of failure.

How It Works

AgentGuard adapts the Banker's Algorithm from operating systems theory:

  1. Each agent declares its maximum resource needs when it registers -- or, in adaptive mode, the system learns them automatically from usage patterns.
  2. Before granting any request, the SafetyChecker simulates: "If I grant this, can all agents still complete?" It does this by iteratively finding agents whose remaining needs can be met with available resources, simulating their completion, and reclaiming their resources.
  3. If the resulting state is safe (a valid completion sequence exists), the request is granted immediately.
  4. If the resulting state is unsafe, the request is queued and re-evaluated whenever resources are released.
  5. Agents join and leave dynamically -- unlike classical Banker's, which assumes a fixed process set.
  6. Stuck agents are detected via progress monitoring, and their resources are auto-released to unblock others.
  7. Authority deadlock cycles (A delegates to B delegates to C delegates to A) are detected in real time and can be automatically broken.

The safety check runs in O(n^2 * m) time where n = number of agents and m = number of resource types. For typical multi-agent systems (n < 100, m < 50), this completes in microseconds.
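
The simulation described in step 2 can be sketched in a few lines of Python. This is an illustrative sketch of the textbook safety check, not AgentGuard's actual SafetyChecker code. The function name `find_safe_sequence` and the dictionary-based state layout are assumptions made for the example.

```python
from typing import Dict, List, Tuple

Resources = Dict[str, int]


def find_safe_sequence(
    available: Resources,
    allocation: Dict[str, Resources],
    max_need: Dict[str, Resources],
) -> Tuple[bool, List[str]]:
    """Return (is_safe, completion_order) for the given state (hypothetical helper)."""
    work = dict(available)        # resources free inside the simulation
    unfinished = set(allocation)  # agents not yet simulated to completion
    order: List[str] = []

    progressed = True
    while unfinished and progressed:
        progressed = False
        for agent in sorted(unfinished):  # sorted only to make the output deterministic
            held = allocation[agent]
            needs = max_need.get(agent, {})
            # Remaining need = declared maximum minus what the agent already holds.
            fits = all(
                limit - held.get(res, 0) <= work.get(res, 0)
                for res, limit in needs.items()
            )
            if fits:
                # The agent can run to completion; reclaim everything it holds.
                for res, qty in held.items():
                    work[res] = work.get(res, 0) + qty
                unfinished.remove(agent)
                order.append(agent)
                progressed = True
                break  # rescan with the enlarged pool
    return (not unfinished, order)


safe, order = find_safe_sequence(
    available={"api": 3},
    allocation={"agent_1": {"api": 4}, "agent_2": {"api": 3}},
    max_need={"agent_1": {"api": 7}, "agent_2": {"api": 9}},
)
print(safe, order)  # True ['agent_1', 'agent_2']
```

Each pass over the agents costs O(n * m) comparisons and at most n passes are needed, which is where the O(n^2 * m) bound comes from.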

What Makes AgentGuard Different

We started with a textbook Banker's Algorithm. It worked, but it wasn't enough. Real AI agent systems fail in ways that Dijkstra never anticipated.

The journey

V1: Classical Banker's. We implemented the full algorithm with dynamic agent registration, multi-resource batch requests, pluggable scheduling policies, and comprehensive monitoring. 109 tests, clean architecture. But when we looked at how multi-agent systems actually fail in production, three gaps became obvious.

Gap 1: Agents hang and nobody notices. An LLM agent enters an infinite reasoning loop, or a tool call blocks forever. The agent holds resources, other agents wait, and the whole system grinds to a halt. Every framework has this problem -- LangGraph, CrewAI, AutoGen. Their solution? Kill after N iterations. That's an iteration cap, not a safety system.

Gap 2: Authority deadlocks. Agent A says "I need B to handle this." B says "C is better suited." C says "Let me check with A." Nobody holds a resource. Nobody is blocked in the traditional sense. But the system is completely stuck. No existing tool detects this.

Gap 3: Banker's requires crystal balls. The algorithm needs every process to declare its maximum resource needs upfront. An OS process can do this (it knows it needs at most N file descriptors). An LLM agent cannot -- it has no idea how many API calls a complex research task will require. This makes classical Banker's impractical for AI.

V2: What we built.

| Feature | Solves | How |
|---|---|---|
| Progress Monitor | Gap 1: Stuck agents | Tracks named progress metrics per agent. A background thread detects stalls (no progress within a configurable threshold). On stall, resources are auto-released and events are emitted. |
| Delegation Tracker | Gap 2: Authority deadlocks | Maintains a directed graph of active delegations. On each new delegation, runs BFS cycle detection from the target back to the source. Configurable actions: notify, reject the delegation, or cancel the latest edge. |
| Demand Estimator | Gap 3: Unknown max needs | Records every resource request per agent. Computes a statistical estimate of max needs: mean + k * stddev where k comes from the desired confidence level (inverse normal CDF). Runs Banker's with these estimates instead of declared maximums. Cold-start handling included. |

All three features are opt-in (disabled by default), backward-compatible, and independently toggleable. The original 109 tests pass unchanged.

V3: Python bindings + LangGraph integration. Few teams in the Python AI ecosystem will link against a C++ library directly, so we added pybind11 bindings exposing the full API to Python (pip install agentguard-ai) and a high-level LangGraph integration layer with context managers, a @guarded_tool decorator, and drop-in GuardedToolNode/AgentGuardCallbackHandler for LangGraph and LangChain.

| Version | What | Tests |
|---|---|---|
| V1 | Classical Banker's Algorithm | 109 C++ |
| V2 | + Progress Monitor, Delegation Tracker, Adaptive Demands | 189 C++ |
| V3 | + Python bindings, LangGraph integration | 189 C++ + 96 Python |

Quick Start (Python)

pip install agentguard-ai              # from PyPI
pip install "agentguard-ai[langgraph]" # + LangGraph/LangChain integration

from agentguard.langgraph import AgentGuard, guarded_tool

# 1. Create a guard and register shared resources
guard = AgentGuard()
guard.add_resource("openai_api", capacity=60, category="api_rate_limit")
guard.add_resource("browser", capacity=2, category="tool_slot")

# 2. Register agents with their maximum resource needs
researcher = guard.register_agent("researcher", max_needs={"openai_api": 10, "browser": 1})
summarizer = guard.register_agent("summarizer", max_needs={"openai_api": 5})

# 3. Use context managers for automatic acquire/release
with guard.acquire(researcher, "openai_api", 3, timeout=5.0):
    # ... use 3 API slots, auto-released on exit ...
    pass

# 4. Or use the decorator for zero-boilerplate tool wrapping
@guarded_tool(guard, researcher, {"openai_api": 2, "browser": 1}, timeout=10.0)
def research(query: str) -> str:
    return call_openai(query)  # resources auto-acquired and released

result = research("latest AI papers")

# 5. Atomic multi-resource acquisition
with guard.acquire_batch(researcher, {"openai_api": 5, "browser": 1}):
    # ... all-or-nothing, auto-released ...
    pass

guard.stop()

LangGraph integration

from agentguard.langgraph import AgentGuard, GuardedToolNode

guard = AgentGuard()
guard.add_resource("api", 60)
agent_id = guard.register_agent("agent", max_needs={"api": 10})

# Drop-in replacement for LangGraph's ToolNode
node = GuardedToolNode(
    tools=[search_tool, calculator_tool],
    guard=guard,
    agent_id=agent_id,
    tool_resources={"search_tool": {"api": 2}, "calculator_tool": {}},
)
# graph.add_node("tools", node)

LangChain callback handler

from agentguard.langgraph import AgentGuard, AgentGuardCallbackHandler

guard = AgentGuard()
guard.add_resource("api", 60)
agent_id = guard.register_agent("agent", max_needs={"api": 10})

handler = AgentGuardCallbackHandler(
    guard=guard,
    agent_id=agent_id,
    tool_resources={"search": {"api": 1}},
)
# Pass handler to any LangChain agent/chain

Quick Start (C++)

#include <agentguard/agentguard.hpp>
#include <iostream>

using namespace agentguard;
using namespace std::chrono_literals;

int main() {
    // 1. Create the resource manager
    ResourceManager manager;

    // 2. Register shared resources
    manager.register_resource(Resource(1, "API-Slots", ResourceCategory::ApiRateLimit, 10));
    manager.register_resource(Resource(2, "Tool-Access", ResourceCategory::ToolSlot, 2));

    // 3. Register agents with their maximum resource needs
    Agent agent_a(0, "ResearchAgent", PRIORITY_HIGH);
    agent_a.declare_max_need(1, 5);   // needs at most 5 API slots
    agent_a.declare_max_need(2, 1);   // needs at most 1 tool slot
    AgentId id_a = manager.register_agent(agent_a);

    Agent agent_b(0, "SummaryAgent", PRIORITY_NORMAL);
    agent_b.declare_max_need(1, 4);   // needs at most 4 API slots
    agent_b.declare_max_need(2, 1);   // needs at most 1 tool slot
    AgentId id_b = manager.register_agent(agent_b);

    // 4. Start the background processor
    manager.start();

    // 5. Request resources -- the Banker's Algorithm ensures safety
    auto status = manager.request_resources(id_a, 1, 3, 5s);  // 3 API slots
    if (status == RequestStatus::Granted) {
        // ... do work ...
        manager.release_resources(id_a, 1, 3);
    }

    // 6. Atomic multi-resource request (all-or-nothing)
    auto batch_status = manager.request_resources_batch(id_b,
        {{1, 2}, {2, 1}},   // 2 API slots AND 1 tool slot
        5s);

    if (batch_status == RequestStatus::Granted) {
        // ... do work with both resources ...
        manager.release_all_resources(id_b);
    }

    manager.stop();
    return 0;
}

Installation & Building

Python (recommended)

pip install agentguard-ai              # from PyPI (prebuilt)
pip install "agentguard-ai[langgraph]" # + LangGraph/LangChain integration

Or from source:

pip install .                          # core library
pip install ".[langgraph]"             # + LangGraph/LangChain integration
pip install ".[dev]"                   # + pytest for development

Requires Python 3.9+, a C++17 compiler, and CMake 3.16+. The build happens automatically via scikit-build-core and pybind11.

C++ Prerequisites

  • C++17 compiler (GCC 7+, Clang 5+, AppleClang 10+, MSVC 19.14+)
  • CMake 3.16+
  • (Tests only) Internet connection for GoogleTest download via FetchContent

C++ Build

mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
cmake --build . --parallel

Build Options

| Option | Default | Description |
|---|---|---|
| AGENTGUARD_BUILD_TESTS | ON | Build unit and integration tests |
| AGENTGUARD_BUILD_EXAMPLES | ON | Build example programs |
| AGENTGUARD_BUILD_PYTHON | OFF | Build Python bindings (auto-enabled by pip install) |
| AGENTGUARD_BUILD_BENCHMARKS | OFF | Build benchmark programs |
| AGENTGUARD_ENABLE_ASAN | OFF | Enable AddressSanitizer |
| AGENTGUARD_ENABLE_TSAN | OFF | Enable ThreadSanitizer |
| AGENTGUARD_ENABLE_UBSAN | OFF | Enable UndefinedBehaviorSanitizer |
| AGENTGUARD_INSTALL | ON | Generate install targets |

Build with sanitizers (recommended for development)

cmake .. -DCMAKE_BUILD_TYPE=Debug -DAGENTGUARD_ENABLE_TSAN=ON
cmake --build . --parallel
ctest --output-on-failure

Run tests

cd build
ctest --output-on-failure

All 189 tests should pass.

Run examples

cd build
./examples/example_basic
./examples/example_llm_rate_limits
./examples/example_tool_sharing
./examples/example_priority_agents
./examples/example_adaptive_agents

Install

cmake --install . --prefix /usr/local

Python API

AgentGuard Wrapper

The AgentGuard class provides a Pythonic interface with string-based resource names, context managers, and automatic lifecycle management.

from agentguard.langgraph import AgentGuard
import agentguard as ag

# Create with auto-start (background processor starts immediately)
guard = AgentGuard()

# Or with custom config
config = ag.Config()
config.progress.enabled = True
config.delegation.enabled = True
config.adaptive.enabled = True
guard = AgentGuard(config=config)

# Register resources by name
guard.add_resource("openai_api", capacity=60, category="api_rate_limit")
guard.add_resource("browser", capacity=2, category="tool_slot")

# Register agents with string-based max needs
aid = guard.register_agent("researcher",
    priority=ag.PRIORITY_HIGH,
    max_needs={"openai_api": 10, "browser": 1},
    demand_mode=ag.DemandMode.Adaptive,  # optional
)

# Context manager: auto-acquire and release
with guard.acquire(aid, "openai_api", 3, timeout=5.0) as status:
    assert status == ag.RequestStatus.Granted
    # resources released on exit, even on exception

# Batch acquire: all-or-nothing
with guard.acquire_batch(aid, {"openai_api": 5, "browser": 1}, timeout=10.0):
    pass  # all released on exit

# Delegation tracking (aid1 and aid2 are agent IDs returned by register_agent)
guard.delegate(from_agent=aid1, to_agent=aid2, task="summarize")
guard.complete_delegation(aid1, aid2)

# Progress reporting
guard.report_progress(aid, "steps", 42)

# Queries
guard.is_safe()           # bool
guard.snapshot()          # ag.SystemSnapshot
guard.is_stalled(aid)     # bool
guard.manager             # access underlying ag.ResourceManager

# Use as context manager for automatic cleanup
with AgentGuard() as g:
    # ... use g ...
    pass  # g.stop() called automatically

Category strings: "api_rate_limit", "token_budget", "tool_slot", "memory_pool", "database_conn", "gpu_compute", "file_handle", "network_socket", "custom"

@guarded_tool Decorator

Wrap any function with automatic resource acquire/release.

from agentguard.langgraph import AgentGuard, guarded_tool

guard = AgentGuard()
guard.add_resource("api", 60)
guard.add_resource("browser", 2)  # needed by the multi-resource example below
aid = guard.register_agent("worker", max_needs={"api": 10, "browser": 1})

# Single resource
@guarded_tool(guard, aid, "api")
def call_api(prompt: str) -> str:
    return openai.chat(prompt)

# Multiple resources (uses acquire_batch)
@guarded_tool(guard, aid, {"api": 2, "browser": 1}, timeout=10.0)
def research(query: str) -> str:
    return search_and_summarize(query)

# Adaptive mode
@guarded_tool(guard, aid, "api", adaptive=True)
def flexible_call(prompt: str) -> str:
    return openai.chat(prompt)

result = call_api("hello")  # resources acquired/released automatically

GuardedToolNode (LangGraph)

Drop-in replacement for LangGraph's ToolNode that wraps each tool invocation with resource guards.

from agentguard.langgraph import AgentGuard, GuardedToolNode

guard = AgentGuard()
guard.add_resource("api", 60)
aid = guard.register_agent("agent", max_needs={"api": 10})

node = GuardedToolNode(
    tools=[search_tool, calculator_tool],
    guard=guard,
    agent_id=aid,
    tool_resources={
        "search_tool": {"api": 2},
        "calculator_tool": {},      # no resources needed
    },
    timeout=10.0,
)

# Use in a LangGraph StateGraph:
# graph.add_node("tools", node)

Requires pip install "agentguard-ai[langgraph]". Falls back to a placeholder that raises ImportError if langgraph is not installed.

AgentGuardCallbackHandler (LangChain)

LangChain callback handler that auto-instruments tool calls with resource acquisition/release.

from agentguard.langgraph import AgentGuard, AgentGuardCallbackHandler

guard = AgentGuard()
guard.add_resource("api", 60)
aid = guard.register_agent("agent", max_needs={"api": 10})

handler = AgentGuardCallbackHandler(
    guard=guard,
    agent_id=aid,
    tool_resources={"search": {"api": 1}, "calculator": {"api": 1}},
    timeout=5.0,
    report_progress=True,  # auto-reports tool_calls metric
)

# Pass to any LangChain agent or chain:
# agent.invoke(input, config={"callbacks": [handler]})

  • on_tool_start() acquires resources for the tool
  • on_tool_end() releases resources and reports progress
  • on_tool_error() releases resources (cleanup on failure)

Requires pip install "agentguard-ai[langgraph]".

Low-Level Bindings

The full C++ API is available directly via import agentguard:

import agentguard as ag
import datetime

# All C++ types are available
config = ag.Config()
manager = ag.ResourceManager(config)
manager.register_resource(ag.Resource(1, "API", ag.ResourceCategory.ApiRateLimit, 10))

agent = ag.Agent(0, "test", ag.PRIORITY_HIGH)
agent.declare_max_need(1, 5)
aid = manager.register_agent(agent)

manager.start()
status = manager.request_resources(aid, 1, 3, datetime.timedelta(seconds=5))
assert status == ag.RequestStatus.Granted
manager.release_resources(aid, 1, 3)
manager.stop()

# Enums, structs, exceptions, monitors, policies, AI types all available
# ag.RequestStatus, ag.AgentState, ag.ResourceCategory, ag.DemandMode, ...
# ag.SafetyChecker, ag.DemandEstimator, ag.MetricsMonitor, ...
# ag.FifoPolicy, ag.PriorityPolicy, ag.FairnessPolicy, ...
# ag.ai.TokenBudget, ag.ai.RateLimiter, ag.ai.ToolSlot, ag.ai.MemoryPool

# Python subclassing of Monitor and SchedulingPolicy is supported
class MyMonitor(ag.Monitor):
    def on_event(self, event):
        print(f"Event: {event.type}")
    def on_snapshot(self, snapshot):
        pass

manager.set_monitor(MyMonitor())

GIL safety: blocking C++ calls (e.g., request_resources) release the GIL so other Python threads can run. Callbacks from C++ background threads properly acquire the GIL before invoking Python code.

C++ API Reference

ResourceManager

The central coordinator. Manages agents, resources, and the Banker's Algorithm.

#include <agentguard/resource_manager.hpp>

// Construction
ResourceManager manager;                    // default config
ResourceManager manager(Config{});          // custom config

// Resource registration
manager.register_resource(Resource(1, "MyResource", ResourceCategory::Custom, 10));
manager.unregister_resource(1);                        // fails if any agent holds it
manager.adjust_resource_capacity(1, 20);               // dynamic scaling
std::optional<Resource> r = manager.get_resource(1);   // query
std::vector<Resource> all = manager.get_all_resources();

// Agent lifecycle
Agent a(0, "MyAgent", PRIORITY_HIGH);
a.declare_max_need(1, 5);
AgentId id = manager.register_agent(a);       // returns assigned ID
manager.deregister_agent(id);                 // releases all held resources
manager.update_agent_max_claim(id, 1, 3);     // reduce max claim (must be >= current alloc)
std::optional<Agent> agent = manager.get_agent(id);
std::vector<Agent> agents = manager.get_all_agents();
std::size_t count = manager.agent_count();

// Synchronous requests (blocking)
RequestStatus s = manager.request_resources(id, resource_type, quantity, timeout);
RequestStatus s = manager.request_resources_batch(id, {{rt1, qty1}, {rt2, qty2}}, timeout);

// Asynchronous requests
std::future<RequestStatus> f = manager.request_resources_async(id, rt, qty, timeout);
RequestId rid = manager.request_resources_callback(id, rt, qty, callback, timeout);

// Release
manager.release_resources(id, resource_type, quantity);
manager.release_all_resources(id, resource_type);  // release all of one type
manager.release_all_resources(id);                 // release everything

// Queries
bool safe = manager.is_safe();
SystemSnapshot snap = manager.get_snapshot();
std::size_t pending = manager.pending_request_count();

// Configuration
manager.set_scheduling_policy(std::make_unique<PriorityPolicy>());
manager.set_monitor(std::make_shared<ConsoleMonitor>());
manager.start();   // launch background queue processor thread
manager.stop();    // drain queue and stop
bool running = manager.is_running();

Request behavior

| Scenario | Behavior |
|---|---|
| Resources available, state safe | Granted immediately |
| Resources available, state unsafe, processor running | Queued, retried on each release until timeout |
| Resources available, state unsafe, processor not running | Denied immediately |
| Resources unavailable, processor running | Queued until available and safe, or timeout |
| Request exceeds agent's declared max claim | Throws MaxClaimExceededException |
| Request exceeds resource total capacity | Throws ResourceCapacityExceededException |
| Agent not found | Throws AgentNotFoundException |
| Resource type not found | Throws ResourceNotFoundException |
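
The non-throwing rows of the request-behavior table reduce to a small decision function. The sketch below is a reading aid only. `Outcome` and `decide` are hypothetical names, and the table does not say what happens when resources are unavailable and the processor is not running, so the sketch assumes the request is denied in that case.

```python
from enum import Enum


class Outcome(Enum):
    GRANTED = "granted"
    QUEUED = "queued"
    DENIED = "denied"


def decide(resources_available: bool, state_safe: bool, processor_running: bool) -> Outcome:
    """Map the table's three conditions onto an outcome (illustrative only)."""
    if resources_available and state_safe:
        return Outcome.GRANTED
    if processor_running:
        # The background processor retries the request on each release, until timeout.
        return Outcome.QUEUED
    # Assumption: with nothing running to retry the request, it is denied.
    return Outcome.DENIED


print(decide(True, False, True).value)  # queued
```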

Agent

Represents an AI agent in the system.

#include <agentguard/agent.hpp>

Agent a(0, "MyAgent", PRIORITY_HIGH);       // id=0 means "assign me one"

// Declare maximum resource needs (required before requesting)
a.declare_max_need(1, 5);                   // resource type 1, up to 5 units
a.declare_max_need(2, 3);                   // resource type 2, up to 3 units

// Priority levels
a.set_priority(PRIORITY_CRITICAL);          // PRIORITY_LOW=0, NORMAL=50, HIGH=100, CRITICAL=200

// AI-specific metadata
a.set_model_identifier("claude-opus-4-6");
a.set_task_description("Research recent papers on transformers");

// Queries
AgentId id = a.id();
const std::string& name = a.name();
Priority p = a.priority();
AgentState state = a.state();               // Registered, Active, Waiting, Releasing, Deregistered
ResourceQuantity need = a.remaining_need(1);
const auto& alloc = a.current_allocation(); // map of resource_type -> quantity held
const auto& maxn = a.max_needs();           // map of resource_type -> max declared need

Resource

Represents a shared resource type.

#include <agentguard/resource.hpp>

Resource r(1, "OpenAI-API", ResourceCategory::ApiRateLimit, 60);

// Queries
ResourceTypeId id = r.id();                 // 1
const std::string& name = r.name();         // "OpenAI-API"
ResourceCategory cat = r.category();        // ResourceCategory::ApiRateLimit
ResourceQuantity total = r.total_capacity(); // 60
ResourceQuantity alloc = r.allocated();     // 0 (initially)
ResourceQuantity avail = r.available();     // 60 (initially)

// Dynamic capacity adjustment
bool ok = r.set_total_capacity(100);        // false if new_capacity < allocated

// AI-specific metadata
r.set_replenish_interval(std::chrono::minutes(1));   // for rate-limited resources
r.set_cost_per_unit(0.003);                          // for usage accounting

Resource categories

enum class ResourceCategory {
    ApiRateLimit,    // API calls per time window
    TokenBudget,     // LLM token allocation
    ToolSlot,        // Tool access (code interpreter, browser, etc.)
    MemoryPool,      // Shared memory / context window
    DatabaseConn,    // Database connection pool
    GpuCompute,      // GPU compute units
    FileHandle,      // File system handles
    NetworkSocket,   // Network connections
    Custom           // User-defined
};

SafetyChecker

The core Banker's Algorithm implementation. Stateless and thread-safe (no internal locking needed).

#include <agentguard/safety_checker.hpp>

SafetyChecker checker;

// Build a state snapshot
SafetyCheckInput input;
input.total[1] = 10;
input.available[1] = 3;
input.allocation[agent_1][1] = 4;
input.max_need[agent_1][1] = 7;
input.allocation[agent_2][1] = 3;
input.max_need[agent_2][1] = 9;

// Core safety check
SafetyCheckResult result = checker.check_safety(input);
// result.is_safe       -- true if a safe sequence exists
// result.safe_sequence -- one valid completion order [agent_1, agent_2]
// result.reason        -- human-readable explanation

// Hypothetical: "If I grant 2 units of resource 1 to agent_1, is it still safe?"
SafetyCheckResult h = checker.check_hypothetical(input, agent_1, 1, 2);

// Batch hypothetical: "If I grant all of these simultaneously?"
std::vector<ResourceRequest> batch = { ... };
SafetyCheckResult hb = checker.check_hypothetical_batch(input, batch);

// Which of these pending requests can be safely granted?
std::vector<RequestId> grantable = checker.find_grantable_requests(input, candidates);

// Which agents are resource bottlenecks? (sorted by impact, descending)
std::vector<AgentId> bottlenecks = checker.identify_bottleneck_agents(input);
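
A hypothetical check can be thought of as "apply the grant to a copy of the state, then run the safety check on the copy". The Python sketch below shows that idea for a single resource type, using the same numbers as the snapshot above. The function names and the flat dictionary layout are assumptions for illustration and do not mirror the C++ implementation.

```python
from typing import Dict


def is_safe(available: int, allocation: Dict[str, int], max_need: Dict[str, int]) -> bool:
    """Single-resource Banker's safety check (illustrative)."""
    work = available
    unfinished = dict(allocation)
    while unfinished:
        runnable = [a for a in unfinished if max_need[a] - unfinished[a] <= work]
        if not runnable:
            return False  # no agent can finish with what is left
        work += unfinished.pop(runnable[0])  # the agent completes and returns its units
    return True


def check_hypothetical(
    available: int,
    allocation: Dict[str, int],
    max_need: Dict[str, int],
    agent: str,
    quantity: int,
) -> bool:
    """Would granting `quantity` units to `agent` leave the system safe?"""
    if quantity > available or allocation[agent] + quantity > max_need[agent]:
        return False
    trial = dict(allocation)  # copy, so the real state is never mutated
    trial[agent] += quantity
    return is_safe(available - quantity, trial, max_need)


allocation = {"agent_1": 4, "agent_2": 3}
max_need = {"agent_1": 7, "agent_2": 9}
print(check_hypothetical(3, allocation, max_need, "agent_1", 2))  # True
print(check_hypothetical(3, allocation, max_need, "agent_2", 2))  # False
```

Granting 2 units to agent_2 fails because only 1 unit would remain, which is not enough for either agent to reach its declared maximum.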

Progress Monitoring

Detect stuck agents and auto-release their resources.

// Enable in config
Config cfg;
cfg.progress.enabled = true;
cfg.progress.default_stall_threshold = std::chrono::seconds(120);
cfg.progress.check_interval = std::chrono::seconds(5);
cfg.progress.auto_release_on_stall = true;   // free resources from stuck agents

ResourceManager manager(cfg);
manager.start();

// Agents report progress as they work
manager.report_progress(agent_id, "steps_completed", 5);
manager.report_progress(agent_id, "tokens_generated", 1200);

// Per-agent stall threshold override
manager.set_agent_stall_threshold(agent_id, std::chrono::seconds(30));

// Query stall state
bool stuck = manager.is_agent_stalled(agent_id);
std::vector<AgentId> stalled = manager.get_stalled_agents();

If an agent stops reporting progress for longer than its stall threshold, AgentGuard emits AgentStalled and (if configured) releases all resources held by that agent. When the agent resumes progress, AgentStallResolved is emitted.
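
The stall rule above can be sketched as "remember the last time each agent reported progress and compare it against a threshold". The Python class below is a minimal sketch, not AgentGuard's implementation. `StallDetector` and its methods are hypothetical names, the clock is passed in explicitly to keep the example deterministic, and the sketch assumes any report counts as progress regardless of the metric value.

```python
from typing import Dict, List


class StallDetector:
    """Tracks the last progress report per agent (illustrative sketch)."""

    def __init__(self, default_threshold: float) -> None:
        self.default_threshold = default_threshold
        self._last_report: Dict[str, float] = {}
        self._thresholds: Dict[str, float] = {}

    def set_threshold(self, agent: str, seconds: float) -> None:
        # Per-agent override of the default stall threshold.
        self._thresholds[agent] = seconds

    def report_progress(self, agent: str, metric: str, value: float, now: float) -> None:
        # Assumption: every report resets the stall timer, whatever the metric says.
        self._last_report[agent] = now

    def stalled_agents(self, now: float) -> List[str]:
        return sorted(
            agent
            for agent, last in self._last_report.items()
            if now - last > self._thresholds.get(agent, self.default_threshold)
        )


detector = StallDetector(default_threshold=120)
detector.report_progress("researcher", "steps_completed", 5, now=0)
detector.report_progress("summarizer", "tokens_generated", 1200, now=100)
detector.set_threshold("summarizer", 30)
print(detector.stalled_agents(now=125))  # ['researcher']
```

In a real system a background thread would call something like `stalled_agents` every check interval and release the resources of whatever it returns.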

Delegation Tracking

Detect authority deadlock cycles where agents delegate to each other in a loop.

// Enable in config
Config cfg;
cfg.delegation.enabled = true;
cfg.delegation.cycle_action = DelegationCycleAction::RejectDelegation;
// Options: NotifyOnly, RejectDelegation, CancelLatest

ResourceManager manager(cfg);

// Report delegations as they happen
DelegationResult r = manager.report_delegation(agent_a, agent_b, "Summarize document");
// r.accepted     -- true if the delegation was added
// r.cycle_detected -- true if this delegation would create a cycle
// r.cycle_path   -- the cycle (e.g., [A, B, C, A])

// Complete or cancel delegations
manager.complete_delegation(agent_a, agent_b);
manager.cancel_delegation(agent_b, agent_c);

// Query delegation state
std::vector<DelegationInfo> all = manager.get_all_delegations();
std::optional<std::vector<AgentId>> cycle = manager.find_delegation_cycle();

| Cycle Action | Behavior |
|---|---|
| NotifyOnly | Accept the delegation, emit DelegationCycleDetected event |
| RejectDelegation | Refuse to add the edge, return accepted=false |
| CancelLatest | Add then immediately remove the edge, emit DelegationCancelled |
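
Cycle detection on a new delegation amounts to asking whether the target can already reach the source. If it can, adding the edge closes a loop. The following Python sketch uses a breadth-first search for that. `DelegationGraph` and its method names are hypothetical, and only the notify and reject behaviors are modeled.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple


class DelegationGraph:
    """Directed graph of active delegations (illustrative sketch)."""

    def __init__(self, reject_cycles: bool = True) -> None:
        self.edges: Dict[str, Set[str]] = {}
        self.reject_cycles = reject_cycles

    def _path(self, start: str, goal: str) -> Optional[List[str]]:
        """BFS along existing edges; returns the path start..goal, or None."""
        parents: Dict[str, Optional[str]] = {start: None}
        queue = deque([start])
        while queue:
            node: Optional[str] = queue.popleft()
            if node == goal:
                path: List[str] = []
                while node is not None:
                    path.append(node)
                    node = parents[node]
                return path[::-1]
            for nxt in sorted(self.edges.get(node, ())):
                if nxt not in parents:
                    parents[nxt] = node
                    queue.append(nxt)
        return None

    def report(self, source: str, target: str) -> Tuple[bool, List[str]]:
        """Try to add source -> target. Returns (accepted, cycle_path)."""
        back = self._path(target, source)  # can the target already reach the source?
        cycle = [source] + back if back is not None else []
        if cycle and self.reject_cycles:
            return False, cycle  # reject: the edge is never added
        self.edges.setdefault(source, set()).add(target)
        return True, cycle  # notify-only when a cycle was found

    def complete(self, source: str, target: str) -> None:
        self.edges.get(source, set()).discard(target)


graph = DelegationGraph()
graph.report("A", "B")
graph.report("B", "C")
print(graph.report("C", "A"))  # (False, ['C', 'A', 'B', 'C'])
```

The reported path starts at the delegating agent, so it is a rotation of the [A, B, C, A] form shown in the comments above.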

Adaptive Demands

Run Banker's Algorithm without requiring agents to declare max resource needs upfront.

// Enable in config
Config cfg;
cfg.adaptive.enabled = true;
cfg.adaptive.default_confidence_level = 0.95;   // 95th percentile estimate
cfg.adaptive.cold_start_default_demand = 3;      // assume 3 until we have data
cfg.adaptive.cold_start_headroom_factor = 1.5;   // multiply first observation by 1.5x
cfg.adaptive.adaptive_headroom_factor = 1.2;     // cap at 1.2x observed max cumulative

ResourceManager manager(cfg);

// Set agents to adaptive mode (no declare_max_need needed)
manager.set_agent_demand_mode(agent_id, DemandMode::Adaptive);

// Use adaptive resource requests
auto status = manager.request_resources_adaptive(agent_id, resource_type, 3, 5s);

// Check system safety probabilistically
ProbabilisticSafetyResult result = manager.check_safety_probabilistic(0.95);
// result.is_safe            -- safe at this confidence level?
// result.confidence_level   -- the confidence used
// result.estimated_max_needs -- what the estimator computed per agent
// result.safe_sequence      -- completion order (if safe)

// No-argument version uses config default confidence
auto result2 = manager.check_safety_probabilistic();

Three demand modes are available per agent:

| Mode | Behavior |
|---|---|
| Static | Classical Banker's: uses declare_max_need() only (default, backward-compatible) |
| Adaptive | No upfront declaration needed. Max needs estimated from usage history. |
| Hybrid | Uses the minimum of the statistical estimate and the declared max need. |
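
To make the three modes concrete, here is a rough Python sketch of a mean + k * stddev estimate, with k taken from the inverse normal CDF, and of how each mode might pick the value fed to the safety check. The function names are hypothetical. The use of population standard deviation and rounding up are assumptions, and the headroom factors from the config are not modeled.

```python
from math import ceil
from statistics import NormalDist, mean, pstdev
from typing import List


def estimate_max_need(samples: List[int], confidence: float, cold_start_default: int) -> int:
    """Estimate an agent's max need from its observed requests (illustrative)."""
    if not samples:
        return cold_start_default  # no history yet: fall back to the configured default
    k = NormalDist().inv_cdf(confidence)  # e.g. about 1.645 for 0.95
    return ceil(mean(samples) + k * pstdev(samples))


def effective_max_need(
    mode: str,
    declared: int,
    samples: List[int],
    confidence: float = 0.95,
    cold_start_default: int = 3,
) -> int:
    """Pick the max need the safety check would use for each demand mode."""
    if mode == "static":
        return declared
    estimate = estimate_max_need(samples, confidence, cold_start_default)
    if mode == "adaptive":
        return estimate
    if mode == "hybrid":
        return min(estimate, declared)
    raise ValueError(f"unknown demand mode: {mode}")


history = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, population stddev 2
print(effective_max_need("adaptive", declared=6, samples=history))  # 9
print(effective_max_need("hybrid", declared=6, samples=history))    # 6
```

Hybrid mode lets a declared maximum act as a hard ceiling while the estimate can still tighten it when observed usage is low.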

Scheduling Policies

Control the order in which queued requests are processed. All implement the SchedulingPolicy interface.

#include <agentguard/policy.hpp>

// Set on the ResourceManager
manager.set_scheduling_policy(std::make_unique<PriorityPolicy>());

| Policy | Behavior |
|---|---|
| FifoPolicy | First-come, first-served (default) |
| PriorityPolicy | Higher-priority agents served first, FIFO within same priority |
| ShortestNeedPolicy | Agents closest to finishing go first (maximizes throughput) |
| DeadlinePolicy | Requests with nearest timeout deadline go first |
| FairnessPolicy | Longest-waiting requests go first (prevents starvation) |
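
Most of these policies are just different sort keys over the pending queue. The Python sketch below illustrates three of them. `PendingRequest` and its fields are hypothetical stand-ins for the library's request type, chosen only to show the ordering logic.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PendingRequest:
    request_id: int
    priority: int        # higher means more important
    submitted_at: float  # seconds since some epoch
    deadline: float      # time at which the request times out


def fifo_order(pending: List[PendingRequest]) -> List[PendingRequest]:
    return sorted(pending, key=lambda r: r.submitted_at)


def priority_order(pending: List[PendingRequest]) -> List[PendingRequest]:
    # Higher priority first; ties fall back to submission order (FIFO).
    return sorted(pending, key=lambda r: (-r.priority, r.submitted_at))


def deadline_order(pending: List[PendingRequest]) -> List[PendingRequest]:
    return sorted(pending, key=lambda r: r.deadline)


queue = [
    PendingRequest(1, priority=50, submitted_at=0.0, deadline=30.0),
    PendingRequest(2, priority=100, submitted_at=1.0, deadline=6.0),
    PendingRequest(3, priority=100, submitted_at=2.0, deadline=4.0),
]
print([r.request_id for r in priority_order(queue)])  # [2, 3, 1]
```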

Custom policies

class MyPolicy : public SchedulingPolicy {
public:
    std::vector<ResourceRequest> prioritize(
        const std::vector<ResourceRequest>& pending,
        const SystemSnapshot& state) const override
    {
        auto result = pending;
        // ... your ordering logic ...
        return result;
    }

    std::string name() const override { return "MyPolicy"; }
};

manager.set_scheduling_policy(std::make_unique<MyPolicy>());

Monitoring

Observe every significant event in the system.

#include <agentguard/monitor.hpp>

// Console logger with verbosity levels
manager.set_monitor(
    std::make_shared<ConsoleMonitor>(ConsoleMonitor::Verbosity::Verbose));
    // Verbosity: Quiet, Normal, Verbose, Debug

// Metrics collector
auto metrics_mon = std::make_shared<MetricsMonitor>();
manager.set_monitor(metrics_mon);

// Later, query collected metrics
MetricsMonitor::Metrics m = metrics_mon->get_metrics();
// m.total_requests, m.granted_requests, m.denied_requests,
// m.timed_out_requests, m.unsafe_state_detections,
// m.resource_utilization_percent

// Threshold alerts
metrics_mon->set_utilization_alert_threshold(0.9, [](const std::string& msg) {
    std::cerr << "ALERT: " << msg << "\n";
});
metrics_mon->set_queue_size_alert_threshold(100, [](const std::string& msg) {
    std::cerr << "ALERT: " << msg << "\n";
});

// Combine multiple monitors
auto composite = std::make_shared<CompositeMonitor>();
composite->add_monitor(std::make_shared<ConsoleMonitor>());
composite->add_monitor(metrics_mon);
manager.set_monitor(composite);

Event types

// Core events
AgentRegistered, AgentDeregistered, ResourceRegistered, ResourceCapacityChanged,
RequestSubmitted, RequestGranted, RequestDenied, RequestTimedOut, RequestCancelled,
ResourcesReleased, SafetyCheckPerformed, UnsafeStateDetected, QueueSizeChanged,

// Progress monitoring
AgentProgressReported, AgentStalled, AgentStallResolved, AgentResourcesAutoReleased,

// Delegation tracking
DelegationReported, DelegationCompleted, DelegationCancelled, DelegationCycleDetected,

// Adaptive demands
DemandEstimateUpdated, ProbabilisticSafetyCheck, AdaptiveDemandModeChanged

Custom monitors

class SlackMonitor : public Monitor {
public:
    void on_event(const MonitorEvent& event) override {
        if (event.type == EventType::UnsafeStateDetected) {
            // send Slack alert
        }
    }
    void on_snapshot(const SystemSnapshot& snapshot) override {
        // post dashboard update
    }
};

AI-Specific Resource Types

Higher-level resource types with AI-relevant metadata. Each produces a Resource via .as_resource().

TokenBudget

#include <agentguard/ai/token_budget.hpp>

using namespace agentguard::ai;

TokenBudget budget(1, "GPT4-Tokens", 100000, std::chrono::minutes(1));
budget.set_input_output_ratio(0.7);   // 70% input, 30% output

manager.register_resource(budget.as_resource());

// Queries
double rate = budget.tokens_per_second_rate();   // ~1666.67
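
The numbers in the comments above follow from simple arithmetic: 100000 tokens over a 60-second window is about 1666.67 tokens per second. The sketch below reproduces that calculation. It assumes the 0.7 ratio splits the total budget into input and output shares; `tokens_per_second`, `TokenSplit`, and `split_budget` are hypothetical helpers, not part of `TokenBudget`.

```cpp
#include <chrono>
#include <cmath>

// Hypothetical helpers that mirror the arithmetic above; not the TokenBudget API.

// Average rate implied by a budget that is spread over one window.
double tokens_per_second(long long total_tokens, std::chrono::seconds window) {
    return static_cast<double>(total_tokens) / static_cast<double>(window.count());
}

struct TokenSplit {
    long long input;
    long long output;
};

// Splits a budget by input ratio (0.0 to 1.0); the output share is the remainder.
// Assumption: the ratio applies to the total budget.
TokenSplit split_budget(long long total_tokens, double input_ratio) {
    const long long input =
        std::llround(static_cast<double>(total_tokens) * input_ratio);
    return TokenSplit{input, total_tokens - input};
}
```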

RateLimiter

#include <agentguard/ai/rate_limiter.hpp>

RateLimiter limiter(2, "OpenAI-API", 60, RateLimiter::WindowType::PerMinute);
limiter.set_burst_allowance(10);                          // allow bursts up to 70
limiter.add_endpoint_sublimit("/v1/chat/completions", 40); // endpoint sub-limit

manager.register_resource(limiter.as_resource());
// Resource capacity = requests_per_window + burst_allowance = 70

ToolSlot

#include <agentguard/ai/tool_slot.hpp>

// Exclusive tool: only 1 agent at a time
ToolSlot interpreter(3, "CodeInterpreter", ToolSlot::AccessMode::Exclusive);

// Concurrent tool: up to 3 agents
ToolSlot browser(4, "WebBrowser", ToolSlot::AccessMode::Concurrent, 3);

browser.set_estimated_usage_duration(std::chrono::seconds(30));
browser.set_fallback_tool(5);   // try tool ID 5 if browser is full

manager.register_resource(interpreter.as_resource());
manager.register_resource(browser.as_resource());

MemoryPool

#include <agentguard/ai/memory_pool.hpp>

MemoryPool pool(5, "SharedContext", 1024, MemoryPool::MemoryUnit::Megabytes);
pool.set_eviction_policy("LRU");
pool.set_fragmentation_threshold(0.3);

manager.register_resource(pool.as_resource());
// MemoryUnit: Bytes, Kilobytes, Megabytes, Tokens, Entries

Configuration

#include <agentguard/config.hpp>

Config cfg;
cfg.max_agents = 1024;                                   // max concurrent agents
cfg.max_resource_types = 256;                             // max resource types
cfg.max_queue_size = 10000;                               // request queue capacity
cfg.default_request_timeout = std::chrono::seconds(30);   // default blocking timeout
cfg.processor_poll_interval = std::chrono::milliseconds(10); // queue check interval
cfg.snapshot_interval = std::chrono::seconds(5);          // monitor snapshot interval
cfg.enable_timeout_expiration = true;                     // expire queued requests
cfg.starvation_threshold = std::chrono::seconds(60);      // starvation warning
cfg.thread_safe = true;                                   // set false for single-threaded use

ResourceManager manager(cfg);

Exceptions

All exceptions inherit from AgentGuardException (which inherits from std::runtime_error).

| Exception | Thrown when |
|---|---|
| AgentNotFoundException | Operating on an unregistered agent ID |
| ResourceNotFoundException | Operating on an unregistered resource type ID |
| MaxClaimExceededException | Requesting more than the agent's declared max need |
| ResourceCapacityExceededException | Requesting more than the resource's total capacity |
| QueueFullException | Enqueueing to a full request queue |
| AgentAlreadyRegisteredException | Registering an agent with a duplicate ID |

try {
    manager.request_resources(agent_id, resource_type, quantity);
} catch (const MaxClaimExceededException& e) {
    std::cerr << e.what() << "\n";
    // "Agent 3 requested 10 of resource 1 but max claim is 5"
} catch (const AgentNotFoundException& e) {
    std::cerr << "Unknown agent: " << e.agent_id() << "\n";
}

Examples

01 -- Basic Usage

Minimal example: 2 resources, 3 agents, sequential requests with a ConsoleMonitor showing the Banker's Algorithm decisions in real time.

./examples/example_basic

02 -- LLM API Rate Limits

3 LLM agents (Researcher, Summarizer, Indexer) at different priority levels sharing OpenAI and Anthropic API rate limits. Uses RateLimiter, PriorityPolicy, and MetricsMonitor to track throughput.

./examples/example_llm_rate_limits

03 -- Tool Sharing

4 agents sharing a code interpreter (exclusive, 1 slot), web browser (concurrent, 2 slots), and filesystem (concurrent, 3 slots). Demonstrates that the Banker's Algorithm prevents the classic tool-sharing deadlock where agents hold some tools and wait for others.

./examples/example_tool_sharing

04 -- Priority Agents

4 agents at CRITICAL, HIGH, NORMAL, and LOW priority competing for a scarce token budget. Shows that PriorityPolicy serves high-priority agents first, with MetricsMonitor alerts on utilization thresholds.

./examples/example_priority_agents
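
Serving high-priority agents first can be pictured as a priority queue over pending requests. The sketch below is one way to do that with `std::priority_queue`, using arrival order as a tie-breaker so equal priorities stay FIFO. The `Priority` enum, `PendingRequest`, and `PendingQueue` are hypothetical names, and the tie-breaking rule is an assumption rather than documented `PriorityPolicy` behavior.

```cpp
#include <cstdint>
#include <queue>
#include <vector>

// Hypothetical priority levels and request record; AgentGuard's own types may differ.
enum class Priority : int { Low = 0, Normal = 1, High = 2, Critical = 3 };

struct PendingRequest {
    int agent_id;
    Priority priority;
    std::uint64_t sequence;  // arrival order, keeps FIFO order within a priority
};

// Comparator for std::priority_queue: returns true when `a` should be served after `b`.
struct ServeLater {
    bool operator()(const PendingRequest& a, const PendingRequest& b) const {
        if (a.priority != b.priority) return a.priority < b.priority;
        return a.sequence > b.sequence;
    }
};

class PendingQueue {
public:
    void push(int agent_id, Priority priority) {
        queue_.push(PendingRequest{agent_id, priority, next_sequence_++});
    }

    bool empty() const { return queue_.empty(); }

    // Removes and returns the agent to serve next. Precondition: !empty().
    int pop_next() {
        const int id = queue_.top().agent_id;
        queue_.pop();
        return id;
    }

private:
    std::priority_queue<PendingRequest, std::vector<PendingRequest>, ServeLater> queue_;
    std::uint64_t next_sequence_ = 0;
};
```

Pushing agents with Low, Critical, Normal, and Critical priority, in that order, pops the two Critical agents first in arrival order, then Normal, then Low.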

05 -- Adaptive Agents (all three novel features)

3 agents in adaptive demand mode (no declare_max_need calls) sharing API tokens and tool slots. Demonstrates all three novel features working together:

  • Adaptive demands: agents request resources without upfront max declarations; a probabilistic safety check passes at 90% confidence.
  • Delegation cycle detection: A delegates to B, B delegates to C, C tries to delegate back to A -- cycle detected and rejected.
  • Progress monitoring: Agent B stops reporting progress, is detected as stalled within 200ms, and its resources are auto-released.

./examples/example_adaptive_agents
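
The delegation scenario above (A to B, B to C, C back to A) is cycle detection on a directed graph. A minimal sketch, assuming a reject-on-cycle policy: before recording an edge `from -> to`, check whether `from` is already reachable from `to`. `DelegationGraph`, `try_delegate`, and `complete` are hypothetical names, not the `DelegationTracker` API.

```cpp
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical sketch of a delegation graph; not AgentGuard's DelegationTracker.
class DelegationGraph {
public:
    // Records "from delegates to to" unless that would close a cycle.
    // Returns false (and records nothing) when a cycle would result.
    bool try_delegate(int from, int to) {
        if (reaches(to, from)) return false;
        edges_[from].insert(to);
        return true;
    }

    // Removes a delegation once the delegate has finished.
    void complete(int from, int to) {
        auto it = edges_.find(from);
        if (it != edges_.end()) it->second.erase(to);
    }

private:
    // Iterative depth-first search: is `target` reachable from `start`?
    bool reaches(int start, int target) const {
        std::vector<int> stack{start};
        std::unordered_set<int> seen;
        while (!stack.empty()) {
            const int node = stack.back();
            stack.pop_back();
            if (node == target) return true;
            if (!seen.insert(node).second) continue;  // already visited
            auto it = edges_.find(node);
            if (it == edges_.end()) continue;
            for (int next : it->second) stack.push_back(next);
        }
        return false;
    }

    std::unordered_map<int, std::unordered_set<int>> edges_;
};
```

With agents numbered 1, 2, 3, the calls `try_delegate(1, 2)` and `try_delegate(2, 3)` succeed, while `try_delegate(3, 1)` is rejected because 3 is already reachable from 1. Self-delegation is rejected by the same check.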

Architecture

Request processing flow

Agent Thread                 ResourceManager                SafetyChecker
    |                              |                              |
    |-- request_resources() ------>|                              |
    |                              |-- acquire shared_mutex ----  |
    |                              |-- build SafetyCheckInput     |
    |                              |-- check_hypothetical() ----->|
    |                              |                              |-- Banker's Algorithm
    |                              |<-- SafetyCheckResult --------|   O(n^2 * m)
    |                              |                              |
    |                         [if safe]                           |
    |                              |-- upgrade to write lock      |
    |                              |-- update allocation matrices |
    |<-- Granted ------------------|                              |
    |                              |                              |
    |                         [if unsafe]                         |
    |                              |-- queue request              |
    |                              |-- wait on condition_variable |
    |                              |   (re-checked on release)    |
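
The SafetyChecker step in the flow above is the classical Banker's safety check applied to a hypothetical grant. The sketch below shows that textbook algorithm on plain vectors. `State`, `is_safe`, and `can_grant` are illustrative names, not AgentGuard's `SafetyCheckInput` or `check_hypothetical()`, and the library's probabilistic extensions are not modeled.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the data a safety check needs.
struct State {
    std::vector<int> available;                // free units per resource type (size m)
    std::vector<std::vector<int>> allocation;  // allocation[i][j]: units of j held by agent i
    std::vector<std::vector<int>> max_need;    // max_need[i][j]: declared maximum of j for agent i
};

// The state is safe if the agents can be ordered so that each one can obtain
// its remaining need from what is free, finish, and return what it holds.
bool is_safe(const State& s) {
    const std::size_t n = s.allocation.size();
    const std::size_t m = s.available.size();
    std::vector<int> work = s.available;
    std::vector<bool> finished(n, false);
    std::size_t done = 0;
    bool progress = true;
    while (progress) {  // at most n passes over n agents and m resources: O(n^2 * m)
        progress = false;
        for (std::size_t i = 0; i < n; ++i) {
            if (finished[i]) continue;
            bool fits = true;
            for (std::size_t j = 0; j < m; ++j) {
                if (s.max_need[i][j] - s.allocation[i][j] > work[j]) {
                    fits = false;
                    break;
                }
            }
            if (!fits) continue;
            for (std::size_t j = 0; j < m; ++j) work[j] += s.allocation[i][j];
            finished[i] = true;
            ++done;
            progress = true;
        }
    }
    return done == n;
}

// Hypothetical grant: apply the request to a copy and test the resulting state.
bool can_grant(State s, std::size_t agent, std::size_t resource, int quantity) {
    if (quantity > s.available[resource]) return false;
    if (s.allocation[agent][resource] + quantity > s.max_need[agent][resource]) return false;
    s.available[resource] -= quantity;
    s.allocation[agent][resource] += quantity;
    return is_safe(s);
}
```

As a worked case, take one resource with 3 free units and three agents holding 5, 2, and 2 units with maximum needs of 10, 4, and 9. Granting one more unit to the third agent leaves 2 free, which lets only the second agent finish, so the request is unsafe. Granting one unit to the second agent instead keeps the state safe.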

Concurrency design

  • std::shared_mutex protects the Banker's matrices. Reads (safety checks, snapshots) take shared locks. Writes (allocations, registrations) take exclusive locks. This suits read-heavy workloads, where safety checks far outnumber writes.
  • std::condition_variable_any wakes blocked request threads when resources are released.
  • Background processor thread (start()/stop()) handles callback-based async requests and timeout expiration from the RequestQueue.
  • SafetyChecker is stateless -- no internal locks, can be called concurrently.
  • Each Monitor implementation handles its own thread safety.
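
The locking pattern in the list above can be sketched with a single-resource pool: shared locks for reads, exclusive locks for writes, and a `std::condition_variable_any` that wakes blocked requesters on release. `Pool` and its members are hypothetical and far simpler than `ResourceManager`. Because `std::shared_mutex` has no atomic upgrade, this sketch takes the exclusive lock directly on the write path.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <shared_mutex>

// Hypothetical single-resource pool illustrating the locking pattern only.
class Pool {
public:
    explicit Pool(int capacity) : available_(capacity) {}

    // Read path: shared lock, so many readers may run concurrently.
    int available() const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        return available_;
    }

    // Write path: exclusive lock. Blocks until enough units are free
    // or the timeout expires; returns whether the units were taken.
    bool acquire(int quantity, std::chrono::milliseconds timeout) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        const bool ok = released_.wait_for(
            lock, timeout, [&] { return available_ >= quantity; });
        if (ok) available_ -= quantity;
        return ok;
    }

    // Write path: return units, then wake every blocked requester to re-check.
    void release(int quantity) {
        {
            std::unique_lock<std::shared_mutex> lock(mutex_);
            available_ += quantity;
        }
        released_.notify_all();
    }

private:
    mutable std::shared_mutex mutex_;
    std::condition_variable_any released_;
    int available_;
};
```

`std::condition_variable_any` is used rather than `std::condition_variable` because the latter only works with `std::unique_lock<std::mutex>`, not with a lock on a `std::shared_mutex`.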

Key design decisions

| Decision | Rationale |
|---|---|
| shared_mutex | Read-heavy workload (safety checks >> writes) |
| Stateless SafetyChecker | Testable in isolation, no coupling to ResourceManager locking |
| unordered_map for matrices | Dynamic agent count (agents join/leave at runtime) |
| Pluggable SchedulingPolicy | Different deployments need different strategies |
| AI types in ai/ subnamespace | Core stays generic; AI resources are opt-in |
| Header + source (not header-only) | Non-trivial implementation; faster downstream compiles |
| C++17 | Broad compiler support across GCC, Clang, MSVC |

Testing

285 total tests (189 C++ + 96 Python) across unit, integration, and concurrent categories.

C++ tests (189)

| Category | Tests | Coverage |
|---|---|---|
| Unit: Resource | 12 | Construction, capacity, metadata |
| Unit: Agent | 17 | Construction, max needs, allocation, metadata |
| Unit: SafetyChecker | 21 | Safe/unsafe states, hypothetical checks, batch, bottlenecks, edge cases |
| Unit: ResourceManager | 23 | Registration, requests, releases, batch, snapshots, exceptions |
| Unit: RequestQueue | 17 | Priority ordering, cancellation, timeouts, capacity |
| Unit: Policy | 10 | FIFO, Priority, Fairness, Deadline, ShortestNeed |
| Unit: ProgressTracker | 10 | Registration, stall detection/resolution, per-agent thresholds, monitor events |
| Unit: DelegationTracker | 18 | Cycles (2-node, 3-node, self), notify/reject/cancel actions, deregister cleanup |
| Unit: DemandEstimator | 22 | Statistics (mean/variance/stddev), cold start, confidence levels, rolling window, modes |
| Unit: Probabilistic Safety | 10 | Probabilistic wrappers, confidence recording, hypothetical checks, multi-resource |
| Integration: Deadlock Prevention | 4 | Dining philosophers, circular wait, incremental requests |
| Integration: Concurrent | 5 | 10-agent stress test, registration races, batch concurrency, async, high contention |
| Integration: Delegation Cycles | 6 | Cycle detection through ResourceManager, reject/cancel config, disabled no-ops |
| Integration: Adaptive Demands | 8 | Adaptive/hybrid/static modes, probabilistic safety, backward compatibility |
| Integration: Progress Monitor | 6 | Stall detection, auto-release, monitor events, multi-agent stall states |

cd build && ctest --output-on-failure
# 189/189 tests pass in ~2.7 seconds

Python tests (96)

| Category | Tests | Coverage |
|---|---|---|
| Bindings: Basic | 31 | Enums, structs, config defaults, priority constants, exception hierarchy, UsageStats |
| Bindings: Manager | 17 | Full lifecycle: register, request, release, batch, overloads, snapshots |
| Bindings: Threading | 7 | GIL release on blocking calls, concurrent agents, FutureRequestStatus, callbacks |
| Bindings: Monitors | 8 | Python Monitor subclass, ConsoleMonitor, MetricsMonitor alerts, CompositeMonitor |
| Bindings: Subsystems | 13 | SafetyChecker, DemandEstimator, progress, delegation, adaptive through bindings |
| LangGraph: Guard | 14 | AgentGuard wrapper: add_resource, register_agent, acquire, batch, delegation |
| LangGraph: Decorator | 6 | @guarded_tool single/multi resource, exception cleanup, functools.wraps |
| LangGraph: Node | 5* | GuardedToolNode construction (skipped if langgraph not installed) |

* These 5 tests are skipped when langgraph/langchain-core are not installed, and they are not counted in the 96 total above.

pip install ".[dev]" && pytest python/tests/ -v
# 96 passed, 5 skipped in ~1.7 seconds

Deadlock prevention proof tests

The integration tests construct scenarios that would deadlock without the Banker's Algorithm and verify that AgentGuard prevents them:

  • Dining Philosophers: 5 agents, 5 resources (capacity 1 each), each agent needs 2 adjacent resources. All 5 complete.
  • Circular Wait: 3 agents forming a circular resource dependency chain. All 3 complete.
  • Incremental Requests: 3 agents incrementally requesting from a shared pool. All complete via serialization.
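
To see why the Dining Philosophers scenario cannot deadlock under a Banker's-style check, consider every philosopher asking for the left fork in turn. The sketch below models that with a textbook safety check on plain matrices. `safe` and `grant_left_forks` are hypothetical helpers, not the code used in AgentGuard's integration tests.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Textbook Banker's safety check (illustrative, not AgentGuard's SafetyChecker).
bool safe(std::vector<int> work, const Matrix& alloc, const Matrix& max_need) {
    const std::size_t n = alloc.size();
    std::vector<bool> finished(n, false);
    bool progress = true;
    while (progress) {
        progress = false;
        for (std::size_t i = 0; i < n; ++i) {
            if (finished[i]) continue;
            bool fits = true;
            for (std::size_t j = 0; j < work.size(); ++j)
                if (max_need[i][j] - alloc[i][j] > work[j]) fits = false;
            if (!fits) continue;
            // Agent i can finish: it returns everything it holds.
            for (std::size_t j = 0; j < work.size(); ++j) work[j] += alloc[i][j];
            finished[i] = true;
            progress = true;
        }
    }
    for (bool f : finished)
        if (!f) return false;
    return true;
}

// Every philosopher asks for the left fork in turn; returns who was granted it.
std::vector<bool> grant_left_forks(std::size_t philosophers) {
    const std::size_t n = philosophers;
    std::vector<int> available(n, 1);  // n forks, capacity 1 each
    Matrix alloc(n, std::vector<int>(n, 0));
    Matrix max_need(n, std::vector<int>(n, 0));
    for (std::size_t i = 0; i < n; ++i) {
        max_need[i][i] = 1;            // left fork
        max_need[i][(i + 1) % n] = 1;  // right fork
    }
    std::vector<bool> granted(n, false);
    for (std::size_t i = 0; i < n; ++i) {
        // Tentatively hand fork i to philosopher i; keep it only if the state stays safe.
        available[i] -= 1;
        alloc[i][i] += 1;
        if (safe(available, alloc, max_need)) {
            granted[i] = true;
        } else {
            available[i] += 1;
            alloc[i][i] -= 1;
        }
    }
    return granted;
}
```

With 5 philosophers, the first four left-fork requests are granted and the fifth is refused, because granting it would leave every fork held and no philosopher able to pick up a second one. The refused philosopher waits until a neighbor finishes, which is how all 5 eventually complete.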

Project Structure

agentguard/
|-- CMakeLists.txt                      # Root build configuration
|-- pyproject.toml                      # Python packaging (scikit-build-core + pybind11)
|-- cmake/
|   |-- CompilerWarnings.cmake          # -Wall -Wextra -Wpedantic etc.
|   |-- Sanitizers.cmake                # ASan / TSan / UBSan support
|-- include/agentguard/
|   |-- agentguard.hpp                  # Umbrella header (includes everything)
|   |-- types.hpp                       # AgentId, ResourceTypeId, enums, structs
|   |-- exceptions.hpp                  # Exception hierarchy
|   |-- config.hpp                      # Config struct (+ ProgressConfig, DelegationConfig, AdaptiveConfig)
|   |-- resource.hpp                    # Resource class
|   |-- agent.hpp                       # Agent class
|   |-- safety_checker.hpp              # Core Banker's Algorithm + probabilistic extensions
|   |-- request_queue.hpp               # Priority queue for pending requests
|   |-- resource_manager.hpp            # Central coordinator
|   |-- monitor.hpp                     # Monitor interface + ConsoleMonitor + MetricsMonitor
|   |-- policy.hpp                      # Scheduling policies
|   |-- progress_tracker.hpp            # Stuck agent detection via progress invariants
|   |-- delegation_tracker.hpp          # Authority deadlock cycle detection
|   |-- demand_estimator.hpp            # Statistical max-need estimation
|   |-- ai/
|       |-- token_budget.hpp            # LLM token pool resource
|       |-- rate_limiter.hpp            # API rate limit resource
|       |-- tool_slot.hpp               # Tool access resource
|       |-- memory_pool.hpp             # Shared memory resource
|-- src/
|   |-- CMakeLists.txt                  # Library target
|   |-- resource.cpp, agent.cpp, safety_checker.cpp, resource_manager.cpp,
|   |-- request_queue.cpp, monitor.cpp, policy.cpp, config.cpp
|   |-- progress_tracker.cpp, delegation_tracker.cpp, demand_estimator.cpp
|   |-- ai/
|       |-- token_budget.cpp, rate_limiter.cpp, tool_slot.cpp, memory_pool.cpp
|-- python/
|   |-- CMakeLists.txt                  # pybind11 module build
|   |-- src/
|   |   |-- bind_forward.hpp            # Forward declarations for binding functions
|   |   |-- bindings.cpp                # Module entry + enums + structs + exceptions
|   |   |-- bind_core.cpp              # Resource, Agent, FutureRequestStatus, ResourceManager
|   |   |-- bind_monitors.cpp          # Monitor trampoline, ConsoleMonitor, MetricsMonitor
|   |   |-- bind_policies.cpp          # SchedulingPolicy trampoline, 5 concrete policies
|   |   |-- bind_subsystems.cpp        # SafetyChecker, DemandEstimator
|   |   |-- bind_ai.cpp               # ai submodule: TokenBudget, RateLimiter, ToolSlot, MemoryPool
|   |-- agentguard/
|   |   |-- __init__.py                # Re-exports from C extension
|   |   |-- _version.py               # __version__ = "1.0.0"
|   |   |-- langgraph/
|   |       |-- __init__.py            # Public API: AgentGuard, guarded_tool, GuardedToolNode
|   |       |-- guard.py              # High-level Pythonic wrapper (string names, context managers)
|   |       |-- decorator.py          # @guarded_tool decorator
|   |       |-- node.py              # GuardedToolNode (wraps LangGraph ToolNode)
|   |       |-- callback.py          # AgentGuardCallbackHandler (LangChain callbacks)
|   |-- tests/
|       |-- conftest.py               # Shared fixtures
|       |-- test_bindings_basic.py    # Enums, structs, constants, exceptions
|       |-- test_bindings_manager.py  # Full ResourceManager lifecycle
|       |-- test_bindings_threading.py # GIL release, concurrent agents
|       |-- test_bindings_monitors.py # Python Monitor subclass, callbacks
|       |-- test_bindings_subsystems.py # Progress, delegation, adaptive
|       |-- test_langgraph_guard.py   # AgentGuard wrapper
|       |-- test_langgraph_decorator.py # @guarded_tool
|       |-- test_langgraph_node.py    # GuardedToolNode
|-- tests/
|   |-- CMakeLists.txt                  # GoogleTest via FetchContent
|   |-- unit/                           # Per-class unit tests (10 files)
|   |-- integration/                    # Concurrent, deadlock, and feature integration tests (5 files)
|-- examples/
    |-- CMakeLists.txt
    |-- 01_basic_usage.cpp              # Minimal example
    |-- 02_llm_api_rate_limits.cpp      # Multi-threaded API rate sharing
    |-- 03_tool_sharing.cpp             # Deadlock-free tool sharing
    |-- 04_priority_agents.cpp          # Priority scheduling with metrics
    |-- 05_adaptive_agents.cpp          # All three novel features in action

Requirements

Python

  • Python: 3.9+
  • Build: pybind11 2.12+, scikit-build-core 0.8+, CMake 3.16+ (all auto-installed by pip install agentguard-ai)
  • Optional: langgraph 0.2+, langchain-core 0.2+ (for LangGraph integration)
  • Dev: pytest, pytest-timeout

C++

  • C++ Standard: C++17
  • Build System: CMake 3.16+
  • Compiler: Any C++17-capable compiler
    • GCC 7+
    • Clang 5+
    • AppleClang 10+ (Xcode 10+)
    • MSVC 19.14+ (Visual Studio 2017 15.7+)
  • Dependencies: None (GoogleTest fetched automatically for tests)
  • Threading: POSIX threads (pthreads) or Windows threads

Integration

Python (pip)

pip install agentguard-ai              # from PyPI
pip install "agentguard-ai[langgraph]" # with LangGraph support

import agentguard as ag                       # low-level C++ bindings
from agentguard.langgraph import AgentGuard   # high-level wrapper
from agentguard.langgraph import guarded_tool # decorator

CMake (after install)

find_package(AgentGuard REQUIRED)
target_link_libraries(my_target PRIVATE AgentGuard::agentguard)

CMake (as subdirectory)

add_subdirectory(agentguard)
target_link_libraries(my_target PRIVATE AgentGuard::agentguard)

Single include

#include <agentguard/agentguard.hpp>   // includes everything

Or include only what you need:

#include <agentguard/resource_manager.hpp>
#include <agentguard/ai/rate_limiter.hpp>

License

MIT
