feat: add agent availability control plane #3

Merged
zhchxiao123 merged 2 commits into beyonai:main from zhchxiao123:support_sendMessage_hook_for_waking_up_workers
May 18, 2026


@zhchxiao123
Collaborator

Summary

This PR adds an Agent Availability Control Plane to by-framework.

The core change is that GatewayClient.send_message() and AgentContext.call_agent() now share a unified AvailabilityRouter. When a target agent_type has no online worker, callers can choose a route_policy instead of always failing immediately.

Supported policies include:

  • FAIL_FAST: preserve the default behavior and fail when no online worker is available.
  • SEND_ANYWAY: skip online checks and write directly to queue:ctrl:{agent_type}.
  • WAKE_AND_WAIT: emit a wakeup request, wait for a manager decision, then deliver if ready.
  • WAKE_AND_QUEUE: emit a wakeup request and store the command in pending delivery.
  • QUEUE_ONLY: store the command in pending delivery without triggering wakeup.
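
The policy list above can be read as a small dispatch table. The following is a minimal in-memory sketch, not the PR's implementation: `FakeBackend`, `route`, the enum string values, and the returned status strings are all hypothetical, and the assumption that every policy delivers directly when a worker is already online is inferred from the summary rather than confirmed by it.

```python
from dataclasses import dataclass, field
from enum import Enum


class RoutePolicy(Enum):
    # Member names follow the PR description; the string values are assumptions.
    FAIL_FAST = "fail_fast"
    SEND_ANYWAY = "send_anyway"
    WAKE_AND_WAIT = "wake_and_wait"
    WAKE_AND_QUEUE = "wake_and_queue"
    QUEUE_ONLY = "queue_only"


@dataclass
class FakeBackend:
    """In-memory stand-in for Redis; every attribute here is hypothetical."""
    online: set = field(default_factory=set)     # agent types with an online worker
    ctrl: list = field(default_factory=list)     # stands in for queue:ctrl:{agent_type}
    wakeups: list = field(default_factory=list)  # emitted wakeup requests
    pending: list = field(default_factory=list)  # pending delivery store
    manager_ready: bool = False                  # decision the manager would return


def route(backend, agent_type, command, policy=RoutePolicy.FAIL_FAST):
    """Return a status string describing what happened to the command."""
    # SEND_ANYWAY skips the online check; otherwise an online worker means direct delivery.
    if policy is RoutePolicy.SEND_ANYWAY or agent_type in backend.online:
        backend.ctrl.append((agent_type, command))
        return "delivered"
    if policy is RoutePolicy.FAIL_FAST:
        return "failed"
    if policy is RoutePolicy.QUEUE_ONLY:
        backend.pending.append((agent_type, command))
        return "queued"
    # The two remaining policies both emit a wakeup request first.
    backend.wakeups.append(agent_type)
    if policy is RoutePolicy.WAKE_AND_QUEUE:
        backend.pending.append((agent_type, command))
        return "queued"
    # WAKE_AND_WAIT: deliver only if the manager reports the worker as ready.
    if backend.manager_ready:
        backend.ctrl.append((agent_type, command))
        return "delivered"
    return "not_ready"
```

In this sketch the "wait" in WAKE_AND_WAIT is collapsed into a boolean; a real router would presumably block up to `availability_timeout_ms` for the manager's decision.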

The control-plane Redis keys are namespaced under byai_gateway:control_plane:*.

Key Changes

  • Added by_framework.core.availability

    • AvailabilityRouter
    • RoutePolicy
    • DeliveryIntent
    • WakeupRequest
    • PendingDelivery
    • WakeupDecision
    • AvailabilityResult
  • Added manager-side reference components

    • WakeupController
    • WakeupProvider
    • DeliveryGate
  • Updated routing APIs

    • GatewayClient.send_message(..., route_policy=..., availability_timeout_ms=..., region=..., priority=...)
    • AgentContext.call_agent(..., route_policy=..., availability_timeout_ms=..., region=..., priority=...)
    • ByaiGatewayClient and ByaiAgentContext forward the same parameters.
  • Unified request_id and execution_id

    • Wakeup requests, wakeup results, pending delivery, and execution tracking now use execution_id as the single correlation ID.
  • Replaced the older online-check flags with a single route_policy

    • Removed require_online_worker
    • Removed offline_route_policy
    • Kept routing behavior explicit through RoutePolicy
  • Added Redis key helpers under RedisKeys

    • byai_gateway:control_plane:mgmt:wakeup
    • byai_gateway:control_plane:mgmt:wakeup:result:{execution_id}
    • byai_gateway:control_plane:mgmt:delivery:pending
    • byai_gateway:control_plane:mgmt:deadletter
    • byai_gateway:control_plane:availability:agent_type:{agent_type}
    • byai_gateway:control_plane:circuit:agent_type:{agent_type}
    • byai_gateway:control_plane:fallback:agent_type:{agent_type}
    • byai_gateway:control_plane:quota:tenant:{user_code}
    • byai_gateway:control_plane:wakeup:dedupe:{agent_type}:{user_code}:{region}
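
To make the key templates above concrete, here is a small sketch of helper functions that render a few of them. The key formats are copied from the list; the function names and the `PREFIX` constant are hypothetical and are not taken from the actual `RedisKeys` class.

```python
PREFIX = "byai_gateway:control_plane"  # shared namespace from the PR description


def wakeup_key() -> str:
    # Management key that wakeup requests are written to.
    return f"{PREFIX}:mgmt:wakeup"


def wakeup_result_key(execution_id: str) -> str:
    # One result key per execution_id, the single correlation ID.
    return f"{PREFIX}:mgmt:wakeup:result:{execution_id}"


def availability_key(agent_type: str) -> str:
    return f"{PREFIX}:availability:agent_type:{agent_type}"


def quota_key(user_code: str) -> str:
    return f"{PREFIX}:quota:tenant:{user_code}"


def wakeup_dedupe_key(agent_type: str, user_code: str, region: str) -> str:
    # Dedupe is scoped to agent type, tenant, and region.
    return f"{PREFIX}:wakeup:dedupe:{agent_type}:{user_code}:{region}"
```

Keeping the prefix in one constant means a test can assert that every helper's output starts with it, which is one way the "Redis key prefix coverage" item under Tests could be checked.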

Architecture

The framework now separates delivery intent from wakeup ownership:

  • Framework side:

    • checks online worker availability
    • applies circuit/quota/fallback policy
    • emits wakeup requests
    • waits or queues based on route_policy
    • performs final control-stream delivery
  • Manager/client-owner side:

    • listens to Redis wakeup management events
    • deduplicates concurrent wakeup requests
    • starts containers or signals external systems through WakeupProvider
    • writes WakeupDecision
    • releases pending delivery through DeliveryGate
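
The manager-side responsibilities listed above can be sketched in memory as follows. This is an illustration under stated assumptions, not the reference components from the PR: the class bodies, the `wake` and `handle` method names, the `release_pending` function, and the field set on `WakeupRequest` are hypothetical. Only the component roles, the READY / FAILED decision values, and the dedupe scope (agent type, user code, region) come from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WakeupRequest:
    execution_id: str  # single correlation ID across wakeup and delivery
    agent_type: str
    user_code: str
    region: str


class CountingProvider:
    """Hypothetical WakeupProvider that pretends to start a container."""

    def __init__(self):
        self.starts = 0

    def wake(self, agent_type: str) -> bool:
        self.starts += 1
        return True


class WakeupController:
    """Deduplicates wakeups and records one decision per execution_id."""

    def __init__(self, provider):
        self.provider = provider
        self.outcomes = {}   # dedupe key -> "READY" or "FAILED"
        self.decisions = {}  # execution_id -> decision status

    def handle(self, req: WakeupRequest) -> str:
        key = (req.agent_type, req.user_code, req.region)
        # Only the first request for a given key triggers the provider.
        if key not in self.outcomes:
            woke = self.provider.wake(req.agent_type)
            self.outcomes[key] = "READY" if woke else "FAILED"
        self.decisions[req.execution_id] = self.outcomes[key]
        return self.decisions[req.execution_id]


def release_pending(pending, decisions):
    """DeliveryGate-like step: split pending items into released and held."""
    released = [(eid, cmd) for eid, cmd in pending if decisions.get(eid) == "READY"]
    held = [(eid, cmd) for eid, cmd in pending if decisions.get(eid) != "READY"]
    return released, held
```

The dedupe table here never expires; a Redis-backed version would presumably give the dedupe key a TTL so that a later cold start can trigger a fresh wakeup.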

Agent developers do not need to implement wakeup hooks. As long as they use framework APIs, call_agent() automatically goes through the same availability control plane as send_message().
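
One way to picture the shared path is two thin entry points that delegate to the same router object. The classes below are stand-ins with hypothetical names and simplified signatures; they are not the real `GatewayClient`, `AgentContext`, or `AvailabilityRouter`.

```python
class SketchRouter:
    """Stand-in for the shared availability router; records every routed call."""

    def __init__(self):
        self.calls = []

    def route(self, agent_type, payload, route_policy="FAIL_FAST"):
        self.calls.append((agent_type, payload, route_policy))
        return len(self.calls)  # position of this call in the shared log


class SketchGatewayClient:
    def __init__(self, router):
        self._router = router

    def send_message(self, agent_type, message, route_policy="FAIL_FAST"):
        # External callers enter through the gateway client.
        return self._router.route(agent_type, message, route_policy)


class SketchAgentContext:
    def __init__(self, router):
        self._router = router

    def call_agent(self, agent_type, payload, route_policy="FAIL_FAST"):
        # Agent-to-agent calls take the same path, so no extra hook is needed.
        return self._router.route(agent_type, payload, route_policy)
```

Because both entry points hold the same router instance, any policy handling added to `route` applies to agent-to-agent calls and external sends alike.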

Tests

Added and updated tests covering:

  • default FAIL_FAST behavior
  • SEND_ANYWAY
  • WAKE_AND_WAIT
  • WAKE_AND_QUEUE
  • QUEUE_ONLY
  • manager READY / FAILED / REJECTED decisions
  • timeout behavior
  • pending delivery dispatch
  • wakeup dedupe
  • quota rejection
  • circuit breaker rejection
  • fallback routing
  • send_message() and call_agent() sharing the same availability logic
  • Redis key prefix coverage
  • execution_id correlation across wakeup and delivery flow

Verification

make lint
make test

Repository owner locked and limited conversation to collaborators May 16, 2026
Repository owner unlocked this conversation May 16, 2026
@zhchxiao123 zhchxiao123 merged commit 5987593 into beyonai:main May 18, 2026
1 check passed