feat: add agent availability control plane #3
Merged
zhchxiao123 merged 2 commits intoMay 18, 2026
Repository owner locked and limited conversation to collaborators on May 16, 2026.
Repository owner unlocked this conversation on May 16, 2026.
Summary

This PR adds an Agent Availability Control Plane to by-framework.

The core change is that `GatewayClient.send_message()` and `AgentContext.call_agent()` now share a unified `AvailabilityRouter`. When a target `agent_type` has no online worker, callers can choose a `route_policy` instead of always failing immediately.

Supported policies include:

- `FAIL_FAST`: preserve the default behavior and fail when no online worker is available.
- `SEND_ANYWAY`: skip online checks and write directly to `queue:ctrl:{agent_type}`.
- `WAKE_AND_WAIT`: emit a wakeup request, wait for a manager decision, then deliver if ready.
- `WAKE_AND_QUEUE`: emit a wakeup request and store the command in pending delivery.
- `QUEUE_ONLY`: store the command in pending delivery without triggering wakeup.

The control-plane Redis keys are namespaced under `byai_gateway:control_plane:*`.

Key Changes
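To make the five route policies from the Summary concrete, here is a minimal in-memory sketch of how a router might branch on them. It does not import by-framework: the `RoutePolicy` member names and the `queue:ctrl:{agent_type}` key come from this PR, while the enum values, `FakeControlPlane`, `NoOnlineWorkerError`, `route()`, and the returned status strings are hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class RoutePolicy(Enum):
    # Member names come from the PR; the string values are assumptions.
    FAIL_FAST = "fail_fast"
    SEND_ANYWAY = "send_anyway"
    WAKE_AND_WAIT = "wake_and_wait"
    WAKE_AND_QUEUE = "wake_and_queue"
    QUEUE_ONLY = "queue_only"


class NoOnlineWorkerError(RuntimeError):
    """Hypothetical error raised on the FAIL_FAST path."""


@dataclass
class FakeControlPlane:
    """In-memory stand-in for Redis plus the manager (illustration only)."""
    online: set = field(default_factory=set)
    manager_ready: bool = True  # pretend outcome of the manager's wakeup decision
    ctrl_queue: list = field(default_factory=list)
    wakeups: list = field(default_factory=list)
    pending: list = field(default_factory=list)

    def deliver(self, agent_type, command):
        self.ctrl_queue.append((f"queue:ctrl:{agent_type}", command))
        return "delivered"


def route(plane, agent_type, command, policy=RoutePolicy.FAIL_FAST):
    # SEND_ANYWAY skips the online check; otherwise deliver only if a worker is online.
    if policy is RoutePolicy.SEND_ANYWAY or agent_type in plane.online:
        return plane.deliver(agent_type, command)
    if policy is RoutePolicy.FAIL_FAST:
        raise NoOnlineWorkerError(agent_type)
    if policy is RoutePolicy.QUEUE_ONLY:
        plane.pending.append((agent_type, command))  # no wakeup is emitted
        return "queued"
    plane.wakeups.append(agent_type)  # both WAKE_* policies emit a wakeup request
    if policy is RoutePolicy.WAKE_AND_QUEUE:
        plane.pending.append((agent_type, command))
        return "queued"
    # WAKE_AND_WAIT: deliver only if the manager reports the agent as ready.
    return plane.deliver(agent_type, command) if plane.manager_ready else "not_ready"
```

In this sketch `FAIL_FAST` is the default argument, mirroring the PR's statement that it preserves the existing behavior; a real router would also apply `availability_timeout_ms` while waiting for the manager decision.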
Added `by_framework.core.availability`

- `AvailabilityRouter`
- `RoutePolicy`
- `DeliveryIntent`
- `WakeupRequest`
- `PendingDelivery`
- `WakeupDecision`
- `AvailabilityResult`

Added manager-side reference components

- `WakeupController`
- `WakeupProvider`
- `DeliveryGate`

Updated routing APIs

- `GatewayClient.send_message(..., route_policy=..., availability_timeout_ms=..., region=..., priority=...)`
- `AgentContext.call_agent(..., route_policy=..., availability_timeout_ms=..., region=..., priority=...)`
- `ByaiGatewayClient` and `ByaiAgentContext` forward the same parameters.

Unified `request_id` and `execution_id`

- `execution_id` as the single correlation ID.

Replaced the older online-check flags with a single `route_policy`

- `require_online_worker`
- `offline_route_policy`
- `RoutePolicy`

Added Redis key helpers under `RedisKeys`

- `byai_gateway:control_plane:mgmt:wakeup`
- `byai_gateway:control_plane:mgmt:wakeup:result:{execution_id}`
- `byai_gateway:control_plane:mgmt:delivery:pending`
- `byai_gateway:control_plane:mgmt:deadletter`
- `byai_gateway:control_plane:availability:agent_type:{agent_type}`
- `byai_gateway:control_plane:circuit:agent_type:{agent_type}`
- `byai_gateway:control_plane:fallback:agent_type:{agent_type}`
- `byai_gateway:control_plane:quota:tenant:{user_code}`
- `byai_gateway:control_plane:wakeup:dedupe:{agent_type}:{user_code}:{region}`

Architecture
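For reference, the control-plane key layout listed at the end of Key Changes could be produced by small formatting helpers like the ones below. The key strings are taken from this PR; the function names and the `CONTROL_PLANE_PREFIX` constant are hypothetical, since the actual `RedisKeys` method names are not shown here.

```python
# Shared namespace for all control-plane keys (prefix taken from the PR).
CONTROL_PLANE_PREFIX = "byai_gateway:control_plane"


def wakeup_key() -> str:
    """Key that carries wakeup requests for the manager."""
    return f"{CONTROL_PLANE_PREFIX}:mgmt:wakeup"


def wakeup_result_key(execution_id: str) -> str:
    """Per-execution key where the wakeup result is published."""
    return f"{CONTROL_PLANE_PREFIX}:mgmt:wakeup:result:{execution_id}"


def pending_delivery_key() -> str:
    """Key that holds commands waiting for delivery."""
    return f"{CONTROL_PLANE_PREFIX}:mgmt:delivery:pending"


def availability_key(agent_type: str) -> str:
    """Per-agent-type availability key."""
    return f"{CONTROL_PLANE_PREFIX}:availability:agent_type:{agent_type}"


def wakeup_dedupe_key(agent_type: str, user_code: str, region: str) -> str:
    """Dedupe key scoped by agent type, tenant, and region."""
    return f"{CONTROL_PLANE_PREFIX}:wakeup:dedupe:{agent_type}:{user_code}:{region}"
```

Keeping every key behind one prefix constant makes it straightforward to scan or expire the whole control-plane namespace without touching the existing `queue:ctrl:{agent_type}` queues.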
The framework now separates delivery intent from wakeup ownership:

Framework side:

- `route_policy`

Manager/client-owner side:

- `WakeupProvider`
- `WakeupDecision`
- `DeliveryGate`

Agent developers do not need to implement wakeup hooks. As long as they use framework APIs, `call_agent()` automatically goes through the same availability control plane as `send_message()`.

Tests
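The ownership split described under Architecture can be sketched as follows: the manager side implements a provider that turns a wakeup request into a decision, and a gate releases delivery only for a matching `READY` decision. The class names `WakeupRequest`, `WakeupDecision`, and `WakeupProvider` and the `READY / FAILED / REJECTED` outcomes appear in this PR; every field, method signature, `DecisionStatus`, `AllowListProvider`, and `gate_allows_delivery()` below is an assumption made for illustration, not the framework's actual interface.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Protocol


class DecisionStatus(Enum):
    # Outcome names come from the PR's test list; the enum itself is hypothetical.
    READY = "READY"
    FAILED = "FAILED"
    REJECTED = "REJECTED"


@dataclass(frozen=True)
class WakeupRequest:
    execution_id: str  # single correlation ID across wakeup and delivery
    agent_type: str


@dataclass(frozen=True)
class WakeupDecision:
    execution_id: str
    status: DecisionStatus


class WakeupProvider(Protocol):
    """Manager-owned hook: decides whether an agent type can be woken."""

    def wake(self, request: WakeupRequest) -> WakeupDecision: ...


class AllowListProvider:
    """Toy provider that only agrees to wake agent types on an allow list."""

    def __init__(self, allowed):
        self.allowed = set(allowed)

    def wake(self, request: WakeupRequest) -> WakeupDecision:
        ok = request.agent_type in self.allowed
        status = DecisionStatus.READY if ok else DecisionStatus.REJECTED
        return WakeupDecision(request.execution_id, status)


def gate_allows_delivery(request: WakeupRequest, decision: WakeupDecision) -> bool:
    """Release delivery only for a READY decision with the same execution_id."""
    return (
        decision.execution_id == request.execution_id
        and decision.status is DecisionStatus.READY
    )
```

Under these assumptions, agent code never touches the provider or the gate; it only passes a `route_policy` to `send_message()` or `call_agent()`.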
Added and updated tests covering:

- `FAIL_FAST` behavior
- `SEND_ANYWAY`
- `WAKE_AND_WAIT`
- `WAKE_AND_QUEUE`
- `QUEUE_ONLY`
- `READY / FAILED / REJECTED` decisions
- `send_message()` and `call_agent()` sharing the same availability logic
- `execution_id` correlation across wakeup and delivery flow

Verification
    make lint
    make test