Add topology-aware scheduling (tree + block) #80
Merged
Issue #69 (all pods get rank 0):
- peer_nodes contains addr:port strings but target_node is a hostname, so starts_with matching always failed, defaulting all ranks to 0
- Fix: derive node_rank from task_offset / tasks_per_node, which the dispatcher increments correctly per node

Issue #70 (nodes always show idle):
- register_node() unconditionally set state=Idle, but the K8s node watcher re-registers nodes on every Apply event, resetting Allocated/Mixed state back to Idle
- Fix: if node already exists, update connection info and resources but preserve current state and allocations

Co-Authored-By: Claude <noreply@anthropic.com>
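The two fixes described in this commit can be sketched in a few lines of Rust. This is an illustration under assumptions, not the code from the PR: the `Registry` and `Node` types, their fields, and the function signatures are hypothetical, and only the names `node_rank`, `task_offset`, `tasks_per_node`, `register_node`, and the `Idle`/`Mixed`/`Allocated` states come from the commit message.

```rust
use std::collections::HashMap;

/// Rank of the node hosting a task, assuming tasks are laid out contiguously
/// with `tasks_per_node` tasks per node (hypothetical signature).
fn node_rank(task_offset: u32, tasks_per_node: u32) -> u32 {
    // Guard against a zero divisor by treating it as one task per node.
    task_offset / tasks_per_node.max(1)
}

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum NodeState {
    Idle,
    Mixed,
    Allocated,
}

/// Hypothetical node record; the real struct in the project may differ.
#[derive(Debug, Clone)]
struct Node {
    addr: String,
    cpus: u32,
    state: NodeState,
    allocated_jobs: Vec<u64>,
}

#[derive(Default)]
struct Registry {
    nodes: HashMap<String, Node>,
}

impl Registry {
    /// Register or re-register a node. A repeated registration refreshes the
    /// connection info and resources but keeps the state and allocations.
    fn register_node(&mut self, name: &str, addr: &str, cpus: u32) {
        if let Some(existing) = self.nodes.get_mut(name) {
            existing.addr = addr.to_string();
            existing.cpus = cpus;
            return;
        }
        // First registration: the node starts out idle with no allocations.
        self.nodes.insert(
            name.to_string(),
            Node {
                addr: addr.to_string(),
                cpus,
                state: NodeState::Idle,
                allocated_jobs: Vec::new(),
            },
        );
    }
}
```

With this shape, a watcher that re-registers `gpu001` on every event only overwrites `addr` and `cpus`; a node that was `Allocated` before the event is still `Allocated` afterwards.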
Implements three features needed by the spur-cloud GPUaaS platform:

1. exec_in_job in K8s agent: Uses kube Api<Pod>::exec() to run commands inside job pods. Enables web terminal access via the spur-cloud platform.
2. stream_job_output in K8s agent: Uses Api<Pod>::log_stream() to tail pod logs. Enables real-time log viewing in the web UI.
3. Leader election for spurctld: Adds --enable-leader-election flag that uses K8s Lease API for HA deployments. Standby replicas block until the leader fails to renew, then take over. No-op when flag is absent (bare-metal deploys unaffected).

Changes:
- Implement exec_in_job using kube ws exec in spur-k8s agent
- Implement stream_job_output using kube log_stream in spur-k8s agent
- Add leader_election.rs module to spurctld (172 LoC)
- Add --enable-leader-election and --election-namespace CLI flags
- Add ws feature to kube dependency for exec support
- Add kube + k8s-openapi deps to spurctld for Lease API

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
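The standby behaviour described for leader election ("block until the leader fails to renew, then take over") comes down to a small decision rule over the lease's holder, last renew time, and duration. The sketch below models only that rule, with no Kubernetes client involved; the `Lease` struct, the `Decision` enum, and `decide` are hypothetical names for illustration and are not taken from `leader_election.rs`.

```rust
/// Simplified view of a lease: who holds it, when it was last renewed,
/// and how long a renewal stays valid (all times in seconds).
#[derive(Debug, Clone, PartialEq)]
struct Lease {
    holder: Option<String>,
    renew_time_secs: u64,
    lease_duration_secs: u64,
}

#[derive(Debug, PartialEq)]
enum Decision {
    /// We already hold the lease: refresh the renew time.
    Renew,
    /// Nobody holds a valid lease: try to take it.
    Acquire,
    /// Another replica holds a still-valid lease: keep waiting.
    Standby,
}

/// Decide what replica `me` should do at time `now_secs`.
fn decide(lease: &Lease, me: &str, now_secs: u64) -> Decision {
    let expires_at = lease.renew_time_secs + lease.lease_duration_secs;
    match lease.holder.as_deref() {
        Some(holder) if holder == me => Decision::Renew,
        Some(_) if now_secs < expires_at => Decision::Standby,
        // No holder, or the holder failed to renew in time.
        _ => Decision::Acquire,
    }
}
```

A real implementation would also have to write the lease back with an optimistic-concurrency check so that two standbys cannot both acquire it; that part is omitted here.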
Implement topology/tree and topology/block scheduling modes for locality-aware multi-node job placement. Closes #76, closes #77.

When configured with a switch hierarchy, the backfill scheduler groups candidate nodes by their leaf switch and preferentially selects nodes from the same switch (or closest switches) for multi-node jobs. This reduces network hops and improves communication performance for distributed training workloads.

Changes:
- New `topology.rs` module with Switch, TopologyTree, distance computation, switch grouping, and locality-aware node selection
- TopologyConfig in config.rs: `[topology]` section with plugin ("tree"/"block"), switch definitions, and block_size
- `switch_name` field on Node, `topology` field on JobSpec
- Backfill scheduler reorders candidates by topology locality when job.spec.topology is "tree" or "block"
- `--topology` CLI flag for sbatch
- Proto updates: topology field on JobSpec, switch_name on NodeInfo
- 4 new scheduling tests + 10 topology unit tests (792 total, 0 failures)

Example configuration:

    [topology]
    plugin = "tree"

    [[topology.switches]]
    name = "rack01"
    nodes = "gpu[001-018]"

    [[topology.switches]]
    name = "rack02"
    nodes = "gpu[019-036]"

    [[topology.switches]]
    name = "fabric0"
    switches = "rack01,rack02"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
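To make the grouping and closest-switch preference concrete, here is a minimal, self-contained sketch of one way such a selection could work. It is an assumption-laden illustration rather than the contents of `topology.rs`: the struct layout, the `new`, `ancestors`, `distance`, and `select` methods, and the tie-breaking rules are all hypothetical. Distance is counted as the number of switch-to-switch hops through the lowest common ancestor, and the greedy pass starts from the leaf switch with the most candidates.

```rust
use std::cmp::Reverse;
use std::collections::{BTreeMap, HashMap};

/// Hypothetical switch tree: each switch knows its parent, each node its leaf switch.
struct TopologyTree {
    parent: HashMap<String, String>,
    node_switch: HashMap<String, String>,
}

impl TopologyTree {
    /// `links` are (child switch, parent switch); `nodes` are (node, leaf switch).
    fn new(links: &[(&str, &str)], nodes: &[(&str, &str)]) -> Self {
        let own = |pairs: &[(&str, &str)]| -> HashMap<String, String> {
            pairs.iter().map(|(a, b)| (a.to_string(), b.to_string())).collect()
        };
        TopologyTree { parent: own(links), node_switch: own(nodes) }
    }

    /// Path from a switch up to the root, starting with the switch itself.
    /// Assumes the hierarchy is acyclic.
    fn ancestors(&self, switch: &str) -> Vec<String> {
        let mut path = vec![switch.to_string()];
        let mut cur = switch;
        while let Some(p) = self.parent.get(cur) {
            path.push(p.clone());
            cur = p.as_str();
        }
        path
    }

    /// Hops between two switches via their lowest common ancestor
    /// (0 for the same switch, None if they share no ancestor).
    fn distance(&self, a: &str, b: &str) -> Option<usize> {
        let (pa, pb) = (self.ancestors(a), self.ancestors(b));
        for (i, sa) in pa.iter().enumerate() {
            if let Some(j) = pb.iter().position(|sb| sb == sa) {
                return Some(i + j);
            }
        }
        None
    }

    /// Greedy selection: seed with the leaf switch holding the most candidates,
    /// then take nodes from the remaining switches closest-first.
    fn select(&self, candidates: &[&str], count: usize) -> Option<Vec<String>> {
        // Group candidates by leaf switch; nodes without a switch are skipped.
        let mut groups: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
        for &n in candidates {
            if let Some(sw) = self.node_switch.get(n) {
                groups.entry(sw.as_str()).or_default().push(n);
            }
        }
        // Largest group wins; ties go to the first switch name in sorted order.
        let seed = *groups.iter().min_by_key(|(_, v)| Reverse(v.len()))?.0;
        let mut order: Vec<&str> = groups.keys().copied().collect();
        order.sort_by_key(|sw| self.distance(seed, sw).unwrap_or(usize::MAX));

        let mut picked = Vec::with_capacity(count);
        for sw in order {
            for n in &groups[sw] {
                if picked.len() == count {
                    return Some(picked);
                }
                picked.push(n.to_string());
            }
        }
        // None when there are not enough candidates to satisfy the request.
        (picked.len() == count).then_some(picked)
    }
}
```

With `rack01` and `rack02` both under `fabric0`, a two-node job whose candidates include three free nodes on `rack02` and two on `rack01` is placed entirely on `rack02`; a four-node job fills `rack02` first and spills one node onto `rack01`.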
Summary
- `topology/tree` and `topology/block` scheduling modes for locality-aware multi-node job placement

Changes
- `spur-core/src/topology.rs`: `Switch`, `TopologyTree` with distance computation, switch grouping, and greedy locality-aware node selection
- `[topology]` section with `plugin` ("tree"/"block"/"none"), `[[topology.switches]]` definitions, and `block_size`
- `switch_name` on `Node`, `topology` on `JobSpec`
- `--topology=tree` or `--topology=block`
- `--topology` flag for `sbatch`
- `topology` field 59 on `JobSpec`, `switch_name` field 40 on `NodeInfo`

Configuration example
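The commit message for this change gives a `[topology]` example in which each `[[topology.switches]]` entry lists its nodes as a bracketed range such as `nodes = "gpu[001-018]"`. As a rough illustration of how such a value could be turned into individual node names, here is a small Rust sketch. The `expand_hostlist` function is hypothetical and handles only a single zero-padded range or a plain name; the project's actual parser may accept more forms.

```rust
/// Expand "gpu[001-018]" into ["gpu001", ..., "gpu018"].
/// A name without brackets expands to itself; malformed input yields None.
fn expand_hostlist(spec: &str) -> Option<Vec<String>> {
    let open = match spec.find('[') {
        Some(i) => i,
        None => return Some(vec![spec.to_string()]),
    };
    // The closing bracket must be the last character of the spec.
    let close = spec.rfind(']')?;
    if close < open || close + 1 != spec.len() {
        return None;
    }
    let prefix = &spec[..open];
    let (lo, hi) = spec[open + 1..close].split_once('-')?;
    // Keep the zero padding of the lower bound, e.g. "001" means width 3.
    let width = lo.len();
    let start: u32 = lo.parse().ok()?;
    let end: u32 = hi.parse().ok()?;
    if start > end {
        return None;
    }
    Some((start..=end).map(|i| format!("{prefix}{i:0width$}")).collect())
}
```

Applied to the two racks in that example, `gpu[001-018]` and `gpu[019-036]` each expand to 18 node names, which is the input a switch-to-node map would need.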
Test plan
🤖 Generated with Claude Code