Summary
Add network bandwidth as a first-class resource dimension alongside CPU and disk in the ResourceSampler system, so the scheduler can factor network saturation into concurrency decisions.
Motivation
TaskMill's IO-aware scheduling currently tracks CPU and disk throughput via ResourceSampler. For network-heavy workloads like S3 transfers, network bandwidth is the primary bottleneck, not disk. Without network awareness, the scheduler can't detect that the link is saturated and will keep launching new tasks that just pile up behind a congested connection.
This is especially important for users on flaky or bandwidth-constrained connections (satellite, mobile tethering, shared office links) where oversubscribing the network causes timeouts and cascading retries.
Proposed Behavior
- ResourceSnapshot gains a network_throughput_bps field (or a network_tx / network_rx split)
- platform_sampler() includes network interface sampling (via sysinfo or platform APIs)
- The scheduler's concurrency adjustment logic factors in network utilization alongside CPU/disk
- A configurable max_network_bandwidth lets users declare their link capacity so the scheduler knows what "saturated" means
- Integration point with PressureSource: a NetworkPressure source that external bandwidth limiters (token-bucket rate limiters) can feed into, so the scheduler respects application-level bandwidth caps, not just OS-level throughput
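To make the sampling bullet concrete, here is a minimal sketch of how per-direction throughput could be derived from the cumulative byte counters that operating systems typically expose. The names NetCounters, NetThroughput, and throughput_between are hypothetical and chosen for illustration; they are not existing TaskMill or sysinfo APIs, and the real ResourceSnapshot fields may differ.

```rust
use std::time::Duration;

/// Cumulative byte counters for one interface, as an OS typically reports them.
/// Hypothetical type for illustration; not part of TaskMill.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct NetCounters {
    pub tx_bytes: u64,
    pub rx_bytes: u64,
}

/// Throughput over one sampling interval, in bytes per second.
/// Hypothetical type mirroring the proposed tx/rx split.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct NetThroughput {
    pub tx_bytes_per_sec: f64,
    pub rx_bytes_per_sec: f64,
}

/// Derive throughput from two counter readings taken `elapsed` apart.
/// Returns `None` for a zero-length interval.
pub fn throughput_between(
    prev: NetCounters,
    curr: NetCounters,
    elapsed: Duration,
) -> Option<NetThroughput> {
    let secs = elapsed.as_secs_f64();
    if secs <= 0.0 {
        return None;
    }
    // saturating_sub treats a counter reset (curr < prev) as zero traffic
    // instead of underflowing.
    let tx = curr.tx_bytes.saturating_sub(prev.tx_bytes) as f64 / secs;
    let rx = curr.rx_bytes.saturating_sub(prev.rx_bytes) as f64 / secs;
    Some(NetThroughput {
        tx_bytes_per_sec: tx,
        rx_bytes_per_sec: rx,
    })
}
```

A sampler built this way would keep the previous reading and timestamp, then call throughput_between on each tick. Whether the snapshot stores the split or a single combined figure is the open question noted in the first bullet.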
Example
let config = SchedulerConfig::builder()
    .max_network_bandwidth_bytes(50 * 1024 * 1024) // 50 MB/s link
    .network_saturation_threshold(0.85) // throttle at 85%
    .build();
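One way the scheduler might interpret these two settings is sketched below: utilization is measured throughput divided by the declared capacity, and throttling begins once it reaches the threshold. The functions network_utilization and should_throttle are hypothetical helpers, not part of the proposed API, and treating an undeclared (zero) capacity as "never saturated" is an assumption made here for simplicity.

```rust
/// Fraction of the declared link capacity currently in use.
/// Hypothetical helper; a capacity of 0 is treated as "not declared"
/// and reported as 0.0 utilization (an assumption of this sketch).
pub fn network_utilization(throughput_bytes_per_sec: f64, max_bandwidth_bytes: u64) -> f64 {
    if max_bandwidth_bytes == 0 {
        return 0.0;
    }
    throughput_bytes_per_sec / max_bandwidth_bytes as f64
}

/// True when utilization has reached the configured saturation threshold,
/// i.e. the scheduler should stop adding concurrency.
pub fn should_throttle(
    throughput_bytes_per_sec: f64,
    max_bandwidth_bytes: u64,
    saturation_threshold: f64,
) -> bool {
    network_utilization(throughput_bytes_per_sec, max_bandwidth_bytes) >= saturation_threshold
}
```

With the values from the example above, a measured 45 * 1024 * 1024 bytes/s is 90% of capacity and would trigger throttling, while 40 * 1024 * 1024 bytes/s (80%) would not.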
Design Considerations
- Network metrics are noisier than disk/CPU; the SmoothedReader approach already in use would help here
- Users may have multiple network interfaces; the sampler should support selecting which interface(s) to monitor, or measuring the aggregate
- For containerized / cloud deployments, OS-level network stats may not reflect the actual available bandwidth; the PressureSource integration is key here as a fallback
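The first two considerations can be illustrated with a small sketch: summing throughput across the selected interfaces, then damping the result with an exponentially weighted moving average. This issue does not describe how SmoothedReader works internally, so the EWMA here is a stand-in technique, and Ewma and aggregate_throughput are hypothetical names.

```rust
/// Exponentially weighted moving average over a stream of samples.
/// Hypothetical stand-in for smoothing; not the SmoothedReader implementation.
pub struct Ewma {
    alpha: f64,
    value: Option<f64>,
}

impl Ewma {
    /// `alpha` in (0, 1]: higher values react faster, lower values smooth more.
    pub fn new(alpha: f64) -> Self {
        assert!(alpha > 0.0 && alpha <= 1.0, "alpha must be in (0, 1]");
        Ewma { alpha, value: None }
    }

    /// Fold in a new sample and return the smoothed value.
    /// The first sample seeds the average directly.
    pub fn update(&mut self, sample: f64) -> f64 {
        let next = match self.value {
            None => sample,
            Some(prev) => self.alpha * sample + (1.0 - self.alpha) * prev,
        };
        self.value = Some(next);
        next
    }
}

/// Sum throughput (bytes per second) across the monitored interfaces.
pub fn aggregate_throughput(per_interface: &[f64]) -> f64 {
    per_interface.iter().sum()
}
```

Smoothing the aggregate rather than each interface keeps a single state value per scheduler, at the cost of hiding which interface is saturated. Per-interface smoothing would be the alternative if interface selection is supported.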