Feature: Network IO as a resource dimension #9

@deepjoy

Summary

Add network bandwidth as a first-class resource dimension alongside CPU and disk in the ResourceSampler system, so the scheduler can factor network saturation into concurrency decisions.

Motivation

TaskMill's IO-aware scheduling currently tracks CPU and disk throughput via ResourceSampler. For network-heavy workloads like S3 transfers, network bandwidth is the primary bottleneck, not disk. Without network awareness, the scheduler can't detect that the link is saturated and will keep launching new tasks that just pile up behind a congested connection.

This is especially important for users on flaky or bandwidth-constrained connections (satellite, mobile tethering, shared office links) where oversubscribing the network causes timeouts and cascading retries.

Proposed Behavior

  • ResourceSnapshot gains a network_throughput_bps field (or network_tx / network_rx split)
  • platform_sampler() includes network interface sampling (via sysinfo or platform APIs)
  • The scheduler's concurrency adjustment logic factors in network utilization alongside CPU/disk
  • A configurable max_network_bandwidth setting (max_network_bandwidth_bytes in the example below, in bytes per second) lets users declare their link capacity so the scheduler knows what "saturated" means
  • Integration point with PressureSource: a NetworkPressure source that external bandwidth limiters (token-bucket rate limiters) can feed into, so the scheduler respects application-level bandwidth caps, not just OS-level throughput
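
To make the list above more concrete, here is a rough sketch of how throughput and utilization could be derived from two cumulative counter readings, and how an externally reported pressure value might be folded in. Everything here is an assumption for discussion: NetCounters, NetworkThroughput, throughput, utilization and network_pressure are hypothetical names, not existing TaskMill APIs, and taking the busier direction as the utilization assumes a full-duplex link.

```rust
use std::time::Duration;

/// Hypothetical cumulative byte counters read from the OS at one instant.
#[derive(Clone, Copy, Debug)]
pub struct NetCounters {
    pub rx_bytes: u64,
    pub tx_bytes: u64,
}

/// Hypothetical network part of a snapshot, split by direction, in bytes per second.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct NetworkThroughput {
    pub rx_bytes_per_sec: f64,
    pub tx_bytes_per_sec: f64,
}

/// Derive throughput from two counter readings taken `elapsed` apart.
/// Returns None when no time has passed, so callers never divide by zero.
pub fn throughput(
    prev: NetCounters,
    curr: NetCounters,
    elapsed: Duration,
) -> Option<NetworkThroughput> {
    let secs = elapsed.as_secs_f64();
    if secs <= 0.0 {
        return None;
    }
    // saturating_sub treats a counter reset (interface restart) as zero traffic
    // instead of underflowing.
    let rx = curr.rx_bytes.saturating_sub(prev.rx_bytes) as f64 / secs;
    let tx = curr.tx_bytes.saturating_sub(prev.tx_bytes) as f64 / secs;
    Some(NetworkThroughput {
        rx_bytes_per_sec: rx,
        tx_bytes_per_sec: tx,
    })
}

/// Utilization in [0, 1] relative to the declared link capacity.
/// Assumption: the busier direction decides, which fits a full-duplex link.
pub fn utilization(t: NetworkThroughput, max_bandwidth_bytes: u64) -> f64 {
    if max_bandwidth_bytes == 0 {
        return 0.0; // no capacity declared, so nothing to compare against
    }
    let busiest = t.rx_bytes_per_sec.max(t.tx_bytes_per_sec);
    (busiest / max_bandwidth_bytes as f64).clamp(0.0, 1.0)
}

/// Combine OS-observed utilization with pressure reported by an external
/// limiter (for example a token bucket). The higher of the two wins, so an
/// application-level cap can throttle even when the OS link looks idle.
pub fn network_pressure(os_utilization: f64, external_pressure: Option<f64>) -> f64 {
    let external = external_pressure.unwrap_or(0.0).clamp(0.0, 1.0);
    os_utilization.clamp(0.0, 1.0).max(external)
}
```

Taking the maximum of the two signals is only one option. A real NetworkPressure source might weight them differently or report them separately.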

Example

let config = SchedulerConfig::builder()
    .max_network_bandwidth_bytes(50 * 1024 * 1024) // 50 MiB/s link
    .network_saturation_threshold(0.85)             // throttle at 85%
    .build();
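
As a rough illustration of how those two settings might turn into a scheduling decision, the sketch below computes the throttle point and maps an observed throughput to an adjustment. NetworkLimits, Adjustment and RAMP_UP_FRACTION are hypothetical names, and the hysteresis band is an assumption added to avoid flapping. None of this is part of the proposal itself.

```rust
/// Hypothetical mirror of the two settings from the builder example.
#[derive(Debug, Clone, Copy)]
pub struct NetworkLimits {
    pub max_network_bandwidth_bytes: u64,
    pub network_saturation_threshold: f64,
}

/// Hypothetical outcome of one concurrency adjustment step.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Adjustment {
    Increase,
    Hold,
    Decrease,
}

/// Assumed hysteresis: only ramp up again once throughput has dropped below
/// 90% of the throttle point, so the scheduler does not oscillate around it.
const RAMP_UP_FRACTION: f64 = 0.9;

impl NetworkLimits {
    /// Throughput (bytes per second) at which the link counts as saturated.
    pub fn throttle_point(&self) -> f64 {
        self.max_network_bandwidth_bytes as f64 * self.network_saturation_threshold
    }

    /// Map an observed throughput to a concurrency adjustment.
    pub fn adjust(&self, observed_bytes_per_sec: f64) -> Adjustment {
        let throttle_at = self.throttle_point();
        if observed_bytes_per_sec >= throttle_at {
            Adjustment::Decrease
        } else if observed_bytes_per_sec < throttle_at * RAMP_UP_FRACTION {
            Adjustment::Increase
        } else {
            Adjustment::Hold
        }
    }
}
```

With the values from the example, the throttle point works out to roughly 44.5 million bytes per second, so a link carrying 48 million bytes per second would stop new launches while one carrying 10 MiB/s would still have room.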

Design Considerations

  • Network metrics are noisier than disk/CPU metrics, so the SmoothedReader approach already in use would help here
  • Users may have multiple network interfaces; the sampler should support either selecting which interface(s) to monitor or measuring the aggregate
  • For containerized / cloud deployments, OS-level network stats may not reflect the bandwidth actually available, so the PressureSource integration is key as a fallback
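
A small sketch of the first two points, for discussion only. The exponential moving average below is a stand-in for smoothing in general and is not necessarily how SmoothedReader works. Ema, InterfaceSelection and aggregate_bytes_per_sec are hypothetical names, and the per-interface rates are assumed to have been sampled elsewhere.

```rust
use std::collections::HashMap;

/// Exponential moving average over throughput samples (assumed smoothing scheme).
pub struct Ema {
    alpha: f64,
    value: Option<f64>,
}

impl Ema {
    /// `alpha` in (0, 1]: higher values follow new samples more closely.
    pub fn new(alpha: f64) -> Self {
        assert!(alpha > 0.0 && alpha <= 1.0, "alpha must be in (0, 1]");
        Ema { alpha, value: None }
    }

    /// Feed one sample and return the smoothed value.
    /// The first sample seeds the average directly.
    pub fn update(&mut self, sample: f64) -> f64 {
        let next = match self.value {
            None => sample,
            Some(prev) => self.alpha * sample + (1.0 - self.alpha) * prev,
        };
        self.value = Some(next);
        next
    }
}

/// Hypothetical way to choose which interfaces count toward the total.
pub enum InterfaceSelection {
    All,
    Only(Vec<String>),
}

/// Sum per-interface rates (bytes per second) according to the selection.
/// Names in `Only` that are not present in the map are skipped.
pub fn aggregate_bytes_per_sec(
    per_interface: &HashMap<String, f64>,
    selection: &InterfaceSelection,
) -> f64 {
    match selection {
        InterfaceSelection::All => per_interface.values().sum(),
        InterfaceSelection::Only(names) => names
            .iter()
            .filter_map(|name| per_interface.get(name))
            .sum(),
    }
}
```

Explicit selection matters mostly for excluding loopback or virtual interfaces, whose traffic would otherwise inflate the aggregate.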
