Learn heterogeneous bandwidths #2743
In order to make good scheduling decisions the scheduler often has to make an estimate for how long transfers will take. Currently, it learns a uniform exponentially weighted moving average based on what the workers observe.
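For reference, a uniform estimate of this kind can be sketched as below. This is a minimal illustration, not the scheduler's actual code. The class name `UniformBandwidthEstimate`, the `observe`/`transfer_time` methods, the 100 MB/s starting value and the smoothing factor `alpha` are all hypothetical. The key property is that every observed transfer updates one cluster-wide number, whichever workers were involved.

```python
class UniformBandwidthEstimate:
    """Hypothetical sketch: one cluster-wide bandwidth, updated by an EWMA."""

    def __init__(self, initial_bandwidth=100e6, alpha=0.05):
        self.bandwidth = initial_bandwidth  # bytes per second, assumed starting guess
        self.alpha = alpha  # weight given to each new observation

    def observe(self, nbytes, duration):
        """Fold one observed transfer (bytes moved, seconds taken) into the average."""
        if duration <= 0:
            return  # ignore unusable timings
        sample = nbytes / duration
        self.bandwidth = (1 - self.alpha) * self.bandwidth + self.alpha * sample

    def transfer_time(self, nbytes):
        """Estimated seconds to move nbytes, the same for every pair of workers."""
        return nbytes / self.bandwidth


est = UniformBandwidthEstimate(initial_bandwidth=100e6, alpha=0.5)
est.observe(50e6, 1.0)           # one transfer observed at 50 MB/s
print(est.bandwidth)             # 75000000.0
print(est.transfer_time(150e6))  # 2.0
```

Because all observations share one value, a slow link and a fast link are averaged together, and the resulting estimate is wrong for both.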
However, this assumption of uniformity breaks down in a few cases:
Learning a model that estimates the total transit time of a piece of data would be useful, but it may also be somewhat tricky. There is a balance to strike between generalizing across the whole cluster and all data types on one hand, and learning whatever heterogeneity actually exists on the other.
Also, this needs to be fairly lightweight on the scheduler.
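One possible way to strike that balance while staying cheap is to keep a separate EWMA per key and shrink it toward the global EWMA until the key has enough observations. This is a sketch of a swapped-in technique (shrinkage toward a global prior), not a design agreed on in this issue. The class `HeterogeneousBandwidthEstimate`, its methods, and the `prior_strength` parameter are hypothetical. A key might be a `(sender, receiver)` worker pair, a data type name, or a combination. The cost is one dictionary entry per key and O(1) work per observation or lookup.

```python
class HeterogeneousBandwidthEstimate:
    """Hypothetical sketch: per-key bandwidth EWMAs shrunk toward a global EWMA.

    A key could be a (sender, receiver) worker pair, a data type name, or both.
    """

    def __init__(self, initial_bandwidth=100e6, alpha=0.05, prior_strength=10):
        self.alpha = alpha  # EWMA smoothing factor
        self.prior_strength = prior_strength  # pseudo-observations backing the global value
        self.global_bw = initial_bandwidth  # bytes per second, shared fallback
        self.per_key = {}  # key -> EWMA bandwidth observed for that key
        self.counts = {}  # key -> number of observations for that key

    def _ewma(self, old, sample):
        return (1 - self.alpha) * old + self.alpha * sample

    def observe(self, key, nbytes, duration):
        """Update both the global and the per-key estimate from one transfer."""
        if duration <= 0:
            return  # ignore unusable timings
        sample = nbytes / duration
        self.global_bw = self._ewma(self.global_bw, sample)
        # A key seen for the first time starts from its own sample.
        old = self.per_key.get(key, sample)
        self.per_key[key] = self._ewma(old, sample)
        self.counts[key] = self.counts.get(key, 0) + 1

    def bandwidth(self, key):
        """Blend per-key and global estimates, trusting the key more as data accrues."""
        n = self.counts.get(key, 0)
        if n == 0:
            return self.global_bw
        w = n / (n + self.prior_strength)
        return w * self.per_key[key] + (1 - w) * self.global_bw

    def transfer_time(self, key, nbytes):
        """Estimated seconds to move nbytes for the given key."""
        return nbytes / self.bandwidth(key)


est = HeterogeneousBandwidthEstimate(initial_bandwidth=100e6, alpha=0.5, prior_strength=1)
est.observe(("worker-a", "worker-b"), 20e6, 1.0)  # a slow link
print(est.bandwidth(("worker-a", "worker-b")))  # 40000000.0, pulled toward the link's 20 MB/s
print(est.bandwidth(("worker-a", "worker-c")))  # 60000000.0, the global fallback
```

`prior_strength` controls the trade-off described above: larger values generalize more across the cluster, while smaller values let individual links or data types diverge sooner.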