
Workload Assignment


There are four critical components at play when a worker requests a workload.

  1. Priority: Workloads with higher priority are, in general, completed before workloads with lower priority. However, this does not strictly mean that the highest-priority workload will be returned to the worker; rather, the highest-priority workload that the worker is capable of completing will be assigned. To be able to accept a workload, a worker must have the proper Operating System, CPU Flags, Syzygy requirements, Thread counts, Compiler Versions (public engines), and Fine-grained Tokens (private engines). The list of all such highest-priority workloads that a worker can complete comprises the candidate assignments.

  2. Throughput: Every workload has a throughput value associated with it. By default, workloads are created with a throughput of 1000. Suppose there is exactly one workload running, with Throughput=1000. If we want to create another workload and have it receive twice as many resources, we would create one with Throughput=2000. This means that ~2/3rds of all resources will go to the new workload. If we then wanted a third workload which received 1/2 of all resources, we would need to create one with Throughput=3000. This is because the total throughput is now 6000, and the workload itself has 3000. When summing up the throughput values, we only consider workloads in the candidate assignments. A code sketch after this list makes the arithmetic concrete.

  3. Engine Balancing: The balance_engine_throughputs option in the main configuration, if enabled, scales the throughput of every workload for an engine by dividing by the number of workloads that the engine has at once in the candidate assignments. For example, if Ethereal has two workloads at Priority=1 and three workloads at Priority=0, then the throughput will be scaled down by a factor of 2, not by a factor of 5, since only the two Priority=1 workloads appear in the candidate assignments.

  4. Focus Enabled Workers: The client has an option, --focus ENGINE1 [ENGINE2 ...], which allows the user to specify a list of engines they would prefer to have their machine contribute games towards. If a focused engine appears in the candidate assignments, then all non-focused engines will be excluded. If a machine is given an assignment whose dev engine appears in its focus list, we call it a focus-assigned machine. Furthermore, the balancing algorithm will ignore a machine that is assigned to one of the focused engines when assigning workloads to non-focus-assigned workers. This makes it such that connecting your own machines for your own engine is a strict increase in the resources your workloads get.
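
To make the throughput and balancing arithmetic above concrete, here is a minimal Python sketch. The Workload structure and field names are illustrative assumptions, not the server's actual data model:

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Workload:
    engine: str      # dev engine name
    throughput: int  # raw throughput; workloads default to 1000

def effective_throughputs(candidates, balance_engine_throughputs):
    # Count how many candidate workloads each engine has at once
    per_engine = Counter(w.engine for w in candidates)
    if balance_engine_throughputs:
        # Scale each workload's throughput down by its engine's count
        return [w.throughput / per_engine[w.engine] for w in candidates]
    return [float(w.throughput) for w in candidates]

# The example from item 2: one workload at 1000, a second at 2000
candidates = [Workload("EngineA", 1000), Workload("EngineB", 2000)]
effs = effective_throughputs(candidates, balance_engine_throughputs=False)
total = sum(effs)
for w, eff in zip(candidates, effs):
    print(w.engine, eff / total)  # EngineA ~1/3, EngineB ~2/3
```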


Workload Selection Algorithm

  1. Identify all workloads which the machine is capable of completing. Filter that list down such that only the highest-priority workloads remain. If the machine was run with --focus, and some of the focused engines still appear in the list, then remove all workloads whose engines are not focused. These are the candidate assignments.

  2. Determine how many threads are currently assigned to each of the candidate workloads. If the machine is not going to be focus-assigned, then ignore the thread contributions to the candidate workloads that come from focus-assigned machines. This conditional exclusion is needed in order to still balance amongst focus-assigned machines.

  3. Compute the effective-throughput for each workload. If balance_engine_throughputs is not enabled, then the throughput is the effective-throughput. Otherwise, divide each workload's throughput by the number of candidate workloads belonging to that workload's dev engine.

  4. Compute the resource ratios for each workload. The ratio for a workload is the number of assigned threads divided by the effective-throughput. For example: Workload #1 has 1000 effective-throughput and 32 threads. Workload #2 has 2000 effective-throughput and 16 threads. The ratios for Workloads #1 and #2 would be 0.032 and 0.008 respectively. This tells us that Workload #1 is getting 4x the resources of Workload #2. A 2x factor comes from the 1000 vs 2000 throughput, and another 2x factor comes from the 32 vs 16 threads.

  5. Compute the fair resource ratio. This is the total number of threads on the candidate workloads, divided by the sum of the effective-throughputs. In our example, this would be 48 / 3000 = 0.016. From this, we see that Workload #1 is getting twice as many resources as would be fair, and Workload #2 is only getting half the resources that would be fair.

  6. For efficiency, we would like machines to repeat the same workload multiple times, to reduce the overhead of downloading and building engines. If the machine's most recent workload is in our candidates, and no workload is receiving less than 75% of the fair amount of resources, then repeat the same workload.

  7. Return the workload which has the lowest ratio. If multiple workloads share the lowest ratio, select among them at random, weighted by effective-throughput. The sketch below outlines the whole procedure.
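
As a rough illustration of steps 1 through 7, here is a minimal Python sketch. The helper names (machine.can_complete, assigned_threads, and so on) are assumptions for illustration, not the server's actual API:

```python
import random

def filter_candidates(workloads, machine, focus):
    # Step 1: keep the workloads the machine can complete, then only
    # the highest priority among them; apply --focus when it matches
    doable = [w for w in workloads if machine.can_complete(w)]
    top = max(w.priority for w in doable)
    candidates = [w for w in doable if w.priority == top]
    if focus and any(w.engine in focus for w in candidates):
        candidates = [w for w in candidates if w.engine in focus]
    return candidates

def select_workload(candidates, assigned_threads, effective, last_id):
    # candidates       : list of candidate workload ids
    # assigned_threads : workload id -> currently assigned threads,
    #                    already excluding focus-assigned machines
    #                    where step 2 says to
    # effective        : workload id -> effective-throughput (step 3)
    # last_id          : the machine's most recent workload, or None

    # Step 4: threads per unit of effective-throughput
    ratios = {w: assigned_threads[w] / effective[w] for w in candidates}

    # Step 5: the fair resource ratio
    fair = (sum(assigned_threads[w] for w in candidates)
            / sum(effective[w] for w in candidates))

    # Step 6: repeat the previous workload unless some workload is
    # receiving less than 75% of its fair share of resources
    starved = any(r < 0.75 * fair for r in ratios.values())
    if last_id in candidates and not starved:
        return last_id

    # Step 7: lowest ratio wins; break ties at random, weighted by
    # effective-throughput
    lowest = min(ratios.values())
    tied = [w for w in candidates if ratios[w] == lowest]
    return random.choices(tied, weights=[effective[w] for w in tied])[0]
```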


Concurrency Settings

The server provides three critical values in the workload JSON response. These are cutechess-count, concurrency-per, and games-per-cutechess. These values are a function of the number of threads and sockets, as well as the nature of the test or tune. They are explained below.

  1. cutechess-count indicates the number of cutechess copies that should be running at one time. For a typical workload, where each engine plays with one thread, cutechess-count will equal the number of sockets on the worker, as provided via --nsockets or -N when starting the Client. If the workload is an SPSA tune using the MULTIPLE method of distributing SPSA points, then cutechess-count will be the maximum number of concurrent games divided by two. Finally, if the previous condition does not hold and the workload uses more than 1 thread for either engine, then cutechess-count will be set to 1.

  2. concurrency-per indicates the number of concurrent games that will be played by any particular running cutechess copy. If the workload is an SPSA tune using the MULTIPLE method of distributing SPSA points, then this value will be 2. Otherwise, it will be the maximum number of concurrent games, which is defined as (threads // cutechess-count) // max(dev_threads, base_threads).

  3. games-per-cutechess is the total number of games to play on each particular running cutechess copy. Once again, SPSA tunes using the MULTIPLE method are a special case, and will play 2 * workload_size games, i.e. one game-pair per unit of workload_size. The general case will instead play 2 * workload_size * concurrency-per games, i.e. one game-pair per unit of workload_size for each possible concurrent game. A sketch combining all three values follows.
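
As a summary, the sketch below derives the three values from the rules above. The parameter names (threads, sockets, workload_size, dev_threads, base_threads, is_spsa_multiple) are assumptions for illustration; in particular, the SPSA branch approximates the maximum number of concurrent games as threads // threads_per_game:

```python
def concurrency_settings(threads, sockets, workload_size,
                         dev_threads, base_threads, is_spsa_multiple):
    # Each game needs as many threads as the more demanding engine
    threads_per_game = max(dev_threads, base_threads)

    if is_spsa_multiple:
        # SPSA tunes using the MULTIPLE method: two concurrent games
        # (one game-pair) per cutechess copy
        max_concurrent = threads // threads_per_game  # assumed approximation
        cutechess_count = max_concurrent // 2
        concurrency_per = 2
        games_per_cutechess = 2 * workload_size
    else:
        # One copy per socket for single-threaded games; otherwise a
        # single copy handles the multi-threaded games
        cutechess_count = sockets if threads_per_game == 1 else 1
        concurrency_per = (threads // cutechess_count) // threads_per_game
        games_per_cutechess = 2 * workload_size * concurrency_per

    return cutechess_count, concurrency_per, games_per_cutechess
```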