A Python module/script for using simulated annealing to solve the problem of dividing different-sized tasks among workers.
This was originally intended as a means to come up with a fair assignment of stories to participants in a writing competition, wherein each participant in the preliminary round of judging would judge a few other participants' stories. The demand/inspiration for this little excursion in numerical methods came from RogerDodger's efforts to orchestrate such writing competitions, which he is formalizing in this project. Shortly before he began said project, he told me he was trying to find a good way of fairly assigning stories to participants for judging (so they would all get, more or less, the same amount of reading/judging work). I encouraged him to use simulated annealing because it is by far the simplest method. After seeing what he came up with, I challenged myself to write something better using Python that he could use. The result was "simanneal.py"...
- There are N "workers", each bringing to the table a "job" that is performed M times by M different workers
- No worker does its own job
- No worker does any job more than once (implied by the first rule)
- Jobs are each characterized by a "weight" in discrete units, and different jobs can have different numbers of work units.
- The end result must be that each worker ends up with close to the same total number of work units. In the write-off problem, the units are (of course) words.
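As a sketch, the rules above can be checked mechanically. The representation here (a dict mapping each worker's index to the set of job indices it performs, with job i being the one worker i brought) is my own assumption, not necessarily the layout simanneal.py uses:

```python
def is_valid(assignment, n_workers, m):
    # Hypothetical representation: assignment[w] is the set of job
    # indices worker w performs; job i is the one worker i brought.
    counts = [0] * n_workers  # how many workers perform each job
    for worker, jobs in assignment.items():
        if worker in jobs:
            return False      # rule: no worker does its own job
        # jobs is a set, so "no job more than once per worker" is
        # enforced by the data structure itself
        for job in jobs:
            counts[job] += 1
    # rule: every job is performed exactly M times
    return all(c == m for c in counts)

# 3 workers, each job judged by M = 2 of the others:
print(is_valid({0: {1, 2}, 1: {0, 2}, 2: {0, 1}}, 3, 2))  # True
print(is_valid({0: {0, 1}, 1: {0, 2}, 2: {1, 2}}, 3, 2))  # False: worker 0 judges itself
```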
Physical Description: The Energy Function For System Partitions
I advocated this approach: think of the system as an ice cube tray with each cell filled with a slightly different quantity of water, and of how agitating the tray evens out the volume of water in each cell. Specifically, each system partition should be treated as a single-file stack of unit-mass particles (each a word in a story assigned to that participant), evenly spaced (at unit length), in a uniform and unitary gravitational field. The potential energy of each particle is thus equal to g (1) times its height (n, the number of particles underneath it) times its mass (1). Hence, for a stack of N particles, the energy is the sum of n for n from 0 to N-1, or simply N*(N-1)/2. The goal of simulated annealing in this problem is to minimize the system-wide total of this quantity by transferring groups of these particles between system partitions at random, accepting or rejecting each transfer based on the resulting difference in energy according to the Metropolis algorithm.
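The stack energy and the Metropolis acceptance rule can be sketched in a few lines of Python (the function names here are my own, not taken from simanneal.py):

```python
import math
import random

def stack_energy(n_words):
    # Potential energy of a stack of n unit-mass particles in a unit
    # gravitational field: each particle's energy equals the number of
    # particles underneath it, so the total is 0 + 1 + ... + (n-1).
    return n_words * (n_words - 1) // 2

def total_energy(loads):
    # System-wide energy: sum over all partitions (workers' word counts).
    return sum(stack_energy(w) for w in loads)

def metropolis_accept(delta_e, temperature, rng=random):
    # Metropolis criterion: always accept a move that lowers the energy;
    # accept an uphill move with probability exp(-dE / T).
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)

print(stack_energy(4))        # 0 + 1 + 2 + 3 = 6
print(total_energy([4, 4]))   # 12 -- lower than the lopsided [6, 2], which gives 16
```

For a fixed total number of words, this energy is minimized exactly when the partitions are as equal as possible, which is why minimizing it evens out the judges' workloads.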
The Full Story
The first attempt that Roger made (and ended up using) was a shoddy Perl script that (in thermodynamic terms) performed a zero-temperature downhill slide to an equilibrium point that might not even be the global minimum. Furthermore, the state space was constrained to states where the number of stories assigned to each participant was the same for all participants, and changes between states involved swapping stories rather than transferring them. In my self-imposed challenge, I allowed workers to have an arbitrary number of jobs, so long as the basic rules of the problem were adhered to. 12-15 hours later, I had two classes written and a script that not only performed the simulated annealing but displayed a bar chart of the system. I even made a video of it in action. Lo and behold, however, it was not only far less efficient than the shoddy script that RogerDodger wrote in less than an hour; it also performed nowhere near as well.
After re-examining what he had written, I realized that the energy difference imposed by transferring a story at random is typically far higher than that of trading two stories at random: a swap's energy change depends only on the difference between the sizes of the two stories, whereas a transfer's depends on the size of a whole story. The likelihood of a transfer decreasing the overall energy of the system is therefore lower, because the disparity in energy between two partitions needs to be far greater before a transfer is downhill. As the temperature decreased, transfers became far less likely to be accepted, and the result was a multitude of wasted iterations proposing random transfers that were rejected due to their extremely low acceptance probability.
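To make the asymmetry concrete, here is a back-of-the-envelope comparison (my own sketch, using the stack energy n(n-1)/2 described above): moving a whole story between two equally loaded partitions costs far more energy than swapping it for a slightly smaller one.

```python
def stack_energy(n):
    return n * (n - 1) // 2  # potential energy of an n-word partition

def transfer_delta(a, b, s):
    # Energy change when a story of s words moves from a partition
    # holding a words to one holding b words.
    return (stack_energy(a - s) + stack_energy(b + s)
            - stack_energy(a) - stack_energy(b))

def swap_delta(a, b, s1, s2):
    # Energy change when a story of s1 words in the first partition
    # is traded for a story of s2 words in the second.
    d = s2 - s1  # net words gained by the first partition
    return (stack_energy(a + d) + stack_energy(b - d)
            - stack_energy(a) - stack_energy(b))

# Two judges holding 10000 words each:
print(transfer_delta(10000, 10000, 3000))    # 9000000
print(swap_delta(10000, 10000, 3000, 2500))  # 250000
```

With equal loads, a transfer's cost grows with the square of the story's size, while a swap's cost grows only with the square of the size *difference*, so swaps remain acceptable at much lower temperatures.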
Then, in the middle of the night, having not showered, I set out to write something far more efficient. Two hours of hacking later, I had simanneal-v2.py: a stupidly simple, list-based script (integer indices used for everything). It outperformed both my old script and RogerDodger's script, bringing the standard deviation in word counts among judges below 50 in an arrangement where the standard deviation in word count among all the stories was around 558. Specifically, this assortment of stories (word count, then title):
- 8779 Consonance
- 12637 Bittersweet Music
- 2735 The WestFillya Waltz
- 2780 Memories of Chaos
- 7961 Bon Bon Bon Bon Bon Bon
- 5864 His Heart Too Full For Words
- 2579 The Manticore Problem
- 5799 Joie de Vivre
- 2557 The Sound of Raindrops on Slate
- 2917 My Sunshine
- 7114 You'll Never Know Until You Try It
- 2583 Every Night Is a Swan Song
- 2525 A Deck with No Hearts
- 2835 War is Hay
- 2786 The End of the Season
- 6776 The Good You Might Do
- 3378 Candy Crisp
- 3753 Made of Dreams
- 4222 Melody of Solace
- 3186 Unwanted Song
- 3017 Can I keep it? Please, oh PLEASE?
- 2545 On Loyalty
The cooling schedule I'm most fond of is exponential decay with a time constant proportional to the total number of workers. With it, runs usually took a minute or two (or five) for the standard deviation to drop below 80, and far longer to crystallize into ultra-optimal configurations.
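A minimal annealing loop with that cooling schedule might look like the following. This is a sketch under my own simplifications, not the actual simanneal-v2.py: it shifts raw word counts between two workers rather than moving whole stories subject to the assignment rules, and all names are hypothetical.

```python
import math
import random

def anneal(loads, steps, t0, tau):
    # Exponential cooling: T(t) = t0 * exp(-t / tau), with tau chosen
    # proportional to the number of workers in the real problem.
    rng = random.Random(0)
    energy = lambda n: n * (n - 1) // 2  # stack energy of a partition
    for t in range(steps):
        temp = t0 * math.exp(-t / tau)
        i, j = rng.sample(range(len(loads)), 2)
        if loads[i] < 4:
            continue  # nothing sensible to move
        s = rng.randint(1, loads[i] // 4)  # words to move from i to j
        delta = (energy(loads[i] - s) + energy(loads[j] + s)
                 - energy(loads[i]) - energy(loads[j]))
        # Metropolis acceptance: downhill always, uphill with prob exp(-dE/T)
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            loads[i] -= s
            loads[j] += s
    return loads

loads = anneal([8779, 12637, 2735, 2780], 20000, 1000.0, 2500.0)
print(loads)  # total word count is preserved; the spread shrinks as T falls
```

As the temperature decays, uphill moves die out and the loop degenerates into a greedy equalizer, which is where the final "crystallization" happens.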