Data Generation Workloads

Andrew Grant edited this page Mar 29, 2024 · 1 revision

The Create Datagen button on the OpenBench sidebar can be used to create Data Generation workloads. The options on this page are very similar to those on the Create Test page. Those options will not be reiterated here; refer to the "Creating Tests" wiki page for that information. The datagen page adds a few new options, which are explained below. But before explaining the new options, we must first go over the "genfens" interface of OpenBench.

Genfens Interface

When testing engines, opening books are used. These books tend to have at most a few hundred thousand lines. But for Data Generation, it is very likely that you'll be playing tens of millions of games. A book with that many lines would be cumbersome to upload and manage, and even more so to have workers download. As a result, OpenBench has a genfens interface, which allows you to build your opening books on the fly, internally in your engine. To do this, the Dev engine in a Datagen workload will be executed from the command line as follows:

./engine "genfens N seed S book <None|Books/my_book.epd> <?extra>" "quit"

N is the number of openings that you must generate. S is a seed, given uniquely to each workload and worker, to allow you to seed random number generation better. The book argument is either the path to an opening book, which you may use as a starting point for your generation, or None. Lastly, there is room for any additional arguments that you want, which can be provided when creating the Datagen workload.
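As an illustration of the command format above, here is a minimal Python sketch of how an engine-side parser might split the genfens string into its parts. The names `GenfensRequest` and `parse_genfens` are hypothetical and not part of OpenBench, and the sketch assumes that the book path contains no spaces.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GenfensRequest:
    count: int            # N: number of openings to generate
    seed: int             # S: seed unique to the workload and worker
    book: Optional[str]   # path to the book, or None when "None" was passed
    extra: List[str] = field(default_factory=list)  # trailing user arguments


def parse_genfens(command: str) -> GenfensRequest:
    """Parse 'genfens N seed S book <None|path> <?extra>'."""
    tokens = command.split()
    well_formed = (
        len(tokens) >= 6
        and tokens[0] == "genfens"
        and tokens[2] == "seed"
        and tokens[4] == "book"
    )
    if not well_formed:
        raise ValueError(f"malformed genfens command: {command!r}")
    book = None if tokens[5] == "None" else tokens[5]
    return GenfensRequest(int(tokens[1]), int(tokens[3]), book, tokens[6:])
```

For example, `parse_genfens("genfens 4 seed 123 book None")` yields a request for four openings with seed 123, no book, and no extra arguments.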

It is expected that your engine will print N lines to stdout, each of the form info string genfens <fen>. OpenBench will run multiple copies of your engine at once to do this, so there is no need to implement any sort of multi-threaded generation internally. Additionally, it might be a good idea to limit or disable the vast majority of UCI output during generation, just to reduce overhead.
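The output contract can be sketched in a few lines of Python. This is only an illustration: `emit_genfens` and the `make_fen` callback are hypothetical names, and `make_fen` stands in for whatever opening generation your engine actually performs.

```python
import random
import sys
from typing import Callable, Optional, TextIO


def emit_genfens(count: int, seed: int,
                 make_fen: Callable[[random.Random], str],
                 out: Optional[TextIO] = None) -> None:
    """Write `count` lines of the form 'info string genfens <fen>'."""
    out = sys.stdout if out is None else out
    rng = random.Random(seed)  # seeded, so a given seed reproduces the same openings
    for _ in range(count):
        out.write(f"info string genfens {make_fen(rng)}\n")
    out.flush()  # make sure the reader sees every line promptly
```

Deriving all randomness from the seed keeps each worker's openings reproducible, while the unique seed per workload and worker keeps them distinct from one another.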

Lastly, to detect stalled and crashed engines, OpenBench enforces a timeout: if more than 15 seconds have elapsed since the last opening line was obtained, the engines will be aborted, a client error will be thrown, and an error will be reported back to the OpenBench server.
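To test your engine locally against this kind of stall detection, a watchdog along the following lines may help. It is a sketch under assumptions, not OpenBench's actual client code: `drain_openings` and `run_genfens` are hypothetical names, and only the 15 second limit and the output prefix come from the description above.

```python
import queue
import subprocess
import threading
import time
from typing import List, Optional

PREFIX = "info string genfens "


def drain_openings(lines: "queue.Queue[Optional[str]]", count: int,
                   stall_timeout: float = 15.0) -> List[str]:
    """Collect `count` openings; fail if none arrives within `stall_timeout` seconds."""
    fens: List[str] = []
    deadline = time.monotonic() + stall_timeout
    while len(fens) < count:
        remaining = deadline - time.monotonic()
        try:
            if remaining <= 0:
                raise queue.Empty
            line = lines.get(timeout=remaining)
        except queue.Empty:
            raise TimeoutError(f"no opening for {stall_timeout} seconds") from None
        if line is None:  # sentinel: the engine closed stdout early
            raise RuntimeError("engine exited before producing all openings")
        if line.startswith(PREFIX):
            fens.append(line[len(PREFIX):].strip())
            deadline = time.monotonic() + stall_timeout  # reset only on an opening line
    return fens


def run_genfens(cmd: List[str], count: int, stall_timeout: float = 15.0) -> List[str]:
    """Run the engine, feeding its stdout to drain_openings, and always stop it."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    lines: "queue.Queue[Optional[str]]" = queue.Queue()

    def pump() -> None:
        for line in proc.stdout:
            lines.put(line)
        lines.put(None)  # signal end of output

    threading.Thread(target=pump, daemon=True).start()
    try:
        return drain_openings(lines, count, stall_timeout)
    finally:
        proc.kill()
        proc.wait()
```

A call such as `run_genfens(["./engine", "genfens 4 seed 123 book None", "quit"], 4)` would then either return four FENs or raise if the engine stalls or exits early.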

Data Generation Options

As stated earlier, almost everything on the Datagen creation page is seen on the Test creation page. One new addition is the Genfens Args text box. This allows you to pass additional arguments through the genfens interface. Perhaps your implementation will allow you to pass a custom depth, custom line length, or custom threshold. You are free to parse those arguments however you like. Including double quotes in your arguments is not advised, since the whole genfens command is passed to the engine as a single double-quoted argument.
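One possible way to handle the Genfens Args text is as whitespace-separated key/value pairs. The sketch below assumes that convention; the option names `depth`, `length`, and `threshold` and their default values are purely illustrative and are not defined by OpenBench.

```python
from typing import Dict

# Hypothetical defaults; pick whatever your own generator needs.
DEFAULTS: Dict[str, int] = {"depth": 8, "length": 10, "threshold": 300}


def parse_extra_args(extra: str) -> Dict[str, int]:
    """Parse text such as 'depth 12 threshold 150' on top of DEFAULTS."""
    options = dict(DEFAULTS)
    tokens = extra.split()
    if len(tokens) % 2 != 0:
        raise ValueError(f"expected key/value pairs, got: {extra!r}")
    for key, value in zip(tokens[0::2], tokens[1::2]):
        if key not in options:
            raise ValueError(f"unknown genfens option: {key!r}")
        options[key] = int(value)
    return options
```

Rejecting unknown keys makes a typo in the text box fail loudly instead of silently running with the defaults.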

There is also a Yes or No option called Play Reverses. When using fixed nodes or fixed depth, it is expected that a properly written engine will produce identical results for the games. As a result, playing reverses during self-play does not make sense. However, adversarial play, or playing against a different branch of the same engine, would produce different games, and perhaps the reverses would be desired. By default the option is set to No, to avoid possible wasted effort.

Lastly, the Opening Book, while present on all other workload pages, has a different purpose here. The book will be downloaded, and the name of the book will be passed to the genfens interface. This allows you to use the lines in the opening book as a starting point for your generation algorithm, or perhaps to simply return the book's original lines. The seed value is nicely stepped along, such that it matches the opening index of the book in a typical workload.
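For the simplest case of returning the book's own lines, a sketch could look like the following. It assumes that the seed is treated as a zero-based index into the book and that indexing wraps around at the end of the book; both are assumptions of this example rather than documented OpenBench behavior, and `book_openings` is a hypothetical helper name.

```python
from typing import List


def book_openings(book_lines: List[str], count: int, seed: int) -> List[str]:
    """Return `count` book positions starting at index `seed`, wrapping around."""
    positions = [line.strip() for line in book_lines if line.strip()]
    if not positions:
        return []
    return [positions[(seed + i) % len(positions)] for i in range(count)]
```

In an engine, `book_lines` would typically come from reading the book file whose path was passed after the `book` keyword.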

Example

$ ./torch "genfens 4 seed 123 book None" "quit"
Torch by A. Grant, F. Eggers, K. Kahre, M. Whiteley, J. Honnold from Chess.com
info string found 0 book lines
info string genfens rnbqkbnr/1p1p1p2/2p5/p3p1pp/P2P1P2/8/1PP1P1PP/RNBQKBNR w Kkq - 0 6
info string genfens rnbqk1nr/p1pp1pp1/3b3p/1p2p3/3P4/N1P2NP1/PP2PP1P/R1BQKB1R w KQkq - 3 6
info string genfens rnbqk2r/p1pp1p1p/1p3n2/2b1p1p1/2N3P1/3P1P2/PPP1P2P/R1BQKBNR w KQkq - 4 6
info string genfens rnb2bnr/pp1ppkp1/1qp4p/5P2/5P2/8/PPPPP2P/RNBQKBNR w KQ - 1 6
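
Before submitting a workload, it can be useful to sanity-check output like the above. The following sketch extracts the FENs from the example output and checks only their shape (six fields, eight ranks of eight squares, a valid side to move); it does not verify that a position is legal. The helper names are hypothetical.

```python
from typing import Iterable, List

PREFIX = "info string genfens "

SAMPLE_OUTPUT = """\
Torch by A. Grant, F. Eggers, K. Kahre, M. Whiteley, J. Honnold from Chess.com
info string found 0 book lines
info string genfens rnbqkbnr/1p1p1p2/2p5/p3p1pp/P2P1P2/8/1PP1P1PP/RNBQKBNR w Kkq - 0 6
info string genfens rnbqk1nr/p1pp1pp1/3b3p/1p2p3/3P4/N1P2NP1/PP2PP1P/R1BQKB1R w KQkq - 3 6
info string genfens rnbqk2r/p1pp1p1p/1p3n2/2b1p1p1/2N3P1/3P1P2/PPP1P2P/R1BQKBNR w KQkq - 4 6
info string genfens rnb2bnr/pp1ppkp1/1qp4p/5P2/5P2/8/PPPPP2P/RNBQKBNR w KQ - 1 6
"""


def extract_fens(output_lines: Iterable[str]) -> List[str]:
    """Keep only genfens lines and strip the prefix."""
    return [line[len(PREFIX):].strip()
            for line in output_lines if line.startswith(PREFIX)]


def looks_like_fen(fen: str) -> bool:
    """Shape check only: field count, rank count, squares per rank, side to move."""
    fields = fen.split()
    if len(fields) != 6 or fields[1] not in ("w", "b"):
        return False
    ranks = fields[0].split("/")
    if len(ranks) != 8:
        return False
    return all(
        sum(int(ch) if ch.isdigit() else 1 for ch in rank) == 8
        for rank in ranks
    )


fens = extract_fens(SAMPLE_OUTPUT.splitlines())
```

Here the banner and the `found 0 book lines` message are ignored, and `fens` holds the four generated positions.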