[blueprint] support multiple output file formats #1965
Merged
Part of milestone 2 of #1842
Add the ability to output in either "pretty" format (equivalent to the old behavior) or "minimal" format (the new default). Minimal blueprints don't include any annotations or "shadows" of building footprints, but are much faster to read and write. This allows us to optimize for the use case where a player is generating blueprints to play back later without manual editing. More advanced users that want to edit the generated blueprints should continue to use "pretty" format.
I am confident that there was no behavior change for "pretty" blueprints since the quickfort ecosystem integration/regression tests continue to pass without modification. I did make a few modifications to the ecosystem test harness to make it more efficient, but I did not need to change any of the golden regression test files.

This PR involved rewriting the core blueprint data structures and algorithm to allow the plugin to support multiple output formats. The new data structures will also support features planned for future milestones.

My first attempt used sparse `std::map`s and `std::string`s for all data, but that ended up being much slower than the old implementation and, more importantly, ran out of memory and crashed on larger maps.

My second attempt used sparse `std::map`s and `const char *`, which significantly cut down on runtime, but was still too memory hungry. This was actually not as complex as I feared since pointers to string literals in the code can be passed up the stack without having to dynamically copy them into the heap. I did implement a static cache for the few strings that were constructed on the stack so that the pointers would remain valid when returned from functions.

My third attempt used sparse `std::map`s for the higher-level `z` and `y` coordinate structures but a pre-allocated `std::vector` for the `x` coordinate structure (the one that actually holds the `const char *` pointers). This was the key that brought both the runtime and memory utilization down below the original implementation.

Testing on a 16x16 embark for maximum scale (that's 768 by 768 by 198 = 116 million tiles), I get the following numbers:
- old implementation: 18s, 1.1G memory
- new implementation, pretty format: 17s, 0.6G memory
- new implementation, minimal format: 8s, 0.6G memory
The difference in runtime between the new pretty and minimal formats is mostly due to I/O.
I did a further experiment using only `std::vector`s and no maps at all. This brought the runtime down by one second for the minimal format, but memory utilization stayed the same. However, it significantly complicated the code and required a lot of manual indexing and careful memory management. I decided that a one-second savings on a 16x16 embark is just not worth the complexity cost. For the common, non-pathological case, the blueprint area will be much smaller and runtime will be near-instantaneous. I don't need to optimize for more speed. I'm satisfied that the new data structures have half the memory footprint of the old data structures, and I am confident that anything that worked before will continue to work.

The test is a little biased because most of the map was solid wall, which incurs a memory cost for the old implementation but is zero cost for the new implementation (we now only allocate memory if we have something to store). If the entire map were hollowed out, the new implementation would likely be on par with the memory consumption of the old algorithm.
I will be continuing to make significant structural changes to the blueprint plugin in the upcoming weeks and months, so I don't want to spend your time on a review quite yet. Perhaps we can review once we get close to releasing a new version of DFHack, or once meta blueprints are implemented in milestone 5, whichever comes first.