Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

buildingplan can crash when world->selected_building contains garbage data #1762

Closed
myk002 opened this issue Jan 27, 2021 · 4 comments · Fixed by DFHack/scripts#246
Closed
Assignees
Projects

Comments

@myk002
Copy link
Member

myk002 commented Jan 27, 2021

I've seen multiple, but rare, crashes in buildingplan_room_hook::getNoblePositionOfSelectedBuildingOwner (in the interposed feed handler) where world->selected_building->getType() is called

From inspection of the stack trace, world->selected_building appears to be garbage. This is unfortunate since if it's non-null it is supposed to be usable.

here's the dump of the garbage data:
(gdb) print *world->selected_building $3 = {<DFHack::virtual_class> = {<No data fields>}, _vptr.building = 0xd2d2d2d2d2d2d2d2, x1 = 0, y1 = -757935406, centerx = -757935406, x2 = -757935406, y2 = 0, centery = 0, z = 1, flags = {whole = 0, bits = {exists = 0, site_blocked = 0, room_collision = 0, anon_1 = 0, justice = 0, almost_deleted = 0, in_update = 0, from_worldgen = 0}}, mat_type = 19, mat_index = -757935406, room = {extents = 0x7fffe0425d30, x = 68, y = 0, width = 0, height = 0, static _identity = {<DFHack::compound_identity> = {<DFHack::constructed_identity> = {<DFHack::type_identity> = { _vptr.type_identity = 0x7ffff7df3ff0 <vtable for DFHack::struct_identity+16>, size = 24}, allocator = 0x7ffff7a89114 <void* df::allocator_fn<df::building_extents>(void*, void const*)>}, static list = 0x7ffff7fc96a0 <DFHack::dfhack_lua_viewscreen::_identity>, next = 0x7ffff7f81740 <df::building_drawbuffer::_identity>, dfhack_name = 0x7ffff7caf6f1 "building_extents", scope_parent = 0x0, scope_children = std::vector of length 0, capacity 0, static top_scope = {<std::_Vector_base<DFHack::compound_identity*, std::allocator<DFHack::compound_identity*> >> = { _M_impl = {<std::allocator<DFHack::compound_identity*>> = {<__gnu_cxx::new_allocator<DFHack::compound_identity*>> = {<No data fields>}, <No data fields>}, <std::_Vector_base<DFHack::compound_identity*, std::allocator<DFHack::compound_identity*> >::_Vector_impl_data> = {_M_start = 0x7fffe06e06d0, _M_finish = 0x7fffe06e3cb0, _M_end_of_storage = 0x7fffe06e46d0}, <No data fields>}}, <No data fields>}}, parent = 0x0, children = std::vector of length 0, capacity 0, has_children = false, fields = 0x7ffff7edc7e0 <df::building_extents_fields>}}, age = 1, race = -11566, id = -1073589904, jobs = std::vector of length -7, capacity -406913472656310278 = {<error reading variable>

@myk002
Copy link
Member Author

myk002 commented Jan 27, 2021

I have only reproduced this when creating or destroying buildings with quickfort, so I'm thinking race condition. Plugins don't run while scripts are executing, right?

My first thought of a workaround is to get the building from the cursor position instead of the world pointer, but I'm going to spend a little more time investigating before coding a fix.

@lethosor
Copy link
Member

0xd2d2d2d2d2d2d2d2 as a vtable pointer likely indicates a use-after-free issue - MALLOC_PERTURB_=45 from the DFHack launcher would set uninitialized memory to 0xd2, and freed memory to 0x2d.

DF itself doesn't always initialize world->selected_building, which is why Gui::getSelectedBuilding() and buildingplan's custom logic check for ui_sidebar_mode::QueryBuilding first. I would be inclined to suggest that buildingplan at least replace the direct field access with a call to getSelectedBuilding(), but it doesn't look to me like that would add any extra checks; it would just repeat some of the existing buildingplan-side checks. (buildingplan's checks are more strict in any case, because it only cares about q mode)

We'd probably need to track down where quickfort is creating/destroying buildings - does it seem to occur with either? I am particularly interested in quickfort destroying buildings - how does it do that (and why / in what situations)?

Plugins don't run while scripts are executing, right?

I may be forgetting something, but my understanding is:

  • Commands that could modify game data (which includes all scripts and any plugin commands that opt in) grab a mutex (through CoreSuspender), which would block the DF simulation thread, and would run any tasks waiting on the mutex in sequence, not in parallel. Commands can call other commands, though (but might deadlock if those commands also try to synchronize with DF - not sure)
  • vmethod hooks are called directly by DF from the simulation thread. They could also be called from other commands by calling DF vmethods directly (e.g. screen:render())

getNoblePositionOfSelectedBuildingOwner() appears to be used exclusively by vmethod hooks in buildingplan - are you calling any vmethods directly from quickfort, perhaps?

@myk002
Copy link
Member Author

myk002 commented Jan 27, 2021

I don't call vmethods directly, but when applying #query blueprints, I do set df.global.ui.main.mode = df.ui_sidebar_mode.QueryBuilding, which may bypass any initialization the UI might do.

Then I proceed to call guidm.setCursorPos(pos) and dfhack.gui.refreshSidebar(), which would lead to the screen being re-rendered.

so, possible fixes:

  1. in quickfort, don't set the mode directly, but rather feed keys to the UI to do it
  2. in quickfort, manually initialize world->selected_building properly whenever we move the cursor
  3. in buildingplan, use Buildings::findAtTile() to get the current building instead of the world->selected_building

I think (1) is the best option. Switching to query mode from any screen is easy -- just escape until in Default and then 'q'. Getting back to whatever mode we started in might be more complicated, though. 'k', 't', 's', 'bo'..what else allows the cursor to be active? Maybe I should just change the validation condition from 'cursor is active' to 'screen is in look, query, or inspect mode' (not sure if that's what they're called -- whatever 't', 'q', and 'k' are).

(2) sounds brittle and (3) will both slow down buildingplan (linear search. ugh) and won't solve the problem for any other plugins that make the same reasonable assumption that buildingplan does -- that if you're in query mode and world->selected_building is non-null, then it's usable.

@lethosor
Copy link
Member

  • in quickfort, don't set the mode directly, but rather feed keys to the UI to do it

I see I'm late, but for the record, I definitely think this would help.

  • in quickfort, manually initialize world->selected_building properly whenever we move the cursor

I had a related idea: just set selected_building to null every time you change UI state. Maybe even just once would work. The downside is that it's entirely possible that there are other fields with similar issues, and in most cases they would probably pop up less often. Relying on a known good implementation (i.e. triggering DF's, which at least hopefully is correct) is IMO the better approach.

As a side note: Buildings::findAtTile() does cache results to an extent, so for a map that's not changing, it should average to constant time.

@lethosor lethosor added this to To do in 0.47.04-r5 via automation Jan 29, 2021
@lethosor lethosor moved this from To do to Done in 0.47.04-r5 Jan 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
0.47.04-r5
  
Done
Development

Successfully merging a pull request may close this issue.

2 participants