Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ruby win64 crash #1023

Closed
9 tasks done
lethosor opened this issue Nov 27, 2016 · 17 comments
Closed
9 tasks done

Ruby win64 crash #1023

lethosor opened this issue Nov 27, 2016 · 17 comments
Assignees
Milestone

Comments

@lethosor
Copy link
Member

lethosor commented Nov 27, 2016

This came up on IRC around alpha2, and someone reported an exterminate crash today that's likely the same issue.

A couple things to check, from @jjyg:

  • Does calling vmethods work? Try rb df.curview.feed_keys(:SELECT)
  • Alignment - try rb p df.world

Some scripts that have come up (and should probably be checked to make sure they don't crash before r1, etc.):

@ab9rf
Copy link
Member

ab9rf commented Nov 27, 2016

I am seriously considering rewriting autofarm in C++, just because I use it so much and I don't want to dependent on Ruby for it to work. It's not like it would be hard to do in C++.

@lethosor
Copy link
Member Author

Hopefully this sort of thing happens rarely (besides this, the only time I remember having to fix Ruby support was when OS X stopped distributing Ruby 1.8, before the ruby plugin supported 2.0), but I won't object if you decide writing it in C++ would be better.

@cc-corey
Copy link

I believe I'm getting the same crash with source

@lethosor
Copy link
Member Author

Yeah, this issue is still open.

Can you try the commands listed in the first checklist and see if they crash?

@cc-corey
Copy link

cc-corey commented Dec 13, 2016

Some testing. The info from adaptation might show a way forward.

exterminate - crashes
growcrops - crashes
autofarm - crashes
ban-cooking - crashes
deathcause - crashes
adaptation - does not immediately crash on just running "adaptation". Typing "adaptation" show gives error messages but "adaptation show him" crashes
source - crashes

I would also not that sources gave me an error about not having ruby framework (I don't have the exact error anymore) in the old version of dfhack. I could load that up if it would be helpful. The crash is immediate with no error message or other feedback.

Also tested

autounsuspend - without options gives "stopped" adding start causes crash

@lethosor
Copy link
Member Author

You definitely have a ruby library. 0.43.05-alpha1 didn't, but alpha2 does.

Can you try the two commands in the first checklist from above?

  • rb df.curview.feed(:SELECT)
  • rb p df.world

@cc-corey
Copy link

Yes, I understood that was the difference. Just wanted to add the info in case it helps people understand where things are.

rb df.curview.feed(:SELECT) - crashes
rb p df.world - crashes

fyi just running rb does not crash.

@lethosor
Copy link
Member Author

Yeah, rb by itself doesn't do anything. That's good to know, though. I was hoping it would be just one, but it sounds like a couple things are issues.

What about stuff like rb p df.ui, rb p df.current_weather, etc.?

@ab9rf
Copy link
Member

ab9rf commented Dec 13, 2016

autounsuspend with no options doesn't access anything in DF memory; it just checks the value of a variable in ruby namespace. But autounsuspend stop calls df.onupdate_register, and that's presumably where the crash occurs.

There's obviously still a problem with ruby calling at least some entry points in the DF main code, including apparently STL utility methods (which are used by the Ruby formatters for the world object, for example).

Do relatively "simple" global variables (e.g. rb p df.cursor, rb p df.cur_year) work?

@lethosor
Copy link
Member Author

lethosor commented Dec 14, 2016

onupdate_register just calls a few functions in ruby.cpp (setting a few onupdate_ variables):

        def onupdate_register(descr, ticklimit=nil, initialtickdelay=0, &b)
            raise ArgumentError, 'need a description as 1st arg' unless descr.kind_of?(::String)
            @onupdate_list ||= []
            @onupdate_list << OnupdateCallback.new(descr, b, ticklimit, initialtickdelay)
            DFHack.onupdate_active = true
            if onext = @onupdate_list.sort.first
                DFHack.onupdate_minyear = onext.minyear
                DFHack.onupdate_minyeartick = onext.minyeartick
            end
            @onupdate_list.last
        end

cur_year and related globals look reasonable, although I don't know if they're right:

        <global-address name='cur_year' value='0x141904248'/>
        <global-address name='cur_year_tick' value='0x141904240'/>
        <global-address name='cur_year_tick_advmode' value='0x141884110'/>
        <global-address name='cur_season_tick' value='0x1413030d4'/>
        <global-address name='cur_season' value='0x1413030e5'/>

Can you try printing those (rb p df.cur_year, etc.) and see if they're right?

Edit: @ab9rf confirmed a crash with rb p df.cur_year on IRC.

@cc-corey
Copy link

cc-corey commented Dec 14, 2016

Okay, here is the next round of testing.

rb p df.cursor - crash, with our without a cursor ('k')
rb p df.ui - crash
rb p df.cur_year - crash
rb p df.current_weather - crash
rb p df.cur_year_tick - crash
rb p df.cur_year_tick_advmode - crash
rb p df.cur_season - crash
rb p df.cur_season_tick - crash

I'm sorry to say it looks like they are all off. I only tested on a single save game but I have no reason to believe that anything is wrong with it. All of the lua stuff I've used is working great and dfhack doesn't otherwise seem to be having issues. Is there a log file I could go looking for to see if that provides any additional insight? Did I miss anything you wanted tested?

And now back to a bit of playing df! Cheers

@lethosor
Copy link
Member Author

Oh, yeah, we figured out the issue with globals on IRC, so I'm not surprised that all of those are affected. Basically, somewhere in the function that translates global names to addresses, the upper 32 bits are truncated (e.g. 0x1413030e5 becomes 0x413030e5), which is bad.

@ab9rf
Copy link
Member

ab9rf commented Dec 14, 2016

I just spent some time with the MSVC debugger poking around in this. The problem is the use of rb_uint2inum to translate pointers to a Ruby value (e.g. https://github.com/DFHack/dfhack/blob/develop/plugins/ruby/ruby.cpp#L583). This is presumably because in win64, "long" is 32 bits, and thus the "long long" type is required to get all 64 bits in win64. (I'm told that on linux64, "long" is 64 bits.) The result is to truncate pointers to 32 bits when they're converted to a Ruby value, which makes them invalid for later reference.

I was able to make a small bit of progress by replacing rb_uint2inum to rb_ull2inum in that one place. With this change, DFHack.get_global_address returns the full 64-bit address. But there are other places in ruby.cpp that also need to be changed, presumably to use rb_num2ull (instead of rb_num2ulong) everywhere that pointers are being handled. Furthermore, this might result in code that doesn't work properly on linux.

I'm not up for trying to test this further tonight, so I'll leave this here for anyone who wants to pick it up, or I'll try to work on it further either later tonight or tomorrow. Someone willing to test whether using rb_ull2inum and rb_num2ull cause breakage on linux would be helpful.

p.s. Thanks @lethosor for your help with this on irc.

@ab9rf
Copy link
Member

ab9rf commented Dec 14, 2016

I've made the changes suggested above and with these changes ruby appears to work on win64. I was able to do rb p df.cur_year and rb p df.world just fine, and starting autofarm in a loaded world worked without any trouble at all.

I have the modifications in a branch of my repo, at 36c12d8.

@lethosor
Copy link
Member Author

lethosor commented Dec 14, 2016

Those changes look like they work on OS X (both builds), so they ought to work on Linux too.

Interesting issue: rb df.curview.feed(:SELECT) is resulting in an error for me, and rb df.curview.feed(0) crashes, even before your changes. I don't remember that happening in 0.43.05-alpha2.

Edit: that's because the actual method is feed_keys, which works just fine.

@ab9rf
Copy link
Member

ab9rf commented Dec 14, 2016

All of the scripts named earlier in this issue work (or at least do not crash) for me on win64 with the patch in #1041

@lethosor
Copy link
Member Author

Cool, I'll close this. Hopefully we can get a build up fairly soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants