regressions with erlang half word emulator #3

Closed
hungryblank opened this issue Jun 27, 2011 · 13 comments

Comments

@hungryblank

We're currently using jiffy in our application and we see some test regressions when using the half word emulator.

Specifically, we're observing that some function clauses stop matching in other sections of our code. If we exclude the jiffy calls by supplying the parsed structures directly, all our tests go back to passing. This behavior is observed only with the half word emulator; everything works fine on the pure 32-bit or 64-bit VMs.

We're unable to provide a failing test that can be published, but we can provide access to an environment where it is possible to observe the behavior.

@davisp
Owner

davisp commented Jun 27, 2011

Most intriguing. Is Jiffy the only NIF you're using in the VM? I haven't had any reports of this before, but I'm not certain how popular the half word emulator is. I'll try reproducing it with some tests locally before asking for access.

Also, when you say function clauses stop matching, do you mean that you call the function and it throws a function clause error even though everything looks kosher?

@hungryblank
Author

Here is what I mean by function clauses no longer matching.

Given the function

    by_id(1000) ->
        [{color, red}, {x, 2}, {y, 3}];
    by_id(1001) ->
        [{color, blue}, {x, 3}, {y, 4}];
    by_id(1002) ->
        [{color, green}, {x, 5}, {y, 7}];
    by_id(Any) ->
        throw({not_found, [Any]}).

When using the half word emulator, calling by_id(1001) actually throws {not_found, [Any]}. On the 32-bit or 64-bit VMs everything is normal: the second clause is matched and the return value is correct.

This happens only when jiffy calls are involved, but we're still not sure whether jiffy is the actual cause. What we tried was inlining the result of the jiffy encoding/decoding and commenting out the actual call, and this made our tests pass.

The only other NIF we're using is https://github.com/vinoski/erlsha2 which does not seem to cause problems. I'm aware that Steve Vinoski was in touch with the rebar team to have his NIF compiled properly by rebar: http://lists.basho.com/pipermail/rebar_lists.basho.com/2011-May/000821.html

I also wrote to the erlang questions mailing list looking for clarification on this case; you can follow the thread here: http://erlang.org/pipermail/erlang-questions/2011-June/059650.html

Thanks for your help

@davisp
Owner

davisp commented Jun 27, 2011

Most odd indeed. I don't think the rebar thread would affect this, but I could be wrong. So far as I can tell, it appears that a term returned from Jiffy is being treated as something weird. Could you by chance try logging what gets passed as Any in that call to see if it's something funky (i.e. it doesn't print correctly, or is something totally unexpected)?

It is possible that I've gone and done something weird in Jiffy that's only exercised with the half word emulator. I do use an erlang list internally as a stack that could be returning something it shouldn't. I had a problem with this at one point but have since fixed it (or at least I thought I did).

I was having problems compiling Erlang R14B03 with the halfword emulator on OS X 10.6 this morning. Have you heard of anything like that? I didn't see anything in some brief time with Google.

@hungryblank
Author

What's odd is that what is passed as Any (and printed in the stack trace) is actually 1001, or at least it prints as 1001. It's as if the second function clause was forgotten (or what prints as 1001 is not exactly 1001), and calls end up in the Any catch-all.
Keep in mind that this is not the real code; we have 20 to 50 clauses.

If you have doubts about the internals of jiffy, I'd recommend following up on erlang questions, where someone with more knowledge might be able to help you out.

I'm sorry, but I can't help you get the half word emulator working on OS X, as we work on Linux. I think a Linux VM on OS X would be a quick way to set everything up.

@davisp
Owner

davisp commented Jun 27, 2011

What's more likely here isn't that the function clause is missing but that the term that represents 1001 is being misinterpreted during pattern matching. This could theoretically be a bug in something Jiffy does, though it seems odd that it's printed correctly later on. Alternatively, it could be a weird bug in the internals somewhere. It's hard to say for certain without being able to poke at it more. I'll try to get it going tonight in a VM or some other manner so I can look more closely.

@hungryblank
Author

I have a small test case that consistently reproduces the issue.

The test passes on the 64-bit VM, while on the half word emulator it fails with

    k_foos_tests: load_game_from_json_test (module 'k_foos_tests')...*failed*
    ::throw:{oh_no,id_not_found,[4000]}
        in function k_foos:by_id/1

Hope it helps

@davisp
Owner

davisp commented Jun 28, 2011

That is super awesome. I got distracted by trying to debug something for work this evening so I'll try and get to it tomorrow morning or tomorrow night.

@hungryblank
Author

Latest finding on this one: on the half word emulator, the numbers produced by jiffy are not quite the numbers one would expect:

    1> {ok, {Decoded}} = jiffy:decode(<<"{\"foo\": 5000}">>).
    {ok,{[{<<"foo">>,5000}]}}
    2> N = proplists:get_value(<<"foo">>, Decoded).
    5000
    3> N =:= 5000.
    false
    4> N == 5000.
    true
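
For context on why the session above is surprising, here is a minimal sketch of how the two comparison operators normally differ in Erlang. `==` compares numbers by value, while `=:=` (and clause-head matching on a literal) also requires the same type. The module name `eq_demo` and the cut-down `by_id/1` are illustrative only; they are not code from jiffy or from the application discussed here.

```erlang
-module(eq_demo).
-export([run/0]).

%% Clause heads match literals exactly, the same way =:= compares.
by_id(1001) -> found;
by_id(_Any) -> not_found.

run() ->
    true      = (1001 == 1001.0),   % value comparison ignores int vs float
    false     = (1001 =:= 1001.0),  % exact comparison does not
    found     = by_id(1001),        % integer literal matches the first clause
    not_found = by_id(1001.0),      % float falls through to the catch-all
    ok.
```

In the session above, N prints as the integer 5000 rather than 5000.0, so an ordinary integer/float mismatch would not explain the result. That seems to point to a term that only looks like 5000.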

@davisp
Owner

davisp commented Jun 29, 2011

Whoa! That's pretty awesomely crazy. That smaller case should also help me narrow it down further. I haven't gotten too far on this because it turns out I don't have a 64-bit machine at home. I'll compile the halfword emulator here at work before heading out, to try to find the bug or (hopefully) find a much smaller NIF example that reproduces the behavior and shows that it's not me being nutty.

@hungryblank
Author

Sorry for pestering; it's just that I'm intrigued, and my C isn't good enough to dig deeper.

I also found the following: the weirdness stops once 2^27 (134217728) is reached. Look at this:

    1> jiffy:decode(<<"{\"foo\": 134217727}">>) =:= {ok,{[{<<"foo">>, 134217727}]}}.
    false
    2> jiffy:decode(<<"{\"foo\": 134217728}">>) =:= {ok,{[{<<"foo">>, 134217728}]}}.
    true
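
The boundary in those two calls lines up with the largest "small" (immediate) integer on a VM whose terms are 32 bits wide, which is what the half word emulator uses. Below is a hedged sketch of the arithmetic, assuming BEAM reserves 4 tag bits per word for immediate small integers. The module and function names are hypothetical.

```erlang
-module(smallint_demo).
-export([max_small/1, is_small/2]).

%% Largest signed integer that fits in an immediate term, assuming
%% 4 of the word's bits are tag bits (an assumption about BEAM's layout).
max_small(WordBits) ->
    (1 bsl (WordBits - 4 - 1)) - 1.

%% True when N would fit in an immediate for the given word size.
is_small(N, WordBits) ->
    Max = max_small(WordBits),
    N >= -(Max + 1) andalso N =< Max.
```

Under that assumption, 134217727 is the last value that fits in an immediate on 32-bit terms, and 134217728 has to be stored as a bignum. That would suggest the problem is limited to integers built as immediates, though the thread does not confirm the root cause at this point.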

@davisp
Owner

davisp commented Jun 29, 2011

You should be CC'd on the email I just sent to erlang questions, so I won't repeat myself too much here. Basically, it looks to be a bug in the halfword emulator's NIF API. I have a failing test case at [1] and I've added a failing test case to Jiffy itself. Now I'll just wait to see what the Erlang guys think.

[1] https://github.com/davisp/halfwordtest

@hungryblank
Author

Read the email, thanks a lot for helping on this one.

@hungryblank
Author

We're running with the patch for the NIF and this problem is solved. Closing the issue.
