Deterministic bytecode generation #1008

Closed
JCash opened this issue Jun 7, 2023 · 4 comments

JCash commented Jun 7, 2023

I'm wondering whether it would be possible to make luajit compilation deterministic, so that the same input on the same architecture always produces the same output.
I think it's a good trait to have if possible (e.g. for debugging what has changed, and for build systems in general).

I have a small repro case here which produces different output in 95% of the cases:

test.lua

local M = {}

local local_table = {
	a = true,
	b = true, -- need at least two to trigger it
	c = true,
	d = true, -- using four will make it almost 100% (95%-ish)
}

return M

Then I run luajit (v2.1 - 51fb2f2) three times from a small script, test.sh:

LUA_PATH=/Users/mawe/notwork/LuaJIT/src/?.lua /Users/mawe/notwork/LuaJIT/src/luajit -b ./test.lua ./test1.luac
LUA_PATH=/Users/mawe/notwork/LuaJIT/src/?.lua /Users/mawe/notwork/LuaJIT/src/luajit -b ./test.lua ./test2.luac
LUA_PATH=/Users/mawe/notwork/LuaJIT/src/?.lua /Users/mawe/notwork/LuaJIT/src/luajit -b ./test.lua ./test3.luac

find . -iname "test*.luac" | xargs shasum

which in turn produces output like this:

$ ./test.sh
5932802689818645f53a99df295044f9bc0afa4f  ./test3.luac
9033faa79eec123e1e9b8e44c19ac447c23d955e  ./test2.luac
900d2fbd8244afaba6fa6a36b583df4e68792f80  ./test1.luac

My first guess would be that there is some padding somewhere that isn't zeroed out, but perhaps there is something more complex that makes it behave like this?

MikePall changed the title from "Q: Deterministic output?" to "Deterministic bytecode generation" on Jun 7, 2023

MikePall (Member) commented Jun 7, 2023

The iteration order of tables in Lua is non-deterministic. In LuaJIT it's even non-deterministic for different VM invocations (on purpose).

The bytecode for a table constructor with initializers (TDUP) uses a template table embedded within the constants. That one is written out in the current iteration order at bytecode creation. In this particular case, bytecode generation does not produce deterministic results.

The solution sketched out in #993 for bytecode generation options could be extended with another flag, which would then serialize the template table to the bytecode in a deterministic (sorted) order (in bcwrite_ktab()). Go ahead, if you need this.

That said: any progress on implementing #993? I thought it was important for you.
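
The sorted-order idea can be illustrated with a small analogy in shell. This is only a sketch and not LuaJIT's actual serialization code: the `canonical_sum` helper and the `key=value` text files are hypothetical stand-ins for the template table. It shows that the same entries written in two different orders get the same checksum once they are sorted bytewise before hashing.

```shell
#!/bin/sh
# canonical_sum FILE  (hypothetical helper, for illustration only)
# Prints a checksum of FILE's lines after sorting them bytewise, so the
# result does not depend on the order in which the lines were written.
canonical_sum() {
    # LC_ALL=C makes the sort order independent of the user's locale.
    LC_ALL=C sort "$1" | cksum
}

dir=$(mktemp -d)
# The same four entries, written out in two different orders.
printf 'a=true\nb=true\nc=true\nd=true\n' > "$dir/order1.txt"
printf 'c=true\na=true\nd=true\nb=true\n' > "$dir/order2.txt"

# Checksums of the raw files depend on the write order ...
if [ "$(cksum < "$dir/order1.txt")" = "$(cksum < "$dir/order2.txt")" ]; then
    echo "raw: same"
else
    echo "raw: different"
fi

# ... while checksums of the sorted entries do not.
if [ "$(canonical_sum "$dir/order1.txt")" = "$(canonical_sum "$dir/order2.txt")" ]; then
    echo "sorted: same"
else
    echo "sorted: different"
fi
rm -rf "$dir"
```

Sorting costs a little extra work at write time, which may be one reason to make it an opt-in flag rather than the default.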

JCash (Author) commented Jun 8, 2023

Ah, I understand, I should have realised. I wrongly assumed the iteration order was a runtime thing as opposed to compile time.

Re the #993 ticket: since disabling the LUAJIT_DISABLE_GC64 flag seems to work, it is not a blocker. (I still need to reconfirm that this works for producing bytecode for e.g. 32-bit Android.)

Possibly, our client wishes to have the determinism, in which case I would propose your suggestion as the way forward.

bmwiedemann commented

I suspect that this is the reason why openSUSE's bcc package varies every time. It uses luajit-5.1.2.1.0 to run luajit -bg bcc.lua bcc.o, and repeating that step 1000 times gives 1000 distinct checksums:

cd bcc-0.28.0/build/src/lua && for i in $(seq 1000) ; do
    luajit -bg bcc.lua bcc.o && md5sum bcc.o
done | sort -u |wc -l
1000

MikePall (Member) commented

Now implemented, thanks to Peter Cawley. Add the d flag (deterministic bytecode generation) to -b:

luajit -bd test.lua test.bin
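
To check the effect of the d flag on a given source file, the repro from earlier in this thread can be turned into a small counting script. This is a minimal sketch: `count_distinct_outputs` is a hypothetical helper name, and the usage part assumes that `luajit` is on the PATH (and can find its jit.* modules) and that `test.lua` is in the current directory. It uses `cksum` instead of `shasum` or `md5sum` only because it is available on most POSIX systems.

```shell
#!/bin/sh
# count_distinct_outputs RUNS OUTFILE CMD [ARGS...]  (hypothetical helper)
# Runs CMD RUNS times, checksums OUTFILE after each run, and prints the
# number of distinct checksums seen. A result of 1 means all runs matched.
count_distinct_outputs() {
    runs=$1
    outfile=$2
    shift 2
    sums=""
    i=0
    while [ "$i" -lt "$runs" ]; do
        "$@" || return 1            # stop if the compile step fails
        sums="$sums$(cksum < "$outfile")
"
        i=$((i + 1))
    done
    # tr strips the padding that some wc implementations add.
    printf '%s' "$sums" | sort -u | wc -l | tr -d ' '
}

# Usage: compare plain -b with -bd, if luajit and test.lua are present.
if command -v luajit >/dev/null 2>&1 && [ -f test.lua ]; then
    echo "distinct outputs with -b:  $(count_distinct_outputs 20 test.luac luajit -b test.lua test.luac)"
    echo "distinct outputs with -bd: $(count_distinct_outputs 20 test.luac luajit -bd test.lua test.luac)"
fi
```

Because the command is passed as arguments, the same helper can wrap other invocations from this thread, such as luajit -bg bcc.lua bcc.o.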
