Python: determinism of the interpreters and bytecode (pyc) files #22570

Closed
FRidh opened this issue Feb 8, 2017 · 24 comments

Comments

@FRidh
Member

FRidh commented Feb 8, 2017

Issue description

Bytecode is created during builds of the interpreter and of packages. Each bytecode file records a timestamp, which we currently cannot control, so the output differs between builds. We could patch the interpreter to always write a fixed value, but we still want it to use the actual timestamp when used outside of Nix.

A fix for this issue is available in #2281 where a useFakeTime option is added to the generic builder.

@dezgeg
Contributor

dezgeg commented Feb 8, 2017

I think using faketime is probably a bad idea, e.g. I suspect some unit tests in random packages can start behaving like crazy under it.

@FRidh
Member Author

FRidh commented Feb 8, 2017

With Python 2.7 I have the following issue left:

--- /nix/store/73rjww4wkap08xp7gn2rz82x2274i00p-python-2.7.13
+++ /nix/store/73rjww4wkap08xp7gn2rz82x2274i00p-python-2.7.13-check
├── lib
│   ├── python2.7
│   │   │   --- /nix/store/73rjww4wkap08xp7gn2rz82x2274i00p-python-2.7.13/lib/python2.7/getopt.pyc
│   │   ├── +++ /nix/store/73rjww4wkap08xp7gn2rz82x2274i00p-python-2.7.13-check/lib/python2.7/getopt.pyc
│   │   │ @@ -1,8 +1,8 @@
│   │   │ -00000000: 03f3 0d0a 258b 9b58 6300 0000 0000 0000  ....%..Xc.......
│   │   │ +00000000: 03f3 0d0a 808b 9b58 6300 0000 0000 0000  .......Xc.......
│   │   │  00000010: 0005 0000 0040 0000 0073 b900 0000 6400  .....@...s....d.
│   │   │  00000020: 005a 0000 6401 0064 0200 6403 0064 0400  .Z..d..d..d..d..
│   │   │  00000030: 6704 005a 0100 6405 0064 0600 6c02 005a  g..Z..d..d..l..Z
│   │   │  00000040: 0200 6401 0065 0300 6601 0064 0700 8400  ..d..e..f..d....
│   │   │  00000050: 0083 0000 595a 0400 6504 005a 0500 6700  ....YZ..e..Z..g.
│   │   │  00000060: 0064 0800 8401 005a 0600 6700 0064 0900  .d.....Z..g..d..
│   │   │  00000070: 8401 005a 0700 640a 0084 0000 5a08 0064  ...Z..d.....Z..d
│   │   │   --- /nix/store/73rjww4wkap08xp7gn2rz82x2274i00p-python-2.7.13/lib/python2.7/getopt.pyo
│   │   ├── +++ /nix/store/73rjww4wkap08xp7gn2rz82x2274i00p-python-2.7.13-check/lib/python2.7/getopt.pyo
│   │   │ @@ -1,8 +1,8 @@
│   │   │ -00000000: 03f3 0d0a 258b 9b58 6300 0000 0000 0000  ....%..Xc.......
│   │   │ +00000000: 03f3 0d0a 808b 9b58 6300 0000 0000 0000  .......Xc.......
│   │   │  00000010: 0005 0000 0040 0000 0073 b900 0000 6400  .....@...s....d.
│   │   │  00000020: 005a 0000 6401 0064 0200 6403 0064 0400  .Z..d..d..d..d..
│   │   │  00000030: 6704 005a 0100 6405 0064 0600 6c02 005a  g..Z..d..d..l..Z
│   │   │  00000040: 0200 6401 0065 0300 6601 0064 0700 8400  ..d..e..f..d....
│   │   │  00000050: 0083 0000 595a 0400 6504 005a 0500 6700  ....YZ..e..Z..g.
│   │   │  00000060: 0064 0800 8401 005a 0600 6700 0064 0900  .d.....Z..g..d..
│   │   │  00000070: 8401 005a 0700 640a 0084 0000 5a08 0064  ...Z..d.....Z..d
│   │   │   --- /nix/store/73rjww4wkap08xp7gn2rz82x2274i00p-python-2.7.13/lib/python2.7/py_compile.pyc
│   │   ├── +++ /nix/store/73rjww4wkap08xp7gn2rz82x2274i00p-python-2.7.13-check/lib/python2.7/py_compile.pyc
│   │   │ @@ -1,8 +1,8 @@
│   │   │ -00000000: 03f3 0d0a 258b 9b58 6300 0000 0000 0000  ....%..Xc.......
│   │   │ +00000000: 03f3 0d0a 808b 9b58 6300 0000 0000 0000  .......Xc.......
│   │   │  00000010: 0004 0000 0040 0000 0073 c900 0000 6400  .....@...s....d.
│   │   │  00000020: 005a 0000 6401 0064 0200 6c01 005a 0100  .Z..d..d..l..Z..
│   │   │  00000030: 6401 0064 0200 6c02 005a 0200 6401 0064  d..d..l..Z..d..d
│   │   │  00000040: 0200 6c03 005a 0300 6401 0064 0200 6c04  ..l..Z..d..d..l.
│   │   │  00000050: 005a 0400 6401 0064 0200 6c05 005a 0500  .Z..d..d..l..Z..
│   │   │  00000060: 6401 0064 0200 6c06 005a 0600 6502 006a  d..d..l..Z..e..j
│   │   │  00000070: 0700 8300 005a 0800 6403 0064 0400 6405  .....Z..d..d..d.
│   │   │   --- /nix/store/73rjww4wkap08xp7gn2rz82x2274i00p-python-2.7.13/lib/python2.7/py_compile.pyo
│   │   ├── +++ /nix/store/73rjww4wkap08xp7gn2rz82x2274i00p-python-2.7.13-check/lib/python2.7/py_compile.pyo
│   │   │ @@ -1,8 +1,8 @@
│   │   │ -00000000: 03f3 0d0a 258b 9b58 6300 0000 0000 0000  ....%..Xc.......
│   │   │ +00000000: 03f3 0d0a 808b 9b58 6300 0000 0000 0000  .......Xc.......
│   │   │  00000010: 0004 0000 0040 0000 0073 c900 0000 6400  .....@...s....d.
│   │   │  00000020: 005a 0000 6401 0064 0200 6c01 005a 0100  .Z..d..d..l..Z..
│   │   │  00000030: 6401 0064 0200 6c02 005a 0200 6401 0064  d..d..l..Z..d..d
│   │   │  00000040: 0200 6c03 005a 0300 6401 0064 0200 6c04  ..l..Z..d..d..l.
│   │   │  00000050: 005a 0400 6401 0064 0200 6c05 005a 0500  .Z..d..d..l..Z..
│   │   │  00000060: 6401 0064 0200 6c06 005a 0600 6502 006a  d..d..l..Z..e..j
│   │   │  00000070: 0700 8300 005a 0800 6403 0064 0400 6405  .....Z..d..d..d.
│   │   │   --- /nix/store/73rjww4wkap08xp7gn2rz82x2274i00p-python-2.7.13/lib/python2.7/struct.pyc
│   │   ├── +++ /nix/store/73rjww4wkap08xp7gn2rz82x2274i00p-python-2.7.13-check/lib/python2.7/struct.pyc
│   │   │ @@ -1,8 +1,8 @@
│   │   │ -00000000: 03f3 0d0a 258b 9b58 6300 0000 0000 0000  ....%..Xc.......
│   │   │ +00000000: 03f3 0d0a 808b 9b58 6300 0000 0000 0000  .......Xc.......
│   │   │  00000010: 0002 0000 0040 0000 0073 2e00 0000 6400  .....@...s....d.
│   │   │  00000020: 0064 0100 6c00 0054 6400 0064 0200 6c00  .d..l..Td..d..l.
│   │   │  00000030: 006d 0100 5a01 0001 6400 0064 0300 6c00  .m..Z...d..d..l.
│   │   │  00000040: 006d 0200 5a02 0001 6404 0053 2805 0000  .m..Z...d..S(...
│   │   │  00000050: 0069 ffff ffff 2801 0000 0074 0100 0000  .i....(....t....
│   │   │  00000060: 2a28 0100 0000 740b 0000 005f 636c 6561  *(....t...._clea
│   │   │  00000070: 7263 6163 6865 2801 0000 0074 0700 0000  rcache(....t....
│   │   │   --- /nix/store/73rjww4wkap08xp7gn2rz82x2274i00p-python-2.7.13/lib/python2.7/struct.pyo
│   │   ├── +++ /nix/store/73rjww4wkap08xp7gn2rz82x2274i00p-python-2.7.13-check/lib/python2.7/struct.pyo
│   │   │ @@ -1,8 +1,8 @@
│   │   │ -00000000: 03f3 0d0a 258b 9b58 6300 0000 0000 0000  ....%..Xc.......
│   │   │ +00000000: 03f3 0d0a 808b 9b58 6300 0000 0000 0000  .......Xc.......
│   │   │  00000010: 0002 0000 0040 0000 0073 2e00 0000 6400  .....@...s....d.
│   │   │  00000020: 0064 0100 6c00 0054 6400 0064 0200 6c00  .d..l..Td..d..l.
│   │   │  00000030: 006d 0100 5a01 0001 6400 0064 0300 6c00  .m..Z...d..d..l.
│   │   │  00000040: 006d 0200 5a02 0001 6404 0053 2805 0000  .m..Z...d..S(...
│   │   │  00000050: 0069 ffff ffff 2801 0000 0074 0100 0000  .i....(....t....
│   │   │  00000060: 2a28 0100 0000 740b 0000 005f 636c 6561  *(....t...._clea
│   │   │  00000070: 7263 6163 6865 2801 0000 0074 0700 0000  rcache(....t....
│   │   ╵
│   ╵
╵
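
For reference, the only bytes that differ in each pair above sit at offsets 4 to 7. In the Python 2.7 bytecode layout those four bytes hold the source modification time as a little-endian 32-bit integer, right after the 4-byte magic number. A minimal sketch that decodes the two headers shown in the hexdump (the helper name `pyc27_mtime` is made up for illustration):

```python
import struct


def pyc27_mtime(header: bytes) -> int:
    """Return the mtime stored in a Python 2.7 .pyc/.pyo header.

    Assumed layout: 4-byte magic number, then a 4-byte little-endian mtime.
    """
    (mtime,) = struct.unpack("<I", header[4:8])
    return mtime


# First 8 bytes of getopt.pyc from the two builds in the diff above.
first_build = bytes.fromhex("03f30d0a258b9b58")
second_build = bytes.fromhex("03f30d0a808b9b58")

print(pyc27_mtime(first_build), pyc27_mtime(second_build))
print(pyc27_mtime(second_build) - pyc27_mtime(first_build))
```

The two values are 91 seconds apart, which would be consistent with the two builds simply having run at different times.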

@edanaher
Contributor

edanaher commented Feb 8, 2017

This is actually a performance issue; as I described in #22569 (coincidentally filed minutes before this), the wrong timestamps cause Python to ignore the cached bytecode and recompile every library on every invocation.

Notably irritating to me, this causes nvr (neovim-remote) to take 0.3 seconds instead of 0.1 seconds to run.

It seems to my relatively nix-inexperienced intuition that byte-wise patching all .pyc files generated in a python interpreter or library build to have timestamp 0 in postBuild or preInstall or some such should be a straightforward fix; am I missing something?
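
A rough sketch of what such byte-wise patching could look like, operating on the file contents in memory. It assumes the mtime sits at offset 4, which holds for the Python 2.7 layout and for the 12-byte header of Python 3.3 to 3.6; the function name and the `offset` parameter are hypothetical and not part of any nixpkgs hook. The value written has to match the mtime Python sees on the source file, otherwise the bytecode is still treated as stale.

```python
import struct


def patch_pyc_mtime(data: bytes, mtime: int = 0, offset: int = 4) -> bytes:
    """Return a copy of .pyc contents with the embedded mtime overwritten.

    `offset` is where the 4-byte little-endian mtime is assumed to live.
    """
    if len(data) < offset + 4:
        raise ValueError("data too short to contain a pyc header")
    return data[:offset] + struct.pack("<I", mtime) + data[offset + 4:]


# Header bytes taken from a Python 2.7 .pyc, followed by the start of the
# marshalled code object.
original = bytes.fromhex("03f30d0a258b9b5863000000")
patched = patch_pyc_mtime(original, mtime=0)
print(patched.hex())
```

Only the four timestamp bytes change; the magic number and the marshalled code object are left untouched.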

@FRidh
Member Author

FRidh commented Feb 8, 2017

@edanaher you couldn't tell any difference with Python 2?

On Python 2 we use this patch and we need to port this to Python 3. Unfortunately, the touched files are quite different in 3.

@FRidh
Member Author

FRidh commented Feb 9, 2017

#22585 improves the determinism somewhat. In case of Python 3.5, I fixed the timestamp in the .pyc files. I'm not sure whether it solves the issue described by @edanaher.

@edanaher
Contributor

edanaher commented Feb 9, 2017

Sadly, @FRidh, your patch doesn't seem to solve my problem; the nix-store appears to use timestamp 1, not 0, for its files:

edanaher@chileh:/nix/store/6pizxkhd6aw22hqp6rjbyhdr0v50xica-python3-3.5.2/lib/python3.5$ stat /nix/store/6pizxkhd6aw22hqp6rjbyhdr0v50xica-python3-3.5.2/ | grep Modify
Modify: 1969-12-31 19:00:01.000000000 -0500

Though tweaking your patch to set the timestamp to 1 doesn't appear to work either; I'll keep digging today.

edit: Actually, setting the timestamp to 1 helps; however, some files appear to still have the actual timestamps embedded. In fact, nvr now loses about half as much time to recompiling as it did before. I wonder what's wrong with the remaining files...
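
To illustrate why the embedded value has to be 1 rather than 0, here is a small sketch that gives a source file an mtime of 1, as the store does, compiles it, and reads the timestamp back from the header. It is written against a current Python (3.7 or later), where the header is 16 bytes (magic, flags, mtime, source size), not the 3.5 layout discussed in this thread; the function name is made up. The invalidation mode is passed explicitly because `py_compile` switches to hash-based bytecode when `SOURCE_DATE_EPOCH` is set.

```python
import os
import py_compile
import struct
import tempfile


def compile_with_source_mtime(source_mtime: int = 1):
    """Compile a throwaway module and return (flags, mtime) from its header."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "mod.py")
        with open(src, "w") as f:
            f.write("x = 1\n")
        # Mimic the store: force the source file's atime and mtime.
        os.utime(src, (source_mtime, source_mtime))
        pyc = py_compile.compile(
            src,
            cfile=os.path.join(tmp, "mod.pyc"),
            doraise=True,
            invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
        )
        with open(pyc, "rb") as f:
            header = f.read(16)
    # Bytes 4..8 are the flags word, bytes 8..12 the source mtime.
    flags, mtime = struct.unpack("<II", header[4:12])
    return flags, mtime


print(compile_with_source_mtime(1))
```

When the source mtime is 1 at compile time, the header records 1, so a later import from the store sees matching values.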

@edanaher
Contributor

edanaher commented Feb 9, 2017

#22585 is incomplete; there appears to be another codepath that can write mtimes to the compiled files. Adding the following substituteInPlace fixes it:

    substituteInPlace "Lib/importlib/_bootstrap_external.py" --replace "source_mtime = int(st['mtime'])" "source_mtime = 1"

Unfortunately, this is not based on DETERMINISTIC_BUILD; the change is in a bootstrapping file, so getting at the environment is non-trivial. But it does fix nearly all of the remaining nondeterminism. Running with an additional convenience patch to print the timestamps:

    substituteInPlace "Lib/importlib/_bootstrap_external.py" --replace "message = 'bytecode is stale for {!r}'.format(name)" "message = 'bytecode is stale for {!r}: {!r} vs {!r}'.format(name, _r_long(raw_timestamp), source_mtime)"

I get the following:

edanaher@chileh:~/nixpkgs$ time NIX_PATH=nixpkgs=/home/edanaher/nixpkgs:nixos-config=/etc/nixos/configuration.nix:/nix/var/nix/profiles/per-user/root/channels nix-shell -p python35Packages.neovim-remote -p strace --run 'strace python -vvv -s 1024 -c "" 2>&1 | grep "EROFS\|stale"'
open("/nix/store/m67irg2n09g66pxnimljlnlvl1pxpfg2-python3.5-setuptools-30.2.0/lib/python3.5/site-packages/__pycache__/site.cpython-35.pyc.140711294959064", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0644) = -1 EROFS (Read-only file system)
# bytecode is stale for 'site': 1486649480 vs 1
open("/nix/store/2fabkzmyxq0haqc8xzgdnv0hgdr99vdx-python3-3.5.2/lib/python3.5/__pycache__/_sysconfigdata.cpython-35.pyc.140711151814336", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0644) = -1 EROFS (Read-only file system)

So for some reason site still gets the wrong timestamp, which breaks determinism; I'm not sure what's up with the attempted write to _sysconfigdata.
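
The "stale" message boils down to comparing the timestamp embedded in the bytecode header with the mtime of the source file. The following is a simplified model of that check, assuming the 12-byte header used by Python 3.3 to 3.6 (magic, mtime, source size); it is not the actual importlib code, and `stale_reason` plus the placeholder magic bytes are made up for illustration.

```python
import struct


def stale_reason(header: bytes, source_mtime: int, source_size: int):
    """Return a message if the cached bytecode would be rejected, else None.

    Simplified model of the timestamp check for a 12-byte pyc header.
    """
    raw_mtime, raw_size = struct.unpack("<II", header[4:12])
    if raw_mtime != (source_mtime & 0xFFFFFFFF):
        return "bytecode is stale: {!r} vs {!r}".format(raw_mtime, source_mtime)
    if raw_size != (source_size & 0xFFFFFFFF):
        return "bytecode is stale: size mismatch"
    return None


PLACEHOLDER_MAGIC = b"\x00\x00\r\n"  # not a real magic number

# Header carrying the build-time mtime, as seen for 'site' in the log above.
built = PLACEHOLDER_MAGIC + struct.pack("<II", 1486649480, 100)
# Header carrying the mtime that files in the store actually have.
fixed = PLACEHOLDER_MAGIC + struct.pack("<II", 1, 100)

print(stale_reason(built, source_mtime=1, source_size=100))
print(stale_reason(fixed, source_mtime=1, source_size=100))
```

With the build-time mtime embedded, the cached file is rejected and Python tries to rewrite it, which is what produces the failed writes to the read-only store.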

@FRidh
Member Author

FRidh commented Feb 9, 2017

Thanks @edanaher. I thought that path wasn't relevant for the writing but apparently it is then.

Looking at the diffoscope result, I think all that is left are sets. I've been discussing this issue on python-dev.

@edanaher
Contributor

edanaher commented Feb 9, 2017

Thank you, @FRidh. I'm now running python scripts I care about off of my branch that adds the _bootstrap_external.py fix (along with a couple commented-out changes that might be useful for debugging) to your branch, but will watch your more proper fixes with interest.

@globin globin added this to Open in Blocking Issues 17.09 Feb 14, 2017
@FRidh FRidh changed the title Python: fix the timestamp in bytecode (pyc) files Python: determinism of the interpreters and bytecode (pyc) files Feb 19, 2017
@FRidh
Member Author

FRidh commented Feb 19, 2017

Today I got a very useful reply from a PyPy dev:

As I mentioned, it seems only sets cause unreproducible bytecode. Sets have no order. But when generating the bytecode, I would expect there would still be an order since the code isn't actually executed, right?

No, the sets are built as real sets and then marshalled to .pyc files
in a separate step. So on CPython an essentially random order will
end up in the .pyc file. Even CPython 3.6 gives a deterministic order
to dictionaries but not sets. You could ensure sets are marshalled in
a known order by changing the marshalling code, e.g. to emit them in
sorted order (on Python 2.x; on 3.x it is more messy because different
types are more often non-comparable).
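
The idea of emitting set elements in a known order can be sketched as follows. This is only an illustration of the ordering step, not a patch to CPython's marshalling code; `canonical_order` is a made-up name, and the fallback key for non-comparable types is one arbitrary choice among many.

```python
def canonical_order(items):
    """Return the elements of a set in a reproducible order."""
    try:
        # Works when all elements are mutually comparable (the Python 2 case).
        return sorted(items)
    except TypeError:
        # Mixed, non-comparable types on Python 3: fall back to a key built
        # from the type name and repr. This is stable for simple constants
        # such as str, int and None, but not for objects whose repr contains
        # a memory address.
        return sorted(items, key=lambda x: (type(x).__name__, repr(x)))


print(canonical_order({"b", "a", "c"}))
print(canonical_order({1, "a", None}))
```

Iterating over a set of strings can give a different order between interpreter runs because of hash randomization, while the sorted list is the same every time.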

@dezgeg
Contributor

dezgeg commented Feb 19, 2017

Does debian ship any .pyc files in their packages? Maybe they have a patch lying around.

@FRidh
Member Author

FRidh commented Feb 19, 2017

I've looked at Debian and their patches, but they're not this far yet with reproducible Python builds it seems.

Also interesting.

Note that Fedora doesn't even rebuild all the extension modules when bumping CPython to a new maintenance release, let alone rebuilding and re-releasing all the pure Python ones. (RPM supports doing that just fine, but it would mean shipping thousands of updated binary artifacts instead of just one - the new CPython maintenance release)

Debian/Ubuntu doesn't rebuild extension modules either. We don't ship .pyc
files in binary packages, but instead build them at install time on the user's
machine.

http://bugs.python.org/issue29514#msg287560

This came up when 3.5.3 changed the magic number of the bytecode, which typically shouldn't happen in a maintenance release.

While they don't ship the bytecode, they still compile it during installation time. Since we don't distinguish between the two we have to choose whether we include the bytecode or not. Not including bytecode has a negative impact on the performance.

@FRidh
Member Author

FRidh commented Feb 19, 2017

Note that I updated #22585. Builds of 2.7 and 3.5 are now deterministic.

@FRidh
Member Author

FRidh commented Mar 13, 2017

Closing because most interpreters are now built deterministically.

@FRidh FRidh closed this as completed Mar 13, 2017
@copumpkin
Member

❤️ @FRidh

@vcunat vcunat moved this from Open to Done in Blocking Issues 17.09 Jul 3, 2017
@fpletz fpletz removed this from Done in Blocking Issues 17.09 Aug 28, 2017
@domenkozar domenkozar reopened this Feb 6, 2019
@matthewbauer matthewbauer modified the milestones: 17.03, 19.09 Apr 5, 2019
@FRidh
Member Author

FRidh commented Aug 18, 2019

The 2.7.16 interpreter seems to be reproducible again. Python packages, however, are not.

@FRidh FRidh modified the milestones: 19.09, 20.03 Aug 18, 2019
@disassembler disassembler modified the milestones: 20.03, 20.09 Feb 5, 2020
@stale stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Aug 4, 2020
@stale stale bot removed the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Nov 16, 2020
@FRidh
Member Author

FRidh commented Nov 16, 2020

Python packages are mostly reproducible again on master. An exception is pytest, which generates its own bytecode. The interpreters are not reproducible yet.

@zimbatm zimbatm added this to To do in R13y Nov 21, 2020
@FRidh FRidh modified the milestones: 20.09, 21.03 Dec 20, 2020
@FRidh
Member Author

FRidh commented Mar 13, 2021

Closing as done.

@FRidh FRidh closed this as completed Mar 13, 2021
R13y automation moved this from Inbox to Done Mar 13, 2021
@FRidh
Member Author

FRidh commented Apr 10, 2021

Follow-up: #118810 reports a slowdown because unoptimized bytecode is no longer generated, as it is not reproducible.
