Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python: deterministic interpreters #22585

Merged
merged 11 commits into from
Feb 26, 2017
Merged

Python: deterministic interpreters #22585

merged 11 commits into from
Feb 26, 2017

Conversation

FRidh
Copy link
Member

@FRidh FRidh commented Feb 9, 2017

Motivation for this change
  • Reproducible builds of the interpreters
Things done
  • Tested using sandboxing
    (nix.useSandbox on NixOS,
    or option build-use-sandbox in nix.conf
    on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • Linux
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Fits CONTRIBUTING.md.

@mention-bot
Copy link

@FRidh, thanks for your PR! By analyzing the history of the files in this pull request, we identified @domenkozar, @edolstra and @peti to be potential reviewers.

@FRidh
Copy link
Member Author

FRidh commented Feb 9, 2017

@FRidh FRidh changed the base branch from master to staging February 9, 2017 12:12
@FRidh
Copy link
Member Author

FRidh commented Feb 9, 2017

Since 3.2.3 there is a PYTHONHASHSEED env var.

If this variable is not set or set to random, a random value is used to seed the hashes of str, bytes and datetime objects

It also influences sets. When doing a x in {'a', 'b', 'c'} the set is converted to a frozenset. If PYTHONHASHSEED is set to 0, then no hashing is done and the order should be preserved (even though sets have no order).

Unfortunately, the bytecode is still not deterministic.

DETERMINISTIC_BUILD=1;
# Determinism: We fix the hashes of str, bytes and datetime objects.
PYTHONHASHSEED = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only applies to the build process, right? I assume that the actual scripts will still run with random hash seeds; but we don't want to accidentally open up NixOS to hash collision attacks

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still need to disable it for nix-shell.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree. What is nix-shell used or designed for? For testing/building a package, or for building environments with certain packages? In my point of view its designed for the former, but its (mis)used a lot for the latter as well. If we are going to make changes for nix-shell then we should also unset SOURCE_DATE_EPOCH and likely a dozen other vars as well which will render it useless for the former use case.

In Nix 1.12 we have nix run which, if I am correct, shouldn't set such env vars (unfortunately it still does).

Copy link
Member

@Mic92 Mic92 Feb 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@FRidh I don't think every user is aware of this, so their needs to be a big warning somewhere. Somebody may use nix-shell --command in their own services/timer so this could make people vulnerable to hash collision attacks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also there is the nix-shell -i shebang feature for which the Nix documentation even includes a Python example.

@FRidh
Copy link
Member Author

FRidh commented Feb 19, 2017

At the end of the installation I now force it to rebuild all bytecode. Both 2.7 and 3.5 are now deterministic.

cc @edanaher could you check regarding your issue?

find "$out" -name 'wininst*.exe' | xargs -r rm -f

# Determinism: rebuild all bytecode
# We exclude lib2to3 because that's Python 2 code which fails
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here the reason is slightly different...but it includes incorrect syntax

@edanaher
Copy link
Contributor

Looks good to me; nvr is nice and fast, and there's only a single EROFS:

[nix-shell:~/nixpkgs]$ strace nvr  2>&1 | grep EROFS
open("/nix/store/33mqlj6pchixmkraylrxm22zia5ji0sc-python3-3.5.2/lib/python3.5/__pycache__/_sysconfigdata.cpython-35.pyc.140390896732008", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0644) = -1 EROFS (Read-only file system)

I still find it a little strange that that file is still attempting to be re-cached, but it's not a significant performance issue. If you're not seeing determinism issues with it, I certainly don't see any problem.

@FRidh
Copy link
Member Author

FRidh commented Feb 20, 2017

@edanaher I'll have a look at the sysconfigdata issue. For some reason it was removed, and I don't know why.

@FRidh FRidh self-assigned this Feb 20, 2017
@FRidh FRidh force-pushed the repr branch 2 times, most recently from d7771d4 to 24f5f22 Compare February 22, 2017 15:01
@FRidh FRidh changed the title Python: deterministic interpreters - WIP Python: deterministic interpreters Feb 22, 2017
@FRidh
Copy link
Member Author

FRidh commented Feb 22, 2017

Versions 2.7, 3.5 and 3.6 are now reproducible. I've applied the patches to 3.4 as well but there's one file left causing problems. I've updated the docs and the release notes. The issue with sysconfigdata I keep for later.

I think this is ready now. What say you?

@FRidh FRidh requested a review from globin February 22, 2017 15:06
Copy link
Member

@globin globin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from the link in the release notes, LGTM.

There is some randomness in the Windows installers. Since we don't need
them, we delete them.
@FRidh
Copy link
Member Author

FRidh commented Feb 26, 2017

@globin its not clear to me what you mean but we can fix that later.

@vcunat
Copy link
Member

vcunat commented Mar 1, 2017

@FRidh: this or something else new on staging has broken 3.4 setuptools:

ImportError: No module named 'pyexpat'

Example: http://hydra.nixos.org/build/49482784

@FRidh
Copy link
Member Author

FRidh commented Mar 1, 2017

Thanks @vcunat , I'll have a look at it later today.

vcunat pushed a commit that referenced this pull request Mar 1, 2017
Python: deterministic interpreters
(cherry picked from commit 04c41e7)
@FRidh
Copy link
Member Author

FRidh commented Mar 2, 2017

Python 3.4 is fixed with b588ed9

@FRidh
Copy link
Member Author

FRidh commented Mar 3, 2017

@vcunat am I correct none of this is in 17.03?

@FRidh
Copy link
Member Author

FRidh commented Mar 3, 2017

Python 3.6 is fixed with a1f6b8b
Now I'm sure they're all ok.

@vcunat
Copy link
Member

vcunat commented Mar 3, 2017

@FRidh: this PR is in 17.03. I pushed it in response to your comment.

@vcunat
Copy link
Member

vcunat commented Mar 3, 2017

When rereading it, I probably misunderstood you and just cherry-picked the whole merge, but I suppose it will be OK if we pick these fixes as well?

@FRidh
Copy link
Member Author

FRidh commented Mar 3, 2017

I saw only the merge and none of the commits in 17.03, so that's why I was asking.

Yes, all these commits plus the two fix commits can go in.

@vcunat
Copy link
Member

vcunat commented Mar 3, 2017

For reference, the 17.03 commit: bee7854. I don't know why GitHub doesn't display a mention on this PR – I think it does, normally.

@vcunat
Copy link
Member

vcunat commented Mar 3, 2017

I picked those two commits and tested they fix the builds.

@FRidh FRidh deleted the repr branch March 17, 2017 17:51
adrianpk added a commit to adrianpk/nixpkgs that referenced this pull request May 31, 2024
Python: deterministic interpreters
(cherry picked from commit 04c41e7)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants