Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use struct.calcsize("P") rather than platform.machine() #830

Merged
merged 5 commits into from Mar 29, 2020

Conversation

oconnor663
Copy link
Contributor

@oconnor663 oconnor663 commented Mar 23, 2020

platform.machine() gives the wrong answer if you're running 32-bit
Python on a 64-bit machine. sys.maxsize struct.calcsize("P") (EDITED based on #830 (comment)) gives the answer we want. See
also https://stackoverflow.com/a/1405971/823869.

Also use CARGO_CFG_TARGET_POINTER_WIDTH rather than inferring the
pointer width from CARGO_CFG_TARGET_ARCH.

@oconnor663
Copy link
Contributor Author

This should fix the issue described at #779 (comment).

@oconnor663
Copy link
Contributor Author

It would be ideal to have CI test this case for us. Would you guys be open to adding test targets like what I've got in https://github.com/oconnor663/blake3-py/blob/master/.github/workflows/push.yml?

@davidhewitt
Copy link
Member

Thanks very much for this. I would be in favour of having both 32-bit and 64-bit Python in CI.

Copy link
Member

@kngwyu kngwyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR!
My requests are:

  • Now we can remove machine.
  • Could you please enable x86 tests for Windows CI?
    It would work to add an architecture section to .github/workflows/test.yml like
    strategy:
      max-parallel: 4  # Also we should change this line to 8?
      matrix:
        python-version: [3.5, 3.6, 3.7, 3.8]
        architecture: [x86, x64]  # <- New line

build.rs Outdated
@@ -33,6 +33,7 @@ struct InterpreterConfig {
base_prefix: String,
executable: String,
machine: String,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please remove machine?

@kngwyu
Copy link
Member

kngwyu commented Mar 24, 2020

I'm going to prepare x86/x64 tests for macOS and Win.
Please wait for that.

@kngwyu
Copy link
Member

kngwyu commented Mar 24, 2020

I enabled x64 + macOS and x86 + Windows tests in #831.
Please rebase on the master and add a test you want (x86 + macOS?)

@oconnor663 oconnor663 force-pushed the maxsize branch 2 times, most recently from 9f41393 to 7e291ea Compare March 24, 2020 15:51
@oconnor663
Copy link
Contributor Author

I've tried to add Linux CI testing, but it's failing with this error:

/home/runner/work/pyo3/pyo3/target/debug/deps/pyo3-1c9bfe8599f85944: error while loading shared libraries: libpython3.7m.so.1.0: cannot open shared object file: No such file or directory

Is that a known problem?

@kngwyu
Copy link
Member

kngwyu commented Mar 24, 2020

@oconnor663
As in travis scripts,

PYTHON_LIB=$($PYTHON_BINARY -c "import sysconfig; print(sysconfig.get_config_var('LIBDIR'))")

export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$PYTHON_LIB:$HOME/rust/lib"

would solve the problem.

But I don't think we need Ubuntu CI on Actions unless it can test something different from what we do on travis.

@oconnor663
Copy link
Contributor Author

Oh that makes sense. I missed the Travis runs. I'll just remove the test.yml change then.

@oconnor663
Copy link
Contributor Author

oconnor663 commented Mar 24, 2020

Looks like I'm hitting some OverflowError: timestamp out of range test failures, possibly related to PyPy. Could those be my fault?

@kngwyu
Copy link
Member

kngwyu commented Mar 25, 2020

@oconnor663
No, this is due to #720

I haven't seen this error 🤔 cc: @pganssle
image

The failing test uses platform.architecture()[0] == "32bit" to determine whether the architecture is 32bit or 64bit...

@kngwyu kngwyu requested a review from pganssle March 25, 2020 14:47
@oconnor663
Copy link
Contributor Author

The failing test uses platform.architecture()[0] == "32bit" to determine whether the architecture is 32bit or 64bit...

That's consistent with the admonition here, which I mention in the code comments in this PR:

BTW, you might be tempted to use platform.architecture() for this. Unfortunately, its results are not always reliable, particularly in the case of OS X universal binaries.

$ arch -x86_64 /usr/bin/python2.6 -c 'import sys,platform; print platform.architecture()[0], sys.maxsize > 2**32'
64bit True
$ arch -i386 /usr/bin/python2.6 -c 'import sys,platform; print platform.architecture()[0], sys.maxsize > 2**32'
64bit False

Let me see if I can switch it over to the same maxsize approach.

@oconnor663
Copy link
Contributor Author

Actually I hadn't seen this comment before. It sounds like struct.calcsize("P") might be the One True Way?

@oconnor663 oconnor663 changed the title use sys.maxsize rather than platform.machine() use struct.calcsize("P") rather than platform.machine() Mar 25, 2020
@oconnor663
Copy link
Contributor Author

884db67 doesn't seem to have helped. I'm going to keep staring at it.

@oconnor663
Copy link
Contributor Author

oconnor663 commented Mar 25, 2020

It looks like the test failure is only on macOS, and only on Python 3.5 specifically. I don't know what could've changed here between 3.5 and 3.6. (Doesn't seem related to the "fold" field added in 3.6.)

@davidhewitt
Copy link
Member

The exception pathway being hit looks like it exists on python 3.5 but was removed from 3.6.

https://github.com/python/cpython/blob/3.5/Modules/_datetimemodule.c#L4936

@oconnor663
Copy link
Contributor Author

oconnor663 commented Mar 25, 2020

Cool, that's definitely what we're seeing. But I don't know enough about this test to interpret whether we want it to fail here or not. Is it that 3.5 is failing for a dumb obsolete reason, or is it that 3.6-3.8 happen to silently succeed when we wish they would fail?

Edit: Similarly, I don't understand why this wasn't failing before.

@oconnor663
Copy link
Contributor Author

oconnor663 commented Mar 25, 2020

I'm going to defer to you all on this one. I'll remove my temporary debugging commits. If you want to see the debugging output (commit 57a03ad), it's visible in this run: https://github.com/PyO3/pyo3/actions/runs/63179251

build.rs Outdated Show resolved Hide resolved
@kngwyu
Copy link
Member

kngwyu commented Mar 26, 2020

@oconnor663
Thank you for hard debugging!

I think it's better for us to wait for @pganssle, who is the author of our datetime binding.
I don't have a macOS machine so this is hard to debug for me, though I'm going to try using Actions.

@konstin
Copy link
Member

konstin commented Mar 26, 2020

Is 32-bit Mac OS still a used platform? Even numpy doesn't have 32-bit Mac OS wheels on pypi (https://pypi.org/project/numpy/#files). If I remember correctly there isn't a wheel tag for 32-bit Mac OS.

This is what the blog post of rust demoting apple 32-bit targets:

Why are those targets being demoted?

Apple dropped support for running 32-bit binaries starting from macOS 10.15 and iOS 11. They also prevented all developers from cross-compiling 32-bit programs and apps starting from Xcode 10 (the platform’s IDE, containing the SDKs).

Due to those decisions from Apple, the targets are no longer useful to our users, and their choice to prevent cross-compiling makes it hard for the project to continue supporting the 32-bit platform in the long term.

platform.machine() gives the wrong answer if you're running 32-bit
Python on a 64-bit machine.

The reason we don't use platform.architecture() here is that it's not
reliable on macOS. See https://stackoverflow.com/a/1405971/823869.
Similarly, sys.maxsize is not reliable on Windows. See
https://stackoverflow.com/questions/1405913/how-do-i-determine-if-my-python-shell-is-executing-in-32bit-or-64bit-mode-on-os/1405971#comment6209952_1405971
and https://stackoverflow.com/a/3411134/823869.

Also use CARGO_CFG_TARGET_POINTER_WIDTH rather than inferring the Rust
target pointer width from CARGO_CFG_TARGET_ARCH.
… tests

Same reasoning as the previous commit.
@oconnor663
Copy link
Contributor Author

Made the comment change requested above.

I want to emphasize that this bug is causing build failures with the current release version, for example https://github.com/oconnor663/blake3-py/runs/534852943. If you guys can think of a way to reduce the scope of this change, so that we can release a fix quickly without the risk of breaking anything, that might be worth doing.

@davidhewitt
Copy link
Member

davidhewitt commented Mar 26, 2020

It's all a bit arcane.

I couldn't find anything about this specifically in the cpython changelog. I did find the commit in cpython which reworked the implementation removing this OverflowError. It looks like it's a side-effect of changes for PEP 495.

python/cpython@5d0c598#diff-0090febf24974047330d8a02d4dbdc79

Prior to those changes the implementation used mktime. I can't find good documentation on the apple docs about what values should be accepted. The following stack overflow post suggests that for some undocumented reason macOS can't handle time_t values (aka mktime output) earlier than 1900.

https://stackoverflow.com/questions/35300453/am-i-using-tm-mktime-wrong-and-if-not-is-there-a-workaround

That would at least be consistent with the error we're seeing on that CI. So I suggest you mark that test as xfail for macOS on 3.5.

@davidhewitt
Copy link
Member

(Sorry for not being more helpful on this one, I haven't had much free time recently!)

@oconnor663
Copy link
Contributor Author

oconnor663 commented Mar 27, 2020

I just xfailed the tests on macOS, but now something else datetime+3.5 related is failing on Windows?That's odd to me, because everything else passed on this run: https://github.com/PyO3/pyo3/actions/runs/63179251

@kngwyu
Copy link
Member

kngwyu commented Mar 27, 2020

@oconnor663
It's not related to this PR. Please don't care.

So I suggest you mark that test as xfail for macOS on 3.5.

I'm not strongly against using xfail, but please let me debug this for a while.

@oconnor663
Copy link
Contributor Author

Gotcha, I've removed the commit.

@oconnor663
Copy link
Contributor Author

oconnor663 commented Mar 27, 2020

Note: I put fail-fast: false on one of me earlier temporary commits, and I've since removed it from this branch. But it looks like there might be some unexpected failures getting masked by failures on other platforms here. It might be worth putting fail-fast: false in master, or at least in whatever experimental branches you're working on.

In particular, one of the things we could be missing here (speculation on my part) is flaky test failures. One of the possible causes of flakiness in this environment is a heterogeneous CI fleet. I've seen issues before where e.g. some Windows machines supported AVX-512, and some didn't, and that turned what should've been a deterministic test failure into a flaky one.

@kngwyu
Copy link
Member

kngwyu commented Mar 28, 2020

@oconnor663
Sorry I can't detect the cause at last. Please use xfail for the test.

@oconnor663
Copy link
Contributor Author

I've added back e44e4ce.

oconnor663 added a commit to oconnor663/blake3-py that referenced this pull request Mar 29, 2020
We need to do this until PyO3/pyo3#830 is
resolved.
@kngwyu
Copy link
Member

kngwyu commented Mar 29, 2020

Thank you!
I'll push some tweaks to make CI happy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants