Install fails on SmartOS due to pynacl dependency install failure #55

Closed
rjloura opened this issue Nov 16, 2018 · 24 comments

Comments

@rjloura
Contributor

rjloura commented Nov 16, 2018

Running the test suite for the libsodium that is bundled with pynacl fails on SmartOS. As a result, installing python-manta on SmartOS produces a build failure that looks something like this:

make[4]: Entering directory '/home/rui/pynacl/PyNaCl-1.3.0/build/temp.solaris-2.11-i86pc.32bit-2.7/test/default'
PASS: aead_aes256gcm
PASS: aead_chacha20poly1305
PASS: aead_xchacha20poly1305
PASS: auth
PASS: auth2
PASS: auth3
PASS: auth5
PASS: auth6
PASS: auth7
PASS: box
PASS: box2
PASS: box7
PASS: box8
PASS: box_easy
PASS: box_easy2
PASS: box_seal
PASS: box_seed
PASS: chacha20
PASS: codecs
PASS: core1
PASS: core2
PASS: core3
PASS: core4
PASS: core5
PASS: core6
PASS: ed25519_convert
PASS: generichash
PASS: generichash2
PASS: generichash3
PASS: hash
PASS: hash3
PASS: kdf
PASS: keygen
PASS: kx
PASS: metamorphic
PASS: misuse
PASS: onetimeauth
PASS: onetimeauth2
PASS: onetimeauth7
PASS: pwhash_argon2i
PASS: pwhash_argon2id
/home/rui/pynacl/PyNaCl-1.3.0/src/libsodium/build-aux/test-driver: line 107: 309442: Illegal instruction(coredump)
FAIL: randombytes
PASS: scalarmult
PASS: scalarmult2
PASS: scalarmult5
PASS: scalarmult6
PASS: scalarmult7
PASS: secretbox
PASS: secretbox2
PASS: secretbox7
PASS: secretbox8
PASS: secretbox_easy
PASS: secretbox_easy2
PASS: secretstream
PASS: shorthash
PASS: sign
PASS: sodium_core
PASS: sodium_utils
PASS: sodium_version
PASS: stream
PASS: stream2
PASS: stream3
PASS: stream4
PASS: verify1
PASS: sodium_utils2
PASS: sodium_utils3
PASS: core_ed25519
PASS: pwhash_scrypt
PASS: pwhash_scrypt_ll
PASS: scalarmult_ed25519
PASS: siphashx24
PASS: xchacha20
============================================================================
Testsuite summary for libsodium 1.0.16
============================================================================
# TOTAL: 72
# PASS:  71
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0
============================================================================
See test/default/test-suite.log
Please report to https://github.com/jedisct1/libsodium/issues
============================================================================
Makefile:1804: recipe for target 'test-suite.log' failed
make[4]: *** [test-suite.log] Error 1
make[4]: Leaving directory '/home/rui/pynacl/PyNaCl-1.3.0/build/temp.solaris-2.11-i86pc.32bit-2.7/test/default'
Makefile:1910: recipe for target 'check-TESTS' failed
make[3]: *** [check-TESTS] Error 2
make[3]: Leaving directory '/home/rui/pynacl/PyNaCl-1.3.0/build/temp.solaris-2.11-i86pc.32bit-2.7/test/default'
Makefile:2480: recipe for target 'check-am' failed
make[2]: *** [check-am] Error 2
make[2]: Leaving directory '/home/rui/pynacl/PyNaCl-1.3.0/build/temp.solaris-2.11-i86pc.32bit-2.7/test/default'
Makefile:402: recipe for target 'check-recursive' failed
make[1]: *** [check-recursive] Error 1
make[1]: Leaving directory '/home/rui/pynacl/PyNaCl-1.3.0/build/temp.solaris-2.11-i86pc.32bit-2.7/test'
Makefile:515: recipe for target 'check-recursive' failed
make: *** [check-recursive] Error 1
Traceback (most recent call last):
  File "setup.py", line 255, in <module>
    "Programming Language :: Python :: 3.7",
  File "/opt/local/lib/python2.7/site-packages/setuptools/__init__.py", line 143, in setup
    return distutils.core.setup(**attrs)
  File "/opt/local/lib/python2.7/distutils/core.py", line 151, in setup
    dist.run_commands()
  File "/opt/local/lib/python2.7/distutils/dist.py", line 953, in run_commands
    self.run_command(cmd)
  File "/opt/local/lib/python2.7/distutils/dist.py", line 972, in run_command
    cmd_obj.run()
  File "/opt/local/lib/python2.7/site-packages/wheel/bdist_wheel.py", line 188, in run
    self.run_command('build')
  File "/opt/local/lib/python2.7/distutils/cmd.py", line 326, in run_command
    self.distribution.run_command(command)
  File "/opt/local/lib/python2.7/distutils/dist.py", line 972, in run_command
    cmd_obj.run()
  File "/opt/local/lib/python2.7/distutils/command/build.py", line 127, in run
    self.run_command(cmd_name)
  File "/opt/local/lib/python2.7/distutils/cmd.py", line 326, in run_command
    self.distribution.run_command(command)
  File "/opt/local/lib/python2.7/distutils/dist.py", line 972, in run_command
    cmd_obj.run()
  File "setup.py", line 179, in run
    subprocess.check_call(["make", "check"] + make_args, cwd=build_temp)
  File "/opt/local/lib/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['make', 'check']' returned non-zero exit status 2
@rjloura
Contributor Author

rjloura commented Nov 16, 2018

Workaround for this:

$ sudo pkgin install libsodium
$ SODIUM_INSTALL=system pip install pynacl
$ pip install manta

@rjloura
Contributor Author

rjloura commented Dec 12, 2018

It seems this may be more pernicious than originally thought. After successfully installing via pip install manta, I see this error:

[root@standard ~]# mantash ls /rjloura/stor
Traceback (most recent call last):
  File "/opt/local/bin/mantash", line 29, in <module>
    import manta
  File "/opt/local/lib/python2.7/site-packages/manta/__init__.py", line 7, in <module>
    from .auth import PrivateKeySigner, SSHAgentSigner, CLISigner
  File "/opt/local/lib/python2.7/site-packages/manta/auth.py", line 23, in <module>
    from paramiko import Agent
  File "/opt/local/lib/python2.7/site-packages/paramiko/__init__.py", line 22, in <module>
    from paramiko.transport import SecurityOptions, Transport
  File "/opt/local/lib/python2.7/site-packages/paramiko/transport.py", line 90, in <module>
    from paramiko.ed25519key import Ed25519Key
  File "/opt/local/lib/python2.7/site-packages/paramiko/ed25519key.py", line 22, in <module>
    import nacl.signing
  File "/opt/local/lib/python2.7/site-packages/nacl/signing.py", line 17, in <module>
    import nacl.bindings
  File "/opt/local/lib/python2.7/site-packages/nacl/bindings/__init__.py", line 17, in <module>
    from nacl.bindings.crypto_aead import (
  File "/opt/local/lib/python2.7/site-packages/nacl/bindings/crypto_aead.py", line 18, in <module>
    from nacl._sodium import ffi, lib
ImportError: ld.so.1: python2.7: fatal: relocation error: file /opt/local/lib/python2.7/site-packages/nacl/_sodium.so: symbol sodium_unpad: referenced symbol not found

I tried installing pynacl with and without the SODIUM_INSTALL=system variable set and had the same issue. Interestingly, the install now completes without the variable, which it did not before.

[root@standard ~]# nm /opt/local/lib/python2.7/site-packages/nacl/_sodium.so | grep sodium_unpad
00000000000125d0 t _cffi_d_sodium_unpad
00000000000125e0 t _cffi_f_sodium_unpad
                 U sodium_unpad
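
One way to narrow this down is to check whether the libsodium that the loader finds actually exports the missing symbol. The following is only a diagnostic sketch: `has_symbol` is a hypothetical helper name (not part of pynacl or python-manta), and it assumes the system library can be located as `sodium` through `ctypes.util.find_library`.

```python
import ctypes
import ctypes.util


def has_symbol(lib_name, symbol):
    """Return True/False if the library exports `symbol`, or None if it cannot be loaded.

    Hypothetical helper for illustration; not part of pynacl or python-manta.
    """
    path = ctypes.util.find_library(lib_name)
    if path is None:
        return None  # the library could not be located at all
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None  # located, but could not be loaded
    # Attribute lookup on a CDLL resolves the symbol and raises
    # AttributeError when it is absent, so hasattr() is enough here.
    return hasattr(lib, symbol)


if __name__ == "__main__":
    print("sodium_unpad exported:", has_symbol("sodium", "sodium_unpad"))
```

A result of False or None would at least show whether the problem lies with the library on the loader path or with how `_sodium.so` was linked.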

@trentm
Contributor

trentm commented Dec 13, 2018

@rjloura Let me make sure I understand: current python-manta is basically broken on SmartOS?

@cburroughs
Contributor

As I understand it, the build fails on SmartOS (I tried on 17.4.0) when pip installs pynacl, which in turn builds libsodium. I see that pkgsrc has libsodium-1.0.16, so presumably the tests passed when pkgsrc built it. If I download the 1.0.16 tarball from https://download.libsodium.org/libsodium/releases/, make check also passes. So I'm confused about what is different in the environment when pip builds libsodium.

@cburroughs
Contributor

Regarding SODIUM_INSTALL, maybe pyca/pynacl#497 is related?

For the main breakage, recall that pynacl is a wrapper around libsodium. Above I pointed out that if you just download libsodium it appears to work, so it was unclear what was different when pip did its thing. I think the answer is https://github.com/pyca/pynacl/blob/master/setup.py#L161:

        # Run ./configure
        subprocess.check_call(
            [
                configure, "--disable-shared", "--enable-static",
                "--disable-debug", "--disable-dependency-tracking",
                "--with-pic", "--prefix", os.path.abspath(self.build_clib),
            ],
            cwd=build_temp,

If I run configure with those flags on the same system I was testing with, then make check does fail:

../../build-aux/test-driver: line 107: 142284: Memory fault(coredump)
FAIL: randombytes

After some trial and error, it appears that the combination of both --disable-shared and --with-pic is needed to trigger the crash.

The steps to reproduce are along the lines of:

pkgin in build-essential
wget https://download.libsodium.org/libsodium/releases/libsodium-1.0.16.tar.gz
tar xvfz libsodium-1.0.16.tar.gz 
cd libsodium-1.0.16
./configure --disable-shared    --with-pic && make check
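
The trial and error over the two flags can be written down as a small matrix of configure invocations. This is a sketch only: `configure_variants` is a made-up helper, and it prints the commands instead of running them, since each variant needs a clean libsodium tree and a full `make check`.

```python
import itertools


def configure_variants(flags=("--disable-shared", "--with-pic")):
    """Yield one ./configure argument list per on/off combination of `flags`."""
    for mask in itertools.product((False, True), repeat=len(flags)):
        enabled = [flag for flag, on in zip(flags, mask) if on]
        yield ["./configure"] + enabled


if __name__ == "__main__":
    for cmd in configure_variants():
        # Printed rather than executed: run each line by hand in a fresh
        # copy of the libsodium-1.0.16 source tree.
        print(" ".join(cmd) + " && make check")
```

With the two flags above this yields four variants. According to the observation above, only the one with both flags set crashes.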

I tried it on a few different JPC instances and got:

pkgsrc          kernel             result
2018Q3-x86_64   20180816T001857Z   CRASH
2017Q4-x86_64   20180813T212436Z   CRASH
2014Q4-x86_64   20180816T001857Z   no crash

I'm not sure if this points to a bug in the kernel or something provided by pkgsrc, or what the right place is to look next (cc @jperkin).

@jedisct1

jedisct1 commented Jan 6, 2019

The problem seems to be that thread-local variables from static libraries are not properly initialized in SmartOS.

randombytes_salsa20_random.c declares a thread-local variable:

static TLS Salsa20Random stream = { ... };

(On SmartOS, TLS is a macro for _Thread_local.)

When the library is statically linked, stream is not properly initialized, and accessing the structure's fields leads to undefined behavior.

A workaround is to configure the library with --without-threads, so that it is compiled without thread support. That is probably acceptable for the Python bindings.

@plluksie

Just for the record: with libsodium-1.0.16 and libsodium-1.0.17, --without-threads doesn't help. The library compiles only if --enable-shared is requested.

@richlowe

The compiler outputs an R_386_TLS_LDM relocation for stream at randombytes_salsa20_random_stir+0x7e, which the link-editor is processing faithfully. We'd expect to see an R_386_PLT32 or R_386_TLS_LDM_PLT after this (at +0x84), to either fix up a call to __tls_get_addr or to nop it out. The link-editor is doing neither, leaving a dangling call with arbitrarily bad results.

At first glance, it looks like .rel.text contains the entries we'd hoped for, but that ld(1) is ignoring one of them.

I think this is because we expect the "main" TLS relocation to have dealt with this for us, as we would in the case of R_386_TLS_GD, which would indeed fix up the call. The LDM relocations don't do this, and it seems likely that they must. I think that would still be valid if we later see the LDM_PLT relocation: we'll just overwrite text we already overwrote (and, actually, then overwrite it yet again, later).

@richlowe

If I can trust the randombytes test that @jasonbking showed me, a diff similar to this one, applied to the illumos link-editor, makes things work:
https://gist.github.com/richlowe/33f189f0f71ffe6e0aa48b01faccfe9b

What this does is transition the 'call' as we would have with a _GD relocation, leaving behind an _LE relocation to fix up the addl.

Running randombytes after this:

; TERM=ansi mdb +o pager ./randombytes
> randombytes_salsa20_random_stir:b
> ::run
mdb: stop at randombytes_salsa20_random_stir
mdb: target stopped at:
randombytes_salsa20_random_stir:pushl  %ebp
> 1::tls stream
fe533210
> fe533210/KKC
0xfe533210:     0               0               \0
> ::step out
mdb: target stopped at:
randombytes_stir+0x26:  popl   %eax
> fe533210/KKC
0xfe533210:     1               0               \266
> $q

Looks reasonable for correctness based on my very limited understanding of the code.

A more adequate check for correctness -- adding CTF to the salsa20 rdrand object and checking the hrtime at the end of the struct, also looks ok.

I don't have a build environment that allows me to build a broken system (I'm using an object from @jasonbking), so I can't reasonably test this further. I'd appreciate it if somebody who can would rebuild libld.so.4 using the above patch and see if everything also seems good to them.

@jasonbking

I was able to recreate the problem (and sent @richlowe the resulting binary). Using an ld with his patch, I do not see the failure anymore.

@plluksie

> I was able to recreate the problem (and sent @richlowe the resulting binary). Using an ld with his patch, I do not see the failure anymore.

What were the steps to test the patch? Is it possible to update ld only in one particular zone without touching GZ? Is there any plan to include the patch in the newest release? Thanks!

@jasonbking

The local dynamic TLS fix was integrated in commit 096c97d62be876a03a0a8cdb0a540e9c84ec509f which was merged into illumos-joyent this morning. If you build your own SmartOS-live image, you can try that. Otherwise the fix should appear in the release that should be out around Feb 13th.

@plluksie

plluksie commented Feb 24, 2019

Just tried to build on:
SunOS ikara 5.11 joyent_20190214T002809Z i86pc i386 i86pc Solaris
The issue persists. Steps to reproduce (tried on image c193a558-1d63-11e9-97cf-97bb3ee5c14f base-64-lts 18.4.0 smartos zone-dataset 2019-01-21):
$ sudo pkgin install python37 py37-pip gcc7{,-libs} gmake
$ python3 -m venv test && source test/bin/activate
$ python3 -m pip install --upgrade pip
$ python3 -m pip install ansible
If needed I can attach the core file.

@jasonbking

Unfortunately, this appears to be a different bug. The earlier bug was present only in 32-bit builds.
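
Since the two bugs differ by word size, it can help to confirm which kind of interpreter (and therefore which kind of extension build) is in use. A minimal sketch, where `pointer_bits` is just an illustrative name:

```python
import struct


def pointer_bits():
    """Return the pointer width, in bits, of the running Python interpreter."""
    # "P" is the struct format code for a native pointer (void *).
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print("%d-bit Python" % pointer_bits())
```

The build directory in the original report (temp.solaris-2.11-i86pc.32bit-2.7) suggests a 32-bit interpreter there, whereas the python37 reproduction above appears to be a 64-bit build.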

@richlowe

This is indeed a similar but different issue. The TLS-transition-related corruption here causes any write to members of stream to actually write to the beginning of stream, thus radically corrupting rnd32_outleft several times in randombytes_salsa20_random_stir.

@richlowe

The amd64 bug occurs because the link-editor erroneously zeroes the addend when transitioning from LD (dtpoff) to LE (tpoff), and thus the structure offsets emitted by the compiler in the form

leaq	48+stream@dtpoff(%rax), %rdi

to get at rnd32, effectively disappear.
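
The effect of dropping the addend can be illustrated with toy arithmetic. This is not the link-editor's code. `resolved_offset` and the base offset 0x100 are made up for illustration, and only the addend 48 comes from the instruction above.

```python
FIELD_OFFSET = 48  # the addend in `48+stream@dtpoff`, from the instruction above


def resolved_offset(stream_offset, addend, keep_addend):
    """Toy model of the relocated value: symbol offset plus (optionally) the addend."""
    return stream_offset + (addend if keep_addend else 0)


if __name__ == "__main__":
    stream_offset = 0x100  # made-up offset of `stream` within the TLS block
    # Addend kept: the access lands on the intended field.
    print(hex(resolved_offset(stream_offset, FIELD_OFFSET, keep_addend=True)))   # 0x130
    # Addend zeroed: the access lands on the start of `stream` instead.
    print(hex(resolved_offset(stream_offset, FIELD_OFFSET, keep_addend=False)))  # 0x100
```

With the addend lost, every field access collapses onto the start of the structure, which matches the corruption described earlier in this thread.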

@plluksie

Thanks for your comments and interest. Should I open another issue to track that?

@richlowe

I filed illumos bug: #10471 "ld(1) amd64 LD->LE TLS transition causes memory corruption"
(https://www.illumos.org/issues/10471) to track the fix.

I don't know whether the smartos folks would like a second bug filed or not.

@jasonbking

This bug for 64-bit builds was filed as illumos#10471

@jasonbking

Since we merge with illumos-gate Monday through Friday, once it lands in illumos-gate, it should land in illumos-joyent shortly thereafter (and then in the following bi-weekly release).

@plluksie

Okay. Thanks, will wait patiently.

@jasonbking

I tried the above commands on the latest release (20190328T010321Z), and pynacl, as well as ansible in general, now builds without errors. I suspect the previous release will also work (I believe it also has the fix), though I didn't have a chance to try building on it.
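
To check a given machine against that release, the platform stamp can be compared with the one reported as working. This is a sketch with hypothetical helper names (`parse_stamp`, `at_least_known_good`). It treats 20190328T010321Z as a conservative threshold, since an earlier release may already contain the fix.

```python
import re
from datetime import datetime

# Release reported above to build pynacl and ansible without errors.
KNOWN_GOOD = "20190328T010321Z"


def parse_stamp(text):
    """Extract a platform timestamp such as 20190214T002809Z from `text`."""
    match = re.search(r"(\d{8}T\d{6}Z)", text)
    if match is None:
        raise ValueError("no platform timestamp in %r" % text)
    return datetime.strptime(match.group(1), "%Y%m%dT%H%M%SZ")


def at_least_known_good(version_string):
    """True if the platform stamp is at or after the known-good release."""
    return parse_stamp(version_string) >= parse_stamp(KNOWN_GOOD)


if __name__ == "__main__":
    # Stamp taken from the uname line quoted earlier in this thread.
    print(at_least_known_good("joyent_20190214T002809Z"))
```

For the 20190214 stamp quoted earlier this prints False, consistent with the failure reported on that platform.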

@plluksie

plluksie commented Apr 1, 2019

Hi @jasonbking. I confirm. Works like a charm. Thanks a lot!

@jasonbking

Since the issue is resolved, I'm going to go ahead and close this now.
