Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add memory sanitizer test to travis #687

Closed
wants to merge 1 commit into from

Conversation

jonasnick
Copy link
Contributor

@jonasnick jonasnick commented Nov 4, 2019

In #558 there was an unintialized memory read (in the tests). This happened in all travis test configs but only a single test configuration produced an error in a verify check. In order to catch issues with uninitialized memory reliably, this PR adds a test configuration with the clang memory sanitizer to travis. The memory sanitizer errors out when running with the unfixed schnorrsig PR.

The result of running this PR with an additional commit that adds an uninitialized memory can be viewed at https://travis-ci.org/jonasnick/secp256k1/jobs/607352962

Copy link
Contributor

@sipa sipa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

.travis.yml Outdated
# and BIGNUM are disabled because clang memory sanitizer does not work
# with inline assembly (https://clang.llvm.org/docs/MemorySanitizer.html).
# The memory sanitizer is instructed to exit with a different exit code
# using MSAN_OPTIONS. This is because the default exit code is 77 - the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, how long did it take you to figure this out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha, not long luckily. The automake doc is pretty clear about exit code 77 being the cause of skipped tests.

@real-or-random
Copy link
Contributor

real-or-random commented Nov 5, 2019

Apparently both test programs fail with the sanitizer enabled on travis.

I'm testing this locally and the output is much better with
./configure --enable-coverage --disable-openssl-tests CC=clang CFLAGS="-fsanitize=memory -fsanitize-memory-track-origins -fno-omit-frame-pointer -g"

  • -fsanitize-memory-track-origins is a no brainer. It needs more ressources but fine
  • -fno-omit-frame-pointer -g for printing functions and line numbers
  • --enable-coverage is mostly a hack to force -O0.

Interestingly, I get different results with coverage on and off (due to the enabled VERIFYs), and in both cases the sanitizer fails pretty early here. :/

edit: The caveat of all these options is that they don't work on the compiler options that we really use for the build in the end. I still think they're useful, I guess the memory sanitizer is more useful for spotting mistakes in the source code and not issues introduced by the compiler.

@jonasnick
Copy link
Contributor Author

@real-or-random you're missing --without-asm and --with-bignum=no. The problem with the travis test run is that ASM=no doesn't have an effect. Will write a fix.

@real-or-random
Copy link
Contributor

Oh indeed. I still suggest adding -fsanitize-memory-track-origins -fno-omit-frame-pointer -g here to make sure that the output of the sanitizer is useful.

By the way, is there a reason why you can't override the optimization level in the CFLAGS? @gmaxwell @sipa

.travis.yml Outdated
# using MSAN_OPTIONS. This is because the default exit code is 77 - the
# same exit code that autotools make check interprets as a test that is
# supposed to be skipped.
env: EXTRAFLAGS="--disable-openssl-tests CFLAGS=-fsanitize=memory" ASM=no BIGNUM=no EXPERIMENTAL=yes ENDOMORPHISM=yes RECOVERY=yes ECDH=yes MSAN_OPTIONS=exitcode=42
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's a good idea to test with both ENDOMORPHISM=yes and ENDOMORPHISM=no because both code paths are interesting for memory: setting up the contexts involves a lot of pointer arithmetic etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I can add that.

@practicalswift
Copy link
Contributor

Strong Concept ACK

Cannot survive over long time horizons without the sanitizers :)

Thanks for doing this @jonasnick!

@jonasnick
Copy link
Contributor Author

jonasnick commented Nov 5, 2019

Rebased. @real-or-random

Oh indeed. I still suggest adding -fsanitize-memory-track-origins -fno-omit-frame-pointer -g here to make sure that the output of the sanitizer is useful.

When I experimented with this before opening the PR I noticed that with -fsanitize-memory-track-origins the tests take much longer (especially with schnorrsigs). I settled on giving travis only the responsibility to detect errors and not to pretty-print them. That would be done locally, by the devs because they need to verify anyway that their patch works. OTOH I don't know how to document all the nice flags such that it's easy to discover. I'll play with -fsanitize-memory-track-origins in my fork to see how long it takes for travis.

@jonasnick
Copy link
Contributor Author

I've added a non-endo test. But my tests with -fsanitize-memory-track-origins didn't go that well: I didn't manage to escape quotes in a way that allows specifying multiple options with CFLAGS. Unless someone has an idea for how to do that, I'll leave the PR as is.

@jonasnick jonasnick force-pushed the travis-memsanitizer branch 2 times, most recently from e23148a to f3cae6b Compare November 6, 2019 00:02
@jonasnick
Copy link
Contributor Author

With @real-or-random's help I added -fno-omit-frame-pointer -g to give a much nicer output in case of failure.

@real-or-random
Copy link
Contributor

ACK f3cae6b

When I experimented with this before opening the PR I noticed that with -fsanitize-memory-track-origins the tests take much longer (especially with schnorrsigs). I settled on giving travis only the responsibility to detect errors and not to pretty-print them. That would be done locally, by the devs because they need to verify anyway that their patch works.

Makes perfect sense by the way.

@gmaxwell
Copy link
Contributor

gmaxwell commented Nov 7, 2019

The msan is a much weaker test than valgrind particularly with -DVALGRIND as that tests properties that can't be easily tested any other way ... does the travis environment have valgrind? We've historically used valgrind (so this is not a case where valgrind is too slow to be useful)...

@real-or-random
Copy link
Contributor

The msan is a much weaker test than valgrind particularly with -DVALGRIND as that tests properties that can't be easily tested any other way ... does the travis environment have valgrind? We've historically used valgrind (so this is not a case where valgrind is too slow to be useful)...

My guess was that it is too slow but this was really just a guess. If you say it's not, we should definitively try it.

@elichai
Copy link
Contributor

elichai commented Nov 7, 2019

@gmaxwell I'm not an expert on valgrind and/or sanitizers,
but I'm not sure that you're right. I think msan also capture/checks the stack. where valgrind mostly checks only the heap (though I might be confusing msan with asan)

@gmaxwell
Copy link
Contributor

gmaxwell commented Nov 7, 2019

@elichai that isn't the case, memcheck checks the stack/heap, use of freed memory, use of uninitilized memory (regardless of stack or heap). And it isn't broken by SIMD or assembly ... Sometimes optimizations in the compiler can hide issues, since valgrind cannot see any behaviour that doesn't actually make it into the binary, but the optimizers can hide bugs from msan too.

Moreover, libsecp256k1's tests have active instrumentation for valgrind which mark memory that calls shouldn't touch as uninitialized and then after the calls verifies that it's still uninitialized-- making sure that its not moving data out and putting it back from pointers it shouldn't be accessing at all.

With the exception of bugs totally removed by optimization I'm not aware of any case where msan is more sensitive than valgrind, but I'm aware of many where it's less. Because valgrind can actually run with the production binaries it can also catch miscompliation. -- even the msan page states "MSan implements a subset of functionality found in Valgrind (Memcheck tool)".

The big thing that msan has going for it is that its faster than memcheck. Of course, there is no harm in using both... but switching from using valgrind in development to msan run in travis would be a step back.

@jonasnick
Copy link
Contributor Author

@elichai valgrind certainly found the uninitialized I temporarily added to #558.

@gmaxwell this PR is not intended to replace local testing - it's just belt and suspenders. Travis is not to be trusted anyway.

I played with running valgrind in travis and it seems to work with "test count" 8 instead of the default 64 jonasnick#10. I'll open an alternative PR with a more cleaned up version of the valgrind travis config.


# clang with memory sanitizer
#
# --disable-openssl-tests because openssl uses uninitialized memory. ASM
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I haven't checked this myself, but an alternative might be to export MSAN_OPTIONS with a suppressions file for the openssl function that is affected.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed but I don't think that's better in the end, and disabling the tests as done here is simpler. Note that the flag here really just influences just the test code. OpenSSL is only used in the tests, not used in the library itself.

@jonasnick
Copy link
Contributor Author

The memory sanitizer documentation says that it implements a subset of valgrind's memcheck. So I don't think we need both and can close this PR if we have valgrind.

@jonasnick jonasnick closed this Nov 25, 2019
jonasnick added a commit that referenced this pull request Nov 25, 2019
dd98cc9 travis: Added a valgrind test without endro and enabled recovery+ecdh (Elichai Turkel)
b4c1382 Add valgrind check to travis (Elichai Turkel)

Pull request description:

  As discussed in #687
  This adds valgrind check to the repo.

  It doesn't run on recovery+ecdh because of the time.
  No openssl because of uninitialized mem.
  I debated between with and without ASM, but decided with ASM because it might be more fragile(?).

  I wasn't sure if I should pass `-DVALGRIND` via `CFLAGS` or `CPPFLAGS`, it seems like because this is only C then there shouldn't even be `CPPFLAGS` but looks like we use `CPPFLAGS` in other places for the preprocessor definitions.

  If people are worried about the time it takes we can mark it as `allow_failure` although I don't think it's a problem here because there's only a handful of PRs and they're usually open for weeks.

ACKs for top commit:
  real-or-random:
    ACK dd98cc9 I looked at the diff
  jonasnick:
    ACK dd98cc9

Tree-SHA512: 72d7f1f4c8dd4c58501ac1003b28296d6fd140a8f7711e9e3b3c04a3fbce358ff1c89d2e1d1c5489d7668d3019981264c5cadecae3d9b48cd38c9463e287d8ad
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants