Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI failure: ThreadSanitizer: data race /usr/lib/llvm-12/bin/../include/c++/v1/ios:523:12 in std::__1::ios_base::width() const #23366

Open
ryanofsky opened this issue Oct 27, 2021 · 13 comments · Fixed by #23370

Comments

@ryanofsky
Copy link
Contributor

I'm seeing this race failure in two unrelated PRs during the validation_chainstate_tests/chainstate_update_tip case:

https://cirrus-ci.com/task/5125235994263552?logs=ci#L4140 from #22702
https://cirrus-ci.com/task/5269464468946944?logs=ci#L4142 from #21206

test/validation_chainstate_tests.cpp(83): Entering test case "chainstate_update_tip"

...

[Makefile:18480: test/validation_chainstate_tests.cpp.test] Error 1

...

WARNING: ThreadSanitizer: data race (pid=23076)
Read of size 8 at 0x7f23d669c6d0 by main thread:
#0 std::__1::ios_base::width() const /usr/lib/llvm-12/bin/../include/c++/v1/ios:523:12 (test_bitcoin+0x17826c)

...

  Location is global 'std::__1::cout' of size 160 at 0x7f23d669c6b0 (libc++.so.1+0x0000000cd6d0)

...

SUMMARY: ThreadSanitizer: data race /usr/lib/llvm-12/bin/../include/c++/v1/ios:523:12 in std::__1::ios_base::width() const

I'm not sure if this is a real bug or spurious, but I did work around a similar problem previously (in #15719) by adding a suppression:

# Disable spurious error about std::cout.
# data race /usr/lib/llvm-10/bin/../include/c++/v1/ios:522:12 in std::__1::ios_base::width() const
# Location is global 'std::__1::cout' of size 160 at 0x7f492785e270 (libc++.so.1+0x0000000c0290)
# https://travis-ci.org/github/bitcoin/bitcoin/jobs/695004018
#
# Uses of std::cout are guaranteed thread safe by the c++ standard
# https://stackoverflow.com/questions/50322790/thread-safety-and-piping-to-streams
race:std::__1::ios_base::width

@ryanofsky ryanofsky added the Bug label Oct 27, 2021
@hebasto
Copy link
Member

hebasto commented Oct 27, 2021

Also the same bug was observed on the master branch recently:

@maflcko
Copy link
Member

maflcko commented Oct 27, 2021

Indeed, this is not an issue with #15719, but with master

@hebasto
Copy link
Member

hebasto commented Oct 27, 2021

Indeed, this is not an issue with #15719, but with master

Suggested suppression in #23370.

@ryanofsky
Copy link
Contributor Author

Looking at https://cirrus-ci.com/github/bitcoin/bitcoin/master, I guess this problem does not happen reliably on master, but is nondeterministic? IIRC, when I was seeing similar TSAN errors months ago in #15719, the problem seemed to be deterministic and would happen every time.

The chainstate_update_tip was added a month ago in 673a5bd from #21526 by @jamesob, so maybe James could look at stack traces and clarify whether it is expected that there might be multiple threads writing to cout at same time.

I think the TSAN suppression seems safe, but am not completely confident about it. And even if it is thread safe it still means multiple threads could be writing to stdout at the same time and producing garbled output.

Also, I'm not sure why this bug would start happening now if #21526 was merged a month ago, so wondering if something else might have changed recently? Or if this happens infrequently enough to not be noticed? Or if this was noticed but was just ignored a little while?

Would be nice if there was a way to easily grep cirrus logs and find all the instances of this.

@martinus
Copy link
Contributor

martinus commented Nov 7, 2021

As mentioned std::cout is supposed to be threadsafe: https://eel.is/c++draft/iostream.objects#overview-7

The problem is, it seems that io manipulators are not threadsafe: the stream members are not atomic, and there are no locks: https://github.com/llvm/llvm-project/blob/release/12.x/libcxx/include/ios#L372

But many methods call ios.width(0) so that std::setw() is only valid for one output: https://en.cppreference.com/w/cpp/io/ios_base/width

This is done here: https://github.com/llvm/llvm-project/blob/main/libcxx/include/locale#L1400 which is called whenever e.g. operator<<(char const*) is called.

So even when the standard says that concurrent access to cout isn't supposed to have any data races, this is one right there. Not sure what to do about that.

@maflcko
Copy link
Member

maflcko commented Nov 8, 2021

Interesting that no one else ran into this before us. Maybe a bug should be reported, since I couldn't find any https://bugs.llvm.org/buglist.cgi?quicksearch=race%20ios_base&list_id=226152

@laanwj laanwj closed this as completed in 8f86820 Nov 10, 2021
sidhujag pushed a commit to syscoin/syscoin that referenced this issue Nov 10, 2021
96c7db9 test: Add ios_base::width tsan suppression (Hennadii Stepanov)

Pull request description:

  This PR:
  - adds tsan suppression for intermittent failures in CI
  ```
  SUMMARY: ThreadSanitizer: data race /usr/lib/llvm-12/bin/../include/c++/v1/ios:523:12 in std::__1::ios_base::width() const
  ```

  - fixes bitcoin#23366

ACKs for top commit:
  laanwj:
    Concept and code review ACK 96c7db9

Tree-SHA512: fcad296e8da4a6d94dcbb011c3d9b3d07f6983818be16cfff8341a035fa6abe2777ae72409c9bc83083097660408a850c1e9cd6f0ad3ea7976e4a4768f1e1858
@ryanofsky
Copy link
Contributor Author

I guess #23370 resolved CI failures, but from Martin's comment #23366 (comment), it seems like the suppression isn't ideal because there is a real data race. I think good followups would be to:

  • Improve comment added in #23370, so instead of being a blind link, it has some explanation and at least mentions std::cout. A good template could be comment from 81be7ff5 with updated information from #23366 (comment)
  • Add a new lock around uses of std::cout to remove the TSAN suppression. Or replace cout calls triggering this with calls that go through the logging framework to remove the suppression. Doing either of these would also prevent interleaved output from multiple threads in addition to avoiding need for the suppression.
  • Maybe check upstream and see if library implementation is violating the standard. From Martin's comment, it does seem like there is a bug in either the library or the standard, though it's not clear what a good fix would be.

Could maybe mark this issue up for grabs if this makes sense and is of interest.

@maflcko
Copy link
Member

maflcko commented Nov 16, 2021

Or replace cout calls triggering this with calls that go through the logging framework to remove the suppression.

If the boost test logging has internal locking, that seems a fine alternative. Are you referring to BOOST_TEST_MESSAGE https://www.boost.org/doc/libs/1_77_0/libs/test/doc/html/boost_test/utf_reference/testout_reference/test_output_macro_message.html ?

@maflcko maflcko reopened this Nov 16, 2021
@ryanofsky
Copy link
Contributor Author

Or replace cout calls triggering this with calls that go through the logging framework to remove the suppression.

If the boost test logging has internal locking, that seems a fine alternative. Are you referring to BOOST_TEST_MESSAGE https://www.boost.org/doc/libs/1_77_0/libs/test/doc/html/boost_test/utf_reference/testout_reference/test_output_macro_message.html ?

I was being vague because I didn't look into where these cout writes were coming from. But from the cirrus TSAN output above, one cout call is coming G_TEST_LOG_FUN and the other one is coming from a boost method boost::unit_test::unit_test_log_t::operator<<(boost::unit_test::lazy_ostream const&). We also do have a few scattered BOOST_TEST_MESSAGE calls which would be a third source of cout writes.

To be able to remove the suppression, all these these cout writes would need to be protect with a lock. We can lock G_TEST_LOG_FUN ourselves since it is our own code. It looks like we could synchronize the boost cout output by redirecting it with boost::unit_test::unit_test_log.set_stream( std::ostream& ); https://www.boost.org/doc/libs/1_77_0/libs/test/doc/html/boost_test/test_output/logging_api/log_ct_output_stream_redirection.html with a custom std::ostream subclass that locks a mutex and forwards to cout.

I'm not sure if this would be worth it, but it seems something like this would be needed to remove the suppression.

@martinus
Copy link
Contributor

Someone (not me) created a bug for exactly this issue, with a nice short reproducer: https://bugs.llvm.org/show_bug.cgi?id=52509

In C++20 there will be std::osyncstream: https://en.cppreference.com/w/cpp/io/basic_osyncstream So I think it would be nice to have something with the same API that's eventually replaced with std::osyncstream once we switch to C++20

@maflcko
Copy link
Member

maflcko commented Jan 11, 2022

bugs.llvm was deleted. The bug is now here: llvm/llvm-project#51851

PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this issue Apr 16, 2023
96c7db9 test: Add ios_base::width tsan suppression (Hennadii Stepanov)

Pull request description:

  This PR:
  - adds tsan suppression for intermittent failures in CI
  ```
  SUMMARY: ThreadSanitizer: data race /usr/lib/llvm-12/bin/../include/c++/v1/ios:523:12 in std::__1::ios_base::width() const
  ```

  - fixes bitcoin#23366

ACKs for top commit:
  laanwj:
    Concept and code review ACK 96c7db9

Tree-SHA512: fcad296e8da4a6d94dcbb011c3d9b3d07f6983818be16cfff8341a035fa6abe2777ae72409c9bc83083097660408a850c1e9cd6f0ad3ea7976e4a4768f1e1858
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this issue Apr 16, 2023
96c7db9 test: Add ios_base::width tsan suppression (Hennadii Stepanov)

Pull request description:

  This PR:
  - adds tsan suppression for intermittent failures in CI
  ```
  SUMMARY: ThreadSanitizer: data race /usr/lib/llvm-12/bin/../include/c++/v1/ios:523:12 in std::__1::ios_base::width() const
  ```

  - fixes bitcoin#23366

ACKs for top commit:
  laanwj:
    Concept and code review ACK 96c7db9

Tree-SHA512: fcad296e8da4a6d94dcbb011c3d9b3d07f6983818be16cfff8341a035fa6abe2777ae72409c9bc83083097660408a850c1e9cd6f0ad3ea7976e4a4768f1e1858
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this issue Apr 17, 2023
96c7db9 test: Add ios_base::width tsan suppression (Hennadii Stepanov)

Pull request description:

  This PR:
  - adds tsan suppression for intermittent failures in CI
  ```
  SUMMARY: ThreadSanitizer: data race /usr/lib/llvm-12/bin/../include/c++/v1/ios:523:12 in std::__1::ios_base::width() const
  ```

  - fixes bitcoin#23366

ACKs for top commit:
  laanwj:
    Concept and code review ACK 96c7db9

Tree-SHA512: fcad296e8da4a6d94dcbb011c3d9b3d07f6983818be16cfff8341a035fa6abe2777ae72409c9bc83083097660408a850c1e9cd6f0ad3ea7976e4a4768f1e1858
@maflcko
Copy link
Member

maflcko commented Jun 12, 2023

Maybe we can set https://en.cppreference.com/w/cpp/io/ios_base/sync_with_stdio as a work-around to get thread-safety?

Edit: nvm, this should already be set by default?

@maflcko
Copy link
Member

maflcko commented Jan 3, 2024

To be able to remove the suppression, all these these cout writes would need to be protect with a lock. We can lock G_TEST_LOG_FUN ourselves since it is our own code. It looks like we could synchronize the boost cout output by redirecting it with boost::unit_test::unit_test_log.set_stream( std::ostream& ); https://www.boost.org/doc/libs/1_77_0/libs/test/doc/html/boost_test/test_output/logging_api/log_ct_output_stream_redirection.html with a custom std::ostream subclass that locks a mutex and forwards to cout.

I'm not sure if this would be worth it, but it seems something like this would be needed to remove the suppression.

In C++20 there will be std::osyncstream: https://en.cppreference.com/w/cpp/io/basic_osyncstream So I think it would be nice to have something with the same API that's eventually replaced with std::osyncstream once we switch to C++20

I guess std::osyncstream does not work out of the box with boost::unit_test::unit_test_log.set_stream, because boost would internally need to call emit?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants
@martinus @maflcko @ryanofsky @hebasto and others