Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop 1-Weso squaring on reaching target iterations #174

Merged
merged 1 commit into from May 14, 2024

Conversation

xearl4
Copy link
Contributor

@xearl4 xearl4 commented Apr 4, 2024

For 1-Wesolowski proofs (used for "blueboxing") the squaring thread previously needlessly continued squaring while proof computation was in progress. In practice, proof computation also takes a while, so this could easily result in 10-20M additional squaring iterations. With this change, the squaring stops after the iterations target is reached. This also makes CPU utilisation more predictable. Previously, during squaring 2 threads were heavily utilised, and 3 threads during proof computation. Now it's never more than 2 high-load threads: 2 for squaring, 1 for proof computation.

This was originally written more than 2 years ago for the "blueboxing group" we ran in a SpaceFarmers.io Discord channel and was used by the people participating in that group intensively over several months. With the recently renewed interest in blueboxing, I remembered that I never got around to upstreaming that change: so here we go :)

@hoffmang9
Copy link
Member

Awesome!

@emlowe emlowe requested a review from fchirica April 4, 2024 20:59
@emlowe
Copy link
Contributor

emlowe commented Apr 4, 2024

The macos TSAN (thread sanitizer) test does seem to be failing - this only runs on macOS intel so perhaps it's usefulness is declining. The Ubuntu TSAN test passed - so this is a curious failure

@emlowe
Copy link
Contributor

emlowe commented Apr 5, 2024

The macOS TSAN tests passes for me in main of chiavdf, but does not with this change. Although I can't quite figure out why yet since it seems to be complaining about std::cout

==================
WARNING: ThreadSanitizer: data race (pid=5847)
  Read of size 4 at 0x7ff8569a1348 by main thread:
    #0 std::__1::basic_ostream<char, std::__1::char_traits<char> >& std::__1::__put_character_sequence<char, std::__1::char_traits<char> >(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, char const*, unsigned long) <null>:2 (1weso_test:x86_64+0x10002bbe8)
    #1 main <null>:2 (1weso_test:x86_64+0x10003e279)
    #2 start <null>:2 (dyld:x86_64+0x552d)
    #3 start <null>:2 (dyld:x86_64+0x552d)

  Previous write of size 4 at 0x7ff8569a1348 by thread T1:
    #0 std::__1::basic_ostream<char, std::__1::char_traits<char> >& std::__1::__put_character_sequence<char, std::__1::char_traits<char> >(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, char const*, unsigned long) <null>:2 (1weso_test:x86_64+0x10002bc45)
    #1 repeated_square(unsigned long long, form, integer const&, integer const&, WesolowskiCallback*, FastStorage*, std::__1::atomic<bool>&) <null>:2 (1weso_test:x86_64+0x100039fa7)
    #2 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(unsigned long long, form, integer const&, integer const&, WesolowskiCallback*, FastStorage*, std::__1::atomic<bool>&), unsigned long long, form, integer, integer, OneWesolowskiCallback*, FastStorage*, std::__1::reference_wrapper<std::__1::atomic<bool> > > >(void*) <null>:2 (1weso_test:x86_64+0x100040aa7)

  As if synchronized via sleep:
    #0 nanosleep <null>:2 (libclang_rt.tsan_osx_dynamic.dylib:x86_64+0x2ccc5)
    #1 std::__1::this_thread::sleep_for(std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > const&) <null>:2 (libc++.1.dylib:x86_64+0x15ad0)
    #2 start <null>:2 (dyld:x86_64+0x552d)
    #3 start <null>:2 (dyld:x86_64+0x552d)

  Location is global 'std::__1::cout' at 0x7ff8569a12b0 (libc++.1.dylib+0x4179a348)

  Thread T1 (tid=25547, finished) created by main thread at:
    #0 pthread_create <null>:2 (libclang_rt.tsan_osx_dynamic.dylib:x86_64+0x2df1f)
    #1 main <null>:2 (1weso_test:x86_64+0x10003da3f)
    #2 start <null>:2 (dyld:x86_64+0x552d)
    #3 start <null>:2 (dyld:x86_64+0x552d)

SUMMARY: ThreadSanitizer: data race (1weso_test:x86_64+0x10002bbe8) in std::__1::basic_ostream<char, std::__1::char_traits<char> >& std::__1::__put_character_sequence<char, std::__1::char_traits<char> >(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, char const*, unsigned long)+0xb8
==================

@emlowe
Copy link
Contributor

emlowe commented Apr 5, 2024

Ok, if I comment out the line in vdf.h
std::cout << "VDF loop finished. Total iters: " << num_iterations << "\n" << std::flush;

The tests pass again.

@emlowe
Copy link
Contributor

emlowe commented Apr 5, 2024

This seems to be a TSAN bug in this version of Clang , as per the C++ spec:

In C++11, we do have some guarantees. The FDIS says the following in §27.4.1 [iostream.objects.overview]:

Concurrent access to a synchronized (§27.5.3.4) standard iostream object’s formatted and unformatted input (§27.7.2.1) and output (§27.7.3.1) functions or a standard C stream by multiple threads shall not result in a data race (§1.10). [ Note: Users must still synchronize concurrent use of these objects and streams by multiple threads if they wish to avoid interleaved characters. — end note ]

@emlowe emlowe mentioned this pull request Apr 5, 2024
@emlowe
Copy link
Contributor

emlowe commented Apr 5, 2024

@xearl4 I added a lock around cout to prevent the TSAN false positive - it really shouldn't be needed, but 🤷 - too bad we don't have the convenient c++20 <syncstream> - but that would seem a larger change to require c++20

see #176

For 1-Wesolowski proofs (used for "blueboxing") the squaring thread
previously needlessly continued squaring while proof computation was in
progress. In practice, proof computation also takes a while, so this
could easily result in 10-20M additional squaring iterations. With this
change, the squaring stops after the iterations target is reached. This
also makes CPU utilisation more predictable. Previously, during squaring
2 threads were heavily utilised, and 3 threads during proof computation.
Now it's never more than 2 high-load threads: 2 for squaring, 1 for
proof computation.
Copy link
Contributor

@fchirica fchirica left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Thank you!

@emlowe
Copy link
Contributor

emlowe commented May 13, 2024

This needs #181 merged in first to fix TSAN

@emlowe
Copy link
Contributor

emlowe commented May 14, 2024

close and re-open for CI updates

@emlowe emlowe closed this May 14, 2024
@emlowe emlowe reopened this May 14, 2024
@emlowe emlowe merged commit 29079ca into Chia-Network:main May 14, 2024
98 of 99 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants