Assertion failures crypto_macos.cc: expected: CHECK_IS_BLOCKING(SIGPROF)
#36908
There are cores available for the failure above; see the "isolated out" links in the build. One particular core has the following stack trace:
Looking at the code:

// thread.h
class Thread : public ThreadState {
  ...
 private:
  Random thread_random_;
  ...

// random.cc
Random::Random() {
  uint64_t seed = FLAG_random_seed;
  if (seed == 0) {
    Dart_EntropySource callback = Dart::entropy_source_callback();
    if (callback != NULL) {
      if (!callback(reinterpret_cast<uint8_t*>(&seed), sizeof(seed))) {
        ...
    ...
  ...
}

bool Crypto::GetRandomBytes(intptr_t count, uint8_t* buffer) {
  ThreadSignalBlocker signal_blocker(SIGPROF);
  intptr_t fd =
      TEMP_FAILURE_RETRY_NO_SIGNAL_BLOCKER(open("/dev/urandom", O_RDONLY));
  if (fd < 0) {
    return false;
  }
  intptr_t bytes_read = 0;
  do {
    int res = TEMP_FAILURE_RETRY_NO_SIGNAL_BLOCKER(  // <-- this is where SIGPROF is not blocked
        read(fd, buffer + bytes_read, count - bytes_read));
    ...

It's a bit unclear how this could ever happen, since we block SIGPROF at the start of the function. Yet we hit the assertion above. The addition of |
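For reference, here is a minimal sketch of what a scoped blocker and a "is it blocked?" check could look like with plain POSIX calls. `ScopedSignalBlocker` and `IsSignalBlocked` are hypothetical names used only for illustration; they are not the VM's actual `ThreadSignalBlocker` or `CHECK_IS_BLOCKING`, whose implementations are not shown in this thread.

```cpp
#include <cassert>
#include <pthread.h>
#include <signal.h>

// Hypothetical helper: reports whether `sig` is blocked on the calling thread.
static bool IsSignalBlocked(int sig) {
  sigset_t current;
  sigemptyset(&current);
  // With a null `set` argument, pthread_sigmask only reads the current mask.
  int r = pthread_sigmask(SIG_BLOCK, nullptr, &current);
  (void)r;
  assert(r == 0);
  return sigismember(&current, sig) == 1;
}

// Hypothetical RAII guard: blocks `sig` on the calling thread for the
// enclosing scope and restores the previous mask on destruction.
class ScopedSignalBlocker {
 public:
  explicit ScopedSignalBlocker(int sig) {
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, sig);
    int r = pthread_sigmask(SIG_BLOCK, &set, &old_mask_);
    (void)r;
    assert(r == 0);
  }

  ~ScopedSignalBlocker() {
    int r = pthread_sigmask(SIG_SETMASK, &old_mask_, nullptr);
    (void)r;
    assert(r == 0);
  }

 private:
  sigset_t old_mask_;
};
```

Under this model, any code running inside the guard's scope on the same thread should observe `IsSignalBlocked(SIGPROF)` as true, which is the expectation the failing assertion encodes.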
If we can trust the data on the flakiness dashboard, then it is a little suspicious that a270999 landed in April but the heavy flakiness only started on 2nd of May. |
All results from the last two months on
And there are results in the database in between April 5th and May 2nd. (I'm unsure why |
Okay, only 1 of the 6 results (1 original + 5 deflake) makes it into the results database, because they all have the same primary key. This is why tests with a single flake don't show up in the results database. Moreover, after 100 subsequent non-flaky runs, a flaky test is forgiven and disappears from the flakiness dashboard. So 2nd of May is simply exactly 100 builds ago, and this flakiness might actually have been happening since a270999. |
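To make the two effects concrete, here is a toy model of the behavior described above. It is only a sketch: the class, the `(build, test)` key, and the test name used below are assumptions for illustration, and the real results database schema is not shown in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Toy model only; names and schema are assumptions, not the real dashboard.
class FlakinessModel {
 public:
  // Threshold taken from the "100 subsequent non-flaky runs" described above.
  static constexpr int kForgiveAfterCleanRuns = 100;

  // Stores one result row. Rows are keyed by (build, test), so a rerun of the
  // same test in the same build overwrites the previous row.
  void StoreResult(int build, const std::string& test,
                   const std::string& outcome) {
    rows_[std::make_pair(build, test)] = outcome;
  }

  // Records the overall verdict of one build for `test`.
  void RecordBuild(const std::string& test, bool flaked) {
    if (flaked) {
      clean_runs_[test] = 0;  // (Re)start the forgiveness countdown.
      return;
    }
    auto it = clean_runs_.find(test);
    if (it == clean_runs_.end()) return;  // Not currently marked flaky.
    if (++it->second >= kForgiveAfterCleanRuns) clean_runs_.erase(it);
  }

  bool IsMarkedFlaky(const std::string& test) const {
    return clean_runs_.count(test) != 0;
  }

  std::size_t RowCount() const { return rows_.size(); }

 private:
  std::map<std::pair<int, std::string>, std::string> rows_;
  // Clean runs since the last flake; an entry exists only while marked flaky.
  std::map<std::string, int> clean_runs_;
};
```

In this model, storing six results for the same build and test leaves a single row, and a test that flaked once drops off the flaky list on its 100th consecutive clean build.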
I ran this script to track down the issue:
It located this build as the first containing the issue: https://ci.chromium.org/p/dart/builders/ci.sandbox/vm-kernel-mac-debug-simdbc64/2470
The build has blamelist 97122d1..0377617, out of which this commit seems the most likely culprit: https://dart-review.googlesource.com/c/sdk/+/100988
It does indeed mention SIGPROF. |
Thanks for helping me dig through our cloud deflaking data @sortie! |
// Spawned threads inherit their spawner's signal mask. We sometimes spawn
// threads for running Dart code from a thread that is blocking SIGPROF.
// This function explicitly unblocks SIGPROF so the profiler continues to
// sample this thread.
static void UnblockSIGPROF() {
  sigset_t set;
  sigemptyset(&set);
  sigaddset(&set, SIGPROF);
  int r = pthread_sigmask(SIG_UNBLOCK, &set, NULL);
  USE(r);
  ASSERT(r == 0);
  ASSERT(!CHECK_IS_BLOCKING(SIGPROF));
}

The commit adds this method on macOS and calls it when spawning pthreads. This causes us to hit this crash. However, the Linux version already had this code, which was added back in 2016 (#26416, https://codereview.chromium.org/1953143002). The question is why macOS behaves differently from Linux. |
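The mask inheritance that the comment describes can be checked in isolation. Below is a minimal sketch using only POSIX calls; `ObserveInheritedMask`, `MaskObservation`, and the other names are hypothetical. It demonstrates the documented POSIX behavior (a new thread starts with its creator's signal mask, and unblocking in the child does not affect the parent), not the flaky macOS behavior from this issue.

```cpp
#include <cassert>
#include <pthread.h>
#include <signal.h>

// Hypothetical helper: is SIGPROF blocked on the calling thread?
static bool SigprofBlockedOnThisThread() {
  sigset_t current;
  sigemptyset(&current);
  int r = pthread_sigmask(SIG_BLOCK, nullptr, &current);  // Query only.
  (void)r;
  assert(r == 0);
  return sigismember(&current, SIGPROF) == 1;
}

struct MaskObservation {
  bool child_inherited_block;      // Child saw SIGPROF blocked at startup.
  bool child_blocked_after_unblock;  // Child's state after SIG_UNBLOCK.
  bool parent_still_blocked;       // Parent's mask after the child exited.
};

static void* ChildEntry(void* arg) {
  MaskObservation* out = static_cast<MaskObservation*>(arg);
  out->child_inherited_block = SigprofBlockedOnThisThread();

  // Same steps as UnblockSIGPROF in the issue: unblock for this thread only.
  sigset_t set;
  sigemptyset(&set);
  sigaddset(&set, SIGPROF);
  int r = pthread_sigmask(SIG_UNBLOCK, &set, nullptr);
  (void)r;
  assert(r == 0);
  out->child_blocked_after_unblock = SigprofBlockedOnThisThread();
  return nullptr;
}

// Blocks SIGPROF in the caller, spawns a child thread, and reports what each
// thread observed. The caller's original mask is restored before returning.
static MaskObservation ObserveInheritedMask() {
  sigset_t set, old_mask;
  sigemptyset(&set);
  sigaddset(&set, SIGPROF);
  int r = pthread_sigmask(SIG_BLOCK, &set, &old_mask);
  assert(r == 0);

  MaskObservation obs = {false, true, false};
  pthread_t child;
  r = pthread_create(&child, nullptr, ChildEntry, &obs);
  assert(r == 0);
  r = pthread_join(child, nullptr);
  assert(r == 0);

  obs.parent_still_blocked = SigprofBlockedOnThisThread();
  r = pthread_sigmask(SIG_SETMASK, &old_mask, nullptr);
  (void)r;
  assert(r == 0);
  return obs;
}
```

On a conforming system the expected observation is: blocked on entry to the child, unblocked after the child's `SIG_UNBLOCK`, and still blocked in the parent.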
Some quick Googling revealed this: https://lesteryu.com/signal-delivery-behavior-on-os-x/.
|
The flakiness dashboard also contains a group of tests that fail more frequently, with SIGPROF.
etc.
Somewhat reliably reproduces the problem on my macbook (50-50ish). Reverting 5393ce7 reliably prevents the crashes by not enabling SIGPROF on MacOS in the first place. |
@rmacnak-google This might not be properly supported on MacOS. (Unfortunately, the CL 3 years ago does not mention why it was only added to Linux.) You can semi-reliably reproduce this with |
I looked at this again with @mraleph yesterday. SIGPROF seems to become unblocked at random moments. I double-checked: on master, reverting the part of @rmacnak-google's change that enables SIGPROF on Mac threads makes the AsciiDecoder convert test succeed (sigprof.patch). |
Avoid flakily failures setting the thread signal mask.
Bug: #36908
Change-Id: I3d66214189f3276365b4c1cc333847cb4f65c94f
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/103400
Commit-Queue: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Alexander Markov <alexmarkov@google.com>
Since around 5th of May our flakiness dashboard has shown a lot of flaky crashes due to hitting assertions of this form (from build):