Frequent clangd test failures on Windows buildbot #1712

Closed
HighCommander4 opened this issue Jul 24, 2023 · 4 comments

Comments

@HighCommander4

It appears there are frequent clangd test failures on the Windows buildbots.

A few samples:

https://lab.llvm.org/buildbot/#/builders/123/builds/20148/steps/4/logs/stdio:

Failed Tests (3):
  Clangd Unit Tests :: ./ClangdTests.exe/LSPTest/DiagnosticsHeaderSaved
  Clangd Unit Tests :: ./ClangdTests.exe/LSPTest/FeatureModulesThreadingTest
  Clangd Unit Tests :: ./ClangdTests.exe/LSPTest/ModulesTest

https://lab.llvm.org/buildbot/#/builders/123/builds/20145/steps/4/logs/stdio

Failed Tests (2):
  Clangd Unit Tests :: ./ClangdTests.exe/LSPTest/FeatureModulesThreadingTest
  Clangd Unit Tests :: ./ClangdTests.exe/LSPTest/ModulesTest

https://lab.llvm.org/buildbot/#/builders/123/builds/20142/steps/4/logs/stdio

Failed Tests (1):
  Clangd Unit Tests :: ./ClangdTests.exe/LSPTest/DiagnosticsHeaderSaved

https://lab.llvm.org/buildbot/#/builders/123/builds/20137/steps/4/logs/stdio

Failed Tests (8):
  Clangd Unit Tests :: ./ClangdTests.exe/12/38
  Clangd Unit Tests :: ./ClangdTests.exe/31/38
  Clangd Unit Tests :: ./ClangdTests.exe/7/38
  Clangd Unit Tests :: ./ClangdTests.exe/ClangdServerTest/ReparseOnHeaderChange
  Clangd Unit Tests :: ./ClangdTests.exe/ClangdServerTest/TestStackOverflow
  Clangd Unit Tests :: ./ClangdTests.exe/ClangdThreadingTest/NoConcurrentDiagnostics
  Clangd Unit Tests :: ./ClangdTests.exe/CompletionTest/DynamicIndexIncludeInsertion
  Clangd Unit Tests :: ./ClangdTests.exe/LSPTest/CDBConfigIntegration

https://lab.llvm.org/buildbot/#/builders/123/builds/20134/steps/4/logs/stdio

Failed Tests (3):
  Clangd Unit Tests :: ./ClangdTests.exe/28/38
  Clangd Unit Tests :: ./ClangdTests.exe/LSPTest/GoToDefinition
  Clangd Unit Tests :: ./ClangdTests.exe/LSPTest/RecordsLatencies

https://lab.llvm.org/buildbot/#/builders/123/builds/20131/steps/4/logs/stdio

Failed Tests (1):
  Clangd Unit Tests :: ./ClangdTests.exe/LSPTest/CDBConfigIntegration

https://lab.llvm.org/buildbot/#/builders/123/builds/20022/steps/4/logs/stdio

Failed Tests (2):
  Clangd Unit Tests :: ./ClangdTests.exe/28/38
  Clangd Unit Tests :: ./ClangdTests.exe/LSPTest/GoToDefinition

A common element seems to be the following message printed before the failures:

No result from call after 10 seconds!

cc @kadircet @hokein @sam-mccall

@AaronBallman

I'm noticing a new message that might explain some of the instability.

https://lab.llvm.org/buildbot/#/builders/123/builds/20277

has a failure:

C:\b\slave\clang-x64-windows-msvc\llvm-project\llvm\include\llvm/Support/PointerLikeTypeTraits.h(58): fatal error C1088: Cannot flush compiler intermediate file: 'C:\Users\buildbot\AppData\Local\Temp\_CL_3eb71680ex': No space left on device

so I'm contacting the bot owner to see if they can address that; hopefully that will resolve the mysterious failures. However, if the only error you get is "no result from call after 10 seconds" but the root cause is disk space issues, perhaps the diagnostic can be improved in clangd?

@AaronBallman

Any update on this?

Spot-checking https://lab.llvm.org/buildbot/#/builders/clang-x64-windows-msvc shows most of the failures are spurious ones from clangd tests. Given that this has been happening for a few weeks it's become kind of disruptive; should we be exploring disabling these tests?

@kadircet
Member

Considering that even the Clangd Unit Tests :: ./ClangdTests.exe/LSPTest/RecordsLatencies test is failing (for which I can't get the logs), I am inclined to believe that the load on this buildbot is just too high and our default 10-second timeout for waiting on async work to finish is not enough. (This test literally calls a non-existent function and gets a reply from the ClangdLSPServer, without even reaching ClangdServer.)

So I'm sending out a patch to bump the deadlines. It'd be great to get rid of them completely, but they're the only thing preventing clangd from hanging the buildbots forever if things go wrong.
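
To illustrate the trade-off described above, here is a minimal Python sketch of a deadline-guarded wait. It is not clangd's test harness (which is C++); `call_with_deadline` and `slow_reply` are hypothetical names used only for illustration. The point is that the deadline turns a hang into a visible failure, but a deadline that is too short on a loaded machine produces spurious failures.

```python
import concurrent.futures
import time


def call_with_deadline(fn, timeout_seconds):
    """Run fn on a worker thread (hypothetical helper, not clangd code).

    Returns fn's result, or None if no result arrives before the deadline.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        # Block until the result is ready or the deadline expires.
        return future.result(timeout=timeout_seconds)
    except concurrent.futures.TimeoutError:
        # Report the timeout instead of hanging forever.
        print(f"No result from call after {timeout_seconds} seconds!")
        return None
    finally:
        # Do not wait for a still-running worker; the caller has moved on.
        pool.shutdown(wait=False)


def slow_reply():
    """Stand-in for async work that is slow because the machine is overloaded."""
    time.sleep(0.3)
    return "reply"
```

With a generous deadline, `call_with_deadline(slow_reply, 5)` returns the reply; with a deadline shorter than the work takes under load, the same call reports a timeout even though nothing is actually broken.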

@kadircet
Member

The patch for the LSP tests is in https://reviews.llvm.org/D158426. I will take a look at ManyUpdates separately, as it's a little more delicate.

kadircet added a commit to llvm/llvm-project that referenced this issue Aug 22, 2023
We seem to be hitting limits on some Windows buildbots, see
clangd/clangd#1712 (comment).

So bumping the timeouts to 60 seconds and completely dropping them for sync
requests. As mentioned in the comment above, this should improve things,
considering that even the tests that don't touch any complicated scheduler
are failing.

Differential Revision: https://reviews.llvm.org/D158426
avillega pushed a commit to avillega/llvm-project that referenced this issue Sep 11, 2023
We started seeing a lot of timeouts that align with the change in lit to
execute gtests in shards. The logic there assumes tests are
single-threaded, which is the case for most of LLVM, hence they
pick #shards ~ #cores (by slightly overshooting).

There are enough unittests in clangd that rely on multi-threading; they
can create arbitrarily many threads, but we limit the amount of meaningful
work to ~4 threads per process.

This change ensures that we're accounting for that parallelism when
executing clangd tests and not overloading test executors.

In theory the change overestimates the requirements, since not all tests are
multi-threaded, but it doesn't seem to be resulting in any regressions
on my local runs.

Fixes llvm#64964.
Fixes clangd/clangd#1712.
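
The arithmetic behind that commit message can be sketched as follows. This is only an illustration under stated assumptions: `max_parallel_test_processes` is a hypothetical helper, the default of 4 threads per process comes from the "~4 threads" estimate above, and how the actual change wires the limit into lit is not shown here.

```python
def max_parallel_test_processes(core_count, threads_per_process=4):
    """Hypothetical helper: cap concurrent test processes by thread usage.

    If every test process may keep threads_per_process cores busy, running
    one process per core would oversubscribe the machine by that factor.
    Always allow at least one process so tests can run on small machines.
    """
    return max(1, core_count // threads_per_process)
```

For example, on a 16-core executor this allows 4 concurrent clangd test processes rather than roughly 16, which deliberately overestimates the cost of tests that are in fact single-threaded.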
razmser pushed a commit to SuduIDE/llvm-project that referenced this issue Oct 2, 2023
qihangkong pushed a commit to rvgpu/llvm that referenced this issue Apr 18, 2024