Unit_hipDeviceSynchronize_Functional can cause system hangs #112
Comments
I've added it to the exclude-tests regexp in PR #113. The test is problematic for multiple reasons, but we can only partially fix it. The first problem, which is actually solvable, is that it uses a simple for loop + assignment, which Clang can optimize away depending on the -O flag (that's why I got different results than Henry on the same machine). On that machine, the test causes the GUI to "freeze", but it eventually recovers. If the test causes your laptop to freeze and not recover, I think that problem is out of scope for CHIP-SPV and is a Linux kernel / driver issue. BTW, Intel's oneAPI docs recommend disabling GPU hangcheck for long-running tasks, so it seems they're aware of the "freezing". |
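To illustrate the optimization point above, here is a minimal host-side sketch in plain C++. It is not the actual test code: the function names `spin_plain` and `spin_volatile` are hypothetical. The assumption is that the test's delay loop resembles the first function, where the compiler is free to replace the loop with its final value at higher -O levels, so the loop takes almost no time. Marking the variable `volatile` makes every store observable, so the loop has to run in full.

```cpp
#include <cstdint>

// Hypothetical stand-in for a "simple for loop + assignment" delay loop.
// Nothing here is observable except the return value, so an optimizing
// compiler may compute the result directly and drop the loop entirely.
std::uint64_t spin_plain(std::uint64_t iterations) {
    std::uint64_t x = 0;
    for (std::uint64_t i = 0; i < iterations; ++i) {
        x = i;  // plain store: may be optimized away
    }
    return x;
}

// Same loop, but every store to a volatile object is observable behavior,
// so the compiler must keep all iterations regardless of the -O level.
std::uint64_t spin_volatile(std::uint64_t iterations) {
    volatile std::uint64_t x = 0;
    for (std::uint64_t i = 0; i < iterations; ++i) {
        x = i;  // volatile store: must be performed each iteration
    }
    return x;
}
```

Both functions return the same value; only the time they take under optimization differs. Whether the same change is the right fix for the device-side kernel in this test is an assumption, not something confirmed in this thread.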
What do I do to make this test fail? It passes for me after 200s
(Reply to Michal Babej's comment of Wed, Aug 17, 2022, 17:41, which appears in full above. The "disable GPU hangcheck" link in that comment points to https://www.intel.com/content/www/us/en/develop/documentation/get-started-with-intel-oneapi-hpc-linux/top/before-you-begin.html )
|
I reduced the runtime further on |
@franz I waited 10 minutes at most; it could of course recover eventually :) Yep, sounds like a driver-side issue. Let's just exclude the test for now, as it might hit end users and they might think it's a CHIP-SPV issue. |
Is it closed because it's no longer hanging after the reduced problem size?
On Thu, Aug 18, 2022, 10:54 Pekka Jääskeläinen wrote: Closed #112 as completed.
|
The test exercises long-running kernels and causes hangs (likely kernel-mode busy loops) that are shorter for some users and longer for others. On this laptop I once waited 10 minutes for Linux to wake up before doing a hard power-off. I added it to the flaky_tests file for now in #111.