attach_test and attach_blocking timing out on Jenkins #5740

Closed
abhinav92003 opened this issue Nov 17, 2022 · 3 comments · Fixed by #6559

Comments

@abhinav92003
Contributor

Happened multiple times on retrying: http://139.178.82.61:8080/job/DynamoRIO-AArch64-Precommit/2360/console.

Seems to have started after the recent AArch64 hardware move: https://groups.google.com/g/dynamorio-devs/c/lSaVAoryR1E/m/xTylpzXvDAAJ?utm_medium=email&utm_source=footer.

abhinav92003 added a commit that referenced this issue Nov 17, 2022
Adds the failing attach_test and attach_blocking tests to the ignore
list for AArch64. This is a temporary measure to unblock PRs.

Issue: #5740
abhinav92003 added a commit that referenced this issue Nov 17, 2022
Adds the failing attach_test and attach_blocking tests to the ignore list for AArch64.

Also ignore the flaky invariant_checker test failures.

This is a temporary measure to unblock PRs.

Issue: #5740, #5724
@derekbruening
Contributor

Xref attach and detach test failures but with different symptoms on x86-64: #6452, #6536.

@derekbruening
Contributor

Is this #6558?

@derekbruening
Contributor

Confirmed: this is #6558. It works if I enable ptrace_scope. And that explains why the failures started on a new machine.
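
For anyone hitting the same hang on a new machine, a quick check of the Yama setting may help. This is only a minimal sketch: the `describe_ptrace_scope` helper is hypothetical and not part of DynamoRIO, and the descriptions paraphrase the kernel's documented values 0 through 3. The reproduction in the later commit message writes 1 to this file to trigger the timeout and 0 to restore attach privileges.

```shell
#!/bin/sh
# Hypothetical helper (not part of DynamoRIO): describe a Yama ptrace_scope value.
describe_ptrace_scope() {
    case "$1" in
        0) echo "classic: any same-uid process may attach" ;;
        1) echo "restricted: only descendants (or declared tracers) may be attached" ;;
        2) echo "admin-only: CAP_SYS_PTRACE required" ;;
        3) echo "no attach: ptrace attach disabled" ;;
        *) echo "unknown" ;;
    esac
}

scope_file=/proc/sys/kernel/yama/ptrace_scope
if [ -r "$scope_file" ]; then
    # Read the current setting and print it with its meaning.
    value=$(cat "$scope_file")
    echo "ptrace_scope=$value ($(describe_ptrace_scope "$value"))"
else
    # Kernels built without Yama do not expose this file.
    echo "Yama ptrace_scope not available on this system"
fi
```

A value of 1 or higher would be consistent with the attach tests hanging, since the attaching process is not an ancestor of the target.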

derekbruening added a commit that referenced this issue Jan 12, 2024
Now that we've enabled ptrace privileges on the a64 testing machine we
can remove the attach/detach tests from the flaky list as they no
longer hang.  The attach test passed 100x in a row for me so it
doesn't seem to be hitting flakes seen on other platforms like #6452.

Issue: #5740, #6558, #6127
Fixes #5740
derekbruening added a commit that referenced this issue Jan 12, 2024
Now that we've enabled ptrace privileges on the a64 testing machine we
can remove the attach/detach tests from the flaky list as they no longer
hang. The attach_test, attach_blocking, and detach_test each passed 200x
in a row on this machine so it doesn't seem to be hitting flakes seen on
other platforms like #6452.

Issue: #5740, #6558, #6127
Fixes #5740
derekbruening added a commit that referenced this issue Jan 12, 2024
Before, we relied on drrun -s for all test suite timeouts except for
runcmp tests where we set a CTest timeout.  This resulted in the
default 10 minute CTest timeout for all tests, which was the only
timeout for runall tests and caused long suite times on the AArch64
machine which accidentally had no ptrace privileges (#5740, #6558,

Here, we set the CTest timeout for runall in addition to runcmp, and for
all other tests with no timeout specified (which are presumably
relying on drrun -s) we set a timeout of the drrun timeout plus 30
seconds.

Tested on the attach test:

Before:
```
123: Test timeout computed to be: 1500
```
Now:
```
$ echo 1 | sudo tee /proc/sys/kernel/yama/ptrace_scope; /usr/bin/time ctest -V -R client.attach_test; echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
...
    Start 121: code_api|client.attach_test
...
121: Test timeout computed to be: 90
1/1 Test #121: code_api|client.attach_test ......***Timeout  90.11 sec
...
The following tests FAILED:
	121 - code_api|client.attach_test (Timeout)
...
Command exited with non-zero status 8
1.13user 0.80system 1:30.14elapsed 2%CPU (0avgtext+0avgdata 13196maxresident)k
```

Fixes #6127
Issue: #6127, #6558, #5740
derekbruening added a commit that referenced this issue Jan 16, 2024
Before, we relied on drrun -s for all test suite timeouts except for
runcmp tests where we set a CTest timeout. This resulted in the default
10 minute CTest timeout for all tests, which was the only timeout for
runall tests and caused long suite times on the AArch64 machine which
accidentally had no ptrace privileges (#5740, #6558,

Here, we set the CTest timeout for runall in addition to runcmp, and for
all other tests with no timeout specified (which are presumably relying
on drrun -s) we set a timeout of the drrun timeout plus 30 seconds.

Tested on the attach test:

Before:
```
123: Test timeout computed to be: 1500
```
Now:
```
$ echo 1 | sudo tee /proc/sys/kernel/yama/ptrace_scope; /usr/bin/time ctest -V -R client.attach_test; echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
...
    Start 121: code_api|client.attach_test
...
121: Test timeout computed to be: 90
1/1 Test #121: code_api|client.attach_test ......***Timeout  90.11 sec
...
The following tests FAILED:
	121 - code_api|client.attach_test (Timeout)
...
Command exited with non-zero status 8
1.13user 0.80system 1:30.14elapsed 2%CPU (0avgtext+0avgdata 13196maxresident)k
```
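
The "drrun timeout plus 30 seconds" rule can be illustrated with a small arithmetic sketch. The function name `ctest_timeout_for` is hypothetical, and the 60-second drrun timeout is an assumption inferred from the 90-second value in the log above; the actual logic lives in the CMake test configuration, not in a shell script.

```shell
#!/bin/sh
# Hypothetical illustration: CTest timeout = drrun -s timeout + 30 s grace period.
GRACE_SECONDS=30

ctest_timeout_for() {
    drrun_timeout=$1
    echo $(( drrun_timeout + GRACE_SECONDS ))
}

# Assumed 60 s drrun timeout, consistent with the 90 s shown in the log.
echo "CTest timeout: $(ctest_timeout_for 60)"
```

The grace period lets the drrun timeout fire first in the normal case, so CTest only kills a test when drrun itself fails to.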

The property being set on more tests was confirmed on debug x86-64:
Before:
```
$ grep -c TIMEOUT suite/tests/CTestTestfile.cmake
94
```
After:
```
$ grep -c TIMEOUT suite/tests/CTestTestfile.cmake
463
```
There seem to still be a few missing the property: the ones that don't
go through suite/. There were other efforts to avoid hangs on those such
as PR #6137.
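
One rough way to estimate how many tests in a generated file still lack the property is to compare the number of declared tests with the number of TIMEOUT lines. This is a sketch only: `count_missing_timeouts` is a hypothetical helper, the sample file stands in for a real `CTestTestfile.cmake`, and the one-`add_test`-per-line layout is an assumption that may not hold for every CMake version.

```shell
#!/bin/sh
# Hypothetical helper: declared tests minus lines carrying a TIMEOUT property.
# Assumes one add_test(...) line and at most one TIMEOUT line per test.
count_missing_timeouts() {
    file=$1
    total=$(grep -c '^add_test(' "$file")
    with_timeout=$(grep -c 'TIMEOUT' "$file")
    echo $(( total - with_timeout ))
}

# Tiny made-up sample standing in for a generated CTestTestfile.cmake.
sample=$(mktemp)
cat > "$sample" <<'EOF'
add_test(test_a "/bin/true")
set_tests_properties(test_a PROPERTIES TIMEOUT "90")
add_test(test_b "/bin/true")
EOF

missing=$(count_missing_timeouts "$sample")
echo "tests without TIMEOUT: $missing"
rm -f "$sample"
```

The same function could be pointed at any `CTestTestfile.cmake` in the build tree, including directories outside suite/.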

Issue: #6127, #6558, #5740