EPIC: Testing environment checks #45
Comments
Note that dumps for an OutOfMemoryError don't necessarily indicate a problem. The test could have intentionally caused the error and caught the exception. OpenJ9 tests that do this tend to run with dumps turned off to avoid the overhead.
In the case of those tests, they should delete what they create as part of the test. What I believe we will do is check at the very end of all test runs, as we zip up the test artifacts, whether any cores remain; if so, it will be considered a failure. We may find when we first enable this that some tests need to be updated to clean up after themselves.
We can/should add one more type of check: whether the machine has the test prereqs installed. If not, fail and clearly list what is missing (or has a bad version).
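A prereq check along these lines could be sketched as follows. This is only an illustration, not TKG code: the function names are hypothetical, the tool list is up to the caller, and it only checks presence on `PATH` (version checks are left out).

```python
import shutil


def missing_prereqs(tools, which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH.

    `which` is injectable so the lookup can be stubbed in tests.
    """
    return [tool for tool in tools if which(tool) is None]


def check_prereqs(tools, which=shutil.which):
    """Fail with one message that lists every missing prerequisite."""
    missing = missing_prereqs(tools, which)
    if missing:
        raise RuntimeError("Missing test prerequisites: " + ", ".join(missing))
```

Collecting all missing tools before failing means one run reports everything that needs to be installed, rather than stopping at the first gap.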
Related: adoptium/infrastructure#1410 It is becoming clear that this feature/issue needs to be addressed as soon as possible, as it will remove some of the random failures we are seeing in nightly pipelines, leaving more time for triagers to focus on real issues rather than tracking and chasing environment issues. I recognize that if we were running in environments where we spin up on-demand machines each time, some of this would not be needed, but given that we will likely always need to support running on static machines as well, we need to try to clean the slate, or at least know the state of the slate, each time. In prioritizing the environment checks, I would put the check for running processes and the pruning of docker images above the other checks, as those problems will completely block the next set of tests from running successfully on a test machine.
Related: adoptium/aqa-tests#1887
We will temporarily land a change to maketest.sh while awaiting a WIP fix for this issue. Once @nikolamilijevic1's PR is merged into TKG and we are sure we address dangling processes in TKG, we can remove adoptium/aqa-tests#2059.
@llxia @renfeiw @sophia-guo - I have turned this into an EPIC that can be broken down into a set of smaller tasks (perhaps by the checklist in the description, or even more granular). What do you think?
Related to the env check, the test framework should also check micro-architectures and execute/skip tests accordingly. For example, some VectorAPI tests only run on Z13/Z14 or newer, CRIU portability tests run on Skylake, etc. Related issue:
Related: adoptium/infrastructure#2745
Add check for bash --version #547
TKG should do general testing env checks before and/or after running the tests. Below are some things we could look at to see whether we should implement them:
disk space (fail if we do not have enough disk space)
memory (fail if we do not have enough memory)
core (fail if a core is generated after a test run, regardless of the test result)
list running processes before and after the test (fail if there is any dangling process?); this includes docker images left over from aborted runs, which are not expected to be present and should get pruned
a flag to enable/disable these test environment health checks
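Part of the checklist above could be sketched as below. This is a rough illustration under stated assumptions, not a proposed TKG implementation: the `TKG_ENV_CHECKS` variable and all function names are hypothetical, cores are assumed to be named `core` or `core.*`, and process snapshots are assumed to be collected elsewhere as `{pid: command}` mappings. The memory check and docker pruning are left out.

```python
import os
import shutil
from pathlib import Path


def checks_enabled(env=os.environ):
    # TKG_ENV_CHECKS is a hypothetical flag name; checks are on unless it is "0".
    return env.get("TKG_ENV_CHECKS", "1") != "0"


def check_disk_space(path, min_free_bytes):
    """Return a failure message if free space at `path` is below the minimum."""
    free = shutil.disk_usage(path).free
    if free >= min_free_bytes:
        return []
    return [f"low disk space on {path}: {free} bytes free, need {min_free_bytes}"]


def find_core_files(root):
    """Find files named `core` or `core.*` under `root` (assumed naming)."""
    return sorted(
        str(p)
        for p in Path(root).rglob("core*")
        if p.is_file() and (p.name == "core" or p.name.startswith("core."))
    )


def dangling_processes(before, after):
    """Return processes present in `after` but not in `before`."""
    return {pid: cmd for pid, cmd in after.items() if pid not in before}


def run_post_test_checks(workdir, min_free_bytes, before, after, env=os.environ):
    """Collect all failures; an empty list means the environment looks clean."""
    if not checks_enabled(env):
        return []
    failures = check_disk_space(workdir, min_free_bytes)
    failures += [f"core file left behind: {c}" for c in find_core_files(workdir)]
    leftovers = dangling_processes(before, after)
    failures += [f"dangling process {pid}: {cmd}" for pid, cmd in sorted(leftovers.items())]
    return failures
```

Returning a list of failures instead of raising on the first one lets the caller report every problem found on the machine in a single pass.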