EPIC: Testing environment checks #45

Open
2 of 5 tasks
llxia opened this issue Mar 11, 2020 · 11 comments
Labels
enhancement New feature or request Epic
Projects

Comments

@llxia
Contributor

llxia commented Mar 11, 2020

TKG should do general testing env checks before and/or after running the test. Below are some checks we could look at to decide whether we should implement them:

  • disk space (fail if we do not have enough disk space)

  • memory (fail if we do not have enough memory)

  • core (fail if there is core generated after a test run regardless of the test result)

  • list running processes before and after the test (fail if there is any dangling process?). This includes docker images left over from aborted runs; those are not expected to be present and should get pruned

  • a flag to enable/disable these test environment health checks
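
As a rough illustration of the disk space item and the enable/disable flag from the list above, here is a minimal Python sketch. It is not how TKG implements anything: the flag name `TEST_ENV_CHECKS` and the 2 GiB threshold are hypothetical values chosen only for the example.

```python
import os
import shutil

# Hypothetical names: TKG does not necessarily define either of these.
ENV_CHECK_FLAG = "TEST_ENV_CHECKS"
MIN_FREE_BYTES = 2 * 1024**3  # arbitrary 2 GiB threshold, for illustration only


def env_checks_enabled(environ=os.environ):
    """Checks are on unless the flag is explicitly set to a false-like value."""
    value = environ.get(ENV_CHECK_FLAG, "true").strip().lower()
    return value not in ("0", "false", "off", "no")


def check_disk_space(path=".", min_free=MIN_FREE_BYTES):
    """Return (ok, free_bytes) for the filesystem that holds `path`."""
    free = shutil.disk_usage(path).free
    return free >= min_free, free
```

A caller could skip everything when `env_checks_enabled()` is false, and otherwise fail the run with a message that includes the free byte count when `check_disk_space()` reports `ok` as false.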

@pshipton
Contributor

pshipton commented Mar 19, 2020

Note that dumps for an OutOfMemoryError don't necessarily indicate a problem. The test could have intentionally caused it and caught the exception. OpenJ9 tests that do this tend to run with dumps turned off to avoid the overhead.

@smlambert
Contributor

In the case of those tests, they should delete what they create as part of the test. What I believe we will do is check at the very end of all tests, as we zip up the test artifact, whether any cores remain; if so, it will be considered a failure. We may find when we first enable this that some tests need to be updated to clean up after themselves.
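
The end-of-run check for leftover cores described above could look roughly like the following Python sketch. The file name patterns are assumptions; real dump names depend on the OS and the JVM settings, so the tuple would need adjusting.

```python
import fnmatch
import os

# Assumed name patterns; actual dump names depend on the OS and JVM settings.
CORE_PATTERNS = ("core", "core.*", "*.dmp")


def find_core_files(root, patterns=CORE_PATTERNS):
    """Walk `root` and return sorted paths of files whose names match a pattern."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if any(fnmatch.fnmatch(name, pattern) for pattern in patterns):
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

If `find_core_files()` returns a non-empty list for the test output directory, the run would be marked as failed and the paths listed, which also points at the tests that need to clean up after themselves.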

@smlambert
Contributor

We can/should add one more type of check: whether the machine has the test prereqs installed. If not, fail and clearly list what is missing (or has a bad version).
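
A minimal Python sketch of such a prereq check follows. The tool list is illustrative only (the real prerequisites depend on the test suite), and this version only checks presence on `PATH`, not versions.

```python
import shutil

# Illustrative list only; the real prerequisites depend on the test suite.
REQUIRED_TOOLS = ("bash", "make", "perl")


def missing_prereqs(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH, in the order given."""
    return [tool for tool in tools if which(tool) is None]


def prereq_report(missing):
    """Build a failure message naming every missing tool, or '' if none."""
    if not missing:
        return ""
    return "Missing test prerequisites: " + ", ".join(missing)
```

The `which` parameter exists only so the lookup can be replaced in a unit test; in normal use `missing_prereqs()` is called with no arguments.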

@smlambert
Contributor

Related: adoptium/infrastructure#1410

It is becoming clear that this feature/issue needs to be addressed as soon as possible, as it will remove some of the random failures we are seeing in nightly pipelines, leaving more time for triagers to focus on real issues rather than tracking and chasing environment issues.

I recognize that if we were running in environments where we spin up on-demand machines each time, some of this would not be needed. But given we will likely always need to support running on static machines as well, we need to try to clean the slate, or at least know the state of the slate, each time.

Prioritizing the environment checks, I would rank the check for running processes and the pruning of docker images higher than the other checks, as dangling processes and leftover images will completely block the next set of tests from running successfully on a test machine.
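
For the running-process check, one possible approach is to snapshot the process table before and after the run and report the difference. The Python sketch below assumes a POSIX `ps` is available; the function names are made up for illustration. PIDs can be reused, so a real check would probably also compare command names or start times.

```python
import subprocess


def parse_ps(text):
    """Parse `pid command` lines into a dict, skipping blank or malformed lines."""
    procs = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].isdigit():
            procs[int(parts[0])] = parts[1].strip()
    return procs


def snapshot_processes():
    """Return {pid: command} for all running processes, using POSIX `ps`."""
    result = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "comm="],  # empty headers suppress the title row
        capture_output=True, text=True, check=True,
    )
    return parse_ps(result.stdout)


def dangling(before, after):
    """Processes present after the run that were not present before it."""
    return {pid: cmd for pid, cmd in after.items() if pid not in before}
```

Calling `snapshot_processes()` once before the tests and once after, then passing both results to `dangling()`, gives the candidates to report or terminate.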

@smlambert
Contributor

Related: adoptium/aqa-tests#1887

@smlambert
Contributor

We will temporarily land a change to maketest.sh while awaiting a WIP fix for this issue. Once @nikolamilijevic1's PR is merged into TKG and we are sure it addresses dangling processes, we can remove adoptium/aqa-tests#2059.

@smlambert
Contributor

@llxia @renfeiw @sophia-guo - I have turned this into an EPIC that can be broken down into a set of smaller tasks (perhaps following the checklist in the description, or even more granular). What do you think?

@llxia
Contributor Author

llxia commented Jun 27, 2022

Related to the env check, the test framework should also check micro-architectures and execute/skip tests accordingly. For example, some VectorAPI tests only run on Z13/Z14 or newer, CRIU portability tests run on skylake, etc.

Related issues:
runtimes/openj9-jit-z/issues/711
runtimes/infrastructure/issues/7037
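
One way to picture the execute/skip decision is the Python sketch below, which reads the `cpu family` and `model` fields from Linux `/proc/cpuinfo` text on x86. The model table is an assumption for illustration and is approximate (model 85 is shared with later Intel server parts). It is not the check that was added to TKG, and it does not cover Z machines.

```python
# Assumed mapping from (cpu family, model) in /proc/cpuinfo to a microarchitecture
# name. Approximate: model 85 is shared with later Intel server parts.
X86_MICROARCH = {
    (6, 78): "skylake",
    (6, 94): "skylake",
    (6, 85): "skylake",
}


def detect_microarch(cpuinfo_text, table=X86_MICROARCH):
    """Return the microarchitecture name for the first CPU listed, or None."""
    fields = {}
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            # setdefault keeps the first occurrence, i.e. the first processor entry
            fields.setdefault(key.strip(), value.strip())
    try:
        ident = (int(fields["cpu family"]), int(fields["model"]))
    except (KeyError, ValueError):
        return None
    return table.get(ident)


def should_run(required, detected):
    """Run when the test has no requirement or the machine matches one of them."""
    return not required or detected in required
```

A test tagged with `{"skylake"}` would then be skipped on any machine where `detect_microarch()` returns something else, including `None` for an unrecognized CPU.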

@renfeiw
Contributor

renfeiw commented Sep 7, 2022

Added a microarch check for skylake (#346) and a docker info check (#351).

@smlambert
Contributor

Related: adoptium/infrastructure#2745 (being able to see when a machine was last updated).

@smlambert changed the title from "Testing environment checks" to "EPIC: Testing environment checks" on Mar 8, 2024
@llxia
Contributor Author

llxia commented May 9, 2024

Add check for bash --version #547
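
For readers who want a picture of what a bash version check might involve, here is a hedged Python sketch. It is not the change made in #547; the function names and the minimum version passed by the caller are assumptions for the example.

```python
import re
import subprocess


def parse_bash_version(text):
    """Extract (major, minor) from `bash --version` output, or None if absent."""
    match = re.search(r"version (\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def bash_version():
    """Run `bash --version`; return (major, minor), or None if bash is unavailable."""
    try:
        result = subprocess.run(
            ["bash", "--version"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_bash_version(result.stdout)


def bash_is_at_least(minimum, version):
    """Compare versions as tuples; a missing version never satisfies the minimum."""
    return version is not None and version >= minimum
```

For example, `bash_is_at_least((4, 0), bash_version())` would be false on a machine with bash 3.2 or with no bash at all.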

Projects
No open projects
TKG
  
To do
Development

No branches or pull requests

5 participants