EPIC: Testing environment checks #45

llxia · 2020-03-11T19:48:56Z

TKG should do general testing env checks before and/or after running the tests. Below are some checks to consider implementing:

disk space (fail if we do not have enough disk space)
memory (fail if we do not have enough memory)
core (fail if a core is generated after a test run, regardless of the test result)
list running processes before and after the test (fail if there is any dangling process?). This includes docker images left over from aborted runs, which are not expected to be present and should get pruned
a flag to enable/disable these test environment health checks
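
As a rough illustration of the disk-space item and the enable/disable flag in the list above, here is a minimal Python sketch. Python is used only for illustration; the function names, the `enabled` parameter, and the threshold are hypothetical and not part of TKG.

```python
import shutil
from typing import Tuple


def check_disk_space(path: str, min_free_bytes: int) -> Tuple[bool, str]:
    """Return (ok, message) for the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    ok = usage.free >= min_free_bytes
    message = f"{path}: {usage.free} bytes free, {min_free_bytes} required"
    return ok, message


def run_env_checks(enabled: bool, path: str, min_free_bytes: int) -> bool:
    """Hypothetical entry point: skip every check when the flag is off."""
    if not enabled:
        return True
    ok, message = check_disk_space(path, min_free_bytes)
    if not ok:
        print(f"ENV CHECK FAILED: {message}")
    return ok
```

A memory check would follow the same shape, but reading available memory portably needs platform-specific code, so it is left out of this sketch.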

pshipton · 2020-03-19T21:41:05Z

Note that dumps for an OutOfMemoryError don't necessarily indicate a problem. The test could have intentionally caused it and caught the exception. OpenJ9 tests that do this tend to run with dumps turned off to avoid the overhead.

smlambert · 2020-03-19T22:40:36Z

In the case of those tests, they should delete what they create as part of the test. I believe what we will do is check at the very end of the whole test run, as we zip up the test artifacts: if any cores remain, that will be considered a failure. When we first enable this, we may find that some tests need to be updated to clean up after themselves.
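
The end-of-run scan for leftover cores described here could look roughly like the Python sketch below. The file-name patterns are assumptions chosen for illustration (they will also match unrelated files such as `core.jar`), and `find_core_files` is a hypothetical helper, not an existing TKG function.

```python
import fnmatch
import os
from typing import List

# Assumed patterns for core/dump files; a real check would need to tune these.
CORE_PATTERNS = ("core", "core.*", "*.dmp")


def find_core_files(root: str) -> List[str]:
    """Walk `root` and return the sorted paths of files that look like cores."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if any(fnmatch.fnmatch(name, pattern) for pattern in CORE_PATTERNS):
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

A caller would treat a non-empty result as a failure just before the test artifacts are archived, and print the paths so the offending test can be identified.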

smlambert · 2020-03-20T02:16:59Z

We can/should add one more type of check: whether the machine has the test prereqs installed. If not, fail and clearly list what is missing (or has the wrong version).
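
A minimal Python sketch of the "list what is missing" part of this check, assuming the prereqs are command-line tools found on `PATH`. The helper name `missing_prereqs` and the tool names in the usage note are hypothetical; version checking is not covered.

```python
import shutil
from typing import Iterable, List


def missing_prereqs(tools: Iterable[str]) -> List[str]:
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

For example, `missing_prereqs(["curl", "make"])` returns an empty list on a machine where both are installed; a caller would fail the run and print the returned names otherwise.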

smlambert · 2020-06-24T19:33:39Z

Related: adoptium/infrastructure#1410

It is becoming clear that this feature/issue needs to be addressed as soon as possible, as it will remove some of the random failures we are seeing in nightly pipelines, leaving more time for triagers to focus on real issues rather than tracking and chasing environment issues.

I recognize that if we were running in environments where we spin up on-demand machines each time, some of this would not be needed. But given that we will likely always need to support running on static machines too, we need to try to clean the slate, or at least know the state of the slate, each time.

In prioritizing the environment checks, I would rank the check for running processes and the pruning of docker images higher than the other checks, as those problems will completely block the next set of tests from running successfully on a test machine.
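
One way to picture the running-process check is as a comparison of two snapshots, one taken before the tests and one after. The Python sketch below covers only the comparison step; how the snapshots are collected is platform-specific and not shown, and `dangling_processes` is a hypothetical name.

```python
from typing import Dict


def dangling_processes(before: Dict[int, str], after: Dict[int, str]) -> Dict[int, str]:
    """Return processes present after the run that were not there before.

    Each snapshot maps a PID to its command line. A PID whose command line
    changed is also reported, since the PID may have been reused.
    """
    return {
        pid: command
        for pid, command in after.items()
        if before.get(pid) != command
    }
```

With `before = {1: "init"}` and `after = {1: "init", 4242: "java -cp test.jar Hang"}`, the function returns `{4242: "java -cp test.jar Hang"}`. Processes started by other jobs on a shared machine would also show up, so a real check would likely need to filter by user or workspace.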

smlambert · 2020-07-14T13:02:54Z

Related: adoptium/aqa-tests#1887

smlambert · 2020-11-18T17:27:01Z

We will temporarily land a change to maketest.sh while awaiting a WIP fix for this issue. Once @nikolamilijevic1's PR is merged into TKG and we are sure that TKG addresses dangling processes, we can remove adoptium/aqa-tests#2059.

smlambert · 2021-06-29T21:27:01Z

@llxia @renfeiw @sophia-guo - I have turned this into an EPIC that can be broken down into a set of smaller tasks (perhaps by the checklist in the description, or even more granular), what do you think?

llxia · 2022-06-27T18:49:50Z

Related to the env check, the test framework should also check micro-architectures and execute/skip tests accordingly. For example, some VectorAPI tests only run on Z13/Z14 or newer, CRIU portability tests run on skylake, etc.

Related issues:
runtimes/openj9-jit-z/issues/711
runtimes/infrastructure/issues/7037
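
The "Z13/Z14 or newer" style of gating mentioned in this comment can be sketched in Python as a comparison against an ordered list of hardware generations. The list contents, the lowercase naming, and the function `meets_minimum` are assumptions for illustration; they are not how TKG represents micro-architectures.

```python
from typing import Sequence

# Assumed ordering of IBM Z generations, oldest first (illustrative only).
Z_ORDER = ("z13", "z14", "z15", "z16")


def meets_minimum(detected: str, minimum: str, order: Sequence[str] = Z_ORDER) -> bool:
    """Return True if `detected` is `minimum` or a newer entry in `order`."""
    detected, minimum = detected.lower(), minimum.lower()
    if detected not in order or minimum not in order:
        # Unknown hardware: skip the test instead of risking a spurious failure.
        return False
    return order.index(detected) >= order.index(minimum)
```

A test that requires Z14 or newer would run when `meets_minimum(detected, "z14")` is true and be skipped otherwise. Requirements that name a single micro-architecture, such as skylake, reduce to a plain equality check.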

renfeiw · 2022-09-07T16:02:55Z

Added microarch check for skylake: #346 and docker info check: #351

smlambert · 2022-09-15T17:39:17Z

Related: adoptium/infrastructure#2745 (being able to see when a machine was last updated).

llxia · 2024-05-09T16:30:24Z

Add check for bash --version #547

karianna added the bug Something isn't working label Mar 15, 2020

smlambert mentioned this issue Mar 19, 2020

Crash jitting Method_being_compiled=java/util/stream/ReferencePipeline.anyMatch(Ljava/util/function/Predicate;)Z in JTReg Test: java_lang_invoke_VarHandles_VarHandleTestAccessString.java eclipse-openj9/openj9#8870

Open

smlambert mentioned this issue Mar 27, 2020

Tests leave processes on ppcle machines adoptium/aqa-tests#1071

Closed

This was referenced Jun 15, 2020

Fail fast on missing curl, don't retry adoptium/aqa-tests#1830

Closed

test-scaleway-ubuntu1604-x64-1 out of disk space adoptium/infrastructure#1403

Closed

This was referenced Jun 23, 2020

Inconsistent run times of sanity.openjdk on xLinux adoptium/infrastructure#1165

Closed

AQAvit Meeting June 24, 2020 adoptium/aqa-tests#1844

Closed

smlambert mentioned this issue Jun 30, 2020

openliberty_microprofile_tck - container name openliberty-mp-tck-test already in use adoptium/aqa-tests#1856

Closed

smlambert mentioned this issue Jul 15, 2020

dacapo-lusearch-fix_0 fails intermittently adoptium/aqa-tests#1888

Open

smlambert mentioned this issue Jul 22, 2020

AQAvit Meeting July 22, 2020 adoptium/aqa-tests#1908

Closed

Willsparker mentioned this issue Sep 11, 2020

Track corrupt workspaces on Windows Machines adoptium/infrastructure#1396

Closed

This was referenced Oct 6, 2020

RMI tests without destroyed properly and hang on machines adoptium/aqa-tests#819

Closed

test-aws-win2019-x64-2 OutOfMemory adoptium/infrastructure#1602

Closed

smlambert mentioned this issue Nov 16, 2020

System unavailable: test-azure-win2012r2-x64-1 cannot clean workspace adoptium/infrastructure#1669

Closed

This was referenced Nov 18, 2020

maketest.sh: Add process cleanup for Windows at the end of testing adoptium/aqa-tests#2059

Merged

Jenkins continues regardless of the test compilation and test execution results adoptium/aqa-tests#2068

Closed

smlambert mentioned this issue Jan 23, 2021

Should we specify a limiting max heap size on test jobs to control things like max dump sizes ? adoptium/aqa-tests#2195

Open

llxia added enhancement New feature or request and removed bug Something isn't working labels Jan 23, 2021

smlambert added the Epic label Jun 29, 2021

llxia mentioned this issue Jul 11, 2022

Support machine info detection in TKG #336

Closed

smlambert mentioned this issue Feb 28, 2023

More detailed and specific machine info detection in TKG #414

Open

smlambert changed the title ~~Testing environment checks~~ EPIC: Testing environment checks Mar 8, 2024
