Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GEODE-10327: Overhaul GfshRule to kill processes and save artifacts #7758

Merged

Conversation

kirklund
Copy link
Contributor

@kirklund kirklund commented Jun 2, 2022

PR needs to be squashed when merged to develop. There are three commits:

  1. The original commit
  2. The fixes that the 1st revert was reverted for (this is what was missing in the 2nd PR)
  3. Make the jdk17 tests gating (contributed by Owen)

If you already reviewed one or both of the first PRs that were reverted, then you can focus on commits 2-3. The most important changes that should be present and reviewed:

  • GfshRule should have a default constructor that uses FolderFactory
  • FolderRule should no longer use try-finally blocks or MultipleFailureException
  • Folder.delete() should log an error message instead of throwing IOException or UncheckedIOException
  • ResourceUtils methods for copying resources should all flow into the private method ResourceUtils.copyResourceToFile(URL, File) which should close both the InputStream and the OutputStream. It does this using a try-with-resources block.

GEODE-10327: Overhaul GfshRule to kill processes and save artifacts (#7731)

PROBLEM

Tests that use GfshRule leave behind orphaned processes and do not save
artifacts for debugging failures.

SOLUTION

GfshRule needs to cleanup all processes it forks. It also needs to save
off all runtime artifacts such as logging, stats, pid files, diskstores
to enable debugging of test failures.

DETAILS

Enhance GfshRule and modify all tests using it for proper debugging and
to prevent test pollution.

Overhaul of GfshRule:

  • kill ALL geode processes during cleanup
  • use FolderRule to ensure all logs and files are properly saved off
    when a test fails
  • extract GfshExecutor from JUnit rule code
  • GfshExecutor allows a test to use any number of Geode versions with
    just one GfshRule
  • add Gfsh log level support for easier debugging
  • add support for new VmConfiguration to allow control over Geode and
    Java versions
  • overhaul API of GfshRule and companion classes for better consistency
    and design

New FolderRule:

  • replaces TemporaryFolder and saves off all content when a test fails
  • creates root directory under the gradle worker instead of under temp

Update HTTP session caching module tests:

  • use new FolderRule to save all artifacts when a test fails
  • use nio Paths for filesystem variables

Update acceptance and upgrade tests that use GfshRule:

  • use new improved GfshRule and GfshExecutor
  • use new FolderRule instead of TemporaryFolder to save all artifacts
    when a test fails
  • use --disable-default-server in tests with no clients
  • fix flakiness of many tests by using random ports instead of default
    or hardcoded port values
  • reformat GfshRule API usage in tests to improve readability and
    consistency
  • add GfshStopper to provide common place to await process stop (stop
    locator/server is async so restarting with same ports is very prone
    to hitting BindExceptions)

Update ProcessUtils:

  • extract NativeProcessUtils and make it public for direct use
  • rename InternalProcessUtils as ProcessUtilsProvider and move to its
    own class
  • rethrow IOExceptions as UncheckedIOExceptions
  • fix flakiness in NativeProcessUtilsTest by moving findAvailablePid
    into test method

Minor changes:

  • improve code formatting and readability
  • convert from old io File to nio Path APIs as much as possible
  • close output streams to fix filesystem issues on Windows

Fixes flaky test tickets:

  • DeployJarAcceptanceTest GEODE-9615
  • possibly other tests that uses GfshRule

Changes for resubmit:

  • log error message if unable to delete folder

NOTES

The jdk8, jdk17 and windows labels were used to run tests on more
environments.

This PR contains mostly test and framework changes. The only product
code altered is ServerLauncher and several classes in
org.apache.geode.internal.process, all of which is in geode-core.

…pache#7731)

PROBLEM

Tests that use GfshRule leave behind orphaned processes and do not save
artifacts for debugging failures.

SOLUTION

GfshRule needs to cleanup all processes it forks. It also needs to save
off all runtime artifacts such as logging, stats, pid files, diskstores
to enable debugging of test failures.

DETAILS

Enhance GfshRule and modify all tests using it for proper debugging and
to prevent test pollution.

Overhaul of GfshRule:

* kill ALL geode processes during cleanup
* use FolderRule to ensure all logs and files are properly saved off
when a test fails
* extract GfshExecutor from JUnit rule code
* GfshExecutor allows a test to use any number of Geode versions with
just one GfshRule
* add Gfsh log level support for easier debugging
* add support for new VmConfiguration to allow control over Geode and
Java versions
* overhaul API of GfshRule and companion classes for better consistency
and design

New FolderRule:

* replaces TemporaryFolder and saves off all content when a test fails
* creates root directory under the gradle worker instead of under temp

Update HTTP session caching module tests:

* use new FolderRule to save all artifacts when a test fails
* use nio Paths for filesystem variables

Update acceptance and upgrade tests that use GfshRule:

* use new improved GfshRule and GfshExecutor
* use new FolderRule instead of TemporaryFolder to save all artifacts
when a test fails
* use --disable-default-server in tests with no clients
* fix flakiness of many tests by using random ports instead of default
or hardcoded port values
* reformat GfshRule API usage in tests to improve readability and
consistency
* add GfshStopper to provide common place to await process stop (stop
locator/server is async so restarting with same ports is very prone
to hitting BindExceptions)

Update ProcessUtils:

* extract NativeProcessUtils and make it public for direct use
* rename InternalProcessUtils as ProcessUtilsProvider and move to its
own class
* rethrow IOExceptions as UncheckedIOExceptions
* fix flakiness in NativeProcessUtilsTest by moving findAvailablePid
into test method

Minor changes:

* improve code formatting and readability
* convert from old io File to nio Path APIs as much as possible
* close output streams to fix filesystem issues on Windows

Fixes flaky test tickets:

* DeployJarAcceptanceTest GEODE-9615
* possibly other tests that uses GfshRule

Changes for resubmit:

* log error message if unable to delete folder

NOTES

The jdk8, jdk17 and windows labels were used to run tests on more
environments.

This PR contains mostly test and framework changes. The only product
code altered is ServerLauncher and several classes in
org.apache.geode.internal.process, all of which is in geode-core.
* Restore missing default constructor to GfshRule
* Ensure IO streams have proper error handling and don't leak
* Polish error handling in GfshRule and FolderRule
Copy link
Contributor

@onichols-pivotal onichols-pivotal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for opting-in to automated commit message feedback via COMMITWATCHERS. https://cwiki.apache.org/confluence/display/GEODE/Commit+Message+Format suggests that your first commit summary should be 50 characters or less (not including bug# and pr#) to avoid truncation in github and other tools (yours is 62 characters...longer is ok and just check that you've prioritized the most salient words before the 47-character mark).

These suggestions are totally optional. What matters is including enough detail for future contributors to understand what you intended and why.

@kirklund kirklund added windows add this label to get Windows JDK11 PR checks too jdk8 add this label to get Linux JDK8 PR checks too jdk17 add this label to get Linux JDK17 PR checks too windows-jdk17 add this label to get Windows JDK17 PR checks too windows-jdk8 add this label to get Windows JDK8 PR checks too labels Jun 2, 2022
@kirklund
Copy link
Contributor Author

kirklund commented Jun 2, 2022

windows-core-integration-test-openjdk8 failed. It's unrelated to the PR (test doesn't even use GfshRule). It appears to be flaky and needs a ticket filed against it.

ConnectionManagerJUnitTest > testExclusiveConnectionAccess FAILED

@kirklund
Copy link
Contributor Author

kirklund commented Jun 2, 2022

All tests passed in acceptance-test-openjdk17 and upgrade-test-openjdk8. These jobs hit a concourse failure:

find or create container on worker concourse-worker-4: volume not found

@kirklund kirklund merged commit 495f3b0 into apache:develop Jun 2, 2022
kirklund added a commit that referenced this pull request Jun 2, 2022
…7758)

PROBLEM

Tests that use GfshRule leave behind orphaned processes and do not save
artifacts for debugging failures.

SOLUTION

GfshRule needs to cleanup all processes it forks. It also needs to save
off all runtime artifacts such as logging, stats, pid files, diskstores
to enable debugging of test failures.

DETAILS

Enhance GfshRule and modify all tests using it for proper debugging and
to prevent test pollution.

Overhaul of GfshRule:

* kill ALL geode processes during cleanup
* use FolderRule to ensure all logs and files are properly saved off
when a test fails
* extract GfshExecutor from JUnit rule code
* GfshExecutor allows a test to use any number of Geode versions with
just one GfshRule
* add Gfsh log level support for easier debugging
* add support for new VmConfiguration to allow control over Geode and
Java versions
* overhaul API of GfshRule and companion classes for better consistency
and design

New FolderRule:

* replaces TemporaryFolder and saves off all content when a test fails
* creates root directory under the gradle worker instead of under temp

Update HTTP session caching module tests:

* use new FolderRule to save all artifacts when a test fails
* use nio Paths for filesystem variables

Update acceptance and upgrade tests that use GfshRule:

* use new improved GfshRule and GfshExecutor
* use new FolderRule instead of TemporaryFolder to save all artifacts
when a test fails
* use --disable-default-server in tests with no clients
* fix flakiness of many tests by using random ports instead of default
or hardcoded port values
* reformat GfshRule API usage in tests to improve readability and
consistency
* add GfshStopper to provide common place to await process stop (stop
locator/server is async so restarting with same ports is very prone
to hitting BindExceptions)

Update ProcessUtils:

* extract NativeProcessUtils and make it public for direct use
* rename InternalProcessUtils as ProcessUtilsProvider and move to its
own class
* rethrow IOExceptions as UncheckedIOExceptions
* fix flakiness in NativeProcessUtilsTest by moving findAvailablePid
into test method

Minor changes:

* improve code formatting and readability
* convert from old io File to nio Path APIs as much as possible
* close output streams to fix filesystem issues on Windows

Fixes flaky test tickets:

* DeployJarAcceptanceTest GEODE-9615
* possibly other tests that uses GfshRule

Changes for resubmit:

* log error message if unable to delete folder
* keep default constructor on GfshRule
* ensure IO streams have proper error handling and don't cause failures
on windows

Changes to build pipelines:

* make jdk17 tests gating

NOTES

The labels jdk8, jdk17, windows, windows-jdk8 and windows-jdk17 were
used to run tests on more environments.

This PR contains mostly test and framework changes. The only product
code altered is ServerLauncher and several classes in
org.apache.geode.internal.process, all of which is in geode-core.

(cherry picked from commit 495f3b0)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
jdk8 add this label to get Linux JDK8 PR checks too jdk17 add this label to get Linux JDK17 PR checks too windows add this label to get Windows JDK11 PR checks too windows-jdk8 add this label to get Windows JDK8 PR checks too windows-jdk17 add this label to get Windows JDK17 PR checks too
Projects
None yet
10 participants