ci: run external e2e and sanity tests on pr on kind #490
Merged
Replace the runtime tdnf-install init container with the prebuilt alpine/socat image and preload it into the kind cluster before applying the patch. The previous approach hit CoreDNS cold-start and double-NAT issues on CI, causing init-socat to hang for >10m.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
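As a rough illustration of the preload step described in this commit, the sketch below builds the two commands involved: pulling the image on the host and copying it into the kind node(s) with `kind load docker-image`. The function names and the cluster name `kind` are hypothetical and not taken from this PR; the actual CI step may be wired differently.

```python
import subprocess


def preload_commands(image: str, cluster: str) -> list[list[str]]:
    """Return the commands that pull an image and copy it into a kind cluster.

    Hypothetical helper: the PR only states that the image is preloaded,
    not how the step is implemented.
    """
    return [
        ["docker", "pull", image],
        # `kind load docker-image` copies a host image into every node of the cluster.
        ["kind", "load", "docker-image", image, "--name", cluster],
    ]


def preload(image: str, cluster: str) -> None:
    """Run the preload commands, stopping at the first failure."""
    for cmd in preload_commands(image, cluster):
        subprocess.run(cmd, check=True)
```

With the image already present on the node, the init container no longer depends on cluster DNS or outbound network access at pod start, which is the failure mode the commit message describes.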
- Add --fail-fast to test-sanity Makefile target so CI stops on the first failing spec instead of running through dozens of cascading failures (saves >10 minutes when the driver mis-installs).
- Bump kind-setup-vg.sh default VG_SIZE from 100G to 500G. The file is sparse so this does not consume real host disk until written to, but it removes capacity-pressure flakes on the external-e2e suite.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
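The claim that a sparse backing file does not consume host disk until written can be checked with a small sketch like the one below. It is not the logic of kind-setup-vg.sh; the file name `vg.img` and the helper name are made up for illustration, and `st_blocks` is only available on Unix-like systems.

```python
import os
import tempfile


def sparse_file_usage(size_bytes: int) -> tuple[int, int]:
    """Create a sparse file and return (apparent size, bytes actually allocated)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "vg.img")  # hypothetical file name
        with open(path, "wb") as f:
            # Extending a file with truncate() leaves a hole: no data blocks are written.
            f.truncate(size_bytes)
        st = os.stat(path)
        # st_blocks counts 512-byte units allocated on disk.
        return st.st_size, st.st_blocks * 512


apparent, allocated = sparse_file_usage(1 << 30)  # 1 GiB apparent size
print(f"apparent={apparent} bytes, allocated={allocated} bytes")
```

On a filesystem that supports sparse files, the allocated figure stays far below the apparent size, which is why raising VG_SIZE only changes the ceiling and not the up-front disk cost.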
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The driver's pod-termination cleanup wipes the kind VG when the daemonset is restarted mid-suite, which surfaces as CreateVolume RPCs returning 0 capacity / 'no devices found'. Disable cleanup.enabled, lvGarbageCollection.enabled, and lvmOrphanCleanup.enabled the same way the sanity suite already does.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Switch kind-setup-vg.sh from a tmpfs mount to a sparse file under /var, which kind mounts as a real Docker volume on the host fs. tmpfs was RAM-backed, so parallel ephemeral suites that allocate hundreds of GiB worth of LVs drove the node out of memory and crash-looped the driver (probes hit 'connection refused').
- Stop disabling cleanup.lvGarbageCollection and cleanup.lvmOrphanCleanup in external-e2e. Only cleanup.enabled (pod-termination VG wipe) needs to be off. The orphan controllers are what reclaim LVs left behind by flaky DeleteVolume RPCs. Without them, the VG filled up and the scheduler returned 'node(s) did not have enough free storage'.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
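To make the final override set concrete, here is a minimal sketch of turning a dictionary of Helm values into `--set` arguments. Only the key `cleanup.enabled` comes from the commit message; the helper name, the dictionary name, and the idea that the runner assembles Helm arguments this way are assumptions.

```python
def helm_set_args(overrides: dict[str, object]) -> list[str]:
    """Convert {"a.b": value} pairs into a flat list of Helm --set arguments."""
    args: list[str] = []
    for key, value in overrides.items():
        if isinstance(value, bool):
            # Helm expects lowercase true/false, not Python's True/False.
            value = str(value).lower()
        args += ["--set", f"{key}={value}"]
    return args


# Per the commit message, external-e2e only needs the pod-termination VG wipe off;
# the orphan-cleanup controllers keep their defaults so leaked LVs are reclaimed.
EXTERNAL_E2E_OVERRIDES = {"cleanup.enabled": False}

print(helm_set_args(EXTERNAL_E2E_OVERRIDES))
```

Keeping the override set this small means the suite still exercises the garbage-collection paths that reclaim LVs after flaky DeleteVolume RPCs.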
rnhan
approved these changes
May 27, 2026
This change adds sanity and external-e2e tests to every PR, run on kind clusters (no Azure access). They can be executed on PRs from forks, with no permissions needed on the repo.
AI Summary
This pull request introduces several improvements to the Kind-based integration test workflow and related test infrastructure. The most significant changes are the addition of a robust script to pre-create an LVM volume group on each Kind node, new options to configure Kind clusters for tests, and refactoring of test job definitions to support multiple test suites and configurations. There are also targeted test fixes and simplifications to improve reliability and clarity.
Kind Cluster Setup & LVM Volume Group Management:
- hack/kind-setup-vg.sh to automate the creation and teardown of a loop-backed LVM volume group on each Kind node, enabling tests that require persistent storage to run in CI without real NVMe hardware.
- Makefile with kind-setup-vg, kind-teardown-vg, and kind-e2e-bootstrap targets to leverage the new script for cluster preparation and cleanup.

Test Runner Enhancements:
- .github/workflows/scripts/run_tests.py to accept --kind-nodes and --kind-setup-vg arguments, allowing dynamic selection of single/multi-node clusters and optional LVM VG setup. The KindCluster class and cluster creation logic were updated accordingly. The script now also returns an exit code for better CI integration. [1] [2] [3] [4] [5] [6] [7]

CI Workflow Refactoring:
- .github/workflows/test-e2e-pr.yml to define a matrix of Kind-based tests (E2E, external E2E, and sanity), each with specific arguments and Helm overrides. Jobs are now named dynamically, and Kind cluster creation is explicitly separated. [1] [2] [3] [4]

Test Suite Fixes and Improvements:
- alpine/socat image, eliminating the need for unreliable runtime installation in CI. [1] [2]

Minor Improvements:
- strings import in test/sanity/sanity_suite_test.go for future or existing string manipulation needs.
- test-sanity Makefile target to add --fail-fast for quicker feedback on failures.