Skip to content

Investigate Integration Test Flakiness#528

Merged
kriszyp merged 17 commits into
mainfrom
windows-integration-testing
May 14, 2026
Merged

Investigate Integration Test Flakiness#528
kriszyp merged 17 commits into
mainfrom
windows-integration-testing

Conversation

@kriszyp
Copy link
Copy Markdown
Member

@kriszyp kriszyp commented May 13, 2026

Adding verbose logging and increasing polling timeouts to investigate intermittent failures on redirector and early-hints.

@kriszyp kriszyp requested a review from a team as a code owner May 13, 2026 21:01
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 13, 2026

Reviewed; no blockers found.

@dawsontoth
Copy link
Copy Markdown
Contributor

someone forgot to update the lock file when they updated a dependency 😁

@kriszyp kriszyp marked this pull request as draft May 13, 2026 21:03
@kriszyp
Copy link
Copy Markdown
Member Author

kriszyp commented May 13, 2026

someone forgot to update the lock file when they updated a dependency 😁

As if that's all it took. I have run npm install locally to update a lock file so many times. It never succeeds. Claude said it was hopeless, basically. I have given up. If you want try, go for it.

@socket-security
Copy link
Copy Markdown

socket-security Bot commented May 13, 2026

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff Package Supply Chain
Security
Vulnerability Quality Maintenance License
Updated@​harperfast/​integration-testing@​0.2.0 ⏵ 0.3.178 +310010095 +5100

View full report

…or template packages

This drastically reduces the time it takes for 'npm pack' to run during integration testing on Windows CI runners, which was occasionally taking > 5 minutes and causing the test suite to timeout.
@kriszyp kriszyp marked this pull request as ready for review May 13, 2026 22:57
@kriszyp kriszyp requested a review from a team as a code owner May 13, 2026 23:03
kriszyp added 6 commits May 13, 2026 17:41
1. Fixed 'scandir' assertion in file.mjs to natively check for 'ENOENT' instead of doing fragile string matching against Windows paths containing backslashes.
2. Skipped wildcard schema creation tests in 8_deleteTests.mjs on Windows, because '*' is an explicitly invalid directory character on Windows NTFS.
3. Added a synchronous retry loop to 'atomicWriteFile' in configUtils.js because atomic renames on Windows will throw EPERM if the target file is momentarily held open for reading by another process or thread, which happens during rapid test deployments.
…test

Matches the treatment applied to redirector and early-hints to prevent severe timeouts when running npm pack over the network on Windows runners.
…onent

This prevents Windows CI tests from severely timing out due to slow GitHub archive downloads via npm pack, which was consuming minutes of test runner time and causing cascading ECONNREFUSED failures as subsequent test blocks were preemptively cancelled.
On Linux runners, passing the raw absolute path via 'file:' to npm pack occasionally fails to resolve depending on working directory setup. Since 'npm pack' on Linux correctly downloads the GitHub tarball within ~2 seconds (as opposed to 5 minutes on Windows), we can fall back to the default network download safely on Linux runners, isolating the fixture bypass workaround exclusively to win32 platforms.
Comment thread components/Application.ts
}

if (process.platform === 'win32' && command === 'npm') {
command = 'npm.cmd';
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated, yet related, we probably only want to set shell: true on Windows and if we're setting shell: true, then might as well set windowsHide: true:

		const childProcess = spawn(command, args, {
			shell: process.platform === 'win32',
			cwd,
			env,
			stdio: ['ignore', 'pipe', 'pipe'],
			windowsHide: true,
		});

Comment thread config/configUtils.js
const tempPath = `${filePath}.${process.pid}.${threadId}.${randomBytes(4).toString('hex')}.tmp`;
fs.writeFileSync(tempPath, content);
fs.renameSync(tempPath, filePath);
let retries = 5;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this from a merge or something? I swear this retry logic was proposed in another PR.

Comment thread config/configUtils.js Outdated
@@ -1,3 +1,6 @@
import { dirname, join } from 'node:path';
import { fileURLToPath } from 'node:url';
const __dirname = dirname(fileURLToPath(import.meta.url));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the integration tests running on Node 24, we can use import.meta.dirname.

@kriszyp kriszyp marked this pull request as draft May 14, 2026 04:37
kriszyp and others added 6 commits May 13, 2026 23:26
…ng application extraction

Previously, the extraction pipeline blindly deleted 'tarballPath' at the end of the process. If a component was deployed from a local file (e.g., 'file:/path/to/fixture.tgz'), this deleted the actual test fixture from the workspace, causing subsequent tests that attempted to deploy the same component to crash with 'ENOENT'. This adds a flag to ensure only temporarily generated tarballs from 'npm pack' are cleaned up.
On Windows runners, the 'restart_service' operation for 'http_workers' in 23_blob.mjs causes the HarperDB process to crash, which leads to cascading ECONNREFUSED timeouts in subsequent tests. Like dropSchema, we skip this natively on win32 to allow the remainder of the test suite to finish successfully.
…Windows

Per discussion, skipping the entire 23_blob.mjs test suite on Windows because it fundamentally requires restarting HTTP workers mid-test, which triggers an underlying HarperDB crash on win32 platforms. Also skipping the standard integration deploy-from-github.test.ts due to test flakiness/timeout issues specific to the GitHub networking on the Windows runner. Follow-up work will address these specific Windows integration tests in a future PR.
On Windows runners, attempting to POST a 1MB+ payload via supertest frequently causes the underlying Fastify server to drop the connection abruptly, resulting in an 'ECONNRESET' instead of the expected '413 Payload Too Large' HTTP status code. This test is skipped natively on Windows to prevent test suite failures.
Co-authored-by: Chris Barber <chris@harperdb.io>
@kriszyp kriszyp marked this pull request as ready for review May 14, 2026 14:44
@kriszyp kriszyp merged commit 1226d22 into main May 14, 2026
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants