Tests that include `t.Parallel()` causes concurrency issues on Windows test runner #809

smoyer64 · 2022-06-01T14:20:27Z

As discussed in #803, including a test that calls t.Parallel() and uses a temporary directory causes unpredictable failures on the Windows build server. Rerunning a failed check often succeeds but failures report that many different tests randomly fail. This is a pretty clear sign that there's a concurrency issue. Here's the code that originally caused the problem:

func TestGoGitRepo_Indexes(t *testing.T) {
	t.Parallel()

	plainRoot, err := ioutil.TempDir("", "")
	require.NoError(t, err)
	t.Cleanup(func() {
		require.NoError(t, os.RemoveAll(plainRoot))
	})

	repo, err := InitGoGitRepo(plainRoot, namespace)
	require.NoError(t, err)
	t.Cleanup(func() {
		require.NoError(t, repo.Close())
	})

	// Can create indices
	indexA, err := repo.GetBleveIndex("a")
	require.NoError(t, err)
	require.NotZero(t, indexA)
	require.FileExists(t, filepath.Join(plainRoot, ".git", namespace, "indexes", "a", "index_meta.json"))
	require.FileExists(t, filepath.Join(plainRoot, ".git", namespace, "indexes", "a", "store"))

	indexB, err := repo.GetBleveIndex("b")
	require.NoError(t, err)
	require.NotZero(t, indexB)

	// Can get an existing index
	indexA, err = repo.GetBleveIndex("a")
	require.NoError(t, err)
	require.NotZero(t, indexA)

	// Can delete an index
	err = repo.ClearBleveIndex("a")
	require.NoError(t, err)
	require.NoDirExists(t, filepath.Join(plainRoot, ".git", namespace, "indexes", "a"))
}

Linux and MacOS filesystems have a different behavior than Windows - if you delete a file that's in use, it will remove the directory entry but the inodes won't be deleted until all the related file handles have been closed. Changing the code above to the following makes the tests fail less frequently:

func TestGoGitRepo_Indexes(t *testing.T) {
	t.Parallel()

	plainRoot := t.TempDir()

	repo, err := InitGoGitRepo(plainRoot, namespace)
	require.NoError(t, err)
	t.Cleanup(func() {
		require.NoError(t, repo.Close())
	})

	// Can create indices
	indexA, err := repo.GetBleveIndex("a")
	require.NoError(t, err)
	require.NotZero(t, indexA)
	require.FileExists(t, filepath.Join(plainRoot, ".git", namespace, "indexes", "a", "index_meta.json"))
	require.FileExists(t, filepath.Join(plainRoot, ".git", namespace, "indexes", "a", "store"))

	indexB, err := repo.GetBleveIndex("b")
	require.NoError(t, err)
	require.NotZero(t, indexB)

	// Can get an existing index
	indexA, err = repo.GetBleveIndex("a")
	require.NoError(t, err)
	require.NotZero(t, indexA)

	// Can delete an index
	err = repo.ClearBleveIndex("a")
	require.NoError(t, err)
	require.NoDirExists(t, filepath.Join(plainRoot, ".git", namespace, "indexes", "a"))
}

Using t.TempDir() to create each test's temporary directory also registers an alternate cleanup function that has a different behavior from os.RemoveAll(). This function includes the following ToB comment (see https://github.com/golang/go/blob/8a56c7742d96c8ef8e8dcecaf3d1c0e9f022f708/src/testing/testing.go#L1115):

// removeAll is like os.RemoveAll, but retries Windows "Access is denied."
// errors up to an arbitrary timeout.
//
// Those errors have been known to occur spuriously on at least the
// windows-amd64-2012 builder (https://go.dev/issue/50051), and can only occur
// legitimately if the test leaves behind a temp file that either is still open
// or the test otherwise lacks permission to delete. In the case of legitimate
// failures, a failing test may take a bit longer to fail, but once the test is
// fixed the extra latency will go away.

It's possible that universally using t.TempDir() within git-bug will eliminate this issue. It's going to be a bit hard to be sure since the only way to tell is to run the tests repeatedly on Github. There are only three additional occurrences of os.TempDir() in the code as most of the tests rely on CreateGoGitTestRepo for their test repository instance. I've been trying different scenarios in #804 and will make this change next.

The text was updated successfully, but these errors were encountered:

smoyer64 · 2022-06-02T10:56:00Z

Here's an instance of a test repo that's created but missing the code to properly close it after the test is complete - this might be contributing to the above described problem:

git-bug/cache/repo_cache_test.go

Line 223 in 96327c3

repo := repository.CreateGoGitTestRepo(false)

smoyer64 · 2022-06-02T11:00:15Z

This test utility function creates three repo without closing them:

git-bug/entity/dag/common_test.go

Line 106 in 96327c3

repoA := repository.CreateGoGitTestRepo(false)

smoyer64 · 2022-06-04T13:01:29Z

I've got a very crufty branch I've been using for testing that includes 30-40 tests which include t.Parallel(). With these changes, the Windows build failed almost every time.

I then refactored the entire test suite to exclusively use t.TempDir() (with its built-in OS friendly version of removeAll()) and t.Cleanup(). After these changes, tests on Windows builds seem to be failing about once in 30 runs.

I've created #810, to describe a couple (final?) places we could end up with blocking behavior and will tackle those before returning to this issue.

MichaelMure · 2022-06-06T11:17:07Z

Here's an instance of a test repo that's created but missing the code to properly close it after the test is complete - this might be contributing to the above described problem:

git-bug/cache/repo_cache_test.go

Line 223 in 96327c3

repo := repository.CreateGoGitTestRepo(false)

This looks like a real issue.

This test utility function creates three repo without closing them:

git-bug/entity/dag/common_test.go

Line 106 in 96327c3

repoA := repository.CreateGoGitTestRepo(false)

I think this one is OK because each time this function is used, defer repository.CleanupTestRepos(repoA, repoB, remote) is called. Admittedly, that might be better to register the removal with t.Cleanup() directly as it's made exactly for that.

That said, looking again at that code, it seems that there is a bunch of case where cache.NewRepoCache(repo) is called, but the cache is never closed after that. As we have seen, that might cause problems on windows especially.

smoyer64 · 2022-06-07T00:44:45Z

Agreed ... on the branch where I've been testing change, I've refactored the tests so that the repositories are always created in the CreateGoGitTestRepo function and which includes the t.Cleanup(). Take a look at https://github.com/MichaelMure/git-bug/blob/affde987ab14507de7edfccb4b4913a5834d0d9b/repository/gogit_testing.go#L41 but please be advised that this branch is a mess because a) it's based off the branch with the failing test and b) I've tried several dozens of ideas including finding the races. Assuming it's okay with you, my plan was to start from master`, and then apply a similar set of refactors.

smoyer64 mentioned this issue Jun 1, 2022

Feat/configurable local storage #803

Merged

smoyer64 mentioned this issue Jun 4, 2022

Race conditions in graphql and bridge/github packages (or package tests) #810

Closed

smoyer64 mentioned this issue Jun 16, 2022

refactor(809): guarantee test cleanup #817

Merged

smoyer64 mentioned this issue Dec 31, 2022

add cleanup sub-command that remove local bugs and identities #933

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tests that include `t.Parallel()` causes concurrency issues on Windows test runner #809

Tests that include `t.Parallel()` causes concurrency issues on Windows test runner #809

smoyer64 commented Jun 1, 2022

smoyer64 commented Jun 2, 2022

smoyer64 commented Jun 2, 2022

smoyer64 commented Jun 4, 2022

MichaelMure commented Jun 6, 2022

smoyer64 commented Jun 7, 2022

Tests that include t.Parallel() causes concurrency issues on Windows test runner #809

Tests that include t.Parallel() causes concurrency issues on Windows test runner #809

Comments

smoyer64 commented Jun 1, 2022

smoyer64 commented Jun 2, 2022

smoyer64 commented Jun 2, 2022

smoyer64 commented Jun 4, 2022

MichaelMure commented Jun 6, 2022

smoyer64 commented Jun 7, 2022

Tests that include `t.Parallel()` causes concurrency issues on Windows test runner #809

Tests that include `t.Parallel()` causes concurrency issues on Windows test runner #809