Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tests that include t.Parallel() causes concurrency issues on Windows test runner #809

Open
smoyer64 opened this issue Jun 1, 2022 · 5 comments

Comments

@smoyer64
Copy link
Collaborator

smoyer64 commented Jun 1, 2022

As discussed in #803, including a test that calls t.Parallel() and uses a temporary directory causes unpredictable failures on the Windows build server. Rerunning a failed check often succeeds but failures report that many different tests randomly fail. This is a pretty clear sign that there's a concurrency issue. Here's the code that originally caused the problem:

func TestGoGitRepo_Indexes(t *testing.T) {
	t.Parallel()

	plainRoot, err := ioutil.TempDir("", "")
	require.NoError(t, err)
	t.Cleanup(func() {
		require.NoError(t, os.RemoveAll(plainRoot))
	})

	repo, err := InitGoGitRepo(plainRoot, namespace)
	require.NoError(t, err)
	t.Cleanup(func() {
		require.NoError(t, repo.Close())
	})

	// Can create indices
	indexA, err := repo.GetBleveIndex("a")
	require.NoError(t, err)
	require.NotZero(t, indexA)
	require.FileExists(t, filepath.Join(plainRoot, ".git", namespace, "indexes", "a", "index_meta.json"))
	require.FileExists(t, filepath.Join(plainRoot, ".git", namespace, "indexes", "a", "store"))

	indexB, err := repo.GetBleveIndex("b")
	require.NoError(t, err)
	require.NotZero(t, indexB)

	// Can get an existing index
	indexA, err = repo.GetBleveIndex("a")
	require.NoError(t, err)
	require.NotZero(t, indexA)

	// Can delete an index
	err = repo.ClearBleveIndex("a")
	require.NoError(t, err)
	require.NoDirExists(t, filepath.Join(plainRoot, ".git", namespace, "indexes", "a"))
}

Linux and MacOS filesystems have a different behavior than Windows - if you delete a file that's in use, it will remove the directory entry but the inodes won't be deleted until all the related file handles have been closed. Changing the code above to the following makes the tests fail less frequently:

func TestGoGitRepo_Indexes(t *testing.T) {
	t.Parallel()

	plainRoot := t.TempDir()

	repo, err := InitGoGitRepo(plainRoot, namespace)
	require.NoError(t, err)
	t.Cleanup(func() {
		require.NoError(t, repo.Close())
	})

	// Can create indices
	indexA, err := repo.GetBleveIndex("a")
	require.NoError(t, err)
	require.NotZero(t, indexA)
	require.FileExists(t, filepath.Join(plainRoot, ".git", namespace, "indexes", "a", "index_meta.json"))
	require.FileExists(t, filepath.Join(plainRoot, ".git", namespace, "indexes", "a", "store"))

	indexB, err := repo.GetBleveIndex("b")
	require.NoError(t, err)
	require.NotZero(t, indexB)

	// Can get an existing index
	indexA, err = repo.GetBleveIndex("a")
	require.NoError(t, err)
	require.NotZero(t, indexA)

	// Can delete an index
	err = repo.ClearBleveIndex("a")
	require.NoError(t, err)
	require.NoDirExists(t, filepath.Join(plainRoot, ".git", namespace, "indexes", "a"))
}

Using t.TempDir() to create each test's temporary directory also registers an alternate cleanup function that has a different behavior from os.RemoveAll(). This function includes the following ToB comment (see https://github.com/golang/go/blob/8a56c7742d96c8ef8e8dcecaf3d1c0e9f022f708/src/testing/testing.go#L1115):

// removeAll is like os.RemoveAll, but retries Windows "Access is denied."
// errors up to an arbitrary timeout.
//
// Those errors have been known to occur spuriously on at least the
// windows-amd64-2012 builder (https://go.dev/issue/50051), and can only occur
// legitimately if the test leaves behind a temp file that either is still open
// or the test otherwise lacks permission to delete. In the case of legitimate
// failures, a failing test may take a bit longer to fail, but once the test is
// fixed the extra latency will go away.

It's possible that universally using t.TempDir() within git-bug will eliminate this issue. It's going to be a bit hard to be sure since the only way to tell is to run the tests repeatedly on Github. There are only three additional occurrences of os.TempDir() in the code as most of the tests rely on CreateGoGitTestRepo for their test repository instance. I've been trying different scenarios in #804 and will make this change next.

@smoyer64
Copy link
Collaborator Author

smoyer64 commented Jun 2, 2022

Here's an instance of a test repo that's created but missing the code to properly close it after the test is complete - this might be contributing to the above described problem:

repo := repository.CreateGoGitTestRepo(false)

@smoyer64
Copy link
Collaborator Author

smoyer64 commented Jun 2, 2022

This test utility function creates three repo without closing them:

repoA := repository.CreateGoGitTestRepo(false)

@smoyer64
Copy link
Collaborator Author

smoyer64 commented Jun 4, 2022

I've got a very crufty branch I've been using for testing that includes 30-40 tests which include t.Parallel(). With these changes, the Windows build failed almost every time.

I then refactored the entire test suite to exclusively use t.TempDir() (with its built-in OS friendly version of removeAll()) and t.Cleanup(). After these changes, tests on Windows builds seem to be failing about once in 30 runs.

I've created #810, to describe a couple (final?) places we could end up with blocking behavior and will tackle those before returning to this issue.

@MichaelMure
Copy link
Owner

Here's an instance of a test repo that's created but missing the code to properly close it after the test is complete - this might be contributing to the above described problem:

repo := repository.CreateGoGitTestRepo(false)

This looks like a real issue.

This test utility function creates three repo without closing them:

repoA := repository.CreateGoGitTestRepo(false)

I think this one is OK because each time this function is used, defer repository.CleanupTestRepos(repoA, repoB, remote) is called. Admittedly, that might be better to register the removal with t.Cleanup() directly as it's made exactly for that.

That said, looking again at that code, it seems that there is a bunch of case where cache.NewRepoCache(repo) is called, but the cache is never closed after that. As we have seen, that might cause problems on windows especially.

@smoyer64
Copy link
Collaborator Author

smoyer64 commented Jun 7, 2022

Agreed ... on the branch where I've been testing change, I've refactored the tests so that the repositories are always created in the CreateGoGitTestRepo function and which includes the t.Cleanup(). Take a look at https://github.com/MichaelMure/git-bug/blob/affde987ab14507de7edfccb4b4913a5834d0d9b/repository/gogit_testing.go#L41 but please be advised that this branch is a mess because a) it's based off the branch with the failing test and b) I've tried several dozens of ideas including finding the races. Assuming it's okay with you, my plan was to start from master`, and then apply a similar set of refactors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants