Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: prevent terraform init races and corrupted cache directories #4985

Merged
merged 2 commits into from
Nov 9, 2022

Conversation

deansheather
Copy link
Member

Before we were creating a new initMu to guard terraform init calls for each executor, and a new executor for each job. This changes the initMut to be a global mutex to guard all terraform init calls.

In production we only use a single terraform cache directory shared for all provisioner daemons so there aren't any optimizations to this locking scheme that can be made that would help production deployments. I was going to have a global map from e.cachePath to a mutex but decided it wasn't worth it since we never use a different cache path between runs.

@deansheather deansheather enabled auto-merge (squash) November 9, 2022 19:20
@deansheather deansheather merged commit 45f81a7 into main Nov 9, 2022
@deansheather deansheather deleted the dean/terraform-init-mutex branch November 9, 2022 19:40
@github-actions github-actions bot locked and limited conversation to collaborators Nov 9, 2022
exitTimeout time.Duration
}

func (s *server) executor(workdir string) executor {
return executor{
initMu: &s.initMu,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@deansheather how does this change help? We were already sharing the server mutex with the executors.

If there were multiple servers sharing a cache path I’d understand the issue. I had always imagined we’d use subfolders for each server/provisionerd instance.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All provisioners in the same process share the same cache directory, and each had it's own mutex.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I misunderstood your PR description I think. This change is fine for now but I do think we should just have IDs for each provisioner and create subpaths in the cache dir in the future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, I think one reason I implemented it like this was for better test concurrency. Right now I believe most tests using the provisions have their own cache path. So this global mutex might slow those down due to not betting able to run concurrently or reuse the cache.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants