Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add graceful exits to provisionerd #372

Merged
merged 16 commits into from
Feb 28, 2022
Merged

feat: Add graceful exits to provisionerd #372

merged 16 commits into from
Feb 28, 2022

Conversation

kylecarbs
Copy link
Member

Terraform (or other provisioners) may need to clean up state, or
cancel actions before exit. This adds the ability to gracefully
exit provisionerd.

We'll eventually make this operate on a signal to allow for usage in a Kubernetes environment with graceful shutdowns.

The added Shutdown() function in provisionerd has the expected usage:

daemon := provisionerd.New(...)

ctx, cancelFunc := context.WithTimeout(context.Background(), 30*time.Second)
defer cancelFunc()
err := daemon.Shutdown(ctx)
if !errors.Is(err, context.DeadlineExceeded) {
  return err
}
err = daemon.Close()

This was detecting branches, but not our "main" branch before.
Hopefully this fixes it!
This enables a consistent API for project import and provisioned resources.
This is a much cleaner abstraction. Explicitly declaring the user
parameters for each provisioner makes for significantly simpler
testing.
Terraform (or other provisioners) may need to cleanup state, or
cancel actions before exit. This adds the ability to gracefully
exit provisionerd.
@kylecarbs kylecarbs self-assigned this Feb 27, 2022
@codecov
Copy link

codecov bot commented Feb 27, 2022

Codecov Report

Merging #372 (d171e8a) into main (e5c9555) will decrease coverage by 0.17%.
The diff coverage is 66.12%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #372      +/-   ##
==========================================
- Coverage   68.19%   68.02%   -0.18%     
==========================================
  Files         148      148              
  Lines        8169     8271     +102     
  Branches       72       72              
==========================================
+ Hits         5571     5626      +55     
- Misses       2041     2085      +44     
- Partials      557      560       +3     
Flag Coverage Δ
unittest-go-macos-latest 66.29% <62.79%> (-0.02%) ⬇️
unittest-go-ubuntu-latest 67.24% <62.84%> (-0.41%) ⬇️
unittest-js 66.10% <ø> (ø)
Impacted Files Coverage Δ
provisioner/echo/serve.go 46.39% <0.00%> (-0.98%) ⬇️
coderd/provisionerdaemons.go 57.50% <22.72%> (-3.13%) ⬇️
provisioner/terraform/serve.go 63.63% <62.50%> (-5.12%) ⬇️
provisionerd/provisionerd.go 70.22% <71.00%> (+1.67%) ⬆️
provisioner/terraform/provision.go 74.08% <76.08%> (-0.57%) ⬇️
coderd/provisionerjobs.go 59.23% <100.00%> (-0.88%) ⬇️
peer/channel.go 82.45% <0.00%> (-1.76%) ⬇️
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e5c9555...d171e8a. Read the comment docs.

Comment on lines +441 to +442
// This couldn't use terraform-exec, because it doesn't support cancellation, and there didn't appear
// to be a straight-forward way to add it.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting! Thanks for the comment

terraform.SetStdout(writer)
t.logger.Debug(ctx, "running apply", slog.F("options", options))
err = terraform.Apply(ctx, options...)
t.logger.Debug(ctx, "running apply", slog.F("vars", len(vars)), slog.F("env", len(env)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine some of these env vars may contain sensitive data - is there anything we need to do to protect them? Does slog have the ability to obfuscate sensitive items?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only printed the len here intentionally to obfuscate it!

Comment on lines +323 to +324
// Terraform can fail and apply and still need to store it's state.
// In this case, we return Complete with an explicit error message.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the description. Good call on catching this case where we would still need to pick up the statefile.

_ = cmd.Process.Signal(os.Kill)
}
}()
cmd.Stdout = stdout
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also care about Stderr here?

Copy link
Contributor

@bryphe-coder bryphe-coder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call on this! I just realized this was important when I started playing with the GCP provider... I'd hit Control+C to stop the terraform apply job and noted that it still updated the statefile - important for us to support that, too.

@kylecarbs
Copy link
Member Author

Exactly! That and being able to gracefully shutdown provisionerd instances is pretty important. That way our CI can gracefully upgrade provisionerd instances, instead of purging them and messing up workspace state.

Base automatically changed from nomagicparams to main February 28, 2022 18:26
@kylecarbs kylecarbs enabled auto-merge (squash) February 28, 2022 18:32
@kylecarbs kylecarbs merged commit 9d2803e into main Feb 28, 2022
@kylecarbs kylecarbs deleted the tfcancel branch February 28, 2022 18:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants