
[CC-8720] CMEK resource #67

Merged 2 commits into main from cmek_resource on Jan 5, 2023

Conversation

@erademacher (Contributor) commented Dec 20, 2022

This commit adds a new CMEK resource, which primarily manages the configuration of customer-managed encryption keys (CMEK). CMEK is enabled for a cluster when the resource is created, and keys can be rotated by updating their specs.

Due to circular dependencies (new regions depend on CMEK config, which depends on role policy, which depends on cluster), new regions must be managed by the CMEK resource. That means the cluster resource and CMEK resource need to keep track of which regions they "own" and reconcile that with updates and the actual state of the cluster. That's the bulk of the complexity here.

Destroying a CMEK resource is a no-op, since CMEK cannot be disabled. That means that a user could be left with "floating" regions. I need to figure out a way to block targeted destroy operations on CMEK resources while allowing them to be removed from state when the parent cluster is destroyed. We could attempt to remove additional regions on destroy, but that would take significantly longer.

Happy to do a walkthrough and discuss tradeoffs at a watercooler.
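
For orientation, here is a minimal sketch of what using the new resource might look like. The attribute names are illustrative assumptions, not the final schema; see examples/workflows/cockroach_cmek in this PR for the real workflow.

  resource "cockroach_cmek" "example" {
    id = cockroach_cluster.example.id

    regions = [{
      region = "us-west-2"
      key = {
        auth_principal = aws_iam_role.cmek.arn # the role policy in the dependency chain above
        type           = "AWS_KMS"
        uri            = aws_kms_key.example.arn
      }
    }]

    # Per the ownership rule above, regions added after CMEK is enabled are
    # declared on the CMEK resource rather than on the cluster.
    additional_regions = [{
      name       = "us-east-1"
      node_count = 3
    }]
  }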


@adwittumuluri commented

Hey Evan, here is where we acquire a cloud provider account by ID. Seems like that's plumbed all the way through to the admin console and can be set via the admin-cli:

admin-cli cluster create [other args] --account-id [account id]

We're very vocal in the code comments that it's for testing purposes only, though if it makes the TF flow a lot easier, we can argue against that.

@ryanluu12345 left a comment

Evan, also a reminder to add a diagram showing the flow of this feature, so we can better visualize the dependencies.

Reviewable status: 0 of 13 files reviewed, all discussions resolved (waiting on @adwittumuluri, @jenngeorge, @marksoper, and @prafull01)

@ryanluu12345 left a comment

Overall, this was not too hard to follow. Thanks again for the session today explaining the whole flow and the reasoning behind the different region lists that need to be reconciled.

Reviewable status: 0 of 13 files reviewed, 4 unresolved discussions (waiting on @adwittumuluri, @erademacher, @jenngeorge, @marksoper, and @prafull01)


examples/workflows/cockroach_cmek/main.tf line 21 at r2 (raw file):

  type     = string
  nullable = false
  default  = "us-west2"

I believe for AWS the region would be us-west-2.
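For reference, the hyphenated AWS form would look like this; the enclosing variable name is cut off in the diff excerpt, so `aws_region` below is an assumption:

  variable "aws_region" {
    type     = string
    nullable = false
    default  = "us-west-2" # AWS spelling; "us-west2" is the GCP form
  }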


internal/provider/cluster_resource.go line 274 at r2 (raw file):

			}
			dedicated.Hardware = hardware
			if cfg.PrivateNetworkVisibility.ValueBool() {

Wouldn't we need another CCAPI release for this to be exposed and usable? Or is this currently exposed in production?


internal/provider/cluster_resource.go line 551 at r2 (raw file):

		var privateNetworkVisibility bool
		if networkVisibility, ok := clusterObj.AdditionalProperties["network_visibility"].(string); ok {
			privateNetworkVisibility = networkVisibility == "NETWORK_VISIBILITY_PRIVATE"

Nit: extract NETWORK_VISIBILITY_PRIVATE as a constant, since it's reused elsewhere.
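
A sketch of the suggested extraction (the constant name is assumed):

  // networkVisibilityPrivate mirrors the CCAPI enum value checked above.
  const networkVisibilityPrivate = "NETWORK_VISIBILITY_PRIVATE"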


internal/provider/cluster_resource.go line 620 at r2 (raw file):

//
// A nil return value means no region update is required.
func reconcileRegionUpdate(ctx context.Context, state, plan []Region, clusterID string, service client.Service) (*map[string]int32, diag.Diagnostics) {

What are scenarios where these different region lists can go out of sync? Will this method address cases where we could go out of sync?

@erademacher marked this pull request as draft on December 22, 2022
@erademacher marked this pull request as ready for review on January 5, 2023
@erademacher (Contributor, Author) left a comment

I'm opening this back up before the single/multi key stuff because Twilio needs something to test with now. It shouldn't be a breaking change anyway.

I updated the Go SDK so we aren't doing anything janky with AdditionalProperties to get the cluster set up for private network visibility.

Reviewable status: 0 of 17 files reviewed, 4 unresolved discussions (waiting on @adwittumuluri, @jenngeorge, @marksoper, @prafull01, and @ryanluu12345)


examples/workflows/cockroach_cmek/main.tf line 21 at r2 (raw file):

Previously, ryanluu12345 wrote…

I believe for AWS the region would be us-west-2.

Good catch. Fixed.


internal/provider/cluster_resource.go line 274 at r2 (raw file):

Previously, ryanluu12345 wrote…

Wouldn't we need another CCAPI release for this to be exposed and usable? Or is this currently exposed in production?

No, it's currently exposed in production. API releases are only needed for breaking changes.


internal/provider/cluster_resource.go line 551 at r2 (raw file):

Previously, ryanluu12345 wrote…

Nit: NETWORK_VISIBILITY_PRIVATE as a constant since it's reused elsewhere.

Addressed with the Go SDK update.


internal/provider/cluster_resource.go line 620 at r2 (raw file):

Previously, ryanluu12345 wrote…

What are scenarios where these different region lists can go out of sync? Will this method address cases where we could go out of sync?

Cluster and CMEK region lists go out of sync when new regions are added after CMEK is enabled. That's the whole point of this method. The only thing that it doesn't address is when users add or remove regions outside of Terraform, which they aren't supposed to do. We have no way to assign them ownership under this system. We won't accidentally remove them, though.
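
As a sketch of that ownership rule (a simplified illustration with assumed names and shapes, not the PR's actual implementation):

  package provider

  import "maps"

  // sketchRegionUpdate illustrates the ownership rule described above. state
  // and plan map this resource's owned regions to node counts; unowned holds
  // regions found on the cluster that neither resource manages (added outside
  // Terraform). A nil return means no region update is required, matching the
  // real reconcileRegionUpdate contract.
  func sketchRegionUpdate(state, plan, unowned map[string]int32) *map[string]int32 {
      if maps.Equal(state, plan) {
          return nil // owned regions unchanged: skip the update call
      }
      desired := make(map[string]int32, len(plan)+len(unowned))
      maps.Copy(desired, plan)
      // Carry unowned regions through untouched so an update never removes
      // regions that were added out of band.
      maps.Copy(desired, unowned)
      return &desired
  }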

Second commit:

- Update cockroach-cloud-sdk-go to v0.3.3
- Update HasPrivateNodes references
- Add and use a mock generator
@ryanluu12345 left a comment

Reviewed 8 of 13 files at r1, 7 of 8 files at r3, all commit messages.
Reviewable status: 15 of 17 files reviewed, 2 unresolved discussions (waiting on @adwittumuluri, @erademacher, @jenngeorge, @marksoper, and @prafull01)


docs/data-sources/cluster.md line 42 at r4 (raw file):

- `memory_gib` (Number)
- `num_virtual_cpus` (Number)
- `private_network_visibility` (Boolean)

Wondering why you made the schema for this a boolean and not a string or enum (if supported)? The original decision to make NetworkVisibility an enum was to keep it extensible to future networking types without introducing a breaking type change.

If this is not too much of a concern and can be changed later with minimal impact, :lgtm:


internal/provider/cluster_resource.go line 620 at r2 (raw file):

Previously, erademacher (Evan Rademacher) wrote…

Cluster and CMEK region lists go out of sync when new regions are added after CMEK is enabled. That's the whole point of this method. The only thing that it doesn't address is when users add or remove regions outside of Terraform, which they aren't supposed to do. We have no way to assign them ownership under this system. We won't accidentally remove them, though.

Thanks for the explanation. Looks good to me then.

@marksoper (Collaborator) commented

example and docs look good @erademacher - I will test locally asap

@erademacher (Contributor, Author) left a comment

TFTRs! I'm gonna err on the side of expediency and get this out the door.

Reviewable status: 15 of 17 files reviewed, 2 unresolved discussions (waiting on @adwittumuluri, @jenngeorge, @prafull01, and @ryanluu12345)


docs/data-sources/cluster.md line 42 at r4 (raw file):

Previously, ryanluu12345 wrote…

Wondering why you made the schema for this a boolean and not a string or enum (if supported)? The original decision for NetworkVisibility as an enum is to make it more extensible to future networking types without having to introduce a breaking type change.

If this is not too much of a concern and can be changed later with minimal impact, :lgtm:

Simplicity, mostly. At this stage, it's relatively OK to make breaking changes. I can definitely change it to a string later.
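
For illustration, the two shapes from the config side; the string value mirrors the CCAPI enum, and the enum-style attribute name is assumed:

  # current boolean form
  private_network_visibility = true

  # string form, extensible to future visibility modes
  network_visibility = "NETWORK_VISIBILITY_PRIVATE"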

@erademacher merged commit 1f35ce2 into main on Jan 5, 2023
@erademacher deleted the cmek_resource branch on January 5, 2023