
Not able to create a dedicated cluster with encryption_key on GCP #65

Closed
felippe-mendonca opened this issue Jul 7, 2022 · 33 comments
Labels
blocked (The issue should be fixed on API / backend level first), bug (Something isn't working), work-in-progress

Comments

@felippe-mendonca

When trying to create a new dedicated cluster on GCP with a self-managed KMS key, I ran into an error. According to the docs, when you create a cluster through the management console, you have to grant a set of permissions to a Google Group ID that is shown during the creation process. However, when creating the cluster with the Terraform provider, I have no way to obtain that Google Group ID. Because of that, I can't grant the required permissions, and I get the following error.

Error: error creating Kafka Cluster "general": 400 Bad Request: Cluster configuration is invalid. Reasons=[encryption_key_id]

I've already checked the provider's source code but haven't been able to figure out whether the Google Group ID is accessible through any resource.

Does anybody know if that is a limitation of the Confluent Platform?

@linouk23
Collaborator

linouk23 commented Jul 7, 2022

@felippe-mendonca thanks for creating an issue!

It might be a well-known issue; let me look into that.

@linouk23
Collaborator

linouk23 commented Jul 8, 2022

OK, I double-checked, and it is indeed a limitation that we'll try to address in future versions, but there's no timeline for it yet.

It might be a good idea to keep this issue open until it's fixed.

For now, the only workaround is to create a cluster using CLI / Confluent Cloud Console and then import it via TF.
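For Terraform users, that workaround might look roughly like the sketch below. All IDs, names, and regions are placeholders, and the exact import ID format should be checked against the provider's import docs; this is not taken from the thread.

```hcl
# Sketch of the "create manually, then import" workaround: declare the cluster
# in TF to match the one created via the CLI / Cloud Console, then import it.
resource "confluent_kafka_cluster" "dedicated" {
  display_name = "general"
  availability = "SINGLE_ZONE"
  cloud        = "GCP"
  region       = "us-central1" # placeholder region

  dedicated {
    cku = 1
  }

  environment {
    id = "env-abc123" # placeholder environment ID
  }
}

# Then, with provider credentials configured (ID format assumed from the docs):
#   terraform import confluent_kafka_cluster.dedicated env-abc123/lkc-abc123
```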

@linouk23 linouk23 added the bug Something isn't working label Jul 8, 2022
@linouk23 linouk23 changed the title Not able to create a dedicated cluster with encryption_key Not able to create a dedicated cluster with encryption_key on GCP Jul 8, 2022
@linouk23
Collaborator

linouk23 commented Jul 8, 2022

To add more context,

  • BYOK AWS -- you can ask our support for the AWS account IDs (or look them up in the CLI), so it could work, but managing permissions would be a manual step.

  • BYOK GCP -- creation via TF is not possible (the "getting the Google Group ID" step is missing -- exactly as you mentioned).

Overall I feel like the best workaround is to create a cluster using CLI / Confluent Cloud Console and then import it via TF.

@larrywax

larrywax commented Nov 29, 2022

To add more context,

  • BYOK AWS -- you can ask our support for the AWS account IDs (or look them up in the CLI), so it could work, but managing permissions would be a manual step.
  • BYOK GCP -- creation via TF is not possible (the "getting the Google Group ID" step is missing -- exactly as you mentioned).

Overall I feel like the best workaround is to create a cluster using CLI / Confluent Cloud Console and then import it via TF.

Hi, I'm trying to set up a dedicated cluster through Pulumi, and I was able to retrieve the Confluent AWS account ID from the network object (I guess this could also work in Terraform, because Pulumi wraps this provider).
This means I'm able to create a KMS key with the correct policy attached BEFORE creating the Confluent Kafka cluster.

Now I'm still stuck with the exact same error from the beginning of this issue: Cluster configuration is invalid. Reasons=[encryption_key_id]

I have been digging through the code to understand whether anything in the provider could raise this error, but I can't find anything.

My guess is that something at the API level is blocking cluster creation when an encryption key is specified, regardless of whether it has the correct policy or not.
So, if my guess is correct, would it be possible to unlock this feature? 🙏

@linouk23
Collaborator

linouk23 commented Nov 29, 2022

Thanks for asking @larrywax!

My guess is that something at the API level is blocking cluster creation when an encryption key is specified, regardless of whether it has the correct policy or not.

That sounds reasonable.

Overall I feel like the best workaround is to create a cluster using CLI / Confluent Cloud Console and then import it via TF.

Is there a particular reason why importing is not sufficient in your use case? Let's say we hack around our existing APIs; I'm still not sure we'd be able to support that undocumented usage anyway.

@patrickherrera

patrickherrera commented Nov 29, 2022

We faced the same issue with AWS (and Terraform). The workaround I used was to run the CLI first (following these instructions) and extract all the account IDs from the generated statements. Luckily, you don't even need to provide a valid ARN for the key, so you can run this before you have a key. Then quit the CLI, add those account IDs to your Terraform (or other IaC tool), and use them during your deployment to generate a key that allows access to those accounts.
It does appear you need to include all of the account IDs, even though you would think each cluster maps to one specific account.

This is a manual step, unfortunately, but it only needs to be done once; everything else can then be automated as many times as you like. Creating a cluster manually and importing it each time defeats the point. We did find that the list of account IDs changes (rarely), so deploying a new cluster might fail until you update the list, but existing clusters will continue to work.

So we have some Terraform config with:

confluent_account_ids = [
  "XXX", "YYY", "ZZZ"
]

And then create a policy which is applied to the Key with the correct permissions (which you will also get from the output of the CLI):

(snip)

  dynamic "statement" {
    for_each = var.confluent_account_ids 
    content {
      sid = "Enable ${statement.value} IAM User Permissions"
      principals {
        type = "AWS"
        identifiers = [
          "arn:aws:iam::${statement.value}:root"
        ]
      }
      effect = "Allow"
      actions = [
        "kms:Encrypt",
        "kms:Decrypt",
        "kms:ReEncryptFrom",
        "kms:ReEncryptTo",
        "kms:GenerateDataKey",
        "kms:GenerateDataKeyWithoutPlaintext",
        "kms:DescribeKey",
        "kms:CreateGrant",
        "kms:list*",
        "kms:RevokeGrant"
      ]
      resources = [
        "*"
      ]
    }
  }

(snip)
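To make the elided snippet above self-contained, a sketch of how that dynamic statement could sit inside a complete key policy follows. This is an assumption-filled reconstruction, not the author's actual config: the account IDs are placeholders, the root statement is added so the key owner isn't locked out, and the action list mirrors the one quoted above.

```hcl
variable "confluent_account_ids" {
  type    = list(string)
  default = ["111111111111", "222222222222"] # placeholder account IDs
}

data "aws_caller_identity" "current" {}

data "aws_iam_policy_document" "kafka_key" {
  # Keep the key manageable by the owning account.
  statement {
    sid       = "EnableRootPermissions"
    effect    = "Allow"
    actions   = ["kms:*"]
    resources = ["*"]
    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"]
    }
  }

  # One statement per Confluent account ID, as in the snippet above.
  dynamic "statement" {
    for_each = var.confluent_account_ids
    content {
      sid    = "Enable${statement.value}IAMUserPermissions"
      effect = "Allow"
      principals {
        type        = "AWS"
        identifiers = ["arn:aws:iam::${statement.value}:root"]
      }
      actions = [
        "kms:Encrypt", "kms:Decrypt", "kms:ReEncryptFrom", "kms:ReEncryptTo",
        "kms:GenerateDataKey", "kms:GenerateDataKeyWithoutPlaintext",
        "kms:DescribeKey", "kms:CreateGrant", "kms:List*", "kms:RevokeGrant",
      ]
      resources = ["*"]
    }
  }
}

resource "aws_kms_key" "kafka" {
  description = "CMK for a Confluent dedicated cluster"
  policy      = data.aws_iam_policy_document.kafka_key.json
}
```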

@larrywax

larrywax commented Nov 29, 2022

Is there a particular reason why importing is not sufficient in your use case? Let's say we hack around our existing APIs; I'm still not sure we'd be able to support that undocumented usage anyway.

@linouk23 thanks for your fast response!

We are trying to define a custom Pulumi component (very similar to a Terraform module) that wraps up all the resources needed to create and configure a Kafka cluster from the ground up, applying sane defaults for our use case. This component will then be used by developers and executed via CI/CD.

Often, developers are not very confident with low-level infra (AWS networking, KMS key management, etc.) and, more importantly, they don't have the required privileges to create these resources manually.

Creating clusters manually via API/console just doesn't work for our use case, unfortunately :(
It's also worth mentioning that importing a Pulumi resource that should be part of a custom component won't work because of a parent-resource mismatch, but I must admit this is a very specific corner case that may not apply to Terraform users.

@pkmec

pkmec commented Nov 29, 2022

@larrywax I was able to use the workaround here: the Group ID you need to grant is defined per environment, so once your environment is ready, you can go through the "add new cluster" wizard up to the point where it asks you to specify a custom KMS key. On that screen you can grab your Group ID, cancel the wizard, and use the ID during cluster creation. Not 100% automated, but if you are only creating clusters (not dynamic environments), it can work. I wasn't able to find any automated way to retrieve it, and I went through every API I could find.
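For Terraform users, that per-environment Group ID workaround might look roughly like this. The `encryption_key` argument of the `dedicated` block is assumed from the provider docs of that era, and all IDs are placeholders; grant the Google Group from the wizard access to the key before applying.

```hcl
# Sketch: after granting the Console wizard's Google Group access to the key,
# pass the key's resource ID when creating the dedicated cluster.
resource "confluent_kafka_cluster" "dedicated" {
  display_name = "general"
  availability = "SINGLE_ZONE"
  cloud        = "GCP"
  region       = "us-central1" # placeholder region

  dedicated {
    cku = 1
    # Placeholder key resource ID:
    encryption_key = "projects/my-project/locations/us-central1/keyRings/my-ring/cryptoKeys/my-key"
  }

  environment {
    id = "env-abc123" # placeholder environment ID
  }
}
```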

@larrywax

larrywax commented Dec 5, 2022

Thanks @patrickherrera, I tested your workaround and I was able to create a cluster! Thank you very much for sharing this!

So my assumption was wrong: there isn't any kind of block on the Confluent API; the key policy simply must grant access to all the Confluent AWS accounts.

@nick-zi

nick-zi commented Jan 12, 2023

To add more context,

  • BYOK AWS -- you can ask our support for the AWS account IDs (or look them up in the CLI), so it could work, but managing permissions would be a manual step.
  • BYOK GCP -- creation via TF is not possible (the "getting the Google Group ID" step is missing -- exactly as you mentioned).

Overall I feel like the best workaround is to create a cluster using CLI / Confluent Cloud Console and then import it via TF.

@linouk23 is there a date when this will be fixed? Or are there any actions being actively taken to get it fixed?

@linouk23
Collaborator

@nick-zi thanks for the ping! It's on the roadmap.

@schlbra

schlbra commented Feb 16, 2023

@linouk23, are you able to disclose when this feature is planned on the roadmap, and is it safe to assume it will be supported on both GCP and AWS?

@linouk23
Collaborator

@schlbra thanks for asking! I'm not sure I can disclose much beyond the fact that we're working on it.

@schlbra

schlbra commented Feb 16, 2023

@linouk23 Glad to hear it's already been prioritized and is being actively worked on. It sounds like the feature will include support for both GCP and AWS? Thank you for the update!

@linouk23
Collaborator

linouk23 commented Feb 27, 2023

cc @larrywax @patrickherrera @pkmec @schlbra there's some good news coming for BYOK on AWS / Azure. We're working on an example for AWS where we run into a circular dependency (for Azure there are no issues), so we've got a quick question.

Would it be OK to have an e2e example that asks the user to provide a KMS Key ARN as an input variable (kind of like the UI does), or would you ideally rather still create the KMS key from scratch as part of the e2e terraform apply (the apply that creates all of the resources)?

@patrickherrera

Hi @linouk23, thanks, but I'm only interested in 100% Terraform, and the only thing really blocking that (on AWS, anyway) is the lack of an official way to get the list of Confluent account IDs that I need to create policies for.

@linouk23
Collaborator

@patrickherrera that's great to hear! What about this question:

Would it be OK to have an e2e example that asks the user to provide a KMS Key ARN as an input variable (kind of like the UI does), or would you ideally rather still create the KMS key from scratch as part of the e2e terraform apply (the apply that creates all of the resources)?

@patrickherrera

@linouk23 I personally want the e2e Terraform apply solution, but both need to be supported if others are happy to use the UI.

@linouk23
Collaborator

linouk23 commented Feb 28, 2023

both need to be supported

@patrickherrera could you confirm that by 'both' you mean

passing the KMS Key ARN as an input variable (kind of like the UI does)

and

creating the KMS key from scratch as part of our e2e terraform apply (the apply that creates all of the resources, including the environment, cluster, etc.)

@patrickherrera

@linouk23 Ok, sorry, I thought you were suggesting that the ability to pass a key ARN when creating a cluster in the UI was a new feature (I've never seen or used it, so it is all new to me, apart from creating a Standard cluster for initial testing, which doesn't support it anyway).

So you are literally just asking what to put in an "example", and nothing to do with the actual implementation in the provider? I'd suggest a full e2e example that does everything, including creating the key, would be the easiest entry point for new users, and it would be trivial to decompose it and integrate it into an existing project.

@linouk23
Collaborator

linouk23 commented Mar 1, 2023

@patrickherrera that's right 😁

To give you a little more context, there's a new BYOK API in the Open Preview lifecycle stage (i.e., not ready for prod usage yet) that the confluent_byok_key resource is based on, but it seems like a full e2e TF example would contain a cyclic dependency between the KMS key, the BYOK key, and the list of policies if we were to provision everything in a single example.

Another way to avoid a cycle is

to pass the KMS Key ARN as an input variable (kind of like the UI does)

and I wonder if that pattern of usage is common.

In other words, do you think the following is accurate:

There's often a security team that owns keys; the security team creates a key, asks "what permissions do you need?", and then goes and updates the key. I'm thinking this may be the actual real-world workflow vs. the ideal world we're building towards (i.e., being able to provision everything in a single terraform apply run). We still want to provide an end-to-end TF example that provisions everything in a single terraform apply run, but I'm curious how often end-to-end actually ends up being a single Terraform template in real life.

Thanks!
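The "key ARN as an input variable" pattern under discussion can be sketched as follows; the variable name is made up for illustration, and the `aws`/`key_arn` argument names are assumptions based on the confluent_byok_key docs.

```hcl
# Sketch: a security team provisions the KMS key elsewhere and hands over its
# ARN, which the cluster template takes as an input variable.
variable "kms_key_arn" {
  description = "ARN of a KMS key created and owned by the security team"
  type        = string
}

resource "confluent_byok_key" "main" {
  aws {
    key_arn = var.kms_key_arn
  }
}
```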

@patrickherrera

Thanks @linouk23, I think I can imagine how that could happen, given you need to set up the policies to allow cluster access before you can create the cluster that needs those policies :-)
I checked out the API link you shared and I'm not sure I understand what it is trying to do. It says it is for "Create a Key", but expects the ARN of an existing key?

Anyway, in answer to your second question, I've never encountered an arrangement like that. I would have thought complete ownership within the team was more common, with a DevOps approach to provisioning and supporting an application. In my situation I've developed all the infrastructure-as-code to deploy a number of Clusters to different regions and support different environments (dev/staging/production for the moment). Creating and configuring Keys is part of that, and they sit right alongside the code to call the Confluent Provider to provision the Cluster itself. Various people have reviewed the code from a security perspective etc, but all ownership is within that repository and there are no external dependencies.
It wouldn't be the worst thing in the world to provision the key in a separate step in the same repository if that removed the dependency, but I don't really see how it would be different. Terraform will create things in the right order, so whether the key is created "externally" or not, wouldn't the same dependency still exist?

Is there any chance you could share a branch with your example as it stands at the moment?
Cheers

@linouk23
Collaborator

linouk23 commented Mar 1, 2023

Is there any chance you could share a branch with your example as it stands at the moment?

@patrickherrera thanks for the reply!

Sure, see https://pastebin.com/w7yf10zi

@derekwinters

I am also looking to implement a workflow with these resources similar to @patrickherrera's: ideally deploying an AWS KMS key and creating the cluster in a single step, with Terraform handling the dependency ordering of creating the KMS key before moving on to the Confluent resources. We are very interested in this feature, so we could likely help with POC/beta testing too, if there is any need for that.

@pixie79

pixie79 commented Mar 3, 2023

Absolutely, this is a must for our banking POC to be converted to a sale in the next few days, or we will be picking another provider. This did work for a period before Christmas: I managed to create a cluster where I supplied the list of Confluent accounts manually. That was not a great pattern, but at least it allowed me to build an encrypted cluster; now that facility has been blocked, which is unsatisfactory.

@linouk23
Collaborator

upd: I know I'm a little late here, but hashicorp/terraform-provider-aws#29923 should unblock the one-step apply we've been asking about. I can't promise everything, but most likely we'll release the confluent_byok_key resource (to support the BYOK API) next week, in a Preview lifecycle stage, on AWS and Azure.

@linouk23
Collaborator

Update: we

  • Added a new confluent_byok_key resource and a corresponding data source in a Preview lifecycle stage.
  • Added 2 new examples for the confluent_byok_key resource: dedicated-public-aws-byok-kafka-acls and dedicated-public-azure-byok-kafka-acls.
  • Added support for the new computed byok_key block of the confluent_kafka_cluster resource and a corresponding data source in a Preview lifecycle stage.

in our latest 1.36.0 version of the TF Provider.

Note: all the examples can be run in a single step 👏

cc @larrywax @patrickherrera @pkmec @schlbra @pixie79 @derekwinters
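A rough sketch of how the single-apply flow can avoid the cycle, under stated assumptions: the `aws[0].roles` attribute exported by confluent_byok_key is assumed from the docs, the action list is illustrative rather than authoritative, and the linked examples remain the canonical reference. The key idea is that `aws_kms_key_policy` (the AWS provider feature tracked in hashicorp/terraform-provider-aws#29923) lets the policy reference the BYOK key without the KMS key itself depending on it.

```hcl
# 1. Create the key without its final policy.
resource "aws_kms_key" "kafka" {
  description = "CMK for a Confluent dedicated cluster"
}

# 2. Register the key with Confluent.
resource "confluent_byok_key" "main" {
  aws {
    key_arn = aws_kms_key.kafka.arn
  }
}

data "aws_caller_identity" "current" {}

data "aws_iam_policy_document" "kafka_key" {
  # Keep the key manageable by the owning account.
  statement {
    sid       = "EnableRootPermissions"
    effect    = "Allow"
    actions   = ["kms:*"]
    resources = ["*"]
    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"]
    }
  }

  # Grant the roles Confluent reports back for this key (attribute name assumed).
  statement {
    sid       = "AllowConfluentRoles"
    effect    = "Allow"
    actions   = ["kms:Encrypt", "kms:Decrypt", "kms:DescribeKey", "kms:CreateGrant"]
    resources = ["*"]
    principals {
      type        = "AWS"
      identifiers = confluent_byok_key.main.aws[0].roles
    }
  }
}

# 3. Attach the policy afterwards; this is what breaks the cycle.
resource "aws_kms_key_policy" "kafka" {
  key_id = aws_kms_key.kafka.id
  policy = data.aws_iam_policy_document.kafka_key.json
}
```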

@schlbra

schlbra commented Mar 18, 2023

Update: we

  • Added a new confluent_byok_key resource and a corresponding data source in a Preview lifecycle stage.
  • Added 2 new examples for the confluent_byok_key resource: dedicated-public-aws-byok-kafka-acls and dedicated-public-azure-byok-kafka-acls.
  • Added support for the new computed byok_key block of the confluent_kafka_cluster resource and a corresponding data source in a Preview lifecycle stage.

in our latest 1.36.0 version of the TF Provider.

Note: all the examples can be run in a single step 👏

cc @larrywax @patrickherrera @pkmec @schlbra @pixie79 @derekwinters

This is great news, thank you for the update!

@linouk23 linouk23 added blocked The issue should be fixed on API / backend level first and removed work-in-progress labels May 10, 2023
@harideveloper

+1

Hi,
I'm working on a similar use case in GCP. I'd appreciate any updates on BYOK for the GCP platform. Thanks!

@linouk23
Collaborator

@harideveloper this feature hasn't been added to our backend APIs yet, which blocks its addition to the TF Provider.

@cijujoseph

Any update on this on GCP?

@linouk23
Collaborator

@cijujoseph we are hoping to release it this week; we've already merged it in the internal repo.

@linouk23
Collaborator

@cijujoseph @felippe-mendonca @schlbra @harideveloper

Update: we

  • Added GCP support for the confluent_byok_key resource and a corresponding data source in a General Availability lifecycle stage (#65). Example: dedicated-public-gcp-byok-kafka-acls

in our latest 1.56.0 version of the TF Provider.

Note: all BYOK examples can be run in a single step 👏
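For GCP, the single-step flow that resolves this issue might look roughly as follows. This is a hedged sketch, not the released example: the `gcp`/`key_id` arguments and the exported `gcp[0].security_group` attribute (the Google Group this whole thread was about obtaining) are assumptions based on the provider docs, and the key ID is a placeholder; the dedicated-public-gcp-byok-kafka-acls example is the authoritative version.

```hcl
locals {
  # Placeholder Cloud KMS key resource ID.
  kms_key_id = "projects/my-project/locations/us-central1/keyRings/my-ring/cryptoKeys/my-key"
}

# Register the key with Confluent; this returns the Google Group to grant.
resource "confluent_byok_key" "main" {
  gcp {
    key_id = local.kms_key_id
  }
}

# Grant the returned Google Group access to the key, replacing the manual
# Console-wizard step described earlier in this thread.
resource "google_kms_crypto_key_iam_member" "confluent" {
  crypto_key_id = local.kms_key_id
  role          = "roles/cloudkms.cryptoKeyEncrypterDecrypter"
  member        = "group:${confluent_byok_key.main.gcp[0].security_group}"
}
```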
