The stack fails at last stage due to custom key store failing to connect with HSM cluster #12

Closed
awspankj opened this issue May 10, 2023 · 5 comments

Comments

@awspankj

When you choose to create a custom key store, the stack gets stuck for more than 1.5 hours and then rolls back. The custom key store is created but fails to connect to the HSM cluster, with the error "KMS cannot connect the custom key store to its CloudHSM cluster. Error code: USER_NOT_FOUND". I assume 'kmsuser' is not being configured correctly.
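For anyone hitting the same rollback, the connection state and error code can also be read directly from KMS rather than from the console. The following is a minimal sketch, assuming boto3 is installed and credentials for the account are configured; the helper names `summarize_key_store` and `describe` are made up for illustration and are not part of this repository.

```python
from typing import Any, Dict, Optional


def summarize_key_store(response: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Extract the connection fields from a DescribeCustomKeyStores response.

    Returns None when the response lists no custom key stores.
    """
    stores = response.get("CustomKeyStores", [])
    if not stores:
        return None
    store = stores[0]
    return {
        "id": store.get("CustomKeyStoreId"),
        "state": store.get("ConnectionState"),      # e.g. CONNECTED, FAILED
        "error": store.get("ConnectionErrorCode"),  # e.g. USER_NOT_FOUND
    }


def describe(custom_key_store_id: str) -> Optional[Dict[str, Any]]:
    """Hypothetical wrapper: query KMS for one key store and summarize it."""
    # Imported lazily so the parsing helper above works without AWS access.
    import boto3

    kms = boto3.client("kms")
    response = kms.describe_custom_key_stores(
        CustomKeyStoreId=custom_key_store_id
    )
    return summarize_key_store(response)
```

Calling `describe(...)` with the ID of the key store that the stack created should report a `FAILED` state with `USER_NOT_FOUND` in the situation described here.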

@ckamps
Contributor

ckamps commented May 12, 2023

@awspankj thanks for the report. We're in the process of preparing a heavily refactored version of this automation for publishing here in aws-samples. The refactored version has many enhancements including more robust error handling.

In the meantime, did you get a chance to inspect the cfn-init.log data, as described in Troubleshooting stack creation, to better understand the issue?

Selecting the following option during stack creation can help preserve some of the resources so that it's easier to troubleshoot:

[image: screenshot of the stack creation option]

Separately, I've also provided you with a pointer to the heavily refactored fork in case you'd like to try that version.

@ckamps
Contributor

ckamps commented May 12, 2023

@awspankj it appears that the kmsuser was created, but at the point where the CloudHSM key store was being connected, the kmsuser was in an inconsistent state: the user was not present on each of the two HSMs in the cluster. The connect operation fails because the user is not present on all HSMs of the cluster.

I'm investigating why, under some circumstances, the user gets into that state. This failure appears to be a result of enhancing the code to use the cloudhsm-cli package vs the cloudhsm-client package.

While the CloudFormation stack is waiting for the key store to reach the connected state, a workaround is to access the EC2 client instance, then delete and re-create the kmsuser using cloudhsm-cli.
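If the connection attempt has already moved to the FAILED state by the time the kmsuser is re-created, KMS requires the key store to be disconnected before it can be connected again. The sketch below shows one way to do that by hand. It is an assumption about how you might drive it, not what the stack itself does: `reconnect` and its helpers are hypothetical names, `kms` is expected to be a boto3 KMS client, and the polling interval and limit are arbitrary. The kmsuser password also has to match the password configured for the key store, which this sketch does not check.

```python
import time
from typing import Any


def connection_state(kms: Any, store_id: str) -> str:
    """Return the current ConnectionState of one custom key store."""
    resp = kms.describe_custom_key_stores(CustomKeyStoreId=store_id)
    return resp["CustomKeyStores"][0]["ConnectionState"]


def wait_while(kms: Any, store_id: str, transient: str,
               poll_seconds: float, max_polls: int) -> str:
    """Poll until the state is no longer `transient` or polls run out."""
    state = connection_state(kms, store_id)
    for _ in range(max_polls):
        if state != transient:
            break
        time.sleep(poll_seconds)
        state = connection_state(kms, store_id)
    return state


def reconnect(kms: Any, store_id: str,
              poll_seconds: float = 30.0, max_polls: int = 40) -> str:
    """Disconnect a FAILED key store, connect it again, return the final state."""
    state = connection_state(kms, store_id)
    if state == "CONNECTED":
        return state
    if state == "CONNECTING":
        # A connection attempt is already in flight; just wait for it.
        return wait_while(kms, store_id, "CONNECTING", poll_seconds, max_polls)
    if state == "FAILED":
        # A FAILED key store must be disconnected before reconnecting.
        kms.disconnect_custom_key_store(CustomKeyStoreId=store_id)
    state = wait_while(kms, store_id, "DISCONNECTING", poll_seconds, max_polls)
    if state != "DISCONNECTED":
        return state  # unexpected state; leave it for manual inspection
    kms.connect_custom_key_store(CustomKeyStoreId=store_id)
    return wait_while(kms, store_id, "CONNECTING", poll_seconds, max_polls)
```

With a real client this would be called as `reconnect(boto3.client("kms"), "<custom key store ID>")`. A return value other than `CONNECTED` means the key store still needs attention.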

@awspankj
Author

awspankj commented May 13, 2023 via email

@ckamps
Contributor

ckamps commented May 14, 2023

@awspankj I reverted this repository to the commit prior to the introduction of the cloudhsm-cli package in place of the cloudhsm-client package, so that kmsuser creation is stable. I'll send you a note once the newly refactored form of the overall automation is published to this repository. In the meantime, you can use the internal fork I referenced separately.

ckamps closed this as completed May 14, 2023
@awspankj
Author

awspankj commented May 15, 2023 via email
