
broker storage: BadRequestException: The request does not include any updates to the EBS volumes of the cluster #99

Closed
michal-rybinski opened this issue Nov 21, 2023 · 14 comments
Labels
bug 🐛 An issue with the system


@michal-rybinski

Describe the Bug

Changing the broker volume size together with the autoscaling max capacity and target values left Terraform in a state where every run fails with an error: it keeps trying to remove the provisioned throughput section from the broker configuration even though the code has not changed.

Error: updating MSK Cluster (arn:aws:kafka) broker storage: BadRequestException: The request does not include any updates to the EBS volumes of the cluster. Verify the request, then try again. { RespMetadata: { StatusCode: 400, RequestID: "" }, Message: "The request does not include any updates to the EBS volumes of the cluster. Verify the request, then try again." }
with module.kafka_prod.module.kafka.aws_msk_cluster.default[0]
on .terraform/modules/kafka_prod.kafka/main.tf line 124, in resource "aws_msk_cluster" "default":
resource "aws_msk_cluster" "default" {

[Screenshot from 2023-11-21 10-51-23]

I suspect this MIGHT have something to do with the fact that volume_size is being ignored in the module's lifecycle block (see the sketch below), but this might be an incorrect assumption.
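Roughly what I mean (a sketch of the relevant resource shape, not the module's actual code; variable names and values are illustrative):

```hcl
resource "aws_msk_cluster" "default" {
  cluster_name           = "example"          # illustrative values only
  kafka_version          = "3.5.1"
  number_of_broker_nodes = 3

  broker_node_group_info {
    instance_type   = "kafka.m5.large"
    client_subnets  = var.subnet_ids
    security_groups = [var.security_group_id]

    storage_info {
      ebs_storage_info {
        volume_size = var.broker_volume_size
      }
    }
  }

  lifecycle {
    ignore_changes = [
      # Ignoring volume_size lets storage autoscaling grow the volumes without
      # Terraform trying to shrink them back on the next apply, but it also
      # hides legitimate volume_size changes made in the code.
      broker_node_group_info[0].storage_info[0].ebs_storage_info[0].volume_size,
    ]
  }
}
```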

Expected Behavior

No changes are expected.

Steps to Reproduce

Hard to pinpoint, but I was increasing storage in the AWS console and then catching up with the changes in Terraform; maybe this caused some drift?

Screenshots

No response

Environment

Linux
module version 2.0.0, also tried 2.3.0
Terraform version 1.5.3, also tested on 1.6.4

Additional Context

No response

@michal-rybinski michal-rybinski added the bug 🐛 An issue with the system label Nov 21, 2023
@michal-rybinski
Author

Wow, this really seems to draw attention... two months and not even a single comment from the team.

@stroobl
Contributor

stroobl commented Feb 23, 2024

@michal-rybinski My PR #103 includes a simple fix that works for us. The work in #101 might produce a better version in the future, but it is more complicated. We don't need that now and can't wait for it, so I just wanted to share something that works for the time being.

@michal-rybinski
Author

Thanks for that, I will test it when it is out.
I see that in your PR you've removed the ignore on the ebs_storage_info[0].volume_size parameter; wasn't it there for a specific reason as well? Why not just add the new entry to the list?

@stroobl
Contributor

stroobl commented Feb 23, 2024

> Thanks for that, I will test it when it is out. I see that in your PR you've removed the ignore on the ebs_storage_info[0].volume_size parameter; wasn't it there for a specific reason as well? Why not just add the new entry to the list?

I was not able to get things going if I kept the ignore on ebs_storage_info[0].volume_size in place.

In our case, a staging cluster filled up and went down. We had to resize it in the console because the module fails to run when the cluster is down. I tried to re-import the cluster and the storage autoscaling policy, but Terraform was still trying to alter the cluster and failed with the same EBS error as you posted. Configuring more storage in Terraform and trying to apply gave the same error. After removing the ignore on the ebs_storage_info[0].volume_size line in the module the resize worked, but on a second run I still got the EBS error. It only went away after adding broker_node_group_info[0].storage_info[0].ebs_storage_info[0].provisioned_throughput to lifecycle ignore_changes. (I got that pointer from the linked AWS provider issue.)
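Roughly, the change boils down to this (a sketch of the idea, not the exact diff in #103):

```hcl
resource "aws_msk_cluster" "default" {
  # ... broker_node_group_info and the other arguments stay unchanged ...

  lifecycle {
    ignore_changes = [
      # The ignore on ...ebs_storage_info[0].volume_size is dropped so that
      # volume size changes are applied again, and provisioned_throughput is
      # ignored instead, which stops Terraform from sending an EBS update that
      # contains no actual volume changes (the BadRequestException above).
      broker_node_group_info[0].storage_info[0].ebs_storage_info[0].provisioned_throughput,
    ]
  }
}
```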

@stroobl
Contributor

stroobl commented Feb 23, 2024

PR #103 got closed by GitHub while I was working in my fork to build a local module. @hans-d @gberenice: do you want to implement it this way for now, or do you prefer to work on #101? Just let me know if I should open a new PR.

@hans-d

hans-d commented Feb 26, 2024

@stroobl I see no indication that #103 got closed automagically - we have no automation (yet) for that kind of thing - and that should only apply to stale ones.
As I'm not that into the MSK area (at least not right now), I have no preference for #101 or #103. Do whatever you think is best / easier.

@michal-rybinski
Author

I see your PR was merged @stroobl, now we just need to wait for the new release to be published to test it, thanks!

@hans-d

hans-d commented Feb 27, 2024

@michal-rybinski unfortunately, the only thing merged was the PR in the fork, not in this repo.

@michal-rybinski
Author

Ah, dang, I had not looked at the full path of the repo. If I had the time and resources I'd just ditch this module completely and write my own, but I cannot do that at the moment, so I am kind of stuck with this bug.

@stroobl
Contributor

stroobl commented Mar 6, 2024

I suppose GitHub closed the PR after my branch was merged and removed in the fork. I didn't think about that.

Anyhow: the tests pass on the PR now. The only downside I can think of with this code is for people using autoscaling: they will have to match the autoscaled storage size in the module input to avoid failing changes (see the sketch below). But I guess that's still better than possibly ending up in an unrecoverable broken state with the module.
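For example (hypothetical module inputs, just to illustrate the point; these are not the module's actual variable names):

```hcl
module "kafka" {
  source = "../terraform-aws-msk"   # placeholder path

  # If storage autoscaling has already grown the brokers to, say, 1500 GiB,
  # this value has to be bumped to 1500 as well before the next apply;
  # otherwise Terraform keeps trying to change the EBS volumes and fails.
  broker_volume_size = 1500

  # Hypothetical autoscaling inputs.
  storage_autoscaling_target_value = 60
  storage_autoscaling_max_capacity = 2000
}
```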

@hans-d

hans-d commented Mar 7, 2024

@michal-rybinski can you confirm that the issue is now resolved? Thanks.

@michal-rybinski
Author

Hi @hans-d, thank you, I will try to find time this week to test it and will let you know.

@michal-rybinski
Author

michal-rybinski commented Mar 13, 2024

Hello, I can confirm the fix works fine and we no longer see the orphaned config lines failing the TF run. Thank you to everyone involved!
