Bottlerocket migration eats all available disk space from old version #2589
Hmmmmm, I seem to get the same type of error on an m5.xlarge host with a 50 GB root volume and an additional 250 GB disk volume. But it looks like we are running the migrations from memory. The m5.xlarge hosts I was using have 16 GB of memory. Even on a host with 384 GB of memory and a 1 TB root volume, it still fails in the same way. The next step is to try tests with updates from …
Experiment
Methodology: Start ECS variants at v1.0.6 using m5.2xlarge hosts (32 GB memory), 20 GB root volumes, and a 200 GB additional volume. Then, via an SSM session, apply an update manually via …
Results: Able to upgrade from …
Can you check how many migrations there are when we're running from …?
On a fresh host, before applying the update, here are the migrations it downloads:
so, minus the first 3 lines, that would be 75 migrations from 1.0.8 to 1.10.1.
After some investigation off-line with the team, it looks like we are creating additional datastores during migration (see bottlerocket/sources/api/migration/migrator/src/main.rs, lines 279 to 281 at 67e5abf).
We'll want to consider what to do with this use case in the future, possibly dropping the creation of intermediate datastores during migration.
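To see why that exhausts a small boot volume, here is a rough back-of-the-envelope sketch; the per-copy size is a made-up placeholder, not a measured number:

```sh
# Hypothetical arithmetic only: assume each intermediate datastore copy costs
# roughly 25 MB on disk (placeholder value) and that 75 migrations run between
# 1.0.8 and 1.10.1, each leaving behind its own copy.
DATASTORE_MB=25
MIGRATIONS=75
echo "$(( DATASTORE_MB * MIGRATIONS )) MB consumed by intermediate datastores"
# => 1875 MB, which alone can exhaust the 2 GB boot volume from the original report.
```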
Same problem for 1.7.2 -> 1.11.0 in EKS.
Hi @behradeslamifar - thanks for surfacing this! A workaround would be to upgrade to a "pivot" version and then upgrade to … Maybe try setting the locked version to …
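A minimal sketch of that pivot approach from an SSM session; the pivot version shown is hypothetical, and the exact settings key may vary between releases, so check the settings documentation for your version:

```sh
# Pin updates to a hypothetical intermediate "pivot" version, then apply it.
apiclient set updates.version-lock="v1.6.2"   # hypothetical pivot; key name may vary
apiclient update check                        # confirm the pivot version is selected
apiclient update apply
apiclient reboot                              # migrations run during this boot
# Once the pivot boot succeeds, repeat with the final target version
# (or reset version-lock to "latest").
```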
I mocked up a code change to migrator to prevent the problem (though there's no way to backport a fix into earlier versions of Bottlerocket, so the issue will only be fixed when migrating into or between versions that have the fix). I can take this issue unless someone else wants it.
@jpmcb This config is not useful for me, because the nodes belong to an ASG. Each time I want to add or remove the config, the instance has to be removed, and the ASG then creates a new instance with the old AMI.
I have made some progress on this, but...
@behradeslamifar Are you unable to update your ASG to point to a newer AMI? I don't quite understand what you mean by "each time i want to add or remove the config". Can you describe your setup a little more so we can help you get migrated to a newer version of Bottlerocket?
Now that the fix has been merged, it is likely to be included in Bottlerocket v1.12.0. Hosts upgrading into versions that have this fix should not experience this problem. However, if they needed this fix to upgrade, then they will not be able to downgrade back to the old version they came from. The safest way to upgrade, if a downgrade path is required, will be to upgrade just a few versions at a time until reaching versions that have this fix. @jpmcb @bcressey I'm not sure if there is a better place for this warning than having it in this issue.
Hmmm, I'm not sure either. The error should be discoverable if people search the repository. And if others open issues, we should be able to quickly identify it and then point to this one. Maybe in the future, if we have a "Troubleshooting" section of some website, we can post a notice there. For now, I think this issue is fine to point people to.
Image I'm using:
ami-0fae9eeb7a17df155
What I expected to happen:
When attempting an upgrade, I expect in-place upgrades to work (even when having to migrate through many versions of Bottlerocket).
What actually happened:
However, when attempting to use an old AMI with the default disk sizes for many AWS instance types (2GB boot disk), there is a migration error indicating the disk is full during boot. This is reproduced here:
Process of attempting an upgrade:
Then, on reboot:
Notice the migration failures with "No space left on device".
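One way to confirm this failure mode from a root shell on the host (e.g. via the admin container and `sudo sheltie`); the exact log wording may differ between versions:

```sh
df -h                                          # check filesystem usage across volumes
journalctl -b --no-pager | grep -i "no space left on device"   # locate the migration errors
```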
How to reproduce the problem:
- Use an old version of Bottlerocket (1.0.5), ami-0fae9eeb7a17df155, and an instance with 2GB storage / 20GB user storage.
- Run apiclient update check and apiclient update apply, then reboot (see the sketch after this list).
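A sketch of those reproduction steps from a shell on the instance (SSM session or control/admin container), using the standard apiclient update flow:

```sh
# On an instance launched from the old AMI (ami-0fae9eeb7a17df155, Bottlerocket 1.0.5):
apiclient update check    # shows the newest version available in the update repo
apiclient update apply    # downloads migrations and stages the update
apiclient reboot          # reboots into the update; migrations run during this boot
```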
Background:
I recently ran into this with the ecs-updater integration tests. Some use that very old AMI. The update request for the updater would hang until timeout since the reboot would not be successful.
Is this the expectation? Do we need to instruct users upgrading from very old versions to expand the size of their boot disk to accommodate the many migrations?