The ec2 instance with localnvme storage fails to boot with fcos stable version: 36.20220906.3.2 #1306
Comments
can you give a few exact names of instance types you've tried this on? We should probably enhance our AWS test to test a few more instance types. |
i3.large and c5d.4xlarge |
This is reported upstream. It appears the summary of the investigation is that the Linux If you want a status update on the firmware rollout please ask on the upstream thread or contact AWS customer service. |
Thank you for the update |
This test ensures that if an nvme device exists it is accessible. See coreos/fedora-coreos-tracker#1306 This commit also denylists the test with a snooze for the next few weeks. The hope is that Amazon does the firmware rollout soon.
With coreos/fedora-coreos-config#2005 and coreos/fedora-coreos-pipeline#669 we added a test and we'll know if local NVMe storage regresses again in the future. The test will be enabled properly once AWS rolls out the controller firmware update. |
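For readers who want to check an instance by hand, the idea behind the test ("if an NVMe device exists, it is accessible") can be sketched roughly as follows. This is not the test added in coreos/fedora-coreos-config#2005; the function names, the `nvmeXnY` name pattern, and the 512-byte read are all assumptions made for illustration.

```python
import re
from pathlib import Path

# Assumption: namespace block devices are named like nvme0n1;
# partitions such as nvme0n1p1 are deliberately skipped.
NVME_NS = re.compile(r"^nvme\d+n\d+$")

def find_nvme_devices(dev_dir="/dev"):
    """Return sorted paths under dev_dir whose names look like NVMe namespaces."""
    return sorted(p for p in Path(dev_dir).iterdir() if NVME_NS.match(p.name))

def is_accessible(path, nbytes=512):
    """Try to open the device and read a few bytes; any OSError counts as a failure."""
    try:
        with open(path, "rb") as f:
            f.read(nbytes)
        return True
    except OSError:
        return False

def check_nvme(dev_dir="/dev"):
    """Return the NVMe devices that exist but could not be read."""
    return [str(p) for p in find_nvme_devices(dev_dir) if not is_accessible(p)]
```

Calling `check_nvme()` as root on an affected instance should return the devices that exist but cannot be read. An empty list means either that every device was readable or that none exist, which matches the "if an nvme device exists" wording of the test description.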
OK the test seemed to pass in recent tests:
I assume AWS performed the necessary firmware update. @gongx - can you confirm things are looking good for you now? |
Cool. Thank you for the update. I will run the test today and report back on whether the issue has been fixed. |
I double checked. Looks like that it still fails with
|
Just to make sure we're comparing apples to apples - can you try with |
I am using
Which version are you using for the test? It will be easier for us to just use the stable version. |
Hmm. Yes. Testing with Testing with Testing with So it may be some combination of new software? The |
It looks like our F37+ streams pass this test now [1] so let's also only deny the test on streams where it's known to fail. [1] coreos/fedora-coreos-tracker#1306 (comment)
ok I almost wonder if AWS backed out their firmware update. The test started failing in our
Where we should see |
It seems as if the fix that AWS had applied is no longer working. See coreos/fedora-coreos-tracker#1306 (comment)
This is still a problem as of today. We need to extend the snooze for this test again. |
Looking into this at AWS internally. I have an internal tracking ticket. |
It seems as if the fix that AWS had applied is no longer working. See: coreos/fedora-coreos-tracker#1306 (comment)
This is still failing with whatever environment AWS has as of today (2023-01-11) and |
This is still failing with whatever environment AWS has as of today (2023-02-10) and |
Okay. I am taking this back to the AWS EBS team for review. |
It appears this is passing our test now. In the most recent
Hopefully it's really resolved this time! |
Recent tests are passing. Hopefully the issue in the AWS environment is fully resolved now. Closes coreos/fedora-coreos-tracker#1306 (comment)
Describe the bug
EC2 instances with local NVMe storage (i3, c5d, etc.) fail to boot when using FCOS stable version 36.20220906.3.2 with ami-0dbce9bea71a2ee29.
Expected behavior
EC2 instances with local NVMe storage boot successfully when using FCOS stable version 36.20220906.3.2 with ami-0dbce9bea71a2ee29.
We tried the previous stable version, Fedora CoreOS 36.20220806.3.0, and it works fine.
Actual behavior
Noticed the following errors from system logs
systemd[1]: dev-nvme0n1.device: Job dev-nvme0n1.device/start timed out.
[ 97.678110] systemd[1]: Timed out waiting for device dev-nvme0n1.device - /dev/nvme0n1.
systemd[1]: dev-nvme0n1.device: Job dev-nvme0n1.device/start failed with result 'timeout'.
[ 97.690574] ignition[703]: disks: createPartitions: op(1): [failed] waiting for devices [/dev/nvme0n1]: device unit dev-nvme0n1.device timeout
[ 97.696737] systemd[1]: ignition-disks.service: Main process exited, code=exited, status=1/FAILURE
[ 97.701377] ignition[703]: disks failed
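To check whether a saved console log shows the same failure signature, the systemd timeout message above can be matched with a small script. This is only a sketch: the helper name `timed_out_devices` is made up for illustration, and the pattern is based solely on the message wording in the log excerpt above.

```python
import re

# Assumption: the message wording matches the excerpt above, i.e.
# "Timed out waiting for device dev-<name>.device".
TIMEOUT_RE = re.compile(r"Timed out waiting for device (dev-\S+\.device)")

def timed_out_devices(log_text):
    """Return the unique device units reported as timed out, in first-seen order."""
    seen = []
    for unit in TIMEOUT_RE.findall(log_text):
        if unit not in seen:
            seen.append(unit)
    return seen
```

Given the contents of a console log as a string, the function returns the device units that systemd gave up waiting for, or an empty list if the message does not appear.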
System details