-
Notifications
You must be signed in to change notification settings - Fork 2k
[RFC] fix incorrect GSI numbering causing device limit on aarch64 #5283
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
On aarch64, Linux expects device tree interrupts (GSIs) to start at 0, as it internally adds an offset of 32 for SPIs. Firecracker previously defined IRQ_BASE as 32, unintentionally causing a double offset and limiting the maximum number of attachable devices to 64. This commit introduces new constants, GSI_BASE and GSI_MAX, properly adjusted for aarch64, with GSIs starting at 1 (since GSI 0 is disallowed by device_manager/mmio.rs). These changes ensure interrupts are numbered correctly, resolving the issue and allowing an aarch64 VM to handle up to 95 devices. Additional adjustments are made to tests to reflect this correction. Signed-off-by: Sheng-Wei (Way) Chen <waychensw@gmail.com>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #5283 +/- ##
==========================================
+ Coverage 82.86% 82.91% +0.05%
==========================================
Files 250 250
Lines 26897 26897
==========================================
+ Hits 22288 22302 +14
+ Misses 4609 4595 -14
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Remove the documentation note regarding the 64-device limit on aarch64, as this limitation was resolved by an earlier commit. The limitation no longer applies, and the documentation is now updated. Signed-off-by: Sheng-Wei (Way) Chen <waychensw@gmail.com>
Set `GSI_BASE` to `0` for aarch64, enabling the use of the full interrupt range and allowing up to 96 devices to be attached. Change `MMIODeviceInfo::irq` from `Option<NonZeroU32>` to `Option<u32>` to accommodate `0` as a valid IRQ number. Adjust relevant tests to reflect this change and ensure correctness. Signed-off-by: Sheng-Wei (Way) Chen <waychensw@gmail.com>
aarch64 complains about:
|
We also need to update a few entries in the documentation:
PS: kudos for finding the problem and fixing the issue! Thanks! |
The test_max_devices test verifies the maximum number of devices that can be attached, as permitted by the platform. It checks both that the expected number of devices can be successfully attached and that attaching more than the limit results in a failure. Previously, this test was only executed on x86_64, where 18 devices are expected. This update extends the test to run on aarch64 as well, which supports up to 94 devices. Signed-off-by: Sheng-Wei (Way) Chen <waychensw@gmail.com>
After changing the type of `MMIODeviceInfo::irq` from `Option<NonZeroU32>` to `Option<u32>`, we don't need to convert NonZeroU32 to u32 with .into() anymore. Thus any occurance is removed. Signed-off-by: Sheng-Wei (Way) Chen <waychensw@gmail.com>
Hi, I've fixed the clippy error, updated the CHANGELOG, and removed the description of the 64-device limit from the docs. I will try to run the extended test as soon as I fix my settings. I will also clean up commits if there are no further changes to be made. |
Establishing 94 SSH connections exhausts the default 256 MiB memory of the `uvm_plain_machine` with `basic_config` previously used in tests `test_max_devices` on aarch64. Increase the VM's memory limit to 512 MiB to ensure stable and consistent test execution. Signed-off-by: Sheng-Wei (Way) Chen <waychensw@gmail.com>
Alright, I think the last failures we can fix by just disabling the memory monitor in the "too many devices" test (pass |
Attaching a large number of devices (94) on aarch64 consumes significant host memory, causing memory monitor checks to fail. Since this test specifically validates behavior when exceeding device limits, memory usage isn't relevant here. Disable the memory monitor check to ensure the test passes consistently. Signed-off-by: Sheng-Wei (Way) Chen <waychensw@gmail.com>
I’m opening this as an RFC because I would appreciate feedback on my proposed solution. Please see my detailed analysis and reasoning on the issue page #4207 . Additionally, I’m encountering timeout problems running the test suite with my current setup and would appreciate assistance verifying this patch.
Changes
Introduce two new constants (GSI_BASE and GSI_MAX) in the layout.rs files for both aarch64 and x86_64. These constants correctly adjust the GSI numbering for aarch64 VMs, aligning Firecracker’s interrupt assignments with Linux’s expectations and ensuring that the interrupts provided in the device tree and registered via irqfd are correctly numbered.
Reason
To resolve issue #4207 , where incorrect GSI numbering caused by using an offset of 32 (IRQ_BASE) resulted in an artificial limitation of 64 devices on aarch64 microVMs. This change removes that limitation, enabling aarch64 microVMs to correctly use up to 95 devices.
License Acceptance
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md
.PR Checklist
tools/devtool checkstyle
to verify that the PR passes theautomated style checks.
how they are solving the problem in a clear and encompassing way.
in the PR.
I have mentioned all user-facing changes inCHANGELOG.md
.When making API changes, I have followed theRunbook for Firecracker API changes.
integration tests.
I have linked an issue to every newTODO
.rust-vmm
.