Force ANS2MSIWorkaround to avoid I/O CQ timeouts
Force-enable ANS2MSIWorkaround. We often got a panic caused by an I/O Read command timeout on VMware and on the Samsung PM981.

Investigation showed that the phase of the entry at the CQ head and the driver's CQ phase would mismatch. This implies a race in which the CQ head is updated to point to an entry with inverted phase. Because FilterIRQ detects the phase mismatch, it does not schedule HandleIRQ on the NVMe controller's workloop, so the request is never handled and we get a timeout.

If FilterIRQ and HandleIRQ calls can race, this may happen because HandleIRQ does not manage to update the CQ phase before the next FilterIRQ phase check runs, which then observes the old phase. This is the case when two FilterIRQ calls happen with no HandleIRQ in between because the controller was too slow to notice INTMS being set.

Normally, the IRQ is masked in FilterIRQ just before HandleIRQ is scheduled, and unmasked when HandleIRQ is done. IONVMeController::ANS2MSIWorkaround forces the IRQ to be masked at the very start of FilterIRQ instead, so that FilterIRQ does not race with itself. This seems to eliminate the timeouts.
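
For illustration only, here is a toy Python model of the phase check described above. It is not the driver's code: the class CompletionQueue, its methods, and stale_phase_scenario are hypothetical names, and the way the stale phase arises (a check using the phase value from before the head wrapped) is an assumption about how the mismatch could look, not a confirmed trace.

```python
class CompletionQueue:
    """Toy model of an NVMe completion queue phase tag (hypothetical names)."""

    def __init__(self, depth):
        self.slots = [0] * depth  # phase tag stored in each entry
        self.tail = 0             # controller's producer index
        self.ctrl_phase = 1       # phase the controller writes on this pass
        self.head = 0             # host's consumer index
        self.host_phase = 1       # phase the host expects at the head

    def post(self):
        """Controller writes one completion; its phase flips on wraparound."""
        self.slots[self.tail] = self.ctrl_phase
        self.tail += 1
        if self.tail == len(self.slots):
            self.tail = 0
            self.ctrl_phase ^= 1

    def filter_check(self, expected_phase=None):
        """Filter-style check: is the head entry's phase the expected one?"""
        if expected_phase is None:
            expected_phase = self.host_phase
        return self.slots[self.head] == expected_phase

    def handle(self):
        """Handler-style drain: consume entries, flipping phase on wraparound."""
        consumed = 0
        while self.filter_check():
            consumed += 1
            self.head += 1
            if self.head == len(self.slots):
                self.head = 0
                self.host_phase ^= 1
        return consumed


def stale_phase_scenario():
    """Assumed scenario: a check that still uses the pre-wrap phase value."""
    cq = CompletionQueue(depth=2)
    cq.post()
    cq.post()                     # first pass is full, controller phase flips
    stale_phase = cq.host_phase   # value a racing check might still hold
    handled = cq.handle()         # head wraps, host phase flips
    cq.post()                     # new completion lands at the head slot
    missed = not cq.filter_check(stale_phase)  # stale check sees a mismatch
    seen = cq.filter_check()                   # up-to-date check sees the entry
    return handled, missed, seen
```

In this model a check made with the stale phase reports no work even though a completion is pending, which corresponds to HandleIRQ never being scheduled. The masking order itself is not modelled here.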