Question about host configurations and support for virtualization (VT-x, VT-d) #78
VT-x is confirmed on. VT-d is available in the BIOS; its state appears to be off by default. We will test with you, after the rebuild, on a node with it on to determine any benefit.
VT-d would be useful for us as well, as this is required for GPU virtualization (which we might be able to get to work).
We have confirmed VT-x and VT-d are enabled on at least gpu-3-9, and we are doing further checks as we have time. I have personally validated that the BIOS settings for both are enabled on this node, so it would help if you could test when we open the environment back up soon. Stay tuned for the test period announcement.
You may have already done this, but just as a reminder: you need to turn on kernel support for VT-d by setting the kernel option intel_iommu=on.
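As a quick sanity check before rebooting, the running kernel's command line can be inspected for that flag. A minimal sketch (the has_iommu_flag helper is hypothetical, added here for illustration; on CentOS 6 the flag would be added to the kernel line in /boot/grub/grub.conf):

```shell
# has_iommu_flag: succeeds if the given kernel command line
# contains intel_iommu=on as a whole word.
has_iommu_flag() {
    printf '%s\n' "$1" | grep -qw 'intel_iommu=on'
}

# On a live node you would check the running kernel's command line:
#   has_iommu_flag "$(cat /proc/cmdline)" && echo "IOMMU flag present"
```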
I would like to consider scheduling a time to drain a node, reboot it with the flags defined above, and compare the performance of a virtual machine. I would propose gpu-3-9.
This shouldn't have much impact on the running system as long as it is kept
Offlined gpu-3-9 and will reboot with the options once the drain is confirmed.
Looking at this now, it appears to be enabled in the kernel but not in the module:

dmesg | grep -e DMAR -e IOMMU

And if we look closely at the code that results in that being printed:

#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 35) && defined(CONFIG_IOMMU_API)

So unless the kernel revision is at least 2.6.35, vboxpci does not activate the IOMMU. We are running 2.6.32-358.18.1, and CentOS sticks with the 2.6.32 line of kernel code all the way to its current revisions, I'm afraid, so that's non-trivial to fix. I'll Google around to see exactly why 2.6.35 is needed, or adjust that code if the check is overly strict, just to see...
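For reference, the KERNEL_VERSION comparison in that guard is plain integer packing, (major << 16) + (minor << 8) + patch, matching the macro in linux/version.h. A minimal sketch reimplementing it in shell, purely to illustrate why the 2.6.32 kernel fails the check:

```shell
# kernel_version: packs a kernel version the same way as the
# KERNEL_VERSION(a,b,c) macro: (a << 16) + (b << 8) + c.
kernel_version() {
    echo $(( ($1 << 16) + ($2 << 8) + $3 ))
}

running=$(kernel_version 2 6 32)   # the CentOS 6 kernel line
needed=$(kernel_version 2 6 35)    # minimum for the CONFIG_IOMMU_API path

if [ "$running" -ge "$needed" ]; then
    echo "vboxpci IOMMU code would compile"
else
    echo "guard fails: 2.6.32 < 2.6.35, IOMMU support stays disabled"
fi
```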
Others have been down this road. The code does not compile on kernels < 2.6.35, and CentOS 6.x kernels are unlikely to ever reach 2.6.35. There is mention of possible backports of the iommu code, but I've not found that to be true yet. So for now this appears to be a dead end, or to require considerable effort to add. I will mention this on our weekly call.
Verified that this does not compile even on the latest CentOS 6.5 kernel, which remains on the 2.6.32 branch. Unlikely to be solvable with the current OS revision. Adding gpu-3-9 back into the scheduler for now.
Actually, trying one more idea. Draining gpu-3-9 again.
Thanks for going into the module code and making these efforts. Looking at the kernel version chart at http://en.wikipedia.org/wiki/Linux_kernel#Maintenance, 2.6.32 is the only surviving supported kernel on the 2.x list, and its support appears to end in mid-2015. Do we plan a transition away from CentOS 6.4 down the road? An alternative virtualization platform that has been recommended by several collaborators is Docker, and I was told that a 3.8 or higher kernel is what works smoothly for that platform.
This is not fixable at this time in a CentOS 6 environment. Leaving it open for future version discussions.
The question is: are the nodes on the cluster configured for both VT-x and also VT-d?
(These may be BIOS settings plus kernel compilation options.)
This is motivated in part by investigating possible causes of the disk access problems, and in part by planning for future development.
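As a rough sketch of such a per-node check, assuming standard Linux interfaces: VT-x capability shows up as the vmx flag in /proc/cpuinfo, and active VT-d leaves DMAR/IOMMU lines in the kernel log. The cpu_has_vmx helper is hypothetical, added for illustration:

```shell
# cpu_has_vmx: succeeds if the given /proc/cpuinfo text lists the
# vmx CPU flag (Intel VT-x reported by the hardware).
cpu_has_vmx() {
    printf '%s\n' "$1" | grep -qw vmx
}

# On a live node:
#   cpu_has_vmx "$(cat /proc/cpuinfo)" && echo "VT-x capable"
#   dmesg | grep -e DMAR -e IOMMU    # VT-d activity, if any
```

Note that the vmx flag only reports hardware capability; if it is absent, virtualization may simply be disabled in the BIOS.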