Question about host configurations and support for virtualization (VT-x, VT-d) #78

Open
sheridancbio opened this issue Jul 10, 2014 · 14 comments

@sheridancbio

Are the nodes on the cluster configured for both VT-x and VT-d?
(These may involve BIOS settings as well as kernel compilation options.)

The question is motivated partly by investigating possible causes of disk access problems and partly by planning for future development.

@tatarsky
Contributor

VT-x is confirmed on. VT-d is available in the BIOS, but it appears to be off by default. After the rebuild, we will test with you on a node with VT-d turned on to determine whether there is any benefit.
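
For anyone who wants to check VT-x from the OS side, Linux reports it as the `vmx` flag in `/proc/cpuinfo`. The following is a minimal sketch, assuming a Linux host; the helper name `cpu_has_vmx` is made up for illustration. The flag shows what the kernel sees on the CPU, so it may not by itself prove the BIOS setting, which still needs to be checked directly.

```shell
#!/bin/sh
# cpu_has_vmx is a hypothetical helper name, not part of any tool.
# It succeeds if the cpuinfo text (default: /proc/cpuinfo) has a "flags"
# line that lists vmx as a whole word.
cpu_has_vmx() {
    cpuinfo=${1-$(cat /proc/cpuinfo 2>/dev/null)}
    printf '%s\n' "$cpuinfo" |
        grep -Eq '^flags[[:space:]]*:(.*[[:space:]])?vmx([[:space:]]|$)'
}

if cpu_has_vmx; then
    echo "vmx flag present: the CPU advertises VT-x"
else
    echo "vmx flag not found"
fi
```

Passing the text as an argument, for example `cpu_has_vmx "$(ssh gpu-3-9 cat /proc/cpuinfo)"`, would check a node other than the one running the script.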

@jchodera
Member

VT-d would be useful for us as well, as this is required for GPU virtualization (which we might be able to get to work).

@tatarsky
Contributor

We have confirmed that VT-x and VT-d are enabled on at least gpu-3-9, where I have personally validated that the BIOS settings for both are enabled. We are doing further checks as time permits. If you could test on this node when we open the environment back up soon, that would help. Stay tuned for the test period announcement.

@csirving

You may have already done this, but just as a reminder: you need to turn on kernel support for VT-d by setting the kernel boot option intel_iommu=on.
-Christopher

(Sent by email in reply to tatarsky's comment of Tue, 29 Jul 2014 07:33:03 -0700.)
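
Whether the `intel_iommu=on` option mentioned above actually reached the running kernel can be checked after a reboot by looking at the kernel command line. This is a minimal sketch, assuming a Linux host that exposes `/proc/cmdline`; the helper name `has_intel_iommu` is hypothetical, and it only matches the exact token `intel_iommu=on`, not combined forms.

```shell
#!/bin/sh
# has_intel_iommu is a hypothetical helper name, not part of any tool.
# It succeeds if the given kernel command line (default: the running
# kernel's, read from /proc/cmdline) contains the exact token intel_iommu=on.
has_intel_iommu() {
    cmdline=${1-$(cat /proc/cmdline 2>/dev/null)}
    case " $cmdline " in
        *" intel_iommu=on "*) return 0 ;;
        *) return 1 ;;
    esac
}

if has_intel_iommu; then
    echo "intel_iommu=on is set on the running kernel"
else
    echo "intel_iommu=on is not set on the running kernel"
fi
```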

@tatarsky tatarsky self-assigned this Aug 11, 2014
@tatarsky
Contributor

tatarsky commented Sep 9, 2014

I would like to schedule a time to drain a node, reboot it with the flags described above, and compare the performance of a virtual machine. I propose gpu-3-9.

@jchodera
Member

jchodera commented Sep 9, 2014

This shouldn't have much impact on the running system as long as it is kept out of the standard queues, so proceed as you see fit!

@tatarsky
Contributor

tatarsky commented Sep 9, 2014

Offlined gpu-3-9; will reboot it with the options once the drain is confirmed.

@tatarsky
Contributor

Looking at this now, IOMMU support appears to be enabled in the kernel but not in the vboxpci module:

dmesg | grep -e DMAR -e IOMMU
ACPI: DMAR 000000007ca945b0 000E0 (v01 A M I OEMDMAR 00000001 INTL 00000001)
dmar: IOMMU 0: reg_base_addr fbffe000 ver 1:0 cap d2078c106f0462 ecap f020fa
dmar: IOMMU 1: reg_base_addr cfffc000 ver 1:0 cap d2078c106f0462 ecap f020fa
vboxpci: IOMMU not found (not compiled)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
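
The same reading of the log can be automated when checking several nodes. This is a minimal sketch; the helper name `summarize_iommu_log` is hypothetical, and the patterns are taken only from the lines shown above, so other kernels or VirtualBox versions may word these messages differently.

```shell
#!/bin/sh
# summarize_iommu_log is a hypothetical helper name. It reads dmesg-style
# text on stdin, counts the IOMMU units listed by the kernel's DMAR code,
# and notes whether vboxpci reported being built without IOMMU support.
summarize_iommu_log() {
    awk '
        /dmar: IOMMU [0-9]+:/ { units++ }
        /vboxpci: IOMMU not found \(not compiled\)/ { vbox_missing = 1 }
        END {
            printf "kernel IOMMU units: %d\n", units
            printf "vboxpci built with IOMMU support: %s\n", (vbox_missing ? "no" : "unknown")
        }
    '
}

# Sample input copied from the output above.
summarize_iommu_log <<'EOF'
ACPI: DMAR 000000007ca945b0 000E0 (v01 A M I OEMDMAR 00000001 INTL 00000001)
dmar: IOMMU 0: reg_base_addr fbffe000 ver 1:0 cap d2078c106f0462 ecap f020fa
dmar: IOMMU 1: reg_base_addr cfffc000 ver 1:0 cap d2078c106f0462 ecap f020fa
vboxpci: IOMMU not found (not compiled)
EOF
```

On a live node the input would come from `dmesg | summarize_iommu_log` instead of the pasted sample.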

And if we look closely at the code that results in that being printed:

#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 35) && defined(CONFIG_IOMMU_API)
#define VBOX_WITH_IOMMU
#endif

So unless the kernel version is 2.6.35 or later (and CONFIG_IOMMU_API is defined), vboxpci does not enable IOMMU support. And we are running

2.6.32-358.18.1

I'm afraid CentOS 6 sticks with the 2.6.32 line of kernel code all the way to its current revisions, so that's non-trivial to fix.
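
To see why 2.6.32 fails the check, it helps to work through the arithmetic. The kernel's `KERNEL_VERSION(a, b, c)` macro packs the three numbers as `(a << 16) + (b << 8) + c`, and `LINUX_VERSION_CODE` is the same packing for the kernel being built against, so a release suffix such as `-358.18.1` plays no part in the comparison. This is a minimal sketch of the version half of the condition only; the helper names are hypothetical, and it does not model `defined(CONFIG_IOMMU_API)`.

```shell
#!/bin/sh
# Same arithmetic as the kernel's KERNEL_VERSION(a, b, c) macro.
kernel_version_code() {
    echo $(( ($1 << 16) + ($2 << 8) + $3 ))
}

# Hypothetical helper: succeeds if version a.b.c meets the 2.6.35 minimum
# from the version half of the vboxpci #if condition.
meets_vboxpci_minimum() {
    [ "$(kernel_version_code "$1" "$2" "$3")" -ge "$(kernel_version_code 2 6 35)" ]
}

for v in "2 6 32" "2 6 35" "3 8 0"; do
    set -- $v
    if meets_vboxpci_minimum "$1" "$2" "$3"; then
        echo "$1.$2.$3 ($(kernel_version_code "$1" "$2" "$3")): version test passes"
    else
        echo "$1.$2.$3 ($(kernel_version_code "$1" "$2" "$3")): version test fails"
    fi
done
```

With these inputs, 2.6.32 packs to 132640 and 2.6.35 to 132643, so the 2.6.32 kernel misses the threshold by three patch levels regardless of what has been backported into it.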

I'll Google around to see what exactly requires 2.6.35, or adjust that check if nothing does, just to see what happens...

@tatarsky
Contributor

Others have been down this road. The code does not compile on kernels older than 2.6.35, and CentOS 6.x kernels are unlikely to ever move to 2.6.35. There is mention of possible backports of the IOMMU code, but I have not been able to confirm that yet.

So for now this appears to be either a dead end or something that would require considerable effort to add.

I will mention this on our weekly call.

@tatarsky
Contributor

Verified that this does not compile even on the latest CentOS 6.5 kernel, which remains on the 2.6.32 branch. This is unlikely to be solvable with the current OS revision. Adding gpu-3-9 back into the scheduler for now.

@tatarsky
Contributor

Actually, trying one more idea. Draining gpu-3-9 again.

@sheridancbio
Author

Thanks for going into the module code and making these efforts. According to the kernel version chart at http://en.wikipedia.org/wiki/Linux_kernel#Maintenance, 2.6.32 is the only 2.x kernel still supported, and support appears to end "mid-2015". Do we plan a transition away from CentOS 6.4 down the road? An alternative virtual host platform that several collaborators have recommended is Docker, and I was told that it works smoothly on a 3.8 or higher kernel.
(Additional note: some more googling through forums shows that people are running Docker under 2.6.32 kernels, and that some of the needed updates were backported into the kernel code for RHEL and CentOS.)

@tatarsky
Contributor

Please note that Docker has been installed. Although we cannot run a 3.8 kernel at this time, initial tests seem to show it mostly working. If you would like to test it with @akahles, please comment in #140.

@tatarsky
Contributor

This is not fixable at this time with a CentOS 6 environment. Leaving it open for future version discussions.
