-
Notifications
You must be signed in to change notification settings - Fork 190
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CFE-1563: Fix logic to detect when running under a Xen Hypervisor #2746
Conversation
When running cfengine with this change on a Xen guest the output looks like this:
|
Reviewed by: cy, gjb Approved by: cy Differential Revision: D8607
{ | ||
if ((eax - base) < 2) | ||
{ | ||
Log(LOG_LEVEL_ERR, "Insufficient Xen CPUID Leaves. eax=%x at base %x\n", eax, base); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the odds that this error occurs? If Xen is not detected we wouldn't like to log errors in the console. Maybe downgrade verbosity to LOG_LEVEL_VERBOSE?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the odds that this error occurs? If Xen is not detected we wouldn't like to log errors in the console. Maybe downgrade verbosity to LOG_LEVEL_VERBOSE?
The odds of this ever happening are very slim. For that code to execute it means that the call to get the CPU ID returned XenVMMXenVMM meaning that the host is running on a hypervisor but someone was mucking around with an invalid CPUID mask.
@skreuzer It's good for merging after we figure out the logging issue, sorry I couldn't review earlier. |
Xen_Cpuid(0x40000000, &eax, &sig.u[0], &sig.u[1], &sig.u[2]); | ||
|
||
if (strcmp("XenVMMXenVMM", sig.s) || (eax < 0x40000002)) | ||
for (base = 0x40000000; base < 0x40010000; base += 0x100) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the logic in going from 0x40000000
to 0x40010000
? I don't see this value mentioned in the link in the comment, and it's not very well explained in the commit message why this is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the logic in going from 0x40000000 to 0x40010000?
Because hypervisor CPUID leaves don't have a standard Xen can be found at the first unused 0x100 aligned offset between 0x40000000 and 0x40010000. While most of the time the vendor id will be found at address 0x40000000, to properly detect if Xen is the hypervisor it is necessary to scan this whole range.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This unfortunately is causing segfaults on our older platforms (RHEL4 in particular). I still don't fully understand the choice of this value, since I can't see it mentioned anywhere, but admittedly I don't know much about Xen detection.
Did you just choose a value that you thought was "high enough"? Do you think it might fix the segfaults if we reduced this value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To properly detect Xen you need to scan the leaves between 0x40000000 to 0x40010000. Take a look at xen-detect.c for some additional details.
I am not quite sure why this is segfaulting and since I don't have access to a box running RHEL 4, per your suggestion on #cfengine-dev, the new pull request will just return false if the redhat_4 or centos_4 class is defined.
A quick test:
# /var/cfengine/bin/cf-promises --show-classes --debug | grep -i Xen
verbose: Found Xen CPUID Leaf. eax=40000002 at base 40000000
verbose: This appears to be a xen hv system.
debug: Setting hard class: default:xen
debug: Setting hard class: default:xen_domu_hv
verbose: C: discovered hard class xen
verbose: C: discovered hard class xen_domu_hv
xen inventory,attribute_name=Virtual host,source=agent,hardclass
xen_domu_hv source=agent,hardclass
# /var/cfengine/bin/cf-promises --show-classes -Dredhat_4 --debug | grep -i Xen
debug: Skipping Xen_Hv_Check() to avoid a segfault on RHEL 4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @skreuzer! Retrying...
On Linux the Xen_Hv_Check() function can never be called because the stat on /proc/self/status will always return true which results in the OpenVZ_Detect() function always being called. Move the call to Xen_Hv_Check() to happen if the stat on /proc/xen/capabilities returns false since this is an indication that the host is acting as dom0. Change the Xen_Hv_Check() function to scan the CPUID range from 0x40000000 to 0x40010000 looking for Xen leaves. Return false if the class redhat_4 or centos_4 is defined to avoid a segfault on these platforms.
* skreuzer/xen: CFE-1563: Fix logic to detect when running under a Xen Hypervisor
Thanks @skreuzer, that did the trick! Merged manually after adding changelog and a reference to the |
Cherry-picked to 3.7.x and 3.10.x as well. |
On Linux the Xen_Hv_Check() function can never be called because the stat on
/proc/self/status will always return true which results in the OpenVZ_Detect()
function always being called. Move the call to Xen_Hv_Check() to happen if the
stat on /proc/xen/capabilities returns false since this is an indication that
the host is acting as dom0.
Change the Xen_Hv_Check() function to scan the CPUID range from
0x40000000 to 0x40010000 looking for Xen leaves.