
Default cirrus VGA driver unsuitable for Windows due to pathological performance issues in text mode #215

ghost opened this issue May 19, 2013 · 4 comments


ghost commented May 19, 2013

Hi,

The default QEMU cirrus VGA driver has a pathological performance issue in text mode. It can cause Windows Server to take several hours (3+) to reboot after Windows updates, when the updates trigger a walk of the registry on boot.

Windows writes progress to the VGA device in text mode as it processes each registry update. Because the cirrus driver handles these writes slowly, the registry update slows to a crawl. A big registry can have tens or hundreds of thousands of keys, resulting in hundreds of thousands of character writes to the VGA device, so per-write latency makes a very big difference to the reboot time.
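To put the scale in perspective, here is a back-of-envelope calculation of the per-write latency that would be enough to explain a 3-hour stall. The write count of 300,000 is an assumed figure picked from within the "hundreds of thousands" range mentioned above, not a measurement.

```shell
#!/bin/sh
# Back-of-envelope check: what per-write latency would explain the delay?
# Both inputs are illustrative assumptions, not measurements.
writes=300000           # assumed: "hundreds of thousands" of text-mode writes
delay_s=$((3 * 3600))   # a 3-hour reboot, expressed in seconds

# Implied extra latency per character write, in milliseconds (integer math).
per_write_ms=$(( delay_s * 1000 / writes ))
echo "implied latency per write: ${per_write_ms} ms"
```

Under these assumptions roughly 36 ms per write is enough. Going the other way, at 1 ms per write the same 300,000 writes would cost 300 s (5 minutes), which is why a slow text-mode path dominates the reboot.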

We have seen this numerous times (5+): a Windows Server reboots after updates and sits performing the registry changes for 3+ hours, resulting in unhappy customers. We initially had no idea what was causing the poor performance, but because the VM showed hardly any CPU or disk I/O while it performed these operations, I suspected the problem lay at the QEMU layer, perhaps in the VGA driver.

Today, lots of Googling finally confirmed this. Others have seen the same behaviour:

http://lists.gnu.org/archive/html/qemu-devel/2011-06/msg02105.html
https://bugs.launchpad.net/qemu/+bug/589231
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=574988
etc...

This will eventually catch out a large number of people running Windows on SmartOS. Changing to the "std" VGA driver works around the problem:

vmadm update UUID vga=std

In a test on a VM that had taken 4 hours to reboot after Windows updates, setting vga=std reduced the boot time to under 2 minutes.
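For a node with many KVM guests, the same workaround can be applied in a loop. This is a minimal sketch, assuming the vmadm list options -H (no header), -o uuid (print only the UUID column) and the brand=kvm filter behave as documented in vmadm(1M); the function name set_vga_std_all is hypothetical. It does not restart any VM, and whether a running guest picks up the new setting before its next boot is not addressed here.

```shell
#!/bin/sh
# Sketch: switch every KVM VM on the local node to the "std" VGA emulation.
# Assumes SmartOS vmadm(1M): "list -H -o uuid brand=kvm" prints one UUID per
# line with no header, and "update UUID vga=std" sets the property.
# set_vga_std_all is a hypothetical helper name, not part of SmartOS.
set_vga_std_all() {
    for uuid in $(vmadm list -H -o uuid brand=kvm); do
        echo "setting vga=std on $uuid"
        # Report failures but keep going with the remaining VMs.
        vmadm update "$uuid" vga=std || echo "failed: $uuid" >&2
    done
}
```

Usage: source the script on the node and run set_vga_std_all. Any VM that must stay on cirrus would have to be skipped by hand.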

I'd suggest that SmartOS strongly consider switching the default VGA driver to "std" to work around this. We're going to patch /usr/vm/node_modules/VM.js in our build to do just that. Alternatively, there may be a fix for the cirrus driver in QEMU, but I haven't looked into this.

I have blogged about the issue here so more people can quickly find the problem and the workaround:

http://blogs.everycity.co.uk/alasdair/2013/05/windows-updates-slow-takes-hours-after-reboot-updating-registry-under-qemu-kvm/

Thanks,

Alasdair

@joshwilsdon (Contributor)

Thanks for finding this. I'll discuss internally here to see if there's any reason we shouldn't switch to std as the default. That said, you really shouldn't need to modify VM.js to work around this. Is the 'vga' option not working in your VM payloads? The man page says:

     vga:

         This property allows one to specify the VGA emulation to be used by
         KVM VMs. The default is 'cirrus'. NOTE: with the Qemu bundled in
         SmartOS qxl and xenfb do not work.

         type: string (one of: 'cirrus','std','vmware','qxl','xenfb')
         vmtype: KVM
         listable: no
         create: yes
         update: yes
         default: 'cirrus'

If adding "vga": "std" to your payload does not work, please create a separate issue for that.

ghost (Author) commented May 19, 2013

Hi Josh,

Thanks for responding. We run hundreds of VMs across a large fleet of servers with our own SmartOS build. Adding the vga property to all the VMs (which would work; the property works fine) is a lot more effort than just rolling the default change into our next SmartOS build.

Omitting the vga property from a VM can result in what our Windows customers would classify as an outage, so it's safer and easier for us to just change the default. We already run quite a few changes in our SmartOS build (such as resurrected Solaris 10 branded zone support, aggr changes, Nexenta mpt_sas drive timeout changes, SMF fixes from OI, etc.), so one more isn't a big deal :-)

Cheers,

Alasdair

@gkyildirim

Hi,

I believe this is done (by setting vga=std). Is it safe to close?

ghost (Author) commented Sep 23, 2013

Hi,

This can be closed: the default was changed to vga=std, so it's all fine now.

Cheers,

Alasdair

ghost closed this as completed Sep 23, 2013