-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve System VM startup and memory usage [4.11] #3127
Conversation
Add vm.min_free_kbytes to sysctl periodically clear disk cache (depending on memory size) only start guest services specific to hypervisor use systemvm code to determine hypervisor type (not systemd) start cloud service at end of post init rather than through systemd reduce initial threads started for httpd fix vmtools config file disable all required services (do not start on boot) start only required services during post init.
@blueorangutan package |
@PaulAngus a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress. |
Packaging result: ✔centos6 ✔centos7 ✔debian. JID-2530 |
@blueorangutan matrix |
@blueorangutan test matrix |
@PaulAngus a Trillian-Jenkins matrix job (centos6 mgmt + xs71, centos7 mgmt + vmware65, centos7 mgmt + kvmcentos7) has been kicked to run smoke tests |
# log = true | ||
|
||
# Disables core dumps on fatal errors; they're enabled by default. | ||
enableCoreDump = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this work with openvmtools @PaulAngus as we don't install vmware tools, unless you've changed that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this a bug in the Debian distribution of open-vm-tools.
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=889884
HYPERVISOR=$(</etc/cloudstack-agent_detected_hypervisor) | ||
case $HYPERVISOR in | ||
xen-pv|xen-domU) | ||
systemctl stop ntpd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can have all three xen cases in this block.
@@ -583,14 +583,14 @@ setup_ntp() { | |||
} | |||
|
|||
routing_svcs() { | |||
echo "haproxy apache2" > /var/cache/cloud/enabled_svcs | |||
echo "cloud nfs-common portmap" > /var/cache/cloud/disabled_svcs | |||
echo "cloud haproxy apache2" > /var/cache/cloud/enabled_svcs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On VRs we don't need to run any agent that is run by the cloud service.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes we could, i kept the separation in case it was useful in future, (this is why i left the virtualbox case in despite it not required at this time).
I can combine the xen cases if that is preferred.
@@ -108,6 +108,19 @@ function configure_services() { | |||
systemctl disable strongswan | |||
systemctl disable x11-common | |||
systemctl disable xl2tpd | |||
systemctl disable vgauth | |||
systemctl disable sshd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may cause error in somecases, ideally we want sshd enabled in all cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My recollection is that sshd is actually only started after the firewall rules are established, otherwise there will be a period of vulnerability during startup.
@@ -68,9 +68,10 @@ function install_packages() { | |||
python-flask \ | |||
haproxy \ | |||
radvd \ | |||
sharutils genisoimage aria2 \ | |||
sharutils genisoimage \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may break one of the register template features that uses aria2, @nvazquez can you comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the use of aria2 was scrapped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aria2 Used by KVM hosts with "bypass Sec.Stor." during template download.
Just tested Add template (from remote http) - agent/java download the template...
"Upload from local" uses POST, so should not be relevant.
strongswan libcharon-extra-plugins libstrongswan-extra-plugins \ | ||
virt-what open-vm-tools qemu-guest-agent hyperv-daemons | ||
virt-what open-vm-tools qemu-guest-agent hyperv-daemons \ | ||
haveged | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Trillian test result (tid-3309)
|
Trillian test result (tid-3313)
|
Trillian test result (tid-3310)
|
Note: Debian 9.7 has been released, build will fail due to incorrect iso url. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just have a few comments about the memory
|
||
# clear memory cache to ultimately reduce swapping | ||
|
||
sync && echo 1 > /proc/sys/vm/drop_caches |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand what you are doing there, but it is the Linux kernel which will remove files from it's case if it needs this memory for something else.
I wouldn't mess with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As recommended amount of memory for a Debian host is 4GB as we are working so far below that, I would say that we are operating well outside 'normal' practices. Personally, I have long questioned the use of Debian, and would have thought FreeBSD was a better choice. wrt to memory, the minimum memory for Debian is 96MB.
at 256MB RAM we can't really afford almost ANY caching of disk to memory. If we started at 4GB then i wouldn't dream of touching this.
The specific, highly replicable issue that is designed to fix, is that the cached disk access is not given back, instead the host starts swapping memory to disk, quickly making the VR unusably due to the CPU usage that this causes (see #3039)
I hope you can see why this sub-optimal fix is proposed to remedy the massively sub-optimal host spec that the VR is trying to operate in.
I should add that this was 'discovered' and tested through manually monitoring the VR memory usage over time and determining that the disk cache was the 'culprit' for the growing memory footprint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is safe in general, AND we are should not be concerned with disk latency, since we are doing network IO 99% of time.
|
||
phymem=$(free|awk '/^Mem:/{print $2}') | ||
if [ $phymem -lt 513000 ]; then | ||
sync && echo 1 > /proc/sys/vm/drop_caches |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same applies here. The kernel is the one who clears the page cache if it needs the memory for something else
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As recommended amount of memory for a Debian host is 4GB as we are working so far below that, I would say that we are operating well outside 'normal' practices. Personally, I have long questioned the use of Debian, and would have thought FreeBSD was a better choice. wrt to memory, the minimum memory for Debian is 96MB.
at 256MB RAM we can't really afford almost ANY caching of disk to memory. If we started at 4GB then i wouldn't dream of touching this.
The specific, highly replicable issue that is designed to fix, is that the cached disk access is not given back, instead the host starts swapping memory to disk, quickly making the VR unusably due to the CPU usage that this causes (see #3039)
I hope you can see why this sub-optimal fix is proposed to remedy the massively sub-optimal host spec that the VR is trying to operate in.
This ridiculous level of outside intervention is only being applied to VRs with 512MB or less available to it, which itself is a ridiculously low amount of RAM to give a Debian host.
I should add that this was 'discovered' and tested through manually monitoring the VR memory usage over time and determining that the disk cache was the 'culprit' for the growing memory footprint.
echo "conntrackd keepalived haproxy dnsmasq" > /var/cache/cloud/disabled_svcs | ||
mkdir -p /var/log/cloud | ||
} | ||
|
||
setup_secstorage() { | ||
log_it "Setting up secondary storage system vm" | ||
sysctl vm.min_free_kbytes=8192 | ||
#sysctl vm.min_free_kbytes=8192 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why comment it and not remove it?
I understand @PaulAngus I'd really opt that the default VR then gets more memory. The changes you want to make are fine with me, but memory is cheap nowadays, so why don't we bump the default to 1GB. |
Thanks @wido. I could (if i was allowed) name at least 3 organisations with 3000+ VRs. That adds up to a lot of memory and hosts to put them in, plus cooling yadda, yadda, yadda.... |
@blueorangutan package |
@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress. |
echo "dnsmasq" >> /var/cache/cloud/disabled_svcs | ||
else | ||
echo "dnsmasq" >> /var/cache/cloud/enabled_svcs | ||
echo "cloud dnsmasq" >> /var/cache/cloud/enabled_svcs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cloud service should only run on cpvm, ssvm.
|
||
configure_apache2 | ||
configure_strongswan | ||
configure_issue | ||
configure_cacerts | ||
|
||
# patch known systemd/sshd memory leak - https://github.com/systemd/systemd/issues/8015#issuecomment-476160981 | ||
echo '@include null' >> /etc/pam.d/systemd-user |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Packaging result: ✔centos6 ✔centos7 ✔debian. JID-2748 |
Per Paul's advise, closing this one in favour of the PR on master branch that brings in enhancements and memory optimisation changes #3126 |
Description
In order to reduce memory footprint and improve boot speed/predictability; The following changes have been made:
Fixes: #3039
Types of changes
Screenshots (if appropriate):
How Has This Been Tested?
Changes have been subjected to regression testing and a burn-in test where portforwarding rules were constantly updated over a period of 3 days. no swapping occured in a 256MB RAM VR.