Join GitHub today
GitHub is home to over 28 million developers working together to host and review code, manage projects, and build software together.
Sign upMemory balancing stops for some VMs #3265
Comments
andrewdavidwong
added
bug
C: core
labels
Nov 1, 2017
andrewdavidwong
added this to the Release 4.0 milestone
Nov 1, 2017
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
0spinboson
Nov 1, 2017
This appears to be quite extensive -- I'm also seeing VMs failing to claim more ram than assigned at boot. New with r4rc2. Worked around it for now by increasing the minimums, but it seems to affect both fedora- and debian-based VMs.
0spinboson
commented
Nov 1, 2017
•
|
This appears to be quite extensive -- I'm also seeing VMs failing to claim more ram than assigned at boot. New with r4rc2. Worked around it for now by increasing the minimums, but it seems to affect both fedora- and debian-based VMs. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
marmarek
Nov 1, 2017
Member
I guess what is going on there:
When starting a VM, qubesd connects to qmemman to request memory for it. Then, it blocks qmemman until the VM is actually started (to not take its memory out in the meantime). This part "until the VM is actually started" is done by closing the connection to qmemman. If some other process is started there in the meantime, it may inherit the connection and keep it open (potentially forever), blocking qmemman. Solution: set this connection to be not inheritable, or at least O_CLOEXEC.
cc @woju
|
I guess what is going on there: |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
marmarek
Nov 9, 2017
Member
Hmm, my guess was wrong, at least partially. There is O_CLOEXEC already.
|
Hmm, my guess was wrong, at least partially. There is O_CLOEXEC already. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
na--
Nov 13, 2017
I'm hitting this one regularly, though it follows Murphy's law very well - it happens when it does the most damage, but I can't reliably reproduce it on purpose... Running sudo systemctl restart qubes-qmemman.service in dom0 temporarily fixes the issue most of the time, but that only works when I notice that the VM I am using is about to run out of RAM.
na--
commented
Nov 13, 2017
|
I'm hitting this one regularly, though it follows Murphy's law very well - it happens when it does the most damage, but I can't reliably reproduce it on purpose... Running |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
marmarek
Nov 14, 2017
Member
Can you check logs (from journalctl) of qmemman just before restarting it? See if there is anything about the mishandled VM (you can get its id from xl list).
|
Can you check logs (from |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
tasket
Nov 14, 2017
I have a VM (id=28) right now stuck at 614MB. The log shows dozens of "qmemman.daemon.algo[7972] ... balance_when_enough_memory..."
Prior to that:
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: do_balloon(memsize=525336576)
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: xenfree=57353491
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: balloon: dom 28 has actual memory 685740032
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: balloon: dom 30 has actual memory 1741400886
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: balloon: dom 0 has actual memory 2534057843
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: balloon: dom 7 has actual memory 408589538
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: balloon: dom 31 has actual memory 757534720
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: balloon: dom 11 has actual memory 526052701
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: req=520411885 avail=2005532084.8 donors=[('28', 216630476.8), ('30', 522474294.0), ('0', 760295974.2), ('7', 122589538), ('31', 275304857.59999996), ('11', 108236944.19999999)]
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: borrow 59023702.19658609 from 28
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: borrow 142354702.76372287 from 30
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: borrow 207152215.26228067 from 0
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: borrow 33401063.830964494 from 7
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: borrow 75010276.33917822 from 31
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: borrow 29490518.857267752 from 11
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: memset_reqs=[('28', 626716329), ('30', 1599046183), ('0', 2326905627), ('7', 375188474), ('31', 682524443), ('11', 496562182)]
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: mem-set domain 28 to 626716329
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: mem-set domain 30 to 1599046183
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: mem-set domain 0 to 2326905627
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: mem-set domain 7 to 375188474
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: mem-set domain 31 to 682524443
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: mem-set domain 11 to 496562182
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: xenfree=466552095
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: domain 28 stuck at 685740032
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: domain 31 stuck at 757534720
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: balloon: dom 30 has actual memory 1599046183
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: balloon: dom 0 has actual memory 2326905627
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: balloon: dom 7 has actual memory 375188474
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: balloon: dom 11 has actual memory 496562182
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: req=111213281 avail=1101198248.4 donors=[('30', 380119591.0), ('0', 553143758.2), ('7', 89188474), ('11', 78746425.19999999)]
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: borrow 40308876.53186579 from 30
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: borrow 58656812.17587132 from 0
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: borrow 9457779.266450705 from 7
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: borrow 8350477.075812187 from 11
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: memset_reqs=[('30', 1558737306), ('0', 2268248814), ('7', 365730694), ('11', 488211704)]
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: mem-set domain 30 to 1558737306
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: mem-set domain 0 to 2268248814
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: mem-set domain 7 to 365730694
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: mem-set domain 11 to 488211704
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: xenfree=582425147
Nov 14 16:32:23 dom0 qmemman.daemon.reqhandler[7972]: EOF
Nov 14 16:32:23 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=-3051430, total_mem_pref=3696504217.6000004, total_available_memory=981372870.4000001)
Nov 14 16:32:24 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=-3051430, total_mem_pref=3696504217.6000004, total_available_memory=981372870.4000001)
Nov 14 16:32:44 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=55778707, total_mem_pref=3737515827.2000003, total_available_memory=999191397.8)
Nov 14 16:36:12 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=97254461, total_mem_pref=3777632870.4000006, total_available_memory=1000550108.5999999)
Nov 14 16:36:12 dom0 qmemman.systemstate[7972]: stat: dom '28' act=643936256 pref=469109555.2
Nov 14 16:36:12 dom0 qmemman.systemstate[7972]: stat: dom '30' act=1558737306 pref=1218926592.0
Nov 14 16:36:12 dom0 qmemman.systemstate[7972]: stat: dom '0' act=2268248814 pref=1854890521.6000001
Nov 14 16:36:12 dom0 qmemman.systemstate[7972]: stat: dom '7' act=365730694 pref=286000000
Nov 14 16:36:12 dom0 qmemman.systemstate[7972]: stat: dom '31' act=699744256 pref=482229862.40000004
Nov 14 16:36:12 dom0 qmemman.systemstate[7972]: stat: dom '11' act=488211704 pref=417815756.8
Nov 14 16:36:12 dom0 qmemman.systemstate[7972]: stat: xenfree=149683261 memset_reqs=[('30', 1540231737), ('7', 361388683), ('0', 2343833722), ('11', 527950652)]
Nov 14 16:36:12 dom0 qmemman.systemstate[7972]: mem-set domain 30 to 1540231737
Nov 14 16:36:12 dom0 qmemman.systemstate[7972]: mem-set domain 7 to 361388683
Nov 14 16:36:12 dom0 qmemman.systemstate[7972]: mem-set domain 0 to 2343833722
Nov 14 16:36:12 dom0 qmemman.systemstate[7972]: mem-set domain 11 to 527950652
Nov 14 16:36:17 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5490789, total_mem_pref=3818921369.6000004, total_available_memory=959974213.4000001)
Nov 14 16:36:26 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5494853, total_mem_pref=3765039718.4000006, total_available_memory=1013859928.5999999)
Nov 14 16:36:35 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5494853, total_mem_pref=3805007667.2000003, total_available_memory=973891979.8)
Nov 14 16:43:47 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5494853, total_mem_pref=3845619916.8, total_available_memory=933279730.2)
Nov 14 16:43:56 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5494853, total_mem_pref=3783474176.0, total_available_memory=995425471.0)
Nov 14 16:44:04 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5494853, total_mem_pref=3823607193.6000004, total_available_memory=955292453.4000001)
Nov 14 16:49:51 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5494853, total_mem_pref=3865145958.4, total_available_memory=913753688.6000001)
Nov 14 17:36:42 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5624909, total_mem_pref=3917275750.4, total_available_memory=861753952.6000001)
Nov 14 17:37:18 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5624909, total_mem_pref=3917275750.4, total_available_memory=861753952.6000001)
Nov 14 17:37:19 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5624909, total_mem_pref=3917275750.4, total_available_memory=861753952.6000001)
Nov 14 17:37:19 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5624909, total_mem_pref=3917275750.4, total_available_memory=861753952.6000001)
Nov 14 17:37:20 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5624909, total_mem_pref=3917275750.4, total_available_memory=861753952.6000001)
Nov 14 17:37:20 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5624909, total_mem_pref=3917275750.4, total_available_memory=861753952.6000001)
Nov 14 17:37:21 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5624909, total_mem_pref=3917275750.4, total_available_memory=861753952.6000001)
Nov 14 17:37:21 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5620845, total_mem_pref=3917275750.4, total_available_memory=861749888.6000001)
Nov 14 17:37:22 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5624909, total_mem_pref=3917275750.4, total_available_memory=861753952.6000001)
Nov 14 17:37:23 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5624909, total_mem_pref=3917275750.4, total_available_memory=861753952.6000001)
Nov 14 17:37:24 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5624909, total_mem_pref=3917275750.4, total_available_memory=861753952.6000001)
Nov 14 17:37:25 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5624909, total_mem_pref=3917275750.4, total_available_memory=861753952.6000001)
Nov 14 17:37:27 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5624909, total_mem_pref=3917275750.4, total_available_memory=861753952.6000001)
Nov 14 17:37:30 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5624909, total_mem_pref=3917275750.4, total_available_memory=861753952.6000001)
Nov 14 17:37:32 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5624909,
total_mem_pref=3917275750.4, total_available_memory=861753952.6000001)
Nov 14 17:38:24 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5624909, total_mem_pref=3917275750.4, total_available_memory=861753952.6000001)
Nov 14 17:38:25 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=170938609, total_mem_pref=3917275750.4, total_available_memory=1027067652.6000001)
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: stat: dom '28' act=643936256 pref=801675264.0
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: stat: dom '30' act=1540231737 pref=1260465356.8
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: stat: dom '0' act=2343833722 pref=1900864844.8
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: stat: dom '7' act=361388683 pref=286000000
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: stat: dom '31' act=699744256 pref=482229862.40000004
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: stat: dom '11' act=527950652 pref=469945548.8
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: stat: xenfree=223367409 memset_reqs=[('7', 360625143), ('30', 1589354897), ('0', 2396851951), ('11', 592567066)]
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: mem-set domain 7 to 360625143
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: mem-set domain 30 to 1589354897
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: mem-set domain 0 to 2396851951
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: mem-set domain 11 to 592567066
Nov 14 17:38:25 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=368868182, total_mem_pref=3917275750.4, total_available_memory=1390991488.6000001)
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: stat: dom '28' act=643936256 pref=801675264.0
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: stat: dom '30' act=1589354897 pref=1260465356.8
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: stat: dom '0' act=2396851951 pref=1900864844.8
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: stat: dom '7' act=360625143 pref=286000000
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: stat: dom '31' act=699744256 pref=482229862.40000004
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: stat: dom '11' act=592567066 pref=469945548.8
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: stat: xenfree=421296982 memset_reqs=[('30', 1706337898), ('7', 387168624), ('0', 2573270028), ('11', 636182419)]
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: mem-set domain 30 to 1706337898
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: mem-set domain 7 to 387168624
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: mem-set domain 0 to 2573270028
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: mem-set domain 11 to 636182419
(Updated with more of the log.)
tasket
commented
Nov 14, 2017
•
|
I have a VM (id=28) right now stuck at 614MB. The log shows dozens of "qmemman.daemon.algo[7972] ... balance_when_enough_memory..." Prior to that:
(Updated with more of the log.) |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
tasket
commented
Nov 18, 2017
|
I saved another log from 'stuck' session today, if that is of interest. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
marmarek
Nov 18, 2017
Member
Something is clearly wrong with the algorithm deciding to which VM distribute available memory: it divide it across VMs already having more than requested, but not for those with actual need.
|
Something is clearly wrong with the algorithm deciding to which VM distribute available memory: it divide it across VMs already having more than requested, but not for those with actual need. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
|
Check if restarting |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
dakka2
commented
Jan 9, 2018
|
restarting qubes-qmemman.service doesn't seem to help for me. |
added a commit
to marmarek/qubes-core-admin
that referenced
this issue
Jan 11, 2018
added a commit
to marmarek/qubes-core-admin
that referenced
this issue
Jan 11, 2018
marmarek
closed this
in
QubesOS/qubes-core-admin#179
Jan 15, 2018
added a commit
to marmarek/qubes-core-admin
that referenced
this issue
Jan 18, 2018
added a commit
to marmarek/qubes-core-admin
that referenced
this issue
Jan 18, 2018
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
qubesos-bot
Jan 18, 2018
Automated announcement from builder-github
The package qubes-core-dom0-4.0.16-1.fc25 has been pushed to the r4.0 testing repository for dom0.
To test this update, please install it with the following command:
sudo qubes-dom0-update --enablerepo=qubes-dom0-current-testing
qubesos-bot
commented
Jan 18, 2018
|
Automated announcement from builder-github The package
|
qubesos-bot
added
the
r4.0-dom0-cur-test
label
Jan 18, 2018
qubesos-bot
referenced this issue
in QubesOS/updates-status
Jan 18, 2018
Closed
core-admin v4.0.16 (r4.0) #365
This was referenced Jan 19, 2018
qubesos-bot
referenced this issue
in QubesOS/updates-status
Jan 19, 2018
Closed
core-admin v4.0.18 (r4.0) #368
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
qubesos-bot
Feb 6, 2018
Automated announcement from builder-github
The package qubes-core-dom0-4.0.21-1.fc25 has been pushed to the r4.0 stable repository for dom0.
To install this update, please use the standard update command:
sudo qubes-dom0-update
Or update dom0 via Qubes Manager.
qubesos-bot
commented
Feb 6, 2018
|
Automated announcement from builder-github The package
Or update dom0 via Qubes Manager. |
tasket commentedOct 31, 2017
Qubes OS version:
R4.0rc2
Affected TemplateVMs:
All
Steps to reproduce the behavior:
Run memory-intensive programs like Firefox in more than one appVM.
Expected behavior:
VMs re-adjust memory size so neither one is starved of RAM.
Actual behavior:
Eventually one of the VMs will stop acquiring RAM, even when it is experiencing high demand when the other VMs are no longer running. Not uncommon to see a browsing VM stuck at 700-1100MB and swapping heavily even when other appVMs have exited and only other VMs are sys-net and VPN/sys-firewall.
VM sessions can become unusable due to this problem, and re-starting them is unlikely to work right away because issue #3221 seems related.
General notes:
Total system memory is 8GB.
Related issues: