New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory balancing stops for some VMs #3265

Closed
tasket opened this Issue Oct 31, 2017 · 12 comments

Comments

Projects
None yet
7 participants
@tasket

tasket commented Oct 31, 2017

Qubes OS version:

R4.0rc2

Affected TemplateVMs:

All


Steps to reproduce the behavior:

Run memory-intensive programs like Firefox in more than one appVM.

Expected behavior:

VMs re-adjust memory size so neither one is starved of RAM.

Actual behavior:

Eventually one of the VMs will stop acquiring RAM, even when it is experiencing high demand when the other VMs are no longer running. Not uncommon to see a browsing VM stuck at 700-1100MB and swapping heavily even when other appVMs have exited and only other VMs are sys-net and VPN/sys-firewall.

VM sessions can become unusable due to this problem, and re-starting them is unlikely to work right away because issue #3221 seems related.

General notes:

Total system memory is 8GB.


Related issues:

@andrewdavidwong andrewdavidwong added this to the Release 4.0 milestone Nov 1, 2017

@0spinboson

This comment has been minimized.

Show comment
Hide comment
@0spinboson

0spinboson Nov 1, 2017

This appears to be quite extensive -- I'm also seeing VMs failing to claim more ram than assigned at boot. New with r4rc2. Worked around it for now by increasing the minimums, but it seems to affect both fedora- and debian-based VMs.

0spinboson commented Nov 1, 2017

This appears to be quite extensive -- I'm also seeing VMs failing to claim more ram than assigned at boot. New with r4rc2. Worked around it for now by increasing the minimums, but it seems to affect both fedora- and debian-based VMs.

@marmarek

This comment has been minimized.

Show comment
Hide comment
@marmarek

marmarek Nov 1, 2017

Member

I guess what is going on there:
When starting a VM, qubesd connects to qmemman to request memory for it. Then, it blocks qmemman until the VM is actually started (to not take its memory out in the meantime). This part "until the VM is actually started" is done by closing the connection to qmemman. If some other process is started there in the meantime, it may inherit the connection and keep it open (potentially forever), blocking qmemman. Solution: set this connection to be not inheritable, or at least O_CLOEXEC.
cc @woju

Member

marmarek commented Nov 1, 2017

I guess what is going on there:
When starting a VM, qubesd connects to qmemman to request memory for it. Then, it blocks qmemman until the VM is actually started (to not take its memory out in the meantime). This part "until the VM is actually started" is done by closing the connection to qmemman. If some other process is started there in the meantime, it may inherit the connection and keep it open (potentially forever), blocking qmemman. Solution: set this connection to be not inheritable, or at least O_CLOEXEC.
cc @woju

@marmarek

This comment has been minimized.

Show comment
Hide comment
@marmarek

marmarek Nov 9, 2017

Member

Hmm, my guess was wrong, at least partially. There is O_CLOEXEC already.

Member

marmarek commented Nov 9, 2017

Hmm, my guess was wrong, at least partially. There is O_CLOEXEC already.

@na--

This comment has been minimized.

Show comment
Hide comment
@na--

na-- Nov 13, 2017

I'm hitting this one regularly, though it follows Murphy's law very well - it happens when it does the most damage, but I can't reliably reproduce it on purpose... Running sudo systemctl restart qubes-qmemman.service in dom0 temporarily fixes the issue most of the time, but that only works when I notice that the VM I am using is about to run out of RAM.

na-- commented Nov 13, 2017

I'm hitting this one regularly, though it follows Murphy's law very well - it happens when it does the most damage, but I can't reliably reproduce it on purpose... Running sudo systemctl restart qubes-qmemman.service in dom0 temporarily fixes the issue most of the time, but that only works when I notice that the VM I am using is about to run out of RAM.

@marmarek

This comment has been minimized.

Show comment
Hide comment
@marmarek

marmarek Nov 14, 2017

Member

Can you check logs (from journalctl) of qmemman just before restarting it? See if there is anything about the mishandled VM (you can get its id from xl list).

Member

marmarek commented Nov 14, 2017

Can you check logs (from journalctl) of qmemman just before restarting it? See if there is anything about the mishandled VM (you can get its id from xl list).

@tasket

This comment has been minimized.

Show comment
Hide comment
@tasket

tasket Nov 14, 2017

I have a VM (id=28) right now stuck at 614MB. The log shows dozens of "qmemman.daemon.algo[7972] ... balance_when_enough_memory..."

Prior to that:

Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: do_balloon(memsize=525336576)
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: xenfree=57353491
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: balloon: dom 28 has actual memory 685740032
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: balloon: dom 30 has actual memory 1741400886
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: balloon: dom 0 has actual memory 2534057843
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: balloon: dom 7 has actual memory 408589538
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: balloon: dom 31 has actual memory 757534720
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: balloon: dom 11 has actual memory 526052701
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: req=520411885 avail=2005532084.8 donors=[('28', 216630476.8), ('30', 522474294.0), ('0', 760295974.2), ('7', 122589538), ('31', 275304857.59999996), ('11', 108236944.19999999)]
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: borrow 59023702.19658609 from 28
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: borrow 142354702.76372287 from 30
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: borrow 207152215.26228067 from 0
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: borrow 33401063.830964494 from 7
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: borrow 75010276.33917822 from 31
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: borrow 29490518.857267752 from 11
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: memset_reqs=[('28', 626716329), ('30', 1599046183), ('0', 2326905627), ('7', 375188474), ('31', 682524443), ('11', 496562182)]
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: mem-set domain 28 to 626716329
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: mem-set domain 30 to 1599046183
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: mem-set domain 0 to 2326905627
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: mem-set domain 7 to 375188474
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: mem-set domain 31 to 682524443
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: mem-set domain 11 to 496562182
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: xenfree=466552095
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: domain 28 stuck at 685740032
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: domain 31 stuck at 757534720
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: balloon: dom 30 has actual memory 1599046183
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: balloon: dom 0 has actual memory 2326905627
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: balloon: dom 7 has actual memory 375188474
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: balloon: dom 11 has actual memory 496562182
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: req=111213281 avail=1101198248.4 donors=[('30', 380119591.0), ('0', 553143758.2), ('7', 89188474), ('11', 78746425.19999999)]
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: borrow 40308876.53186579 from 30
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: borrow 58656812.17587132 from 0
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: borrow 9457779.266450705 from 7
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: borrow 8350477.075812187 from 11
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: memset_reqs=[('30', 1558737306), ('0', 2268248814), ('7', 365730694), ('11', 488211704)]
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: mem-set domain 30 to 1558737306
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: mem-set domain 0 to 2268248814
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: mem-set domain 7 to 365730694
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: mem-set domain 11 to 488211704
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: xenfree=582425147
Nov 14 16:32:23 dom0 qmemman.daemon.reqhandler[7972]: EOF
Nov 14 16:32:23 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=-3051430, total_mem_pref=3696504217.6000004, total_available_memory=981372870.4000001)
Nov 14 16:32:24 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=-3051430, total_mem_pref=3696504217.6000004, total_available_memory=981372870.4000001)
Nov 14 16:32:44 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=55778707, total_mem_pref=3737515827.2000003, total_available_memory=999191397.8)
Nov 14 16:36:12 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=97254461, total_mem_pref=3777632870.4000006, total_available_memory=1000550108.5999999)
Nov 14 16:36:12 dom0 qmemman.systemstate[7972]: stat: dom '28' act=643936256 pref=469109555.2
Nov 14 16:36:12 dom0 qmemman.systemstate[7972]: stat: dom '30' act=1558737306 pref=1218926592.0
Nov 14 16:36:12 dom0 qmemman.systemstate[7972]: stat: dom '0' act=2268248814 pref=1854890521.6000001
Nov 14 16:36:12 dom0 qmemman.systemstate[7972]: stat: dom '7' act=365730694 pref=286000000
Nov 14 16:36:12 dom0 qmemman.systemstate[7972]: stat: dom '31' act=699744256 pref=482229862.40000004
Nov 14 16:36:12 dom0 qmemman.systemstate[7972]: stat: dom '11' act=488211704 pref=417815756.8
Nov 14 16:36:12 dom0 qmemman.systemstate[7972]: stat: xenfree=149683261 memset_reqs=[('30', 1540231737), ('7', 361388683), ('0', 2343833722), ('11', 527950652)]
Nov 14 16:36:12 dom0 qmemman.systemstate[7972]: mem-set domain 30 to 1540231737
Nov 14 16:36:12 dom0 qmemman.systemstate[7972]: mem-set domain 7 to 361388683
Nov 14 16:36:12 dom0 qmemman.systemstate[7972]: mem-set domain 0 to 2343833722
Nov 14 16:36:12 dom0 qmemman.systemstate[7972]: mem-set domain 11 to 527950652
Nov 14 16:36:17 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5490789, total_mem_pref=3818921369.6000004, total_available_memory=959974213.4000001)
Nov 14 16:36:26 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5494853, total_mem_pref=3765039718.4000006, total_available_memory=1013859928.5999999)
Nov 14 16:36:35 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5494853, total_mem_pref=3805007667.2000003, total_available_memory=973891979.8)
Nov 14 16:43:47 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5494853, total_mem_pref=3845619916.8, total_available_memory=933279730.2)
Nov 14 16:43:56 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5494853, total_mem_pref=3783474176.0, total_available_memory=995425471.0)
Nov 14 16:44:04 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5494853, total_mem_pref=3823607193.6000004, total_available_memory=955292453.4000001)
Nov 14 16:49:51 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5494853, total_mem_pref=3865145958.4, total_available_memory=913753688.6000001)
Nov 14 17:36:42 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5624909, total_mem_pref=3917275750.4, total_available_memory=861753952.6000001)
Nov 14 17:37:18 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5624909, total_mem_pref=3917275750.4, total_available_memory=861753952.6000001)
Nov 14 17:37:19 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5624909, total_mem_pref=3917275750.4, total_available_memory=861753952.6000001)
Nov 14 17:37:19 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5624909, total_mem_pref=3917275750.4, total_available_memory=861753952.6000001)
Nov 14 17:37:20 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5624909, total_mem_pref=3917275750.4, total_available_memory=861753952.6000001)
Nov 14 17:37:20 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5624909, total_mem_pref=3917275750.4, total_available_memory=861753952.6000001)
Nov 14 17:37:21 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5624909, total_mem_pref=3917275750.4, total_available_memory=861753952.6000001)
Nov 14 17:37:21 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5620845, total_mem_pref=3917275750.4, total_available_memory=861749888.6000001)
Nov 14 17:37:22 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5624909, total_mem_pref=3917275750.4, total_available_memory=861753952.6000001)
Nov 14 17:37:23 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5624909, total_mem_pref=3917275750.4, total_available_memory=861753952.6000001)
Nov 14 17:37:24 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5624909, total_mem_pref=3917275750.4, total_available_memory=861753952.6000001)
Nov 14 17:37:25 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5624909, total_mem_pref=3917275750.4, total_available_memory=861753952.6000001)
Nov 14 17:37:27 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5624909, total_mem_pref=3917275750.4, total_available_memory=861753952.6000001)
Nov 14 17:37:30 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5624909, total_mem_pref=3917275750.4, total_available_memory=861753952.6000001)
Nov 14 17:37:32 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5624909,
total_mem_pref=3917275750.4, total_available_memory=861753952.6000001)
Nov 14 17:38:24 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5624909, total_mem_pref=3917275750.4, total_available_memory=861753952.6000001)
Nov 14 17:38:25 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=170938609, total_mem_pref=3917275750.4, total_available_memory=1027067652.6000001)
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: stat: dom '28' act=643936256 pref=801675264.0
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: stat: dom '30' act=1540231737 pref=1260465356.8
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: stat: dom '0' act=2343833722 pref=1900864844.8
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: stat: dom '7' act=361388683 pref=286000000
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: stat: dom '31' act=699744256 pref=482229862.40000004
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: stat: dom '11' act=527950652 pref=469945548.8
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: stat: xenfree=223367409 memset_reqs=[('7', 360625143), ('30', 1589354897), ('0', 2396851951), ('11', 592567066)]
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: mem-set domain 7 to 360625143
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: mem-set domain 30 to 1589354897
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: mem-set domain 0 to 2396851951
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: mem-set domain 11 to 592567066
Nov 14 17:38:25 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=368868182, total_mem_pref=3917275750.4, total_available_memory=1390991488.6000001)
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: stat: dom '28' act=643936256 pref=801675264.0
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: stat: dom '30' act=1589354897 pref=1260465356.8
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: stat: dom '0' act=2396851951 pref=1900864844.8
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: stat: dom '7' act=360625143 pref=286000000
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: stat: dom '31' act=699744256 pref=482229862.40000004
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: stat: dom '11' act=592567066 pref=469945548.8
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: stat: xenfree=421296982 memset_reqs=[('30', 1706337898), ('7', 387168624), ('0', 2573270028), ('11', 636182419)]
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: mem-set domain 30 to 1706337898
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: mem-set domain 7 to 387168624
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: mem-set domain 0 to 2573270028
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: mem-set domain 11 to 636182419

(Updated with more of the log.)

tasket commented Nov 14, 2017

I have a VM (id=28) right now stuck at 614MB. The log shows dozens of "qmemman.daemon.algo[7972] ... balance_when_enough_memory..."

Prior to that:

Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: do_balloon(memsize=525336576)
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: xenfree=57353491
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: balloon: dom 28 has actual memory 685740032
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: balloon: dom 30 has actual memory 1741400886
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: balloon: dom 0 has actual memory 2534057843
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: balloon: dom 7 has actual memory 408589538
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: balloon: dom 31 has actual memory 757534720
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: balloon: dom 11 has actual memory 526052701
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: req=520411885 avail=2005532084.8 donors=[('28', 216630476.8), ('30', 522474294.0), ('0', 760295974.2), ('7', 122589538), ('31', 275304857.59999996), ('11', 108236944.19999999)]
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: borrow 59023702.19658609 from 28
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: borrow 142354702.76372287 from 30
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: borrow 207152215.26228067 from 0
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: borrow 33401063.830964494 from 7
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: borrow 75010276.33917822 from 31
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: borrow 29490518.857267752 from 11
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: memset_reqs=[('28', 626716329), ('30', 1599046183), ('0', 2326905627), ('7', 375188474), ('31', 682524443), ('11', 496562182)]
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: mem-set domain 28 to 626716329
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: mem-set domain 30 to 1599046183
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: mem-set domain 0 to 2326905627
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: mem-set domain 7 to 375188474
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: mem-set domain 31 to 682524443
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: mem-set domain 11 to 496562182
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: xenfree=466552095
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: domain 28 stuck at 685740032
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: domain 31 stuck at 757534720
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: balloon: dom 30 has actual memory 1599046183
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: balloon: dom 0 has actual memory 2326905627
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: balloon: dom 7 has actual memory 375188474
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: balloon: dom 11 has actual memory 496562182
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: req=111213281 avail=1101198248.4 donors=[('30', 380119591.0), ('0', 553143758.2), ('7', 89188474), ('11', 78746425.19999999)]
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: borrow 40308876.53186579 from 30
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: borrow 58656812.17587132 from 0
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: borrow 9457779.266450705 from 7
Nov 14 16:32:19 dom0 qmemman.daemon.algo[7972]: borrow 8350477.075812187 from 11
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: memset_reqs=[('30', 1558737306), ('0', 2268248814), ('7', 365730694), ('11', 488211704)]
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: mem-set domain 30 to 1558737306
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: mem-set domain 0 to 2268248814
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: mem-set domain 7 to 365730694
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: mem-set domain 11 to 488211704
Nov 14 16:32:19 dom0 qmemman.systemstate[7972]: xenfree=582425147
Nov 14 16:32:23 dom0 qmemman.daemon.reqhandler[7972]: EOF
Nov 14 16:32:23 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=-3051430, total_mem_pref=3696504217.6000004, total_available_memory=981372870.4000001)
Nov 14 16:32:24 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=-3051430, total_mem_pref=3696504217.6000004, total_available_memory=981372870.4000001)
Nov 14 16:32:44 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=55778707, total_mem_pref=3737515827.2000003, total_available_memory=999191397.8)
Nov 14 16:36:12 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=97254461, total_mem_pref=3777632870.4000006, total_available_memory=1000550108.5999999)
Nov 14 16:36:12 dom0 qmemman.systemstate[7972]: stat: dom '28' act=643936256 pref=469109555.2
Nov 14 16:36:12 dom0 qmemman.systemstate[7972]: stat: dom '30' act=1558737306 pref=1218926592.0
Nov 14 16:36:12 dom0 qmemman.systemstate[7972]: stat: dom '0' act=2268248814 pref=1854890521.6000001
Nov 14 16:36:12 dom0 qmemman.systemstate[7972]: stat: dom '7' act=365730694 pref=286000000
Nov 14 16:36:12 dom0 qmemman.systemstate[7972]: stat: dom '31' act=699744256 pref=482229862.40000004
Nov 14 16:36:12 dom0 qmemman.systemstate[7972]: stat: dom '11' act=488211704 pref=417815756.8
Nov 14 16:36:12 dom0 qmemman.systemstate[7972]: stat: xenfree=149683261 memset_reqs=[('30', 1540231737), ('7', 361388683), ('0', 2343833722), ('11', 527950652)]
Nov 14 16:36:12 dom0 qmemman.systemstate[7972]: mem-set domain 30 to 1540231737
Nov 14 16:36:12 dom0 qmemman.systemstate[7972]: mem-set domain 7 to 361388683
Nov 14 16:36:12 dom0 qmemman.systemstate[7972]: mem-set domain 0 to 2343833722
Nov 14 16:36:12 dom0 qmemman.systemstate[7972]: mem-set domain 11 to 527950652
Nov 14 16:36:17 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5490789, total_mem_pref=3818921369.6000004, total_available_memory=959974213.4000001)
Nov 14 16:36:26 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5494853, total_mem_pref=3765039718.4000006, total_available_memory=1013859928.5999999)
Nov 14 16:36:35 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5494853, total_mem_pref=3805007667.2000003, total_available_memory=973891979.8)
Nov 14 16:43:47 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5494853, total_mem_pref=3845619916.8, total_available_memory=933279730.2)
Nov 14 16:43:56 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5494853, total_mem_pref=3783474176.0, total_available_memory=995425471.0)
Nov 14 16:44:04 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5494853, total_mem_pref=3823607193.6000004, total_available_memory=955292453.4000001)
Nov 14 16:49:51 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5494853, total_mem_pref=3865145958.4, total_available_memory=913753688.6000001)
Nov 14 17:36:42 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5624909, total_mem_pref=3917275750.4, total_available_memory=861753952.6000001)
Nov 14 17:37:18 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5624909, total_mem_pref=3917275750.4, total_available_memory=861753952.6000001)
Nov 14 17:37:19 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5624909, total_mem_pref=3917275750.4, total_available_memory=861753952.6000001)
Nov 14 17:37:19 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5624909, total_mem_pref=3917275750.4, total_available_memory=861753952.6000001)
Nov 14 17:37:20 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5624909, total_mem_pref=3917275750.4, total_available_memory=861753952.6000001)
Nov 14 17:37:20 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5624909, total_mem_pref=3917275750.4, total_available_memory=861753952.6000001)
Nov 14 17:37:21 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5624909, total_mem_pref=3917275750.4, total_available_memory=861753952.6000001)
Nov 14 17:37:21 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5620845, total_mem_pref=3917275750.4, total_available_memory=861749888.6000001)
Nov 14 17:37:22 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5624909, total_mem_pref=3917275750.4, total_available_memory=861753952.6000001)
Nov 14 17:37:23 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5624909, total_mem_pref=3917275750.4, total_available_memory=861753952.6000001)
Nov 14 17:37:24 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5624909, total_mem_pref=3917275750.4, total_available_memory=861753952.6000001)
Nov 14 17:37:25 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5624909, total_mem_pref=3917275750.4, total_available_memory=861753952.6000001)
Nov 14 17:37:27 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5624909, total_mem_pref=3917275750.4, total_available_memory=861753952.6000001)
Nov 14 17:37:30 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5624909, total_mem_pref=3917275750.4, total_available_memory=861753952.6000001)
Nov 14 17:37:32 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5624909,
total_mem_pref=3917275750.4, total_available_memory=861753952.6000001)
Nov 14 17:38:24 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=5624909, total_mem_pref=3917275750.4, total_available_memory=861753952.6000001)
Nov 14 17:38:25 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=170938609, total_mem_pref=3917275750.4, total_available_memory=1027067652.6000001)
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: stat: dom '28' act=643936256 pref=801675264.0
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: stat: dom '30' act=1540231737 pref=1260465356.8
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: stat: dom '0' act=2343833722 pref=1900864844.8
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: stat: dom '7' act=361388683 pref=286000000
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: stat: dom '31' act=699744256 pref=482229862.40000004
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: stat: dom '11' act=527950652 pref=469945548.8
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: stat: xenfree=223367409 memset_reqs=[('7', 360625143), ('30', 1589354897), ('0', 2396851951), ('11', 592567066)]
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: mem-set domain 7 to 360625143
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: mem-set domain 30 to 1589354897
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: mem-set domain 0 to 2396851951
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: mem-set domain 11 to 592567066
Nov 14 17:38:25 dom0 qmemman.daemon.algo[7972]: balance_when_enough_memory(xen_free_memory=368868182, total_mem_pref=3917275750.4, total_available_memory=1390991488.6000001)
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: stat: dom '28' act=643936256 pref=801675264.0
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: stat: dom '30' act=1589354897 pref=1260465356.8
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: stat: dom '0' act=2396851951 pref=1900864844.8
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: stat: dom '7' act=360625143 pref=286000000
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: stat: dom '31' act=699744256 pref=482229862.40000004
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: stat: dom '11' act=592567066 pref=469945548.8
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: stat: xenfree=421296982 memset_reqs=[('30', 1706337898), ('7', 387168624), ('0', 2573270028), ('11', 636182419)]
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: mem-set domain 30 to 1706337898
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: mem-set domain 7 to 387168624
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: mem-set domain 0 to 2573270028
Nov 14 17:38:25 dom0 qmemman.systemstate[7972]: mem-set domain 11 to 636182419

(Updated with more of the log.)

@tasket

This comment has been minimized.

Show comment
Hide comment
@tasket

tasket Nov 18, 2017

I saved another log from 'stuck' session today, if that is of interest.

tasket commented Nov 18, 2017

I saved another log from 'stuck' session today, if that is of interest.

@marmarek

This comment has been minimized.

Show comment
Hide comment
@marmarek

marmarek Nov 18, 2017

Member

Something is clearly wrong with the algorithm deciding to which VM distribute available memory: it divide it across VMs already having more than requested, but not for those with actual need.

Member

marmarek commented Nov 18, 2017

Something is clearly wrong with the algorithm deciding to which VM distribute available memory: it divide it across VMs already having more than requested, but not for those with actual need.

@marmarek

This comment has been minimized.

Show comment
Hide comment
@marmarek

marmarek Nov 18, 2017

Member

Check if restarting qubes-qmemman.service helps.

Member

marmarek commented Nov 18, 2017

Check if restarting qubes-qmemman.service helps.

@dakka2

This comment has been minimized.

Show comment
Hide comment
@dakka2

dakka2 Jan 9, 2018

restarting qubes-qmemman.service doesn't seem to help for me.

dakka2 commented Jan 9, 2018

restarting qubes-qmemman.service doesn't seem to help for me.

marmarek added a commit to marmarek/qubes-core-admin that referenced this issue Jan 11, 2018

qmemman: clear "not responding" flags when VM require more memory
Clear slow_memset_react/no_progress flags when VM request more memory
than it have assigned. If there is some available, it may be given to
such VM, solving the original problem (not reacting to balloon down
request). In any case, qmemman algorithm should not try to take away
memory from under-provisioned VM.

Fixes QubesOS/qubes-issues#3265

marmarek added a commit to marmarek/qubes-core-admin that referenced this issue Jan 11, 2018

qmemman: request VMs balloon down with 16MB safety margin
It looks like Linux balloon driver do not always precisely respect
requested target memory, but perform some rounding. Also, in some cases
(HVM domains), VM do not see all the memory that Xen have assigned to it
- there are some additional Xen pools for internal usage.
Include 16MB safety margin in memory requests to account for those two
things. This will avoid setting "no_response" flag for most of VMs.

QubesOS/qubes-issues#3265

@marmarek marmarek referenced this issue in QubesOS/qubes-core-admin Jan 11, 2018

Merged

Qmemman fixes #179

marmarek added a commit to marmarek/qubes-core-admin that referenced this issue Jan 18, 2018

qmemman: fix early crash
clear_outdated_error_markers crashes if memory stats are not retrieved
yet. In practice it crashes at the very first call during daemon
startup, making the whole qmemman unusable.

This fixes bf4306b
    qmemman: clear "not responding" flags when VM require more memory

QubesOS/qubes-issues#3265

marmarek added a commit to marmarek/qubes-core-admin that referenced this issue Jan 18, 2018

qmemman: fix early crash
clear_outdated_error_markers crashes if memory stats are not retrieved
yet. In practice it crashes at the very first call during daemon
startup, making the whole qmemman unusable.

This fixes bf4306b
    qmemman: clear "not responding" flags when VM require more memory

QubesOS/qubes-issues#3265
@qubesos-bot

This comment has been minimized.

Show comment
Hide comment
@qubesos-bot

qubesos-bot Jan 18, 2018

Automated announcement from builder-github

The package qubes-core-dom0-4.0.16-1.fc25 has been pushed to the r4.0 testing repository for dom0.
To test this update, please install it with the following command:

sudo qubes-dom0-update --enablerepo=qubes-dom0-current-testing

Changes included in this update

Automated announcement from builder-github

The package qubes-core-dom0-4.0.16-1.fc25 has been pushed to the r4.0 testing repository for dom0.
To test this update, please install it with the following command:

sudo qubes-dom0-update --enablerepo=qubes-dom0-current-testing

Changes included in this update

@qubesos-bot

This comment has been minimized.

Show comment
Hide comment
@qubesos-bot

qubesos-bot Feb 6, 2018

Automated announcement from builder-github

The package qubes-core-dom0-4.0.21-1.fc25 has been pushed to the r4.0 stable repository for dom0.
To install this update, please use the standard update command:

sudo qubes-dom0-update

Or update dom0 via Qubes Manager.

Changes included in this update

Automated announcement from builder-github

The package qubes-core-dom0-4.0.21-1.fc25 has been pushed to the r4.0 stable repository for dom0.
To install this update, please use the standard update command:

sudo qubes-dom0-update

Or update dom0 via Qubes Manager.

Changes included in this update

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment