Improve memory management #162

Closed
marmarek opened this Issue Mar 8, 2015 · 5 comments

marmarek commented Mar 8, 2015

Reported by joanna on 29 Mar 2011 13:09 UTC

  • I keep experiencing a situation where Qubes complains it cannot start a new VM because of lack of memory, despite the fact that there is enough memory in the system -- it's just generously spread among the few VMs that have been started previously (and which don't need it for sure).
  • When running a lot of AppVMs (10+) for a long time, when suddenly the memory consumption goes up, the kernel in Dom0 might crash!!! This is likely due to OOM condition.

For the person who will be testing/debugging this -- please run 5-10 AppVMs and use Firefox + Flash in each of them.

Migrated-From: https://wiki.qubes-os.org/ticket/162

@marmarek marmarek added this to the Release 1 Beta 1 milestone Mar 8, 2015

marmarek commented Mar 8, 2015

Comment by joanna on 29 Mar 2011 13:11 UTC
We should also reconsider running the memory management agent in NetVMs -- with our new approach to handling resume, we no longer need to worry about fragmentation in the netvm.

marmarek commented Mar 8, 2015

Modified by joanna on 30 Mar 2011 13:56 UTC

marmarek commented Mar 8, 2015

Comment by rafal on 30 Mar 2011 14:46 UTC

> and which don't need it for sure

If it is repeatable, please provide the output of "free" in each VM, including dom0, as well as the /var/log/qubes/qmemman.log file.

> it's just generously spread

I rather suspect it is distributed according to the qmemman specification. If so, we may elect to change the specification, but a perfect solution simply does not exist.

> kernel in Dom0 might crash!!! This is likely due to OOM condition.

If this is repeatable, try collecting kernel logs. There can be a plethora of reasons. I will try this as well.
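The data requested above can be gathered in one pass. A minimal sketch, assuming modern Qubes command-line tooling (`qvm-ls --raw-list` to enumerate VMs and `qvm-run --pass-io` to capture a command's output in dom0 are assumptions about the installed tools; the commands available in 2011-era Qubes differed):

```shell
#!/bin/sh
# Gather the info requested above: "free" from dom0 and every VM,
# plus qmemman.log, into a single report file.
# NOTE: qvm-ls --raw-list and qvm-run --pass-io are assumptions about
# the installed Qubes tooling; adjust for your version.

collect_report() {
    echo "=== dom0 free ==="
    free 2>/dev/null || echo "(free not available)"

    if command -v qvm-ls >/dev/null 2>&1; then
        for vm in $(qvm-ls --raw-list); do
            echo "=== $vm free ==="
            qvm-run --pass-io "$vm" free 2>/dev/null || echo "(VM not running)"
        done
    else
        echo "(not running in Qubes dom0; skipping per-VM stats)"
    fi

    echo "=== qmemman.log ==="
    cat /var/log/qubes/qmemman.log 2>/dev/null || echo "(log not found)"
}

collect_report > qmemman-report.txt
```

The guards let the script degrade gracefully outside dom0, so it can be dry-run before attaching the resulting qmemman-report.txt to the ticket.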

marmarek commented Mar 8, 2015

Comment by rafal on 31 Mar 2011 10:14 UTC
Cannot reproduce the crash.
On 4 GB of RAM I can spawn 8 AppVMs, each playing the same 26-minute-long YouTube video; I cannot start a 9th one because there is no memory left.
All Flash players completed successfully, with no sign of malfunction in any domain.
Closing until someone can provide better instructions on how to reproduce the crash, or log information allowing us to determine whether qmemman misbehaves.
Some log/statistics:
qmemman.log:

```
dom 17 is below pref, allowing balance
dom 11 act/pref 336355328 319834112.0
dom 10 act/pref 335888384 319392153.6
dom 13 act/pref 335196160 318731878.4
dom 12 act/pref 334426112 318002380.8
dom 15 act/pref 312086528 296756428.8
dom 14 act/pref 322461696 306623283.2
dom 17 act/pref 313569280 318561484.8
dom 16 act/pref 339533824 322858598.4
dom 0 act/pref 1071656960 1019018035.2
xenfree= 56152064 balance req: [('11', 334418582), ('10', 333956471), ('13', 333266087), ('12', 332503324), ('15', 310288554), ('14', 320605338), ('16', 337580986), ('0', 1065485369), ('17', 333087923)]
mem-set domain 11 to 334418582
mem-set domain 10 to 333956471
mem-set domain 13 to 333266087
mem-set domain 12 to 332503324
mem-set domain 15 to 310288554
mem-set domain 14 to 320605338
mem-set domain 16 to 337580986
mem-set domain 0 to 1065485369
mem-set domain 17 to 333087923
```
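The mem-set targets in the log are consistent with a simple proportional rule: each domain's new allocation is its preferred size scaled by the ratio of total distributable memory to the sum of all preferred sizes. A minimal sketch of that balancing rule (inferred from this log excerpt, not the actual qmemman implementation, which handles more cases):

```python
def balance(total_available, prefs):
    """Distribute total_available bytes across domains in proportion
    to each domain's preferred memory size.

    prefs: dict mapping domain id -> preferred size in bytes.
    Returns a dict mapping domain id -> memory target in bytes.
    """
    total_pref = sum(prefs.values())
    scale = total_available / total_pref
    return {dom: int(round(pref * scale)) for dom, pref in prefs.items()}


# "act/pref" preferred sizes from the qmemman.log excerpt above
prefs = {
    "11": 319834112.0, "10": 319392153.6, "13": 318731878.4,
    "12": 318002380.8, "15": 296756428.8, "14": 306623283.2,
    "17": 318561484.8, "16": 322858598.4, "0": 1019018035.2,
}

# Sum of the logged mem-set targets -- the pool that was distributed
total = 3701192634

targets = balance(total, prefs)
# Each computed target matches the corresponding logged
# "mem-set domain N" value to within a few bytes of rounding error.
```

Note that every logged target works out to roughly 1.0456 times the domain's preferred size, which is why dom0 (the largest pref) receives by far the largest absolute share.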

xentop, somewhere in the middle of the replay:

```
xentop - 12:00:59   Xen 3.4.3
10 domains: 3 running, 3 blocked, 0 paused, 0 crashed, 0 dying, 0 shutdown
Mem: 3983468k total, 3928636k used, 54832k free    CPUs: 4 @ 3192MHz
NAME      STATE  CPU(sec) CPU(%)  MEM(k)  MEM(%) MAXMEM(k) MAXMEM(%) VCPUS NETS NETTX(k) NETRX(k) VBDS VBD_OO VBD_RD VBD_WR SSID
Domain-0  -----r     1409   24.7 1040512   26.1  no limit      n/a       4    0        0        0    0      0      0      0    0
netvm1    ------       83    5.0  256000    6.4    409600     10.3       1    0        0        0    3      0   7488   1673    0
test0     -----r      459   29.0  326128    8.2   3983360    100.0       4    0        0        0    3      6  19480  19701    0
test1     -----r      434   22.7  326580    8.2   3983360    100.0       4    0        0        0    3     14  20339  18974    0
test2     --b---      228   13.0  324708    8.2   3983360    100.0       4    0        0        0    3      0  10154   6303    0
test3     --b---      366   30.3  325452    8.2   3983360    100.0       4    0        0        0    3     30  18418  17889    0
test4     ------      334   27.1  313088    7.9   3983360    100.0       4    0        0        0    3     52  23269  16143    0
test5     ------      307   29.3  303016    7.6   3983360    100.0       4    0        0        0    3      8  24784  15680    0
test6     --b---      266   30.4  329668    8.3   3983360    100.0       4    0        0        0    3      7  24851  14219    0
test7     ------      212   31.2  325280    8.2   3983360    100.0       4    0        0        0    3      1  26478  11140    0
```

@marmarek marmarek added the worksforme label Mar 8, 2015

@marmarek marmarek closed this Mar 8, 2015

marmarek commented Mar 8, 2015

Comment by rafal on 5 Apr 2011 09:04 UTC
One problem case has been deduced from the logs. Fix at
http://git.qubes-os.org/?p=rafal/core.git;a=commit;h=37e06d19e4339abb3cfd8e17b6c2b05cc73caef8
