Commit e23754e
Merge: mm/memcg: Free percpu stats memory of dying memcg's
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/2580
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2176388
Upstream Status: RHEL-only
For systems with a large number of CPUs, the majority of the memory
consumed by the mem_cgroup structure is actually the percpu stats
memory. When a large number of memory cgroups are continuously created
and destroyed (as on a container host), more and more mem_cgroup
structures can remain in the dying state, holding up an increasing
amount of percpu memory.
We can't free the memory of a dying mem_cgroup structure because of
active references, mainly from pages in the page cache. However, the
percpu stats memory allocated to that mem_cgroup is a different story.
There are two sets of percpu stat counters, one in the mem_cgroup
structure and one in each associated mem_cgroup_per_node structure:
- vmstats_percpu (struct mem_cgroup)
- lruvec_stat_percpu (struct mem_cgroup_per_node)
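To make the layout concrete, here is a minimal user-space sketch. It assumes simplified stand-in types: every name ending in `_sim` and all `NR_*_SIM` sizes are hypothetical. The sketch shows one per-CPU vmstats set hanging off the memcg and one per-CPU lruvec set per NUMA node. In the kernel these would be allocated with `alloc_percpu()` and released with `free_percpu()`, and `calloc()`/`free()` stand in for those calls here. The key point is that both sets can be released while the enclosing structure stays alive.

```c
#include <stdlib.h>

#define NR_CPUS_SIM  4  /* hypothetical CPU count */
#define NR_NODES_SIM 2  /* hypothetical NUMA node count */
#define NR_STATS_SIM 8  /* hypothetical number of counters per set */

/* Stand-in for struct memcg_vmstats_percpu (one copy per CPU). */
struct vmstats_pcpu_sim {
	long state[NR_STATS_SIM];
};

/* Stand-in for struct lruvec_stats_percpu (one copy per CPU). */
struct lruvec_pcpu_sim {
	long state[NR_STATS_SIM];
};

/* Stand-in for struct mem_cgroup_per_node. */
struct memcg_per_node_sim {
	struct lruvec_pcpu_sim *lruvec_stat_percpu; /* NR_CPUS_SIM entries */
};

/* Stand-in for struct mem_cgroup. */
struct memcg_sim {
	struct vmstats_pcpu_sim *vmstats_percpu;    /* NR_CPUS_SIM entries */
	struct memcg_per_node_sim nodeinfo[NR_NODES_SIM];
};

/* Release both sets of per-CPU counters but keep the memcg_sim itself,
 * the way a still-referenced dying memcg is kept. free(NULL) is a no-op. */
static void memcg_sim_free_percpu(struct memcg_sim *m)
{
	free(m->vmstats_percpu);
	m->vmstats_percpu = NULL;
	for (int nid = 0; nid < NR_NODES_SIM; nid++) {
		free(m->nodeinfo[nid].lruvec_stat_percpu);
		m->nodeinfo[nid].lruvec_stat_percpu = NULL;
	}
}

/* Allocate one vmstats set plus one lruvec set per node; 0 on success. */
static int memcg_sim_alloc(struct memcg_sim *m)
{
	*m = (struct memcg_sim){0};
	m->vmstats_percpu = calloc(NR_CPUS_SIM, sizeof(*m->vmstats_percpu));
	if (!m->vmstats_percpu)
		return -1;
	for (int nid = 0; nid < NR_NODES_SIM; nid++) {
		m->nodeinfo[nid].lruvec_stat_percpu =
			calloc(NR_CPUS_SIM, sizeof(struct lruvec_pcpu_sim));
		if (!m->nodeinfo[nid].lruvec_stat_percpu) {
			memcg_sim_free_percpu(m); /* undo partial allocation */
			return -1;
		}
	}
	return 0;
}
```

The number of per-CPU allocations per memcg is 1 + (number of nodes). This is why the per-node lruvec sets dominate on multi-socket machines.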
There is ongoing discussion upstream about the best way to handle dying
memory cgroups that hang around indefinitely, mostly due to shared
memory. See https://lwn.net/Articles/932070/ for more information. A
final solution will likely take some more time.
This patch works around the problem by freeing the percpu stats memory
associated with a dying memory cgroup. This eliminates the percpu memory
growth, but slab memory consumption associated with the dying memory
cgroups will still increase. Being a workaround, it is not likely to be
accepted upstream, but many RHEL customers are seeing this percpu
memory growth.
A new percpu_stats_disabled variable is added to track the state of
the percpu stats memory. Once it is set, percpu stats updates for that
memcg are disabled and forwarded to a parent memcg instead.
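The forwarding rule can be sketched in a few lines of user-space C. This sketch makes three assumptions. First, the type and function names (`memcg_fwd_sim`, `stats_target`, `mod_stat_sim`) are hypothetical. Second, a plain counter array replaces the real per-CPU counters. Third, if the parent is itself disabled, the update keeps walking up to the nearest ancestor whose stats are still active, stopping at the root, which is never taken offline. The extra flag check on every update is the kind of overhead the commit message mentions for `__mod_memcg_lruvec_state()`.

```c
#include <stdatomic.h>

/* Hypothetical simplified memcg; stats[] stands in for per-CPU counters. */
struct memcg_fwd_sim {
	struct memcg_fwd_sim *parent;      /* NULL for the root */
	atomic_int percpu_stats_disabled;  /* 0 while stats are active */
	long stats[4];
};

/* Find the nearest memcg (self or ancestor) whose stats are still enabled.
 * The root has no parent, so the walk always terminates there. */
static struct memcg_fwd_sim *stats_target(struct memcg_fwd_sim *m)
{
	while (m->parent && atomic_load(&m->percpu_stats_disabled))
		m = m->parent;
	return m;
}

/* Apply a stat update, redirecting it away from a disabled memcg. */
static void mod_stat_sim(struct memcg_fwd_sim *m, int idx, long val)
{
	stats_target(m)->stats[idx] += val;
}
```

Redirecting to an ancestor (rather than dropping the update) is assumed here so that hierarchical totals above the dying memcg stay meaningful.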
The disabling, flushing and freeing of the percpu stats memory is a
multi-step process.
First, the percpu_stats_disabled variable is set to
MEMCG_PERCPU_STATS_DISABLED when the memcg is being taken offline. At
this point, the cgroup filesystem control files of the offline cgroup
are being removed and are no longer visible in user space.
After an RCU grace period, handled with rcu_work, no task should be
reading or updating the percpu stats anymore. The percpu_stats_disabled
variable is then atomically set to PERCPU_STATS_FLUSHING before the
percpu stats are flushed out and the state is changed to
PERCPU_STATS_FLUSHED. Finally, the percpu memory is freed and the state
is changed to PERCPU_STATS_FREED.
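The multi-step sequence above behaves like a small one-way state machine. Below is a user-space sketch using C11 `<stdatomic.h>`, with several assumptions:

- The enum names (`SIM_STATS_*`), the struct and all function names are hypothetical.
- A heap array stands in for the percpu allocation.
- `memcg_sim_flush_and_free()` is called directly, whereas in the kernel it would run from a work item queued with `queue_rcu_work()` after a grace period.

The compare-and-exchange makes each transition happen at most once, so only one path can flush and free the memory.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical states, in the order described in the commit message. */
enum sim_stats_state {
	SIM_STATS_ACTIVE = 0,
	SIM_STATS_DISABLED,  /* memcg is being taken offline */
	SIM_STATS_FLUSHING,  /* after the grace period, before the flush */
	SIM_STATS_FLUSHED,
	SIM_STATS_FREED,
};

struct memcg_state_sim {
	atomic_int percpu_stats_disabled;
	long *percpu_stats;   /* stand-in for the percpu allocation */
	int nr_cpus;
	long flushed_total;   /* where flushed per-CPU values are summed */
};

/* Atomically move from 'from' to 'to'; fails if the state differs. */
static bool stats_transition(struct memcg_state_sim *m, int from, int to)
{
	int expected = from;
	return atomic_compare_exchange_strong(&m->percpu_stats_disabled,
					      &expected, to);
}

/* Step 1: called from the (simulated) offline path. */
static void memcg_sim_offline(struct memcg_state_sim *m)
{
	stats_transition(m, SIM_STATS_ACTIVE, SIM_STATS_DISABLED);
}

/* Steps 2-4: flush, then free. Returns false if not in DISABLED state. */
static bool memcg_sim_flush_and_free(struct memcg_state_sim *m)
{
	if (!stats_transition(m, SIM_STATS_DISABLED, SIM_STATS_FLUSHING))
		return false;
	for (int cpu = 0; cpu < m->nr_cpus; cpu++)
		m->flushed_total += m->percpu_stats[cpu];
	atomic_store(&m->percpu_stats_disabled, SIM_STATS_FLUSHED);
	free(m->percpu_stats);
	m->percpu_stats = NULL;
	atomic_store(&m->percpu_stats_disabled, SIM_STATS_FREED);
	return true;
}
```

Calling `memcg_sim_flush_and_free()` before the memcg is offline, or a second time after it has already run, is a harmless no-op that returns false.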
This will greatly reduce the amount of memory held up by dying memory
cgroups.
For the compiled RHEL9 kernel, memcg_vmstats_percpu and
lruvec_stats_percpu have sizes of 1080 and 672 bytes respectively. The
mem_cgroup and mem_cgroup_per_node structures have sizes of 2240 and
1096 bytes respectively. For a 2-socket 96-thread system, that means
each dying memory cgroup uses 232,704 bytes of percpu data and 3,338
bytes of memcg slab data, a percpu/slab ratio of about 69. The ratio
can be even higher on larger systems with more CPUs.
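The 232,704-byte figure follows from the structure sizes quoted above. The small helper below reproduces the figure under one assumption: a 2-socket system has 2 NUMA nodes, so each CPU carries one vmstats copy plus one lruvec stats copy per node. The macro and function names are hypothetical.

```c
#include <stddef.h>

/* Sizes quoted in the commit message for the compiled RHEL9 kernel. */
#define VMSTATS_PERCPU_SIZE 1080 /* struct memcg_vmstats_percpu */
#define LRUVEC_PERCPU_SIZE   672 /* struct lruvec_stats_percpu */

/* Per-CPU bytes pinned by one memcg: per CPU, one vmstats copy plus
 * one lruvec stats copy for every NUMA node. */
static size_t memcg_percpu_bytes(size_t nr_cpus, size_t nr_nodes)
{
	return nr_cpus * (VMSTATS_PERCPU_SIZE + nr_nodes * LRUVEC_PERCPU_SIZE);
}
```

For 96 CPUs and 2 nodes this gives 96 × (1080 + 2 × 672) = 96 × 2424 = 232,704 bytes. Dividing by the quoted 3,338 bytes of slab data yields the ratio of 69 (integer division). Because the cost is linear in the CPU count, the ratio grows on larger machines.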
By freeing the percpu memory, the dying memory cgroups will now consume
much less memory than before.
This patch does introduce a bit of performance overhead to memcg stat
updates, especially in __mod_memcg_lruvec_state().
This RHEL-only patch will be reverted when the upstream fix is finalized
and merged into RHEL9.
Signed-off-by: Waiman Long <longman@redhat.com>
Approved-by: Rafael Aquini <aquini@redhat.com>
Approved-by: Aristeu Rozanski <arozansk@redhat.com>
Approved-by: Phil Auld <pauld@redhat.com>
Signed-off-by: Jan Stancek <jstancek@redhat.com>

2 files changed, 125 insertions(+), 7 deletions(-)