Who stole my VM's memory? #2516
My server's memory is gone
pcpu_get_vm_areas has allocated far more memory than expected
Container Linux Version
Azure VM Standard DS3 (4 cores, 14 GiB memory)
Kernel memory leak?
Have no idea
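One way to confirm that pcpu_get_vm_areas is the allocation site holding the memory is to total the entries in /proc/vmallocinfo by caller. The following is a minimal sketch, assuming the usual record layout of that file (address range, size in bytes, then the caller symbol); the function names are hypothetical and not part of any tool mentioned in this thread.

```python
import re
from collections import defaultdict

# Assumed record layout of /proc/vmallocinfo:
#   <start>-<end> <size in bytes> <caller+offset/length> [extra fields...]
LINE_RE = re.compile(r"^0x[0-9a-fA-F]+-0x[0-9a-fA-F]+\s+(\d+)\s+(\S+)")


def vmalloc_bytes_by_caller(text):
    """Sum vmalloc area sizes per caller symbol (hypothetical helper)."""
    totals = defaultdict(int)
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that does not look like a record
        size = int(match.group(1))
        # Drop the "+0x0/0x5b0" suffix so all call offsets share one key.
        symbol = match.group(2).split("+", 1)[0]
        totals[symbol] += size
    return dict(totals)


def top_callers(text, n=5):
    """Return the n callers holding the most bytes, largest first."""
    totals = vmalloc_bytes_by_caller(text)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


if __name__ == "__main__":
    try:
        # Reading this file usually requires root.
        with open("/proc/vmallocinfo") as f:
            data = f.read()
    except OSError as err:
        print(f"cannot read /proc/vmallocinfo: {err}")
    else:
        for symbol, size in top_callers(data):
            print(f"{size / 2**20:10.1f} MiB  {symbol}")
```

Running it periodically as root and comparing the pcpu_get_vm_areas total over time would show whether that figure keeps growing.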
The upstream v4.15 and v4.19 kernels have the same or a similar issue, and it looks like people there have already written patches:
Comment 7 for bug 1792349
Here the kernel version is v4.14.11-coreos, but obviously some of the patches listed in the above link are also required.
Can CoreOS please integrate these upstream fixes and generate a new build for songyingjun to test?
Thanks for tracking this down.
With the exception of critical security bugs, Container Linux typically waits for patches to make it into kernel releases before including them in our own releases. Bad backports happen and we want to be sure they're reviewed/tested by the upstream kernel folks first.
Do you know if there are plans to backport these to 4.14.x upstream?
I believe this issue has explained some very odd behavior we've been seeing in our Kubernetes cluster recently. We run around 100 nodes that use Container Linux and run thousands of pods. Our infrastructure has been experiencing unexplained OOM events across the board recently, despite nothing changing on the workload or resourcing side.
Do you have any estimates for how long that process typically takes? If this is a fix that could take on the order of weeks to months to have a patch for, we will need to switch our nodes to run a different OS as this is killing our application.
Once the needed patches land in a 4.14.x and a 4.19.x release, we can do a release with those new kernels in beta/alpha. After the changes have baked in beta for a bit (~1 week), we can promote that to stable.
I asked about the mm patches: https://www.lkml.org/lkml/2018/11/2/160
The other patches have the "Cc: firstname.lastname@example.org" tag, so they will be automatically backported to the longterm kernels, including v4.14.y.
It looks like the ipv6 fix (ipv6: fix memory leak on dst->_metrics) doesn't apply to v4.14.y.
But I guess the required backport for v4.14.y will not be finished quickly; it may take at least a few weeks.