
Fix memory leak caused by object pool and zero copy#2839

Merged
bboreham merged 2 commits into cortexproject:master from storyicon:master
Jul 7, 2020

Conversation

@storyicon
Contributor

@storyicon storyicon commented Jul 6, 2020

What this PR does:
The purpose of this PR is to fix a memory leak in the Cortex Ingester. After it is merged, it should reduce the Ingester's runtime memory by at least 30%.

Which issue(s) this PR fixes:
Fixes #2665

Although the PR code looks very simple, the principle behind it is more complicated. I have tried to make it as easy to understand as possible; if any part is wrong, please correct me.
While troubleshooting the Ingester's memory problem, I simplified Ingester.Push so that it does almost nothing, like below:

// Push implements client.IngesterServer
func (i *Ingester) Push(ctx context.Context, req *client.WriteRequest) (*client.WriteResponse, error) {
    defer client.ReuseSlice(req.Timeseries)
    return &client.WriteResponse{}, nil
}

Even with the above modification, the Ingester still used a very large amount of memory (60GB+) until it OOMed, so I concluded the problem must be in client.ReuseSlice.

However, when I commented out the reuse of TimeSeries, the problem was solved: the Ingester used only 700MB of memory and was very stable, like the following:

func ReuseTimeseries(ts *TimeSeries) {
	// ts.Labels = ts.Labels[:0]
	// ts.Samples = ts.Samples[:0]
	// timeSeriesPool.Put(ts)
}

By looking closely at the code, I found the source of the problem. First, in ReuseTimeseries:

ts.Labels = ts.Labels[:0]
ts.Samples = ts.Samples[:0]

It does not release the real underlying array. We know that slice’s real structure is as follows:

type sliceHeader struct {
    Length        int
    Capacity      int
    ZerothElement *byte
}

The above operation only sets Length in the sliceHeader to 0; the underlying array still cannot be released. Secondly, in the deserialization of a Label you can see:

bs.Value = yoloString(dAtA[iNdEx:postIndex])

func yoloString(buf []byte) string {
	return *((*string)(unsafe.Pointer(&buf)))
}

Deserialization uses zero copy: Label.Name and Label.Value refer to the original []byte, which prevents the original request body from being released.

This may be a bit abstract, so here is an example:

Request A's body is 1MB of []byte in total (I call it RawSliceA). After Unmarshal, timeSeriesA0 contains 10 Labels. Due to the use of zero copy, these 10 Labels actually refer into RawSliceA.
After request A is processed, timeSeriesA0 is put into timeSeriesPool, and at this point it still holds references into RawSliceA.
Request B's body is 500KB of []byte in total. After Unmarshal, timeSeriesB0 contains 3 Labels. During Unmarshal, timeSeriesA0 is taken from timeSeriesPool and reused. Is RawSliceA released at this point?
No. RawSliceA may not be released until request B has been fully processed, or even until the next request.
The whole process is like this:
image

This example is a bit similar to golang/go#23199.

For the problems mentioned above, I have the following thoughts:

  1. There is no problem using zero copy like yoloString, because most of the bytes of RawSlice describe Labels, so the trade-off is cost-effective.
  2. There is no problem using a memory pool for timeSeries; it lets the Labels and Samples slices be reused.
  3. Even though the GC periodically cleans up the pool, a large number of consecutive requests will still waste a lot of memory.
  4. Before putting a timeSeries into timeSeriesPool, we must clear the Name and Value of timeSeries.Labels to drop the references into RawSlice.

After testing, this modification greatly reduces the runtime memory of the Ingester without introducing new problems.

before:
image

after:
image
(The illustration above was made after commenting out Push's real logic)

Signed-off-by: storyicon <storyicon@foxmail.com>
Contributor

@jtlisi jtlisi left a comment

Great work figuring this out. This looks good to me and I’m excited to see what effects this change has on our environments.

Contributor

@gouthamve gouthamve left a comment

Looks sane! Thanks for the deep dive into this!

@bboreham
Contributor

bboreham commented Jul 7, 2020

May I ask how large the incoming messages are? E.g. how many samples do you allow per Push()?

// ReuseTimeseries puts the timeseries back into a sync.Pool for reuse.
func ReuseTimeseries(ts *TimeSeries) {
	for i := 0; i < len(ts.Labels); i++ {
		ts.Labels[i].Name = ""
Contributor

Suggest // Name and Value may point into a large gRPC buffer, so clear the reference to allow GC

Contributor Author

I accepted your suggestion, thanks!

Signed-off-by: storyicon <storyicon@foxmail.com>
@storyicon
Contributor Author

storyicon commented Jul 7, 2020

@bboreham Hello, in my production environment, Prometheus's scrape_interval is 25s; Cortex receives 53 Pushes per second on average, and each Push contains 5530 timeseries on average. Among them:

  1. A single TimeSeries has up to 16 Labels and at least 1 Label
  2. Each TimeSeries contains only 1 sample.

On average, the buffer for each Push request is 1,211,066 bytes, of which samples occupy 88,480 bytes and labels occupy 1,122,582 bytes.
At present, my Cortex uses block storage + S3, and it always OOMs after running for about 10h. I want to reduce scrape_interval to 10s while keeping Cortex's memory stable. I use three 252GB, 48-core Linux machines, and replication_factor is 2 (configuring 3 makes it OOM faster).
If you can give me some suggestions, I would appreciate it.

@storyicon
Contributor Author

storyicon commented Jul 7, 2020

Reproducing this problem is simple; here is a demo:

Test0

package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
	"sync"
	"time"
	"unsafe"
)

var timeSeriesPool = sync.Pool{
	New: func() interface{} {
		return &TimeSeries{
			Labels: make([]Label, 0, 10),
		}
	},
}

func yoloString(buf []byte) string {
	return *((*string)(unsafe.Pointer(&buf)))
}

type TimeSeries struct {
	Labels []Label
}

type Label struct {
	Name  string
	Value string
}

func unmarshal(data []byte, labelsSize int) *TimeSeries {
	ts := timeSeriesPool.Get().(*TimeSeries)
	for i := 0; i < labelsSize; i++ {
		ts.Labels = append(ts.Labels, Label{
			Name:  yoloString(data[:5]),
			Value: yoloString(data[5:10]),
		})
	}
	return ts
}

func reuseTimeSeries(ts *TimeSeries) {
	// for i := 0; i < len(ts.Labels); i++ {
	// 	ts.Labels[i].Name = ""
	// 	ts.Labels[i].Value = ""
	// }
	ts.Labels = ts.Labels[:0]
	timeSeriesPool.Put(ts)
}

func Push(data []byte, labelSize int, withGC bool) {
	ts := unmarshal(data, labelSize)
	defer reuseTimeSeries(ts)
	if withGC {
		// Manually trigger GC
		runtime.GC()
		// Return memory to operating system
		debug.FreeOSMemory()
	}
	// Reference variables to prevent memory optimization
	fmt.Println("Push Done", ts)
}

func buildPushBuffer(size int, fillByte byte) []byte {
	data := make([]byte, size)
	for i := 0; i < len(data); i++ {
		// Fill bytes to prevent Unix memory optimization.
		// This is not needed on windows.
		data[i] = fillByte
	}
	return data
}

func main() {
	// allocate 500M of memory and pass it to Push
	Push(buildPushBuffer(1024*1024*500, 1), 10, false)
	// allocate another 500M and pass it to Push
	Push(buildPushBuffer(1024*1024*500, 1), 1, true)

	// sleep so memory usage can be inspected
	fmt.Println("sleep")
	time.Sleep(time.Hour)
}

Running the above code, you can see that the program's memory usage is 1GB instead of 500MB; the principle is the same as in the picture above.

Test1

Uncomment the following code in Test0 and re-run:

// for i := 0; i < len(ts.Labels); i++ {
// 	ts.Labels[i].Name = ""
// 	ts.Labels[i].Value = ""
// }

You can see that the memory usage is 500MB instead of 1GB, because the zero-copy references to the original buffer are cleared before the timeSeries is put into timeSeriesPool.

Test2

Change the following code in Test0

ts.Labels = append(ts.Labels, Label{
    Name:  yoloString(data[:5]),
    Value: yoloString(data[5:10]),
})

into:

ts.Labels = append(ts.Labels, Label{
    Name:  string(data[:5]),
    Value: string(data[5:10]),
})

The memory usage should be only around 30MB, because no zero copy is used: the ts returned by unmarshal no longer refers to the original buffer, so the two 500MB buffers are cleaned up by the GC.

@bboreham
Contributor

bboreham commented Jul 7, 2020

(sorry my comments are somewhat duplicated in #2665)

I understand the behaviour in Go of the part you fixed. What I don't understand is the gap between a Go heap of 1-3GB and an RSS of 90+GB.

In my Cortex clusters RSS is between 30% and 50% more than max Go heap.

Contributor

@pracucci pracucci left a comment

Thanks for working on this! Approving because I don't see any downside to doing it, but I would be glad to learn more, because it's quite surprising to me that you're getting this much benefit.

A few questions, please:

  1. Could you share a heap pprof showing the memory utilization without the fix?
  2. Could you share a graph to show how frequently the Go GC runs before and after the fix?

@storyicon
Contributor Author

@pracucci I'm sorry, I will provide this data later, because it is already late at night in my city.

@storyicon
Contributor Author

@bboreham Hello, I replied in #2665 about the reason for the difference in our pprof data.

@bboreham bboreham merged commit 4dce560 into cortexproject:master Jul 7, 2020
@storyicon
Contributor Author

storyicon commented Jul 8, 2020

@pracucci The GC gap is huge and even exceeds my expectations.

Before Fix

image

After Fix

image

In order to rule out other influences, I modified the Push function like this during the whole test (both Before Fix and After Fix):

func (i *Ingester) Push(ctx context.Context, req *client.WriteRequest) (*client.WriteResponse, error) {
	defer client.ReuseSlice(req.Timeseries)
	return &client.WriteResponse{}, nil
}

The only difference between Before Fix and After Fix is that After Fix adds:

for i := 0; i < len(ts.Labels); i++ {
    ts.Labels[i].Name = ""
    ts.Labels[i].Value = ""
}

That is, the content submitted in this PR.

The latest release version (cd9e38d) was used in the test.
Although this optimization looks exciting, at present I cannot guarantee that Cortex will still enjoy such a large memory advantage once the normal logic of the Push function is restored, because I suspect there are other object pools in the Ingester holding references to memory that should be released.

@pracucci
Contributor

pracucci commented Jul 8, 2020

These charts look like they prove my gut feeling: what you actually get is the GC running far more frequently. This explains why the "leak" is not visible in the heap; it's all about how often the GC runs.

@storyicon
Contributor Author

storyicon commented Jul 8, 2020

@pracucci Your gut feeling is really good. Can you explain why the GC makes the number of heap objects drop but fails to reduce the ingester's resident memory? If I use a ticker to trigger GC manually (including debug.FreeOSMemory()) frequently, do you think I can keep the memory at 500MB instead of 50GB?

@pracucci
Contributor

pracucci commented Jul 8, 2020

@pracucci Your gut feeling is really good! Can you explain why the GC makes the number of heap objects drop but fails to reduce the ingester's resident memory? If I use a ticker to trigger GC manually (including debug.FreeOSMemory()) frequently, do you think I can keep the memory at 500MB instead of 50GB?

I don't have an answer (I would need to further investigate it, possibly being able to reproduce it locally). Manually triggering the GC could be an experiment you can do just for testing, but if this PR fixes your issue then I would just be happy and move on 😉

@storyicon
Contributor Author

@pracucci Your gut feeling is really good! Can you explain why the GC makes the number of heap objects drop but fails to reduce the ingester's resident memory? If I use a ticker to trigger GC manually (including debug.FreeOSMemory()) frequently, do you think I can keep the memory at 500MB instead of 50GB?

I don't have an answer (I would need to further investigate it, possibly being able to reproduce it locally). Manually triggering the GC could be an experiment you can do just for testing, but if this PR fixes your issue then I would just be happy and move on 😉

@pracucci
These things are fascinating. I will run some experiments to find out. Pracucci, you were the first person to reply when I raised an issue in the Cortex project. Thank you! 😔

@bboreham
Contributor

bboreham commented Jul 8, 2020

Go has a mechanism to control how fast GC runs - the GOGC environment variable.
https://golang.org/pkg/runtime/debug/#SetGCPercent

Note frequent GC may chew up a lot of CPU.

@storyicon
Contributor Author

Go has a mechanism to control how fast GC runs - the GOGC environment variable.
https://golang.org/pkg/runtime/debug/#SetGCPercent

Note frequent GC may chew up a lot of CPU.

@bboreham
Thanks, bboreham. I will continue to track this problem. If there is an error in my statement of the principle above, I hope you can correct me. In fact, I don't care about who is right, only about what is true. ☹️

@bboreham
Contributor

bboreham commented Jul 8, 2020

If there is an error in my statement of principle above, I hope you can correct me.

Sorry I don't know what this refers to.

@storyicon
Contributor Author

If there is an error in my statement of principle above, I hope you can correct me.

Sorry I don't know what this refers to.

Sorry, English is not my mother tongue. I meant the explanation I gave under "Which issue(s) this PR fixes".

@bboreham
Contributor

bboreham commented Jul 8, 2020

Fixing the issue is fine; thank you for the PR.

My last comment referred to manually running GC, which is a totally different subject.

@storyicon
Contributor Author

storyicon commented Jul 8, 2020

Fixing the issue is fine; thank you for the PR.

My last comment referred to manually running GC, which is a totally different subject.

I understand. I haven't tried using this function to control GC behavior; it is a new idea to me. I will keep digging into it (including manually triggered GC tests), thank you!


Successfully merging this pull request may close these issues.

About the memory of ingester

5 participants