Decompression uses too much memory #1239

jarifibrahim · 2020-03-03T09:02:00Z

Badger uses compression to reduce disk space. When blocks are decompressed, they take up too much memory.

Lines 625 to 629 in 5b4c0a6

    case options.Snappy:
        return snappy.Decode(nil, data)
    case options.ZSTD:
        return y.ZSTDDecompress(nil, data)
    }

The crux of the problem is that the decompression call allocates a new block every time we decompress an existing block (both calls above pass nil as the destination buffer). An ideal fix would be to find a way to reuse these blocks of memory.
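A minimal sketch of why passing nil matters, assuming a decoder with the same destination contract documented for github.com/golang/snappy's Decode (the result is written into dst when dst is large enough, otherwise a new slice is allocated). rleDecode is a hypothetical run-length codec used only so the example runs with the standard library; it is not Badger's or Snappy's code.

```go
package main

import "fmt"

// rleDecode is a hypothetical stand-in codec: src is a sequence of
// (count, byte) pairs. Like snappy.Decode, it writes into dst when dst has
// enough capacity and allocates a new slice otherwise.
func rleDecode(dst, src []byte) []byte {
	n := 0
	for i := 0; i+1 < len(src); i += 2 {
		n += int(src[i])
	}
	if cap(dst) < n {
		dst = make([]byte, n) // a nil dst always ends up here
	}
	dst = dst[:n]
	pos := 0
	for i := 0; i+1 < len(src); i += 2 {
		for j := 0; j < int(src[i]); j++ {
			dst[pos] = src[i+1]
			pos++
		}
	}
	return dst
}

func main() {
	src := []byte{3, 'a', 2, 'b'} // decodes to "aaabb"

	// Allocates on every call, like snappy.Decode(nil, data).
	fresh := rleDecode(nil, src)

	// Reuses one buffer: no allocation once it is large enough.
	buf := make([]byte, 0, 64)
	reused := rleDecode(buf, src)

	// The reused result shares buf's backing array.
	fmt.Println(string(fresh), string(reused), &reused[0] == &buf[:1][0])
}
```

Reusing the destination only helps if the caller can tell when the previous contents are no longer needed, which is what the attempts below try to solve.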

Taken from chat: https://dgraph.slack.com/archives/C13LH03RR/p1583223673220200?thread_ts=1583216894.212400&cid=C13LH03RR

jarifibrahim · 2020-04-22T14:33:20Z

I've tried two different approaches to reduce memory usage, but neither of them works.

Attempt 1 - PR #1247: Reuse the original byte slice (the one read from the table). The problem with this approach is that the SST is mmapped, so if the slice of bytes read from the table is put into the sync pool, we might end up modifying the original contents of the file. This was seen in #1247.
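A small illustration of the hazard in Attempt 1, assuming a zero-copy read returns a sub-slice of the mapped file. A plain byte slice stands in for the mmapped SST so the example runs anywhere, and decodeInto is a hypothetical helper that writes into dst's backing array when it has room, as a pooled buffer would be used. With a real read-only mapping, the write would fault rather than silently change the data.

```go
package main

import "fmt"

// decodeInto is a hypothetical helper: it writes decoded bytes into dst,
// reusing dst's backing array when it has enough capacity.
func decodeInto(dst, decoded []byte) []byte {
	return append(dst[:0], decoded...)
}

func main() {
	// fileData stands in for an mmapped SST file.
	fileData := []byte("AAAABBBB")

	// A zero-copy read returns a sub-slice that aliases the file's memory.
	block := fileData[0:4]

	// If that slice later came back out of a pool and were reused as a
	// decompression destination, the write would land in the file's memory.
	decodeInto(block, []byte("ZZZZ"))
	fmt.Println(string(fileData)) // prints "ZZZZBBBB"
}
```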

Attempt 2 - PR #1308: Reuse the byte slice after the block iterator is done processing it. This also has a strange issue that I haven't been able to figure out. In simple terms, the contents of a block change when the slices are reused. This is definitely a problem with the implementation in #1308, but I haven't been able to fix it despite spending significant time debugging it.
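One common way the "contents of a block change" symptom arises with pooled buffers is a use-after-release: something still holds a sub-slice of a buffer after it has been returned to the pool. This sketch shows that pattern only; it is not a claim about the actual cause in #1308. freeList is a hypothetical, deterministic LIFO pool used instead of sync.Pool so the reuse is predictable.

```go
package main

import "fmt"

// freeList is a hypothetical LIFO buffer pool (deterministic, unlike sync.Pool).
type freeList struct{ bufs [][]byte }

func (f *freeList) get() []byte {
	if n := len(f.bufs); n > 0 {
		b := f.bufs[n-1]
		f.bufs = f.bufs[:n-1]
		return b[:0]
	}
	return make([]byte, 0, 64)
}

func (f *freeList) put(b []byte) { f.bufs = append(f.bufs, b) }

func main() {
	var pool freeList

	// Block 1 is decoded into a pooled buffer; a caller keeps a zero-copy
	// key that is a sub-slice of that buffer.
	b1 := append(pool.get(), "key1=val1"...)
	key := b1[:4]
	keyCopy := append([]byte(nil), b1[:4]...) // owns its own memory

	// The buffer is returned while key still points into it.
	pool.put(b1)

	// Block 2 reuses the same backing array and overwrites it.
	_ = append(pool.get(), "XXXX=yyyy"...)

	fmt.Println(string(key), string(keyCopy)) // prints "XXXX key1"
}
```

Copying data out before release, or delaying release until no references remain, avoids this pattern.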

This commit uses a sync pool to hold the decompression buffers. A buffer is added to the pool only if it was used for decompression. We don't want to put buffers that were not used for decompression into the pool, because these buffers are read from mmapped SST files and any changes to them would lead to a segfault. Fixes #1239


jarifibrahim added kind/enhancement Something could be better. priority/P0 Critical issue that requires immediate attention. area/performance Performance related issues. status/accepted We accept to investigate or work on it. labels Mar 3, 2020

jarifibrahim self-assigned this Mar 3, 2020

jarifibrahim changed the title ~~Badger uses too much of memory~~ Decompression uses too much of memory Mar 3, 2020

jarifibrahim changed the title ~~Decompression uses too much of memory~~ Decompression uses too much memory Mar 3, 2020

jarifibrahim mentioned this issue Mar 5, 2020

Reuse buffer for decompression #1247

Closed

looztra mentioned this issue Apr 10, 2020

Excessive memory consumption? salesforce/sloop#112

Closed

jarifibrahim mentioned this issue Apr 17, 2020

Buffer pool for decompression #1308

Merged

jarifibrahim closed this as completed in #1308 May 13, 2020
