Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decompression uses too much memory #1239

Closed
jarifibrahim opened this issue Mar 3, 2020 · 1 comment · Fixed by #1308
Closed

Decompression uses too much memory #1239

jarifibrahim opened this issue Mar 3, 2020 · 1 comment · Fixed by #1308
Assignees
Labels
area/performance Performance related issues. kind/enhancement Something could be better. priority/P0 Critical issue that requires immediate attention. status/accepted We accept to investigate or work on it.

Comments

@jarifibrahim
Copy link
Contributor

Badger uses compression to reduce disk space. When the block are decompressed, it takes up too much memory.

badger/table/table.go

Lines 625 to 629 in 5b4c0a6

case options.Snappy:
return snappy.Decode(nil, data)
case options.ZSTD:
return y.ZSTDDecompress(nil, data)
}

The crux of the problem is that the decompression call allocates new block every time we want to decompress an existing block. An ideal fix here would be to find a way to reuses the blocks of memory.
image

Taken from chat: https://dgraph.slack.com/archives/C13LH03RR/p1583223673220200?thread_ts=1583216894.212400&cid=C13LH03RR

@jarifibrahim jarifibrahim added kind/enhancement Something could be better. priority/P0 Critical issue that requires immediate attention. area/performance Performance related issues. status/accepted We accept to investigate or work on it. labels Mar 3, 2020
@jarifibrahim jarifibrahim self-assigned this Mar 3, 2020
@jarifibrahim jarifibrahim changed the title Badger uses too much of memory Decompression uses too much of memory Mar 3, 2020
@jarifibrahim jarifibrahim changed the title Decompression uses too much of memory Decompression uses too much memory Mar 3, 2020
@jarifibrahim
Copy link
Contributor Author

I've tried two different approaches to reduce the memory but both of them aren't working.

Attempt 1 - PR #1247 Reuse the original byte slice (the one read from the table). The problem with this approach is that the sst is mmap and if the slice of bytes read from the table is put into the sync pool, we might end up modifying the original contents of the file. This was seen in #1247 .

Attempt 2 - PR #1308 . Attempt to reuse the byte slice after the block iterator is done processing it. This also has some strange issue which I haven't been able to figure out. In simple terms, the contents of block change if the slices are reused. This is definitely an issue with the implementation in #1308 but I haven't been able to fix it. I've spent significant time trying to debug this but haven't found the issue yet.

jarifibrahim pushed a commit that referenced this issue May 13, 2020
This commit uses a sync pool to hold the decompression buffers. A buffer
is added to the pool only if it was used for decompression. We don't
want to put buffers that were not used for decompression because these
buffers are read from mmaped SST files and any changes to these buffers
would lead to a segfault.

Fixes #1239
manishrjain pushed a commit to outcaste-io/outserv that referenced this issue Jul 6, 2022
This commit uses a sync pool to hold the decompression buffers. A buffer
is added to the pool only if it was used for decompression. We don't
want to put buffers that were not used for decompression because these
buffers are read from mmaped SST files and any changes to these buffers
would lead to a segfault.

Fixes dgraph-io/badger#1239
acodereviewersbestfriend52 added a commit to acodereviewersbestfriend52/badger that referenced this issue Aug 3, 2024
This commit uses a sync pool to hold the decompression buffers. A buffer
is added to the pool only if it was used for decompression. We don't
want to put buffers that were not used for decompression because these
buffers are read from mmaped SST files and any changes to these buffers
would lead to a segfault.

Fixes dgraph-io/badger#1239
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/performance Performance related issues. kind/enhancement Something could be better. priority/P0 Critical issue that requires immediate attention. status/accepted We accept to investigate or work on it.
Development

Successfully merging a pull request may close this issue.

1 participant