Badger Allocates A Lot Of Memory When Iterating Over Large Key Value Stores #1326

Closed · bonedaddy opened this issue May 10, 2020 · 12 comments
Labels: area/performance, kind/enhancement, priority/P2, status/accepted


bonedaddy commented May 10, 2020

What version of Go are you using (go version)?

go version go1.14.2 linux/amd64

What operating system are you using?

NAME="Ubuntu"
VERSION="18.04.4 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.4 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

What version of Badger are you using?

v2.0.3

Does this issue reproduce with the latest master?

Haven't tried

Steps to Reproduce the issue

  1. Store a ton of data in your key-value store (in this case 1.7TB)
  2. Restart badger
  3. After service startup iterate over all keys in the key-store

What Badger options were set?

Default options with the following modifications:

	DefaultOptions = Options{
		GcDiscardRatio: 0.2,
		GcInterval:     15 * time.Minute,
		GcSleep:        10 * time.Second,
		Options:        badger.DefaultOptions(""),
	}
	DefaultOptions.Options.CompactL0OnClose = false
	DefaultOptions.Options.Truncate = true

I've also set the following (see the sketch after this list for how these map onto badger v2 options):

  • ValueLogLoadingMode = FileIO
  • TableLoadingMode = FileIO
  • SyncWrites = false
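
For reference, a rough sketch of how those settings map onto the badger v2 options builder (the path below is a placeholder, not my actual configuration):

    package main

    import (
        badger "github.com/dgraph-io/badger/v2"
        "github.com/dgraph-io/badger/v2/options"
    )

    // openBadger mirrors the options described above.
    func openBadger() (*badger.DB, error) {
        opts := badger.DefaultOptions("/path/to/badger").
            WithValueLogLoadingMode(options.FileIO). // read the value log with pread instead of mmap
            WithTableLoadingMode(options.FileIO).    // read SSTables with pread instead of mmap
            WithSyncWrites(false).
            WithTruncate(true).
            WithCompactL0OnClose(false)
        return badger.Open(opts)
    }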

What did you do?

At the start of my service, the key-value store is iterated over to announce its contents to peers. Unfortunately, when the store holds a large amount of data (in this case 1.7TB), iterating over it allocates a large amount of memory.

What did you expect to see?

Being able to iterate over the keys without allocating a large amount of memory

What did you see instead?

2GB+ of allocations when iterating over all the keys in a large datastore of 1.7TB

Additional Information

I recorded the following profile which shows what's responsible for the memory allocations:

 2239.12MB 57.90% 57.90%  2239.12MB 57.90%  github.com/RTradeLtd/go-datastores/badger.(*txn).query
  687.09MB 17.77% 75.66%   687.09MB 17.77%  github.com/dgraph-io/badger/v2/table.(*Table).read
  513.05MB 13.27% 88.93%  1139.44MB 29.46%  github.com/RTradeLtd/go-datastores/badger.(*txn).query.func1
   83.20MB  2.15% 91.08%    83.20MB  2.15%  github.com/dgraph-io/badger/v2/skl.newArena
   69.16MB  1.79% 92.87%   109.17MB  2.82%  github.com/dgraph-io/badger/v2/pb.(*TableIndex).Unmarshal
      40MB  1.03% 93.90%       40MB  1.03%  github.com/dgraph-io/badger/v2/pb.(*BlockOffset).Unmarshal

It looks like this is because I have a function that iterates over all the keys in the key-value store to broadcast them to another peer. I'm not sure why this would result in a massive amount of memory being allocated, though.
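
For what it's worth, the iteration boils down to roughly this pattern (announceKeys and the out channel are placeholder names for illustration, not the actual go-datastores code):

    // announceKeys walks every key in the store without prefetching values.
    func announceKeys(db *badger.DB, out chan<- []byte) error {
        return db.View(func(txn *badger.Txn) error {
            itOpts := badger.DefaultIteratorOptions
            itOpts.PrefetchValues = false // keys only; values stay on disk
            it := txn.NewIterator(itOpts)
            defer it.Close()
            for it.Rewind(); it.Valid(); it.Next() {
                // KeyCopy is needed because the key is only valid until Next() is called.
                out <- it.Item().KeyCopy(nil)
            }
            return nil
        })
    }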

This seems somewhat related to other reported issues such as #1268. Using FileIO for the table and value log loading modes seems to decrease memory usage a bit, but it still looks like the overall process of reading keys and/or values from badger requires a lot of memory.

@bonedaddy bonedaddy changed the title from Badger Query Function Allocates A Lot Of Memory At Startup When Storing Large Amounts Of Data to Badger Allocates A Lot Of Memory When Iterating Over Large Key Value Stores on May 10, 2020
@jarifibrahim (Contributor)

@bonedaddy The high memory usage you're seeing comes from

badger/table/table.go, lines 323 to 325 at 9459a24:

res := make([]byte, sz)
nbr, err := t.fd.ReadAt(res, int64(off))
y.NumReads.Add(1)

I would've expected the Go GC to take care of the allocation/reclamation, but maybe since we're allocating so many []byte slices, the GC doesn't run that often. I think we can optimize this by having one block buffer per table, so that it is reused across multiple t.read calls.
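
Something along these lines, as a rough illustration of the idea rather than the actual badger code (ignoring that a shared buffer would need care around concurrent readers and callers that retain the slice):

    type table struct {
        fd  *os.File
        buf []byte // reused across read calls instead of allocating a fresh slice each time
    }

    func (t *table) read(off, sz int) ([]byte, error) {
        if cap(t.buf) < sz {
            t.buf = make([]byte, sz)
        }
        res := t.buf[:sz]
        _, err := t.fd.ReadAt(res, int64(off))
        return res, err
    }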

Do you see high memory usage when you open file in memory map mode? The operating system should take care of moving memory-mapped pages from your memory to your disk when the memory usage is high.

@jarifibrahim jarifibrahim added the area/performance, kind/enhancement, priority/P2, and status/accepted labels on May 11, 2020

bonedaddy commented May 11, 2020

Ah, makes sense. A reusable buffer would definitely reduce memory a ton.

Do you see high memory usage when you open file in memory map mode? The operating system should take care of moving memory-mapped pages from your memory to your disk when the memory usage is high.

I tried setting table loading to MemoryMap while keeping value log loading at FileIO, and that increased memory consumption by a lot more. Previously my service was consuming 2.4GB of memory after iterating over all the keys with both table and value log loading set to FileIO; with MemoryMap table loading and FileIO value log loading, memory jumped to 3.5GB.

@jarifibrahim (Contributor)

@bonedaddy You should also set the KeepL0InMemory option to false. That would reduce the memory consumption by around 600 MB.
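
Roughly, on top of the options you already have:

    opts = opts.WithKeepL0InMemory(false) // don't keep level 0 tables pinned in RAM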

@bonedaddy (Author)

@bonedaddy You should also set the KeepL0InMemory option to false. That would reduce the memory consumption by around 600 MB.

Yep, even with that: using mmap for table loading consumed the 3.5GB of memory, while table and value log loading both at FileIO with KeepL0InMemory set to false consumed the 2.4GB.

@jarifibrahim (Contributor)

@bonedaddy can you get a memory profile when the memory usage is high?
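
In case it helps, one common way to capture a heap profile from a running service is to expose the net/http/pprof endpoints and snapshot them while memory is high (the port here is arbitrary):

    package main

    import (
        "log"
        "net/http"
        _ "net/http/pprof" // registers the /debug/pprof handlers on the default mux
    )

    func main() {
        // While memory is high, grab a heap profile with:
        //   go tool pprof http://localhost:6060/debug/pprof/heap
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }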

@bonedaddy (Author)

@bonedaddy can you get a memory profile when the memory usage is high?

I can capture another one, but I included one when I opened the issue; I'll work on getting another profile though:

 2239.12MB 57.90% 57.90%  2239.12MB 57.90%  github.com/RTradeLtd/go-datastores/badger.(*txn).query
  687.09MB 17.77% 75.66%   687.09MB 17.77%  github.com/dgraph-io/badger/v2/table.(*Table).read
  513.05MB 13.27% 88.93%  1139.44MB 29.46%  github.com/RTradeLtd/go-datastores/badger.(*txn).query.func1
   83.20MB  2.15% 91.08%    83.20MB  2.15%  github.com/dgraph-io/badger/v2/skl.newArena
   69.16MB  1.79% 92.87%   109.17MB  2.82%  github.com/dgraph-io/badger/v2/pb.(*TableIndex).Unmarshal
      40MB  1.03% 93.90%       40MB  1.03%  github.com/dgraph-io/badger/v2/pb.(*BlockOffset).Unmarshal

@bonedaddy (Author)

Here's a capture from badger operating with table and value log loading modes at FileIO under high usage. This isn't the high memory usage reported from the query function, though; it's from me putting a lot of data into badger so I can then capture a profile of the high memory usage when iterating over the keys:

[image: badger_high_mem profile graph]

@bonedaddy (Author)

Some more profiles

[image: big_mem2 profile graph]

@jarifibrahim (Contributor)

Thanks @bonedaddy. I'll look into it and get back.

@bonedaddy (Author)

Thanks, let me know if I need to capture any more profiles.

@jarifibrahim (Contributor)

@bonedaddy How big are your values? Also, can you send me the memprofile file? The graph doesn't show the exact line which is consuming the memory.

@minhaj-shakeel

GitHub issues have been deprecated.
This issue has been moved to discuss. You can follow the conversation there and also subscribe to updates by changing your notification preferences.

