Added BadgerHold to Projects list #704

Merged
merged 1 commit into dgraph-io:master on Jan 30, 2019

@timshannon
Contributor

timshannon commented Jan 30, 2019

I also wrote up a quick comparison of my findings between BoltDB and Badger.

https://tech.townsourced.com/post/boltdb-vs-badger/

If you feel anything is inaccurate, let me know and I'll update my post accordingly.

Thanks,



@CLAassistant


CLAassistant commented Jan 30, 2019

CLA assistant check
All committers have signed the CLA.

@martinmr martinmr merged commit e44ea26 into dgraph-io:master Jan 30, 2019

2 of 3 checks passed:

- continuous-integration/appveyor/pr: AppVeyor build failed
- Unit Tests (badger): TeamCity build finished
- license/cla: Contributor License Agreement is signed
@martinmr

Contributor

martinmr commented Jan 30, 2019

I circulated the article among our team. I'll merge your PR for now.

@manishrjain

Member

manishrjain commented Jan 30, 2019

Hey @timshannon , reading the article. Have some comments:

> but I must admit, with all other things being equal, I’d prefer to manage an embedded database in a single file.

The benefit of having files be immutable is that they become rsync-friendly. That was an explicit requirement for LevelDB at Google. Quoting Sanjay Ghemawat (Google):

> It is actually easy to layer an incremental backup system on the side if files are not being mutated in place (periodically rsync the directory contents).


> Furthermore, you’ll always need to change those defaults because the data directories are stored in the options type.

That's intended. It allows flexibility in how you want Badger to behave, and we want users to have a look at the default options and tweak them.

> This type of query is much harder to implement in Badger for Update and Delete queries than in BoltDB due to this single iterator limit from within R/W transactions.

Interesting. That limitation is easy to fix. If you file an issue, we can follow up on that.

@timshannon

Contributor Author

timshannon commented Jan 30, 2019

Thanks for the feedback. I'll include those notes with the article, and open up an issue for the RW transactions.

@manishrjain

Member

manishrjain commented Jan 30, 2019

I'm also not sure about the memory issue -- I doubt Badger would hold onto 8GB of memory and cause an OOM. I tried running badger fill, but I only see usage in the hundreds of MBs, not GBs. It would be worth taking a Go heap profile and sharing it, to see where Badger is hogging memory -- that would also be a better way to determine the memory-usage difference between Badger and Bolt.

@timshannon

Contributor Author

timshannon commented Jan 30, 2019

Sure, I'll replicate where I was seeing the issue, and open an issue with the heap profile.

Keep in mind, though, that my scenario was opening and closing a database file for each test. I'm not sure how real-world that scenario is, or whether it's worth spending time on.

@manishrjain

Member

manishrjain commented Jan 30, 2019

We do that as well, open and close DB for each test:

func runBadgerTest(t *testing.T, opts *Options, test func(t *testing.T, db *DB)) {

@timshannon

Contributor Author

timshannon commented Jan 30, 2019

Interesting. It very likely is an issue on my end then. I'll look into it and get back to you.

Thanks,

@timshannon

Contributor Author

timshannon commented Jan 30, 2019

I haven't looked at them yet, but attached are the cpu.prof and mem.prof from a partial run of my test suite. I can't run the full suite, because I run out of memory. This run is from ~120 different tests.

cpu and mem.prof.zip

I'll open up an issue, if I can determine that it is something in Badger that's consuming so much memory.

Thanks,

@timshannon timshannon referenced this pull request Jan 30, 2019

Closed

Badger support #41

@manishrjain

Member

manishrjain commented Jan 30, 2019

Based on this, it seems like there are a lot of in-memory tables. It looks like you might have a lot of Badger DBs open, but not closed.

@timshannon

Contributor Author

timshannon commented Jan 30, 2019

The code that wraps every single test call is here:

https://github.com/timshannon/badgerhold/blob/31ac73476bd3e189123142f709d57a16c08cf64a/store_test.go#L211

It opens the database, runs the tests, then closes the database, and removes the directory.

Is there some sort of additional cleanup happening on another goroutine, or am I not closing the database correctly?

@manishrjain

Member

manishrjain commented Jan 31, 2019

Your tempdir() is using /tmp directory, which is typically served out of RAM.
https://github.com/timshannon/badgerhold/blob/31ac73476bd3e189123142f709d57a16c08cf64a/store_test.go#L234

We use the current directory:

dir, err := ioutil.TempDir(".", "badger-test")

@timshannon

Contributor Author

timshannon commented Jan 31, 2019

Yep, I thought the same thing. The process still gets killed after using all my memory when I use ioutil.TempDir(".", "badger-test").

I'll see if I can throw together a minimal gist that can replicate this. I'm not convinced it's an issue with Badger, but I appreciate your help.

Thanks,

@manishrjain

Member

manishrjain commented Jan 31, 2019

I presume you tried setting options.TableLoadingMode to options.MemoryMap?

@timshannon

Contributor Author

timshannon commented Feb 1, 2019

mem.zip

I've added some swap space and got a full run to complete. It looks like the real issue is with Update transactions. Memory seems to be properly reclaimed for the read-only queries, but not for the update and delete queries.

I have yet to be able to recreate this on its own outside of my library. I can open and close a bunch of databases with iterators and updates, and the memory usage stays low and constant.
