
Added BadgerHold to Projects list #704

Merged · 1 commit, Jan 30, 2019

Conversation

@timshannon (Contributor) commented Jan 30, 2019

I also wrote up a quick comparison of my findings between BoltDB and Badger.

https://tech.townsourced.com/post/boltdb-vs-badger/

If you feel anything is inaccurate, let me know and I'll update my post accordingly.

Thanks,



@CLAassistant commented Jan 30, 2019

CLA assistant check
All committers have signed the CLA.

@martinmr martinmr merged commit e44ea26 into dgraph-io:master Jan 30, 2019
@martinmr (Contributor)

I circulated the article among our team. I'll merge your PR for now.

@manishrjain (Contributor)

Hey @timshannon , reading the article. Have some comments:

but I must admit, with all other things being equal, I’d prefer to manage an embedded database in a single file.

The benefit of having files be immutable is that they become rsync-friendly. That was an explicit requirement for LevelDB at Google. Quoting Sanjay Ghemawat (Google):

It is actually easy to layer an incremental backup system on the side if files are not being mutated in place (periodically rsync the directory contents).


Furthermore, you’ll always need to change those defaults because the data directories are stored in the options type.

That's intended. It allows flexibility in how you want Badger to behave, and we want users to have a look at the default options and tweak them.

This type of query is much harder to implement in Badger for Update and Delete queries than in BoltDB due to this single iterator limit from within R/W transactions.

Interesting. That limitation is easy to fix. If you file an issue, we can follow up on that.

@timshannon (Contributor, Author)

Thanks for the feedback. I'll include those notes with the article, and open up an issue for the RW transactions.

@manishrjain (Contributor) commented Jan 30, 2019

I'm also not sure about the memory issue -- I doubt Badger would hold onto 8GB of memory and cause an OOM. I tried running badger fill, but only see usage in the hundreds of MBs, not GBs. It would be worth taking a Go heap profile and sharing it, so we can see where Badger is hogging memory -- that'd also be a better way to measure the memory-usage difference between Badger and Bolt.

@timshannon (Contributor, Author)

Sure, I'll replicate where I was seeing the issue, and open an issue with the heap profile.

Keep in mind, though, that my scenario was opening and closing a database file for each test. I'm not sure how realistic a scenario that is, or whether it's worth spending time on.

@manishrjain (Contributor) commented Jan 30, 2019

We do that as well; we open and close the DB for each test:

func runBadgerTest(t *testing.T, opts *Options, test func(t *testing.T, db *DB)) {

@timshannon (Contributor, Author)

Interesting. It very likely is an issue on my end then. I'll look into it and get back to you.

Thanks,

@timshannon (Contributor, Author)

I haven't looked at them yet, but attached are the cpu.prof and mem.prof from a partial run of my test suite. I can't run the full suite, because I run out of memory. This run is from ~120 different tests.

cpu and mem.prof.zip

I'll open up an issue if I can determine that it's something in Badger that's consuming so much memory.

Thanks,

@manishrjain (Contributor)

Based on this, it seems like there are a lot of in-memory tables. It looks like you might have a lot of Badger DBs open, but not closed.

@timshannon (Contributor, Author)

The code that wraps every single test call is here:

https://github.com/timshannon/badgerhold/blob/31ac73476bd3e189123142f709d57a16c08cf64a/store_test.go#L211

It opens the database, runs the tests, then closes the database, and removes the directory.

Is there some sort of additional cleanup that's happening on another go routine, or am I not closing the database correctly?

@manishrjain (Contributor)

Your tempdir() is using the /tmp directory, which is typically served out of RAM.
https://github.com/timshannon/badgerhold/blob/31ac73476bd3e189123142f709d57a16c08cf64a/store_test.go#L234

We use the current directory:

dir, err := ioutil.TempDir(".", "badger-test")

@timshannon (Contributor, Author)

Yep, I thought the same thing. The process still gets killed after using all my memory when I use ioutil.TempDir(".", "badger-test").

I'll see if I can throw together a minimal gist that can replicate this. I'm not convinced it's an issue with Badger, but I appreciate your help.

Thanks,

@manishrjain (Contributor)

I presume you tried setting options.TableLoadingMode to options.MemoryMap?

@timshannon (Contributor, Author)

mem.zip

I've added some swap space and got a full run to complete. It looks like the real issue is with Update transactions. The memory seems to be properly reclaimed on the read-only queries, but not on the update and delete queries.

I have yet to be able to recreate this on its own outside of my library. I can open and close a bunch of databases with iterators and updates, and the memory usage stays low and constant.
