
add badger db support #591

Open
joeblew99 opened this issue May 17, 2017 · 60 comments

@joeblew99

https://github.com/dgraph-io/badger

I am proposing that this DB be supported by bleve.
According to their benchmarks, Badger outperforms RocksDB, which is outstanding.

Here is a blog post with the benchmarks:
https://open.dgraph.io/post/badger/

My understanding is that they will be adding snapshotting also.

@mschoch
Member

mschoch commented May 17, 2017

If someone from the community wants to add support for it, that would be great.

Internally we're focusing on support for moss in the bleve 1.x line. And in bleve 2.x we'll most likely be moving to a custom format and not using a generic k/v store in the default index implementation.

@mschoch mschoch added this to the 1.X milestone May 17, 2017
@joeblew99
Author

OK, well, good to know the roadmap.
I don't have time to do it myself. Shall we leave it open?

@mschoch
Member

mschoch commented May 17, 2017

Sure, let's leave it open.

@mschoch mschoch reopened this May 17, 2017
@akhenakh

I'm planning on giving it a shot.

Anyone else working on it?

@bt

bt commented Jul 15, 2017

@akhenakh That would be awesome, please! :)

@akhenakh

@bt

bt commented Jul 25, 2017

Hey @akhenakh, great work so far!

I'm having a look at it and it's looking great. Was hoping I could help out.

I'm just taking a look at Badger and given that they don't have any plans to support snapshots (dgraph-io/badger#39), what's the plan here?

I believe they talked a bit about dumping the database concurrently, but the database won't be frozen in time.

@akhenakh

Yeah, the snapshot support is not going to happen soon.
I'm using the KV interface outside of Bleve, so it won't be an issue for me when used in specific places.

But I'm not sure how it would impact Bleve. Any clue, @bt?

@mschoch
Member

mschoch commented Jul 25, 2017

Pretty much all of the search side of Bleve requires a snapshot to work correctly. The reason is that multiple iterators are created, and they have to be guaranteed to see the same underlying data. If they don't, you'll get inconsistent search results.

A simple example would be a phrase search for "fat cat". If version 1 of a document has "fat man" and the new version of the document has "old cat", without snapshots it's possible that a search for the phrase "fat cat" is successful, even though neither the old nor the new version of the document ever contained that phrase.

You might think that atomic batches prevent this, but without snapshots, it's possible that the first iterator (looking for "fat") sees the old version of the document, and the second iterator (looking for "cat") sees the new version of the document.
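
A minimal toy sketch of that failure mode (illustrative Go only, not bleve internals): term lookups over a plain map acting as term → docID postings, with the document update landing between the two lookups and no snapshot isolation:

```go
package main

import "fmt"

func main() {
	// Version 1 of doc1 is "fat man".
	postings := map[string]map[string]bool{
		"fat": {"doc1": true},
		"man": {"doc1": true},
	}

	// "Iterator" 1 runs first and sees the OLD version: doc1 matches "fat".
	fatMatches := lookup(postings, "fat")

	// doc1 is now updated to "old cat": old postings removed, new ones added.
	delete(postings["fat"], "doc1")
	delete(postings["man"], "doc1")
	postings["old"] = map[string]bool{"doc1": true}
	postings["cat"] = map[string]bool{"doc1": true}

	// "Iterator" 2 runs afterwards and sees the NEW version: doc1 matches "cat".
	catMatches := lookup(postings, "cat")

	// Intersecting the two result sets wrongly reports doc1 as matching the
	// phrase "fat cat", even though neither version ever contained it.
	for id := range fatMatches {
		if catMatches[id] {
			fmt.Println("false phrase match:", id)
		}
	}
}

// lookup copies the set of docIDs currently indexed under term.
func lookup(p map[string]map[string]bool, term string) map[string]bool {
	out := map[string]bool{}
	for id := range p[term] {
		out[id] = true
	}
	return out
}
```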

@akhenakh

Thanks for the clarification, @mschoch.
Let's talk to the dgraph guys to see what they have in mind.

@bt

bt commented Jul 25, 2017

Hey @mschoch, just a clarification to make sure I'm getting this correct:

If I were to execute a search, and while the search is running the store is updated with a new document AND that document matches the search phrase, the search shouldn't see the new document, because the search would only be looking at the snapshot taken at the time the search was first executed. Am I correct?

In this case, even the workaround suggested in the linked issue above would not work: if the iterator moves as new data is indexed, it would pick up the new data while the search is running.

If that's true, then I guess the only way would be to implement snapshotting in Badger, if that's part of their plans.

@mschoch
Member

mschoch commented Jul 25, 2017

  1. I don't claim to fully understand their proposed workaround. Certainly, streaming out a separate copy of the entire index for every single search seems prohibitive. Could you get creative and do this on some interval, and maybe that interval is a tolerable delay for changes to show up in search results? Maybe that's OK for some applications, but probably not all.

  2. The issue is worse than just a single iterator seeing some new records. The problem is that search in bleve uses multiple iterators, and without snapshots, even if we try to create them all at the same time, they're really starting at slightly different points in time. This means those iterators might see different key/value pairs associated with a single logical document. In the example I gave above, it can lead to scenarios where the search matches a phrase which didn't exist in either version of the document; it just appeared to match because two iterators were seeing different points in time. That's worse than just seeing the old or the new version.

@manishrjain

(Author of Badger here)

While we don't have any plans to support snapshots natively in Badger, this can be a very simple implementation on top of it.

You can create a simple layer above Badger which appends the current timestamp (or index) to keys as a suffix. Then, if you need a snapshot at a given time, you can append that timestamp to the search key and do an iteration from there. The first key you encounter would be the value written just at or before the snapshot. This layer can have a very simple API, and it would also provide others with snapshot functionality on top of Badger, which would be nice.
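
A rough sketch of what such a layer's surface could look like (hypothetical names, not an existing Badger or bleve API):

```go
package versioned

// Reader is a read-only view pinned to a single snapshot timestamp.
type Reader interface {
	Get(key []byte) ([]byte, error)
}

// Store is the hypothetical versioning layer above Badger: every write is
// tagged with a timestamp, and reads can be pinned to a point in time.
type Store interface {
	// Put writes value under key, tagged with the current timestamp.
	Put(key, value []byte) error
	// GetAt returns the newest version of key written at or before ts.
	GetAt(key []byte, ts uint64) ([]byte, error)
	// SnapshotAt returns a reader whose Gets all observe the store as it
	// was at timestamp ts.
	SnapshotAt(ts uint64) Reader
}
```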

@bt

bt commented Jul 26, 2017

Hey @manishrjain, this sounds like a great idea!

So this implementation should be outside of Badger and act as some sort of intermediary "bridge" to support snapshotting between Badger and Bleve?

@manishrjain

manishrjain commented Jul 26, 2017

Yeah, this should be outside of Badger. It can be a simple layer which just adds the timestamp or index to the keys being written, and then on a lookup does a single-key iteration. If you need some help, or some APIs in Badger, we'd be happy to provide them -- we just don't want this versioning logic to live within Badger, to keep things simple and performant.

For example, for iteration, we can probably provide an API which just gives you the first key on or after the search key -- this would avoid you having to create an iterator (and pay for the overhead).

@bt

bt commented Jul 31, 2017

Hey @manishrjain, before I start hacking away at a solution for this, I just wanted to confirm my algorithm (sorry, I'm not too familiar with the underlying technology behind KV stores).

Your proposed solution, as I understand it, is that I could build a layer on top of Badger which would write both key=value and <time>key=value entries into the KV store. Then, at retrieval time, I can search for <search_time>key, which would return the value of the last write for that key.

So, given the following scenario (where t is an example of time/Unix timestamp):

At t=1, key=value is set into the KV store. Therefore there will be two entries in the store:

key=value
1key=value

At t=5, key=value2 is set into the KV store. At this point, there will be three entries in the store:

key=value2
1key=value
5key=value2

If a snapshot is taken at t=9, we can therefore retrieve the value written closest to, but no later than, t=9; in this case that would be t=5, with the value value2.

This is how I'm understanding the possible implementation of snapshotting on top of Badger. Please let me know if this is what you're thinking or if I'm just completely out of my mind 😄

@manishrjain

manishrjain commented Jul 31, 2017

Close, but more complex than needed.

You just want to append a padded timestamp to the key as a suffix (not a prefix). So,

  • at t=1, you have key<t1>=value1
  • at t=5, you write key<t5>=value5. Thus, you have both key<t1> and key<t5>.

Then, when you need a snapshot for t=4, you can do a reverse iteration to get the first key which is lower than key<t4>. This should return key<t1>. If you want the latest key, then start doing reverse from key~, and you'll get key<t5>; and so on.

Note that binary.BigEndian gives you an encoding such that the []byte representation sorts in the same order as the integers themselves. That's very useful here. So, you can convert the timestamp to a positive integer and use big-endian encoding to create the suffix.

In fact, we can add an API called GetBefore(key), which can retrieve the key-value just before the specified key.

This works pretty well. The only annoying part is the need to do reverse iteration. Forward iteration is nicer and easier to understand, but it would require the timestamp to be encoded in a way where the latest timestamp generates the smallest byte slice. Not sure if there's already a way to do it nicely. Irrespective, the above would give you snapshots.
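
A small sketch of that key layout (illustrative Go, assuming 8-byte big-endian timestamps; the function names are made up for this example):

```go
package versioned

import (
	"encoding/binary"
	"math"
)

// versionedKey appends an 8-byte big-endian timestamp to key, so all versions
// of a key sort next to each other, oldest first.
func versionedKey(key []byte, ts uint64) []byte {
	out := make([]byte, 0, len(key)+8)
	out = append(out, key...)
	var suffix [8]byte
	binary.BigEndian.PutUint64(suffix[:], ts)
	return append(out, suffix[:]...)
}

// snapshotSeekKey is what you Seek() with a reverse iterator to find the
// newest version of key written at or before ts.
func snapshotSeekKey(key []byte, ts uint64) []byte {
	return versionedKey(key, ts)
}

// latestSeekKey is what you Seek() with a reverse iterator to find the newest
// version of key, regardless of timestamp.
func latestSeekKey(key []byte) []byte {
	return versionedKey(key, math.MaxUint64)
}
```

With Badger's iterator options set to Reverse = true, a Seek on snapshotSeekKey(key, ts) should land on the largest stored key not greater than it; in practice you would still verify that the result actually has key as its prefix (and add a delimiter between key and suffix if one logical key can be a prefix of another).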

@bt

bt commented Jul 31, 2017

Good to see I'm on the right track!

So if I were to implement the above, wouldn't there be some performance degradation, given that we'll be storing the value multiple times? And how would you propose purging old snapshots?

@manishrjain

Storing the value multiple times won't really degrade performance per se. It will just cause your value log and LSM tree to be bigger.

I think what you could do is iterate over the LSM tree periodically and delete the keys below a certain time threshold. Badger has key-only iteration that you can use for this purpose. If the LSM tree is in memory, as we recommend, it is very cheap to iterate over (in the benchmarks, it's blazing fast!).
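
A rough sketch of that periodic pruning, assuming the versioned-key layout above and a post-2017 Badger transaction API; the cut-off policy is purely illustrative (a real implementation would keep the newest surviving version of each key and split the deletes across transactions if badger.ErrTxnTooBig is returned):

```go
package versioned

import (
	"encoding/binary"

	badger "github.com/dgraph-io/badger"
)

// pruneBefore deletes every versioned entry whose timestamp suffix is older
// than cutoff, using key-only iteration (values are never fetched).
func pruneBefore(db *badger.DB, cutoff uint64) error {
	return db.Update(func(txn *badger.Txn) error {
		opts := badger.DefaultIteratorOptions
		opts.PrefetchValues = false // key-only iteration
		it := txn.NewIterator(opts)
		defer it.Close()

		for it.Rewind(); it.Valid(); it.Next() {
			// Copy the key: it is only valid until the next call to Next().
			k := append([]byte(nil), it.Item().Key()...)
			if len(k) < 8 {
				continue // not one of our versioned keys
			}
			ts := binary.BigEndian.Uint64(k[len(k)-8:])
			if ts < cutoff {
				if err := txn.Delete(k); err != nil {
					return err
				}
			}
		}
		return nil
	})
}
```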

@akhenakh

To get the newest version first with forward iteration, I'm using a reversed timestamp:

math.MaxInt64 - t.UnixNano()

Would this work in this case?
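
For reference, a tiny sketch of that encoding (illustrative Go; the helper name is made up): encoded big-endian, the inverted value makes newer writes sort before older ones.

```go
package versioned

import (
	"encoding/binary"
	"math"
	"time"
)

// invertedSuffix encodes math.MaxInt64 - t.UnixNano() big-endian, so that a
// newer time produces a smaller byte slice and therefore sorts first.
func invertedSuffix(t time.Time) []byte {
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], uint64(math.MaxInt64-t.UnixNano()))
	return buf[:]
}
```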

@manishrjain

manishrjain commented Jul 31, 2017 via email

@akhenakh

@bt are you working on this on top of my branch, or do you plan to do a different implementation?

I'm asking because I don't want us to do the work twice :)

@mschoch
Member

mschoch commented Jul 31, 2017

Bleve also makes heavy use of iterators, so you'll also need an iterator which only sees the correct values for the snapshot, silently skipping over values whose timestamp is too old or too new. You wouldn't want to pull the values associated with those if you can avoid it, so you might want to use a key-only iterator again, and then do get operations for the keys you've determined are the ones you care about.
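
A rough sketch of that filtering (hypothetical types, not bleve's store interfaces), assuming an ordering where, within one logical key, newer versions come first (e.g. via the inverted-timestamp suffix above):

```go
package versioned

import "bytes"

// rawIterator is a hypothetical key-only iterator over versioned entries,
// ordered by logical key and, within one key, newest version first.
type rawIterator interface {
	Valid() bool
	LogicalKey() []byte // key with the version suffix stripped
	Version() uint64    // timestamp this version was written at
	Next()
}

// snapshotIterator surfaces, for each logical key, only the newest version
// written at or before the snapshot timestamp, skipping everything else.
type snapshotIterator struct {
	raw      rawIterator
	snapTS   uint64
	lastSeen []byte
}

// Next returns the next visible logical key, or false when the underlying
// iterator is exhausted. Values would be fetched separately with Get calls.
func (s *snapshotIterator) Next() ([]byte, bool) {
	for ; s.raw.Valid(); s.raw.Next() {
		if s.raw.Version() > s.snapTS {
			continue // written after the snapshot: not visible
		}
		k := s.raw.LogicalKey()
		if bytes.Equal(k, s.lastSeen) {
			continue // an older version of a key we've already emitted
		}
		s.lastSeen = append([]byte(nil), k...)
		s.raw.Next()
		return s.lastSeen, true
	}
	return nil, false
}
```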

@bt

bt commented Jul 31, 2017

Hey @akhenakh, I haven't started coding this yet as I'm still trying to get my head around how it would all work together, so you're free to start us off and I'll make some PRs as I go through the code and understand how you structure it :) Thanks!

@bt

bt commented Aug 15, 2017

Hey @akhenakh, did you make much progress? I didn't manage to find your fork. Anything I can help with? 👍

@akhenakh

akhenakh commented Aug 15, 2017

Hey @bt, sorry I missed your message. Here is my WIP: https://github.com/akhenakh/bleve/commits/badger. I haven't worked on it since we talked.

I had a chat with @manishrjain on the Badger Slack, and he provided some more details on a possible implementation.

@bt

bt commented Aug 17, 2018

Thanks for the reply @manishrjain.

The current issue right now is that the implementation passes around the same Txn instance, so in the test, when it tries to create a second iterator, it panics.

I'll try to do a refactor tomorrow to see if I can get it to work, but I've already tried a lot of things today and none seem to work (at least without changing the test function, of course).

@bt

bt commented Aug 18, 2018

@manishrjain is there a way I can get a snapshot of the BadgerDB at a certain ts (without using ManagedDB)?

I noticed you mention that using ManagedDB is literally like playing with fire; I'm trying to abstract as much of that away as possible so that it's handled by BadgerDB, but it seems that I might need to use ManagedDB to implement the Badger store...

I know that creating a new transaction will also snapshot at that time, but given the way that Bleve's KVStore works, it means I would have to manage this single Txn, and that seems pretty bad considering Badger emphasises concurrent Txn use...

@manishrjain

If it is just a test which is failing, better to modify the test, I think. You could use ManagedDB, but again, there's some understanding that'd be required.

For read-only transactions, multiple transactions get assigned the same timestamp. You could make use of that fact in the default DB implementation. They'd get a new timestamp if there are update transactions in between.

There's no way you can specify the timestamp to use unless you're using ManagedDB.

@ghost

ghost commented Sep 12, 2018

Hey @bt, where is the code? I can take a crack at it. There are a ton of examples of using ManagedDB transactions and handling the time issue out there on GitHub.

https://github.com/search?o=desc&p=3&q=badger+ManagedDB&s=indexed&type=Code

manishrjain added a commit to dgraph-io/badger that referenced this issue Sep 25, 2018
- Support multiple read-only iterators, as per request from blevesearch/bleve#591.

- Disallow the public APIs from accepting keys prefixed with `!badger!`, fixing #582.

- Disallow closing iterator twice, fixing #588.
@ghost

ghost commented Sep 27, 2018

@manishrjain thanks !!

@ghost

ghost commented Sep 27, 2018

@bt The changes you wanted for Badger have happened.

@bt

bt commented Sep 27, 2018

Hey @manishrjain, thanks for making the change :)
@gedw99 the issue now is actually that the new index type, scorch, doesn't write to a KV store. So integrating BadgerDB into Bleve will only be possible with upsidedown.

I've pretty much given up on the integration for two reasons:

  1. Using upsidedown for my use-case results in a huge index. We're talking massively inflated indexes, where 1 MB of data indexed results in 100 MB of BadgerDB files.
  2. BadgerDB seems to take so much RAM that it's not practical for my use-case (but I've isolated the problem to actually be due to using upsidedown).

So for those reasons I've stopped using BadgerDB under Bleve and will probably look at it again once scorch works over a KV store. I'm now implementing a WAL in scorch instead, but that itself seems to be super challenging...

@mschoch
Member

mschoch commented Sep 27, 2018

once scorch works over a KV store

There are no plans for scorch to ever use a KV store. The KV serialization requirements are part of the reason that the upsidedown format is impractical.

@bt

bt commented Sep 27, 2018

@mschoch oooh okay -- so in that case, are there any plans to improve upsidedown to at least get closer to scorch performance?

From my benchmarking, scorch is enormously better in just about every aspect, except that it doesn't write to a KV store, so I can't get the benefits of Badger like the WAL or snapshotting, etc.

@manishrjain

BadgerDB seems to take so much RAM that it's not practical for my use-case (but I've isolated the problem to actually be due to using upsidedown).

Is it Badger taking up RAM, or the cost of serialization which is causing the RAM usage? If it is Badger, I'd like to identify the cause -- maybe share a heap profile or something.

@ghost

ghost commented Sep 27, 2018

@bt
Yeah, I knew about scorch but forgot that the KV-store path only applies to the upsidedown index :)
So just use scorch and move forward?
Why do you need the WAL?

EDIT:
You need the WAL so that you can use Raft for HA?

@mschoch
Member

mschoch commented Sep 27, 2018

From my perspective, the ability to plug-in any key/value store was a cool (possibly unique) idea we explored. It turns out to not be a very good choice if you want bleve to perform well. I do not believe you can plug-in a better/faster K/V store to upsidedown and have it perform well. So, does that mean upsidedown could be improved to fix this? While it certainly could be improved, and maybe by making some different choices/trade-offs it could work for some people, I don't think it will ever be as performant in the general case. I have no proof of this to offer, it is simply my opinion after working with this technology for the last several years.

Working with bleve users over this same period of time, I've realized most users simply want a correct and fast full-text search index. They don't care how we write the bits, to them it is simply an implementation detail. Scorch is our current attempt to better address the needs of these users. I feel really good that scorch makes for a better bleve for almost all users.

That said, I know there exists a group (probably small) that really liked the pluggable K/V store aspect of bleve. With that in mind, we have no immediate plans to remove upsidedown. But, our goal has been to make scorch better, and let the community take over using/supporting upsidedown. Possibly the upsidedown code would move to the blevex repo to more clearly signal that to users. But crucially, we have no plans to make an API change that would preclude upsidedown from continuing to work.

I can't get benefits of Badger like the WAL or snapshotting, etc

If you could articulate more of these benefits, we can have a discussion about what aspect of scorch relate to this, and if there are analogues or missing features.

@bt

bt commented Sep 27, 2018

BadgerDB seems to take so much RAM that it's not practical for my use-case (but I've isolated the problem to actually be due to using upsidedown).

Is it Badger taking up RAM, or the cost of serialization which is causing the RAM usage? If it is Badger, I'd like to identify the cause -- maybe share a heap profile or something.

@manishrjain Unfortunately I've long since moved on from the implementation I had for Badger, so reverting my Git repo now would be quite hard. The most I have is the PDF visualisation of the memory, which I've attached (https://github.com/blevesearch/bleve/files/2424157/file.pdf); not sure if it'll help. I think a lot of it comes from the amount of data that needs to be serialised when using upsidedown.

@bt
Yeah, I knew about scorch but forgot that the KV-store path only applies to the upsidedown index :)
So just use scorch and move forward?
Why do you need the WAL?

EDIT:
You need the WAL so that you can use Raft for HA?

@gedw99 I'm using a WAL to ensure that data received gets indexed even if the application crashes. I serialise the data coming in and then replay it on application start.

If you could articulate more of these benefits, we can have a discussion about what aspect of scorch relate to this, and if there are analogues or missing features.

So one of the features (the more important one to me now) is ensuring durability by implementing a WAL. I've made a very simple one (really just tailored to my own use-case), but now my issue is more that I'm not sure what to use as the reference point -- that is, how do I know that my current index has caught up to the end of the WAL?

I'm about to implement that now, and my plan is to use the CurRootEpoch reported by the Stats to give something like a batchNumber to my index batches. Then I plan to register an event callback to watch for EventKindPersisterProgress, which I will fork Bleve to make include the persisted epoch. Not entirely sure if this will work but I THINK it will 😂

Additionally, the next thing on my list is to implement high availability and sharding, and I believe I would need snapshotting to do this. I haven't given it too much thought yet, though, so I'm unsure of the way forward for that specific case.

I have been trying to follow cbft and cbgt but some parts are too advanced for me 😦

@ghost

ghost commented Sep 27, 2018 via email

@bt

bt commented Sep 27, 2018

Maybe this helps: https://github.com/mosuka/blast/blob/master/README.md The WAL is an event store using a CQRS pattern, I presume? I think you can do what you need with Blast or Bleve, but Blast is really nice in that it's not dependent on Couchbase and uses RocksDB for its DB. RocksDB is pretty stable with golang now thanks to the CockroachDB guys pounding on it. Maybe this helps??


Thanks, just had a look -- seems to be completely dependent on Bleve: https://github.com/mosuka/blast/blob/master/index/index.go

Also indexes into BoltDB using upsidedown, so it'll be the same outcome.

@mschoch
Member

mschoch commented Sep 27, 2018

The design of bleve is such that it is durable by default. What does this mean? Specifically, unless you've overridden some configuration, when you call Index() or Batch() when those methods return (without error) you are guaranteed that those documents have been indexed durably (not just in memory).

Some applications can tolerate losing data (with some bounds) and applications like cbft turn off durability for that reason. See: https://github.com/couchbase/cbft/blob/a33ad7b7000a9d8d237ba273c47cc100401a0fb0/pindex_bleve.go#L525

If you use that setting, when the method returns, the data is guaranteed to be searchable, but not guaranteed to be written to disk yet.

Typically a WAL is used when you want the data to be durable quickly. The question is do you want that data to be searchable or not? If not, you can just put the WAL in front of Bleve. If you do want them to be searchable, then right now you need to wait for it to be indexed and persisted. It's not clear to me that Bleve should support data that is persisted but not yet indexed.

Regarding snapshotting, I'm still not clear what is missing. The IndexReader returned by the Reader() method on an Index is a stable snapshot. By combining this with your own sequence numbers (stored using the internal storage API) you can do scatter/gather across multiple shards at the same "point in time".

I agree cbft may not be the best example codebase for these features since it's actually a production application. But, I'm happy to describe in more detail how we built some of these features.

@bt

bt commented Sep 27, 2018

The design of bleve is such that it is durable by default. What does this mean? Specifically, unless you've overridden some configuration, when you call Index() or Batch() when those methods return (without error) you are guaranteed that those documents have been indexed durably (not just in memory).

Oh! I kept thinking of moss and how the persister runs in a goroutine, so if the application crashes there's a chance that not all the data has been persisted to disk yet. If this is the case, I won't need a WAL.

Regarding snapshotting, I'm still not clear what is missing. The IndexReader returned by the Reader() method on an Index is a stable snapshot. By combining this with your own sequence numbers (stored using the internal storage API) you can do scatter/gather across multiple shards at the same "point in time".

Perfect, seems like this will do what I need.

@ghost

ghost commented Sep 27, 2018

@bt you're right. It's not using RocksDB at all for the index, so it is upsidedown.

I think that using scorch is the best way forward too, since it's the supported path.
Since scorch does have durable disk persistence, everything should work.

I might have a crack at getting Blast to use scorch too.

@ailox

ailox commented Nov 6, 2018

Looks like someone was successful: https://github.com/alash3al/bbadger

@CanRau

CanRau commented Oct 27, 2020

How's the state of all this?

manishrjain added a commit to outcaste-io/outserv that referenced this issue Jul 6, 2022
- Support multiple read-only iterators, as per request from blevesearch/bleve#591.

- Disallow the public APIs from accepting keys prefixed with `!badger!`, fixing dgraph-io/badger#582.

- Disallow closing iterator twice, fixing dgraph-io/badger#588.
osiloke added a commit to osiloke/gostore-common that referenced this issue Mar 25, 2024
add moss and badger indexers, note: badger indexer has issues blevesearch/bleve/issues/591
change up apis
osiloke added a commit to osiloke/gostore-indexer that referenced this issue Mar 25, 2024
add moss and badger indexers, note: badger indexer has issues blevesearch/bleve/issues/591
change up apis
osiloke added a commit to osiloke/gostore-badger that referenced this issue Mar 25, 2024
add moss and badger indexers, note: badger indexer has issues blevesearch/bleve/issues/591
change up apis