Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add rocksDB merging code #19

Merged
merged 12 commits into from
Feb 8, 2016
Merged

Conversation

ashwin95r
Copy link
Contributor

@ashwin95r ashwin95r commented Feb 6, 2016

Review on Reviewable

@manishrjain
Copy link
Contributor

Call it tools/merge/merge.cc


Review status: 0 of 1 files reviewed at latest revision, 4 unresolved discussions.


tools/rocksDB_merger/rocksDB_merger.cc, line 5 [r1] (raw file):
Spelling mistake. Should be destination


tools/rocksDB_merger/rocksDB_merger.cc, line 24 [r1] (raw file):
Replace all tabs with 2 spaces. Here and everywhere else.


tools/rocksDB_merger/rocksDB_merger.cc, line 54 [r1] (raw file):
There should be a check somewhere which ensures that if the same key occurs multiple times, it has the same value. Otherwise, log fatal.


tools/rocksDB_merger/rocksDB_merger.cc, line 66 [r1] (raw file):
Remove commented code.


Comments from the review on Reviewable.io

@manishrjain
Copy link
Contributor

Review status: 0 of 1 files reviewed at latest revision, 5 unresolved discussions.


tools/rocksDB_merger/rocksDB_merger.cc, line 1 [r1] (raw file):
Did you copy this code from somewhere? If so, copy that license as well. Otherwise, if you wrote this code yourself, then add the standard DGraph license here.


Comments from the review on Reviewable.io

@manishrjain
Copy link
Contributor

Reviewed 1 of 1 files at r1.
Review status: all files reviewed at latest revision, 5 unresolved discussions.


Comments from the review on Reviewable.io

@ashwin95r
Copy link
Contributor Author

Review status: all files reviewed at latest revision, 5 unresolved discussions.


tools/rocksDB_merger/rocksDB_merger.cc, line 1 [r1] (raw file):
I referred to some files and this link https://github.com/facebook/rocksdb/wiki/Basic-Operations

So I'm not sure


Comments from the review on Reviewable.io

@manishrjain
Copy link
Contributor

Btw, make your commit logs messages specific -- say what you really did, not just "changed ... code"


Reviewed 3 of 3 files at r2.
Review status: all files reviewed at latest revision, 8 unresolved discussions.


tools/merge/.gitignore, line 1 [r2] (raw file):
Also add line /merge


tools/merge/merge.cc, line 1 [r1] (raw file):
Ah.. that's okay. As long as you didn't copy the whole file from somewhere. Then add our license to the top.


tools/merge/merge.cc, line 29 [r2] (raw file):
Having a map like this can cause memory blow up -- if we're dealing with millions of such uids. In a way, the reason the uid creation was done on multiple machines was because doing it on one machine would be too slow, right. So, don't use a map. Rocksdb automatically keeps all keys sorted, so the same repeated key would occur multiple times together.

Also, try and keep this generic so we can also use this to merge the data, not just the uids.


tools/merge/merge.cc, line 58 [r2] (raw file):
Break the line to not more than 80 chars. So, break it after the &&

Also, no need for a map. Iterator would give you repeated keys together.


tools/merge/merge.cc, line 59 [r2] (raw file):
This line still has a tab I think.


tools/merge/merge.cc, line 60 [r2] (raw file):
exit(0) is clean exit. This isn't clean exit. Call assert(false)


Comments from the review on Reviewable.io

@ashwin95r
Copy link
Contributor Author

Review status: all files reviewed at latest revision, 8 unresolved discussions.


tools/merge/merge.cc, line 29 [r2] (raw file):
The thing is I'm creating a new rocksDB object and adding all the key, value from the sharded rocksDB. So if a single XID was assigned different UID by different instances (though it shouldn't happen), the repeated key wont be consecutive as I'm iterating within a single instance's store. If a single XID was assigned different UID by the same instance, that would've already been overwritten while assigning. So the repeated key will not be consecutive the way i'm iterating through them


Comments from the review on Reviewable.io

@ashwin95r
Copy link
Contributor Author

Review status: all files reviewed at latest revision, 8 unresolved discussions.


tools/merge/merge.cc, line 54 [r1] (raw file):
One possible way would be to try and get the key from the rocksDB itself, that way there wont be an extra map, though it would increase the number of reads and might affect the performance by a bit


Comments from the review on Reviewable.io

@manishrjain
Copy link
Contributor

Review status: all files reviewed at latest revision, 8 unresolved discussions.


tools/merge/merge.cc, line 54 [r1] (raw file):
I doubt you need to do lookups, if you consider these sorted streams. See above.


tools/merge/merge.cc, line 29 [r2] (raw file):
Think it should be possible to iterate over all the shards together, in the same way as you'd merge a list of sorted streams; using a heap. That should take O(N log K), where K = size of heap, and N = length of all streams put together.


Comments from the review on Reviewable.io

@ashwin95r
Copy link
Contributor Author

Review status: all files reviewed at latest revision, 8 unresolved discussions.


tools/merge/merge.cc, line 29 [r2] (raw file):
I think that would be quite complex compared to the current one. Even if we read from the store, it would be O(N logN) and even is N is few billions, the factor would still be small. I feel if we use heaps it would add unnecessary complexity to the code.


Comments from the review on Reviewable.io

@ashwin95r
Copy link
Contributor Author

Review status: all files reviewed at latest revision, 7 unresolved discussions.


tools/merge/merge.cc, line 29 [r2] (raw file):
Should we first try reading from the store and if its too slow change to the heap approach?


Comments from the review on Reviewable.io

@manishrjain
Copy link
Contributor

Review status: all files reviewed at latest revision, 7 unresolved discussions.


tools/merge/merge.cc, line 29 [r2] (raw file):
N Random seeks over all shards isn't going to be faster compared to reading K shards in serial order. Let's discuss on Gitter.


Comments from the review on Reviewable.io

@manishrjain
Copy link
Contributor

Review status: 0 of 3 files reviewed at latest revision, 13 unresolved discussions.


tools/merge/merge_heap.cc, line 36 [r3] (raw file):
Following from below, technically even key doesn't need to be string. Is there some reason it is?


tools/merge/merge_heap.cc, line 37 [r3] (raw file):
Value doesn't need to be string; it is a byte array; so should be treated like such.


tools/merge/merge_heap.cc, line 56 [r3] (raw file):
Align it to 4 spaces after the start of the previous line:
std:cerr
<folder...


tools/merge/merge_heap.cc, line 94 [r3] (raw file):
Be wary of string conversions.


tools/merge/merge_heap.cc, line 102 [r3] (raw file):
const struct node& top = pq.top();


tools/merge/merge_heap.cc, line 118 [r3] (raw file):
and here.


Comments from the review on Reviewable.io

@ashwin95r
Copy link
Contributor Author

Review status: 0 of 3 files reviewed at latest revision, 13 unresolved discussions.


tools/merge/merge_heap.cc, line 36 [r3] (raw file):
The db->Get method doesnt allow value other than string as far as I've searched. Ive made the key and value generic in the merge_heap.cc but not able to do it at one place in merge.cc


Comments from the review on Reviewable.io

@manishrjain
Copy link
Contributor

Looking forward to the comparison results.


Reviewed 1 of 3 files at r2, 1 of 3 files at r3, 2 of 2 files at r4.
Review status: all files reviewed at latest revision, 18 unresolved discussions.


tools/merge/merge.cc, line 2 [r2] (raw file):
Please add the copyright notice here as it is in all the other files. I'll soon take one pass and reassign it to DGraph (instead of myself).


tools/merge/merge_heap.cc, line 56 [r3] (raw file):
I meant after the previous line. For some reason reviewable.io trimmed my spaces. It's meant to be like this:
line1
....line1 continued

or, if line1 is already indented, as it is in this case:
....line1
........line1 continued


tools/merge/merge_heap.cc, line 1 [r4] (raw file):
And here as well.


tools/merge/merge_heap.cc, line 71 [r4] (raw file):
spaces


tools/merge/merge_heap.cc, line 84 [r4] (raw file):
Extra spaces. Remove here and elsewhere below.


tools/merge/merge_heap.cc, line 115 [r4] (raw file):
spaces here as well and above.


tools/merge/merge_heap.cc, line 122 [r4] (raw file):
Spaces after the line, remove. It's amazing how much such effort gofmt saves, just like that.


Comments from the review on Reviewable.io

@manishrjain
Copy link
Contributor

Btw, for C++ style guide, follow this: https://google.github.io/styleguide/cppguide.html


Review status: all files reviewed at latest revision, 18 unresolved discussions.


Comments from the review on Reviewable.io

@hackintoshrao
Copy link
Contributor

tools/merge/merge_heap.cc, line 122 [r4] (raw file):
I agree !


Comments from the review on Reviewable.io

@ashwin95r
Copy link
Contributor Author

Review status: all files reviewed at latest revision, 18 unresolved discussions.


tools/merge/merge_heap.cc, line 122 [r4] (raw file):
Yeah, Its hard to find spaces in the code, I'll search for a similar tool for c++ I guess


Comments from the review on Reviewable.io

@hackintoshrao
Copy link
Contributor

@ashwin95r : What formatting tool have you used in your last commit ?

@ashwin95r
Copy link
Contributor Author

I didn't use any tool. I just searched using /\s\+$ in vim


Review status: 1 of 3 files reviewed at latest revision, 17 unresolved discussions.


Comments from the review on Reviewable.io

@manishrjain
Copy link
Contributor

All checks out, good work exploring both the approaches! :lgtm_strong:


Reviewed 1 of 3 files at r2, 2 of 2 files at r6.
Review status: all files reviewed at latest revision, 1 unresolved discussion.


Comments from the review on Reviewable.io

ashwin95r added a commit that referenced this pull request Feb 8, 2016
@ashwin95r ashwin95r merged commit bef782f into dgraph-io:master Feb 8, 2016
@hackintoshrao
Copy link
Contributor

This has both versions of Rocksdb merge right ?

@ashwin95r
Copy link
Contributor Author

Yeah, merge.cc and merge_heap.cc

@ashwin95r ashwin95r deleted the rocksDB_merger branch February 8, 2016 05:19
danielmai pushed a commit that referenced this pull request May 1, 2020
* Change layout of runnable and fix latency.

* version control go-grpc example

* Revert url change
arijitAD pushed a commit that referenced this pull request Oct 15, 2020
Adds gofmt, linter to ci; fixes broken test command in ci. Updates README
shivaji-dgraph pushed a commit that referenced this pull request Mar 12, 2024
harshil-goel pushed a commit that referenced this pull request Mar 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants