-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Convert Cayley indexing to an append-only log #113
Conversation
$ benchcmp gollrb.bench b-gen.bench benchmark old ns/op new ns/op delta BenchmarkNamePredicate 1731218 1693373 -2.19% BenchmarkLargeSetsNoIntersection 81290360 70205277 -13.64% BenchmarkVeryLargeSetsSmallIntersection 768135620 442906243 -42.34% BenchmarkHelplessContainsChecker 39477086024 35260603748 -10.68% BenchmarkNetAndSpeed 22510637 21587975 -4.10% BenchmarkKeanuAndNet 18018886 17795328 -1.24% BenchmarkKeanuAndSpeed 20336586 20560228 +1.10% BenchmarkKeanuOther 85495040 80718152 -5.59% BenchmarkKeanuBullockOther 95457792 83868434 -12.14% Code gen from $GOPATH/src/github.com/cznic/b: make generic \ | sed -e 's/KEY/int64/g' -e 's/VALUE/struct{}/g' \ > $GOPATH/src/github.com/google/cayley/graph/memstore/b/keys.go key_test.go manually edited.
@@ -41,7 +43,7 @@ type Iterator struct { | |||
result graph.Value | |||
} | |||
|
|||
func NewIterator(prefix string, d quad.Direction, value graph.Value, qs *TripleStore) *Iterator { | |||
func NewIterator(prefix string, d quad.Direction, value graph.Value, qs *TripleStore) graph.Iterator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you hiding this for a reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to return null. It works alright if it's a closed leveldb iterator as well. Fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't you just return nil for the *Iterator?
h.QuadWriter.Close() | ||
} | ||
|
||
var ErrQuadExists = errors.New("Quad exists") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Write as a var block.
LGTM with comments. Has this been tested on leveldb and mongo? |
Been watching your comments come in, all should be done (including concretization of Delta, because the pointer was out of habit, not speed). Extensively tested on LevelDB (also the extant LevelDB tests), and some on Mongo but it's probably worth hacking the integration test (-short) to prove both. |
I don't feel too qualified to comment on this... I may be missing some of the surrounding conversations here. But, FWIW, my high-level viewpoint is that this seems to muddy cayley functionality (as a graph query framework) with the backend implementation details. Postgres for example handles transactions and write logging just fine by itself. |
One thing I've thought about doing is changing to
As it's a simple union type which should have no overlap. The prior API would work fine (eg, AddTriple) and provide API users the same interface. Yes, API helpers with a higher-level API seem right. To make an empty store ought to be as easy as As for backend implementors what handle transactions -- this is to help aid that fact. Yeah, Postgres does it fine, but perhaps there's another Cayley connected to the same server? How does one know that the data is there? Blind ignorance (and trusting the backend) was the order of the day before, but that doesn't hold for questions of distributed consistency. By dealing in "accepted" deltas instead of dealing with pending and whatnot, all the backend features you want can be used. So yeah, batch together the Postgres transactions and use all the cool indexing features you like; it's free reign. The further good news is the consistency gets managed by the QuadWriter now, and TripleStore is allowed to be mostly ignorant of this fact, so it's not much worse there. Anyway, that's a rough explanation as to the why and the sort of effect it has. It's kind of important, and worth a very little more mud for implementors (if you did AddTripleSet and RemoveTriple before, it's nigh on a wrapper function). And as a framework, none too shabby. |
$ benchcmp gollrb.bench b-gen.bench benchmark old ns/op new ns/op delta BenchmarkNamePredicate 1369329 1444990 +5.53% BenchmarkLargeSetsNoIntersection 72329029 64975716 -10.17% BenchmarkVeryLargeSetsSmallIntersection 890824761 408784476 -54.11% BenchmarkHelplessContainsChecker 35314797618 30673240485 -13.14% BenchmarkNetAndSpeed 19694146 19486797 -1.05% BenchmarkKeanuAndNet 15340756 15317415 -0.15% BenchmarkKeanuAndSpeed 17902709 18042030 +0.78% BenchmarkKeanuOther 53452058 50984817 -4.62% BenchmarkKeanuBullockOther 90827780 86536510 -4.72% benchmark old allocs new allocs delta BenchmarkNamePredicate 1339 1339 +0.00% BenchmarkLargeSetsNoIntersection 22603 22674 +0.31% BenchmarkVeryLargeSetsSmallIntersection 65787 65860 +0.11% BenchmarkHelplessContainsChecker 1713541 1713669 +0.01% BenchmarkNetAndSpeed 17135 17146 +0.06% BenchmarkKeanuAndNet 15802 15802 +0.00% BenchmarkKeanuAndSpeed 16397 16396 -0.01% BenchmarkKeanuOther 30148 30149 +0.00% BenchmarkKeanuBullockOther 35542 35544 +0.01% benchmark old bytes new bytes delta BenchmarkNamePredicate 96226 95842 -0.40% BenchmarkLargeSetsNoIntersection 1165914 119725 +2.69% BenchmarkVeryLargeSetsSmallIntersection 2760072 2777798 +0.64% BenchmarkHelplessContainsChecker 84388448 84351168 -0.04% BenchmarkNetAndSpeed 1414837 1425752 +0.77% BenchmarkKeanuAndNet 1247249 1247453 +0.02% BenchmarkKeanuAndSpeed 1275522 1275243 -0.02% BenchmarkKeanuOther 2021107 2021497 +0.02% BenchmarkKeanuBullockOther 2682243 2683250 +0.04%
Conflicts: graph/memstore/iterator.go graph/memstore/triplestore.go
You were right about Mongo having some issues -- it's all better now, but it wasn't indexing. (Plus I removed some duplication and whatnot) PTAL, I'm signing off for sleep though. Here's the delta against master:
For fun, here's the three backends on my machine, after log-ization:
|
Comparison of b against GoLLRB (as at d5f020). $ benchcmp gollrb.bench b-gen.bench benchmark old ns/op new ns/op delta BenchmarkNamePredicate 1631932 1409531 -13.63% BenchmarkLargeSetsNoIntersection 190792654 63748682 -66.59% BenchmarkVeryLargeSetsSmallIntersection 896154437 373475843 -58.32% BenchmarkHelplessContainsChecker 20719182678 14078301640 -32.05% BenchmarkNetAndSpeed 32519019 20188665 -37.92% BenchmarkKeanuAndNet 18319247 15224988 -16.89% BenchmarkKeanuAndSpeed 30849568 18744134 -39.24% BenchmarkKeanuOther 105552525 107620648 +1.96% BenchmarkKeanuBullockOther 295395338 115193002 -61.00% benchmark old allocs new allocs delta BenchmarkNamePredicate 1339 1341 +0.15% BenchmarkLargeSetsNoIntersection 22585 23632 +4.64% BenchmarkVeryLargeSetsSmallIntersection 65776 69396 +5.50% BenchmarkHelplessContainsChecker 1713541 2036316 +18.84% BenchmarkNetAndSpeed 17104 17240 +0.80% BenchmarkKeanuAndNet 15816 15855 +0.25% BenchmarkKeanuAndSpeed 16368 16493 +0.76% BenchmarkKeanuOther 30134 30634 +1.66% BenchmarkKeanuBullockOther 35510 36454 +2.66% benchmark old bytes new bytes delta BenchmarkNamePredicate 96162 96294 +0.14% BenchmarkLargeSetsNoIntersection 1172356 1249872 +6.61% BenchmarkVeryLargeSetsSmallIntersection 2810080 2992409 +6.49% BenchmarkHelplessContainsChecker 89233264 104999088 +17.67% BenchmarkNetAndSpeed 1388793 1428110 +2.83% BenchmarkKeanuAndNet 1263145 1250079 -1.03% BenchmarkKeanuAndSpeed 1246956 1281546 +2.77% BenchmarkKeanuOther 2021312 2024727 +0.17% BenchmarkKeanuBullockOther 2671448 2742968 +2.68% Conflicts: graph/memstore/triplestore.go
Conflicts: graph/leveldb/triplestore.go graph/mongo/triplestore.go
Use cznic/b B+tree implementation in place of GoLLRB for memstore
Faster memstore using b is merged, all the other bits seem to be in place and the build is green. I'm going for a beer and then sleeping, but barring objection, I'll merge this behemoth tomorrow. (I've got some others depending on it) |
Convert Cayley indexing to an append-only log
Fixes #70
In order to provide for replication, graph history and other neat clustering features, I created the interface
QuadWriter
which can take care of all the replication logic; TripleStore (soon, QuadStore, I'm guessing) should store the things that have been accepted. Further, it should be kept as a log -- a follow-up CL which addsGetRange(from, to int64) []*graph.Delta
to the TripleStore interface is forthcoming.Some question of moving, renaming and ownership is certainly valid.
This also turns Cayley into a full quadstore for all backends, which it was heading toward anyway. This PR finalizes that.
Anyway, this means that nothing really gets deleted, but that indexed quads get extra data about when they're valid, and the triplestore^Hquadstore iterators should check the validity. This adds some amount of slowdown for Next(), but how much?
A couple percent, but not noise. Incidentally, this is exacerbated by the Materialize iterator that recently went in; it does a lot of Next()ing, so on things where it aborts (HelplessContainsChecker) it doesn't have as problematic an impact. So, the wins from Materialize are still large, just less so now.
So what's the overhead on queries going through the addition/deletion/readdition cycle, other than using more persistence space? This now gets tested.
The answer to that, at least with the memstore is:
Meh. Not too much.
Obviously it'll be a bit slower for larger backends, but it'll still be fairly small because it fails fairly fast; we don't have to go through the whole tree for things that are invalid, we just go to the next thing in the underlying persistence iterator.
A migration tool to collapse the history (ie, only keep additions of things that are valid) might make sense in the future, but should be a separate tool.
/cc @kortschak @pbnjay