TL;DR:
Add an in-memory mode to boost performance, and use in-memory (predicate) compression to pack as much data as possible into the given memory. That smokes a lot of benchmarks and helps a lot with graph OLAP.
Experience Report
Right now, we get about 6-10 ms latency with a 5-node Dgraph cluster. That's nice, but only because we load-balance across all alphas and the current graph is small enough to fit completely in memory. Specifically, during a quiet hour we run a count query against all predicates and fetch them all, so they are already in memory when needed. That works very well: latency is very low and it's relatively easy to script.
What you wanted to do
For our unified graph, we actually wanted a single DB for operations and analytics in one cluster.
But we ran into trouble with Dgraph and some hard questions about scalability that are still unanswered, so we moved on.
Why that wasn't great, with examples
We simply could not figure out whether Dgraph will sustain performance beyond a few billion predicates. There are no case studies of very large deployments, which suggests there most likely are none.
What you actually did
We decided to factor analytics out into a secondary in-memory graph that scales all the way up to 500 billion ops per second. Integrating a secondary graph for analytics makes sense anyway, because the alternative already ships with all the important graph-analytics algorithms.
What would be the greatest possible solution?
Please consider adding a persistent in-memory mode to Dgraph.
Persistent in-memory means the entire graph is held in memory AND synced to disk, so you get both: the speed of querying everything from memory AND the safety of having all data on disk in case a node goes down or needs a restart, say for an update.
Preserve "correctness" and transactional safety, but improve performance with an option to hold the entire graph in memory and sync memory to disk. A bit like RedisGraph, but actually useful.
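The persist-to-disk semantics described above can be sketched as a toy write-through store (plain Python with hypothetical names; this is not Dgraph's actual implementation): every write goes to an in-memory dict AND is appended to an on-disk log before the write returns, and the log is replayed on restart.

```python
import json
import os

class PersistentKV:
    """Toy write-through store: reads are served from memory,
    every write is appended to an on-disk log before returning."""

    def __init__(self, path):
        self.path = path
        self.mem = {}
        if os.path.exists(path):  # replay the log on restart
            with open(path) as f:
                for line in f:
                    key, value = json.loads(line)
                    self.mem[key] = value
        self.log = open(path, "a")

    def put(self, key, value):
        self.log.write(json.dumps([key, value]) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())  # data is safe on disk...
        self.mem[key] = value        # ...and fast to read from memory

    def get(self, key):
        return self.mem.get(key)

path = "/tmp/graph_demo.log"
if os.path.exists(path):
    os.remove(path)  # start fresh for the demo

db = PersistentKV(path)
db.put("alice->knows", "bob")
print(db.get("alice->knows"))  # served from memory

# a "restart" re-reads the log and recovers the data
db2 = PersistentKV(path)
print(db2.get("alice->knows"))
```

A real implementation would batch fsyncs and compact the log, but the core trade is the same: disk is touched only on the write path, never on reads.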
I know, one USP of Dgraph is exactly that you can manage graphs far bigger than available memory, because it keeps only the keys in memory and stores the actual values on disk. But sometimes you either don't have a graph that big or, equally possible, you have enough memory to hold the entire graph. Because memory prices have been falling for years, you get so much cheap memory on GCP that in-memory analytics becomes a no-brainer.
We are actually moving all analytics to a persistent in-memory graph because it's about 30% cheaper than an equivalent GPU-accelerated column store while matching its performance on most tasks. In terms of system complexity, in-memory is way simpler to operate and scales much more easily, because allocating more memory or adding more high-mem nodes can be done with auto-scaling in Kubernetes.
As a rule of thumb, you need twice the graph size in memory, plus ~20% for the system. If a graph is about 150 GB, you need at least 300 GB of memory plus some 50 GB reserved for the system, so about 350 GB in total. With predicate compression, however, it is certainly feasible to pack twice the graph size into the same memory, and that's a huge additional gain on top of the gain in performance. Obviously, memory usage is high, but we are fine with that.
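The rule of thumb above can be written down as a small calculator (plain Python; the 2x working-memory factor, the ~20% system overhead, and the compression ratio are the assumptions stated in the text, not measured constants — note that a flat 20% yields 360 GB where the text rounds to ~350 GB):

```python
def required_memory_gb(graph_gb, working_factor=2.0,
                       system_overhead=0.20, compression_ratio=1.0):
    """Estimate total cluster memory for an in-memory graph.

    working_factor:    memory needed relative to (compressed) graph size
    system_overhead:   extra fraction reserved for the OS and services
    compression_ratio: e.g. 2.0 means predicate compression halves the graph
    """
    effective_graph = graph_gb / compression_ratio
    working = effective_graph * working_factor
    return working * (1.0 + system_overhead)

# the 150 GB example from the text: 300 GB working set + 20% system
print(required_memory_gb(150))                         # -> 360.0
# with 2:1 predicate compression, twice the data fits the same budget
print(required_memory_gb(300, compression_ratio=2.0))  # -> 360.0
```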
An in-memory option is available today. By default, Dgraph mmaps the LSM tree tables. You can keep them in memory instead with the flag --badger.tables=ram.
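If I read that right, it would look something like this when starting an alpha (the --zero address is illustrative; my understanding is that this flag only changes how tables are read, which is exactly the durability question raised below):

```shell
# Serve Badger's LSM tables from RAM instead of mmap for faster reads.
# Assumption: this affects the read path only; writes should still be
# persisted by Badger on disk as usual.
dgraph alpha --badger.tables=ram --zero=localhost:5080
```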
@danielmai Hang on, does the flag --badger.tables=ram sync memory to disk?
I was talking about holding the graph in memory AND syncing memory to disk, so you get the speed from memory AND keep your data safe on disk. Otherwise it's only volatile and, I guess, nobody wants that for serious data.
GitHub issues have been deprecated.
This issue has been moved to discuss. You can follow the conversation there and also subscribe to updates by changing your notification preferences.
Any external references to support your case
Trillion Triple Benchmark:
https://dzone.com/articles/got-a-minute-check-out-anzographs-record-shatterin