Support Transactions #1445

Closed
niemeyer opened this Issue Sep 12, 2017 · 11 comments

7 participants
@niemeyer

niemeyer commented Sep 12, 2017

Hello developers,

Thanks so much for bringing this graph database into the market as an open source and self-contained Go project. It's the first time in the last several years that I'm again excited about a new database development. Considering what I've seen so far, this may soon be a great candidate for small and large projects, even in cases not originally associated with graph databases.

Given my experience with databases in general and as the author of the MongoDB driver for Go, the one thing that makes me concerned to recommend its use in Canonical or more widely is the lack of transactions.

Over the years this has been a constant nuisance with MongoDB, to the point where I even wrote client-side transaction support that abuses atomic implementation details to achieve general multi-document transactions. My hope was that this would be short-lived, but here we are, 5 years later.

Upstream has been discussing a fix pretty much since the project started (#2804, #11500, #11508, etc.), and there are hints that yet another small step may be coming soon (#28903, #30325, etc.), but it's clear that being an afterthought affected the design and the cost of the solution, not to mention the use cases.

So, coming back to Dgraph: if the idea is indeed to position it as a general alternative to existing databases, as I've seen in one of your videos, please don't make the same mistake of postponing transactions for too long, or of framing them as relevant mainly for financial transactions. The sort of consistency at stake is relevant for pretty much any application that uses data, even more so when even basic details about a record are stored as multiple individual terms. Without transactions, that would make the situation even worse than with MongoDB.

On the bright side, given the current implementation of Dgraph and the existing form of mutation operations, it feels like you are in a much better position to solve the problem already, and with a good solution you also get the attention of a currently suffering user base that will be interested in hearing your story.

I can't promise that since we're not a current user of Dgraph (and might not be while this is an issue), but depending on the case I can even try to advocate internally for some funding towards the solution for this problem.

Thanks for your consideration, and in either case thanks again for your work on this promising database.

@manishrjain

Member

manishrjain commented Sep 13, 2017

Hi @niemeyer,

Thanks for putting this so eloquently. You make a very convincing case for transactions.

Dgraph is a distributed system, like MongoDB. Distributed transactions are a hard problem, and they would come at the cost of performance. Dgraph does consistent replication, which would make it even harder. Most writes in the cluster would need to be serialized instead of running concurrently. Similarly, reads would also suffer.

So far, we've punted on transactions, because of this cost on performance -- It's not clear to me that most users would want to trade performance for cluster wide transactions. However, I'm not opposed to this idea, in fact, this is something we had plans to look into after v1.0.

I think if enough users show support for transactions -- we'll give it a serious consideration, and look into ways by which they can be integrated into Dgraph.

@manishrjain manishrjain added this to the No Deadline milestone Sep 13, 2017

@manishrjain manishrjain self-assigned this Sep 13, 2017

@niemeyer

niemeyer commented Sep 13, 2017

Thanks for the quick response, @manishrjain.

The cost on performance is well understood by those dealing with data, and that cost should only be paid in situations where the trade-off is explicitly requested. So the impact when not wanted would be zero, but the feature would be there even then, when the relevant use cases show up.

For example, imagine an index modifier @mvcc that when applied to a predicate would synchronize the reads and writes of that particular predicate with all those tagged with the same modifier. Reads would not block, but rather always return the consistent values using the mvcc version of when the query started. Writes with mvcc predicates would be partly serialized while creating the new version so updates are consistent. Mutations and queries on non-mvcc predicates are unaffected by the logic.
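
The read and write behaviour described above can be sketched in a few lines of Go. This is only an in-memory illustration of multi-version reads at a query's start timestamp, not Dgraph's design: the `MVCCStore` type and its `Commit`, `Snapshot`, and `ReadAt` methods are hypothetical names made up for this example, and a single mutex stands in for whatever coordination a real cluster would need.

```go
package main

import (
	"fmt"
	"sync"
)

// version is one committed value of a key, tagged with its commit timestamp.
type version struct {
	ts    uint64
	value string
}

// MVCCStore (hypothetical) keeps every committed version of each key, oldest first.
type MVCCStore struct {
	mu       sync.RWMutex
	clock    uint64               // timestamp of the latest commit
	versions map[string][]version // key -> versions ordered by ts
}

func NewMVCCStore() *MVCCStore {
	return &MVCCStore{versions: make(map[string][]version)}
}

// Commit applies all writes under one new timestamp, so they become
// visible together. Writers are serialized by the write lock.
func (s *MVCCStore) Commit(writes map[string]string) uint64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.clock++
	for k, v := range writes {
		s.versions[k] = append(s.versions[k], version{ts: s.clock, value: v})
	}
	return s.clock
}

// Snapshot returns the timestamp a query should use for all of its reads.
func (s *MVCCStore) Snapshot() uint64 {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.clock
}

// ReadAt returns the newest value of key committed at or before ts.
// Later commits never change what a given ts sees.
func (s *MVCCStore) ReadAt(key string, ts uint64) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	vs := s.versions[key]
	for i := len(vs) - 1; i >= 0; i-- {
		if vs[i].ts <= ts {
			return vs[i].value, true
		}
	}
	return "", false
}

func main() {
	s := NewMVCCStore()
	s.Commit(map[string]string{"name": "Alice", "age": "30"})

	snap := s.Snapshot() // the query starts here
	s.Commit(map[string]string{"name": "Bob", "age": "41"})

	// Both reads use the query's snapshot, so they see one consistent record.
	name, _ := s.ReadAt("name", snap)
	age, _ := s.ReadAt("age", snap)
	fmt.Println(name, age)
}
```

In this sketch a reader holds the lock only for the duration of a single lookup, and a query that started before a commit keeps seeing the older, mutually consistent values of both keys.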

Regarding enough users showing support, that's certainly sensible and what most of us use when picking what to work on, but note that in this case there's also a chicken and egg issue: many users will not even consider a graph database as viable for their use cases, and the complete lack of consistency across values kind of makes a lot of them right. The availability of that sort of feature might work as an eye-opener and make people wonder why not, and also makes the job of an advocate a lot easier when trying to convince someone (possibly their team) that this is a good idea.

Again, thanks for your time and your consideration.

@manishrjain

Member

manishrjain commented Sep 14, 2017

Very good points, @niemeyer. I promise we'll give it serious consideration, and at least put together a doc to envision how this might work and fit with Dgraph's design. And if the win versus complexity tradeoff makes sense, we'll work on it.

Given you have worked with transactions, if you have ideas about a potential implementation given our design, don't hesitate to send them to my dgraph.io email id: manish. All of that would help evaluate a good design and how much effort this is.

@niemeyer

niemeyer commented Sep 14, 2017

@manishrjain That's much appreciated, thank you!

I haven't looked much into the code itself yet, so my implementation ideas at this point will be incipient, but the building blocks that are already in place make me optimistic that this may be less of a challenge than it sounds. Badger is already headed in a good direction with values separated from key storage, which makes multi-versioning more convenient, and the consensus protocol for conversations around versioning and confirmations is already there too. What will require more thinking and more work is picking the proper algorithms for optimizing write speeds (retrying on conflicts instead of locking, for instance), but it seems worth getting something working even if slow before overthinking it, and then tuning over time with more experience and with a basis for metrics.
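
To make "retrying on conflicts instead of locking" concrete, here is a minimal Go sketch of optimistic commits with first-committer-wins conflict detection. It is an assumption-laden toy, not Dgraph or Badger code: `Store`, `Txn`, `Update`, and `ErrConflict` are hypothetical names, conflicts are checked on written keys only, and `Get` reads the latest committed value instead of a true snapshot to keep the example short.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrConflict reports that a written key was committed by someone else
// after this transaction started.
var ErrConflict = errors.New("conflict: key changed since transaction start")

// Store (hypothetical) tracks, per key, the timestamp of its last commit.
type Store struct {
	mu       sync.Mutex
	clock    uint64
	data     map[string]int
	commitTS map[string]uint64
}

func NewStore() *Store {
	return &Store{data: map[string]int{}, commitTS: map[string]uint64{}}
}

// Txn buffers writes locally; nothing is locked while it runs.
type Txn struct {
	store   *Store
	startTS uint64
	writes  map[string]int
}

func (s *Store) Begin() *Txn {
	s.mu.Lock()
	defer s.mu.Unlock()
	return &Txn{store: s, startTS: s.clock, writes: map[string]int{}}
}

// Get returns the transaction's own pending write, else the committed value.
func (t *Txn) Get(key string) int {
	if v, ok := t.writes[key]; ok {
		return v
	}
	t.store.mu.Lock()
	defer t.store.mu.Unlock()
	return t.store.data[key]
}

func (t *Txn) Set(key string, v int) { t.writes[key] = v }

// Commit aborts if any written key was committed after startTS;
// otherwise it applies all buffered writes under one new timestamp.
func (t *Txn) Commit() error {
	s := t.store
	s.mu.Lock()
	defer s.mu.Unlock()
	for k := range t.writes {
		if s.commitTS[k] > t.startTS {
			return ErrConflict
		}
	}
	s.clock++
	for k, v := range t.writes {
		s.data[k] = v
		s.commitTS[k] = s.clock
	}
	return nil
}

// Update runs fn in a fresh transaction and retries it on conflict.
func (s *Store) Update(maxRetries int, fn func(*Txn)) error {
	for i := 0; i <= maxRetries; i++ {
		t := s.Begin()
		fn(t)
		if err := t.Commit(); err == nil {
			return nil
		}
	}
	return ErrConflict
}

func main() {
	s := NewStore()
	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = s.Update(1000, func(t *Txn) { t.Set("counter", t.Get("counter")+1) })
		}()
	}
	wg.Wait()
	fmt.Println(s.Begin().Get("counter"))
}
```

The trade-off is the one hinted at above: no transaction ever waits on another while it runs, but work is thrown away and redone when two transactions touch the same key, so the approach pays off mainly when conflicts are rare.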

I'll write down your email, thanks. Please feel free to get in touch as well. You have my email in the forum.

@jimanvlad

jimanvlad commented Sep 20, 2017

As discussed on the chat, another use case for this would be when we need the upsert function to match on multiple properties. For example:

{
  a as var(func: [eq(name@en, "Steven Spielberg"), eq(gender, "M")]) @upsert
}

mutation {
  set {
    uid(a) <age> "70" .
  }
}

@pawanrawal

Contributor

pawanrawal commented Sep 21, 2017

Another user David mentioned on Slack that he wanted conditional mutations.

That is, the mutations should be executed only if the node was created by the upsert. We could solve this by having a directive in mutations (which would be easier to do) or by having transactions.
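
As a rough illustration of the semantics being asked for, the Go sketch below performs an upsert that matches on several properties and runs a follow-up mutation only when the node was newly created. It is an in-memory toy under stated assumptions: `Graph`, `Node`, `Upsert`, and the `onCreate`/`onMatch` callbacks are hypothetical names invented here and are not part of Dgraph's query language or client API.

```go
package main

import (
	"fmt"
	"sync"
)

// Node (hypothetical) is a vertex with string properties.
type Node struct {
	UID   int
	Props map[string]string
}

// Graph (hypothetical) is usable as a zero value.
type Graph struct {
	mu    sync.Mutex
	next  int
	nodes []*Node
}

// hasAll reports whether props contains every key/value pair in want.
func hasAll(props, want map[string]string) bool {
	for k, v := range want {
		if got, ok := props[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// Upsert finds the first node matching every pair in match, or creates a
// node with exactly those properties. It then runs onMatch or onCreate
// (either may be nil). Lookup, creation, and the follow-up mutation all
// happen under one lock, so they are atomic with respect to other Upserts.
func (g *Graph) Upsert(match map[string]string, onCreate, onMatch func(*Node)) (*Node, bool) {
	g.mu.Lock()
	defer g.mu.Unlock()
	for _, n := range g.nodes {
		if hasAll(n.Props, match) {
			if onMatch != nil {
				onMatch(n)
			}
			return n, false
		}
	}
	g.next++
	created := &Node{UID: g.next, Props: map[string]string{}}
	for k, v := range match {
		created.Props[k] = v
	}
	g.nodes = append(g.nodes, created)
	if onCreate != nil {
		onCreate(created)
	}
	return created, true
}

func main() {
	g := &Graph{}
	match := map[string]string{"name": "Steven Spielberg", "gender": "M"}
	setAge := func(n *Node) { n.Props["age"] = "70" }

	_, created := g.Upsert(match, setAge, nil) // node is created, age is set
	fmt.Println(created)
	_, created = g.Upsert(match, setAge, nil) // node exists, setAge is skipped
	fmt.Println(created)
}
```

Without a single atomic step like this, a client has to query first and mutate second, and another writer can create the same node in between.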

@jimanvlad

jimanvlad commented Sep 21, 2017

Cypher has a very intuitive ON CREATE / ON MATCH syntax to deal with both scenarios of an upsert.

@ddibiase

ddibiase commented Sep 21, 2017

@jimanvlad Precisely my thought. I brought this up on Slack because our application does a lot of it. Right now we're making multiple round trips in a transaction-like manner. So this would be great!

@manishrjain manishrjain changed the title from A Pledge for Transactions to Support Transactions Sep 22, 2017

@manishrjain

Member

manishrjain commented Nov 14, 2017

Distributed ACID transactions with snapshot isolation are now part of v0.9 release! Enjoy!

https://github.com/dgraph-io/dgraph/releases/tag/v0.9.0

@corporatepiyush

corporatepiyush commented Dec 15, 2017

For v0.9 and later, will Dgraph still support eventually consistent operations (SELECT, INSERT, UPDATE and DELETE), or are transactions a hard default for every operation?

Eventual consistency might be useful for write-heavy workloads where an immediate (consistent) read after a write is not so important (with a time window of a few hundred ms to a couple of seconds).

@peterstace

Contributor

peterstace commented Dec 17, 2017

The old-style eventually consistent operations are no longer supported. Everything goes through transactions.