General purpose distributed transaction concept for Orleans grains. #1880
Comments
I wonder what @philbe thinks about this. He's been working on something similar, although not quite the same. Are you not concerned about deadlocks? If ETCs are different for different transactions, there will be no way to detect them? As I understand, they will get eventually broken up via timeouts and rollbacks. |
So callers will get random rejections for write request if the grain happens to be in the middle of another transaction? That is probably okay for an app where such a situation is unlikely to happen due to its high level logic flow. But as a general purpose mechanism seems brittle to me. If writes were queued up and processed once the current transaction completes, that would be a friendlier behavior I think. |
And queuing would be fairly easy to add, too, using interceptors. |
I'm not sure how deadlocks can happen with this model - they will be automatically resolved when one of the competing transactions dies: We have FooETC and BazETC. Whoever locks SharedBar first wins: as soon as the lock is confirmed, e.g. for FooETC, BazETC's attempt to lock SharedBar will be rejected immediately, so BazETC will roll back the entire transaction. If there is a race and a mutual lock (we have 2 SharedBars in both ETCs and one was locked by FooETC and the other by BazETC), in the worst case both transactions will be safely rolled back, but there is a chance that one will be rolled back before the other's lock attempt, and one of them can succeed. |
@sergeybykov Yes, callers can get random rejections for write request but this rejection happens BEFORE any actual changes happen and they can safely rollback. |
Who initially calls ETC and generates a TransactionId? A first grain in a transaction? |
Queuing up writes can work if a subsequent write can be applied without checking the current state (e.g. it's irrelevant for the write whether the previous transaction was committed or rolled back), but in practice that's not very common. Let's take bank accounts as an example: a basic transaction involves ETC, Sender, Receiver. Balances are 100 and 100. The success or failure of SecondETC strongly depends on the success or failure of FirstETC. If FirstETC succeeds, balances will be 30 and 170 and SecondETC can't really withdraw 50 from a balance of 30. So they are not really independent, and one transaction must complete in full before the next one starts. We can queue them up in the grain mailboxes, but any split or State write failure would cause havoc (or I don't see how it can be resolved reliably with a positive outcome for the second transaction if the first is rolled back... or...). Is this a correct understanding of what you mean by queuing writes? Like a grain can be involved in 2 transactions in parallel? |
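The dependency between the two transfers in the bank example can be shown with a few lines of plain C#. This is only an illustrative sketch: `Account`, `TryWithdraw`, `Deposit` and `Transfer` are hypothetical names, nothing here is an Orleans grain, and the first transfer amount of 70 is inferred from the 100 -> 30 balance change mentioned above.

```csharp
using System;

// Hypothetical in-memory account; not an Orleans grain.
class Account
{
    public int Balance { get; private set; }
    public Account(int balance) => Balance = balance;

    // Refuses the withdrawal instead of overdrawing.
    public bool TryWithdraw(int amount)
    {
        if (amount > Balance) return false;
        Balance -= amount;
        return true;
    }

    public void Deposit(int amount) => Balance += amount;
}

static class Program
{
    // Moves `amount` from sender to receiver, or does nothing if funds are short.
    public static bool Transfer(Account sender, Account receiver, int amount)
    {
        if (!sender.TryWithdraw(amount)) return false;
        receiver.Deposit(amount);
        return true;
    }

    static void Main()
    {
        var sender = new Account(100);
        var receiver = new Account(100);

        bool first = Transfer(sender, receiver, 70);   // FirstETC: 100/100 -> 30/170
        bool second = Transfer(sender, receiver, 50);  // SecondETC: only 30 is left

        Console.WriteLine($"first={first}, second={second}, balances={sender.Balance}/{receiver.Balance}");
    }
}
```

If the first transfer were rolled back instead, the second one would succeed, which is why the outcome of a queued write cannot be decided before the earlier transaction is resolved.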
I meant simply deferring requests to methods marked as
We are using stateless grains to bundle up the business logic of calling stateful grains and to have a central place to handle business logic errors. An equivalent of Unit of Work in a stateless grain.
|
If I understand the server-side interception correctly - we can organize this entire transaction via interceptors and an extra state\ reliable storage for a transaction messages, we don't even need to have a unified interface on the grains cause we can operate over intercepted message. Just need to understand how we can shadow-copy the grain with the state and run that 'Apply FT to a State copy ' as described in 2.1... |
I think your proposal ensures update transactions will be ACID, that is, atomic and durable. But they won’t be isolated (that is, serializable) if you allow transactions to read locked grains. Since a program can read a locked grain, you allow inconsistent reads. Consider your money transfer example. Suppose a transaction T1 is transferring $50 from account1 to account 2. After T1 has debited account1 and before it has credited account2, another transaction T2 could read the balances in account1 and account2. This is an inconsistent state. If T2 writes the sum of the values it read in account1 and account2 into a third grain, audit-result, the latter would be missing the $50 that’s in transit. That’s a result that couldn’t happen if transactions executed serially. You can avoid this outcome by not allowing transactions to read modified grains. In this example, T2 wouldn’t be able to read account1. That would ensure isolation, but it would reduce throughput. Have you defined throughput requirements? Each transaction does quite a few sequential writes to storage. I’m sorry and embarrassed that we haven’t finished cleaning up our transaction implementation sufficiently to make it public. I’ll see what I can do to speed things up. |
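The inconsistent read described above can be reproduced in a single-threaded sketch. The starting balances of 100 and 100 are an assumption (the comment gives only the $50 transfer amount), and `AuditDuringTransfer` is a hypothetical helper name used only for illustration.

```csharp
using System;

static class Program
{
    // Returns what T2 saw and what the real total is after T1 finishes.
    public static (int Seen, int Actual) AuditDuringTransfer()
    {
        // Assumed starting balances; not taken from the discussion.
        int account1 = 100, account2 = 100;

        // T1, step 1: debit account1.
        account1 -= 50;

        // T2 reads both accounts while T1 is only half done
        // and would write this sum into audit-result.
        int auditResult = account1 + account2;

        // T1, step 2: credit account2.
        account2 += 50;

        return (auditResult, account1 + account2);
    }

    static void Main()
    {
        var (seen, actual) = AuditDuringTransfer();
        Console.WriteLine($"T2 saw {seen}, actual total is {actual}");
    }
}
```

T2's sum is short by the $50 in transit, a value that no serial execution of T1 and T2 could produce.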
Thanks, @philbe .
Technically - on each access to Account1 and Account2 grains, these grains have to lookup in T1 Grain for the status, and based on that - decide how to serve the request - either return current OldAccount1\OldAccount2 states or delay the request, perform an actual T1 state mutation and serve NewAccount1 state. I haven't had any throughput requirements yet, neither generic enough implementations. This idea is to smoke test the concept itself - whether more eyes can find any flaws. |
What I did not understand in your proposal is the sentence "So Sagas are great on papers, but engineers want to build practical systems." What you propose is a standard regular transactions (with a bit different/weaker correctness semantics, as Phil mentioned). |
@gabikliot That was an unnecessary caustic statement to prevent recommendation a-la "in distributed world you can't have reliable transactions and must use sagas". Sorry if it insulted anyone. Before we came to this concept - we tried to model how Sagas pattern can be applied to our system and found that it doesn't really work well on our scenarios and around edge cases - we were looking for atomic and easy to implement transaction concept, although which can run for some extended time (transaction timeout can be relatively long) and still doesn't affect the entire system, just create some minor congestion around update\delete operations ( which is usually less frequent than reads). |
Ohh, I see. So long running transactions, but not necessarily multi stage. Makes sense. |
Thanks @centur for your explanation of consistent reads. That helps. However, I'm still not sure you've covered all the cases. Consider this one:
I can imagine ways to prevent T2's inconsistent read. E.g., after reading acct1, it knows it depends on T1. So from then on, T2 must inquire whether T1 updated every grain that T2 accesses after acct1. But it gets complicated. For example, it's possible that many transactions updated acct2 after T1 committed and before T2 reads acct2. ETC would have to keep track of all of them and know when it's safe to forget about that update history. |
Thanks @philbe . This is an interesting case. I tried to model the safest, read-committed behavior, but this case breaks it with two closely competing transactions (T1 and T2, when T1 commits exactly between T2's reads of acc1 and acc2), which gives T2 an inconsistent view of the world. Although it may work for our particular case :
it doesn't look like a good enough general case where many others ... Can you give any advice on how this can be generalized enough to prevent such reads, or maybe share the work-in-progress you were doing on transactions (if this is possible)? |
@philbe I'm interested to have the feature as well. Currently working on a game backend using Orleans. Trades should be handled as transactions. I will look at the implementation at https://github.com/saeedakhter/OrleansStrongConsistency |
@ashkan-saeedi-mazdeh Could you give an example of the scenario where you want to use transactions for trades? |
@bailud Generally speaking, when an operation consists of calling methods on multiple grains and these method calls modify state, I want these state modifications to have transactional properties.
Then if some players can remove it and the request for some fails, we have two options: eventual consistency, by setting a reminder to retry the failed currency reductions later on, or simply failing the transaction and leaving all player currencies as they were before the transaction. For example, either all method calls in the foreach below succeed or all fail. I can tolerate executing them sequentially instead of in parallel as well.
// Requires System.Collections.Generic and System.Threading.Tasks.
// Start one currency deduction per player, then wait for all of them.
List<Task> transactionTasks = new List<Task>();
foreach (var player in players)
{
    transactionTasks.Add(player.DecreaseCurrency("coin", 30));
}
await Task.WhenAll(transactionTasks); |
@ashkan-saeedi-mazdeh What you just described is exactly the use case for a transaction. If the game itself is a stateless worker, will adding a log make sense?
Thus, if the system fails during the game, you can check the log to see if the players have already paid for the game. Also, you need some additional logic when you recover from a failure, e.g. check if the game has been interrupted by a failure so that the players will not be charged twice. |
@bailud Thank you for the response! Ideally later on we have a set of interfaces to inherit from when we want to have grains which have transactions built-in directly to them. I understand this is a tough problem to solve generally. There are many questions like
|
@ashkan-saeedi-mazdeh We are actually working on transactions in Orleans. Thus, I would like to know more about what people really expect from the transactions in Orleans. Will that be good enough to provide transactions in a traditional sense, i.e., database transactions, or do people want something more specific for the actor model? For example, it would be really nice if you could elaborate what behavior / support from the language is desirable for (1) transaction with Reentrant grains (2) writing reverse transactions and (3) lock the state partially |
@bailud Great then. Sergey told us that some researchers are doing stuff regarding it. The point is when we talk about transactions , what actually we care about is actor state and effects of operations on it. The things like partial state updates or reverse transactions I talked about are simply questions, I don't think in the first version we need custom reverse transaction functions or be capable of locking part of the state.
Then if it failed all modified grains revert their state to the previous one. It's ok to lock the grain/grain state while transaction is running. The hard part is when transaction fails and updates for reverting participant states fail as well. We keep the grain locked then. Maybe a good trade-off is to mark some methods of the grain with an attribute which says that they have nothing to do with state and can execute even when the grain is locked or we just lock state objects so any operation regarding them fails and method calls are always allowed but this second one is much harder and requires additional code generation for state objects I imagine or at least an accessor function for getting the grain state. About reentrancy of grains, the transactions are most important for these grains since if a grain is not reentrant then it can not do anything else until its current Task is returned which is from the transaction call itself so the none-reentrant grains are easy to write transactions for actually, at least easier than reentrant ones. Orleans revolutionized the actor model based programming with virtual actors and by adding transactions to actor operations you can probably multiply it I guess. Writing something similar to Mnesia is possible but that is just a DBMS to be used in actor systems. In fact service fabric reliable collections are kinda like Mnesia however I'm not sure if they allow for dirty operations or not and how many lock types they support. I'm really interested to see what you'll come up with and its differences to simply using DB transactions in actors when you need transactional properties. This can make Orleans more interesting for the cases that actors need to interact more with each other and do it in a consistent way. I think it's good if you come to Orleans's gitter chat room and ask people's opinion about it. @galvesribeiro @veikkoeeva @ReubenBond @centur What do you think guys? |
@ashkan-saeedi-mazdeh sorry, I didn't understand your correlation between transactions and reentrant grains... Regarding the SF collections, you can read data before it is committed; I mean you can have inconsistent reads by reading the replicated state before the local state is actually replicated. |
@ashkan-saeedi-mazdeh I think you'll find our transaction implementation supports the usage scenarios you described.
We allow a transaction to make many calls to the same grain. We allow grains to be re-entrant. With our locking implementation, all concurrent calls to a reentrant grain must be executing the same transaction. With our optimistic-concurrency-control implementation, concurrent calls from different transactions are allowed, but each transaction is working with its own copy of grain state. With both locking and optimistic concurrency control, transactions have all of the ACID properties.
|
@ashkan-saeedi-mazdeh I took a look at Mnesia (http://erlang.org/doc/apps/mnesia/Mnesia_chap4.html). The regular transaction in Mnesia looks very strict. It uses two-phase locking, and based on its description it can achieve serializability if table-level locking is used properly ("All programs accessing the database through the transaction system can be written as if they had sole access to the data."). It becomes more interesting with the dirty operations. These operations execute without any locks. The documentation says that dirty operations do not have atomicity or isolation. In addition, it seems that dirty writes can also be harmful to regular transactions, since they can update a record regardless of whether it is locked by another transaction or not ("The isolation property is compromised, because other Erlang processes, which use transaction to manipulate the data, do not get the benefit of isolation if dirty operations simultaneously are used to read and write records from the same table."). It looks to me like it will be difficult to reason about the state of the system if the data can be updated by both dirty operations and transactions. Could you give me some idea of when dirty operations are used and what you can expect from the database state? @centur was asking for read committed, which is probably not safe enough for the trades scenario you mentioned. |
@bailud I would not use both dirty operations and transactions on the same rows of a table, since it will be very hard to reason about the state of the object. So either something is not transactional, eventually consistent views of the data are ok for the app, dirty reads and writes are ok and you can use dirty operations, or you cannot use dirty operations at all. What I think is the right approach in Mnesia is
You should keep in mind that Mnesia is not used as a general purpose database for actual storage of the data in my mind at least. You use it to make your Erlang code do its work and the real data storage is done to regular DBMS (being RDBMS or not). Mnesia has limited storage capabilities and even disk usage limitations for tables and ...
I just meant I don't represent the opinion of everyone; yes, what he wanted was read committed, which is not useful for my case at all. |
@philbe It's great and I would say this really pushes the Orleans programming model about an order of magnitude forward at least for a certain number of applications. The statement which says actor model is not suitable if many actors need to interact with each other no longer fully applies to Orleans then :) For interactions you need transactions and they are costly everywhere, in Orleans or not. And of course this is all expected when one of the fathers of transactions is on top of the project :) |
@galvesribeiro see @philbe 's comment about a reentrant grain is able to take part in the same transaction and call methods. I meant this can happen and if we lock it fully and don't allow, the performance and throughput of the system will drop for reentrant grains. I should try to write more clearly. |
@ashkan-saeedi-mazdeh Slowly catching up with this thread. Yes, but my example with eventual transactions was a workaround as I don't know if there is a robust algorithm for the case you mentioned - transaction failed but reversal failed too. It's the case I was thinking about when I added that eventual thing into my algorithm - cause you may revert not only as a result of network or state failure but as a result of timeout too - so eventual part (the second commit phase) would work in case of explicit transaction grain failure and in case of transaction timeout - when grains need to be unlocked for other operations, and we can have grains activated and unlocked as is, when action happens, instead of reactivating all of them on some explicit scheduled trigger |
@centur Fair enough. Your algorithm is actually what some solutions need. |
Re a few mentions: generally speaking, I want at least some implementation of transactions. I proposed my solution to spark the discussion and move things forward. Since we started to use Orleans a year ago, there has always been a subtle notion that transactions are somewhere near, just around the corner, but after a year they haven't appeared. I've had a few discussions about Orleans at meetups and people are asking about transactions, so having a good transaction implementation out of the box will skyrocket Orleans compared to other actor systems. If they are database-style ones with ACID, that's perfect, as it's close to a natural OOP style in C#. I'm really interested in reading anything about @philbe's implementation, just to understand the algorithm and teach myself about the complexity and problems in this area, if that's possible. And again, I'm not an expert; I'm the guy who needs this feature in any form, and I can see that this is a burning question for many others, whether they are on gitter or not, and whether they are using Orleans or just deciding to try it. |
Hello. Found this issue while trying to understand if there is any transactional concept in Orleans. As i understood @philbe has some kind of solution but not public yet? |
Cross-referencing #2161 for further information. |
@stgolem In general in all actor based frameworks like Erlang, Akka and ... the task of doing transactions is done by using database transactions. In the specific case of Erlang, They have a special database called Mnesia usually used for the task. The DB is not anything special on its own other than having the specific feature of being able to store Erlang terms. So effectively until the feature becomes available you can use database transactions. As an example if you have a trade going on between two players in a game, when player A's grain calls a Trade method with player B's grain as the other party, you execute a DB transaction to do the operation and declare it successful if the transaction completed. The feature should be available in a couple of months from the meet-up time which was around a month ago I guess. |
A technical report that describes our transaction mechanism is now available here, written by @tamereldeeb and me. We'll make code available soon. Initially, it will support optimistic concurrency control (OCC), not two-phase locking as described in the tech report. We were hoping to add the OCC description to the tech report before publishing it. But we've been slow and decided not to delay further. We'll update the tech report with an OCC description later. |
@philbe The link is not working |
@ashkan-saeedi-mazdeh this is simple, when we have some kind of "commiter" grain that can decide whole work. But in more general environment, where every grain is doing some part of work, i wish them all to do "commit" together. |
@ashkan-saeedi-mazdeh and @stgolem. Very sorry, I accidentally made it private to Microsoft. It's public now, so you should be able to access it. |
@philbe still acting weird:
But it opens in-private just fine... Just in case - here is a direct link to PDF |
@centur Indeed, that's strange. What browser are you using? |
Google chrome. But I logged in to Azure portal in that profile (so it sends bunch of cookies with request to microsoft.com). |
@philbe Read it all at morning with a cold , really interesting. but two things. 1- In page 6 the report mentions that Orleans writes back state to storage automatically when a grain deactivates but in fact it doesn't. It might be interesting to add the feature or the possibility of destructors but for now Orleans doesn't persist state at deactivation time. 2- Why did you use a separate process for TM and did not host it in silos? To scale separately ? Or because you wanted all machine resources for the TM? |
@AshkanSaeedi 2- Yes and Yes. The TM throughput was too low when it ran in a silo. |
@philbe @sergeybykov Is there any news about the progress here? |
2.1.0 includes an "RC" quality implementation of cross-grain transactions. |
Hi, guys. I want to discuss one idea about how distributed transactions can be implemented with Orleans.
I'm not an expert in this matter but read a lot of various opinions on sagas and other implementation proposals like #1090.
First, let me put some basic issues we have with Sagas concept (terminology is based on this presentation Applying the Saga Pattern ).
It sounds solid in theory, but we have some challenges in practice:
FT is UpdatePassword, which should be handled like State.Password = oneWayHash(UpdatePassword.NewPassword). Within such operations it's impossible to implement CT without subverting application rules: CT would be a SetPasswordHash with handler State.Password = SetPasswordHash.OldPasswordHash, which breaks encapsulation and exposes internal implementation details to an external entity. Things get even worse on the business domain edges, e.g. when FT interacts with any external service which doesn't allow such a reversal (you can't revert a password in Gmail back to an old one).
So Sagas are great on papers, but engineers want to build practical systems.
We had discussed few concepts and one of them seems reliable and doable. Just to give it some name - let's call it Eventual Transaction. This pattern relies on Orleans guarantees and behaviour and may be tricky to implement outside of Orleans.
So the pattern is:
Perform and type of action defined by the type of input object (message)
Transaction flow resembles the one from 2 phase commit, but with 'eventual' second phase.
1. ETC grain starts and saves a few pieces of information to the storage: TransactionId, StartTime, current TransactionState == Pending and the expected number of participating grains. It also starts a timer\reminder which automatically rolls back the transaction if it expires.
2. Transaction Id is passed with every FT command to every participating grain.
2.1 As part of Perform(FT) each grain has to save the full FT message to a state property along with the transactionId, store State+FT reliably, try to apply FT to a copy of the current State and send a Confirmation back to ETC with a reference to itself. Effectively saying - 'I'm ready to continue with this Transaction'.
2.2 On each Confirmation - ETC reliably saves the grainRef of the confirmed grain.
3. Once all participating grains send a confirmation (either positive or negative) - ETC calculates the overall transaction state (Commit or Rollback) and persists this result to storage. This is the end of our transaction.
If the timeout is reached - ETC transitions itself to the 'Rollback' state.
On activation ETC checks the current time and the Transaction startTime, and if it's still Pending but the timer has lapsed - it does a 'Rollback'.
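The coordinator side of the flow above can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not an Orleans implementation: `EventualTransactionCoordinator`, `Confirm` and `CheckTimeout` are hypothetical names, grain references are plain strings, and every "persist reliably" step is left out.

```csharp
using System;
using System.Collections.Generic;

enum TransactionState { Pending, Commit, Rollback }

// Hypothetical coordinator (ETC); all persistence is omitted.
class EventualTransactionCoordinator
{
    private readonly int _expectedParticipants;
    private readonly DateTime _startTime;
    private readonly TimeSpan _timeout;
    private readonly HashSet<string> _confirmed = new HashSet<string>();
    private bool _anyNegative;

    public Guid TransactionId { get; } = Guid.NewGuid();
    public TransactionState State { get; private set; } = TransactionState.Pending;

    public EventualTransactionCoordinator(int expectedParticipants, DateTime startTime, TimeSpan timeout)
    {
        _expectedParticipants = expectedParticipants;
        _startTime = startTime;
        _timeout = timeout;
    }

    // Timeout rule: a Pending transaction whose timer has lapsed becomes Rollback.
    public void CheckTimeout(DateTime now)
    {
        if (State == TransactionState.Pending && now - _startTime >= _timeout)
            State = TransactionState.Rollback;
    }

    // Step 2.2 records the confirmation; step 3 decides once everyone has answered.
    public void Confirm(string grainRef, bool positive, DateTime now)
    {
        CheckTimeout(now);
        if (State != TransactionState.Pending) return;   // the decision is final
        if (!_confirmed.Add(grainRef)) return;           // duplicate from at-least-once delivery
        _anyNegative |= !positive;
        if (_confirmed.Count == _expectedParticipants)
            State = _anyNegative ? TransactionState.Rollback : TransactionState.Commit;
    }
}

static class Program
{
    static void Main()
    {
        var start = new DateTime(2016, 1, 1, 0, 0, 0, DateTimeKind.Utc);

        var etc = new EventualTransactionCoordinator(2, start, TimeSpan.FromSeconds(30));
        etc.Confirm("sender", true, start.AddSeconds(1));
        etc.Confirm("sender", true, start.AddSeconds(2));    // duplicate, ignored
        Console.WriteLine(etc.State);                        // still waiting for the receiver
        etc.Confirm("receiver", true, start.AddSeconds(3));
        Console.WriteLine(etc.State);                        // all confirmed positively

        var late = new EventualTransactionCoordinator(2, start, TimeSpan.FromSeconds(30));
        late.CheckTimeout(start.AddSeconds(31));
        Console.WriteLine(late.State);                       // timer lapsed while Pending
    }
}
```

Passing `now` in explicitly, rather than reading the clock inside the class, is a choice made for this sketch so the timeout rule can be exercised deterministically.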
Now comes the eventual consistency phase:
We are not really interested in pushing all participating grains to commit or roll back immediately, because we have extra logic around serving other requests to these grains:
4.1 If it is 'Commit' - grain must apply FT to its own current state and persist it. Grain cannot proceed further until it reliably stores mutated State. After this - grain can participate in another transaction or serve 'read' request based on the new State.
4.2 If ETC state is 'Rollback' - Grain can drop FT and TransactionID and reply to the current message from an existing State.
4.3. If ETC state is 'Pending' (this can be due to transaction in progress or waiting of timeout) - Grain can serve Read requests from a current state (without applying FT) but must reject Modify requests ( to prevent transactions overlap)
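Rules 4.1 to 4.3 can be sketched from the participating grain's point of view. Everything here is an assumption made for illustration: `ParticipantGrain` is a hypothetical class, the FT is reduced to "add a delta to an integer State", the ETC state is passed in by the caller instead of being looked up from the ETC grain, and storage writes are only marked in comments.

```csharp
using System;

enum TransactionState { Pending, Commit, Rollback }

// Hypothetical participant; the FT is simplified to "add delta to State".
class ParticipantGrain
{
    private int? _pendingDelta;   // the saved FT, if the grain is part of a transaction

    public int State { get; private set; }

    public ParticipantGrain(int state) => State = state;

    // Step 2.1: save the FT; refuse if another transaction already holds the grain.
    public bool Perform(int delta)
    {
        if (_pendingDelta.HasValue) return false;
        _pendingDelta = delta;    // a real grain would store State+FT reliably here
        return true;
    }

    // Applies or drops the saved FT once the ETC has decided.
    private void Resolve(TransactionState etcState)
    {
        if (!_pendingDelta.HasValue) return;
        if (etcState == TransactionState.Commit)
            State += _pendingDelta.Value;                 // 4.1: apply FT, then persist
        if (etcState != TransactionState.Pending)
            _pendingDelta = null;                         // 4.1 / 4.2: transaction is over
    }

    // 4.3: reads are served from the current state while the ETC is Pending.
    public int Read(TransactionState etcState)
    {
        Resolve(etcState);
        return State;
    }

    // 4.3: modifications are rejected while the ETC is Pending.
    public bool Modify(int delta, TransactionState etcState)
    {
        Resolve(etcState);
        if (_pendingDelta.HasValue) return false;
        State += delta;
        return true;
    }
}

static class Program
{
    static void Main()
    {
        var grain = new ParticipantGrain(100);
        grain.Perform(-70);

        Console.WriteLine(grain.Read(TransactionState.Pending));        // old state, FT not applied
        Console.WriteLine(grain.Modify(10, TransactionState.Pending));  // rejected while Pending
        Console.WriteLine(grain.Read(TransactionState.Commit));         // FT applied on first access
    }
}
```

The FT is applied lazily, on the first request that observes a decided ETC, which is the 'eventual' second phase: no one has to push the commit to the grain.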
So based on this pattern and some extra overhead (TransactionId and some confirmation messages for the first phase) we can ensure that after our ETC commits or rolls back atomically, we can guarantee (to a certain degree) that all participating grains can successfully apply FT to their states and will not serve any other Modify messages until each of them finalizes its own part of the transaction. Also, we have clear fixation points at which any silo can crash and the transaction can still be resumed.
All cross grain message delivery should be At-Least-Once as with Sagas.
It also looks doable with server-side interception if SSI has access to an actual grain state (to verify that FT is applicable to State without exception)
Downsides \ Grey areas:
Really keen to hear any constructive feedback and explanations why this will not work or how to achieve similar results with less pain and more gain.
cc @philbe (I saw this 'invocation' in 1090 - seems like works well ;) )
PS: sorry for a long description...