Distributed transactions #22
Comments
Hi fenglin99,
So the candidate timestamp can't change in an SSI transaction; if it does, the transaction will be restarted.
And where do we get the initial candidate timestamp? Is it assigned by the client or by a cockroach server node? If we get the candidate timestamp from a cockroach node, when is it assigned: at the first read or the first write?
GitHub is annoying in how you can't respond to individual questions.
@tschottdorf: you can't write a value at the same timestamp as a previous read. The requirement is that the write timestamp be bumped up to the previous read's timestamp + 1 logical tick, so it doesn't matter if the two transactions pick identical candidate timestamps. One will have to be +1 logical tick in the end.
@fenglin99: if the candidate timestamp is changed in an SSI txn, the txn will restart. An SSI cockroach txn which is sending writes for longer than 10s will end up pushing its timestamp forward. SSI will require a restart at the end. On the restart, however, the intents are still in place. Having to always retry a long transaction is pretty terrible, however... Probably if you're running extremely long transactions like this, you want to reconsider what you're doing, or possibly switch to using SI isolation.
@fenglin99: the initial candidate timestamp is taken from the gateway node which receives the first command(s) which are part of the txn. This timestamp must be assigned by a node in the cluster because it should be within the maximum clock offset of the cluster's "true time". The client has no guarantee of being synchronized properly, as it's not estimating clock offsets to other nodes on a constant basis.
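To make the "+1 logical tick" rule concrete, here is a minimal sketch in Python. The `Timestamp` class and `write_timestamp` helper are hypothetical names chosen for illustration, not the actual cockroach types; the sketch assumes a timestamp is a wall-clock component plus a logical counter that breaks ties.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Timestamp:
    """Hypothetical hybrid timestamp: wall time plus a logical tie-breaker."""
    wall: int         # physical clock component
    logical: int = 0  # logical ticks at the same wall time

    def next(self) -> "Timestamp":
        # Smallest timestamp strictly greater than this one.
        return Timestamp(self.wall, self.logical + 1)


def write_timestamp(candidate: Timestamp, last_read: Timestamp) -> Timestamp:
    """Bump a write past the latest read of the same key, if necessary."""
    if candidate <= last_read:
        return last_read.next()
    return candidate
```

With this rule, two transactions that pick the same candidate timestamp cannot both end up with it: the one writing after the other's read ends up one logical tick later.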
@spencerkimball what does an SSI transaction do that reads x, then writes x? Why won't that restart forever?
The timestamp cache contains a txn id for each entry if applicable. If a write encounters an entry in the timestamp cache with its own txn id, it's able to avoid incrementing its timestamp.
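The own-txn exemption can be sketched as follows. This is an illustrative model only: `TimestampCache`, `record_read`, and `write_timestamp` are hypothetical names, timestamps are plain integers (with `+ 1` standing in for one logical tick), and dropping the txn id when two different readers share a timestamp is an assumption made so that one transaction's entry cannot hide another's read.

```python
from typing import Dict, Optional, Tuple


class TimestampCache:
    """Hypothetical per-range cache: key -> (latest read ts, reader txn id)."""

    def __init__(self) -> None:
        self._reads: Dict[str, Tuple[int, Optional[str]]] = {}

    def record_read(self, key: str, ts: int, txn_id: Optional[str] = None) -> None:
        prev = self._reads.get(key)
        if prev is None or ts > prev[0]:
            self._reads[key] = (ts, txn_id)
        elif ts == prev[0] and prev[1] != txn_id:
            # Assumption: two different readers at the same timestamp means
            # no single transaction owns the entry any more.
            self._reads[key] = (ts, None)

    def write_timestamp(self, key: str, ts: int, txn_id: Optional[str] = None) -> int:
        prev = self._reads.get(key)
        if prev is None:
            return ts
        read_ts, reader = prev
        if reader is not None and reader == txn_id:
            return ts  # the latest read was our own: no bump needed
        return max(ts, read_ts + 1)
```

A transaction that reads x and then writes x therefore keeps its timestamp, while any other writer at or below the read timestamp is pushed past it.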
Ah, perfect.
@spencerkimball ok, I see, thanks
@fenglin99 you might want to take a look at kv/txn_correctness_test.go for tests which specifically verify the operation of the txn model with respect to various write anomalies.
Hi, I am reading the transaction part of the cockroach design and have a question about the read timestamp cache of each range. The read timestamp cache maintains the "latest timestamp" at which a key was read. What exactly does "latest timestamp" mean? Is it the candidate commit timestamp of the transaction that read the key, or the start timestamp (snapshot timestamp) of that transaction?
Hi,
Hi, you mean a transaction only has a provisional commit timestamp? But how does a transaction determine which version of a key is visible? Does it just use its provisional commit timestamp? Do all read operations read the latest committed version of a key?
Hi alkfbb, sorry. What I said was misleading. I'm editing the comment above to be more appropriate.
Hi there,
I have some more doubts about the interaction between distributed Assume we are running transactions in SI. We will run a distributed The 99% clock skew is 10... And let's say the candidate timestamp of TX1 is So when committing, TX1 commits with timestamp 7. However, it didn't read
Thank you very much in advance,
On Sunday, November 16, 2014 2:21:28 PM UTC, Tobias Schottdorf wrote:
Hi Lee, the txn status is checked via the transaction table (that's the single source of truth). Whatever is written there is fact, and all updates to this table are atomic. I'm not sure I see the problem with your example. You're reading a consistent cut of your database (at "time" 5) throughout your transaction. The transaction will commit with a later ts 7 (pushed to accommodate concurrent readers, in your example). Whatever is written between 5 and 7 doesn't concern you (there would be an uncertainty restart, but let's ignore that).
Best,
Lee, the example you've given is actually correct behavior for snapshot isolation. It's also the root of why snapshot isolation doesn't guarantee serializable consistency--you may read values from a candidate timestamp which is earlier than your final commit timestamp. This is the basis for the write skew anomaly.
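For readers unfamiliar with write skew, here is a toy illustration in Python. It is not cockroach code: the "database" is a dict, the two transactions are simulated inline, and the conflict check models only the write-write rule of snapshot isolation. The on-call scenario and all names are hypothetical.

```python
def run_write_skew():
    """Two transactions under SI each preserve the invariant 'at least one
    person is on call' against their own snapshot, yet jointly break it."""
    db = {"alice_on_call": True, "bob_on_call": True}

    # Both transactions read from snapshots taken before either commits.
    snap1, snap2 = dict(db), dict(db)
    writes1, writes2 = {}, {}

    # txn1: Alice goes off call if Bob is still on call in her snapshot.
    if snap1["bob_on_call"]:
        writes1["alice_on_call"] = False
    # txn2: Bob goes off call if Alice is still on call in his snapshot.
    if snap2["alice_on_call"]:
        writes2["bob_on_call"] = False

    # SI only detects write-write conflicts; the write sets are disjoint,
    # so both transactions are allowed to commit.
    write_conflict = bool(writes1.keys() & writes2.keys())
    if not write_conflict:
        db.update(writes1)
        db.update(writes2)
    return db, write_conflict
```

Each transaction read a value that the other overwrote before it committed, which is exactly the read-at-an-earlier-timestamp situation described above. Under SSI one of the two would be forced to restart instead.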
Thank you Spencer and Tobias! You are absolutely right :P..
On Thursday, November 20, 2014 5:06:39 PM UTC, Spencer Kimball wrote:
As far as DB isolation goes, SI exhibits relatively few anomalies, and is
http://en.wikipedia.org/wiki/Snapshot_isolation
See especially the write-skew example in the Definition section, which is Spencer and I had a vigorous discussion as to what the default isolation
~Andy
On Thu, Nov 20, 2014 at 9:06 AM, Spencer Kimball notifications@github.com
There are enough issues with performance that I might change my mind and make snapshot the default. Ultimately, Oracle ran with snapshot isolation claiming to be "serializable" for many years (a decade plus?) in probably tens of thousands of installations. Further, most people using MySQL across the entire industry never go above "read committed". Isolation matters, but when you are talking about the difference between snapshot isolation and serializable consistency, it's a pretty marginal benefit.
That said, I think the difference between "read committed" and "snapshot isolation" is somewhat more dramatic. With "read committed" (and "repeatable read"), decent programmers constantly have to think about when to read "for update", which strikes me as exactly the sort of thing you want to avoid having them spend their time doing (and often getting wrong).
If you're interested in the correctness of the transaction model, take a look at https://github.com/cockroachdb/cockroach/blob/master/kv/txn_correctness_test.go. It makes sense to start reading the file at about line 613. Everything above that is part of the testing harness.
Hello guys, I am also trying to become a contributor to Cockroach. I just
On Thursday, November 20, 2014 11:29:23 PM UTC+6, Spencer Kimball wrote:
Btw, when we update timestamp of current rw tx with last evicted timestamp
Hi Rustem, It's great that you want to contribute. The items below look like they could be a good starting point - usually everything leads down the rabbit hole sooner or later. Just assign yourself and off you go.
Thank you, Tobias (@tschottdorf), for the nice explanation of pushing.
Why do we need to update the low water mark if a new range replica leader is
The tsCache is associated with the respective replica leader. The new leader's clock will not be exactly in sync with the old leader's, and if we don't raise the low water mark to account for the offset, we could wind up not pushing the timestamp of a write that should have been pushed, because the old leader might have served a read at that timestamp before losing leadership.
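A small sketch of why the mark has to move on a leadership change. The names `ReplicaTimestampCache`, `new_leader_cache`, and `push_write` are hypothetical, timestamps are integers, and setting the new mark to "new leader's clock + maximum clock offset" is an assumption about how the offset is accounted for.

```python
class ReplicaTimestampCache:
    """Hypothetical timestamp cache with a floor for keys it knows nothing about."""

    def __init__(self, low_water: int = 0) -> None:
        self.low_water = low_water
        self.reads = {}

    def record_read(self, key: str, ts: int) -> None:
        self.reads[key] = max(ts, self.reads.get(key, 0))

    def latest_read(self, key: str) -> int:
        # Unknown keys are treated as if read at the low water mark.
        return max(self.reads.get(key, self.low_water), self.low_water)


def new_leader_cache(now: int, max_offset: int) -> ReplicaTimestampCache:
    # Assumption: the old leader's clock may have been up to max_offset ahead,
    # so it may have served reads at timestamps up to now + max_offset.
    return ReplicaTimestampCache(low_water=now + max_offset)


def push_write(cache: ReplicaTimestampCache, key: str, ts: int) -> int:
    return max(ts, cache.latest_read(key) + 1)
```

If the old leader served a read of `k` at 105 and the new leader (clock at 100, offset 10) started with an empty cache and no raised mark, a write at 103 would slip in under that read. With the mark at 110, the write is pushed to 111.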
Now I got it. I forgot that Timestamp is taken from range replica leader
On Monday, December 8, 2014 9:21:28 PM UTC+6, Tobias Schottdorf wrote:
Hi,
If the data read has a timestamp greater than t-ε but less than t, how do we decide whether the transaction that wrote this data committed before or after the start timestamp of the reading transaction? Or is there another mechanism that guarantees all data with a timestamp greater than t-ε but less than t was committed before the read transaction with timestamp t started? Thank you very much.
Hi @yorkxu, the transaction won't read such data (it will come back with a higher timestamp). We optimize that a bit more using the node's timestamp (which limits these restarts to one per node), but essentially within that uncertainty interval [t, t+eps) you will restart as you encounter values, moving your start timestamp forward. If you pass
I hope that helped,
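The restart behavior can be sketched like this. It is a simplified single-key model, not the actual read path: `read_key` and `run_read` are hypothetical names, timestamps are integers, the upper bound `start + eps` is assumed to stay fixed across restarts, and the per-node optimization mentioned above is left out.

```python
from typing import Dict, Optional, Tuple


def read_key(versions: Dict[int, str], read_ts: int, max_ts: int) -> Tuple[str, object]:
    """Classify a read of one key whose committed versions are {ts: value}."""
    # Values strictly after read_ts but before max_ts might really be in our
    # past (clock offset), so we cannot ignore them.
    uncertain = [ts for ts in versions if read_ts < ts < max_ts]
    if uncertain:
        return ("restart", max(uncertain))
    visible = [ts for ts in versions if ts <= read_ts]
    return ("value", versions[max(visible)] if visible else None)


def run_read(versions: Dict[int, str], start_ts: int, eps: int) -> Tuple[Optional[str], int, int]:
    """Retry until the read succeeds; returns (value, final ts, restarts)."""
    max_ts = start_ts + eps  # assumed fixed for the life of the transaction
    read_ts, restarts = start_ts, 0
    while True:
        kind, payload = read_key(versions, read_ts, max_ts)
        if kind == "value":
            return payload, read_ts, restarts
        read_ts, restarts = payload, restarts + 1
```

For example, with versions at 90, 103, and 120, a read starting at 100 with eps = 10 restarts once at 103 and then returns the value written at 103; the value at 120 lies beyond the uncertainty interval and is simply not visible.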
Thank you @tschottdorf. Suppose a transaction named txn1 has start timestamp t1. As the transaction progresses, it reads data with commit timestamp t2 (t2 is greater than t1-ε but less than t1, where ε is the maximum clock skew). Will txn1 read this data? I mean, the commit timestamp t2 may be later than t1, am I right? Thank you very much.
t1 will read t2's write (which is in its past) if t2 has committed at that point. That is, either there is a committed value, or it sees a provisional value and checks on t2's central transaction record. This record will either confirm the value as committed (in which case again it is read) or not committed, in which case t2 and t1 will undergo conflict resolution and the result of the read depends on its outcome (and may mean that t1 has to wait for t2 to complete if it can't abort it).
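The decision a reader makes when it finds a provisional value can be sketched as below. This is a hypothetical model: `Status`, `read_version`, and the dict standing in for the transaction table are illustration-only names, timestamps are ignored, and treating an aborted writer's intent as invisible is an assumption.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class Status(Enum):
    PENDING = "pending"
    COMMITTED = "committed"
    ABORTED = "aborted"


def read_version(
    committed_value: Optional[str],
    intent: Optional[Tuple[str, str]],
    txn_table: Dict[str, Status],
) -> Tuple[str, Optional[str]]:
    """Return ('value', v) or ('conflict', writer_txn_id) for one key.

    intent is None or (writer txn id, provisional value).
    """
    if intent is None:
        return ("value", committed_value)
    writer, provisional = intent
    status = txn_table[writer]  # the transaction record is the source of truth
    if status is Status.COMMITTED:
        return ("value", provisional)
    if status is Status.ABORTED:
        return ("value", committed_value)  # assumption: aborted intents are skipped
    # Writer is still pending: conflict resolution decides who is pushed,
    # aborted, or made to wait.
    return ("conflict", writer)
```

The `conflict` outcome is where the reader either pushes or aborts the writer, or waits for it, as described above.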
See also the section "Transaction interactions" in https://github.com/cockroachdb/cockroach/blob/master/docs/design.md.
This is an umbrella issue for now for accounting purposes. Smaller issues should be created for the various paths a transaction can take.