Life of a transaction

Will Haack edited this page Jun 6, 2016 · 2 revisions

NOTE: Much of the code referenced (such as HTTPSender and RunTransaction) is now deprecated. This guide is out of date.

This page is a walkthrough of the client and server code involved in a transaction, intended to serve as a guide to the various layers.

  1. Client: The code in this section is a library linked into the client application. The links point to the go client, although this layer will eventually have counterparts in many languages.

    1. The main entry point is the client.KV type, and in particular the RunTransaction method. The application passes a retryable closure to RunTransaction.

    2. RunTransaction routes all KV operations (puts and gets) through a txnSender. The txnSender detects transaction abort errors and instructs the client.KV to retry the transaction as needed.

    3. txnSender proxies to HTTPSender, which handles the actual transmission of requests (this layer may eventually support other protocols such as gRPC). The client communicates with a single node in the cockroach cluster, called the gateway node. The gateway node is currently chosen arbitrarily.

  2. Gateway node: The gateway node routes client requests to the nodes that contain the relevant data; it performs this role to minimize the amount of logic that must be replicated in the clients.

    1. A request enters the gateway node in a server.Server, which routes the request to the kv.DBServer.

    2. kv.DBServer uses a kv.TxnCoordSender which manages transaction state such as timestamps and intents (note that it is different from the similarly-named client.txnSender).

    3. kv.TxnCoordSender wraps a kv.DistSender to forward the requests to the nodes that actually contain the data. DistSender uses a combination of the gossip network and queries to the first range to map request key ranges to server addresses. DistSender uses an rpc.Client to send the request, which uses the go net/rpc protocol with a custom codec (protocol buffers instead of gob)

  3. Range node: The gateway node may contact multiple range nodes as needed for the transaction.

    1. Requests enter the range node in an rpc.Server, which routes the request to a server.Node.

    2. Each node has one storage.Store per disk; the server.Node routes to the Store identified by the request (via a kv.LocalSender). Each store corresponds to one engine.Engine, a RocksDB instance, and one multiraft.MultiRaft object.

    3. Each Store has many storage.Ranges. Each range is one group within multiraft.MultiRaft object owned by the store. kv.LocalSender calls store.ExecuteCmd which finds the correct node and passes the command to Range.AddCmd.

    4. Range.AddCmd submits read/write commands to Raft (store.ProposeRaftCommand). Read-only commands will be executed immediately if the node has a read lease, otherwise they will either be forwarded to a node that is expected to have a lease or the caller will be instructed to retry elsewhere.

    5. MultiRaft.SubmitCommand forwards to etcd/raft.MultiNode.Propose.

  4. Raft leader: If the current node is not the raft leader, raft.MultiNode will forward the command to the leader.

    1. The leader appends the command to its log and then sends it to the followers.

    2. Once the leader has responses from a quorum of followers, the command is committed and it is returned to the storage layer via the MultiRaft.Events channel (which is being read by Store.processRaft).

  5. Range nodes: The following operations are repeated on each replica of the range.

    1. Store.processRaft reads the MultiRaft.Events channel. Each event is mapped to a Range and passed to Range.processRaftCommand.

    2. Range.processRaftCommand calls Range.executeCmd, which constructs an engine.Batch and passes it to a method like Range.Increment, which does the real work. In Raft terms, this is applying the command to the state machine (the entire rocksdb instance is a large but finite state machine) and must be deterministic, so that it performs exactly the same actions on each replica.

You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.
Press h to open a hovercard with more details.