
Commit b110f5f: Stream Transaction HTTP docs (#8833)

graetzer authored and jsteemann committed Apr 29, 2019
1 parent b979aa4 commit b110f5f
Showing 20 changed files with 605 additions and 57 deletions.
3 changes: 0 additions & 3 deletions Documentation/Books/HTTP/AqlQueryCursor/README.md
@@ -1,9 +1,6 @@
 HTTP Interface for AQL Query Cursors
 ====================================
 
-Database Cursors
-----------------
-
 This is an introduction to ArangoDB's HTTP Interface for Queries. Results of AQL
 and simple queries are returned as cursors in order to batch the communication
 between server and client. Each call returns a number of documents in a batch
2 changes: 2 additions & 0 deletions Documentation/Books/HTTP/SUMMARY.md
@@ -51,6 +51,8 @@
 * [ArangoSearch Views](Views/ArangoSearch.md)
 * [Analyzers](Analyzers/README.md)
 * [Transactions](Transaction/README.md)
+* [Stream Transactions](Transaction/StreamTransaction.md)
+* [JavaScript Transactions](Transaction/JsTransaction.md)
 * [Replication](Replications/README.md)
 * [Replication Dump](Replications/ReplicationDump.md)
 * [Replication Logger](Replications/ReplicationLogger.md)
24 changes: 24 additions & 0 deletions Documentation/Books/HTTP/Transaction/JsTransaction.md
@@ -0,0 +1,24 @@
HTTP Interface for JavaScript Transactions
==========================================

ArangoDB's JS-transactions are executed on the server. Transactions can be
initiated by clients by sending the transaction description for execution to
the server.

JS-Transactions in ArangoDB do not offer separate *BEGIN*, *COMMIT* and *ROLLBACK*
operations. Instead, ArangoDB JS-transactions are described by a JavaScript function,
and the code inside the JavaScript function will then be executed transactionally.

At the end of the function, the transaction is automatically committed, and all
changes done by the transaction will be persisted. If an exception is thrown
during transaction execution, all operations performed in the transaction are
rolled back.

For a more detailed description of how transactions work in ArangoDB please
refer to [Transactions](../../Manual/Transactions/index.html).
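For illustration, the request body for such a transaction might be built as follows (a Python sketch; the `accounts` collection and the function body are made-up examples, the `collections` and `action` attributes are the ones the endpoint expects):

```python
import json

# Body for POST /_api/transaction: 'collections' declares the access,
# 'action' holds the JavaScript function executed transactionally on
# the server. 'accounts' is a hypothetical collection name.
transaction = {
    "collections": {"write": ["accounts"]},
    "action": """function () {
      var db = require('@arangodb').db;
      db.accounts.save({ _key: 'alice', balance: 100 });
      // an exception thrown here would roll back the save above
      return db.accounts.count();
    }"""
}
body = json.dumps(transaction)
```

The server commits automatically when the function returns; no further client interaction is needed.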

<!-- RestTransactionHandler.cpp -->

@startDocuBlock post_api_transaction


39 changes: 26 additions & 13 deletions Documentation/Books/HTTP/Transaction/README.md
@@ -4,20 +4,33 @@ HTTP Interface for Transactions
 ### Transactions
 
 ArangoDB's transactions are executed on the server. Transactions can be
-initiated by clients by sending the transaction description for execution to
-the server.
+executed by clients in two different ways:
 
-Transactions in ArangoDB do not offer separate *BEGIN*, *COMMIT* and *ROLLBACK*
-operations as they are available in many other database products.
-Instead, ArangoDB transactions are described by a JavaScript function, and the
-code inside the JavaScript function will then be executed transactionally.
-At the end of the function, the transaction is automatically committed, and all
-changes done by the transaction will be persisted. If an exception is thrown
-during transaction execution, all operations performed in the transaction are
-rolled back.
+1. Via the [Stream Transaction](StreamTransaction.md) API
+2. Via the [JavaScript Transaction](JsTransaction.md) API
 
-For a more detailed description of how transactions work in ArangoDB please
+The difference between these two is easy to understand; a short primer
+is given below.
+For a more detailed description of how transactions work in ArangoDB and
+what guarantees ArangoDB can deliver please
 refer to [Transactions](../../Manual/Transactions/index.html).
 
-<!-- js/actions/api-transaction.js -->
-@startDocuBlock post_api_transaction
+### Stream Transactions
+
+[Stream Transactions](StreamTransaction.md) allow you to perform a multi-document transaction
+with individual begin and commit / abort commands. This is similar to
+the way traditional RDBMS do it with *BEGIN*, *COMMIT* and *ROLLBACK* operations.
+
+This is the recommended API for larger transactions. However, the client is responsible
+for making sure that the transaction is committed or aborted when it is no longer needed,
+to avoid taking up resources.
+
+### JavaScript Transactions
+
+[JS-Transactions](JsTransaction.md) allow you to send the server
+a dedicated piece of JavaScript code (i.e. a function), which will be executed transactionally.
+
+At the end of the function, the transaction is automatically committed, and all
+changes done by the transaction will be persisted. No interaction is required by
+the client beyond the initial start request.
64 changes: 64 additions & 0 deletions Documentation/Books/HTTP/Transaction/StreamTransaction.md
@@ -0,0 +1,64 @@
HTTP Interface for Stream Transactions
======================================

*Stream Transactions* allow you to perform a multi-document transaction
with individual begin and commit / abort commands. This is similar to
the way traditional RDBMS do it with *BEGIN*, *COMMIT* and *ROLLBACK* operations.

To use a stream transaction, a client first sends the [configuration](#begin-a-transaction)
of the transaction to the ArangoDB server.

{% hint 'info' %}
In contrast to [JS-Transactions](JsTransaction.md), the definition of this
transaction must contain only the collections which are going to be used
and (optionally) the various transaction options supported by ArangoDB.
No *action* attribute is supported.
{% endhint %}

The Stream Transaction API works in conjunction with other APIs in ArangoDB.
To use the transaction for a supported operation, a client needs to specify
the transaction identifier in the *x-arango-trx-id* header on each request.
This will automatically cause these operations to use the specified transaction.

Supported transactional API operations include:

1. All operations in the [Document API](../Document/WorkingWithDocuments.md)
2. Number of documents via the [Collection API](../Collection/Getting.md#return-number-of-documents-in-a-collection)
3. Truncate a collection via the [Collection API](../Collection/Getting.md#return-number-of-documents-in-a-collection)
4. Create an AQL cursor via the [Cursor API](../AqlQueryCursor/AccessingCursors.md)

Note that a client *always needs to start the transaction first*, and that the
collections used for write access must be declared explicitly. The client is responsible
for making sure that the transaction is committed or aborted when it is no longer needed.
This avoids taking up resources on the ArangoDB server.
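As a sketch of the request shapes involved (the endpoints are those documented in the sections of this page; the Python helpers and the `accounts` collection are illustrative assumptions, not a driver API):

```python
import json

ARANGO = "http://localhost:8529/_db/_system"  # assumed local server URL

def begin_transaction_request(write_collections, read_collections=()):
    """Request that starts a stream transaction: only the participating
    collections (plus optional transaction options) are declared; there
    is no *action* attribute."""
    body = {"collections": {"write": list(write_collections),
                            "read": list(read_collections)}}
    return ("POST", ARANGO + "/_api/transaction/begin", json.dumps(body))

def transactional_headers(trx_id):
    """Any supported operation joins the transaction by passing its id
    in the x-arango-trx-id header."""
    return {"x-arango-trx-id": trx_id}

def commit_request(trx_id):
    return ("PUT", ARANGO + "/_api/transaction/" + trx_id, None)

def abort_request(trx_id):
    return ("DELETE", ARANGO + "/_api/transaction/" + trx_id, None)
```

A client sends the begin request, reads the returned transaction id, attaches the header to every subsequent document or cursor request, and finishes with the commit or abort request.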

For a more detailed description of how transactions work in ArangoDB please
refer to [Transactions](../../Manual/Transactions/index.html).

Begin a Transaction
-------------------

<!-- RestTransactionHandler.cpp -->

@startDocuBlock post_api_transaction_begin

Check Status of a Transaction
-----------------------------

@startDocuBlock get_api_transaction

Commit or Abort a Transaction
-----------------------------

Committing or aborting a running transaction must be done by the client.
It is *bad practice* to not commit or abort a transaction once you are done
using it: it forces the server to keep resources and collection locks
until the transaction times out.
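One way to enforce this discipline in client code, sketched in Python (the `client` object with `begin`/`commit`/`abort` methods is a hypothetical wrapper around the transaction endpoints, not an official driver API):

```python
def run_in_transaction(client, collections, work):
    """Run `work` inside a stream transaction, guaranteeing that the
    transaction is always committed or aborted, so the server does not
    keep its resources and collection locks until the timeout."""
    trx_id = client.begin(collections)   # POST /_api/transaction/begin
    try:
        result = work(trx_id)
    except Exception:
        client.abort(trx_id)             # DELETE /_api/transaction/{id}
        raise
    client.commit(trx_id)                # PUT /_api/transaction/{id}
    return result
```

Every code path ends in exactly one commit or abort, even when `work` raises.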

<!-- RestTransactionHandler.cpp -->

@startDocuBlock put_api_transaction

<!-- RestTransactionHandler.cpp -->

@startDocuBlock delete_api_transaction
@@ -70,9 +70,11 @@ and restarted as needed.
 
 _DBservers_ are the ones where the data is actually hosted. They
 host shards of data and using synchronous replication a _DBServer_ may
-either be _leader_ or _follower_ for a shard.
+either be _leader_ or _follower_ for a shard. Document operations are first
+applied on the _leader_ and then synchronously replicated to
+all followers.
 
-They should not be accessed from the outside but indirectly through the
+Shards must not be accessed from the outside but indirectly through the
 _Coordinators_. They may also execute queries in part or as a whole when
 asked by a _Coordinator_.
@@ -241,7 +241,7 @@ negatively impact the write performance:
 This will allow you to maintain steady write throughput even under very high load.
 - Transactions are held in-memory before they are committed.
 This means that transactions have to be split if they become too big, see the
-[limitations section](../Transactions/Limitations.md#with-rocksdb-storage-engine).
+[limitations section](../Transactions/Limitations.md#rocksdb-storage-engine).
 
 ### Improving Update Query Performance
 
13 changes: 13 additions & 0 deletions Documentation/Books/Manual/ReleaseNotes/NewFeatures35.md
@@ -247,6 +247,19 @@ The HTTP API for running Foxx service tests now supports a `filter` attribute,
which can be used to limit which test cases should be executed.


### Stream Transaction API

There is a new HTTP API for transactions. This API allows clients to add operations to a
transaction in a streaming fashion. A transaction can consist of a series of supported
transactional operations, followed by a commit or abort command.
This allows clients to construct larger transactions in a more efficient way than
with JavaScript-based transactions.

Note that this requires client applications to abort transactions which are no
longer necessary. Otherwise resources and locks acquired by the transactions
will linger until the server decides to garbage-collect them.


Web interface
-------------

35 changes: 23 additions & 12 deletions Documentation/Books/Manual/Transactions/Durability.md
@@ -1,9 +1,8 @@
Durability
==========

-Transactions are executed in main memory first until there is either a rollback
-or a commit. On rollback, no data will be written to disk, but the operations
-from the transaction will be reversed in memory.
+Transactions are executed until there is either a rollback
+or a commit. On rollback, the operations from the transaction will be reversed.

On commit, all modifications done in the transaction will be written to the
collection datafiles. These writes will be synchronized to disk if any of the
@@ -54,21 +53,33 @@ full durability for single collection transactions. Using the delayed synchroniz
 and performance of transactions, but will introduce the risk of losing the last
 committed transactions in the case of a crash.
 
-In contrast, transactions that modify data in more than one collection are
-automatically synchronized to disk. This comes at the cost of several disk syncs.
-For a multi-collection transaction, the call to the *_executeTransaction* function
+The call to the *_executeTransaction* function
 will only return after the data of all modified collections has been synchronized
 to disk and the transaction has been made fully durable. This not only reduces the
 risk of losing data in case of a crash but also ensures consistency after a
 restart.
 
+MMFiles Storage Engine
+----------------------
+
+The MMFiles storage engine continuously writes the transaction operations into
+a journal file on disk (the journal is sometimes also referred to as the write-ahead log).
+
+This means that the commit operation can be very fast because the engine only needs
+to write the *commit* marker into the journal (and perform a disk-sync if
+*waitForSync* was set to *true*). This also means that failed or aborted
+transactions need to be rolled back by reversing every single operation.
+
 In case of a server crash, any multi-collection transactions that were not yet
 committed or in preparation to be committed will be rolled back on server restart.
 
-For multi-collection transactions, there will be at least one disk sync operation
-per modified collection. Multi-collection transactions thus have a potentially higher
-cost than single collection transactions. There is no configuration to turn off disk
-synchronization for multi-collection transactions in ArangoDB.
-The disk sync speed of the system will thus be the most important factor for the
-performance of multi-collection transactions.
+RocksDB Storage Engine
+----------------------
+
+The RocksDB storage engine applies operations in a transaction only in main memory
+until they are committed. In case of a rollback the entire transaction is just
+cleared; no extra rollback steps are required.
+
+In the event of a server crash the storage engine will scan the write-ahead log
+to restore certain metadata, such as the number of documents in a collection
+or the selectivity estimates of secondary indexes.
72 changes: 60 additions & 12 deletions Documentation/Books/Manual/Transactions/Limitations.md
@@ -5,10 +5,10 @@ In General
----------

 Transactions in ArangoDB have been designed with particular use cases
-in mind. They will be mainly useful for short and small data retrieval
+in mind. They will be mainly useful for *short and small* data retrieval
 and/or modification operations.
 
-The implementation is not optimized for very long-running or very voluminous
+The implementation is **not** optimized for *very long-running* or *very voluminous*
 operations, and may not be usable for these cases.

One limitation is that a transaction's operation information must fit into main
@@ -53,28 +53,76 @@ unregistered collection used in transaction*.
It is legal to not declare read-only collections, but this should be avoided if
possible to reduce the probability of deadlocks and non-repeatable reads.

-Please refer to [Locking and Isolation](LockingAndIsolation.md) for more details.

In Clusters
-----------

 Using a single instance of ArangoDB, multi-document / multi-collection queries
-are guaranteed to be fully ACID. This is more than many other NoSQL database
-systems support. In cluster mode, single-document operations are also fully ACID.
-Multi-document / multi-collection queries in a cluster are not ACID, which is
-equally the case with competing database systems. Transactions in a cluster
-will be supported in a future version of ArangoDB and make these operations
-fully ACID as well.
+are guaranteed to be fully ACID in the [traditional sense](https://en.wikipedia.org/wiki/ACID_(computer_science)).
+This is more than many other NoSQL database systems support.
+In cluster mode, single-document operations are also *fully ACID*.

Multi-document / multi-collection queries and transactions offer different guarantees.
Understanding these differences is important when designing applications that need
to be resilient against outages of individual servers.

Cluster transactions share the underlying characteristics of the [storage engine](../Architecture/StorageEngines.md)
that is used for the cluster deployment.
A transaction started on a Coordinator translates to one transaction per involved DBServer.
The guarantees and characteristics of the given storage engine apply additionally
to the cluster-specific information below.
Please refer to [Locking and Isolation](LockingAndIsolation.md) for more details
on the storage engines.

### Atomicity

A transaction on *one DBServer* is either committed completely or not at all.

ArangoDB transactions do not currently require any form of global consensus. This makes
them relatively fast, but also vulnerable to unexpected server outages.

Should a transaction involve [Leader Shards](../Architecture/DeploymentModes/Cluster/Architecture.md#dbservers)
on *multiple DBServers*, the atomicity of the distributed transaction *during the commit operation*
cannot be guaranteed. Should one of the involved DBServers fail during the commit, the transaction
is not rolled back globally; sub-transactions may have been committed on some DBServers, but not on others.
Should this case occur, the client application will see an error.

Improved failure handling might be introduced in future versions.

### Consistency

We provide consistency even in the cluster: a transaction will never leave the data in
an incorrect or corrupt state.

In ArangoDB there is always exactly one DBServer responsible for a given shard. In both
storage engines the locking procedures ensure that dependent transactions (in the sense that
the transactions modify the same documents or unique index entries) are ordered sequentially.
Therefore we can provide [causal consistency](https://en.wikipedia.org/wiki/Consistency_model#Causal_consistency)
for your transactions.

From the application's point of view this also means that a given transaction can always
[read its own writes](https://en.wikipedia.org/wiki/Consistency_model#Read-your-writes_consistency).
Other concurrent operations will not change the database state seen by a transaction.

### Isolation

The ArangoDB Cluster provides *Local Snapshot Isolation*. This means that all operations
and queries in a transaction will see the same version, or snapshot, of the data on a given
DBServer. This snapshot is based on the state of the data at the moment in
time when the transaction begins *on that DBServer*.

### Durability

It is guaranteed that successfully committed transactions are persistent. Using
replication and / or *waitForSync* increases the durability (just as with a single server).

-With RocksDB storage engine
+RocksDB storage engine
 ---------------------------

{% hint 'info' %}
The following restrictions and limitations do not apply to JavaScript
transactions, since their intended use case is for smaller transactions
with full transactional guarantees. So the following only applies
-to AQL transactions and transactions created through the document API.
+to AQL queries and transactions created through the document API (i.e. batch operations).
{% endhint %}

Data of ongoing transactions is stored in RAM. Transactions that get too big
13 changes: 13 additions & 0 deletions Documentation/Books/Manual/Transactions/LockingAndIsolation.md
@@ -59,6 +59,9 @@ The *RocksDB* engine does not lock any collections participating in a transaction
for read. Read operations can run in parallel to other read or write operations on the
same collections.


### Locking

For all collections that are used in write mode, the RocksDB engine will internally
acquire a (shared) read lock. This means that many writers can modify data in the same
collection in parallel (and also run in parallel to ongoing reads). However, if two
@@ -73,6 +76,16 @@ Exclusive accesses will internally acquire a write-lock on the collections, so they
are not executed in parallel with any other write operations. Read operations can still
be carried out by other concurrent transactions.

### Isolation

The RocksDB storage engine provides *snapshot isolation*. This means that all operations
and queries in a transaction will see the same version, or snapshot, of the database.
This snapshot is based on the state of the database at the moment in time when the transaction
begins. No locks are acquired on the underlying data to keep this snapshot, which permits
other transactions to execute without being blocked by an older uncompleted transaction
(so long as they do not try to modify the same documents or unique index entries concurrently).
In the cluster a snapshot is acquired on each DBServer individually.
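The behavior can be illustrated with a toy model of snapshot isolation (a deliberately simplified sketch for intuition only, not ArangoDB's actual implementation):

```python
import copy

class Store:
    """Toy key-value store handing out snapshot-isolated transactions."""
    def __init__(self):
        self.data = {}

    def begin(self):
        # Each transaction works against the state as of its begin time.
        return Txn(self, copy.deepcopy(self.data))

class Txn:
    def __init__(self, store, snapshot):
        self.store = store
        self.snapshot = snapshot
        self.writes = {}

    def read(self, key):
        # A transaction sees its own writes, otherwise its snapshot;
        # commits by other transactions are invisible to it.
        if key in self.writes:
            return self.writes[key]
        return self.snapshot.get(key)

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        self.store.data.update(self.writes)
```

What the real engine adds on top of this picture is a conflict check: a transaction that tries to modify a document or unique index entry changed concurrently by another transaction is aborted rather than allowed to commit.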

Lazily adding collections
-------------------------

