69 changes: 39 additions & 30 deletions site/content/3.10/aql/high-level-operations/upsert.md
@@ -203,35 +203,42 @@ UPDATE { logins: OLD.logins + 1 } IN users
RETURN { doc: NEW, type: OLD ? 'update' : 'insert' }
```

## Transactionality and Limitations

- On a single server, upserts are generally executed transactionally in an
all-or-nothing fashion.

For sharded collections in cluster deployments, the entire query and/or upsert
operation may not be transactional, especially if it involves different shards,
DB-Servers, or both.

- Queries may execute intermediate transaction commits if the running
transaction (AQL query) hits the configured size thresholds. The data modified
so far is then written and is not rolled back if the transaction is later
aborted.

Such **intermediate commits** can occur for `UPSERT` operations over all
documents of a large collection, for instance. This has the side effect that
the atomicity of the operation can no longer be guaranteed and that ArangoDB
cannot guarantee that "read your own writes" works in upserts.

This is only an issue if you write a query where your search condition would
hit the same document multiple times, and only if you have large transactions.
You can adjust the behavior of the RocksDB storage engine by increasing the
`intermediateCommit` thresholds for data size and operation counts.
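
For instance, the following sketch (the collection names `largeCollection`
and `counters` are hypothetical) performs one upsert per document of a large
collection and can therefore hit the intermediate commit thresholds:

```aql
// Hypothetical example: one UPSERT per document of a large collection.
// If the number of write operations or the amount of modified data exceeds
// the intermediate commit thresholds, the modifications carried out so far
// may be committed before the query finishes and are then not rolled back
// if the query aborts later.
FOR doc IN largeCollection
  UPSERT { _key: doc._key }
  INSERT { _key: doc._key, seen: 1 }
  UPDATE { seen: OLD.seen + 1 }
  IN counters
```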

- The lookup and the insert/update/replace parts are executed one after
another, so that other operations in other threads can happen in
between. This means if multiple UPSERT queries run concurrently, they
between. This means if multiple `UPSERT` queries run concurrently, they
may all determine that the target document does not exist and then
create it multiple times!

Note that due to this gap between the lookup and insert/update/replace,
even with a unique index there may be duplicate key errors or conflicts.
even with a unique index, duplicate key errors or conflicts can occur.
But if they occur, the application/client code can execute the same query
again.

To prevent this from happening, one should add a unique index to the lookup
To prevent this from happening, you should add a unique index to the lookup
attribute(s). Note that in cluster deployments, a unique index can only be
created if it is identical to the collection's shard key attribute or at least
contains it.
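
As a hedged sketch, assuming a `users` collection that is sharded by `name`
and has a unique persistent index on `name` (both assumptions for
illustration), a lookup on that attribute lets concurrent upserts fail fast
with a duplicate key error instead of silently creating duplicate documents:

```aql
// Assumes `users` is sharded by `name` and has a unique persistent index
// on `name`. Concurrent UPSERTs with the same lookup value may still race,
// but the unique index turns silent duplicates into a duplicate key error
// that the client can handle by re-running the query.
UPSERT { name: "alice" }
INSERT { name: "alice", logins: 1 }
UPDATE { logins: OLD.logins + 1 }
IN users
```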
@@ -240,18 +247,20 @@ transactional, especially if it involves different shards and/or DB-Servers.
`exclusive` option to limit write concurrency for this collection to 1, which
helps avoid conflicts but is bad for throughput!

- `UPSERT` operations do not observe their own writes correctly in cluster
deployments. They only do so for OneShard databases with the
`cluster-one-shard` optimizer rule active.

If upserts in a query create new documents and would then semantically hit the
same documents again, the operation may incorrectly use the `INSERT` branch to
create more documents instead of the `UPDATE`/`REPLACE` branch to update the
previously created documents.

If upserts find existing documents to update or replace, you can access the
current document via the `OLD` pseudo-variable, but it may hold the initial
version of the document from before the query started, even if the document
has been modified by the `UPSERT` operation in the meantime.
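
The following sketch (with a hypothetical `stats` collection) uses the same
search condition in every iteration. On a single server, it creates one
document and then updates it twice, but in a cluster deployment without the
OneShard optimization, later iterations may not see the earlier write and may
take the `INSERT` branch again:

```aql
// Hypothetical: every iteration uses the same search condition.
// Single server: one document is inserted, then updated twice.
// Cluster (without the `cluster-one-shard` optimizer rule): each iteration
// may miss the previous write and insert another document instead.
FOR i IN 1..3
  UPSERT { page: "home" }
  INSERT { page: "home", hits: 1 }
  UPDATE { hits: OLD.hits + 1 }
  IN stats
```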

- The lookup attribute(s) from the search expression should be indexed in order
to improve UPSERT performance. Ideally, the search expression contains the
to improve `UPSERT` performance. Ideally, the search expression contains the
shard key, as this allows the lookup to be restricted to a single shard.
69 changes: 39 additions & 30 deletions site/content/3.11/aql/high-level-operations/upsert.md
@@ -203,35 +203,42 @@ UPDATE { logins: OLD.logins + 1 } IN users
RETURN { doc: NEW, type: OLD ? 'update' : 'insert' }
```

## Transactionality and Limitations

- On a single server, upserts are generally executed transactionally in an
all-or-nothing fashion.

For sharded collections in cluster deployments, the entire query and/or upsert
operation may not be transactional, especially if it involves different shards,
DB-Servers, or both.

- Queries may execute intermediate transaction commits if the running
transaction (AQL query) hits the configured size thresholds. The data modified
so far is then written and is not rolled back if the transaction is later
aborted.

Such **intermediate commits** can occur for `UPSERT` operations over all
documents of a large collection, for instance. This has the side effect that
the atomicity of the operation can no longer be guaranteed and that ArangoDB
cannot guarantee that "read your own writes" works in upserts.

This is only an issue if you write a query where your search condition would
hit the same document multiple times, and only if you have large transactions.
You can adjust the behavior of the RocksDB storage engine by increasing the
`intermediateCommit` thresholds for data size and operation counts.

- The lookup and the insert/update/replace parts are executed one after
another, so that other operations in other threads can happen in
between. This means if multiple UPSERT queries run concurrently, they
between. This means if multiple `UPSERT` queries run concurrently, they
may all determine that the target document does not exist and then
create it multiple times!

Note that due to this gap between the lookup and insert/update/replace,
even with a unique index there may be duplicate key errors or conflicts.
even with a unique index, duplicate key errors or conflicts can occur.
But if they occur, the application/client code can execute the same query
again.

To prevent this from happening, one should add a unique index to the lookup
To prevent this from happening, you should add a unique index to the lookup
attribute(s). Note that in cluster deployments, a unique index can only be
created if it is identical to the collection's shard key attribute or at least
contains it.
@@ -240,18 +247,20 @@ transactional, especially if it involves different shards and/or DB-Servers.
`exclusive` option to limit write concurrency for this collection to 1, which
helps avoid conflicts but is bad for throughput!

- `UPSERT` operations do not observe their own writes correctly in cluster
deployments. They only do so for OneShard databases with the
`cluster-one-shard` optimizer rule active.

If upserts in a query create new documents and would then semantically hit the
same documents again, the operation may incorrectly use the `INSERT` branch to
create more documents instead of the `UPDATE`/`REPLACE` branch to update the
previously created documents.

If upserts find existing documents to update or replace, you can access the
current document via the `OLD` pseudo-variable, but it may hold the initial
version of the document from before the query started, even if the document
has been modified by the `UPSERT` operation in the meantime.

- The lookup attribute(s) from the search expression should be indexed in order
to improve UPSERT performance. Ideally, the search expression contains the
to improve `UPSERT` performance. Ideally, the search expression contains the
shard key, as this allows the lookup to be restricted to a single shard.
69 changes: 39 additions & 30 deletions site/content/3.12/aql/high-level-operations/upsert.md
@@ -203,35 +203,42 @@ UPDATE { logins: OLD.logins + 1 } IN users
RETURN { doc: NEW, type: OLD ? 'update' : 'insert' }
```

## Transactionality and Limitations

- On a single server, upserts are generally executed transactionally in an
all-or-nothing fashion.

For sharded collections in cluster deployments, the entire query and/or upsert
operation may not be transactional, especially if it involves different shards,
DB-Servers, or both.

- Queries may execute intermediate transaction commits if the running
transaction (AQL query) hits the configured size thresholds. The data modified
so far is then written and is not rolled back if the transaction is later
aborted.

Such **intermediate commits** can occur for `UPSERT` operations over all
documents of a large collection, for instance. This has the side effect that
the atomicity of the operation can no longer be guaranteed and that ArangoDB
cannot guarantee that "read your own writes" works in upserts.

This is only an issue if you write a query where your search condition would
hit the same document multiple times, and only if you have large transactions.
You can adjust the behavior of the RocksDB storage engine by increasing the
`intermediateCommit` thresholds for data size and operation counts.

- The lookup and the insert/update/replace parts are executed one after
another, so that other operations in other threads can happen in
between. This means if multiple UPSERT queries run concurrently, they
between. This means if multiple `UPSERT` queries run concurrently, they
may all determine that the target document does not exist and then
create it multiple times!

Note that due to this gap between the lookup and insert/update/replace,
even with a unique index there may be duplicate key errors or conflicts.
even with a unique index, duplicate key errors or conflicts can occur.
But if they occur, the application/client code can execute the same query
again.

To prevent this from happening, one should add a unique index to the lookup
To prevent this from happening, you should add a unique index to the lookup
attribute(s). Note that in cluster deployments, a unique index can only be
created if it is identical to the collection's shard key attribute or at least
contains it.
@@ -240,18 +247,20 @@ transactional, especially if it involves different shards and/or DB-Servers.
`exclusive` option to limit write concurrency for this collection to 1, which
helps avoid conflicts but is bad for throughput!

- `UPSERT` operations do not observe their own writes correctly in cluster
deployments. They only do so for OneShard databases with the
`cluster-one-shard` optimizer rule active.

If upserts in a query create new documents and would then semantically hit the
same documents again, the operation may incorrectly use the `INSERT` branch to
create more documents instead of the `UPDATE`/`REPLACE` branch to update the
previously created documents.

If upserts find existing documents to update or replace, you can access the
current document via the `OLD` pseudo-variable, but it may hold the initial
version of the document from before the query started, even if the document
has been modified by the `UPSERT` operation in the meantime.

- The lookup attribute(s) from the search expression should be indexed in order
to improve UPSERT performance. Ideally, the search expression contains the
to improve `UPSERT` performance. Ideally, the search expression contains the
shard key, as this allows the lookup to be restricted to a single shard.