12 changes: 12 additions & 0 deletions site/content/3.12/index-and-search/analyzers.md
@@ -683,6 +683,18 @@ An Analyzer capable of converting the input into a set of language-specific
tokens. This makes comparisons follow the rules of the respective language,
most notably in range queries against Views.

+For example, the Swedish alphabet has 29 letters: `a` to `z` plus `å`, `ä`, and
+`ö`, in that order. Using a Swedish locale (like `sv`), the sorting order is
+`å` after `z`, whereas using an English locale (like `en`), it is `å` after `a`.
+This impacts queries with `SEARCH` expressions like `doc.text < "c"`, excluding
+`å` when using a Swedish locale but including it when using an English locale.
+
+{{< info >}}
+Sorting by the output of the `collation` Analyzer like
+`SORT TOKENS(<text>, <collationAnalyzer>)` is not a supported feature and
+doesn't produce meaningful results.
+{{< /info >}}
+
The *properties* allowed for this Analyzer are an object with the following
attributes:

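To get a feel for how the locale changes the outcome of a range condition like `doc.text < "c"`, the following sketch reproduces the comparison in plain JavaScript with the standard `Intl.Collator` API. It is only an illustration under stated assumptions: it runs in a JavaScript runtime with full ICU data (such as a recent Node.js), it uses that runtime's bundled ICU rather than ArangoDB's `collation` Analyzer, and the helper `lessThan()` and the sample words are hypothetical.

```javascript
// Hypothetical helper: keep only the words that the collator for `locale`
// orders before `upperBound`, similar in spirit to `doc.text < "c"`.
// Uses the runtime's bundled ICU, not ArangoDB's collation Analyzer.
function lessThan(words, upperBound, locale) {
  const collator = new Intl.Collator(locale);
  return words.filter((word) => collator.compare(word, upperBound) < 0);
}

// Sample Swedish words (illustrative data only).
const words = ["apa", "båt", "åsna", "cykel"];

console.log(lessThan(words, "c", "sv")); // [ 'apa', 'båt' ]
console.log(lessThan(words, "c", "en")); // [ 'apa', 'båt', 'åsna' ]
```

The Swedish collator places `å` after `z`, so `åsna` falls outside the range, while the English collator sorts `å` together with `a` and keeps it.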
@@ -170,26 +170,18 @@ onward and will be removed in a future version.
You can use [Stream Transactions](../../develop/transactions/stream-transactions.md)
instead in most cases, and in some cases AQL can be sufficient.

-## Breaking changes to the `collation` Analyzer
-
-The [`collation` Analyzer](../../index-and-search/analyzers.md#collation) lets
-you adhere to the alphabetic order of a language in range queries. For example,
-using a Swedish locale (`sv`), the sorting order is `å` after `z`, whereas using
-an English locale (`en`), `å` is preceded by `a`. This impacts queries with
-`SEARCH` expressions like `doc.text < "c"`, excluding `å` when using the Swedish
-locale.
-
-ArangoDB 3.12 bundles an upgraded version of the ICU library. It is used for
-Unicode character handling including text sorting. Because of changes in ICU,
-data produced by the `collation` Analyzer in previous versions is not compatible
-with ArangoDB v3.12. You need to **recreate inverted indexes and Views that use
-`collation` Analyzers** to ensure that they work correctly. Otherwise,
-range queries involving the `collation` Analyzers and indexes created in v3.11
-or older versions may behave in unpredicted ways.
-
-Note that sorting by the output of the `collation` Analyzer like
-`SORT TOKENS(<text>, <collationAnalyzer>)` is still not a supported feature and
-doesn't produce meaningful results.
+## Incompatibilities with Unicode text between core and JavaScript
+
+ArangoDB 3.12 uses version 64 of the ICU library for Unicode handling in its core
+(ArangoSearch, AQL, RocksDB) but version 73 in [JavaScript contexts](../../develop/javascript-api/_index.md).
+If you compare or sort string values both in JavaScript and in the core, the results
+may not match between the two or may come out in a different order. This is due to
+changes in the Unicode standard and in the binary representation of strings for comparisons.
+
+You can be affected if you use JavaScript-based features like Foxx microservices
+or user-defined AQL functions (UDFs), compare or sort strings in them, and
+Unicode characters for which the standard has changed between the two ICU versions
+are involved.

## Control character escaping in audit log

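If you suspect that strings sorted by the core come back in a different order in a JavaScript context, a check along the lines of the following sketch can help spot the affected values. It assumes you already have an array of strings in the order produced by the core (for example, from an AQL query with `SORT`) and that the core sorted them with a language setting comparable to the locale passed to `Intl.Collator`. The `findOrderMismatches()` helper and the sample data are hypothetical.

```javascript
// Hypothetical helper: report adjacent pairs that `compare` considers out of
// order. Pass strings already sorted by the core to find values whose order
// differs when compared in JavaScript.
function findOrderMismatches(sortedByCore, compare) {
  const mismatches = [];
  for (let i = 0; i + 1 < sortedByCore.length; i++) {
    const left = sortedByCore[i];
    const right = sortedByCore[i + 1];
    if (compare(left, right) > 0) {
      mismatches.push({ index: i, left, right });
    }
  }
  return mismatches;
}

// Comparator backed by the ICU version bundled with the JavaScript engine.
const jsCompare = new Intl.Collator("en").compare;

// Stand-in data: in practice, this array would come from the core.
const fromCore = ["apple", "banana", "cherry"];

console.log(findOrderMismatches(fromCore, jsCompare)); // []
console.log(findOrderMismatches(["b", "a"], jsCompare)); // [ { index: 0, left: 'b', right: 'a' } ]
```

An empty result means the JavaScript collator agrees with the given order. A reported pair points to values that the two sides order differently, which can stem from the ICU version difference or simply from a different language setting.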
@@ -46,3 +46,4 @@ Note that this page does not list all open issues.
| **Date Added:** 2024-03-21 <br> **Component:** arangod <br> **Deployment Mode:** All <br> **Description:** When creating an `inverted` index with the `inBackground` option enabled, HTTP API calls like `http://localhost:8529/_api/index?collection=<coll>&withHidden=true` don't return the `isBuilding` and `progress` attributes and the progress of the index building can thus not be observed. <br> **Affected Versions:** 3.10.13, 3.11.7, 3.12.x <br> **Fixed in Versions:** - <br> **Reference:** [BTS-1788](https://arangodb.atlassian.net/browse/BTS-1788) (internal) |
| **Date Added:** 2024-03-28 <br> **Component:** arangod <br> **Deployment Mode:** Cluster <br> **Description:** During startup or upgrade from a previous minor version, Agent nodes crash if the `--cluster.force-one-shard` option is enabled. Workaround: Don't use the `--cluster.force-one-shard` option (or set it to `false`) for Agents. <br> **Affected Versions:** 3.12.0 <br> **Fixed in Versions:** 3.12.1 <br> **Reference:** [BTS-1839](https://arangodb.atlassian.net/browse/BTS-1839) (internal) |
| **Date Added:** 2024-03-28 <br> **Component:** arangod <br> **Deployment Mode:** Cluster <br> **Description:** In a cluster, creating an EnterpriseGraph fails in OneShard databases (created with the option `{"sharding": "single"}`). EnterpriseGraphs can still be created in a single server deployment, if the sharding option was not set to `single` during the database creation. <br> **Affected Versions:** 3.12.x <br> **Fixed in Versions:** - <br> **Reference:** [BTS-1841](https://arangodb.atlassian.net/browse/BTS-1841) (internal) |
+| **Date Added:** 2024-04-24 <br> **Component:** arangod <br> **Deployment Mode:** All <br> **Description:** ArangoDB uses version 64 of the ICU library for Unicode handling in its core (ArangoSearch, AQL, RocksDB) but version 73 in [JavaScript contexts](../../develop/javascript-api/_index.md) since v3.12.0. If you compare or sort string values both in JavaScript and in the core, the results may not match or may come out in a different order. This is due to changes in the Unicode standard and in the binary representation of strings for comparisons. You can be affected if you use JavaScript-based features like Foxx microservices or user-defined AQL functions (UDFs), compare or sort strings in them, and Unicode characters for which the standard has changed between the two ICU versions are involved. <br> **Affected Versions:** 3.12.x <br> **Fixed in Versions:** - <br> **Reference:** [BTS-1854](https://arangodb.atlassian.net/browse/BTS-1854) (internal) |
@@ -716,8 +716,9 @@ full, log entries are written synchronously until the queue has space again.
### V8 and ICU library upgrades

The bundled V8 JavaScript engine has been upgraded from version 7.9.317 to
-12.1.165. As part of this upgrade, the bundled Unicode character handling library
-ICU has been upgraded as well, from version 64.2 to 73.1.
+12.1.165. As part of this upgrade, the Unicode character handling library
+ICU has been upgraded as well, from version 64.2 to 73.1 (but only for
+JavaScript contexts, see [Incompatible changes in ArangoDB 3.12](incompatible-changes-in-3-12.md#incompatibilities-with-unicode-text-between-core-and-javascript)).

Note that ArangoDB's build of V8 has pointer compression disabled to allow for
more than 4 GB of heap memory.
12 changes: 12 additions & 0 deletions site/content/3.13/index-and-search/analyzers.md
@@ -683,6 +683,18 @@ An Analyzer capable of converting the input into a set of language-specific
tokens. This makes comparisons follow the rules of the respective language,
most notably in range queries against Views.

+For example, the Swedish alphabet has 29 letters: `a` to `z` plus `å`, `ä`, and
+`ö`, in that order. Using a Swedish locale (like `sv`), the sorting order is
+`å` after `z`, whereas using an English locale (like `en`), it is `å` after `a`.
+This impacts queries with `SEARCH` expressions like `doc.text < "c"`, excluding
+`å` when using a Swedish locale but including it when using an English locale.
+
+{{< info >}}
+Sorting by the output of the `collation` Analyzer like
+`SORT TOKENS(<text>, <collationAnalyzer>)` is not a supported feature and
+doesn't produce meaningful results.
+{{< /info >}}
+
The *properties* allowed for this Analyzer are an object with the following
attributes:

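The Swedish alphabet order described above (`å`, `ä`, `ö` after `z`) can be checked with the standard `Intl.Collator` API in a JavaScript runtime that ships full ICU data (such as a recent Node.js). This is a sketch for illustration only: it relies on the runtime's own ICU and CLDR data, not on ArangoDB's `collation` Analyzer, and `sortWithLocale()` is a hypothetical helper.

```javascript
// Hypothetical helper: return a sorted copy of `words` using the collation
// rules of `locale` (the runtime's bundled ICU data, not ArangoDB's).
function sortWithLocale(words, locale) {
  const collator = new Intl.Collator(locale);
  return [...words].sort(collator.compare);
}

const letters = ["ö", "b", "å", "z", "a", "ä"];

const swedish = sortWithLocale(letters, "sv");
const english = sortWithLocale(letters, "en");

console.log(swedish.join(" ")); // a b z å ä ö
// In English, `å` and `ä` are sorted like `a` and `ö` like `o`,
// so all three come before `z`.
console.log(english.join(" "));
```

Sorting a copy keeps the input array unchanged, which makes it easy to compare the results for several locales side by side.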
@@ -170,26 +170,18 @@ onward and will be removed in a future version.
You can use [Stream Transactions](../../develop/transactions/stream-transactions.md)
instead in most cases, and in some cases AQL can be sufficient.

-## Breaking changes to the `collation` Analyzer
-
-The [`collation` Analyzer](../../index-and-search/analyzers.md#collation) lets
-you adhere to the alphabetic order of a language in range queries. For example,
-using a Swedish locale (`sv`), the sorting order is `å` after `z`, whereas using
-an English locale (`en`), `å` is preceded by `a`. This impacts queries with
-`SEARCH` expressions like `doc.text < "c"`, excluding `å` when using the Swedish
-locale.
-
-ArangoDB 3.12 bundles an upgraded version of the ICU library. It is used for
-Unicode character handling including text sorting. Because of changes in ICU,
-data produced by the `collation` Analyzer in previous versions is not compatible
-with ArangoDB v3.12. You need to **recreate inverted indexes and Views that use
-`collation` Analyzers** to ensure that they work correctly. Otherwise,
-range queries involving the `collation` Analyzers and indexes created in v3.11
-or older versions may behave in unpredicted ways.
-
-Note that sorting by the output of the `collation` Analyzer like
-`SORT TOKENS(<text>, <collationAnalyzer>)` is still not a supported feature and
-doesn't produce meaningful results.
+## Incompatibilities with Unicode text between core and JavaScript
+
+ArangoDB 3.12 uses version 64 of the ICU library for Unicode handling in its core
+(ArangoSearch, AQL, RocksDB) but version 73 in [JavaScript contexts](../../develop/javascript-api/_index.md).
+If you compare or sort string values both in JavaScript and in the core, the results
+may not match between the two or may come out in a different order. This is due to
+changes in the Unicode standard and in the binary representation of strings for comparisons.
+
+You can be affected if you use JavaScript-based features like Foxx microservices
+or user-defined AQL functions (UDFs), compare or sort strings in them, and
+Unicode characters for which the standard has changed between the two ICU versions
+are involved.

## Control character escaping in audit log

@@ -46,3 +46,4 @@ Note that this page does not list all open issues.
| **Date Added:** 2024-03-21 <br> **Component:** arangod <br> **Deployment Mode:** All <br> **Description:** When creating an `inverted` index with the `inBackground` option enabled, HTTP API calls like `http://localhost:8529/_api/index?collection=<coll>&withHidden=true` don't return the `isBuilding` and `progress` attributes and the progress of the index building can thus not be observed. <br> **Affected Versions:** 3.10.13, 3.11.7, 3.12.x <br> **Fixed in Versions:** - <br> **Reference:** [BTS-1788](https://arangodb.atlassian.net/browse/BTS-1788) (internal) |
| **Date Added:** 2024-03-28 <br> **Component:** arangod <br> **Deployment Mode:** Cluster <br> **Description:** During startup or upgrade from a previous minor version, Agent nodes crash if the `--cluster.force-one-shard` option is enabled. Workaround: Don't use the `--cluster.force-one-shard` option (or set it to `false`) for Agents. <br> **Affected Versions:** 3.12.0 <br> **Fixed in Versions:** 3.12.1 <br> **Reference:** [BTS-1839](https://arangodb.atlassian.net/browse/BTS-1839) (internal) |
| **Date Added:** 2024-03-28 <br> **Component:** arangod <br> **Deployment Mode:** Cluster <br> **Description:** In a cluster, creating an EnterpriseGraph fails in OneShard databases (created with the option `{"sharding": "single"}`). EnterpriseGraphs can still be created in a single server deployment, if the sharding option was not set to `single` during the database creation. <br> **Affected Versions:** 3.12.x <br> **Fixed in Versions:** - <br> **Reference:** [BTS-1841](https://arangodb.atlassian.net/browse/BTS-1841) (internal) |
+| **Date Added:** 2024-04-24 <br> **Component:** arangod <br> **Deployment Mode:** All <br> **Description:** ArangoDB uses version 64 of the ICU library for Unicode handling in its core (ArangoSearch, AQL, RocksDB) but version 73 in [JavaScript contexts](../../develop/javascript-api/_index.md) since v3.12.0. If you compare or sort string values both in JavaScript and in the core, the results may not match or may come out in a different order. This is due to changes in the Unicode standard and in the binary representation of strings for comparisons. You can be affected if you use JavaScript-based features like Foxx microservices or user-defined AQL functions (UDFs), compare or sort strings in them, and Unicode characters for which the standard has changed between the two ICU versions are involved. <br> **Affected Versions:** 3.12.x <br> **Fixed in Versions:** - <br> **Reference:** [BTS-1854](https://arangodb.atlassian.net/browse/BTS-1854) (internal) |
@@ -716,8 +716,9 @@ full, log entries are written synchronously until the queue has space again.
### V8 and ICU library upgrades

The bundled V8 JavaScript engine has been upgraded from version 7.9.317 to
-12.1.165. As part of this upgrade, the bundled Unicode character handling library
-ICU has been upgraded as well, from version 64.2 to 73.1.
+12.1.165. As part of this upgrade, the Unicode character handling library
+ICU has been upgraded as well, from version 64.2 to 73.1 (but only for
+JavaScript contexts, see [Incompatible changes in ArangoDB 3.12](incompatible-changes-in-3-12.md#incompatibilities-with-unicode-text-between-core-and-javascript)).

Note that ArangoDB's build of V8 has pointer compression disabled to allow for
more than 4 GB of heap memory.