From 1ade64b370b5791b6adf974a1fd7d538b57cbd65 Mon Sep 17 00:00:00 2001
From: Alexey Bakharew
Date: Wed, 30 Jun 2021 15:45:04 +0300
Subject: [PATCH 1/3] doc update after implementing prefix in levenshtein_match

---
 3.7/aql/functions-arangosearch.md | 14 +++++++++++++-
 3.8/aql/functions-arangosearch.md | 12 ++++++++++++
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/3.7/aql/functions-arangosearch.md b/3.7/aql/functions-arangosearch.md
index bd683476ff..6072614ddf 100644
--- a/3.7/aql/functions-arangosearch.md
+++ b/3.7/aql/functions-arangosearch.md
@@ -710,7 +710,7 @@ FOR doc IN viewName
 
 Introduced in: v3.7.0
 
-`LEVENSHTEIN_MATCH(path, target, distance, transpositions, maxTerms) → fulfilled`
+`LEVENSHTEIN_MATCH(path, target, distance, transpositions, maxTerms, prefix) → fulfilled`
 
 Match documents with a [Damerau-Levenshtein distance](https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance){:target=_"blank"}
 lower than or equal to *distance* between the stored attribute value and
@@ -732,6 +732,9 @@ if you want to calculate the edit distance of two strings.
   impact performance negatively. The default value is `64`.
 - returns **fulfilled** (bool): `true` if the calculated distance is less than
   or equal to *distance*, `false` otherwise
+- **prefix** (string, _optional_): if defined, Levenshtein or Damerau-Levenshtein
+  distance is computed for documents which contains specified prefix. The default value
+  is empty string.
 
 The Levenshtein distance between _quick_ and _quikc_ is `2` because it requires
 two operations to go from one to the other (remove _k_, insert _k_ at a
@@ -751,6 +754,15 @@ FOR doc IN viewName
   RETURN doc.text
 ```
 
+Match documents on levenshtein distance 1 with prefix `qui`. All edit operations
+is applied to term `kc`. Prefix `qui` is constant.
+
+```js
+FOR doc IN viewName
+  SEARCH LEVENSHTEIN_MATCH(doc.text, "kc", 1, false, 64, "qui") // matches "quick"
+  RETURN doc.text
+```
+
 You may want to pick the maximum edit distance based on string length.
 If the stored attribute is the string _quick_ and the target string is
 _quicksands_, then the Levenshtein distance is 5, with 50% of the
diff --git a/3.8/aql/functions-arangosearch.md b/3.8/aql/functions-arangosearch.md
index d063d1657e..44587b5459 100644
--- a/3.8/aql/functions-arangosearch.md
+++ b/3.8/aql/functions-arangosearch.md
@@ -732,6 +732,9 @@ if you want to calculate the edit distance of two strings.
   impact performance negatively. The default value is `64`.
 - returns **fulfilled** (bool): `true` if the calculated distance is less than
   or equal to *distance*, `false` otherwise
+- **prefix** (string, _optional_): if defined, Levenshtein or Damerau-Levenshtein
+  distance is computed for documents which contains specified prefix. The default value
+  is empty string.
 
 The Levenshtein distance between _quick_ and _quikc_ is `2` because it requires
 two operations to go from one to the other (remove _k_, insert _k_ at a
@@ -751,6 +754,15 @@ FOR doc IN viewName
   RETURN doc.text
 ```
 
+Match documents on levenshtein distance 1 with prefix `qui`. All edit operations
+is applied to term `kc`. Prefix `qui` is constant.
+
+```js
+FOR doc IN viewName
+  SEARCH LEVENSHTEIN_MATCH(doc.text, "kc", 1, false, 64, "qui") // matches "quick"
+  RETURN doc.text
+```
+
 You may want to pick the maximum edit distance based on string length.
 If the stored attribute is the string _quick_ and the target string is
 _quicksands_, then the Levenshtein distance is 5, with 50% of the

From f6ad37ef128d89f76fda1f9f9a6f73b9c66989e4 Mon Sep 17 00:00:00 2001
From: Alexey Bakharew
Date: Wed, 30 Jun 2021 15:57:22 +0300
Subject: [PATCH 2/3] upd for PHRASE

---
 3.7/aql/functions-arangosearch.md | 5 ++++-
 3.8/aql/functions-arangosearch.md | 3 +++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/3.7/aql/functions-arangosearch.md b/3.7/aql/functions-arangosearch.md
index 6072614ddf..a909ae68a1 100644
--- a/3.7/aql/functions-arangosearch.md
+++ b/3.7/aql/functions-arangosearch.md
@@ -457,7 +457,7 @@ Object tokens:
 
 - `{IN_RANGE: [low, high, includeLow, includeHigh]}`:
   see [IN_RANGE()](#in_range). *low* and *high* can only be strings.
-- `{LEVENSHTEIN_MATCH: [token, maxDistance, transpositions, maxTerms]}`:
+- `{LEVENSHTEIN_MATCH: [token, maxDistance, transpositions, maxTerms, prefix]}`:
   - `token` (string): a string to search
   - `maxDistance` (number): maximum Levenshtein / Damerau-Levenshtein distance
   - `transpositions` (bool, _optional_): if set to `false`, a Levenshtein
@@ -465,6 +465,9 @@ Object tokens:
   - `maxTerms` (number, _optional_): consider only a specified number of the
     most relevant terms. One can pass `0` to consider all matched terms, but it may
     impact performance negatively. The default value is `64`.
+  - `prefix` (string, _optional_): if defined, Levenshtein or Damerau-Levenshtein
+    distance is computed for documents which contains specified prefix. The default value
+    is empty string.
 - `{STARTS_WITH: [prefix]}`: see [STARTS_WITH()](#starts_with).
   Array brackets are optional
 - `{TERM: [token]}`: equal to `token` but without Analyzer tokenization.
diff --git a/3.8/aql/functions-arangosearch.md b/3.8/aql/functions-arangosearch.md
index 44587b5459..e2b8930f66 100644
--- a/3.8/aql/functions-arangosearch.md
+++ b/3.8/aql/functions-arangosearch.md
@@ -465,6 +465,9 @@ Object tokens:
   - `maxTerms` (number, _optional_): consider only a specified number of the
     most relevant terms. One can pass `0` to consider all matched terms, but it may
     impact performance negatively. The default value is `64`.
+  - `prefix` (string, _optional_): if defined, Levenshtein or Damerau-Levenshtein
+    distance is computed for documents which contains specified prefix. The default value
+    is empty string.
 - `{STARTS_WITH: [prefix]}`: see [STARTS_WITH()](#starts_with).
   Array brackets are optional
 - `{TERM: [token]}`: equal to `token` but without Analyzer tokenization.

From 5dca486c921139ebabcaa966f77f80790df1205f Mon Sep 17 00:00:00 2001
From: Simran Spiller
Date: Tue, 20 Jul 2021 16:19:55 +0200
Subject: [PATCH 3/3] Review

---
 3.7/aql/functions-arangosearch.md | 23 +++++++++++++++--------
 3.8/aql/functions-arangosearch.md | 27 +++++++++++++++++----------
 3.9/aql/functions-arangosearch.md | 26 ++++++++++++++++++++++++--
 3 files changed, 56 insertions(+), 20 deletions(-)

diff --git a/3.7/aql/functions-arangosearch.md b/3.7/aql/functions-arangosearch.md
index a909ae68a1..a3d81a2609 100644
--- a/3.7/aql/functions-arangosearch.md
+++ b/3.7/aql/functions-arangosearch.md
@@ -465,9 +465,12 @@ Object tokens:
   - `maxTerms` (number, _optional_): consider only a specified number of the
     most relevant terms. One can pass `0` to consider all matched terms, but it may
     impact performance negatively. The default value is `64`.
-  - `prefix` (string, _optional_): if defined, Levenshtein or Damerau-Levenshtein
-    distance is computed for documents which contains specified prefix. The default value
-    is empty string.
+  - `prefix` (string, _optional_): if defined, then a search for the exact
+    prefix is carried out, using the matches as candidates. The Levenshtein /
+    Damerau-Levenshtein distance is then computed for each candidate using the
+    remainders of the strings. This option can improve performance in cases where
+    there is a known common prefix. The default value is an empty string
+    (introduced in v3.7.13, v3.8.1).
 - `{STARTS_WITH: [prefix]}`: see [STARTS_WITH()](#starts_with).
   Array brackets are optional
 - `{TERM: [token]}`: equal to `token` but without Analyzer tokenization.
@@ -735,9 +738,12 @@ if you want to calculate the edit distance of two strings.
   impact performance negatively. The default value is `64`.
 - returns **fulfilled** (bool): `true` if the calculated distance is less than
   or equal to *distance*, `false` otherwise
-- **prefix** (string, _optional_): if defined, Levenshtein or Damerau-Levenshtein
-  distance is computed for documents which contains specified prefix. The default value
-  is empty string.
+- **prefix** (string, _optional_): if defined, then a search for the exact
+  prefix is carried out, using the matches as candidates. The Levenshtein /
+  Damerau-Levenshtein distance is then computed for each candidate using the
+  remainders of the strings. This option can improve performance in cases where
+  there is a known common prefix. The default value is an empty string
+  (introduced in v3.7.13, v3.8.1).
 
 The Levenshtein distance between _quick_ and _quikc_ is `2` because it requires
 two operations to go from one to the other (remove _k_, insert _k_ at a
@@ -757,8 +763,9 @@ FOR doc IN viewName
   RETURN doc.text
 ```
 
-Match documents on levenshtein distance 1 with prefix `qui`. All edit operations
-is applied to term `kc`. Prefix `qui` is constant.
+Match documents with a Levenshtein distance of 1 with the prefix `qui`. The edit
+distance is calculated using the search term `kc` and the stored value without
+the prefix (e.g. `ck`). The prefix `qui` is constant.
 
 ```js
 FOR doc IN viewName
diff --git a/3.8/aql/functions-arangosearch.md b/3.8/aql/functions-arangosearch.md
index e2b8930f66..9055dcdfdc 100644
--- a/3.8/aql/functions-arangosearch.md
+++ b/3.8/aql/functions-arangosearch.md
@@ -457,7 +457,7 @@ Object tokens:
 
 - `{IN_RANGE: [low, high, includeLow, includeHigh]}`:
   see [IN_RANGE()](#in_range). *low* and *high* can only be strings.
-- `{LEVENSHTEIN_MATCH: [token, maxDistance, transpositions, maxTerms]}`:
+- `{LEVENSHTEIN_MATCH: [token, maxDistance, transpositions, maxTerms, prefix]}`:
   - `token` (string): a string to search
   - `maxDistance` (number): maximum Levenshtein / Damerau-Levenshtein distance
   - `transpositions` (bool, _optional_): if set to `false`, a Levenshtein
@@ -465,9 +465,12 @@ Object tokens:
   - `maxTerms` (number, _optional_): consider only a specified number of the
     most relevant terms. One can pass `0` to consider all matched terms, but it may
     impact performance negatively. The default value is `64`.
-  - `prefix` (string, _optional_): if defined, Levenshtein or Damerau-Levenshtein
-    distance is computed for documents which contains specified prefix. The default value
-    is empty string.
+  - `prefix` (string, _optional_): if defined, then a search for the exact
+    prefix is carried out, using the matches as candidates. The Levenshtein /
+    Damerau-Levenshtein distance is then computed for each candidate using the
+    remainders of the strings. This option can improve performance in cases where
+    there is a known common prefix. The default value is an empty string
+    (introduced in v3.7.13, v3.8.1).
 - `{STARTS_WITH: [prefix]}`: see [STARTS_WITH()](#starts_with).
   Array brackets are optional
 - `{TERM: [token]}`: equal to `token` but without Analyzer tokenization.
@@ -713,7 +716,7 @@ FOR doc IN viewName
 
 Introduced in: v3.7.0
 
-`LEVENSHTEIN_MATCH(path, target, distance, transpositions, maxTerms) → fulfilled`
+`LEVENSHTEIN_MATCH(path, target, distance, transpositions, maxTerms, prefix) → fulfilled`
 
 Match documents with a [Damerau-Levenshtein distance](https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance){:target=_"blank"}
 lower than or equal to *distance* between the stored attribute value and
@@ -735,9 +738,12 @@ if you want to calculate the edit distance of two strings.
   impact performance negatively. The default value is `64`.
 - returns **fulfilled** (bool): `true` if the calculated distance is less than
   or equal to *distance*, `false` otherwise
-- **prefix** (string, _optional_): if defined, Levenshtein or Damerau-Levenshtein
-  distance is computed for documents which contains specified prefix. The default value
-  is empty string.
+- **prefix** (string, _optional_): if defined, then a search for the exact
+  prefix is carried out, using the matches as candidates. The Levenshtein /
+  Damerau-Levenshtein distance is then computed for each candidate using the
+  remainders of the strings. This option can improve performance in cases where
+  there is a known common prefix. The default value is an empty string
+  (introduced in v3.7.13, v3.8.1).
 
 The Levenshtein distance between _quick_ and _quikc_ is `2` because it requires
 two operations to go from one to the other (remove _k_, insert _k_ at a
@@ -757,8 +763,9 @@ FOR doc IN viewName
   RETURN doc.text
 ```
 
-Match documents on levenshtein distance 1 with prefix `qui`. All edit operations
-is applied to term `kc`. Prefix `qui` is constant.
+Match documents with a Levenshtein distance of 1 with the prefix `qui`. The edit
+distance is calculated using the search term `kc` and the stored value without
+the prefix (e.g. `ck`). The prefix `qui` is constant.
 
 ```js
 FOR doc IN viewName
diff --git a/3.9/aql/functions-arangosearch.md b/3.9/aql/functions-arangosearch.md
index da25124f31..eed071dbfe 100644
--- a/3.9/aql/functions-arangosearch.md
+++ b/3.9/aql/functions-arangosearch.md
@@ -457,7 +457,7 @@ Object tokens:
 
 - `{IN_RANGE: [low, high, includeLow, includeHigh]}`:
   see [IN_RANGE()](#in_range). *low* and *high* can only be strings.
-- `{LEVENSHTEIN_MATCH: [token, maxDistance, transpositions, maxTerms]}`:
+- `{LEVENSHTEIN_MATCH: [token, maxDistance, transpositions, maxTerms, prefix]}`:
   - `token` (string): a string to search
   - `maxDistance` (number): maximum Levenshtein / Damerau-Levenshtein distance
   - `transpositions` (bool, _optional_): if set to `false`, a Levenshtein
@@ -465,6 +465,12 @@ Object tokens:
   - `maxTerms` (number, _optional_): consider only a specified number of the
     most relevant terms. One can pass `0` to consider all matched terms, but it may
     impact performance negatively. The default value is `64`.
+  - `prefix` (string, _optional_): if defined, then a search for the exact
+    prefix is carried out, using the matches as candidates. The Levenshtein /
+    Damerau-Levenshtein distance is then computed for each candidate using the
+    remainders of the strings. This option can improve performance in cases where
+    there is a known common prefix. The default value is an empty string
+    (introduced in v3.7.13, v3.8.1).
 - `{STARTS_WITH: [prefix]}`: see [STARTS_WITH()](#starts_with).
   Array brackets are optional
 - `{TERM: [token]}`: equal to `token` but without Analyzer tokenization.
@@ -710,7 +716,7 @@ FOR doc IN viewName
 
 Introduced in: v3.7.0
 
-`LEVENSHTEIN_MATCH(path, target, distance, transpositions, maxTerms) → fulfilled`
+`LEVENSHTEIN_MATCH(path, target, distance, transpositions, maxTerms, prefix) → fulfilled`
 
 Match documents with a [Damerau-Levenshtein distance](https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance){:target=_"blank"}
 lower than or equal to *distance* between the stored attribute value and
@@ -732,6 +738,12 @@ if you want to calculate the edit distance of two strings.
   impact performance negatively. The default value is `64`.
 - returns **fulfilled** (bool): `true` if the calculated distance is less than
   or equal to *distance*, `false` otherwise
+- **prefix** (string, _optional_): if defined, then a search for the exact
+  prefix is carried out, using the matches as candidates. The Levenshtein /
+  Damerau-Levenshtein distance is then computed for each candidate using the
+  remainders of the strings. This option can improve performance in cases where
+  there is a known common prefix. The default value is an empty string
+  (introduced in v3.7.13, v3.8.1).
 
 The Levenshtein distance between _quick_ and _quikc_ is `2` because it requires
 two operations to go from one to the other (remove _k_, insert _k_ at a
@@ -751,6 +763,16 @@ FOR doc IN viewName
   RETURN doc.text
 ```
 
+Match documents with a Levenshtein distance of 1 with the prefix `qui`. The edit
+distance is calculated using the search term `kc` and the stored value without
+the prefix (e.g. `ck`). The prefix `qui` is constant.
+
+```js
+FOR doc IN viewName
+  SEARCH LEVENSHTEIN_MATCH(doc.text, "kc", 1, false, 64, "qui") // matches "quick"
+  RETURN doc.text
+```
+
 You may want to pick the maximum edit distance based on string length.
 If the stored attribute is the string _quick_ and the target string is
 _quicksands_, then the Levenshtein distance is 5, with 50% of the