diff --git a/3.7/aql/functions-arangosearch.md b/3.7/aql/functions-arangosearch.md
index bd683476ff..a3d81a2609 100644
--- a/3.7/aql/functions-arangosearch.md
+++ b/3.7/aql/functions-arangosearch.md
@@ -457,7 +457,7 @@ Object tokens:
 
 - `{IN_RANGE: [low, high, includeLow, includeHigh]}`: see
   [IN_RANGE()](#in_range). *low* and *high* can only be strings.
-- `{LEVENSHTEIN_MATCH: [token, maxDistance, transpositions, maxTerms]}`:
+- `{LEVENSHTEIN_MATCH: [token, maxDistance, transpositions, maxTerms, prefix]}`:
   - `token` (string): a string to search
   - `maxDistance` (number): maximum Levenshtein / Damerau-Levenshtein distance
   - `transpositions` (bool, _optional_): if set to `false`, a Levenshtein
@@ -465,6 +465,12 @@ Object tokens:
   - `maxTerms` (number, _optional_): consider only a specified number of the
     most relevant terms. One can pass `0` to consider all matched terms, but it may
     impact performance negatively. The default value is `64`.
+  - `prefix` (string, _optional_): if defined, then a search for the exact
+    prefix is carried out, using the matches as candidates. The Levenshtein /
+    Damerau-Levenshtein distance is then computed for each candidate using the
+    remainders of the strings. This option can improve performance in cases where
+    there is a known common prefix. The default value is an empty string
+    (introduced in v3.7.13, v3.8.1).
 - `{STARTS_WITH: [prefix]}`: see [STARTS_WITH()](#starts_with).
   Array brackets are optional
 - `{TERM: [token]}`: equal to `token` but without Analyzer tokenization.
@@ -710,7 +716,7 @@ FOR doc IN viewName
 
 Introduced in: v3.7.0
 
-`LEVENSHTEIN_MATCH(path, target, distance, transpositions, maxTerms) → fulfilled`
+`LEVENSHTEIN_MATCH(path, target, distance, transpositions, maxTerms, prefix) → fulfilled`
 
 Match documents with a [Damerau-Levenshtein distance](https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance){:target=_"blank"}
 lower than or equal to *distance* between the stored attribute value and
@@ -732,6 +738,12 @@ if you want to calculate the edit distance of two strings.
   impact performance negatively. The default value is `64`.
 - returns **fulfilled** (bool): `true` if the calculated distance is less than or
   equal to *distance*, `false` otherwise
+- **prefix** (string, _optional_): if defined, then a search for the exact
+  prefix is carried out, using the matches as candidates. The Levenshtein /
+  Damerau-Levenshtein distance is then computed for each candidate using the
+  remainders of the strings. This option can improve performance in cases where
+  there is a known common prefix. The default value is an empty string
+  (introduced in v3.7.13, v3.8.1).
 
 The Levenshtein distance between _quick_ and _quikc_ is `2` because it requires
 two operations to go from one to the other (remove _k_, insert _k_ at a
@@ -751,6 +763,16 @@ FOR doc IN viewName
   RETURN doc.text
 ```
 
+Match documents with a Levenshtein distance of 1 with the prefix `qui`. The edit
+distance is calculated using the search term `kc` and the stored value without
+the prefix (e.g. `ck`). The prefix `qui` is constant.
+
+```js
+FOR doc IN viewName
+  SEARCH LEVENSHTEIN_MATCH(doc.text, "kc", 1, false, 64, "qui") // matches "quick"
+  RETURN doc.text
+```
+
 You may want to pick the maximum edit distance based on string length.
 If the stored attribute is the string _quick_ and the target string is
 _quicksands_, then the Levenshtein distance is 5, with 50% of the
diff --git a/3.8/aql/functions-arangosearch.md b/3.8/aql/functions-arangosearch.md
index d063d1657e..9055dcdfdc 100644
--- a/3.8/aql/functions-arangosearch.md
+++ b/3.8/aql/functions-arangosearch.md
@@ -457,7 +457,7 @@ Object tokens:
 
 - `{IN_RANGE: [low, high, includeLow, includeHigh]}`: see
   [IN_RANGE()](#in_range). *low* and *high* can only be strings.
-- `{LEVENSHTEIN_MATCH: [token, maxDistance, transpositions, maxTerms]}`:
+- `{LEVENSHTEIN_MATCH: [token, maxDistance, transpositions, maxTerms, prefix]}`:
   - `token` (string): a string to search
   - `maxDistance` (number): maximum Levenshtein / Damerau-Levenshtein distance
   - `transpositions` (bool, _optional_): if set to `false`, a Levenshtein
@@ -465,6 +465,12 @@ Object tokens:
   - `maxTerms` (number, _optional_): consider only a specified number of the
     most relevant terms. One can pass `0` to consider all matched terms, but it may
     impact performance negatively. The default value is `64`.
+  - `prefix` (string, _optional_): if defined, then a search for the exact
+    prefix is carried out, using the matches as candidates. The Levenshtein /
+    Damerau-Levenshtein distance is then computed for each candidate using the
+    remainders of the strings. This option can improve performance in cases where
+    there is a known common prefix. The default value is an empty string
+    (introduced in v3.7.13, v3.8.1).
 - `{STARTS_WITH: [prefix]}`: see [STARTS_WITH()](#starts_with).
   Array brackets are optional
 - `{TERM: [token]}`: equal to `token` but without Analyzer tokenization.
@@ -710,7 +716,7 @@ FOR doc IN viewName
 
 Introduced in: v3.7.0
 
-`LEVENSHTEIN_MATCH(path, target, distance, transpositions, maxTerms) → fulfilled`
+`LEVENSHTEIN_MATCH(path, target, distance, transpositions, maxTerms, prefix) → fulfilled`
 
 Match documents with a [Damerau-Levenshtein distance](https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance){:target=_"blank"}
 lower than or equal to *distance* between the stored attribute value and
@@ -732,6 +738,12 @@ if you want to calculate the edit distance of two strings.
   impact performance negatively. The default value is `64`.
 - returns **fulfilled** (bool): `true` if the calculated distance is less than or
   equal to *distance*, `false` otherwise
+- **prefix** (string, _optional_): if defined, then a search for the exact
+  prefix is carried out, using the matches as candidates. The Levenshtein /
+  Damerau-Levenshtein distance is then computed for each candidate using the
+  remainders of the strings. This option can improve performance in cases where
+  there is a known common prefix. The default value is an empty string
+  (introduced in v3.7.13, v3.8.1).
 
 The Levenshtein distance between _quick_ and _quikc_ is `2` because it requires
 two operations to go from one to the other (remove _k_, insert _k_ at a
@@ -751,6 +763,16 @@ FOR doc IN viewName
   RETURN doc.text
 ```
 
+Match documents with a Levenshtein distance of 1 with the prefix `qui`. The edit
+distance is calculated using the search term `kc` and the stored value without
+the prefix (e.g. `ck`). The prefix `qui` is constant.
+
+```js
+FOR doc IN viewName
+  SEARCH LEVENSHTEIN_MATCH(doc.text, "kc", 1, false, 64, "qui") // matches "quick"
+  RETURN doc.text
+```
+
 You may want to pick the maximum edit distance based on string length.
 If the stored attribute is the string _quick_ and the target string is
 _quicksands_, then the Levenshtein distance is 5, with 50% of the
diff --git a/3.9/aql/functions-arangosearch.md b/3.9/aql/functions-arangosearch.md
index da25124f31..eed071dbfe 100644
--- a/3.9/aql/functions-arangosearch.md
+++ b/3.9/aql/functions-arangosearch.md
@@ -457,7 +457,7 @@ Object tokens:
 
 - `{IN_RANGE: [low, high, includeLow, includeHigh]}`: see
   [IN_RANGE()](#in_range). *low* and *high* can only be strings.
-- `{LEVENSHTEIN_MATCH: [token, maxDistance, transpositions, maxTerms]}`:
+- `{LEVENSHTEIN_MATCH: [token, maxDistance, transpositions, maxTerms, prefix]}`:
   - `token` (string): a string to search
   - `maxDistance` (number): maximum Levenshtein / Damerau-Levenshtein distance
   - `transpositions` (bool, _optional_): if set to `false`, a Levenshtein
@@ -465,6 +465,12 @@ Object tokens:
   - `maxTerms` (number, _optional_): consider only a specified number of the
     most relevant terms. One can pass `0` to consider all matched terms, but it may
     impact performance negatively. The default value is `64`.
+  - `prefix` (string, _optional_): if defined, then a search for the exact
+    prefix is carried out, using the matches as candidates. The Levenshtein /
+    Damerau-Levenshtein distance is then computed for each candidate using the
+    remainders of the strings. This option can improve performance in cases where
+    there is a known common prefix. The default value is an empty string
+    (introduced in v3.7.13, v3.8.1).
 - `{STARTS_WITH: [prefix]}`: see [STARTS_WITH()](#starts_with).
   Array brackets are optional
 - `{TERM: [token]}`: equal to `token` but without Analyzer tokenization.
@@ -710,7 +716,7 @@ FOR doc IN viewName
 
 Introduced in: v3.7.0
 
-`LEVENSHTEIN_MATCH(path, target, distance, transpositions, maxTerms) → fulfilled`
+`LEVENSHTEIN_MATCH(path, target, distance, transpositions, maxTerms, prefix) → fulfilled`
 
 Match documents with a [Damerau-Levenshtein distance](https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance){:target=_"blank"}
 lower than or equal to *distance* between the stored attribute value and
@@ -732,6 +738,12 @@ if you want to calculate the edit distance of two strings.
   impact performance negatively. The default value is `64`.
 - returns **fulfilled** (bool): `true` if the calculated distance is less than or
   equal to *distance*, `false` otherwise
+- **prefix** (string, _optional_): if defined, then a search for the exact
+  prefix is carried out, using the matches as candidates. The Levenshtein /
+  Damerau-Levenshtein distance is then computed for each candidate using the
+  remainders of the strings. This option can improve performance in cases where
+  there is a known common prefix. The default value is an empty string
+  (introduced in v3.7.13, v3.8.1).
 
 The Levenshtein distance between _quick_ and _quikc_ is `2` because it requires
 two operations to go from one to the other (remove _k_, insert _k_ at a
@@ -751,6 +763,16 @@ FOR doc IN viewName
   RETURN doc.text
 ```
 
+Match documents with a Levenshtein distance of 1 with the prefix `qui`. The edit
+distance is calculated using the search term `kc` and the stored value without
+the prefix (e.g. `ck`). The prefix `qui` is constant.
+
+```js
+FOR doc IN viewName
+  SEARCH LEVENSHTEIN_MATCH(doc.text, "kc", 1, false, 64, "qui") // matches "quick"
+  RETURN doc.text
+```
+
 You may want to pick the maximum edit distance based on string length.
 If the stored attribute is the string _quick_ and the target string is
 _quicksands_, then the Levenshtein distance is 5, with 50% of the