diff --git a/.translation-init b/.translation-init index 4b2a0a4619..4aa9145ea7 100644 --- a/.translation-init +++ b/.translation-init @@ -1 +1 @@ -Translation initialization: 2025-10-21T08:14:11.718396 +Translation initialization: 2025-10-21T10:45:23.526397 diff --git a/docs/cn/sql-reference/20-sql-functions/10-search-functions/index.md b/docs/cn/sql-reference/20-sql-functions/10-search-functions/index.md index be801b9cad..ab14db315c 100644 --- a/docs/cn/sql-reference/20-sql-functions/10-search-functions/index.md +++ b/docs/cn/sql-reference/20-sql-functions/10-search-functions/index.md @@ -1,51 +1,92 @@ --- -title: 全文搜索函数(Full-Text Search Functions) +title: 全文搜索函数 --- -本节提供 Databend 中全文搜索函数的参考信息。这些函数可实现与专用搜索引擎类似的强大文本搜索能力。 +Databend 的全文搜索函数为已建立倒排索引(inverted index)的半结构化 `VARIANT` 数据及纯文本列提供搜索引擎式的过滤能力,非常适合检索与资产一同存储的 AI 生成元数据,例如自动驾驶视频帧的感知结果。 :::info -Databend 的全文搜索函数设计灵感源自 [Elasticsearch 全文搜索函数](https://www.elastic.co/guide/en/elasticsearch/reference/current/sql-functions-search.html)。 +Databend 的搜索函数借鉴自 [Elasticsearch 全文搜索函数](https://www.elastic.co/guide/en/elasticsearch/reference/current/sql-functions-search.html)。 ::: +在表定义中为待搜索的列添加倒排索引: + +```sql +CREATE OR REPLACE TABLE frames ( + id INT, + meta VARIANT, + INVERTED INDEX idx_meta (meta) +); +``` + ## 搜索函数 | 函数 | 描述 | 示例 | -|----------|-------------|--------| -| [MATCH](match) | 在选定列中搜索包含指定关键词的文档 | `MATCH('title, body', 'technology')` | -| [QUERY](query) | 使用高级语法搜索满足指定查询表达式的文档 | `QUERY('title:technology AND society')` | -| [SCORE](score) | 配合 MATCH 或 QUERY 使用时返回搜索结果的相关性评分 | `SELECT title, SCORE() FROM articles WHERE MATCH('title', 'technology')` | +|----------|-------------|---------| +| [MATCH](match) | 对指定列执行相关性排序搜索。 | `MATCH('summary, tags', 'traffic light red')` | +| [QUERY](query) | 解析 Lucene 风格查询表达式,支持嵌套 `VARIANT` 字段。 | `QUERY('meta.signals.traffic_light:red')` | +| [SCORE](score) | 与 `MATCH` 或 `QUERY` 配合使用时,返回当前行的相关性得分。 | `SELECT summary, SCORE() FROM frame_notes WHERE MATCH('summary, tags', 'traffic light 
red')` | + +## 查询语法示例 + +### 示例:单个关键词 + +```sql +SELECT id, meta['frame']['timestamp'] AS ts +FROM frames +WHERE QUERY('meta.detections.label:pedestrian') +LIMIT 100; +``` + +### 示例:布尔 AND + +```sql +SELECT id, meta['frame']['timestamp'] AS ts +FROM frames +WHERE QUERY('meta.signals.traffic_light:red AND meta.vehicle.lane:center') +LIMIT 100; +``` + +### 示例:布尔 OR -## 使用示例 +```sql +SELECT id, meta['frame']['timestamp'] AS ts +FROM frames +WHERE QUERY('meta.signals.traffic_light:red OR meta.detections.label:bike') +LIMIT 100; +``` -### 基本文本搜索 +### 示例:IN 列表 ```sql --- 在 title 或 body 列中搜索包含 'technology' 的文档 -SELECT * FROM articles -WHERE MATCH('title, body', 'technology'); +SELECT id, meta['frame']['timestamp'] AS ts +FROM frames +WHERE QUERY('meta.tags:IN [stop urban]') +LIMIT 100; ``` -### 高级查询表达式 +### 示例:包含范围 ```sql --- 搜索 title 列包含 'technology' 且 body 列包含 'impact' 的文档 -SELECT * FROM articles -WHERE QUERY('title:technology AND body:impact'); +SELECT id, meta['frame']['timestamp'] AS ts +FROM frames +WHERE QUERY('meta.vehicle.speed_kmh:[0 TO 10]') +LIMIT 100; ``` -### 相关性评分 +### 示例:排除范围 ```sql --- 执行带相关性评分的搜索,并按评分降序排序 -SELECT title, body, SCORE() -FROM articles -WHERE MATCH('title^2, body', 'technology') -ORDER BY SCORE() DESC; +SELECT id, meta['frame']['timestamp'] AS ts +FROM frames +WHERE QUERY('meta.vehicle.speed_kmh:{0 TO 10}') +LIMIT 100; ``` -使用这些函数前,需在目标列上创建倒排索引(Inverted Index): +### 示例:加权字段 ```sql -CREATE INVERTED INDEX idx ON articles(title, body); +SELECT id, meta['frame']['timestamp'] AS ts, SCORE() +FROM frames +WHERE QUERY('meta.signals.traffic_light:red^1.0 AND meta.tags:urban^2.0') +LIMIT 100; ``` \ No newline at end of file diff --git a/docs/cn/sql-reference/20-sql-functions/10-search-functions/match.md b/docs/cn/sql-reference/20-sql-functions/10-search-functions/match.md index 27d5678054..cd9459d1ed 100644 --- a/docs/cn/sql-reference/20-sql-functions/10-search-functions/match.md +++ 
b/docs/cn/sql-reference/20-sql-functions/10-search-functions/match.md @@ -3,9 +3,9 @@ title: MATCH --- import FunctionDescription from '@site/src/components/FunctionDescription'; - + -搜索包含指定关键词的文档。请注意,MATCH 函数只能在 WHERE 子句中使用。 +`MATCH` 用于在指定列中搜索包含所提供关键字的行。该函数只能出现在 `WHERE` 子句中。 :::info Databend 的 MATCH 函数灵感来源于 Elasticsearch 的 [MATCH](https://www.elastic.co/guide/en/elasticsearch/reference/current/sql-functions-search.html#sql-functions-search-match)。 @@ -14,83 +14,62 @@ Databend 的 MATCH 函数灵感来源于 Elasticsearch 的 [MATCH](https://www.e ## 语法 ```sql -MATCH( '', ''[, ''] ) +MATCH('', ''[, '']) ``` -| 参数 | 描述 | -|--------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `` | 表中要搜索指定关键词的列名列表,以逗号分隔,可选地使用 (^) 语法进行加权,允许为每个列分配不同的权重,影响每个列在搜索中的重要性。 | -| `` | 要匹配表中指定列的关键词。此参数还可用于后缀匹配,搜索词后跟星号 (*) 可以匹配任意数量的字符或词。 | -| `` | 一组以分号 `;` 分隔的配置选项,用于自定义搜索行为。详情见下表。 | +- ``:要搜索的列,以逗号分隔。可附加 `^` 为某列赋予更高权重。 +- ``:要搜索的词条。可附加 `*` 进行后缀匹配,例如 `rust*`。 +- ``:可选的、以分号分隔的 `key=value` 对列表,用于微调搜索。 -| 选项 | 描述 | 示例 | 解释 | -|-----------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| fuzziness | 允许匹配指定 Levenshtein 距离内的词项。`fuzziness` 可以设置为 1 或 2。 | SELECT id, score(), content FROM t WHERE match(content, 'box', 'fuzziness=1'); | 当匹配查询词 "box" 时,`fuzziness=1` 允许匹配 "fox" 等词项,因为 "box" 和 "fox" 的 Levenshtein 距离为 1。 | -| operator | 
指定多个查询词项的组合方式。可以是 OR(默认)或 AND。OR 返回包含任意查询词项的结果,而 AND 返回包含所有查询词项的结果。 | SELECT id, score(), content FROM t WHERE match(content, 'action works', 'fuzziness=1;operator=AND'); | 使用 `operator=AND`,查询要求结果中同时包含 "action" 和 "works"。由于 `fuzziness=1`,它匹配 "Actions" 和 "words" 等词项,因此返回 "Actions speak louder than words"。 | -| lenient | 控制当查询文本无效时是否报告错误。默认为 `false`。如果设置为 `true`,则不报告错误,如果查询文本无效,则返回空结果集。 | SELECT id, score(), content FROM t WHERE match(content, '()', 'lenient=true'); | 如果查询文本 `()` 无效,设置 `lenient=true` 会阻止抛出错误,并返回空结果集。 | +## 选项 + +| 选项 | 值 | 描述 | 示例 | +|--------|--------|-------------|---------| +| `fuzziness` | `1` 或 `2` | 匹配在指定 Levenshtein distance(莱文斯坦距离)内的关键字。 | `MATCH('summary, tags', 'pedestrain', 'fuzziness=1')` 匹配包含正确拼写 `pedestrian` 的行。 | +| `operator` | `OR`(默认)或 `AND` | 在未指定布尔操作符时,控制多个关键字的组合方式。 | `MATCH('summary, tags', 'traffic light red', 'operator=AND')` 要求同时包含 `traffic`、`light` 和 `red` 这三个词。 | +| `lenient` | `true` 或 `false` | 为 `true` 时,抑制解析错误并返回空结果集。 | `MATCH('summary, tags', '()', 'lenient=true')` 返回空结果集而非报错。 | ## 示例 +在许多 AI Pipeline(流水线)中,你可能会在 `VARIANT` 列中捕获结构化元数据,同时为人类可读摘要建立索引以便搜索。以下示例存储了从 JSON 负载中提取的行车记录仪帧摘要和标签。 + +### 示例:构建可搜索的摘要 + +```sql +CREATE OR REPLACE TABLE frame_notes ( + id INT, + camera STRING, + summary STRING, + tags STRING, + INVERTED INDEX idx_notes (summary, tags) +); + +INSERT INTO frame_notes VALUES + (1, 'dashcam_front', + 'Green light at Market & 5th with pedestrian entering the crosswalk', + 'downtown commute green-light pedestrian'), + (2, 'dashcam_front', + 'Vehicle stopped at Mission & 6th red traffic light with cyclist ahead', + 'stop urban red-light cyclist'), + (3, 'dashcam_front', + 'School zone caution sign in SOMA with pedestrian waiting near crosswalk', + 'school-zone caution pedestrian'); +``` + +### 示例:布尔 AND + +```sql +SELECT id, summary +FROM frame_notes +WHERE MATCH('summary, tags', 'traffic light red', 'operator=AND'); +-- 返回 id 2 +``` + +### 示例:模糊匹配 + ```sql -CREATE TABLE test(title STRING, body STRING); - -CREATE INVERTED INDEX idx ON 
test(title, body); - -INSERT INTO test VALUES -('The Importance of Reading', 'Reading is a crucial skill that opens up a world of knowledge and imagination.'), -('The Benefits of Exercise', 'Exercise is essential for maintaining a healthy lifestyle.'), -('The Power of Perseverance', 'Perseverance is the key to overcoming obstacles and achieving success.'), -('The Art of Communication', 'Effective communication is crucial in everyday life.'), -('The Impact of Technology on Society', 'Technology has revolutionized our society in countless ways.'); - --- 检索 'title' 列匹配 'art power' 的文档 -SELECT * FROM test WHERE MATCH('title', 'art power'); - -┌────────────────────────────────────────────────────────────────────────────────────────────────────┐ -│ title │ body │ -├───────────────────────────┼────────────────────────────────────────────────────────────────────────┤ -│ The Power of Perseverance │ Perseverance is the key to overcoming obstacles and achieving success. │ -│ The Art of Communication │ Effective communication is crucial in everyday life. │ -└────────────────────────────────────────────────────────────────────────────────────────────────────┘ - --- 检索 'title' 列包含以 'The' 开头后跟任意字符的值的文档 -SELECT * FROM test WHERE MATCH('title', 'The*') - -┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ -│ title │ body │ -│ Nullable(String) │ Nullable(String) │ -├─────────────────────────────────────┼────────────────────────────────────────────────────────────────────────────────┤ -│ The Importance of Reading │ Reading is a crucial skill that opens up a world of knowledge and imagination. │ -│ The Benefits of Exercise │ Exercise is essential for maintaining a healthy lifestyle. │ -│ The Power of Perseverance │ Perseverance is the key to overcoming obstacles and achieving success. │ -│ The Art of Communication │ Effective communication is crucial in everyday life. 
│ -│ The Impact of Technology on Society │ Technology has revolutionized our society in countless ways. │ -└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ - --- 检索 'title' 或 'body' 列匹配 'knowledge technology' 的文档 -SELECT *, score() FROM test WHERE MATCH('title, body', 'knowledge technology'); - -┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ -│ title │ body │ score() │ -├─────────────────────────────────────┼────────────────────────────────────────────────────────────────────────────────┼───────────┤ -│ The Importance of Reading │ Reading is a crucial skill that opens up a world of knowledge and imagination. │ 1.1550591 │ -│ The Impact of Technology on Society │ Technology has revolutionized our society in countless ways. │ 2.6830134 │ -└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ - --- 检索 'title' 或 'body' 列匹配 'knowledge technology' 的文档,并对两列进行加权 -SELECT *, score() FROM test WHERE MATCH('title^5, body^1.2', 'knowledge technology'); - -┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ -│ title │ body │ score() │ -├─────────────────────────────────────┼────────────────────────────────────────────────────────────────────────────────┼───────────┤ -│ The Importance of Reading │ Reading is a crucial skill that opens up a world of knowledge and imagination. │ 1.3860708 │ -│ The Impact of Technology on Society │ Technology has revolutionized our society in countless ways. 
│ 7.8053584 │ -└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ - --- 检索 'body' 列包含 "knowledge" 和 "imagination"(允许轻微拼写错误)的文档 -SELECT * FROM test WHERE MATCH('body', 'knowledg imaginatio', 'fuzziness = 1; operator = AND'); - --[ RECORD 1 ]----------------------------------- -title: The Importance of Reading - body: Reading is a crucial skill that opens up a world of knowledge and imagination. +SELECT id, summary +FROM frame_notes +WHERE MATCH('summary^2, tags', 'pedestrain', 'fuzziness=1'); +-- 返回 id 1 和 3 ``` \ No newline at end of file diff --git a/docs/cn/sql-reference/20-sql-functions/10-search-functions/query.md b/docs/cn/sql-reference/20-sql-functions/10-search-functions/query.md index 7f7cbd2274..76d86c9663 100644 --- a/docs/cn/sql-reference/20-sql-functions/10-search-functions/query.md +++ b/docs/cn/sql-reference/20-sql-functions/10-search-functions/query.md @@ -1,131 +1,182 @@ --- -title: QUERY +title: QUERY 函数 --- import FunctionDescription from '@site/src/components/FunctionDescription'; - + -搜索满足指定查询表达式的文档。请注意,QUERY 函数只能在 WHERE 子句中使用。 +`QUERY` 通过 Lucene 风格查询表达式与具备倒排索引(Inverted Index)的列进行匹配,从而过滤行。使用点记法可导航 `VARIANT` 列中的嵌套字段。该函数仅在 `WHERE` 子句中生效。 :::info -Databend 的 QUERY 函数灵感来源于 Elasticsearch 的 [QUERY](https://www.elastic.co/guide/en/elasticsearch/reference/current/sql-functions-search.html#sql-functions-search-query)。 +Databend 的 QUERY 函数灵感源自 Elasticsearch 的 [QUERY](https://www.elastic.co/guide/en/elasticsearch/reference/current/sql-functions-search.html#sql-functions-search-query)。 ::: ## 语法 ```sql -QUERY( ''[, ''] ) +QUERY(''[, '']) ``` -查询表达式支持以下语法。请注意,`` 也可用于后缀匹配,搜索词后跟星号 (*) 可以匹配任意数量的字符或单词。 +`` 为可选的分号分隔 `key=value` 对列表,用于调整搜索行为。 -| 语法 | 描述 | 示例 | 
-|---------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------| -| `:` | 匹配指定列包含指定关键词的文档。 | `QUERY('title:power')` | -| `:IN [, ...]` | 匹配指定列包含方括号内任一关键词的文档。 | `QUERY('title:IN [power, art]')`| -| `: AND / OR ` | 匹配指定列包含指定关键词中的全部或任一关键词的文档。在同时包含 AND 和 OR 的查询中,AND 操作优先于 OR,即 'a AND b OR c' 被解读为 '(a AND b) OR c'。 | `QUERY('title:power AND art')` | -| `:+ -` | 匹配指定列包含指定正关键词且不包含指定负关键词的文档。 | `QUERY('title:+the -reading')` | -| `:""` | 匹配指定列包含指定确切短语的文档。 | `QUERY('title:"Benefits of Exercise"')` | -| `:^ :^` | 匹配指定关键词存在于指定列中,并根据指定的权重增加其在搜索中的相关性。此语法允许为多个列设置不同的权重,以影响搜索相关性。 | `QUERY('title:art^5 body:reading^1.2')` | +## 构建查询表达式 +| 表达式 | 用途 | 示例 | +|------------|---------|---------| +| `column:keyword` | 匹配列中包含关键字的行,追加 `*` 可实现后缀匹配。 | `QUERY('meta.detections.label:pedestrian')` | +| `column:"exact phrase"` | 匹配包含精确短语的行。 | `QUERY('meta.scene.summary:"vehicle stopped at red traffic light"')` | +| `column:+required -excluded` | 在同一列中要求或排除特定词项。 | `QUERY('meta.tags:+commute -cyclist')` | +| `column:term1 AND term2` / `column:term1 OR term2` | 使用布尔运算符组合多个词项,`AND` 优先级高于 `OR`。 | `QUERY('meta.signals.traffic_light:red AND meta.vehicle.lane:center')` | +| `column:IN [value1 value2 ...]` | 匹配列表中的任意值。 | `QUERY('meta.tags:IN [stop urban]')` | +| `column:[min TO max]` | 执行包含边界的范围搜索,用 `*` 可开放一侧。 | `QUERY('meta.vehicle.speed_kmh:[0 TO 10]')` | +| `column:{min TO max}` | 执行排除边界的范围搜索。 | `QUERY('meta.vehicle.speed_kmh:{0 TO 10}')` | +| `column:term^boost` | 提升特定列中匹配的权重。 | `QUERY('meta.signals.traffic_light:red^1.0 meta.tags:urban^2.0')` | -| 选项 | 描述 | 示例 | 解释 | 
-|-----------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| fuzziness | 允许匹配在指定 Levenshtein 距离内的词项。`fuzziness` 可以设置为 1 或 2。 | SELECT id, score(), content FROM t WHERE query('content:box', 'fuzziness=1'); | 当匹配查询词 "box" 时,`fuzziness=1` 允许匹配 "fox" 等词项,因为 "box" 和 "fox" 的 Levenshtein 距离为 1。 | -| operator | 指定多个查询词项的组合方式。可以是 OR(默认)或 AND。OR 返回包含任一查询词项的结果,而 AND 返回包含所有查询词项的结果。 | SELECT id, score(), content FROM t WHERE query('content:action works', 'fuzziness=1;operator=AND'); | 使用 `operator=AND`,查询要求结果中同时包含 "action" 和 "works"。由于 `fuzziness=1`,它匹配 "Actions" 和 "words" 等词项,因此 "Actions speak louder than words" 被返回。 | -| lenient | 控制当查询文本无效时是否报告错误。默认为 `false`。如果设置为 `true`,则不报告错误,如果查询文本无效,则返回空结果集。 | SELECT id, score(), content FROM t WHERE query('content:()', 'lenient=true'); | 如果查询文本 `()` 无效,设置 `lenient=true` 防止抛出错误并返回空结果集。 | +### 嵌套 VARIANT 字段 + +使用点记法访问 `VARIANT` 列内部字段,Databend 会跨对象与数组评估路径。 + +| 模式 | 描述 | 示例 | +|---------|-------------|---------| +| `variant_col.field:value` | 匹配内部字段。 | `QUERY('meta.signals.traffic_light:red')` | +| `variant_col.field:IN [ ... 
]` | 匹配数组内任意值。 | `QUERY('meta.detections.label:IN [pedestrian cyclist]')` | +| `variant_col.field:[min TO max]` | 对数值型内部字段执行范围搜索。 | `QUERY('meta.vehicle.speed_kmh:[0 TO 10]')` | + +## 选项 + +| 选项 | 取值 | 描述 | 示例 | +|--------|--------|-------------|---------| +| `fuzziness` | `1` 或 `2` | 匹配指定 Levenshtein 距离内的词项。 | `SELECT id FROM frames WHERE QUERY('meta.detections.label:pedestrain', 'fuzziness=1');` | +| `operator` | `OR`(默认)或 `AND` | 未显式指定布尔运算符时控制多词项组合方式。 | `SELECT id FROM frames WHERE QUERY('meta.scene.weather:rain fog', 'operator=AND');` | +| `lenient` | `true` 或 `false` | 为 `true` 时抑制解析错误并返回空结果集。 | `SELECT id FROM frames WHERE QUERY('meta.detections.label:()', 'lenient=true');` | ## 示例 +### 构建智能驾驶数据集 + +```sql +CREATE OR REPLACE TABLE frames ( + id INT, + meta VARIANT, + INVERTED INDEX idx_meta (meta) +); + +INSERT INTO frames VALUES + (1, '{ + "frame":{"source":"dashcam_front","timestamp":"2025-10-21T08:32:05Z","location":{"city":"San Francisco","intersection":"Market & 5th","gps":[37.7825,-122.4072]}}, + "vehicle":{"speed_kmh":48,"acceleration":0.8,"lane":"center"}, + "signals":{"traffic_light":"green","distance_m":55,"speed_limit_kmh":50}, + "detections":[ + {"label":"car","confidence":0.96,"distance_m":15,"relative_speed_kmh":2}, + {"label":"pedestrian","confidence":0.88,"distance_m":12,"intent":"crossing"} + ], + "scene":{"weather":"clear","time_of_day":"day","visibility":"good"}, + "tags":["downtown","commute","green-light"], + "model":"perception-net-v5" + }'), + (2, '{ + "frame":{"source":"dashcam_front","timestamp":"2025-10-21T08:32:06Z","location":{"city":"San Francisco","intersection":"Mission & 6th","gps":[37.7829,-122.4079]}}, + "vehicle":{"speed_kmh":9,"acceleration":-1.1,"lane":"center"}, + "signals":{"traffic_light":"red","distance_m":18,"speed_limit_kmh":40}, + "detections":[ + {"label":"traffic_light","state":"red","confidence":0.99,"distance_m":18}, + {"label":"bike","confidence":0.82,"distance_m":9,"relative_speed_kmh":3} + ], + 
"scene":{"weather":"clear","time_of_day":"day","visibility":"good"}, + "tags":["stop","cyclist","urban"], + "model":"perception-net-v5" + }'), + (3, '{ + "frame":{"source":"dashcam_front","timestamp":"2025-10-21T08:32:07Z","location":{"city":"San Francisco","intersection":"SOMA School Zone","gps":[37.7808,-122.4016]}}, + "vehicle":{"speed_kmh":28,"acceleration":0.2,"lane":"right"}, + "signals":{"traffic_light":"yellow","distance_m":32,"speed_limit_kmh":25}, + "detections":[ + {"label":"traffic_sign","text":"SCHOOL","confidence":0.91,"distance_m":25}, + {"label":"pedestrian","confidence":0.76,"distance_m":8,"intent":"waiting"} + ], + "scene":{"weather":"overcast","time_of_day":"day","visibility":"moderate"}, + "tags":["school-zone","caution"], + "model":"perception-net-v5" + }'); +``` + +### 示例:布尔 AND + +```sql +SELECT id, meta['frame']['timestamp'] AS ts +FROM frames +WHERE QUERY('meta.signals.traffic_light:red AND meta.vehicle.speed_kmh:[0 TO 10]'); +-- 返回 id 2 +``` + +### 示例:布尔 OR + +```sql +SELECT id, meta['frame']['timestamp'] AS ts +FROM frames +WHERE QUERY('meta.signals.traffic_light:red OR meta.detections.label:bike'); +-- 返回 id 2 +``` + +### 示例:IN 列表匹配 + +```sql +SELECT id, meta['frame']['timestamp'] AS ts +FROM frames +WHERE QUERY('meta.tags:IN [stop urban]'); +-- 返回 id 2 +``` + +### 示例:包含边界范围 + +```sql +SELECT id, meta['frame']['timestamp'] AS ts +FROM frames +WHERE QUERY('meta.vehicle.speed_kmh:[0 TO 10]'); +-- 返回 id 2 +``` + +### 示例:排除边界范围 + +```sql +SELECT id, meta['frame']['timestamp'] AS ts +FROM frames +WHERE QUERY('meta.vehicle.speed_kmh:{0 TO 10}'); +-- 返回 id 2 +``` + +### 示例:跨字段权重提升 + +```sql +SELECT id, meta['frame']['timestamp'] AS ts, SCORE() +FROM frames +WHERE QUERY('meta.signals.traffic_light:red^1.0 AND meta.tags:urban^2.0'); +-- 返回 id 2,相关性更高 +``` + +### 示例:检测高置信度行人 + +```sql +SELECT id, meta['frame']['timestamp'] AS ts +FROM frames +WHERE QUERY('meta.detections.label:IN [pedestrian cyclist] AND meta.detections.confidence:[0.8 TO *]'); 
+-- 返回 id 1 和 3 +``` + +### 示例:按短语过滤 + +```sql +SELECT id, meta['frame']['timestamp'] AS ts +FROM frames +WHERE QUERY('meta.scene.summary:"vehicle stopped at red traffic light"'); +-- 返回 id 2 +``` + +### 示例:学区过滤 + ```sql -CREATE TABLE test(title STRING, body STRING); - -CREATE INVERTED INDEX idx ON test(title, body); - -INSERT INTO test VALUES -('The Importance of Reading', 'Reading is a crucial skill that opens up a world of knowledge and imagination.'), -('The Benefits of Exercise', 'Exercise is essential for maintaining a healthy lifestyle.'), -('The Power of Perseverance', 'Perseverance is the key to overcoming obstacles and achieving success.'), -('The Art of Communication', 'Effective communication is crucial in everyday life.'), -('The Impact of Technology on Society', 'Technology has revolutionized our society in countless ways.'); - --- 检索 'title' 列包含关键词 'power' 的文档 -SELECT * FROM test WHERE QUERY('title:power'); - -┌────────────────────────────────────────────────────────────────────────────────────────────────────┐ -│ title │ body │ -├───────────────────────────┼────────────────────────────────────────────────────────────────────────┤ -│ The Power of Perseverance │ Perseverance is the key to overcoming obstacles and achieving success. │ -└────────────────────────────────────────────────────────────────────────────────────────────────────┘ - --- 检索 'title' 列包含以 'The' 开头后跟任意字符的值的文档 -SELECT * FROM test WHERE QUERY('title:The*'); - -┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ -│ title │ body │ -├─────────────────────────────────────┼────────────────────────────────────────────────────────────────────────────────┤ -│ The Importance of Reading │ Reading is a crucial skill that opens up a world of knowledge and imagination. │ -│ The Benefits of Exercise │ Exercise is essential for maintaining a healthy lifestyle. 
│ -│ The Power of Perseverance │ Perseverance is the key to overcoming obstacles and achieving success. │ -│ The Art of Communication │ Effective communication is crucial in everyday life. │ -│ The Impact of Technology on Society │ Technology has revolutionized our society in countless ways. │ -└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ - --- 检索 'title' 列包含关键词 'power' 或 'art' 的文档 -SELECT * FROM test WHERE QUERY('title:power OR art'); - -┌────────────────────────────────────────────────────────────────────────────────────────────────────┐ -│ title │ body │ -├───────────────────────────┼────────────────────────────────────────────────────────────────────────┤ -│ The Power of Perseverance │ Perseverance is the key to overcoming obstacles and achieving success. │ -│ The Art of Communication │ Effective communication is crucial in everyday life. │ -└────────────────────────────────────────────────────────────────────────────────────────────────────┘ - -SELECT * FROM test WHERE QUERY('title:IN [power, art]') - -┌────────────────────────────────────────────────────────────────────────────────────────────────────┐ -│ title │ body │ -│ Nullable(String) │ Nullable(String) │ -├───────────────────────────┼────────────────────────────────────────────────────────────────────────┤ -│ The Power of Perseverance │ Perseverance is the key to overcoming obstacles and achieving success. │ -│ The Art of Communication │ Effective communication is crucial in everyday life. 
│ -└────────────────────────────────────────────────────────────────────────────────────────────────────┘ - --- 检索 'title' 列包含正向关键词 'the' 但不包含 'reading' 的文档 -SELECT * FROM test WHERE QUERY('title:+the -reading'); - -┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ -│ title │ body │ -├─────────────────────────────────────┼────────────────────────────────────────────────────────────────────────┤ -│ The Benefits of Exercise │ Exercise is essential for maintaining a healthy lifestyle. │ -│ The Power of Perseverance │ Perseverance is the key to overcoming obstacles and achieving success. │ -│ The Art of Communication │ Effective communication is crucial in everyday life. │ -│ The Impact of Technology on Society │ Technology has revolutionized our society in countless ways. │ -└──────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ - --- 检索 'title' 列包含确切短语 'Benefits of Exercise' 的文档 -SELECT * FROM test WHERE QUERY('title:"Benefits of Exercise"'); - -┌───────────────────────────────────────────────────────────────────────────────────────┐ -│ title │ body │ -├──────────────────────────┼────────────────────────────────────────────────────────────┤ -│ The Benefits of Exercise │ Exercise is essential for maintaining a healthy lifestyle. 
│ -└───────────────────────────────────────────────────────────────────────────────────────┘ - --- 检索 'title' 列包含关键词 'art' 并提升权重 5 以及 'body' 列包含关键词 'reading' 并提升权重 1.2 的文档 -SELECT *, score() FROM test WHERE QUERY('title:art^5 body:reading^1.2'); - -┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ -│ title │ body │ score() │ -├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────┼───────────┤ -│ The Importance of Reading │ Reading is a crucial skill that opens up a world of knowledge and imagination. │ 1.3860708 │ -│ The Art of Communication │ Effective communication is crucial in everyday life. │ 7.1992116 │ -└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ - --- 检索 'body' 列同时包含 "knowledge" 和 "imagination"(允许轻微拼写错误)的文档 -SELECT * FROM test WHERE QUERY('body:knowledg OR imaginatio', 'fuzziness = 1; operator = AND'); - --[ RECORD 1 ]----------------------------------- -title: The Importance of Reading - body: Reading is a crucial skill that opens up a world of knowledge and imagination. 
+SELECT id, meta['frame']['timestamp'] AS ts +FROM frames +WHERE QUERY('meta.detections.text:SCHOOL AND meta.scene.time_of_day:day'); +-- 返回 id 3 ``` \ No newline at end of file diff --git a/docs/cn/sql-reference/20-sql-functions/10-search-functions/score.md b/docs/cn/sql-reference/20-sql-functions/10-search-functions/score.md index cf2797ef06..b3dff04081 100644 --- a/docs/cn/sql-reference/20-sql-functions/10-search-functions/score.md +++ b/docs/cn/sql-reference/20-sql-functions/10-search-functions/score.md @@ -3,9 +3,9 @@ title: SCORE --- import FunctionDescription from '@site/src/components/FunctionDescription'; - + -返回查询字符串的相关性。得分越高,数据的相关性越强。请注意,SCORE 函数只能与 [QUERY](query.md) 或 [MATCH](match.md) 函数一起使用。 +`SCORE()` 返回倒排索引搜索为当前行分配的相关性得分。仅当查询的 `WHERE` 子句中使用了 [MATCH](match) 或 [QUERY](query) 时,才能调用 `SCORE()`(例如在 `SELECT` 列表或 `ORDER BY` 中)。 :::info Databend 的 SCORE 函数灵感来源于 Elasticsearch 的 [SCORE](https://www.elastic.co/guide/en/elasticsearch/reference/current/sql-functions-search.html#sql-functions-search-score)。 @@ -19,35 +19,44 @@ SCORE() ## 示例 +### 示例:为 MATCH 准备文本注释 + +```sql +CREATE OR REPLACE TABLE frame_notes ( + id INT, + camera STRING, + summary STRING, + tags STRING, + INVERTED INDEX idx_notes (summary, tags) +); + +INSERT INTO frame_notes VALUES + (1, 'dashcam_front', + 'Green light at Market & 5th with pedestrian entering the crosswalk', + 'downtown commute green-light pedestrian'), + (2, 'dashcam_front', + 'Vehicle stopped at Mission & 6th red traffic light with cyclist ahead', + 'stop urban red-light cyclist'), + (3, 'dashcam_front', + 'School zone caution sign in SOMA with pedestrian waiting near crosswalk', + 'school-zone caution pedestrian'); +``` + +### 示例:为 MATCH 结果评分 + +```sql +SELECT summary, SCORE() +FROM frame_notes +WHERE MATCH('summary^2, tags', 'traffic light red', 'operator=AND') +ORDER BY SCORE() DESC; +``` + +### 示例:为 QUERY 结果评分 + +复用 [QUERY](query) 示例中的 `frames` 表: + ```sql -CREATE TABLE test(title STRING, body STRING); - -CREATE INVERTED INDEX idx ON test(title, body); - -INSERT INTO 
test VALUES -('The Importance of Reading', 'Reading is a crucial skill that opens up a world of knowledge and imagination.'), -('The Benefits of Exercise', 'Exercise is essential for maintaining a healthy lifestyle.'), -('The Power of Perseverance', 'Perseverance is the key to overcoming obstacles and achieving success.'), -('The Art of Communication', 'Effective communication is crucial in everyday life.'), -('The Impact of Technology on Society', 'Technology has revolutionized our society in countless ways.'); - --- 检索 'title' 列包含关键词 'art' 且权重为 5,'body' 列包含关键词 'reading' 且权重为 1.2 的文档及其相关性得分 -SELECT *, score() FROM test WHERE QUERY('title:art^5 body:reading^1.2'); - -┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ -│ title │ body │ score() │ -├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────┼───────────┤ -│ The Importance of Reading │ Reading is a crucial skill that opens up a world of knowledge and imagination. │ 1.3860708 │ -│ The Art of Communication │ Effective communication is crucial in everyday life. │ 7.1992116 │ -└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ - --- 检索 'title' 列包含关键词 'reading' 且权重为 5,'body' 列包含关键词 'everyday' 且权重为 1.2 的文档及其相关性得分 -SELECT *, score() FROM test WHERE MATCH('title^5, body^1.2', 'reading everyday'); - -┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ -│ title │ body │ score() │ -├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────┼───────────┤ -│ The Importance of Reading │ Reading is a crucial skill that opens up a world of knowledge and imagination. │ 8.585282 │ -│ The Art of Communication │ Effective communication is crucial in everyday life. 
│ 1.8575745 │ -└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +SELECT id, SCORE() +FROM frames +WHERE QUERY('meta.detections.label:pedestrian^3 AND meta.scene.time_of_day:day'); ``` \ No newline at end of file