2 changes: 1 addition & 1 deletion .translation-init
Original file line number Diff line number Diff line change
@@ -1 +1 @@
Translation initialization: 2025-10-21T08:14:11.718396
Translation initialization: 2025-10-21T10:45:23.526397
89 changes: 65 additions & 24 deletions docs/cn/sql-reference/20-sql-functions/10-search-functions/index.md
@@ -1,51 +1,92 @@
---
title: 全文搜索函数(Full-Text Search Functions)
title: Full-Text Search Functions
---

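This section provides reference information for the full-text search functions in Databend. These functions enable powerful text search capabilities similar to those of a dedicated search engine.
Databend's full-text search functions bring search-engine-style filtering to semi-structured `VARIANT` data and plain-text columns that have an inverted index. This makes them well suited to retrieving AI-generated metadata stored alongside assets, such as perception results for autonomous-driving video frames.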

:::info
Databend's full-text search functions are inspired by the [Elasticsearch full-text search functions](https://www.elastic.co/guide/en/elasticsearch/reference/current/sql-functions-search.html).
Databend's search functions are modeled on the [Elasticsearch full-text search functions](https://www.elastic.co/guide/en/elasticsearch/reference/current/sql-functions-search.html).
:::

Add an inverted index on the columns you want to search in the table definition:

```sql
CREATE OR REPLACE TABLE frames (
  id INT,
  meta VARIANT,
  INVERTED INDEX idx_meta (meta)
);
```

## Search Functions

| Function | Description | Example |
|----------|-------------|--------|
| [MATCH](match) | Searches the selected columns for documents that contain the specified keywords | `MATCH('title, body', 'technology')` |
| [QUERY](query) | Searches for documents that satisfy a query expression written in an advanced syntax | `QUERY('title:technology AND society')` |
| [SCORE](score) | Returns the relevance score of search results when used with MATCH or QUERY | `SELECT title, SCORE() FROM articles WHERE MATCH('title', 'technology')` |
|----------|-------------|---------|
| [MATCH](match) | Runs a relevance-ranked search over the specified columns. | `MATCH('summary, tags', 'traffic light red')` |
| [QUERY](query) | Parses a Lucene-style query expression, with support for nested `VARIANT` fields. | `QUERY('meta.signals.traffic_light:red')` |
| [SCORE](score) | Returns the relevance score of the current row when used with `MATCH` or `QUERY`. | `SELECT summary, SCORE() FROM frame_notes WHERE MATCH('summary, tags', 'traffic light red')` |

## Query Syntax Examples

### Example: Single Keyword

```sql
SELECT id, meta['frame']['timestamp'] AS ts
FROM frames
WHERE QUERY('meta.detections.label:pedestrian')
LIMIT 100;
```

### Example: Boolean AND

```sql
SELECT id, meta['frame']['timestamp'] AS ts
FROM frames
WHERE QUERY('meta.signals.traffic_light:red AND meta.vehicle.lane:center')
LIMIT 100;
```

### Example: Boolean OR

## Usage Examples
```sql
SELECT id, meta['frame']['timestamp'] AS ts
FROM frames
WHERE QUERY('meta.signals.traffic_light:red OR meta.detections.label:bike')
LIMIT 100;
```

### Basic Text Search
### Example: IN List

```sql
-- Search for documents that contain 'technology' in the title or body column
SELECT * FROM articles
WHERE MATCH('title, body', 'technology');
SELECT id, meta['frame']['timestamp'] AS ts
FROM frames
WHERE QUERY('meta.tags:IN [stop urban]')
LIMIT 100;
```

### Advanced Query Expressions
### Example: Inclusive Range

```sql
-- Search for documents whose title column contains 'technology' and whose body column contains 'impact'
SELECT * FROM articles
WHERE QUERY('title:technology AND body:impact');
SELECT id, meta['frame']['timestamp'] AS ts
FROM frames
WHERE QUERY('meta.vehicle.speed_kmh:[0 TO 10]')
LIMIT 100;
```

### Relevance Scoring
### Example: Exclusive Range

```sql
-- Run a search with relevance scoring and sort by score in descending order
SELECT title, body, SCORE()
FROM articles
WHERE MATCH('title^2, body', 'technology')
ORDER BY SCORE() DESC;
SELECT id, meta['frame']['timestamp'] AS ts
FROM frames
WHERE QUERY('meta.vehicle.speed_kmh:{0 TO 10}')
LIMIT 100;
```
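The two range forms differ only in whether the endpoints themselves match. In Lucene-style syntax, square brackets include the bounds and curly braces exclude them. The following is a minimal Python sketch of that comparison, not Databend's implementation; the helper `in_range` and the sample speed values are hypothetical.

```python
def in_range(value, low, high, inclusive):
    """Return True if value lies between low and high.

    inclusive=True models `[low TO high]`; inclusive=False models `{low TO high}`.
    """
    if inclusive:
        return low <= value <= high
    return low < value < high


# Hypothetical speed readings in km/h (not taken from the source).
speeds = [0, 5, 10, 12]

inclusive_hits = [s for s in speeds if in_range(s, 0, 10, inclusive=True)]
exclusive_hits = [s for s in speeds if in_range(s, 0, 10, inclusive=False)]

print(inclusive_hits)  # [0, 5, 10]
print(exclusive_hits)  # [5]
```

Under this reading, a frame with `speed_kmh` of exactly 0 or 10 is returned by the inclusive query but not by the exclusive one.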

Before using these functions, create an inverted index on the target columns:
### Example: Boosted Fields

```sql
CREATE INVERTED INDEX idx ON articles(title, body);
SELECT id, meta['frame']['timestamp'] AS ts, SCORE()
FROM frames
WHERE QUERY('meta.signals.traffic_light:red^1.0 AND meta.tags:urban^2.0')
LIMIT 100;
```
127 changes: 53 additions & 74 deletions docs/cn/sql-reference/20-sql-functions/10-search-functions/match.md
@@ -3,9 +3,9 @@ title: MATCH
---
import FunctionDescription from '@site/src/components/FunctionDescription';

<FunctionDescription description="引入或更新: v1.2.619"/>
<FunctionDescription description="Introduced or updated: v1.2.619"/>

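Searches for documents that contain the specified keywords. Note that the MATCH function can only be used in a WHERE clause.
`MATCH` searches the specified columns for rows that contain the given keywords. The function can only appear in a `WHERE` clause.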

:::info
Databend's MATCH function is inspired by Elasticsearch's [MATCH](https://www.elastic.co/guide/en/elasticsearch/reference/current/sql-functions-search.html#sql-functions-search-match).
@@ -14,83 +14,62 @@ Databend's MATCH function is inspired by Elasticsearch's [MATCH](https://www.e
## Syntax

```sql
MATCH( '<columns>', '<keywords>'[, '<options>'] )
MATCH('<columns>', '<keywords>'[, '<options>'])
```

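| Parameter | Description |
|--------------|-------------|
| `<columns>` | A comma-separated list of column names in the table to search for the specified keywords. Columns can optionally be weighted with the (^) syntax, which assigns a different weight to each column and so controls how much each column counts in the search. |
| `<keywords>` | The keywords to match against the specified columns. This parameter also supports suffix matching: a search term followed by an asterisk (*) matches any number of characters or words. |
| `<options>` | A set of configuration options, separated by semicolons `;`, that customize the search behavior. See the table below for details. |
- `<columns>`: The columns to search, separated by commas. Append `^<boost>` to give a column a higher weight.
- `<keywords>`: The terms to search for. Append `*` for suffix matching, for example `rust*`.
- `<options>`: An optional, semicolon-separated list of `key=value` pairs that fine-tune the search.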
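One way to read the `^<boost>` suffix is as a multiplier on the score contributed by that column. The Python sketch below assumes boosts scale each column's score linearly. The source does not state this, although it is consistent with the `body^1.2` example output on this page, where a score of 1.1550591 becomes 1.3860708. The function `boosted_score` is hypothetical and is not part of Databend.

```python
def boosted_score(column_scores, boosts):
    """Combine per-column relevance scores using per-column boost multipliers.

    Columns without an explicit boost default to a multiplier of 1.0.
    Assumption: boosts scale scores linearly (not confirmed by the source).
    """
    return sum(score * boosts.get(col, 1.0) for col, score in column_scores.items())


# A row whose only match is in `body`, using the unboosted score shown on this page.
row = {"body": 1.1550591}

unboosted = boosted_score(row, {})
boosted = boosted_score(row, {"title": 5.0, "body": 1.2})

print(unboosted)
print(round(boosted, 4))
```

A boost only affects ranking. It does not change which rows match.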

| Option | Description | Example | Explanation |
|-----------|-------------|---------|-------------|
| fuzziness | Allows matching terms within the specified Levenshtein distance. `fuzziness` can be set to 1 or 2. | SELECT id, score(), content FROM t WHERE match(content, 'box', 'fuzziness=1'); | When matching the query term "box", `fuzziness=1` allows terms such as "fox" to match, because the Levenshtein distance between "box" and "fox" is 1. |
| operator | Specifies how multiple query terms are combined. Can be OR (default) or AND. OR returns results that contain any of the query terms, whereas AND returns results that contain all of them. | SELECT id, score(), content FROM t WHERE match(content, 'action works', 'fuzziness=1;operator=AND'); | With `operator=AND`, the query requires results to contain both "action" and "works". Because of `fuzziness=1`, it matches terms such as "Actions" and "words", and so returns "Actions speak louder than words". |
| lenient | Controls whether an error is reported when the query text is invalid. Defaults to `false`. If set to `true`, no error is reported and an empty result set is returned when the query text is invalid. | SELECT id, score(), content FROM t WHERE match(content, '()', 'lenient=true'); | If the query text `()` is invalid, setting `lenient=true` prevents an error from being thrown and returns an empty result set. |
## Options

| Option | Values | Description | Example |
|--------|--------|-------------|---------|
| `fuzziness` | `1` or `2` | Matches keywords within the specified Levenshtein distance. | `MATCH('summary, tags', 'pedestrain', 'fuzziness=1')` matches rows that contain the correctly spelled `pedestrian`. |
| `operator` | `OR` (default) or `AND` | Controls how multiple keywords are combined when no Boolean operator is specified. | `MATCH('summary, tags', 'traffic light red', 'operator=AND')` requires all three terms to be present. |
| `lenient` | `true` or `false` | When `true`, suppresses parse errors and returns an empty result set. | `MATCH('summary, tags', '()', 'lenient=true')` returns no rows instead of raising an error. |
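To make the `fuzziness` values concrete, here is a small Python sketch of edit distance. It is illustrative only and is not Databend's implementation. The `transpositions` flag is an assumption added for this sketch: under plain Levenshtein distance, swapping two adjacent letters costs two edits, while some engines count it as one. The `pedestrain` example above matching at `fuzziness=1` is consistent with the latter, but the source does not say which rule applies.

```python
def edit_distance(a, b, transpositions=False):
    """Edit distance between strings a and b.

    With transpositions=False this is plain Levenshtein distance.
    With transpositions=True, swapping two adjacent characters counts as a
    single edit (the optimal string alignment variant).
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if (
                transpositions
                and i > 1
                and j > 1
                and a[i - 1] == b[j - 2]
                and a[i - 2] == b[j - 1]
            ):
                # Adjacent swap counted as one edit.
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(edit_distance("box", "fox"))                                    # 1
print(edit_distance("pedestrain", "pedestrian"))                      # 2
print(edit_distance("pedestrain", "pedestrian", transpositions=True)) # 1
```

A keyword matches an indexed term when the distance is no greater than the configured `fuzziness`.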

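## Examples

In many AI pipelines, you capture structured metadata in a `VARIANT` column while also indexing human-readable summaries for search. The following example stores dashcam frame summaries and tags extracted from JSON payloads.

### Example: Build Searchable Summaries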

```sql
CREATE OR REPLACE TABLE frame_notes (
  id INT,
  camera STRING,
  summary STRING,
  tags STRING,
  INVERTED INDEX idx_notes (summary, tags)
);

INSERT INTO frame_notes VALUES
  (1, 'dashcam_front',
   'Green light at Market & 5th with pedestrian entering the crosswalk',
   'downtown commute green-light pedestrian'),
  (2, 'dashcam_front',
   'Vehicle stopped at Mission & 6th red traffic light with cyclist ahead',
   'stop urban red-light cyclist'),
  (3, 'dashcam_front',
   'School zone caution sign in SOMA with pedestrian waiting near crosswalk',
   'school-zone caution pedestrian');
```

### Example: Boolean AND

```sql
SELECT id, summary
FROM frame_notes
WHERE MATCH('summary, tags', 'traffic light red', 'operator=AND');
-- Returns id 2
```
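The effect of `operator` can be sketched in Python against the same three sample rows. This is a simplified model and not Databend's implementation. It assumes a naive analyzer that lowercases text and splits on non-alphanumeric characters, and it treats `summary` and `tags` as one combined text. The helpers `tokenize` and `matches` are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (a simplified analyzer)."""
    return {t for t in re.split(r"[^a-z0-9]+", text.lower()) if t}


def matches(row_text, keywords, operator="OR"):
    """OR: any keyword appears in the row. AND: every keyword appears in the row."""
    terms = tokenize(keywords)
    tokens = tokenize(row_text)
    if operator == "AND":
        return terms <= tokens
    return bool(terms & tokens)


# summary and tags of each frame_notes row, concatenated for this sketch.
rows = {
    1: "Green light at Market & 5th with pedestrian entering the crosswalk "
       "downtown commute green-light pedestrian",
    2: "Vehicle stopped at Mission & 6th red traffic light with cyclist ahead "
       "stop urban red-light cyclist",
    3: "School zone caution sign in SOMA with pedestrian waiting near crosswalk "
       "school-zone caution pedestrian",
}

or_ids = sorted(i for i, text in rows.items() if matches(text, "traffic light red", "OR"))
and_ids = sorted(i for i, text in rows.items() if matches(text, "traffic light red", "AND"))

print(or_ids)   # [1, 2]
print(and_ids)  # [2]
```

Under these assumptions, the default `OR` also returns row 1 because it contains `light`, while `AND` narrows the result to row 2, which agrees with the query above.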

### Example: Fuzzy Matching

```sql
CREATE TABLE test(title STRING, body STRING);

CREATE INVERTED INDEX idx ON test(title, body);

INSERT INTO test VALUES
('The Importance of Reading', 'Reading is a crucial skill that opens up a world of knowledge and imagination.'),
('The Benefits of Exercise', 'Exercise is essential for maintaining a healthy lifestyle.'),
('The Power of Perseverance', 'Perseverance is the key to overcoming obstacles and achieving success.'),
('The Art of Communication', 'Effective communication is crucial in everyday life.'),
('The Impact of Technology on Society', 'Technology has revolutionized our society in countless ways.');

-- Retrieve documents whose 'title' column matches 'art power'
SELECT * FROM test WHERE MATCH('title', 'art power');

┌────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ title │ body │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────┤
│ The Power of Perseverance │ Perseverance is the key to overcoming obstacles and achieving success. │
│ The Art of Communication │ Effective communication is crucial in everyday life. │
└────────────────────────────────────────────────────────────────────────────────────────────────────┘

-- Retrieve documents whose 'title' column contains a value that starts with 'The' followed by any characters
SELECT * FROM test WHERE MATCH('title', 'The*');

┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ title │ body │
│ Nullable(String) │ Nullable(String) │
├─────────────────────────────────────┼────────────────────────────────────────────────────────────────────────────────┤
│ The Importance of Reading │ Reading is a crucial skill that opens up a world of knowledge and imagination. │
│ The Benefits of Exercise │ Exercise is essential for maintaining a healthy lifestyle. │
│ The Power of Perseverance │ Perseverance is the key to overcoming obstacles and achieving success. │
│ The Art of Communication │ Effective communication is crucial in everyday life. │
│ The Impact of Technology on Society │ Technology has revolutionized our society in countless ways. │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

-- Retrieve documents whose 'title' or 'body' column matches 'knowledge technology'
SELECT *, score() FROM test WHERE MATCH('title, body', 'knowledge technology');

┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ title │ body │ score() │
├─────────────────────────────────────┼────────────────────────────────────────────────────────────────────────────────┼───────────┤
│ The Importance of Reading │ Reading is a crucial skill that opens up a world of knowledge and imagination. │ 1.1550591 │
│ The Impact of Technology on Society │ Technology has revolutionized our society in countless ways. │ 2.6830134 │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

-- Retrieve documents whose 'title' or 'body' column matches 'knowledge technology', with both columns weighted
SELECT *, score() FROM test WHERE MATCH('title^5, body^1.2', 'knowledge technology');

┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ title │ body │ score() │
├─────────────────────────────────────┼────────────────────────────────────────────────────────────────────────────────┼───────────┤
│ The Importance of Reading │ Reading is a crucial skill that opens up a world of knowledge and imagination. │ 1.3860708 │
│ The Impact of Technology on Society │ Technology has revolutionized our society in countless ways. │ 7.8053584 │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

-- Retrieve documents whose 'body' column contains both "knowledge" and "imagination", allowing minor misspellings
SELECT * FROM test WHERE MATCH('body', 'knowledg imaginatio', 'fuzziness = 1; operator = AND');

-[ RECORD 1 ]-----------------------------------
title: The Importance of Reading
body: Reading is a crucial skill that opens up a world of knowledge and imagination.
SELECT id, summary
FROM frame_notes
WHERE MATCH('summary^2, tags', 'pedestrain', 'fuzziness=1');
-- Returns ids 1 and 3
```