@@ -25,7 +25,7 @@ MARKOV_TRAIN(<order>, <frequency_cutoff>, <num_buckets_cutoff>, <frequency_add>,

## Return Type

Implementation-dependent; used only as an argument to [MARKOV_GENERATE](../20-other-functions/markov_generate.md).
Implementation-dependent; used only as an argument to [MARKOV_GENERATE](../19-data-anonymization-functions/markov_generate.md).

## Examples

@@ -3,7 +3,11 @@
title: OBFUSCATE
---

The OBFUSCATE table function generates anonymized data. It is a quick tool; for more complex scenarios, use the underlying functions [MARKOV_TRAIN](../07-aggregate-functions/aggregate-markov-train.md), [MARKOV_GENERATE](../20-other-functions/markov_generate.md), and [FEISTEL_OBFUSCATE](../20-other-functions/feistel_obfuscate.md) directly. Supported types include Email, String, Date, Integer, and Float.
The OBFUSCATE table function generates anonymized data. It is a quick approach; for more complex scenarios, use the underlying functions [MARKOV_TRAIN](../07-aggregate-functions/aggregate-markov-train.md), [MARKOV_GENERATE](../19-data-anonymization-functions/markov_generate.md), and [FEISTEL_OBFUSCATE](../19-data-anonymization-functions/feistel_obfuscate.md) directly. The function can anonymize data of the String, Integer, and Float types.

:::note
Types that are not yet supported (such as Date) are currently left unprocessed, and the original value is returned as is.
:::

## Syntax

@@ -0,0 +1,42 @@
---
title: FEISTEL_OBFUSCATE
---

The FEISTEL_OBFUSCATE function anonymizes numeric data.

## Syntax

```sql
FEISTEL_OBFUSCATE( <number>, <seed> )
```

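## Arguments

| Argument | Description |
| ----------- | ----------- |
| `<number>` | The number to anonymize. |
| `<seed>` | The encryption seed.<br /> The same seed always produces the same result, which is useful in some scenarios. Note, however, that leaking the seed may allow the original data to be recovered. |

## Return Type

Same as the input.

## Examples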

```sql
SELECT feistel_obfuscate(10000,1561819567875);
+------------------------------------------+
| feistel_obfuscate(10000, 1561819567875)  |
+------------------------------------------+
|                                    15669 |
+------------------------------------------+
```
feistel_obfuscate preserves the number of bits of the original input. To map to a larger range, add an offset to the original input, for example `feistel_obfuscate(n+10000, 50)`.
```sql
SELECT feistel_obfuscate(10,1561819567875);
+------------------------------------------+
| feistel_obfuscate(10, 1561819567875)     |
+------------------------------------------+
|                                       13 |
+------------------------------------------+
```
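The seed and offset notes above can be tried with a short query. This is a minimal sketch rather than a verified result: it assumes the `numbers()` table function used elsewhere in these docs, and it reuses the seed `50` and offset `10000` from the inline example above. No output is shown because the values depend on the implementation.

```sql
-- Obfuscate the same shifted input twice with the same seed.
-- Per the seed description, columns a and b are expected to match row by row.
SELECT number,
       feistel_obfuscate(number + 10000, 50) AS a,
       feistel_obfuscate(number + 10000, 50) AS b
FROM numbers(5);
```

Adding the offset before obfuscation, rather than after, is what moves the input into the larger range that the function then preserves.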
@@ -0,0 +1,10 @@
---
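title: Data Anonymization Functions
---

This section provides functions for data anonymization.

| Function | Description |
|----------|-------------|
| [MARKOV_GENERATE](markov_generate.md) | Generate anonymized data based on a Markov model |
| [FEISTEL_OBFUSCATE](feistel_obfuscate.md) | Anonymize numeric values |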
@@ -0,0 +1,43 @@
---
title: MARKOV_GENERATE
---

The MARKOV_GENERATE function uses a model trained by [MARKOV_TRAIN](../07-aggregate-functions/aggregate-markov-train.md) to generate anonymized data.

## Syntax

```sql
MARKOV_GENERATE( <model>, <params>, <seed>, <determinator> )
```

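## Arguments

| Argument | Description |
| ----------- | ----------- |
| `model` | The model produced by markov_train. |
| `params`| Generation parameters as a JSON string, for example `{"order": 5, "sliding_window_size": 8}`.<br/> `order`: the context length of the model.<br/> `sliding_window_size`: the size of the sliding window over the source string; its hash is used as the seed for the random number generator (RNG) in the model. |
| `seed` | The generation seed. |
| `determinator`| The input data (the determinator). |

## Return Type

String

## Examples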

```sql
create table model as
select markov_train(concat('bar', number::string)) as bar from numbers(100);

select markov_generate(bar,'{"order":5,"sliding_window_size":8}', 151, (number+100000)::string) as generate
from numbers(5), model;
+-----------+
| generate  |
+-----------+
| bar95     |
| bar64     |
| bar85     |
| bar56     |
| bar95     |
+-----------+
```
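To explore how the `seed` argument affects generation, the query above can be rerun with a different seed. This is only a sketch: it assumes the `model` table from the example above still exists, and `152` is an arbitrary seed chosen for illustration. The output is omitted because it depends on the trained model.

```sql
-- Reuses the `model` table created in the example above.
-- Only the seed changes (152 instead of 151); compare the result with the earlier output.
select markov_generate(bar, '{"order":5,"sliding_window_size":8}', 152, (number+100000)::string) as generate
from numbers(5), model;
```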
@@ -14,5 +14,4 @@ title: Other Functions
| [REMOVE_NULLABLE](remove-nullable.md) | Remove nullability from a column value |
| [TO_NULLABLE](to-nullable.md) | Convert a value to a nullable type |
| [TYPEOF](typeof.md) | Return the name of a value's data type |
| [MARKOV_GENERATE](markov_generate.md) | Generate anonymized data based on a Markov model |
| [FEISTEL_OBFUSCATE](feistel_obfuscate.md) | Anonymize numeric values |

1 change: 1 addition & 0 deletions docs/cn/sql-reference/20-sql-functions/index.md
@@ -68,6 +68,7 @@ Databend provides comprehensive SQL functions for all kinds of data processing. Functions by importance
|----------|-------------|
| [Interval Functions](./05-interval-functions/index.md) | Time unit conversion and interval creation |
| [Sequence Functions](./18-sequence-functions/index.md) | Auto-incrementing sequence value generation |
| [Data Anonymization Functions](./19-data-anonymization-functions/index.md) | Data masking and anonymization utilities |
| [Test Functions](./19-test-functions/index.md) | Testing and debugging utilities |
| [Other Functions](./20-other-functions/index.md) | Miscellaneous helpers and utilities |
@@ -27,7 +27,7 @@ MARKOV_TRAIN(<order>, <frequency_cutoff>, <num_buckets_cutoff>, <frequency_add>,

## Return Type

The return type depends on the implementation and is used only as an argument to [MARKOV_GENERATE](../20-other-functions/markov_generate.md).
The return type depends on the implementation and is used only as an argument to [MARKOV_GENERATE](../19-data-anonymization-functions/markov_generate.md).

## Examples

@@ -2,7 +2,7 @@
title: OBFUSCATE
---

Dataset anonymization. This is a quick tool; for more complex scenarios, use the underlying functions [MARKOV_TRAIN](../07-aggregate-functions/aggregate-markov-train.md), [MARKOV_GENERATE](../20-other-functions/markov_generate.md), and [FEISTEL_OBFUSCATE](../20-other-functions/feistel_obfuscate.md) directly.
Dataset anonymization. This is a quick tool; for more complex scenarios, use the underlying functions [MARKOV_TRAIN](../07-aggregate-functions/aggregate-markov-train.md), [MARKOV_GENERATE](../19-data-anonymization-functions/markov_generate.md), and [FEISTEL_OBFUSCATE](../19-data-anonymization-functions/feistel_obfuscate.md) directly.

## Syntax

@@ -0,0 +1,10 @@
---
title: Data Anonymization Functions
---

This section provides functions used for data anonymization.

| Function | Description |
|----------|-------------|
| [MARKOV_GENERATE](markov_generate.md) | Generate anonymized strings based on a Markov model |
| [FEISTEL_OBFUSCATE](feistel_obfuscate.md) | Obfuscate numbers using a Feistel cipher |
@@ -14,5 +14,5 @@ This section collects assorted utilities that do not fit into the major function
| [REMOVE_NULLABLE](remove-nullable.md) | Strip NULLability from a column value |
| [TO_NULLABLE](to-nullable.md) | Convert a value to a nullable type |
| [TYPEOF](typeof.md) | Return the name of a value’s data type |
| [MARKOV_GENERATE](markov_generate.md) | Generate anonymized strings |
| [FEISTEL_OBFUSCATE](feistel_obfuscate.md) | Transform numbers for anonymization |


1 change: 1 addition & 0 deletions docs/en/sql-reference/20-sql-functions/index.md
@@ -68,6 +68,7 @@ Databend provides comprehensive SQL functions for all types of data processing.
|----------|-------------|
| [Interval Functions](./05-interval-functions/index.md) | Time unit conversion and interval creation |
| [Sequence Functions](./18-sequence-functions/index.md) | Auto-incrementing sequence value generation |
| [Data Anonymization Functions](./19-data-anonymization-functions/index.md) | Data masking and anonymization utilities |
| [Dictionary Functions](./19-dictionary-functions/index.md) | Real-time external data source queries (MySQL, Redis) |
| [Test Functions](./19-test-functions/index.md) | Testing and debugging utilities |
| [Other Functions](./20-other-functions/index.md) | Miscellaneous helpers and utilities |