Add SHOW FUNCTION and update docs for UDF #1140

Merged
merged 2 commits into from
May 11, 2019
83 changes: 83 additions & 0 deletions docs/documentation/cn/extending-doris/user-defined-function.md
@@ -0,0 +1,83 @@
# USER DEFINED FUNCTION
Contributor

Suggest adding a description of function privileges to the document, e.g. that SELECT privilege on the db is required to use a function.

Contributor Author

OK


Users can extend Doris's capabilities through the UDF mechanism. This document shows how to create your own UDF.

## Writing a UDF

Before you can use a UDF, you first write your own UDF function within Doris's UDF framework. A simple UDF demo is in `be/src/udf_samples/udf_sample.h|cpp`.

Writing a UDF takes the following steps.

### Write the function

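Create the header and CPP files, and implement your logic in the CPP file. The signature of the implementing function in the CPP file must correspond to the UDF as described below.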

#### Fixed arguments

For a UDF with a fixed number of arguments, the correspondence is direct.
For example, the UDF `INT MyADD(INT, INT)` corresponds to `IntVal AddUdf(FunctionContext* context, const IntVal& arg1, const IntVal& arg2)`.

1. `AddUdf` can be any name, as long as it is specified when creating the UDF.
Contributor

Some of these sentences end with a period and some don't; please make them consistent. Same below.

Contributor Author

OK

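2. The first parameter of the implementing function is always `FunctionContext*`. Through it, the implementation can obtain query-related information and allocate the memory it needs. See the definitions in `udf/udf.h` for the available interfaces.
3. Starting from the second parameter, the parameters must correspond one-to-one with the UDF's arguments, e.g. `IntVal` for `INT`. These parameters must all be passed by `const` reference.
4. The return type must correspond to the UDF's return type.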
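
The four rules above can be illustrated with a minimal sketch of `AddUdf`. To keep it compilable without the Doris source tree, `FunctionContext` and `IntVal` below are simplified stand-ins written for this example, not the real definitions from `udf/udf.h`; a real UDF would include that header instead. The code sits in namespace `doris_udf`, matching the namespace encoded in the example symbol `_ZN9doris_udf6AddUdf...`.

```cpp
#include <cassert>
#include <cstdint>

namespace doris_udf {

// Hypothetical stand-in: the real FunctionContext (udf/udf.h) exposes query
// context and memory allocation; this sketch never uses it.
class FunctionContext {};

// Hypothetical stand-in shaped like IntVal: a value plus a NULL flag.
struct IntVal {
    bool is_null;
    int32_t val;

    IntVal() : is_null(false), val(0) {}
    IntVal(int32_t v) : is_null(false), val(v) {}

    static IntVal null() {
        IntVal result;
        result.is_null = true;
        return result;
    }
};

// Implements the SQL UDF `INT MyADD(INT, INT)`:
//   rule 2: the first parameter is FunctionContext*;
//   rule 3: SQL arguments follow, passed by const reference;
//   rule 4: the C++ return type matches the SQL return type.
IntVal AddUdf(FunctionContext* context, const IntVal& arg1, const IntVal& arg2) {
    (void)context;  // unused in this sketch
    // Usual SQL convention: a NULL input yields a NULL result.
    if (arg1.is_null || arg2.is_null) {
        return IntVal::null();
    }
    // Overflow is not handled in this sketch.
    return IntVal(arg1.val + arg2.val);
}

}  // namespace doris_udf
```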

#### Variadic arguments

For variadic arguments, see the following example: the UDF `String md5sum(String, ...)` corresponds to the implementing function `StringVal md5sumUdf(FunctionContext* ctx, int num_args, const StringVal* args)`.

Contributor

Typo: `一下` -> `以下` ("the following").

Contributor Author

OK

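1. `md5sumUdf` can also be any name, as long as it is specified when creating the UDF.
2. As with fixed-argument functions, the first parameter is a `FunctionContext*`.
3. The variadic part is passed as two parameters: first an integer giving the number of arguments, then an array holding those arguments.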
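
The count-plus-array convention can be sketched as follows. Computing an MD5 would need a hashing library, so this hypothetical `SumUdf`, for a UDF such as `INT my_sum(INT, ...)`, sums its arguments instead. `IntVal` and `FunctionContext` are again simplified stand-ins, not the definitions from `udf/udf.h`.

```cpp
#include <cassert>
#include <cstdint>

namespace doris_udf {

class FunctionContext {};  // hypothetical stand-in

struct IntVal {  // hypothetical stand-in: value plus NULL flag
    bool is_null;
    int32_t val;
    IntVal() : is_null(false), val(0) {}
    IntVal(int32_t v) : is_null(false), val(v) {}
    static IntVal null() { IntVal r; r.is_null = true; return r; }
};

// Hypothetical variadic UDF `INT my_sum(INT, ...)`.
//   num_args: number of SQL arguments passed in this call
//   args:     array holding exactly num_args values
IntVal SumUdf(FunctionContext* ctx, int num_args, const IntVal* args) {
    (void)ctx;  // unused in this sketch
    assert(num_args >= 0);
    int64_t total = 0;  // wider accumulator; the final narrowing is not checked
    for (int i = 0; i < num_args; ++i) {
        if (args[i].is_null) {
            return IntVal::null();  // any NULL argument makes the result NULL
        }
        total += args[i].val;
    }
    return IntVal(static_cast<int32_t>(total));
}

}  // namespace doris_udf
```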

#### Type mapping

|UDF Type|Argument Type|
|----|---------|
|TinyInt|TinyIntVal|
|SmallInt|SmallIntVal|
|Int|IntVal|
|BigInt|BigIntVal|
|LargeInt|LargeIntVal|
|Float|FloatVal|
|Double|DoubleVal|
|Date|DateTimeVal|
|Datetime|DateTimeVal|
|Char|StringVal|
|Varchar|StringVal|
|Decimal|DecimalVal|
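
As the table shows, `Char` and `Varchar` both arrive as `StringVal`. The sketch below assumes a stand-in `StringVal` modeled on what we understand to be the pointer-plus-length layout of the one in `udf/udf.h` (check the header for the exact definition). Because the bytes are addressed by pointer and length, the example reads exactly `len` bytes instead of relying on a terminating `'\0'`. `DigitCountUdf`, for a UDF such as `INT my_digits(VARCHAR)`, is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

namespace doris_udf {

class FunctionContext {};  // hypothetical stand-in

struct IntVal {  // hypothetical stand-in
    bool is_null;
    int32_t val;
    IntVal() : is_null(false), val(0) {}
    IntVal(int32_t v) : is_null(false), val(v) {}
    static IntVal null() { IntVal r; r.is_null = true; return r; }
};

// Hypothetical stand-in for StringVal: bytes addressed by pointer + length,
// not assumed to be NUL-terminated.
struct StringVal {
    bool is_null;
    int len;
    uint8_t* ptr;
    StringVal() : is_null(false), len(0), ptr(nullptr) {}
    StringVal(uint8_t* p, int l) : is_null(false), len(l), ptr(p) {}
    static StringVal null() { StringVal r; r.is_null = true; return r; }
};

// Counts the ASCII digits in a CHAR/VARCHAR argument.
IntVal DigitCountUdf(FunctionContext* ctx, const StringVal& str) {
    (void)ctx;  // unused in this sketch
    if (str.is_null) {
        return IntVal::null();
    }
    int32_t digits = 0;
    // Read exactly str.len bytes; never look past them for a '\0'.
    for (int i = 0; i < str.len; ++i) {
        if (str.ptr[i] >= '0' && str.ptr[i] <= '9') {
            ++digits;
        }
    }
    return IntVal(digits);
}

}  // namespace doris_udf
```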

### Modify CMakeLists.txt

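Add a shared-library target to `be/src/udf_samples/CMakeLists.txt`, similar to `add_library(udfsample SHARED udf_sample.cpp)`. This declares a shared library named `udfsample`; list all of its source files after the name (header files are not needed).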

### Build

Run `sh build.sh` in the top-level directory to build the shared library. The output is placed under `be/build/src/udf_samples/`; for example, `udfsample` produces `be/build/src/udf_samples/libudfsample.so`.

## Creating a UDF

The steps above produce a shared library. Place it somewhere reachable over HTTP, then run the following statement to create the UDF inside Doris. This operation requires the ADMIN privilege.

```
CREATE [AGGREGATE] FUNCTION
name ([argtype][,...])
RETURNS rettype
PROPERTIES (["key"="value"][,...])
```
Notes:
Contributor

You could add a line here: "For more help, see HELP CREATE FUNCTION;"

Contributor Author

OK


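1. `"symbol"` in PROPERTIES is the symbol of the entry function to execute. It is required. You can find it with the `nm` command: for example, `_ZN9doris_udf6AddUdfEPNS_15FunctionContextERKNS_6IntValES4_` obtained from `nm libudfsample.so | grep AddUdf` is the corresponding symbol.
2. `"object_file"` in PROPERTIES is the location from which the shared library can be downloaded. It is required.
3. name: A function belongs to a database, so its name has the form `dbName`.`funcName`. If `dbName` is not specified explicitly, the database of the current session is used as `dbName`.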
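
To check which C++ function a symbol from `nm` refers to, you can demangle it (`nm -C` and `c++filt` do the same from the shell). The sketch below uses `abi::__cxa_demangle` from `<cxxabi.h>`, which GCC and Clang provide but which is not part of standard C++; `Demangle` is a helper written for this example.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Returns the readable form of an Itanium-ABI mangled symbol,
// or the input unchanged if it cannot be demangled.
std::string Demangle(const std::string& mangled) {
    int status = 0;
    // __cxa_demangle allocates the result with malloc; we must free it.
    char* raw = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || raw == nullptr) {
        return mangled;
    }
    std::string readable(raw);
    std::free(raw);
    return readable;
}
```

Applied to the `AddUdf` symbol in item 1, this should print something like `doris_udf::AddUdf(doris_udf::FunctionContext*, doris_udf::IntVal const&, doris_udf::IntVal const&)`, which confirms the symbol belongs to the function you intended to register.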

## Using a UDF

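A UDF is used in the same way as an ordinary function. The only difference is scope: built-in functions are global, whereas a UDF is scoped to its database. When the session is connected to a database, a bare UDF name is looked up in that database; otherwise you must qualify the UDF with its database name explicitly, e.g. `dbName`.`funcName`.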
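
The lookup rule above can be sketched as a small resolver. `ResolveFunction` and its handling of the missing-database case are hypothetical, written only to illustrate the rule (a qualified name uses its own database; a bare name falls back to the session's current database); Doris performs this resolution in its Java frontend, not with this code.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// A resolved function reference: (database, function name).
using QualifiedName = std::pair<std::string, std::string>;

// Resolves "funcName" or "dbName.funcName" against the session's current
// database (empty string means no database is selected). Returns
// std::nullopt when a bare name is used without a current database.
std::optional<QualifiedName> ResolveFunction(const std::string& name,
                                             const std::string& current_db) {
    std::string::size_type dot = name.find('.');
    if (dot != std::string::npos) {
        // Explicitly qualified: use the database given in the name.
        return QualifiedName(name.substr(0, dot), name.substr(dot + 1));
    }
    if (current_db.empty()) {
        return std::nullopt;  // bare name and no session database
    }
    return QualifiedName(current_db, name);
}
```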


## Dropping a UDF

When you no longer need a UDF, you can drop it; see `DROP FUNCTION`.
Contributor

"see `DROP FUNCTION`" -> "see `HELP DROP FUNCTION`"

Contributor Author

imay, May 11, 2019

Adding HELP here isn't a good idea: this document may later appear in a PDF, where the reference would become a link to another chapter. So it's better not to add HELP.


@@ -0,0 +1,70 @@
# CREATE FUNCTION

## Syntax

```
CREATE [AGGREGATE] FUNCTION function_name
(arg_type [, ...])
RETURNS ret_type
[INTERMEDIATE inter_type]
[PROPERTIES ("key" = "value" [, ...]) ]
```

## Description

This statement creates a user-defined function. Executing it requires the `ADMIN` privilege.

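If `function_name` includes a database name, the function is created in that database; otherwise it is created in the database of the current session. The new function must not have the same name and the same argument types as an existing function in the current namespace, or creation fails. A function with the same name but different argument types, however, can be created successfully.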
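
The naming rule (same name with same argument types is rejected; same name with different argument types is allowed) amounts to keying functions by their full signature within one database. The hypothetical `FunctionRegistry` below illustrates that rule; it is not Doris's catalog code.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Signature key: function name plus ordered argument types.
// The return type is deliberately not part of the key.
using Signature = std::pair<std::string, std::vector<std::string>>;

// Functions of a single database (namespace).
class FunctionRegistry {
public:
    // Returns false (creation fails) if the exact signature already exists.
    bool Create(const std::string& name, const std::vector<std::string>& arg_types,
                const std::string& ret_type) {
        return functions_.emplace(Signature(name, arg_types), ret_type).second;
    }

private:
    std::map<Signature, std::string> functions_;  // signature -> return type
};
```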
Contributor

Typo: `能够创建成果的。` -> `能够创建成功的。` (成果, "result", should be 成功, "successfully").

Contributor Author

OK


## Parameters

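> `AGGREGATE`: If present, the function created is an aggregate function; otherwise it is a scalar function.
>
> `function_name`: Name of the function to create. It may include a database name, e.g. `db1.my_func`.
>
> `arg_type`: Argument types of the function, written the same way as column types when creating a table. Variadic arguments are written as `, ...`; the variadic part has the same type as the last non-variadic argument.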
Contributor

Typo: `最后一个费变长` -> `最后一个非变长` (费 should be 非, "non-").

>
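> `ret_type`: Return type of the function.
>
> `inter_type`: Data type of the intermediate stage of an aggregate function.
>
> `properties`: Properties of the function. The following properties can be set:
>
> "object_file": URL of the UDF's shared library. Only HTTP/HTTPS are currently supported, and the URL must remain valid for the entire lifetime of the function. Required.
>
> "symbol": Signature of the scalar function, used to locate its entry point in the shared library. Required for scalar functions.
>
> "init_fn": Signature of the aggregate function's init function. Required for aggregate functions.
>
> "update_fn": Signature of the aggregate function's update function. Required for aggregate functions.
>
> "merge_fn": Signature of the aggregate function's merge function. Required for aggregate functions.
>
> "serialize_fn": Signature of the aggregate function's serialize function. Optional for aggregate functions; if not specified, a default serialize function is used.
>
> "finalize_fn": Signature of the function that produces the aggregate function's final result. Optional for aggregate functions; if not specified, a default finalize function is used.
>
> "md5": MD5 checksum of the shared library, used to verify that the downloaded content is correct. Optional.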
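
The required and optional properties can be summarized as a validation sketch. `ValidateProperties` and its messages are hypothetical; they follow the rules listed above, plus the case-insensitive `md5` comparison that `CreateFunctionStmt` performs with `equalsIgnoreCase`. The frontend's actual checks live in `CreateFunctionStmt`, not in this code.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <initializer_list>
#include <map>
#include <string>

using Properties = std::map<std::string, std::string>;

// Case-insensitive equality for hex digests.
static bool EqualsIgnoreCase(const std::string& a, const std::string& b) {
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(), [](unsigned char x, unsigned char y) {
               return std::tolower(x) == std::tolower(y);
           });
}

static bool HasNonEmpty(const Properties& props, const std::string& key) {
    auto it = props.find(key);
    return it != props.end() && !it->second.empty();
}

// Returns an empty string if the properties are valid, else an error message.
// actual_md5 is the checksum computed over the downloaded library.
std::string ValidateProperties(const Properties& props, bool is_aggregate,
                               const std::string& actual_md5) {
    if (!HasNonEmpty(props, "object_file")) {
        return "No 'object_file' in properties";
    }
    auto md5 = props.find("md5");  // optional; checked only when given
    if (md5 != props.end() && !EqualsIgnoreCase(md5->second, actual_md5)) {
        return "library's checksum does not match 'md5'";
    }
    if (is_aggregate) {
        // serialize_fn and finalize_fn are optional (defaults are used).
        for (const char* key : {"init_fn", "update_fn", "merge_fn"}) {
            if (!HasNonEmpty(props, key)) {
                return std::string("No '") + key + "' in properties";
            }
        }
    } else if (!HasNonEmpty(props, "symbol")) {
        return "No 'symbol' in properties";
    }
    return "";
}
```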

## Examples

1. Create a scalar UDF

```
CREATE FUNCTION my_add(INT, INT) RETURNS INT PROPERTIES (
"symbol" = "_ZN9doris_udf6AddUdfEPNS_15FunctionContextERKNS_6IntValES4_",
"object_file" = "http://host:port/libmyadd.so"
);
```

2. Create an aggregate UDF

```
CREATE AGGREGATE FUNCTION my_count (BIGINT) RETURNS BIGINT PROPERTIES (
"init_fn"="_ZN9doris_udf9CountInitEPNS_15FunctionContextEPNS_9BigIntValE",
"update_fn"="_ZN9doris_udf11CountUpdateEPNS_15FunctionContextERKNS_6IntValEPNS_9BigIntValE",
"merge_fn"="_ZN9doris_udf10CountMergeEPNS_15FunctionContextERKNS_9BigIntValEPS2_",
"finalize_fn"="_ZN9doris_udf13CountFinalizeEPNS_15FunctionContextERKNS_9BigIntValE",
"object_file"="http://host:port/libudasample.so"
);
```
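
To show how the four entry points of the aggregate example fit together, here is a sketch of count functions whose parameter lists match the demangled `init_fn`, `update_fn`, `merge_fn` and `finalize_fn` symbols above (for instance, `CountUpdate` takes an `IntVal`, as its symbol encodes). `FunctionContext`, `IntVal` and `BigIntVal` are simplified stand-ins, and the function bodies are assumptions for illustration, not the code behind `libudasample.so`.

```cpp
#include <cassert>
#include <cstdint>

namespace doris_udf {

class FunctionContext {};  // hypothetical stand-in

struct IntVal {  // hypothetical stand-in
    bool is_null = false;
    int32_t val = 0;
};

struct BigIntVal {  // hypothetical stand-in
    bool is_null = false;
    int64_t val = 0;
};

// init_fn: reset the intermediate state.
void CountInit(FunctionContext*, BigIntVal* val) {
    val->is_null = false;
    val->val = 0;
}

// update_fn: fold one input row into the state; NULL inputs are not counted.
void CountUpdate(FunctionContext*, const IntVal& src, BigIntVal* dst) {
    if (!src.is_null) {
        ++dst->val;
    }
}

// merge_fn: combine a partial state (e.g. from another instance) into dst.
void CountMerge(FunctionContext*, const BigIntVal& src, BigIntVal* dst) {
    dst->val += src.val;
}

// finalize_fn: produce the final result from the state.
BigIntVal CountFinalize(FunctionContext*, const BigIntVal& val) {
    return val;
}

}  // namespace doris_udf
```

In this sketch, two partial states are each initialized and updated independently, then merged, and the merged state is finalized.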
@@ -0,0 +1,27 @@
# DROP FUNCTION

## Syntax

```
DROP FUNCTION function_name
(arg_type [, ...])
```

## Description

Drops a user-defined function. A function is dropped only if both its name and its argument types match exactly.
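
Because a function is identified by its name together with its argument types, dropping is an exact-signature lookup. The hypothetical sketch below keys each function by a signature string such as `my_add(INT,INT)`, the format shown in the `Signature` column of `SHOW FUNCTION`, and removes only an exact match; it is not Doris's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Builds a signature string like "my_add(INT,INT)".
std::string MakeSignature(const std::string& name, const std::vector<std::string>& arg_types) {
    std::string sig = name + "(";
    for (std::size_t i = 0; i < arg_types.size(); ++i) {
        if (i > 0) {
            sig += ",";
        }
        sig += arg_types[i];
    }
    return sig + ")";
}

// Removes the function only if name and argument types match exactly.
// Returns false when no such function exists.
bool DropFunction(std::set<std::string>& functions, const std::string& name,
                  const std::vector<std::string>& arg_types) {
    return functions.erase(MakeSignature(name, arg_types)) == 1;
}
```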

## Parameters

> `function_name`: Name of the function to drop.
>
> `arg_type`: Argument list of the function to drop.
>

## Examples

1. Drop a function

```
DROP FUNCTION my_add(INT, INT)
```
6 changes: 5 additions & 1 deletion docs/documentation/cn/sql-reference/sql-statements/insert.md
@@ -1,4 +1,4 @@
# insert
# INSERT

## Syntax

@@ -33,6 +33,10 @@ column is the target column; target columns may be listed in any order. If no target
>
> hint: Hints that control how `INSERT` is executed. `streaming` indicates that the `INSERT` statement is executed synchronously.

## Note

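Currently, when an `INSERT` statement runs, rows that do not conform to the target table's format (for example, strings that are too long) are filtered out by default. For use cases where no data may be filtered out, set the session variable `enable_insert_strict` to `true`; the `INSERT` then fails if any row would be filtered out.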
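
The effect of `enable_insert_strict` can be sketched as follows. Rows violating a column constraint (here, only a maximum string length) are filtered; in strict mode, any filtered row fails the whole statement. `InsertResult`, `RunInsert` and the single length check are hypothetical simplifications, not Doris's load path.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct InsertResult {
    bool success;
    std::vector<std::string> loaded;  // rows that were written
    std::size_t filtered;             // rows dropped as non-conforming
};

// Simulates inserting strings into a VARCHAR(max_len) column.
InsertResult RunInsert(const std::vector<std::string>& rows, std::size_t max_len,
                       bool enable_insert_strict) {
    InsertResult result{true, {}, 0};
    for (const std::string& row : rows) {
        if (row.size() > max_len) {
            ++result.filtered;  // e.g. string too long: filtered by default
        } else {
            result.loaded.push_back(row);
        }
    }
    if (enable_insert_strict && result.filtered > 0) {
        // Strict mode: the INSERT fails; this sketch keeps none of the rows.
        return InsertResult{false, {}, result.filtered};
    }
    return result;
}
```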

## Examples

The `test` table has two columns, `c1` and `c2`.
@@ -0,0 +1,36 @@
# SHOW FUNCTION

## Syntax

```
SHOW FUNCTION [FROM db]
```

## Description

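Shows all user-defined functions in a database. If a database is specified, its functions are shown; otherwise the database of the current session is queried.

Requires the `SHOW` privilege on the database.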

## Parameters

> `db`: Name of the database to query.

## Examples

```
mysql> show function in testDb\G
*************************** 1. row ***************************
Signature: my_count(BIGINT)
Return Type: BIGINT
Function Type: Aggregate
Intermediate Type: NULL
Properties: {"object_file":"http://host:port/libudasample.so","finalize_fn":"_ZN9doris_udf13CountFinalizeEPNS_15FunctionContextERKNS_9BigIntValE","init_fn":"_ZN9doris_udf9CountInitEPNS_15FunctionContextEPNS_9BigIntValE","merge_fn":"_ZN9doris_udf10CountMergeEPNS_15FunctionContextERKNS_9BigIntValEPS2_","md5":"37d185f80f95569e2676da3d5b5b9d2f","update_fn":"_ZN9doris_udf11CountUpdateEPNS_15FunctionContextERKNS_6IntValEPNS_9BigIntValE"}
*************************** 2. row ***************************
Signature: my_add(INT,INT)
Return Type: INT
Function Type: Scalar
Intermediate Type: NULL
Properties: {"symbol":"_ZN9doris_udf6AddUdfEPNS_15FunctionContextERKNS_6IntValES4_","object_file":"http://host:port/libudfsample.so","md5":"cfe7a362d10f3aaf6c49974ee0f1f878"}
2 rows in set (0.00 sec)
```
4 changes: 4 additions & 0 deletions fe/src/main/cup/sql_parser.cup
@@ -1894,6 +1894,10 @@ show_param ::=
{:
RESULT = new ShowRolesStmt();
:}
| KW_FUNCTION opt_db:dbName
{:
RESULT = new ShowFunctionStmt(dbName);
:}
;

keys_or_index ::=
21 changes: 10 additions & 11 deletions fe/src/main/java/org/apache/doris/analysis/CreateFunctionStmt.java
@@ -41,6 +41,16 @@

// create a user define function
public class CreateFunctionStmt extends DdlStmt {
public static final String OBJECT_FILE_KEY = "object_file";
public static final String SYMBOL_KEY = "symbol";
public static final String MD5_CHECKSUM = "md5";
public static final String INIT_KEY = "init_fn";
public static final String UPDATE_KEY = "update_fn";
public static final String MERGE_KEY = "merge_fn";
public static final String SERIALIZE_KEY = "serialize_fn";
public static final String FINALIZE_KEY = "finalize_fn";
public static final String GET_VALUE_KEY = "get_value_fn";
public static final String REMOVE_KEY = "remove_fn";

private final FunctionName functionName;
private final boolean isAggregate;
@@ -102,7 +112,6 @@ private void analyzeCommon(Analyzer analyzer) throws AnalysisException {
intermediateType = returnType;
}

String OBJECT_FILE_KEY = "object_file";
objectFile = properties.get(OBJECT_FILE_KEY);
if (Strings.isNullOrEmpty(objectFile)) {
throw new AnalysisException("No 'object_file' in properties");
@@ -113,7 +122,6 @@ private void analyzeCommon(Analyzer analyzer) throws AnalysisException {
throw new AnalysisException("cannot to compute object's checksum");
}

String MD5_CHECKSUM = "md5";
String md5sum = properties.get(MD5_CHECKSUM);
if (md5sum != null && !md5sum.equalsIgnoreCase(checksum)) {
throw new AnalysisException("library's checksum is not equal with input, checksum=" + checksum);
@@ -140,14 +148,6 @@ private void computeObjectChecksum() throws IOException, NoSuchAlgorithmExceptio
}

private void analyzeUda() throws AnalysisException {
final String INIT_KEY = "init_fn";
final String UPDATE_KEY = "update_fn";
final String MERGE_KEY = "merge_fn";
final String SERIALIZE_KEY = "serialize_fn";
final String FINALIZE_KEY = "finalize_fn";
final String GET_VALUE_KEY = "get_value_fn";
final String REMOVE_KEY = "remove_fn";

AggregateFunction.AggregateFunctionBuilder builder = AggregateFunction.AggregateFunctionBuilder.createUdfBuilder();

builder.name(functionName).argsType(argsDef.getArgTypes()).retType(returnType.getType())
@@ -173,7 +173,6 @@ private void analyzeUda() throws AnalysisException {
}

private void analyzeUdf() throws AnalysisException {
final String SYMBOL_KEY = "symbol";
String symbol = properties.get(SYMBOL_KEY);
if (Strings.isNullOrEmpty(symbol)) {
throw new AnalysisException("No 'symbol' in properties");
72 changes: 72 additions & 0 deletions fe/src/main/java/org/apache/doris/analysis/ShowFunctionStmt.java
@@ -0,0 +1,72 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

package org.apache.doris.analysis;

import com.google.common.base.Strings;
import org.apache.doris.catalog.Catalog;
import org.apache.doris.catalog.Column;
import org.apache.doris.catalog.ScalarType;
import org.apache.doris.cluster.ClusterNamespace;
import org.apache.doris.common.ErrorCode;
import org.apache.doris.common.ErrorReport;
import org.apache.doris.common.UserException;
import org.apache.doris.mysql.privilege.PrivPredicate;
import org.apache.doris.qe.ConnectContext;
import org.apache.doris.qe.ShowResultSetMetaData;

public class ShowFunctionStmt extends ShowStmt {
private static final ShowResultSetMetaData META_DATA =
ShowResultSetMetaData.builder()
.addColumn(new Column("Signature", ScalarType.createVarchar(256)))
.addColumn(new Column("Return Type", ScalarType.createVarchar(32)))
.addColumn(new Column("Function Type", ScalarType.createVarchar(16)))
.addColumn(new Column("Intermediate Type", ScalarType.createVarchar(16)))
.addColumn(new Column("Properties", ScalarType.createVarchar(16)))
.build();

private String dbName;

public ShowFunctionStmt(String dbName) {
this.dbName = dbName;
}

public String getDbName() { return dbName; }

@Override
public void analyze(Analyzer analyzer) throws UserException {
super.analyze(analyzer);
if (Strings.isNullOrEmpty(dbName)) {
dbName = analyzer.getDefaultDb();
if (Strings.isNullOrEmpty(dbName)) {
ErrorReport.reportAnalysisException(ErrorCode.ERR_NO_DB_ERROR);
}
} else {
dbName = ClusterNamespace.getFullName(getClusterName(), dbName);
}

if (!Catalog.getCurrentCatalog().getAuth().checkDbPriv(ConnectContext.get(), dbName, PrivPredicate.SHOW)) {
ErrorReport.reportAnalysisException(
ErrorCode.ERR_DB_ACCESS_DENIED, ConnectContext.get().getQualifiedUser(), dbName);
}
}

@Override
public ShowResultSetMetaData getMetaData() {
return META_DATA;
}
}