[feature-wip](hudi) Step1: Support create hudi external table (#9559)
support create hudi table
support show create table for hudi table

### Design
1. create hudi table without schema (recommended)
```sql
    CREATE [EXTERNAL] TABLE table_name
    ENGINE = HUDI
    [COMMENT "comment"]
    PROPERTIES (
    "hudi.database" = "hudi_db_in_hive_metastore",
    "hudi.table" = "hudi_table_in_hive_metastore",
    "hudi.hive.metastore.uris" = "thrift://127.0.0.1:9083"
    );
```

2. create hudi table with schema
```sql
    CREATE [EXTERNAL] TABLE table_name
    [(column_definition1[, column_definition2, ...])]
    ENGINE = HUDI
    [COMMENT "comment"]
    PROPERTIES (
    "hudi.database" = "hudi_db_in_hive_metastore",
    "hudi.table" = "hudi_table_in_hive_metastore",
    "hudi.hive.metastore.uris" = "thrift://127.0.0.1:9083"
    );
```
When creating a hudi table with a schema, the columns must exist in the corresponding table in the hive metastore.
dujl committed May 17, 2022
1 parent bee5c2f commit 72e0042efb675debafb66cfed5f41ca990c88d92
Showing 19 changed files with 782 additions and 10 deletions.
@@ -224,7 +224,8 @@ module.exports = [
"doris-on-es",
"odbc-of-doris",
"hive-of-doris",
"iceberg-of-doris"
"iceberg-of-doris",
"hudi-external-table"
],
},
"audit-plugin",
@@ -224,7 +224,8 @@ module.exports = [
"doris-on-es",
"odbc-of-doris",
"hive-of-doris",
"iceberg-of-doris"
"iceberg-of-doris",
"hudi-external-table"
],
},
"audit-plugin",
@@ -0,0 +1,137 @@
---
{
"title": "Doris Hudi external table",
"language": "en"
}
---

<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Hudi External Table of Doris

Hudi External Table of Doris gives Doris direct access to Hudi external tables. It removes the need for cumbersome data imports and uses Doris's own OLAP capabilities to analyze data in Hudi tables:

1. Support Hudi data sources in Doris
2. Support joint queries between Doris tables and Hudi data source tables for more complex analysis

This document describes how to use this feature and what to keep in mind.

## Glossary

### Doris Terms

* FE: Frontend, the front-end node of Doris, responsible for metadata management and request access
* BE: Backend, the backend node of Doris, responsible for query execution and data storage

## How to use

### Create Hudi External Table

Hudi tables can be created in Doris with or without a schema. You do not need to declare column definitions when creating the external table; Doris resolves them from the Hive Metastore when the table is queried.

1. Create a separate external table to mount the Hudi table.
The syntax can be viewed in `HELP CREATE TABLE`.

```sql
-- Syntax
CREATE [EXTERNAL] TABLE table_name
[(column_definition1[, column_definition2, ...])]
ENGINE = HUDI
[COMMENT "comment"]
PROPERTIES (
"hudi.database" = "hudi_db_in_hive_metastore",
"hudi.table" = "hudi_table_in_hive_metastore",
"hudi.hive.metastore.uris" = "thrift://127.0.0.1:9083"
);
-- Example: Mount hudi_table_in_hive_metastore under hudi_db_in_hive_metastore in Hive MetaStore
CREATE TABLE `t_hudi`
ENGINE = HUDI
PROPERTIES (
"hudi.database" = "hudi_db_in_hive_metastore",
"hudi.table" = "hudi_table_in_hive_metastore",
"hudi.hive.metastore.uris" = "thrift://127.0.0.1:9083"
);
-- Example: Mount the Hudi table with a schema.
CREATE TABLE `t_hudi` (
`id` int NOT NULL COMMENT "id number",
`name` varchar(10) NOT NULL COMMENT "user name"
) ENGINE = HUDI
PROPERTIES (
"hudi.database" = "hudi_db_in_hive_metastore",
"hudi.table" = "hudi_table_in_hive_metastore",
"hudi.hive.metastore.uris" = "thrift://127.0.0.1:9083"
);
```


#### Parameter Description
- column_definition
    - When a Hudi table is created without a schema (recommended), Doris resolves the columns from the Hive Metastore at query time.
    - When a Hudi table is created with a schema, the columns must exist in the corresponding table in the Hive Metastore.
- ENGINE must be specified as HUDI
- PROPERTIES
    - `hudi.hive.metastore.uris`: Hive Metastore service address
    - `hudi.database`: the name of the Hudi database to mount
    - `hudi.table`: the name of the Hudi table to mount; not required when mounting a Hudi database.

### Show table structure

To view the table structure, use `SHOW CREATE TABLE`; its syntax can be viewed in `HELP SHOW CREATE TABLE`.
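
As a brief sketch, the statement can be run against the `t_hudi` table from the examples above. The exact output depends on the Doris version and is not shown here.

```sql
-- Display the CREATE TABLE statement Doris stores for the Hudi external table.
-- `t_hudi` is the example table created earlier in this document.
SHOW CREATE TABLE t_hudi;
```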



## Data Type Matching

The following table maps the supported Hudi column types to Doris types.

| Hudi | Doris | Description |
| :------: | :----: | :-------------------------------: |
| BOOLEAN | BOOLEAN | |
| INTEGER | INT | |
| LONG | BIGINT | |
| FLOAT | FLOAT | |
| DOUBLE | DOUBLE | |
| DATE | DATE | |
| TIMESTAMP | DATETIME | Timestamp to Datetime with loss of precision |
| STRING | STRING | |
| UUID | VARCHAR | Use VARCHAR instead |
| DECIMAL | DECIMAL | |
| TIME | - | not supported |
| FIXED | - | not supported |
| BINARY | - | not supported |
| STRUCT | - | not supported |
| LIST | - | not supported |
| MAP | - | not supported |

**Note:**
- The default supported Hudi version is currently 0.10.0; other versions have not been tested. More versions will be supported in the future.
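
To illustrate the mapping, a table declared with an explicit schema would use the Doris types from the second column. This is only a sketch: the table name `t_hudi_typed` and the columns `event_id`, `price`, and `created_at` are hypothetical, and each declared column must exist in the corresponding Hive Metastore table.

```sql
-- Hypothetical schema: each Doris type is chosen from the mapping table.
CREATE TABLE `t_hudi_typed` (
`event_id` bigint NOT NULL COMMENT "Hudi LONG",
`price` double NOT NULL COMMENT "Hudi DOUBLE",
`created_at` datetime NOT NULL COMMENT "Hudi TIMESTAMP, precision is reduced"
) ENGINE = HUDI
PROPERTIES (
"hudi.database" = "hudi_db_in_hive_metastore",
"hudi.table" = "hudi_table_in_hive_metastore",
"hudi.hive.metastore.uris" = "thrift://127.0.0.1:9083"
);
```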


### Query Usage

Once the Hudi external table has been created in Doris, it can be queried like a normal Doris OLAP table, except that Doris data models (rollup, pre-aggregation, materialized views, etc.) cannot be used.

```sql
select * from t_hudi where k1 > 1000 and k3 = 'term' or k4 like '%doris';
```
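
A joint query between a Doris table and the Hudi external table might look like the following sketch. The Doris table `example_db.orders` and its columns `user_id` and `amount` are hypothetical; `id` and `name` come from the schema example earlier in this document.

```sql
-- Join a (hypothetical) native Doris table with the Hudi external table.
SELECT h.name, SUM(o.amount) AS total_amount
FROM example_db.orders o
JOIN t_hudi h ON o.user_id = h.id
WHERE h.id > 1000
GROUP BY h.name;
```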

@@ -34,7 +34,7 @@ CREATE EXTERNAL TABLE

This statement is used to create an external table, see [CREATE TABLE](./CREATE-TABLE.md) for the specific syntax.

Which type of external table is mainly identified by the ENGINE type, currently MYSQL, BROKER, HIVE, ICEBERG are optional
Which type of external table is mainly identified by the ENGINE type, currently MYSQL, BROKER, HIVE, ICEBERG, HUDI are optional

1. If it is mysql, you need to provide the following information in properties:

@@ -111,6 +111,20 @@ Which type of external table is mainly identified by the ENGINE type, currently
hive.metastore.uris is the hive metastore service address;
catalog.type defaults to HIVE_CATALOG. Currently only HIVE_CATALOG is supported, more Iceberg catalog types will be supported in the future.

5. In case of hudi, you need to provide the following information in properties:

```sql
PROPERTIES (
"hudi.database" = "hudi_db_in_hive_metastore",
"hudi.table" = "hudi_table_in_hive_metastore",
"hudi.hive.metastore.uris" = "thrift://127.0.0.1:9083"
)
````

Where hudi.database is the corresponding database name in Hive Metastore;
hudi.table is the corresponding table name in Hive Metastore;
hudi.hive.metastore.uris is the Hive Metastore service address;

### Example

1. Create a MYSQL external table
@@ -225,6 +239,32 @@ Which type of external table is mainly identified by the ENGINE type, currently
);
````

5. Create a Hudi external table

create hudi table without schema (recommended)
```sql
CREATE TABLE example_db.t_hudi
ENGINE=HUDI
PROPERTIES (
"hudi.database" = "hudi_db_in_hive_metastore",
"hudi.table" = "hudi_table_in_hive_metastore",
"hudi.hive.metastore.uris" = "thrift://127.0.0.1:9083"
);
````

create hudi table with schema
```sql
CREATE TABLE example_db.t_hudi (
`id` int NOT NULL COMMENT "id number",
`name` varchar(10) NOT NULL COMMENT "user name"
)
ENGINE=HUDI
PROPERTIES (
"hudi.database" = "hudi_db_in_hive_metastore",
"hudi.table" = "hudi_table_in_hive_metastore",
"hudi.hive.metastore.uris" = "thrift://127.0.0.1:9083"
);
````

### Keywords

@@ -0,0 +1,134 @@
---
{
"title": "Doris Hudi external table",
"language": "zh-CN"
}
---

<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Hudi External Table of Doris

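Hudi External Table of Doris gives Doris direct access to Hudi external tables. External tables remove the need for cumbersome data imports and use Doris's own OLAP capabilities to analyze data in Hudi tables:

1. Support connecting Hudi data sources to Doris
2. Support joint queries between Doris tables and Hudi tables in Hive data sources for more complex analysis

This document describes how to use this feature and what to keep in mind.

## Glossary

### Doris Terms

* FE: Frontend, the front-end node of Doris, responsible for metadata management and request access
* BE: Backend, the backend node of Doris, responsible for query execution and data storage

## How to use

### Create a Hudi External Table in Doris

A Hudi external table can be created in Doris in the following two ways. You do not need to declare the table's column definitions when creating the external table; Doris can fetch the column information from HiveMetaStore at query time.

1. Create a separate external table to mount the Hudi table.
For the syntax, see [CREATE TABLE](../../sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-TABLE.md).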

```sql
-- Syntax
CREATE [EXTERNAL] TABLE table_name
[(column_definition1[, column_definition2, ...])]
ENGINE = HUDI
[COMMENT "comment"]
PROPERTIES (
"hudi.database" = "hudi_db_in_hive_metastore",
"hudi.table" = "hudi_table_in_hive_metastore",
"hudi.hive.metastore.uris" = "thrift://127.0.0.1:9083"
);
-- Example: mount hudi_table_in_hive_metastore under hudi_db_in_hive_metastore in HiveMetaStore, without specifying a schema.
CREATE TABLE `t_hudi`
ENGINE = HUDI
PROPERTIES (
"hudi.database" = "hudi_db_in_hive_metastore",
"hudi.table" = "hudi_table_in_hive_metastore",
"hudi.hive.metastore.uris" = "thrift://127.0.0.1:9083"
);
-- Example: specify a schema when mounting
CREATE TABLE `t_hudi` (
`id` int NOT NULL COMMENT "id number",
`name` varchar(10) NOT NULL COMMENT "user name"
) ENGINE = HUDI
PROPERTIES (
"hudi.database" = "hudi_db_in_hive_metastore",
"hudi.table" = "hudi_table_in_hive_metastore",
"hudi.hive.metastore.uris" = "thrift://127.0.0.1:9083"
);
```


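#### Parameter Description

- External table columns
    - Column names may be omitted; column information is then fetched from HiveMetaStore at query time. This is the recommended way to create the table.
    - When column names are specified, they must exist in the Hudi table.
- ENGINE must be specified as HUDI
- PROPERTIES:
    - `hudi.hive.metastore.uris`: Hive Metastore service address
    - `hudi.database`: the name of the Hudi database to mount
    - `hudi.table`: the name of the Hudi table to mount

### Show table structure

The table structure can be viewed with [SHOW CREATE TABLE](../../sql-manual/sql-reference/Show-Statements/SHOW-CREATE-TABLE.md).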

## Data Type Matching

The following table maps the supported Hudi column types to Doris types:

| Hudi | Doris | Description |
| :------: | :----: | :-------------------------------: |
| BOOLEAN | BOOLEAN | |
| INTEGER | INT | |
| LONG | BIGINT | |
| FLOAT | FLOAT | |
| DOUBLE | DOUBLE | |
| DATE | DATE | |
| TIMESTAMP | DATETIME | Converting Timestamp to Datetime loses precision |
| STRING | STRING | |
| UUID | VARCHAR | Use VARCHAR instead |
| DECIMAL | DECIMAL | |
| TIME | - | not supported |
| FIXED | - | not supported |
| BINARY | - | not supported |
| STRUCT | - | not supported |
| LIST | - | not supported |
| MAP | - | not supported |

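**Note:**
- The default supported Hudi version is currently 0.10.0; other versions have not been tested. More versions will be supported in the future.

### Query Usage

Once the Hudi external table has been created in Doris, it can be queried like a normal Doris OLAP table, except that Doris data models (rollup, pre-aggregation, materialized views, etc.) cannot be used.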

```sql
select * from t_hudi where k1 > 1000 and k3 ='term' or k4 like '%doris';
```
