2 changes: 2 additions & 0 deletions docs/lakehouse/file-analysis.md
@@ -32,6 +32,8 @@ For more usage methods, refer to the Table Value Function documentation:

* [HDFS](../sql-manual/sql-functions/table-valued-functions/hdfs.md): Supports file analysis on HDFS.

* [FILE](../sql-manual/sql-functions/table-valued-functions/file.md): Unified table function that supports reading S3/HDFS/Local files through a single interface. (Supported since version 3.1.0.)

## Basic Usage

Here we illustrate how to analyze files on object storage using the S3 Table Value Function as an example.
126 changes: 126 additions & 0 deletions docs/sql-manual/sql-functions/table-valued-functions/file.md
@@ -0,0 +1,126 @@
---
{
"title": "FILE",
"language": "en"
}
---

<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

## Description

The FILE table-valued function (TVF) is a wrapper around the [S3](./s3.md), [HDFS](./hdfs.md), and [LOCAL](./local.md) table functions, providing a unified interface for accessing file contents on different storage systems.

This function is supported since version 3.1.0.

## Syntax

```sql
FILE(
{StorageProperties},
{FileFormatProperties}
)
```

- `{StorageProperties}`

The StorageProperties section specifies the connection and authentication information for the storage system. For details, refer to the [Supported Storage Systems] section.

- `{FileFormatProperties}`

The FileFormatProperties section specifies file-format-related properties, such as the CSV column separator. For details, refer to the [Supported File Formats] section.
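As a sketch of how the two property groups combine in practice, the query below reads a headered CSV file from S3; the `csv_with_names` format and `column_separator` property are illustrative (check the file-formats documentation for the properties supported by your version), and the bucket, credentials, and endpoint are placeholders:

```sql
select * from file(
    -- StorageProperties: where the file lives and how to authenticate
    "fs.s3.support" = "true",
    "uri" = "s3://bucket/file.csv",
    "s3.access_key" = "ak",
    "s3.secret_key" = "sk",
    "s3.endpoint" = "endpoint",
    "s3.region" = "region",
    -- FileFormatProperties: how to parse the bytes
    "format" = "csv_with_names",
    "column_separator" = ","
);
```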

## Supported Storage Systems

* [HDFS](../../../lakehouse/storages/hdfs.md)

* [AWS S3](../../../lakehouse/storages/s3.md)

* [Google Cloud Storage](../../../lakehouse/storages/gcs.md)

* [Alibaba Cloud OSS](../../../lakehouse/storages/aliyun-oss.md)

* [Tencent Cloud COS](../../../lakehouse/storages/tencent-cos.md)

* [Huawei Cloud OBS](../../../lakehouse/storages/huawei-obs.md)

* [MinIO](../../../lakehouse/storages/minio.md)

## Supported File Formats

* [Parquet](../../../lakehouse/file-formats/parquet.md)

* [ORC](../../../lakehouse/file-formats/orc.md)

* [Text/CSV/JSON](../../../lakehouse/file-formats/text.md)
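For self-describing formats such as Parquet, the schema is carried in the file itself, so the format properties typically reduce to the format name; a sketch, with placeholder bucket, path, and credentials:

```sql
select * from file(
    "fs.s3.support" = "true",
    "uri" = "s3://bucket/path/data.parquet",
    "s3.access_key" = "ak",
    "s3.secret_key" = "sk",
    "s3.endpoint" = "endpoint",
    "s3.region" = "region",
    -- Parquet embeds its own schema; no separator or header options needed
    "format" = "parquet"
);
```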

## Examples

### Accessing S3 Storage

```sql
select * from file(
"fs.s3.support" = "true",
"uri" = "s3://bucket/file.csv",
"s3.access_key" = "ak",
"s3.secret_key" = "sk",
"s3.endpoint" = "endpoint",
"s3.region" = "region",
"format" = "csv"
);
```

### Accessing HDFS Storage

```sql
select * from file(
"fs.hdfs.support" = "true",
"uri" = "hdfs://path/to/file.csv",
"fs.defaultFS" = "hdfs://localhost:9000",
"hadoop.username" = "doris",
"format" = "csv"
);
```

### Accessing Local Storage

```sql
select * from file(
"fs.local.support" = "true",
"file_path" = "student.csv",
"backend_id" = "10003",
"format" = "csv"
);
```

### Using `desc function` to View the Table Schema

```sql
desc function file(
"fs.s3.support" = "true",
"uri" = "s3://bucket/file.csv",
"s3.access_key" = "ak",
"s3.secret_key" = "sk",
"s3.endpoint" = "endpoint",
"s3.region" = "region",
"format" = "csv"
);
```
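Because `file()` behaves like an ordinary relation, its output composes with the rest of SQL. The sketch below filters and aggregates directly over a CSV file; it assumes the columns are auto-named `c1`, `c2`, ... as with the other Doris TVFs (use `desc function` as shown above to confirm the actual column names):

```sql
select c1, count(*) as cnt
from file(
    "fs.s3.support" = "true",
    "uri" = "s3://bucket/file.csv",
    "s3.access_key" = "ak",
    "s3.secret_key" = "sk",
    "s3.endpoint" = "endpoint",
    "s3.region" = "region",
    "format" = "csv"
)
group by c1         -- group rows by the first CSV column
order by cnt desc;  -- most frequent values first
```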

@@ -32,6 +32,8 @@ under the License.

* [HDFS](../sql-manual/sql-functions/table-valued-functions/hdfs.md): Supports file analysis on HDFS.

* [FILE](../sql-manual/sql-functions/table-valued-functions/file.md): Unified table function that supports reading S3/HDFS/Local files through a single interface. (Supported since version 3.1.0.)

## Basic Usage

Here we illustrate how to analyze files on object storage using the S3 Table Value Function as an example.
@@ -0,0 +1,125 @@
---
{
"title": "FILE",
"language": "zh-CN"
}
---

<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

## Description

The FILE table-valued function (TVF) is a wrapper around the [S3](./s3.md), [HDFS](./hdfs.md), and [LOCAL](./local.md) table functions, providing a unified interface for accessing file contents on different storage systems.

This function is supported since version 3.1.0.

## Syntax

```sql
FILE(
  {StorageProperties},
  {FileFormatProperties}
)
```

- `{StorageProperties}`

The StorageProperties section specifies the connection and authentication information for the storage system. For details, refer to the [Supported Storage Systems] section.

- `{FileFormatProperties}`

The FileFormatProperties section specifies file-format-related properties, such as the CSV column separator. For details, refer to the [Supported File Formats] section.

## Supported Storage Systems

* [HDFS](../../../lakehouse/storages/hdfs.md)

* [AWS S3](../../../lakehouse/storages/s3.md)

* [Google Cloud Storage](../../../lakehouse/storages/gcs.md)

* [Alibaba Cloud OSS](../../../lakehouse/storages/aliyun-oss.md)

* [Tencent Cloud COS](../../../lakehouse/storages/tencent-cos.md)

* [Huawei Cloud OBS](../../../lakehouse/storages/huawei-obs.md)

* [MinIO](../../../lakehouse/storages/minio.md)

## Supported File Formats

* [Parquet](../../../lakehouse/file-formats/parquet.md)

* [ORC](../../../lakehouse/file-formats/orc.md)

* [Text/CSV/JSON](../../../lakehouse/file-formats/text.md)

## Examples

### Accessing S3 Storage

```sql
select * from file(
"fs.s3.support" = "true",
"uri" = "s3://bucket/file.csv",
"s3.access_key" = "ak",
"s3.secret_key" = "sk",
"s3.endpoint" = "endpoint",
"s3.region" = "region",
"format" = "csv"
);
```

### Accessing HDFS Storage

```sql
select * from file(
"fs.hdfs.support" = "true",
"uri" = "hdfs://path/to/file.csv",
"fs.defaultFS" = "hdfs://localhost:9000",
"hadoop.username" = "doris",
"format" = "csv"
);
```

### Accessing Local Storage

```sql
select * from file(
"fs.local.support" = "true",
"file_path" = "student.csv",
"backend_id" = "10003",
"format" = "csv"
);
```

### Using `desc function` to View the Table Schema

```sql
desc function file(
"fs.s3.support" = "true",
"uri" = "s3://bucket/file.csv",
"s3.access_key" = "ak",
"s3.secret_key" = "sk",
"s3.endpoint" = "endpoint",
"s3.region" = "region",
"format" = "csv"
);
```
3 changes: 2 additions & 1 deletion sidebars.json
@@ -1632,6 +1632,7 @@
"type": "category",
"label": "Table Valued Functions",
"items": [
"sql-manual/sql-functions/table-valued-functions/file",
"sql-manual/sql-functions/table-valued-functions/s3",
"sql-manual/sql-functions/table-valued-functions/hdfs",
"sql-manual/sql-functions/table-valued-functions/local",
@@ -2202,4 +2203,4 @@
]
}
]
}
}