Merged
15 changes: 14 additions & 1 deletion docs/table-design/row-store.md
@@ -24,12 +24,14 @@ specific language governing permissions and limitations
under the License.
-->

# Hybrid Storage
## Hybrid Storage

Doris defaults to columnar storage, where each column is stored contiguously. Columnar storage offers excellent performance for analytical scenarios (such as aggregation, filtering, sorting, etc.), as it only reads the necessary columns, reducing unnecessary IO. However, in point query scenarios (such as `SELECT *`), all columns need to be read, requiring an IO operation for each column, which can lead to IOPS becoming a bottleneck, especially for wide tables with many columns (e.g., hundreds of columns).

To address the IOPS bottleneck in point query scenarios, starting from version 2.0.0, Doris supports hybrid storage. When users create tables, they can specify whether to enable row storage. With row storage enabled, each row only requires one IO operation for point queries (such as `SELECT *`), significantly improving performance.

The principle of row storage is that an additional column is added during storage. This column concatenates all the columns of the corresponding row and stores them using a special binary format.

## Syntax

When creating a table, specify whether to enable row storage, which columns to enable row storage for, and the storage compression unit size page_size in the table's PROPERTIES.
@@ -77,4 +79,15 @@ PROPERTIES (
);
```

Query
```
SELECT key, v1, v3, v5, v7 FROM tbl_point_query WHERE key = 100;
```

For more information on point query usage, please refer to [High-Concurrent Point Query](../query-acceleration/high-concurrent-point-query).


## Notice

1. Enabling row storage will increase the storage space used. The increase in storage space is related to the data characteristics and is generally 2 to 10 times the size of the original table. The exact space usage needs to be tested with actual data.
2. The `page_size` of row storage also affects storage space. Tune it through the `row_store_page_size` table property described above.
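
Since the notice says the real overhead has to be measured with actual data, one way to do that is sketched below: create the same table twice, once without and once with row storage, load identical data into both, and compare the reported sizes. This is a minimal sketch under stated assumptions, not a procedure from the original document. The table names `tbl_without_row_store` and `tbl_with_row_store`, the column list, and the source table `sample_source` are all hypothetical, and the sizes reported by `SHOW DATA` may take some time to settle after loading.

```sql
-- Hypothetical baseline table: columnar storage only.
CREATE TABLE `tbl_without_row_store` (
    `key` int(11) NULL,
    `v1` varchar(30) NULL,
    `v2` varchar(30) NULL
) ENGINE=OLAP
UNIQUE KEY(`key`)
DISTRIBUTED BY HASH(`key`) BUCKETS 1
PROPERTIES (
    "enable_unique_key_merge_on_write" = "true",
    "light_schema_change" = "true"
);

-- Same hypothetical schema with row storage enabled.
CREATE TABLE `tbl_with_row_store` (
    `key` int(11) NULL,
    `v1` varchar(30) NULL,
    `v2` varchar(30) NULL
) ENGINE=OLAP
UNIQUE KEY(`key`)
DISTRIBUTED BY HASH(`key`) BUCKETS 1
PROPERTIES (
    "enable_unique_key_merge_on_write" = "true",
    "light_schema_change" = "true",
    "store_row_column" = "true"
);

-- Load the same representative data into both tables
-- (sample_source is an assumed, pre-existing table).
INSERT INTO tbl_without_row_store SELECT `key`, v1, v2 FROM sample_source;
INSERT INTO tbl_with_row_store SELECT `key`, v1, v2 FROM sample_source;

-- Compare the storage size reported for each table.
SHOW DATA FROM tbl_without_row_store;
SHOW DATA FROM tbl_with_row_store;
```

The ratio between the two reported sizes gives an estimate of the row-store overhead for that particular data set. Repeating the comparison with a different `row_store_page_size` shows how the page size shifts it.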
@@ -24,12 +24,13 @@ specific language governing permissions and limitations
under the License.
-->


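## Introduction to Hybrid Row-Column Storage

Doris defaults to columnar storage, where each column is stored contiguously. This performs very well in analytical scenarios (such as aggregation, filtering, and sorting) because only the required columns are read, which reduces unnecessary IO. In point query scenarios (such as `SELECT *`), however, all columns must be read and each column requires one IO, so IOPS becomes the bottleneck, especially for wide tables with many columns (for example, hundreds of columns).

To address the IOPS bottleneck in point query scenarios, Doris supports hybrid row-column storage starting from version 2.0.0. When row storage is enabled at table creation, a point query (such as `SELECT *`) needs only one IO per row, improving performance by orders of magnitude
To address the IOPS bottleneck in point query scenarios, Doris supports hybrid row-column storage starting from version 2.0.0. When row storage is enabled at table creation, a point query (such as `SELECT *`) needs only one IO per row, improving performance by orders of magnitude on wide tables with many columns

Row storage works by adding an extra column at storage time. This column concatenates all columns of the corresponding row and stores them in a special binary format.

## Syntax

@@ -53,7 +54,7 @@ Doris defaults to columnar storage, where each column is stored contiguously; in analytical scenarios (such as
A page is the smallest unit of storage reads and writes, and page_size is the size of a row-store page, which means that reading a single row also incurs one page of IO. The larger the value, the better the compression and the lower the storage usage, but the higher the IO cost and the lower the performance of point queries (because one IO reads at least one page). Conversely, the smaller the value, the higher the storage usage and the better the point query performance. The default of 16KB is a balanced choice in most cases. If you favor query performance, configure a smaller value such as 4KB or even lower; if you favor storage space, configure a larger value such as 64KB or even higher.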
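
As a hedged sketch of the storage-leaning end of this trade-off, the table below sets the row-store page size to 64KB. The value is written in bytes (64 × 1024 = 65536), following the document's convention of writing 16KB as `16384` and 4KB as `4096`. The table name `tbl_storage_first` and its columns are hypothetical and not part of the original document.

```sql
-- Hypothetical table that trades some point query speed for lower storage usage.
CREATE TABLE `tbl_storage_first` (
    `key` int(11) NULL,
    `v1` varchar(30) NULL,
    `v2` varchar(30) NULL
) ENGINE=OLAP
UNIQUE KEY(`key`)
DISTRIBUTED BY HASH(`key`) BUCKETS 1
PROPERTIES (
    "enable_unique_key_merge_on_write" = "true",
    "light_schema_change" = "true",
    "store_row_column" = "true",
    -- 64KB page: better compression, but each point read fetches at least 64KB.
    "row_store_page_size" = "65536"
);
```

Going the other way, `"row_store_page_size" = "4096"` favors point query latency at the cost of more storage, which is the setting used in the document's own example.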


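## Usage Instance
## Usage Example

The example below creates an 8-column table in which the 5 columns "key,v1,v3,v5,v7" have row storage enabled, with page_size set to 4KB for high-concurrency point query performance.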

@@ -79,4 +80,15 @@ PROPERTIES (
);
```

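Query
```
SELECT key, v1, v3, v5, v7 FROM tbl_point_query WHERE key = 100;
```

For more on point queries, see [High-Concurrent Point Query](../query-acceleration/high-concurrent-point-query).


## Notice

1. Enabling row storage increases the storage space used. The increase depends on the data characteristics and is generally 2 to 10 times the size of the original table. The exact space usage needs to be tested with actual data.
2. The page_size of row storage also affects storage space. Tune it as described for the `row_store_page_size` table property above.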
@@ -24,21 +24,23 @@ specific language governing permissions and limitations
under the License.
-->


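## Introduction to Hybrid Row-Column Storage

Doris defaults to columnar storage, where each column is stored contiguously. This performs very well in analytical scenarios (such as aggregation, filtering, and sorting) because only the required columns are read, which reduces unnecessary IO. In point query scenarios (such as `SELECT *`), however, all columns must be read and each column requires one IO, so IOPS becomes the bottleneck, especially for wide tables with many columns (for example, hundreds of columns).

To address the IOPS bottleneck in point query scenarios, Doris supports hybrid row-column storage starting from version 2.0.0. When row storage is enabled at table creation, a point query (such as `SELECT *`) needs only one IO per row, improving performance by orders of magnitude
To address the IOPS bottleneck in point query scenarios, Doris supports hybrid row-column storage starting from version 2.0.0. When row storage is enabled at table creation, a point query (such as `SELECT *`) needs only one IO per row, improving performance by orders of magnitude on wide tables with many columns

Row storage works by adding an extra column at storage time. This column concatenates all columns of the corresponding row and stores them in a special binary format.

## Syntax

When creating a table, specify in the table's PROPERTIES whether to enable row storage; the default is false (not enabled)
When creating a table, specify in the table's PROPERTIES whether to enable row storage
```
"store_row_column" = "true"
```

## Usage Instance

## Usage Example

The example below creates an 8-column table with row storage enabled.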

@@ -59,8 +61,19 @@ DISTRIBUTED BY HASH(`key`) BUCKETS 1
PROPERTIES (
"enable_unique_key_merge_on_write" = "true",
"light_schema_change" = "true",
"store_row_column" = "true"
"store_row_column" = "true",
"row_store_page_size" = "4096"
);
```

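For more on point queries, see [High-Concurrent Point Query](../query/high-concurrent-point-query).
Query
```
SELECT * FROM tbl_point_query WHERE key = 100;
```

For more on point queries, see [High-Concurrent Point Query](../query-acceleration/high-concurrent-point-query).


## Notice

1. Enabling row storage increases the storage space used. The increase depends on the data characteristics and is generally 2 to 10 times the size of the original table. The exact space usage needs to be tested with actual data.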
@@ -24,38 +24,34 @@ specific language governing permissions and limitations
under the License.
-->


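## Introduction to Hybrid Row-Column Storage

Doris defaults to columnar storage, where each column is stored contiguously. This performs very well in analytical scenarios (such as aggregation, filtering, and sorting) because only the required columns are read, which reduces unnecessary IO. In point query scenarios (such as `SELECT *`), however, all columns must be read and each column requires one IO, so IOPS becomes the bottleneck, especially for wide tables with many columns (for example, hundreds of columns).

To address the IOPS bottleneck in point query scenarios, Doris supports hybrid row-column storage starting from version 2.0.0. When row storage is enabled at table creation, a point query (such as `SELECT *`) needs only one IO per row, improving performance by orders of magnitude
To address the IOPS bottleneck in point query scenarios, Doris supports hybrid row-column storage starting from version 2.0.0. When row storage is enabled at table creation, a point query (such as `SELECT *`) needs only one IO per row, improving performance by orders of magnitude on wide tables with many columns

Row storage works by adding an extra column at storage time. This column concatenates all columns of the corresponding row and stores them in a special binary format.

## Syntax

When creating a table, specify in the table's PROPERTIES whether to enable row storage, which columns to enable row storage for, and the row-store compression unit size page_size.
When creating a table, specify in the table's PROPERTIES whether to enable row storage and the row-store compression unit size page_size.

1. Whether to enable row storage: defaults to false (not enabled)
```
"store_row_column" = "true"
```

2. Which columns to enable row storage for: if `"store_row_column" = "true"`, row storage is enabled for all columns by default. To enable row storage for only some columns, set the row_store_columns parameter, formatted as a comma-separated list of column names
```
"row_store_columns" = "column1,column2,column3"
```

3. Row storage page_size: defaults to 16KB.
1. Row storage page_size: defaults to 16KB.
```
"row_store_page_size" = "16384"
```

A page is the smallest unit of storage reads and writes, and page_size is the size of a row-store page, which means that reading a single row also incurs one page of IO. The larger the value, the better the compression and the lower the storage usage, but the higher the IO cost and the lower the performance of point queries (because one IO reads at least one page). Conversely, the smaller the value, the higher the storage usage and the better the point query performance. The default of 16KB is a balanced choice in most cases. If you favor query performance, configure a smaller value such as 4KB or even lower; if you favor storage space, configure a larger value such as 64KB or even higher.


## Usage Instance
## Usage Example

The example below creates an 8-column table in which the 5 columns "key,v1,v3,v5,v7" have row storage enabled, with page_size set to 4KB for high-concurrency point query performance.
The example below creates an 8-column table with row storage enabled, with page_size set to 4KB for high-concurrency point query performance.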

```
CREATE TABLE `tbl_point_query` (
@@ -74,9 +70,20 @@ DISTRIBUTED BY HASH(`key`) BUCKETS 1
PROPERTIES (
"enable_unique_key_merge_on_write" = "true",
"light_schema_change" = "true",
"row_store_columns" = "key,v1,v3,v5,v7",
"store_row_column" = "true",
"row_store_page_size" = "4096"
);
```

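Query
```
SELECT * FROM tbl_point_query WHERE key = 100;
```

For more on point queries, see [High-Concurrent Point Query](../query-acceleration/high-concurrent-point-query).


## Notice

1. Enabling row storage increases the storage space used. The increase depends on the data characteristics and is generally 2 to 10 times the size of the original table. The exact space usage needs to be tested with actual data.
2. The page_size of row storage also affects storage space. Tune it as described for the `row_store_page_size` table property above.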
@@ -24,12 +24,13 @@ specific language governing permissions and limitations
under the License.
-->


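## Introduction to Hybrid Row-Column Storage

Doris defaults to columnar storage, where each column is stored contiguously. This performs very well in analytical scenarios (such as aggregation, filtering, and sorting) because only the required columns are read, which reduces unnecessary IO. In point query scenarios (such as `SELECT *`), however, all columns must be read and each column requires one IO, so IOPS becomes the bottleneck, especially for wide tables with many columns (for example, hundreds of columns).

To address the IOPS bottleneck in point query scenarios, Doris supports hybrid row-column storage starting from version 2.0.0. When row storage is enabled at table creation, a point query (such as `SELECT *`) needs only one IO per row, improving performance by orders of magnitude
To address the IOPS bottleneck in point query scenarios, Doris supports hybrid row-column storage starting from version 2.0.0. When row storage is enabled at table creation, a point query (such as `SELECT *`) needs only one IO per row, improving performance by orders of magnitude on wide tables with many columns

Row storage works by adding an extra column at storage time. This column concatenates all columns of the corresponding row and stores them in a special binary format.

## Syntax

@@ -53,7 +54,7 @@ Doris defaults to columnar storage, where each column is stored contiguously; in analytical scenarios (such as
A page is the smallest unit of storage reads and writes, and page_size is the size of a row-store page, which means that reading a single row also incurs one page of IO. The larger the value, the better the compression and the lower the storage usage, but the higher the IO cost and the lower the performance of point queries (because one IO reads at least one page). Conversely, the smaller the value, the higher the storage usage and the better the point query performance. The default of 16KB is a balanced choice in most cases. If you favor query performance, configure a smaller value such as 4KB or even lower; if you favor storage space, configure a larger value such as 64KB or even higher.


## Usage Instance
## Usage Example

The example below creates an 8-column table in which the 5 columns "key,v1,v3,v5,v7" have row storage enabled, with page_size set to 4KB for high-concurrency point query performance.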

@@ -79,4 +80,15 @@ PROPERTIES (
);
```

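Query
```
SELECT key, v1, v3, v5, v7 FROM tbl_point_query WHERE key = 100;
```

For more on point queries, see [High-Concurrent Point Query](../query-acceleration/high-concurrent-point-query).


## Notice

1. Enabling row storage increases the storage space used. The increase depends on the data characteristics and is generally 2 to 10 times the size of the original table. The exact space usage needs to be tested with actual data.
2. The page_size of row storage also affects storage space. Tune it as described for the `row_store_page_size` table property above.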
21 changes: 16 additions & 5 deletions versioned_docs/version-2.0/table-design/row-store.md
@@ -24,23 +24,24 @@ specific language governing permissions and limitations
under the License.
-->

# Hybrid Storage
## Hybrid Storage

Doris defaults to columnar storage, where each column is stored contiguously. Columnar storage offers excellent performance for analytical scenarios (such as aggregation, filtering, sorting, etc.), as it only reads the necessary columns, reducing unnecessary IO. However, in point query scenarios (such as `SELECT *`), all columns need to be read, requiring an IO operation for each column, which can lead to IOPS becoming a bottleneck, especially for wide tables with many columns (e.g., hundreds of columns).

To address the IOPS bottleneck in point query scenarios, starting from version 2.0.0, Doris supports hybrid storage. When users create tables, they can specify whether to enable row storage. With row storage enabled, each row only requires one IO operation for point queries (such as `SELECT *`), significantly improving performance.

The principle of row storage is that an additional column is added during storage. This column concatenates all the columns of the corresponding row and stores them using a special binary format.

## Syntax

When creating a table, specify whether to enable row storage, defaults to false (not enabled).
When creating a table, specify whether to enable row storage in the table's PROPERTIES.
```
"store_row_column" = "true"
```


## Example

The example below creates an 8-column table, where row storage is enabled.
The example below creates an 8-column table with row storage enabled.

```
CREATE TABLE `tbl_point_query` (
@@ -63,4 +64,14 @@ PROPERTIES (
);
```

For more information on point query usage, please refer to [High-Concurrent Point Query](../query/high-concurrent-point-query).
Query
```
SELECT * FROM tbl_point_query WHERE key = 100;
```

For more information on point query usage, please refer to [High-Concurrent Point Query](../query-acceleration/high-concurrent-point-query).


## Notice

Enabling row storage will increase the storage space used. The increase in storage space is related to the data characteristics and is generally 2 to 10 times the size of the original table. The exact space usage needs to be tested with actual data.
28 changes: 18 additions & 10 deletions versioned_docs/version-2.1/table-design/row-store.md
@@ -24,27 +24,24 @@ specific language governing permissions and limitations
under the License.
-->

# Hybrid Storage
## Hybrid Storage

Doris defaults to columnar storage, where each column is stored contiguously. Columnar storage offers excellent performance for analytical scenarios (such as aggregation, filtering, sorting, etc.), as it only reads the necessary columns, reducing unnecessary IO. However, in point query scenarios (such as `SELECT *`), all columns need to be read, requiring an IO operation for each column, which can lead to IOPS becoming a bottleneck, especially for wide tables with many columns (e.g., hundreds of columns).

To address the IOPS bottleneck in point query scenarios, starting from version 2.0.0, Doris supports hybrid storage. When users create tables, they can specify whether to enable row storage. With row storage enabled, each row only requires one IO operation for point queries (such as `SELECT *`), significantly improving performance.

The principle of row storage is that an additional column is added during storage. This column concatenates all the columns of the corresponding row and stores them using a special binary format.

## Syntax

When creating a table, specify whether to enable row storage, which columns to enable row storage for, and the storage compression unit size page_size in the table's PROPERTIES.
When creating a table, specify whether to enable row storage, and the storage compression unit size page_size in the table's PROPERTIES.

1. Whether to enable row storage: defaults to false (not enabled).
```
"store_row_column" = "true"
```

2. Which columns to enable row storage for:if `"store_row_column" = "true"`, all columns are enabled by default. If you need to specify that only some columns are enabled for row storage, set the row_store_columns parameter, formatted as a comma-separated list of column names.
```
"row_store_columns" = "column1,column2,column3"
```

3. Row storage page_size: defaults to 16KB.
1. Row storage page_size: defaults to 16KB.
```
"row_store_page_size" = "16384"
```
@@ -53,7 +50,7 @@ The page is the smallest unit of storage read/write operations, and page_size is

## Example

The example below creates an 8-column table, where "key,v1,v3,v5,v7" are the 5 columns enabled for row storage. To optimize for high-concurrency point query performance, the page_size is configured to 4KB.
The example below creates an 8-column table with row storage enabled. To optimize for high-concurrency point query performance, the page_size is configured to 4KB.

```
CREATE TABLE `tbl_point_query` (
@@ -72,9 +69,20 @@ DISTRIBUTED BY HASH(`key`) BUCKETS 1
PROPERTIES (
"enable_unique_key_merge_on_write" = "true",
"light_schema_change" = "true",
"row_store_columns" = "key,v1,v3,v5,v7",
"store_row_column" = "true",
"row_store_page_size" = "4096"
);
```

Query
```
SELECT * FROM tbl_point_query WHERE key = 100;
```

For more information on point query usage, please refer to [High-Concurrent Point Query](../query-acceleration/high-concurrent-point-query).


## Notice

1. Enabling row storage will increase the storage space used. The increase in storage space is related to the data characteristics and is generally 2 to 10 times the size of the original table. The exact space usage needs to be tested with actual data.
2. The `page_size` of row storage also affects storage space. Tune it through the `row_store_page_size` table property described above.