
Release note 0.15.0 #6806

Closed
morningman opened this issue Oct 10, 2021 · 13 comments

@morningman
Contributor

morningman commented Oct 10, 2021

Highlight

Multi-tenant & Resource isolation

Users can now divide the BE nodes of a Doris cluster into multiple resource groups by means of resource tags. This enables unified management of online and offline workloads with node-level resource isolation.
At the same time, the resource consumption of a single query can be controlled by limiting its CPU usage, memory usage, and complexity. This reduces resource contention between different queries.
For details, please refer to the usage documentation.

#6159 #6203 #6443
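The tag-based grouping described above can be pictured with a small model: BE nodes are grouped by tag, and a user's queries may only be scheduled on nodes whose tag the user is allowed to use. This is a sketch under stated assumptions. The host names, the tag values, and the one-tag-per-node model are all illustrative. They are not Doris's internal data structures or its configuration syntax.

```python
from collections import defaultdict


def group_backends_by_tag(backends):
    """Map each resource tag to the sorted list of BE hosts carrying it.

    Assumption for this sketch: every BE node carries exactly one tag.
    """
    groups = defaultdict(list)
    for host, tag in backends.items():
        groups[tag].append(host)
    return {tag: sorted(hosts) for tag, hosts in groups.items()}


def eligible_backends(groups, user_tags):
    """Return the BE hosts on which a user's queries may be scheduled."""
    hosts = set()
    for tag in user_tags:
        hosts.update(groups.get(tag, []))
    return sorted(hosts)


# Hypothetical cluster: two nodes for online serving, two for offline batch jobs.
backends = {"be1": "online", "be2": "online", "be3": "offline", "be4": "offline"}
groups = group_backends_by_tag(backends)
print(eligible_backends(groups, ["online"]))  # ['be1', 'be2']
```

Because an online user's queries never land on be3 or be4 in this model, a heavy offline job cannot take CPU or memory away from online traffic. That is the node-level isolation the note refers to.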

Performance optimization

Added the Runtime Filter (#6121) and Join Reorder (#6226) features.

Runtime Filter uses the join key values of the right table in a join operator to filter the data of the left table. This can significantly improve query efficiency in most join scenarios.
For example, on the Star Schema Benchmark (a simplified test set derived from TPC-H), it delivers a 2x to 10x performance improvement.
For details, please refer to the usage documentation.
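The idea behind Runtime Filter can be illustrated on in-memory rows. The join keys of the right (build) side are collected first and are then used to discard left (probe) rows before the join. This sketch uses an exact Python set as the filter. A real engine may instead use structures such as a Bloom filter, an IN list, or a min/max range. The table and column names here are hypothetical.

```python
def join_with_runtime_filter(left_rows, right_rows, left_key, right_key):
    """Inner hash join whose probe side is pre-filtered by the build side's keys.

    Returns (rows that survived the filter, joined rows).
    """
    # Build phase: hash the right table on its join key.
    build_table = {}
    for row in right_rows:
        build_table.setdefault(row[right_key], []).append(row)

    # The runtime filter is the set of build-side keys. In an engine it would be
    # pushed down to the left table's scan so non-matching rows never reach the join.
    runtime_filter = set(build_table)
    probe_rows = [row for row in left_rows if row[left_key] in runtime_filter]

    # Probe phase: every surviving row is guaranteed to have at least one match.
    joined = [
        {**left, **right}
        for left in probe_rows
        for right in build_table[left[left_key]]
    ]
    return probe_rows, joined


# Hypothetical star-schema fragment: a fact table joined to a filtered dimension.
lineorder = [{"lo_custkey": k, "lo_revenue": 10 * k} for k in range(1, 7)]
customer_asia = [{"c_custkey": 2, "c_region": "ASIA"}, {"c_custkey": 5, "c_region": "ASIA"}]
probe, result = join_with_runtime_filter(lineorder, customer_asia, "lo_custkey", "c_custkey")
print(len(lineorder), "->", len(probe))  # 6 -> 2
```

The benefit grows as the right side becomes more selective. When a filtered dimension table keeps only a few keys, most fact-table rows are dropped at scan time instead of being transferred and probed.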

Join Reorder uses a cost model to automatically adjust the order of joins in a SQL statement to achieve the best join efficiency.
It can be enabled with the session variable: set enable_cost_based_join_reorder=true.
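To make "cost model" concrete, here is a minimal greedy join-ordering sketch. It is a generic textbook heuristic and is not claimed to be the algorithm Doris implements. It starts from the smallest table. It then repeatedly adds the connected table whose join gives the smallest estimated intermediate result, where estimated rows = current rows * table rows * join selectivity. The table sizes and selectivities are made-up inputs.

```python
def _selectivity_to(table, joined, selectivity):
    """Most selective join predicate between `table` and any joined table, or None."""
    preds = [selectivity[frozenset((table, j))]
             for j in joined if frozenset((table, j)) in selectivity]
    return min(preds) if preds else None


def greedy_join_order(cardinality, selectivity):
    """Order tables to keep estimated intermediate results small.

    cardinality: table name -> estimated row count
    selectivity: frozenset({t1, t2}) -> selectivity of the join predicate between t1 and t2
    Returns (join order, cost), where cost is the sum of intermediate row estimates.
    """
    remaining = set(cardinality)
    first = min(remaining, key=lambda t: (cardinality[t], t))
    remaining.remove(first)
    order, rows, cost = [first], float(cardinality[first]), 0.0

    while remaining:
        sels = {t: _selectivity_to(t, order, selectivity) for t in remaining}
        # Prefer tables linked by a join predicate; cross join only if none is linked.
        candidates = [t for t in remaining if sels[t] is not None] or list(remaining)
        # Estimated output rows = current rows * table rows * selectivity (1.0 for a cross join).
        estimates = {t: rows * cardinality[t] * (1.0 if sels[t] is None else sels[t])
                     for t in candidates}
        nxt = min(candidates, key=lambda t: (estimates[t], t))
        rows = estimates[nxt]
        cost += rows
        order.append(nxt)
        remaining.remove(nxt)
    return order, cost


# Hypothetical chain query: c - b - a - d.
cardinality = {"a": 1000, "b": 100, "c": 10, "d": 50}
selectivity = {frozenset(("a", "b")): 0.01, frozenset(("b", "c")): 0.1, frozenset(("a", "d")): 0.2}
order, cost = greedy_join_order(cardinality, selectivity)
print(order)  # ['c', 'b', 'a', 'd']
```

Real cost-based optimizers also weigh factors that this sketch ignores, such as how data is distributed across nodes and which join strategy is used.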

New features

  1. Support connecting directly to Canal Server to synchronize MySQL binlog data. For details, please refer to the usage documentation. ([New Feature] Support synchronizing MySQL binlog in real time [stage 1] #6289)
  2. Support the String column type, with a length of up to 2GB (Support long text type STRING with a maximum length of 2GB #6391)
  3. Support List partitioning, which creates partitions for enumerated values. ([Feature] add list partition support #5529)
  4. Support transactional insert statements. Data can be imported in batches with begin; insert ...; insert ...; ... commit; (Add transaction for the operation of insert #6244 #6245)
  5. Support the update statement on the Unique Key model: update ... set ... where statements can be executed on Unique Key model tables ([Update] Support update syntax #6230)
  6. Support SQL block rules. The execution of specific SQL statements can be blocked through regular expression matching, hash value matching, etc. (ADD: support sql block rule #6192)
  7. Support LDAP login authentication ([Feature][LDAP] Add LDAP authentication login and LDAP group authorization support. #6333)
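SQL block rules (item 6 above) can be pictured as a check that runs before a statement executes. A statement is blocked if it matches any enabled rule's regular expression, or if its hash equals a rule's stored hash. The rule structure in this sketch is hypothetical. It assumes the hash is an MD5 hex digest of the exact SQL text, and it uses a case-insensitive re.search (substring match). Doris's actual matching semantics may differ.

```python
import hashlib
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlockRule:
    """Hypothetical block rule: match by regular expression and/or by SQL hash."""
    name: str
    pattern: Optional[str] = None   # regular expression searched in the SQL text
    sql_hash: Optional[str] = None  # assumed: MD5 hex digest of the exact SQL text
    enabled: bool = True


def find_blocking_rule(sql, rules):
    """Return the name of the first enabled rule that blocks `sql`, or None."""
    digest = hashlib.md5(sql.encode("utf-8")).hexdigest()
    for rule in rules:
        if not rule.enabled:
            continue
        # Case-insensitive substring search, so "SELECT" and "select" both match.
        if rule.pattern is not None and re.search(rule.pattern, sql, re.IGNORECASE):
            return rule.name
        # Hash matching only blocks the byte-for-byte identical statement.
        if rule.sql_hash is not None and rule.sql_hash == digest:
            return rule.name
    return None


rules = [
    BlockRule("no_full_scan_orders", pattern=r"select\s+\*\s+from\s+orders"),
    BlockRule("no_huge_count", sql_hash=hashlib.md5(b"select count(*) from huge_table").hexdigest()),
]
print(find_blocking_rule("SELECT * FROM orders", rules))   # no_full_scan_orders
print(find_blocking_rule("select id from orders", rules))  # None
```

Regular expressions catch whole families of statements. A hash blocks only one exact statement, which is useful for a specific query that is known to be harmful.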

Extensions

  1. Support Flink-Doris-Connector ([Feature] Flink Doris Connector (#5372) #5375)

    For details, please refer to the usage documentation.

  2. Support the DataX doriswriter plug-in (feat: Implementation Datax doriswriter plugin #6107)

    For details, please refer to the usage documentation.

  3. Spark-Doris-Connector supports writing data to Doris. ([feature]:support spark connector sink data to doris #6256)

Optimization

Query

During the SQL query planning stage, all constant expressions are now evaluated using BE's function computation capabilities (#6233).
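Constant folding replaces every sub-expression whose inputs are all literals with its computed value. The value is then computed once at planning time rather than once per row. This sketch folds a small expression tree bottom-up. The FUNCTIONS table is a hypothetical stand-in for BE's built-in functions. The request Doris sends to BE during planning is not modelled.

```python
import operator
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Const:
    value: object


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Call:
    fn: str
    args: Tuple[object, ...]


# Hypothetical registry standing in for BE's function implementations.
FUNCTIONS = {
    "add": operator.add,
    "mul": operator.mul,
    "gt": operator.gt,
    "concat": lambda a, b: f"{a}{b}",
    "upper": str.upper,
}


def fold(expr):
    """Bottom-up constant folding: evaluate every call whose arguments are all constants."""
    if not isinstance(expr, Call):
        return expr  # literals and column references are left unchanged
    args = tuple(fold(arg) for arg in expr.args)
    if all(isinstance(arg, Const) for arg in args):
        return Const(FUNCTIONS[expr.fn](*(arg.value for arg in args)))
    return Call(expr.fn, args)


# price > 2 * 3 + 1 becomes price > 7; the comparison stays because it reads a column.
expr = Call("gt", (Column("price"), Call("add", (Call("mul", (Const(2), Const(3))), Const(1)))))
print(fold(expr))  # Call(fn='gt', args=(Column(name='price'), Const(value=7)))
```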

Load

  1. Support specifying multi-byte line and column separators, or invisible-character separators, when importing text files ([Load] Support multi bytes LineDelimiter and ColumnSeparator #5462 support unescape some invisible char in separator #5524)
  2. Support importing compressed files through Stream Load ([Load]Support compressed csv file in stream load #5463)
  3. Stream Load supports importing JSON data in multi-line format (#5774)
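Multi-byte and invisible separators change how input is parsed. Lines and columns are split on arbitrary strings rather than single characters. A separator written in hex notation, such as \x01, is first decoded to the raw byte. The "\x" prefix convention used in this sketch is an assumption; check the load documentation for the exact syntax Doris accepts.

```python
def parse_separator(spec):
    """Decode a separator spec; '\\x' followed by hex digits denotes raw bytes (assumed syntax)."""
    if spec.startswith("\\x"):
        # latin-1 maps each byte 0-255 to the code point with the same value.
        return bytes.fromhex(spec[2:]).decode("latin-1")
    return spec


def split_records(data, line_delimiter, column_separator):
    """Split text into rows and columns using (possibly multi-character) delimiters."""
    lines = data.split(line_delimiter)
    if lines and lines[-1] == "":
        lines.pop()  # a trailing line delimiter does not start a new row
    return [line.split(column_separator) for line in lines]


# Multi-byte separators: "||" between columns, "@@\n" between lines.
print(split_records("1||alice@@\n2||bob@@\n", "@@\n", "||"))
# [['1', 'alice'], ['2', 'bob']]

# Invisible separator given in hex notation.
sep = parse_separator("\\x01")
print(split_records("1\x01alice\n2\x01bob", "\n", sep))
# [['1', 'alice'], ['2', 'bob']]
```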

Export

  1. The Export statement supports where filter conditions, multi-byte line and column separators in exported files, and exporting to local files ([Export] Expand function of export stmt #5445)
  2. The Export statement supports exporting only specified columns. (Data export function, add export to specify certain columns #5689)
  3. Support exporting a query result set to local disk through the outfile statement, and writing a success marker file after the export completes ([Outfile] Support exporting query result to local disk #5489)
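The three export items combine into one pipeline. Rows are filtered, reduced to the selected columns, and written with configurable separators. A marker file is created only once the data is complete. This sketch does that for in-memory rows. The file names data_0.csv and _SUCCESS are hypothetical and are not the names Doris uses.

```python
import os
import tempfile


def export_rows(rows, columns, where, out_dir,
                column_separator="\t", line_delimiter="\n", success_marker="_SUCCESS"):
    """Write rows matching `where`, projected to `columns`, then create a success marker.

    Returns (data file path, number of exported rows).
    """
    data_path = os.path.join(out_dir, "data_0.csv")
    exported = 0
    with open(data_path, "w", encoding="utf-8", newline="") as f:
        for row in rows:
            if where(row):
                f.write(column_separator.join(str(row[c]) for c in columns) + line_delimiter)
                exported += 1
    # The marker is written only after the data file has been closed, so a consumer
    # that sees the marker can assume the export finished.
    with open(os.path.join(out_dir, success_marker), "w", encoding="utf-8"):
        pass
    return data_path, exported


rows = [
    {"id": 1, "city": "Beijing", "amount": 30},
    {"id": 2, "city": "Shanghai", "amount": 5},
    {"id": 3, "city": "Shenzhen", "amount": 12},
]
with tempfile.TemporaryDirectory() as out_dir:
    path, n = export_rows(rows, ["id", "amount"], lambda r: r["amount"] > 10,
                          out_dir, column_separator="||")
    with open(path, encoding="utf-8") as f:
        print(n, repr(f.read()))  # 2 '1||30\n3||12\n'
```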

Utilities

  1. Dynamic partitioning supports creating and retaining specified historical partitions, and configuring automatic hot/cold data migration. ([Feature] Support create history dynamic partition #5703 [DynamicPartition] Support specifying hot data partition #5877 [Dynamic Partition] reserve specific history periods by dynamic partition. #6554)
  2. Support displaying query and load plans and profiles as a visual tree on the command line. ([Profile] Visualize the query plan and query profile #5475 [Profile] Support show load profile for broker load job #6214)
  3. Support recording and viewing Stream Load operation logs. ([Audit][Stream Load] Support audit function for stream load #5452 [Feature][Stream Load] Add "show stream load" to show stream load record #5488)
  4. When consuming Kafka data through Routine Load, a point in time to start consuming from can be specified. ([Feature][RoutineLoad] Support for consuming kafka from the point of time #5832)
  5. Support viewing the creation statement of a routine load job through show create routine load. ([Feature] ADD: show create routine load #6110)
  6. Support pausing or resuming all routine load jobs at once with the pause/resume all routine load commands. ([RoutineLoad] Support pause or resume all routine load jobs #6394)
  7. Support modifying the Kafka broker list and topic of a routine load job through the alter routine load statement ([RoutineLoad] Support alter broker list and topic for kafka routine load #6335)
  8. Support create table as select. (Support create table as select #6102)
  9. Support modifying column comments and table comments through the alter table command. ([Alter] Support alter table and column's comment #6387)
  10. show table status now includes the table creation time and data update time (Add update time to show table status #6117)
  11. Support the show data skew command to view the data distribution of a table and troubleshoot data skew problems. ([Feature] Support SHOW DATA SKEW stmt #6219)
  12. Support the show/clean trash commands to view the disk usage of the BE file trash and proactively clear it ([Feature] Support for querying the trash used capacity #6247 [Feature] Support for cleaning the trash actively #6323)
  13. Support the show view statement to show which views reference a given table. ([Feature] Support show view statement for table #5813)
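Data skew means that some buckets (tablets) hold far more data than others. The sketch below computes each bucket's share of the rows. It also computes a max/mean ratio, a common skew indicator. That ratio is an illustrative metric chosen here and is not necessarily a column that show data skew prints.

```python
def data_skew_report(rows_per_bucket):
    """Return {bucket: (row_count, percent_of_total)} ordered by bucket id."""
    total = sum(rows_per_bucket.values())
    return {
        bucket: (rows, 100.0 * rows / total if total else 0.0)
        for bucket, rows in sorted(rows_per_bucket.items())
    }


def skew_ratio(rows_per_bucket):
    """Largest bucket divided by the mean bucket size; 1.0 means perfectly even."""
    counts = list(rows_per_bucket.values())
    mean = sum(counts) / len(counts)
    return max(counts) / mean if mean else 0.0


# Hypothetical table with 4 buckets, one of which received most rows
# (e.g. because the distribution column has a dominant value).
buckets = {0: 100, 1: 100, 2: 100, 3: 700}
for bucket, (rows, pct) in data_skew_report(buckets).items():
    print(f"bucket {bucket}: {rows} rows ({pct:.1f}%)")
print(f"skew ratio: {skew_ratio(buckets):.2f}")  # skew ratio: 2.80
```

A ratio well above 1 usually points to a distribution column with too few distinct values, or with one dominant value.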

New built-in functions

  1. bitmap_min, bit_length ([Function] Add BE udf bitmap_min (#2538) #5581 [Feature] support bit_length function #6140)
  2. yearweek, week, makedate ([Function] Support date function: yearweek(), week(), makedate().  #6000)
  3. percentile, an exact percentile aggregate function ([Feature] Support exact percentile aggregate function #6410)
  4. json_array, json_object, json_quote ([Feature] Support for storage layer benchmark #6506)
  5. Support creating custom encryption keys for the AES_ENCRYPT and AES_DECRYPT functions. (#6115)
  6. Support creating function aliases through create alias function, which can combine multiple functions. ([Feature] Support alias function #6261)
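An exact percentile must look at every value in the group. This sketch computes one using linear interpolation between the two nearest ranks. That is a common definition and is, for example, NumPy's default method. Whether Doris's percentile interpolates in exactly this way is an assumption to verify against its function documentation.

```python
import math


def percentile_exact(values, p):
    """Exact p-th percentile (0 <= p <= 1) with linear interpolation between ranks."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if not values:
        return None  # an empty group has no percentile
    xs = sorted(values)  # exactness requires holding and sorting all values
    pos = p * (len(xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


latencies_ms = [12, 15, 11, 40, 13]
print(percentile_exact(latencies_ms, 0.5))  # 13.0
```

Keeping every value is the cost of exactness. Sketch-based approximate percentiles trade some accuracy for bounded memory.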

Others

  1. Support accessing Elasticsearch external tables over SSL-secured connections ([Doris On ES][WIP] Support external ES table with SSL secured and configurable node sniffing #5325)
  2. Support specifying the number of hot partitions in dynamic partition properties; hot partitions are stored on SSD disks. ([DynamicPartition] Support specifying hot data partition #5877)
  3. Support importing JSON data through Broker Load. ([BrokerLoad] Support read properties for broker load when read data #5845)
  4. Support accessing HDFS directly through the libhdfs3 library for data import and export, without a Broker process. (Support read data with format of parquet from hdfs, using libhdfs3 #5686)
  5. select into outfile supports exporting Parquet files and exporting in parallel ([Feature] Select outfile support parquet format #5938 Support concurrent export of query results #6539)
  6. ODBC external tables support SQL Server. (#6223)
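Parallel export (#6539) splits a result set across several writers, each producing its own file. This sketch shows the idea with a thread pool writing round-robin shards. It writes plain CSV because Parquet output would need a third-party library such as pyarrow. The part_N.csv naming is hypothetical.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_shard(path, rows, sep=","):
    """Write one shard of rows to its own file and return the path."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        for row in rows:
            f.write(sep.join(str(v) for v in row) + "\n")
    return path


def parallel_export(rows, out_dir, parallelism=4):
    """Split rows round-robin into `parallelism` shards and write them concurrently."""
    shards = [rows[i::parallelism] for i in range(parallelism)]
    shards = [s for s in shards if s]  # don't create empty files
    with ThreadPoolExecutor(max_workers=len(shards) or 1) as pool:
        futures = [
            pool.submit(write_shard, os.path.join(out_dir, f"part_{i}.csv"), shard)
            for i, shard in enumerate(shards)
        ]
        return [f.result() for f in futures]


rows = [(i, f"user_{i}") for i in range(10)]
with tempfile.TemporaryDirectory() as out_dir:
    paths = parallel_export(rows, out_dir, parallelism=3)
    print([os.path.basename(p) for p in paths])  # ['part_0.csv', 'part_1.csv', 'part_2.csv']
```

Round-robin sharding keeps file sizes even. Each shard preserves its own row order, but the global order is lost, so consumers that need ordered output would have to merge the files.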

Contributors

Thanks to all contributors to this release:

@924060929
@acelyc111
@Aimiyoo
@amosbird
@arthur-zhang
@azurenake
@BiteTheDDDDt
@caiconghui
@caneGuy
@caoliang-web
@ccoffline
@chaplinthink
@chovy-3012
@ChPi
@copperybean
@crazyleeyang
@dh-cloud
@DinoZhang
@dixingxing0
@dohongdayi
@e0c9
@EmmyMiao87
@eyesmoons
@francisoliverlee
@Gabriel39
@gaodayue
@GoGoWen
@HappenLee
@harveyyue
@Henry2SS
@hf200012
@huangmengbin
@huozhanfeng
@huzk8
@hxianshun
@ikaruga4600
@JameyWoo
@jennifer88huang
@JinLiOnline
@jinyuanlu
@JNSimba
@killxdcj
@kuncle
@liutang123
@luozenglin
@luzhijing
@MarsXDM
@mh-boy
@mk8310
@morningman
@Myasuka
@nimuyuhan
@pan3793
@PatrickNicholas
@pengxiangyu
@pierre94
@qidaye
@qzsee
@shiyi23
@smallhibiscus
@songenjie
@spaces-X
@stalary
@stdpain
@Stephen-Robin
@Sunt-ing
@Taaang
@tarepanda1024
@tianhui5
@tinkerrrr
@TobKed
@ucasfl
@Userwhite
@vinson0526
@wangbo
@wangliansong
@wangshuo128
@weajun
@weihongkai2008
@weizuo93
@WindyGao
@wunan1210
@wuyunfeng
@xhmz
@xiaokangguo
@xiaoxiaopan118
@xinghuayu007
@xinyiZzz
@xuliuzhe
@xxiao2018
@xy720
@yangzhg
@yx91490
@zbtzbtzbt
@zenoyang
@zh0122
@zhangboya1
@zhangstar333
@zuochunwei


@hectorhe001

Roughly when will this be released? I'm really looking forward to this feature: Spark-Doris-Connector supporting writing data to Doris! 😁

@morningman
Contributor Author

Roughly when will this be released? I'm really looking forward to this feature: Spark-Doris-Connector supporting writing data to Doris! 😁

You can download the trunk code and compile the spark-doris-connector in extensions/ dir to try it now.

@hf200012
Contributor

hf200012 commented Oct 29, 2021 via email

@hectorhe001

Spark Doris Connector now supports writing, and it supports both Spark 2.x and 3.x. You can pull the code from the master branch and compile it with the Docker 1.3.1 image, because thrift in the 1.4.1 image has been upgraded to version 0.13.0. Compile commands: Spark 2.x: sh build.sh 2; Spark 3.x: sh build.sh 3
-- 张家峰

OK, I'll try again; maybe I was using the wrong version before. Thanks!

@htyoung
Contributor

htyoung commented Nov 3, 2021

Roughly when will this be released? I'm really looking forward to the multi-tenancy and resource isolation feature.

@morningman
Contributor Author

Roughly when will this be released? I'm really looking forward to the multi-tenancy and resource isolation feature.

You can try this now: https://github.com/apache/incubator-doris/tree/0.15.0-rc01
Not officially released.

@htyoung
Contributor

htyoung commented Nov 4, 2021

Roughly when will this be released? I'm really looking forward to the multi-tenancy and resource isolation feature.

You can try this now: https://github.com/apache/incubator-doris/tree/0.15.0-rc01 Not officially released.

Thanks very much.

@tinkerrrr
Contributor

@morningman Will vectorization query optimization show up in this release?

morningman changed the title from "Release note 0.15.0 (WIP)" to "Release note 0.15.0" on Nov 16, 2021
@morningman
Contributor Author

@morningman Will vectorization query optimization show up in this release?

No, it is still WIP. But you can try part of this feature in Palo (powered by Doris) here: palo.baidu.com, version 0.15.1-rc09

@SeaAndHillMe

@lordk911

Spark Doris Connector build faild with spark3 #7363

@caoliang-web
Contributor

caoliang-web commented Dec 10, 2021 via email

morningman unpinned this issue on Mar 21, 2022