
Doris Roadmap 2024 #30669

Open
@morningman


Roadmap 2023
Roadmap 2022

Separation of Storage and Computation

  • Flexibility & Stateless
    • Stateless BE node
    • Stateless FE node
  • Load Isolation
    • Multi cluster support
    • Read & write isolation
  • More storage support
    • AWS S3
    • Aliyun OSS
    • Tencent Cloud COS
    • Huawei Cloud OBS
    • Baidu Cloud BOS
    • GCP
    • Azure
    • HDFS
  • Performance
    • Optimized cache policy
    • Optimization for cold data querying
  • Support data deletion
  • SLA
    • Upgrade BE with no impact
    • Upgrade FE with no impact
  • Reliability
    • Snapshot & Time travel
    • Enhanced backup & restore
  • Data sharing

Async Materialized View

  • Build materialized view

    • Support full refresh
    • Support partition level refresh
    • Support building MVs from OLAP tables
    • Support building MVs from Hive tables
    • Support building MVs from Iceberg tables
    • Support building MVs from Hudi tables
    • Nested materialized view with DAG
    • Incremental building for external table with partition granularity
    • Support partition rollup
    • Support partition TTL
    • Support REPLACE operation
    • Support refresh materialized view by time range
  • Transparent Rewriting

    • Support aggregation and rollup
    • Support join
    • Partial query rewriting
    • Support rewriting with nested materialized views
  • Materialized view management

    • Materialized view recommendation

Semi-Structured Data Analysis

  • Inverted Index

    • Support Inverted Index
    • Merging index files
    • Working with separation of storage and computation
    • Speed up data loading with inverted indexes
  • VARIANT data type

    • Support VARIANT data type
    • Working with inverted index

Query Optimizer

  • Basic framework

    • Fully supports DQL, DML and DDL
    • Optimized memory consumption
    • Optimized apply order of RBO rules
    • Improved efficiency of Cascades enumeration
  • Planning quality

    • Statistics
      • Support statistics collection for synchronous materialized views
      • Support partition level statistics collection
      • Support histogram statistics collection
    • New distributed cost model
      • Optimized distributed cost model framework
      • Support runtime cost re-evaluation
      • Support more accurate operator cost fitting models
    • Rules and enumerations
      • Expand RBO rules
      • Improve the quality of Cascades enumeration plan
      • Enhance the DPhyper enumeration framework to support outer join enumeration and CDC
    • Enhance the adaptive capabilities of runtime filters
      • Adaptive runtime filter size
      • Adaptive runtime filter type
      • Adaptive runtime filter waiting time
    • Support a histogram-based adaptive processing framework for data skew

DataLake Analysis

  • Support more file formats

    • RCFile
    • SequenceFile
  • Support more lake formats

    • Support Iceberg with ORC
    • Support Iceberg Equality Delete
    • Support more system tables on Hudi
    • Support CDC scan on Hudi
    • Support more system tables on Paimon
  • Trino Connector compatibility

    • Trino Connector compatibility framework
    • Support Trino DeltaLake Connector
    • Support Trino BigQuery Connector
    • Support Trino Cassandra Connector
  • Data lake write-back

    • Hive
      • Support unpartitioned table
      • Support partitioned table
      • Support INSERT OVERWRITE
      • Support INSERT
    • Iceberg
      • Support unpartitioned table
      • Support partitioned table
      • Support update and delete
    • Hudi
    • Paimon
  • Enhanced JDBC Catalog

    • Support DB2
    • Support sharded database
    • Support query concurrency
  • Enhanced file analysis

    • Support INSERT INTO table value functions
  • Enhanced file cache

    • Support memory-level file cache
    • Enhanced cache statistics and hit analysis
  • Integrate with Apache Ranger

    • Support Catalog/Database/Table/Resource/WorkloadGroup auth
    • Support row policy
    • Support data mask
    • Support column level privilege
  • SQL dialect support

    • Presto/Trino
    • Spark
    • Hive
    • ClickHouse
    • Oracle
    • Postgres

Query Processing

  • Resource Isolation
    • Support hard/soft resource isolation for Query & Load
    • Enhance the visibility of resource usage
    • Automatic workload management at runtime
  • Support stored procedures
  • Support spill to disk
    • Sort Operator
    • Aggregate Operator
    • Join Operator
  • Working with shuffle service
  • Stage by stage query processing

Storage Engine

  • Data Loading
    • Support auto partition when loading
    • Zero-ETL: Built-in data integration from OLTP CDC to Doris
    • Support transactional multi-table INSERT INTO
    • Support MERGE INTO
  • Data Modeling
  • Cross cluster replication
    • Support Master/Slave switch
    • Support cross region deployment
    • Work with separation of storage and computation
  • Support data binlog
  • Enhanced Z-order index
  • Optimized high-concurrency point query

Ecosystem & Tools

  • Cluster Manager for Apache Doris
    • Support agent mode
    • Support k8s
    • Enhanced monitor and alert management
    • Visualized profile analysis
    • Support Notebook
    • Built-in visualized BI reports
  • Doris StreamLoader tool
  • Doris Operator
  • X2Doris
    • Support Hive to Doris
    • Support Doris to Doris
    • Support Kudu to Doris
    • Support StarRocks to Doris
    • Support ClickHouse to Doris
  • BI tools compatibility
    • Superset
    • Metabase
    • Navicat
    • DataGrip
    • DBeaver
    • SmartBI
    • FineBI
  • Data Integration
    • Kettle
