Skip to content

Latest commit

 

History

History
89 lines (79 loc) · 5.49 KB

ROADMAP.md

File metadata and controls

89 lines (79 loc) · 5.49 KB

YDB Roadmap

Query Processor

  1. Support for Snapshot Readonly transactions mode
  2. Better resource management for KQP Resource Manager (share information about nodes resources, avoid OOMs)
  3. ✅ Switch to New Engine for OLTP queries
  4. ✅ Support not null for PK (primary key) table columns
  5. Aggregates and predicates push down to column-oriented tables
  6. Optimize data formats for data transition between query phases
  7. Index Rename/Rebuild
  8. KQP Session Actor as a replacement for KQP Worker Actor (optimize to reduce CPU usage)
  9. PostgreSQL compatibility
    • Support PostgreSQL datatypes serialization/deserialization in YDB Public API
    • PostgreSQL compatible query execution (TPC-C, TPC-H queries should work)
    • Support for PostgreSQL wire protocol
  10. Support a single Database connection string instead of multiple parameters
  11. Support constraints in query optimizer
  12. Query Processor 3.0 (a set of tasks to be more like traditional database in case of query execution functionality)
    • Support for Streaming Lookup Join via MVCC snapshots (avoid distributed transactions, scalability is better)
    • Universal API call for DML, DDL with unlimited results size (aka StreamExecuteQuery, which allows to execute each query)
    • Support for secondary indexes in ScanQuery
    • Transaction can see its own updates (updates made during transaction execution are not buffered in RAM anymore, but rather are written to disk and available to read by this transaction)
  13. Computation graphs caching (compute/datashard programs) (optimize CPU usage)
  14. RPC Deadline & Cancellation propagation (smooth timeout management)
  15. DDL for column-oriented tables

Database Core (Tablets, etc)

  1. ✅ Get YDB topics (aka pers queue, streams) ready for production
  2. ✅ Turn on MVCC support by default
  3. ✅ Enable Snapshot read mode by default (take and use MVCC snapshot for reads instead of running distributed transaction for reads)
  4. Change Data Capture (be able to get change feed of table updates)
  5. Async Replication between YDB databases
  6. Background compaction for DataShards
  7. Compressed Backups. Add functionality to compress backup data
  8. Process of Extending State Storage without cluster downtime. If a cluster grows from, say, 9 nodes to 900 State Storage configuration stays the same (9 nodes), it leads to a performance bottleneck.
  9. Splite/Merge DataShards BY LOAD by default. Most users require this feature turned on by default
  10. Support PostgreSQL datatypes in tablet local database
  11. Basic histogram for DataShards (first step towards cost based optimizations)
  12. Transaction can see its own updates (updates made during transaction execution are not buffered in RAM anymore, but rather are written to disk and available to read by this transaction)
  13. Data Ingestion from topic to table (implement built-in compatibility to ingest data to YDB tables from topics)
  14. Support snapshot read over read replicas (consistent reads against read replicas)
  15. Transactions between topics and tables

Hardcore

  1. Datashard iterator reads via MVCC
  2. Switch to TRope (or don't use TString/std::string directly, provide zero-copy data passing between components)
  3. Avoid Node Broker as SPF (NBS must work without Node Broker under emergency conditions)
  4. Subscriptions in SchemeBoard (optimize interaction with SchemeBoard via subsription to updates)

BlobStorage

  1. "One leg" storage migration without downtime (migrate 1/3 of the cluster from one AZ to another for mirror3-dc erasure encoding)
  2. ActorSystem 1.5 (dynamically reassign threads in different thread pools)
  3. Publish an utility for BlobStorage management (it's called ds_tool for now, improve it and open)
  4. Self-heal for degrated BlobStorage groups (automatic self-heal for groups with two broken disks, get VDisk Donors production ready)
  5. BlobDepot (a component for smooth blobs management between groups)
  6. Avoid BSC (BlobStorage Controller) as SPF (be able to run the cluster without BSC in emergency cases)
  7. BSC manages static group (reconfiguration of the static BlobStorage group must be done BlobStorage Controller as for any other group)
  8. (Semi-)Hard disk space separation (Better guarantees for disk space usage by VDisks on a single PDisk)
  9. Reduce space amplification (Optimize storage layer)
  10. Storage nodes decommission (Add ability to remove storage nodes)

Analytical Capabilities

  1. Log Store (log friendly column-oriented storage which allows to create 1+ million tables for logs storing)
  2. Column-oriented Tables (introduce a Column-oriented tables in additon to Row-orinted tables)
  3. Tiered Storage for Column-oriented Tables (with the ability to store the data in S3)

Federated Query

  1. Run the first version

Embedded UI

  1. Support for all schema entities
    • YDB Topics (add support for viewing metadata of YDB topics, its data, lag, etc)
    • CDC Streams
    • Secondary Indexes
    • Read Replicas
    • Column-oriented Tables
  2. Basic charts for database monitoring

Command Line Utility

  1. Use a single ydb yql instead of ydb table query or ydb scripting
  2. Interactive CLI

Tests and Benchmarks

  1. Built-in load test for DataShards in YCSB manner
  2. ydb workload for topics
  3. Jepsen tests support

Experiments

  1. Try RTMR-tablet for key-value workload