
Roadmap 2024 (discussion) #58392

Open
alexey-milovidov opened this issue Dec 31, 2023 · 42 comments
@alexey-milovidov
Member

alexey-milovidov commented Dec 31, 2023

This is ClickHouse roadmap 2024.
Descriptions and links are to be filled in.

This roadmap does not cover the tasks related to infrastructure, orchestration, documentation, marketing, external integrations, drivers, etc.

See also:

Roadmap 2023: #44767
Roadmap 2022: #32513
Roadmap 2021: #17623
Roadmap 2020: link

SQL Compatibility

✔️ Enable Analyzer by default
Non-constant CASE, non-constant IN
Remove old predicate pushdown mechanics
Correlated subqueries with decorrelation
Transforming anti-join: LEFT JOIN ... WHERE ... IS NULL to NOT IN
Deriving index condition from the right-hand side of INNER JOIN
JOINs reordering and extended pushdown
Time data type
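
As an illustration of the anti-join rewrite listed above (LEFT JOIN ... WHERE ... IS NULL to NOT IN), the two query forms below return the same rows. This is only a sketch of the equivalence using SQLite via Python, with hypothetical table names; it is not ClickHouse's actual transformation code:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders(id INTEGER, customer_id INTEGER);
CREATE TABLE blacklist(customer_id INTEGER);
INSERT INTO orders VALUES (1, 10), (2, 20), (3, 30);
INSERT INTO blacklist VALUES (20);
""")

# Anti-join written as LEFT JOIN ... WHERE ... IS NULL
left_join = con.execute("""
    SELECT o.id FROM orders o
    LEFT JOIN blacklist b ON o.customer_id = b.customer_id
    WHERE b.customer_id IS NULL
    ORDER BY o.id
""").fetchall()

# Equivalent NOT IN form (the rewrite is only valid when the
# subquery result contains no NULLs)
not_in = con.execute("""
    SELECT id FROM orders
    WHERE customer_id NOT IN (SELECT customer_id FROM blacklist)
    ORDER BY id
""").fetchall()

assert left_join == not_in == [(1,), (3,)]
```

The NULL caveat in the comment is why such rewrites are usually guarded by a not-NULL check on the join key.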

Data Storage

✔️ Userspace page cache
✔️ Adaptive mode for asynchronous inserts
✔️ Semistructured Data: Variant data type
Semistructured Data: Sharded Maps
Semistructured Data: JSON data type
Transactions for Replicated tables
Lightweight Updates v2
Uniform treatment of LowCardinality, Sparse, and Const columns
Settings to control the consistency of projections on updates
Replicated Catalog ☁️
On-disk storage for Keeper
Query cache on disk
✔️ Decoupling of object storages and metadata
Full-text indices (production readiness)
Vector search indices (production readiness)

Security, access control, and isolation

✔️ Definers (encapsulation of access control) for views
Warnings and limits on the number of database objects
Dynamic configuration of query handlers
JWT authentication ☁️
Data masking in row-level security ☁️
Secure storage for named collections ☁️
Cancellation points for long operations
Resource scheduler (continuation)

Query Processing

Parallel replicas with task callbacks (production readiness)
Parallel replicas with parallel distributed INSERT SELECT
Automatic usage of -Cluster table functions
Adaptive thresholds for data spilling on disk
Optimization with subcolumns by default

Interfaces & External Data

Support for Iceberg Data Catalog
Support for Hive-style partitioning
Explicit queries in external tables
Even simpler data upload
HTTP API for simple query construction
Unification of data lake and file-like functions

Testing & Hardening

Revive coverage
Fuzzer of data formats
Fuzzer of network protocols
Server-side AST query fuzzer
Generic fuzzer for query text
Randomization of DETACH/ATTACH in tests
Integration with SQLSmith
Embedded documentation

Experiments & Research

Multi-RAFT for Keeper
MaterializedPostgreSQL (production readiness)
SSH protocol for the server
Support for PromQL
Streaming queries
Freeform text format
Key-value data marts
Decouple of columns and buffers
Lazy reading of ranges
Instant attaching tables from backups
An object storage to borrow space from the filesystem cache
COW disks
ALTER PRIMARY KEY
Autocompletion with language models
Decentralized tables
Unique Key Constraint


The roadmap covers the top focus items for both external contributors and full-time ClickHouse employees.
The items marked with the ☁️ icon are meant for ClickHouse Cloud (proprietary).
Based on the results from previous years, we expect 50..80% completion of the roadmap.

@alanpaulkwan

alanpaulkwan commented Dec 31, 2023

"HTTP API for simple query construction": it would be really awesome if Python/R/DuckDB could read an arbitrary output / filtered table, like from S3 or just as a file download. Amazing.

Also glad to see join reordering on the list still.

@chenziliang
Contributor

Would like to see stream processing become mainstream in ClickHouse :)

@ahmed-adly-khalil

Would be great to see deeper NATS integration, mainly using JWT auth.

@ucasfl
Collaborator

ucasfl commented Jan 2, 2024

Support for Iceberg Data Catalog

Catalog is a good concept, but if we want to introduce catalogs into ClickHouse, we need to refactor the current metadata structure from Database -> Table to Catalog -> Database -> Table. Then all tables created with a built-in engine would live under an internal_catalog, and we could have iceberg_catalog, hive_catalog, and so on. I'm not sure it's a good idea.

Besides, we still face the difficulty that we don't have an Iceberg C++ API.

@olly-writes-code

Hi! Adding a request for more GIS features. One important one is the ability to transform a geometry to a specified SRID. Redshift docs here

@alexey-milovidov
Member Author

@ucasfl, we can support mapping a specific database from the Iceberg data catalog as a database in ClickHouse.
In this way, we don't have to map a whole catalog at once but allow doing it database by database.

@alexey-milovidov
Member Author

@ahmed-adly-khalil, while NATS is not on the main list (but nice to have), there are some items that we already started to do: #39459

@alexey-milovidov
Member Author

@alanpaulkwan, yes, in the simplest form it represents a table like a file: #46925, but it also allows customizing the result.

@jrdi
Contributor

jrdi commented Jan 2, 2024

Would be great to know if there are plans to keep working on improving zero-copy and cloud storage during 2024. I'm constantly seeing improvements and bug fixes, which is super good. Will we see zero-copy ready for production this year?

Decoupling of object storages and metadata

Does this mean moving metadata from disks to Keeper or any other shared store?

@alexey-milovidov
Member Author

alexey-milovidov commented Jan 2, 2024

@jrdi

Would be great to know if there are plans to keep working on improving zero-copy and Cloud Storage during 2024. I'm constantly seeing improvements and bug fixes. Will we see zero-copy ready for production this year?

We have to fix the issues in zero-copy replication because it is still tested in CI, and used in production on older services in ClickHouse Cloud. For example, issues like this are found: #58333. But the track record of zero-copy replication is not good, and we expect to stop using it, then remove it from CI, and keep it on life support without further changes.

Does this mean moving metadata from disks to Keeper or any other shared store?

This is #58357

We currently have the following metadata options:

  1. Metadata on local filesystem (s3).
  2. No separate metadata (s3_plain).
  3. Metadata in .index files in directories (web).
  4. Metadata in a backup.
  5. Metadata in Keeper (proprietary).
  6. More to come, e.g., a disk similar to s3_plain that allows directory renames (#58347).

And, we have the following object storage options:

  1. S3.
  2. HDFS.
  3. Azure.
  4. Web.
  5. Local filesystem.
  6. Borrowing space from the filesystem cache.

The task is to allow the cross-product of these options.

@chenziliang
Contributor

Would be great to see deeper NATS integration, mainly using JWT auth.

@ahmed-adly-khalil may I ask if you would like streaming processing / analytics against NATS via ClickHouse?

@jrdi
Contributor

jrdi commented Jan 2, 2024

Thanks, @alexey-milovidov!

But the track record of zero-copy replication is not good, and we expect to stop using it, then remove it from CI, and keep it on life support without further changes.

I can understand this decision, but it's a pity. It means that the open-source version won't have a production-ready way to separate compute and storage. Do you think this could change in the short/mid term? Even something like a plan, with ClickHouse's help and guidance, for improvements that external contributors could make sounds better than keeping the feature out of CI.

@alexey-milovidov
Member Author

It is not guaranteed and not in the plans, but we might have an implementation in the future; the only thing for sure is that it will not be based on zero-copy replication.

@bputt-e

bputt-e commented Jan 5, 2024

Unique Key Constraint would be great; it could remove the deduplication step in our processing pipeline.

@mbtolou

mbtolou commented Jan 7, 2024

Unique Key Constraint is a great idea.

@alanpaulkwan

alanpaulkwan commented Jan 7, 2024

I really like the unique key idea; I hope it can follow ReplacingMergeTree and allow the user to decide which row entry to keep. For me, the options seem to be (1) the incumbent data entry, (2) the newest data entry, or (3) an integer describing version priority. I've created arbitrary values to keep the "best value", which allows for non-standard logic like keeping the value that minimizes the difference between two timestamps with some case-by-case logic.
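
The version-priority option described above can be sketched as follows. This is a minimal Python illustration of ReplacingMergeTree(version)-style deduplication semantics, not ClickHouse's actual implementation; the row layout is hypothetical:

```python
def dedupe_by_version(rows, key_idx, version_idx):
    """Keep, for each key, the row with the highest version value,
    mimicking ReplacingMergeTree(version)-style deduplication."""
    best = {}
    for row in rows:
        k = row[key_idx]
        # Replace the stored row only if this one has a higher version.
        if k not in best or row[version_idx] > best[k][version_idx]:
            best[k] = row
    return sorted(best.values())

rows = [("k1", "old", 1), ("k1", "new", 2), ("k2", "only", 1)]
print(dedupe_by_version(rows, key_idx=0, version_idx=2))
# [('k1', 'new', 2), ('k2', 'only', 1)]
```

The "best value" logic mentioned above corresponds to choosing what the version column encodes; any total order over rows with the same key can be plugged into the comparison.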

@mwarkentin

@jrdi there's a proposal from Altinity here: #54644

@earlev4

earlev4 commented Jan 12, 2024

A huge thank you to ClickHouse team and all the contributors for the amazing work on ClickHouse! I sincerely appreciate it.

Just curious: the Roadmap 2023 mentioned a "Recursive CTE" task, but I do not see it mentioned in the Roadmap 2024. Are there plans to implement recursive CTEs in the future?

Thanks again!

@alexey-milovidov
Member Author

@earlev4, it was planned for the previous year, to come after enabling the Analyzer, but we didn't manage to enable the Analyzer on that schedule, so I've added it as the major item for 2024. I'm hesitant to add recursive CTEs to the list: we are considering them for implementation, but they are not among the main items.
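
For context, a recursive CTE is a WITH clause that references itself, which allows iterative queries such as sequence generation or hierarchy traversal. The sketch below only illustrates the semantics using SQLite via Python, since the feature under discussion is not in ClickHouse:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# A recursive CTE: the CTE body selects from the CTE itself,
# repeating until the WHERE condition stops the recursion.
rows = con.execute("""
WITH RECURSIVE nums(n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM nums WHERE n < 5
)
SELECT n FROM nums
""").fetchall()

print(rows)  # [(1,), (2,), (3,), (4,), (5,)]
```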

@earlev4

earlev4 commented Jan 16, 2024

Thanks so much, @alexey-milovidov! I sincerely appreciate the detailed response. It is very helpful. I am very grateful to you and the team for ClickHouse!

@domainio

Hi, is there a plan to support writing to Apache Iceberg with the MERGE operation?

@zheyu001

zheyu001 commented Jan 23, 2024

Is there any chance of supporting Iceberg v2, or schema evolution?

@1392657590

Do you have time to resolve the following: in high-concurrency scenarios, the performance of ClickHouse Keeper is lower than that of ZooKeeper. We found this issue when replacing ZooKeeper with Keeper; the replacement plan has been temporarily suspended.

@JackyWoo
Contributor

@1392657590 maybe you can try RaftKeeper

@jiugem

jiugem commented Feb 4, 2024

I don't see MaterializedMySQL in the roadmap.

@zhanglistar
Contributor

zhanglistar commented Feb 5, 2024

@alexey-milovidov What about non-equi joins? Any plans? Thanks.

@chrisgoddard

Would be super excited to see production support for vector search indices, especially on ClickHouse Cloud. Every week it seems like there's a different vector database, and I can't wait until I can just use ClickHouse for everything.

@xevix

xevix commented Feb 16, 2024

Materialized CTEs would alleviate major performance bottlenecks for me by avoiding DB round trips to create many intermediate results in tables. DuckDB added this last year, which was great to see; I was wondering if ClickHouse was thinking about this as well: #53449.

Fantastic work on recent releases, with usability features like ORDER BY ALL making it faster to query things ad hoc, and tight S3 integration with system credentials just magically pulled in 🎉

@guoxiaolongzte
Contributor

Support for Hive style partitioning.

Does it support dynamic Hive partition writing?
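
For reference, Hive-style partitioning encodes partition key values in the directory layout as key=value path segments. A minimal sketch of how such paths are constructed (the bucket and table names are hypothetical, and this is not ClickHouse's writer code):

```python
from urllib.parse import quote

def hive_partition_path(base, partition_values):
    """Build a Hive-style path: one key=value directory per
    partition column, with values percent-encoded."""
    parts = [f"{k}={quote(str(v), safe='')}" for k, v in partition_values.items()]
    return "/".join([base.rstrip("/")] + parts)

path = hive_partition_path("s3://bucket/events", {"date": "2024-01-01", "country": "US"})
print(path)  # s3://bucket/events/date=2024-01-01/country=US
```

"Dynamic" partition writing then means routing each row to the path derived from its own partition column values, rather than writing to a single fixed partition.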

@guoxiaolongzte
Contributor

Support for Iceberg Data Catalog


@alexey-milovidov
When do we plan to support the Hive catalog and Hudi catalog?

@mingmwang

@1392657590 maybe you can try RaftKeeper

@JackyWoo

Do you have any benchmark data to share comparing the performance and throughput of ClickHouse Keeper and RaftKeeper?

@JackyWoo
Contributor

JackyWoo commented Feb 20, 2024

@mingmwang we haven't compared them yet; we have only compared RaftKeeper with ZooKeeper.

@softiger

@JackyWoo could you share the comparison results between ZooKeeper and RaftKeeper? I'm really interested in it, thanks!

@JackyWoo
Contributor

JackyWoo commented Feb 21, 2024

@softiger You can find it here. Let's talk about RaftKeeper here.

@immelnikoff

I would like to see binary search over pre-sorted arrays in the following functions:
has()
hasAny()
hasAll()
arrayIntersect()
In general, I would like to see accelerated functions for pre-sorted arrays.
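
A sketch of what binary-search-backed membership tests over a pre-sorted array could look like; this is plain Python using the standard bisect module to illustrate the O(log n) per-lookup idea, not ClickHouse internals:

```python
from bisect import bisect_left

def has_sorted(arr, x):
    """O(log n) membership test on a pre-sorted array, versus the
    O(n) linear scan a generic has() must perform."""
    i = bisect_left(arr, x)
    return i < len(arr) and arr[i] == x

def has_all_sorted(arr, needles):
    # True iff every needle occurs in the sorted array.
    return all(has_sorted(arr, x) for x in needles)

def has_any_sorted(arr, needles):
    # True iff at least one needle occurs in the sorted array.
    return any(has_sorted(arr, x) for x in needles)

arr = [1, 3, 5, 7]
print(has_all_sorted(arr, [3, 7]))   # True
print(has_any_sorted(arr, [2, 4]))   # False
```

The same idea extends to arrayIntersect(): when both arrays are sorted, a single merge-style pass replaces the quadratic pairwise comparison.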

@wordhardqi

ClickHouse could plan a query optimizer for complex queries.

@wordhardqi

StarRocks, Snowflake, and ByConity have one.

@Dileep-Dora

I see from the 2023 roadmap that the inverted indices implementation is not a priority (#38667).

Are we considering this for this year, or are there any other plans for improving text search performance?

@alexey-milovidov
Member Author

So far, there is a prototype implementation of inverted indices (you can unlock it with allow_experimental_inverted_index). It is not ready and should not be used in production; it has not been tested on realistic datasets.

@Dileep-Dora

@alexey-milovidov thanks for the reply. Yes, we've tried this experimental feature, but the performance was not up to the mark; hence checking whether there are any plans for this in 2024.

@johnpyp

johnpyp commented May 5, 2024

Curious about prioritizing support for ORDER BY optimizations in projections. This is the one thing holding my team back from using ClickHouse for WHERE-query use cases where we want to replicate the ease of use and flexibility of traditional database indices.

We'd love to be able to create potentially many projections on top of one table, with varied combinations of ORDER BY and WHERE query optimizations.

@anvaari
Contributor

anvaari commented May 11, 2024

Is there any plan to support deserializing Protobuf through a schema registry? It's needed.
