
Roadmap 2023 #44767

Closed · alexey-milovidov opened this issue Dec 30, 2022 · 73 comments

@alexey-milovidov (Member) commented Dec 30, 2022

This is the ClickHouse roadmap for 2023.
Descriptions and links are to be filled in.

This roadmap does not cover the tasks related to infrastructure, orchestration, documentation, marketing, external integrations, drivers, etc.

See also:

Roadmap 2022: #32513
Roadmap 2021: #17623
Roadmap 2020: link

Testing and hardening

Fuzzer of data formats.
Fuzzer of network protocols.
✔️ Fuzzer of MergeTree settings.
Server-side AST query fuzzer.
Generic fuzzer for query text.
Randomization of DETACH/ATTACH in tests.
✔️ Integrate with SQLogicTest.
✔️ GWP-ASan integration.
✔️ Fully-static ClickHouse builds.
Embedded documentation.
Green CI: https://aretestsgreenyet.com/

Query analysis and optimization

Enable Analyzer by default.
Remove old predicate pushdown mechanics.
✔️ Support for all types of CASE operators with values.
Transforming anti-join: LEFT JOIN ... WHERE ... IS NULL to NOT IN (see the sketch after this list).
Deriving index condition from the right-hand side of INNER JOIN.
JOINs reordering and extended pushdown.
Correlated subqueries (with decorrelation).
Parallel in-order GROUP BY.
GROUP BY optimizations based on query rewrite.
Use table sorting for DISTINCT optimization.
Use table sorting for merge JOIN.
Alias columns in views.
Recursive CTE.
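
For illustration, a minimal sketch of the anti-join rewrite mentioned in the list above; the tables events and blocklist are hypothetical, and the two forms are equivalent only when blocklist.user_id contains no NULLs:

```sql
-- Anti-join written as LEFT JOIN ... WHERE ... IS NULL:
SELECT e.id
FROM events AS e
LEFT JOIN blocklist AS b ON e.user_id = b.user_id
WHERE b.user_id IS NULL;

-- The rewrite described in the roadmap item would produce the equivalent form:
SELECT id
FROM events
WHERE user_id NOT IN (SELECT user_id FROM blocklist);
```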

Separation of storage and compute ☁️

✔️ Increased prefetching for VFS.
✔️ Lazy initialization of tables and data parts.
✔️ Cooperation of temporary data for query processing with VFS cache.
✔️ Multiplexed data parts format for lowering the number of writes ☁️
Parallel replicas with task callbacks (production readiness).
✔️ Parallel replicas with custom sharding key.
✔️ SharedMergeTree - a table engine with a shared set of data parts. ☁️
✔️ Shared metadata storage - no local metadata on replicas. ☁️
Optimization of high-cardinality operations with the help of parallel replicas.
Query offloading to dynamic stateless workers.
Instant attaching tables from backups.

Security and access control

Untangle auth methods for S3.
✔️ Managing named collections with SQL (see the sketch after this list).
Secure storage for named collections. ☁️
Dynamic managing of custom query handlers.
✔️ Default password hashing schema.
✔️ Bcrypt or other PBKDF.
Resource scheduler.
TLS for the PostgreSQL compatibility protocol.
Data masking in row-level security.
JWT support. ☁️
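
A minimal sketch of managing named collections with SQL, as referenced in the list above; the collection name and key values are placeholders, and the named-collection form of the s3 table function is shown as I understand it:

```sql
-- Create a named collection holding S3 credentials (values are placeholders):
CREATE NAMED COLLECTION s3_creds AS
    access_key_id = '<key>',
    secret_access_key = '<secret>';

-- Reference the collection instead of inlining credentials in every query:
SELECT count()
FROM s3(s3_creds, url = 'https://example-bucket.s3.amazonaws.com/data/*.parquet', format = 'Parquet');

-- Collections can be altered and dropped with SQL as well:
ALTER NAMED COLLECTION s3_creds SET secret_access_key = '<rotated-secret>';
DROP NAMED COLLECTION s3_creds;
```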

Formats and integrations

✔️ Headers autodetect for -WithNames formats (see the sketch after this list).
✔️ Support s3-style URLs along with gs and others.
✔️ Support for Apache Iceberg.
✔️ Optimization of reading for Parquet.
✔️ Overlay databases and File database engine.
✔️ Streaming consumption from a bunch of files.
✔️ Support for embedded indices inside Parquet.
Schema inference for table function mongodb.
Table function zookeeper.
Simple HTTP PUT or form file API for data insertion.
Asynchronous inserts by default.
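
A short sketch of two of the format items above; the URLs and bucket names are hypothetical:

```sql
-- CSVWithNames treats the first row as column names; the checked roadmap item above
-- also covers detecting headers automatically for plain CSV/TSV:
SELECT *
FROM url('https://example.com/hits.csv', 'CSVWithNames')
LIMIT 10;

-- s3-style URLs accepted by the s3 table function alongside https:// locations:
SELECT count()
FROM s3('s3://example-bucket/data/*.parquet', 'Parquet');
```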

Data storage

✔️ Lightweight deletes - production readiness (see the sketch after this list).
Transactions for ReplicatedMergeTree
✔️ Indexing with space-filling curves.
Uniform treatment of LowCardinality, Sparse, and Const columns.
Grouping a huge number of columns together.
✔️ Deprecate in-memory data parts or make them production ready.
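
A minimal example of the lightweight-delete syntax referenced above, contrasted with the classic mutation; the table name hits is hypothetical:

```sql
-- Lightweight DELETE marks rows as deleted without immediately rewriting data parts:
DELETE FROM hits WHERE user_id = 123;

-- The classic mutation rewrites the affected data parts:
ALTER TABLE hits DELETE WHERE user_id = 123;
```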

Experimental features and research

✔️ Query results cache (see the sketch after this list).
✔️ Regexp-Tree dictionaries.
✔️ Batch jobs and refreshable materialized views.
Streaming queries.
Freeform text format.
Semistructured data support - production readiness.
Websocket protocol for ClickHouse.
ssh protocol for ClickHouse.
Key-value data marts.
Unique Key Constraint.
✔️ PRQL as a dialect.
✔️ Kusto support.
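
A short sketch of how the query results cache and the dialect features above are switched on; the table and column are hypothetical, and the setting names are as I understand them from the 23.x releases:

```sql
-- Query results cache, enabled per query:
SELECT count()
FROM hits
WHERE event_date = today()
SETTINGS use_query_cache = 1;

-- Dialects such as Kusto or PRQL are selected per session:
SET dialect = 'kusto';
```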

@nolar commented Dec 31, 2022

Hi. Thanks for this topic where I can share my thoughts. We use ClickHouse to store some basic, not very complicated data (but in big volume). What we have found lacking is the ecosystem around ClickHouse, not ClickHouse's features.

Specifically, I mean the Python ecosystem: both the low-level driver and the integrations with common frameworks. As an example, the clickhouse-sqlalchemy connector does not work properly with SQLAlchemy 1.4+; there is an issue (or several issues), but the maintainer probably cannot dedicate much time to the project (as the author of another, unrelated project, I totally understand this). And none of us is a sufficiently good SQLAlchemy expert to understand the depth of its integration. We had numerous issues with the integration earlier as well (mostly resolved or worked around).

I might assume other languages and frameworks have the same problems.

I don't know what is a good solution here. Maybe to support those maintainers somehow. Or hire contractors to fix the issues.

But either way, in my belief, the ecosystem around ClickHouse is important enough for ClickHouse's wider adoption — maybe even more important than narrow use cases for performance inside ClickHouse. Sadly, I see no line about this in the roadmap. It would be good to have one.

@nolar commented Dec 31, 2022

PS: Another feature request, maybe overly specific (which does not mean it is easy); it came to my mind after seeing "Fully-static ClickHouse builds":

Is it possible to reduce the size of clickhouse-client dramatically (100-fold)? I once tried to add it to an Ubuntu image, and it added 800 MB — quite a lot for Docker images for a supplementary tool used for debugging.

The idea is to have this client (as with psql for Postgres) installed in the runtime containers, where it is called not directly but through a wrapper command which feeds it the proper credentials from the container's secrets (app- & env-specific). E.g. /app/manage postgres shell, /app/manage clickhouse shell.

@Slach (Contributor) commented Dec 31, 2022

@nolar ;) try https://github.com/ClickHouse/clickhouse-connect/ instead of clickhouse-sqlalchemy

@azat (Collaborator) commented Dec 31, 2022

Is it possible to reduce the size of clickhouse-client dramatically (100-fold)? I once tried to add it to an Ubuntu image, and it added 800 MB — quite a lot for Docker images for a supplementary tool used for debugging.

@nolar about binary size:

Debug symbols take most of the space; the official packages ship them separately, so if you add only the clickhouse-client package from the official repository, you will have a clickhouse binary that takes 421 MB.

But it is not possible to strip it much further, at least for now, since ClickHouse uses one binary for everything. This approach has advantages: for example, the client can parse a query and reject it locally if the syntax is wrong, without sending it to the server. There is also clickhouse-local, which is like a client+server in a standalone binary.

However, ClickHouse supports the MySQL (default port 9004) and PostgreSQL protocols (and HTTP as well), and you can actually use those clients instead to send queries to ClickHouse, though I wouldn't recommend this, since those clients are not as feature-rich as ClickHouse's own client.

@andrea-s commented Jan 1, 2023

Is there any work planned on improving the state of window functions? At the moment they are technically working but very slow for a number of realistic use cases (see e.g. #40781).

@alanpaulkwan commented:

Love the proposed Parquet speed improvements and join improvements. Join improvements and data duplication are my biggest pain points.

For some reason, full sorting merge seems to blow up in memory usage even if the right table is small - I imagine this is related to the use of table indices, but I wonder whether there is a need to make it use disk? So I wonder if that can be part of the merge join fixes. Also, for some reason, several of the join algorithms seem to have misleading query progress (stuck at 99% for ten minutes when 0 to 99 took 30 seconds). Also true with grace hash.

Also, any chance of this? #18536

It's the other thing I spend time maintaining, finding terribly compressed columns and optimizing them.

@alexey-milovidov (Member, Author) commented:

@andrea-s This particular case #40781 is interesting - it looks easy, but actually, it can be challenging to implement. We have to discuss the possible approaches to this task - we have not brainstormed it yet.

There are many other optimizations for window functions that we have to implement - for example, the re-use of the ordering of the data stream to optimize ORDER BY / PARTITION BY inside the window function.
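
A hedged sketch of the kind of query this optimization targets; the table is hypothetical, and the point is that the window's PARTITION BY / ORDER BY matches the table's sort key, so in principle the sort inside the window function could be skipped:

```sql
-- Hypothetical table sorted by (user_id, event_time):
CREATE TABLE events
(
    user_id UInt64,
    event_time DateTime,
    value Float64
)
ENGINE = MergeTree
ORDER BY (user_id, event_time);

-- The window matches the storage ordering, so the data stream already arrives sorted appropriately:
SELECT
    user_id,
    event_time,
    sum(value) OVER (PARTITION BY user_id ORDER BY event_time) AS running_total
FROM events;
```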

@ucasfl (Collaborator) commented Jan 1, 2023

@alexey-milovidov I'm currently working on the Unique Key Constraint and have proposed an initial pull request: #44534. Could you or someone else help review it? This feature is useful in many scenarios, and I'd like to continue until it is finished.

@alexey-milovidov (Member, Author) commented:

@alanpaulkwan

For some reason, full sorting merge seems to blow up in memory usage even if the right table is small - I imagine this is related to the use of table indices, but I wonder whether there is a need to make it use disk? So I wonder if that can be part of the merge join fixes. Also, for some reason, several of the join algorithms seem to have misleading query progress (stuck at 99% for ten minutes when 0 to 99 took 30 seconds). Also true with grace hash.

Currently, we have only a nascent low-level implementation of a few JOIN algorithms in external memory:

  1. Many algorithms are implemented (hash, parallel_hash, partial_merge, full_sorting_merge, grace_hash), but there is no good default choice or automatic choice between them. Even parallel_hash, while being faster than the hash in most cases, is not enabled by default.
  2. Merge JOIN algorithms don't take advantage of the table's / data stream sorting.
  3. While some algorithms can spill data to disk, the control of the memory usage before spilling is non-obvious.

I see it as a halfway implementation: the fundamental parts are there, but only experts can use them by manually tuning the settings. I have big expectations to improve it in 2023.
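
As a concrete illustration of the manual tuning described above, this is roughly how an expert would pick an algorithm today; the tables are hypothetical and the bucket count is only an example value:

```sql
-- Choose a JOIN algorithm explicitly for the session:
SET join_algorithm = 'parallel_hash';

-- grace_hash can spill to disk; the initial bucket count influences memory use per pass:
SELECT count()
FROM big_left AS l
INNER JOIN big_right AS r ON l.key = r.key
SETTINGS join_algorithm = 'grace_hash', grace_hash_join_initial_buckets = 8;
```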

@alexey-milovidov (Member, Author) commented:

Also, for some reason, several of the join algorithms seem to have misleading query progress (stuck at 99% for ten minutes when 0 to 99 took 30 seconds). Also true with grace hash.

Query progress is very simple - it is the number of rows already read from the tables divided by the number of expected rows to read. I think we will keep the progress simple, as it is. But there is going to be one improvement: #39694

@alexey-milovidov (Member, Author) commented:

Also, any chance of this? #18536

It's not in the plans; it could be a good contribution. But there are some considerations about this task, and I will write them there.

@alexey-milovidov (Member, Author) commented:

@ucasfl This PR will be reviewed. Today the majority of the team is on vacation, so maybe next week :)

@alanpaulkwan commented:

@alanpaulkwan

For some reason, full sorting merge seems to blow up in memory usage even if the right table is small - I imagine this is related to the use of table indices, but I wonder whether there is a need to make it use disk? So I wonder if that can be part of the merge join fixes. Also, for some reason, several of the join algorithms seem to have misleading query progress (stuck at 99% for ten minutes when 0 to 99 took 30 seconds). Also true with grace hash.

Currently, we have only a nascent low-level implementation of a few JOIN algorithms in external memory:

  1. Many algorithms are implemented (hash, parallel_hash, partial_merge, full_sorting_merge, grace_hash), but there is no good default choice or automatic choice between them. Even parallel_hash, while being faster than the hash in most cases, is not enabled by default.
  2. Merge JOIN algorithms don't take advantage of the table's / data stream sorting.
  3. While some algorithms can spill data to disk, the control of the memory usage before spilling is non-obvious.

I see it as a halfway implementation: the fundamental parts are there, but only experts can use them by manually tuning the settings. I have big expectations to improve it in 2023.

Amazing. Yes, I have to tune it, and oftentimes I just write for loops to get a superior join pattern. Do you think ClickHouse will close the gap on TPC-H-type workloads? Basically, in academia I have a lot of big-table joins.

@alexey-milovidov (Member, Author) commented Jan 1, 2023

Do you think Clickhouse will close the gap on TPC-H type workloads?

Yes! But not "close", maybe "approach" :)
We have benchmarks with TPC-DS, TPC-H, and the Star Schema Benchmark, and for many queries, the results can be improved with low-hanging fruit.

@mo-avatar (Contributor) commented Jan 5, 2023

Is it possible to shuffle data within the cluster to accelerate joins in ClickHouse, i.e. to spread rows with the same hash value to a particular shard? Other big-data systems such as Spark seem to have a feature like this. Or, to rephrase: is an automatic co-located join implementable in ClickHouse?

@stergiotis commented:

ClickHouse has a nice set of functions for working with arrays already. I would see great value in expanding its capabilities in the following directions:

  • Supporting sorted arrays (e.g. with functions like arrayIntersectSorted(...), hasSorted(arr,val), indexOfSorted(arr,val), ...)
  • Introducing sorted arrays in the type system
  • Use Eytzinger layout to accelerate search in arrays
  • Use binary search on sorted arrays
  • Introduce an indicesOf(arr,val) function similar to the indexOf(arr,val) function
  • Make co-array invariants first class (e.g. arrayCoSort(func,arr1,arr2,...))

Further, it would be nice to have the ClickHouse query parser and analyzer exposed via SQL, e.g. for pretty-printing, parsing, or canonicalizing queries. This could be implemented by extending EXPLAIN AST.
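
For reference, part of this already exists today: EXPLAIN AST shows the parsed tree and EXPLAIN SYNTAX shows the rewritten query text, and indexOf is the current single-result lookup that the proposed indicesOf would generalize:

```sql
-- Inspect the parsed AST and the rewritten (canonicalized) query text:
EXPLAIN AST    SELECT indexOf([10, 20, 30], 20);
EXPLAIN SYNTAX SELECT indexOf([10, 20, 30], 20);
```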

@MaxWk (Contributor) commented Jan 6, 2023

Is there any document about the Separation of storage and compute version?

@ucasfl (Collaborator) commented Jan 6, 2023

Is there any document about the Separation of storage and compute version?

It looks like it is currently being developed in private.

@Pipboyguy commented:

What's happening with the CatBoost integration? Is it still being supported or actively developed?

@zhanglistar (Contributor) commented:

Still no shuffle support?

@lgbo-ustc (Contributor) commented:

https://github.com/ByConity/ByConity/blob/master/src/QueryPlan/RemoteExchangeSourceStep.h

This looks like a shuffle implementation.

@rhuddleston commented:

Azure blob store support? #19307

@AlekseiSaff commented:

Is the JSON object type going to be ready for production in 2023?

@jackieli-tes commented:

I vote for recursive CTE.

@rupurt commented Jan 11, 2023

For Iceberg it would be useful to include support for Puffin files: https://iceberg.apache.org/puffin-spec/. That would compete with "Support for embedded indices inside Parquet".

This is a good overview of the "why": https://www.dremio.com/subsurface/puffins-and-icebergs-additional-stats-for-apache-iceberg-tables/

Another great addition to Iceberg would be support for branch/merge/tag semantics, along with support for Nessie.

@mga-chka commented Jan 13, 2023

  1. Many algorithms are implemented (hash, parallel_hash, partial_merge, full_sorting_merge, grace_hash), but there is no good default choice or automatic choice between them. Even parallel_hash, while being faster than the hash in most cases, is not enabled by default.

Hi,
Can you give the reason why parallel_hash is not the default join algorithm? From afar, it seems much faster than hash (for large datasets) while having an acceptable memory overhead (from the discussion I saw on the PR).

NB: I haven't played with it yet, but I was planning to test it and maybe use it in our production clusters to speed up some edge cases.

@alexey-milovidov (Member, Author) commented:

@lqhl, Currently, no - it's one of a few cloud-only modifications.

@devoxi commented Aug 19, 2023

@alexey-milovidov, will the zero-copy replication feature be deprecated as a consequence? Or will you keep investing in it and stabilising it?

@alexey-milovidov (Member, Author) commented:

Zero-copy replication is clunky but mostly ok for now. We have fixed a ton of bugs in it.

@cangyin (Contributor) commented Aug 22, 2023

@lqhl, Currently, no - it's one of a few cloud-only modifications.

Is there a cloud-version binary package that I can download and try out?

@alexey-milovidov (Member, Author) commented:

@cangyin No, the cloud version is not redistributed.

@hkrutzer commented:

Currently, no - it's one of a few cloud-only modifications.

@alexey-milovidov will the corporation publish, or has it already published, its stance on cloud-only features? I don't think there is a technical reason for this feature to be cloud-only. I understand the corporation needs to make money; that is fine. But without clear expectations of which features will be and will remain in the open-source version, community trust can start to erode. For example, now that some features are cloud-only, one might start to wonder what happens if ClickHouse Inc. finds its cloud offering is not taking off sufficiently and starts to remove features from the open-source version, or starts adding important new features only to its cloud version.

@alexey-milovidov (Member, Author) commented:

It's good to have a small, limited number of modifications exclusive to ClickHouse Cloud, but only those that do not compromise the features or operation of self-managed deployments while, at the same time, being crucial and distinguishing for the Cloud.

SharedMergeTree is a perfect choice - it gives advantages for ClickHouse Cloud with dynamic scaling; it could also give an advantage to other cloud providers and, maybe, to large companies with many clusters and multi-tenant setups. It does not prevent using ClickHouse in distributed setups with large and small clusters, local or shared storage, multi-region setups, etc. We keep in mind that we still need to provide improvements for self-managed operation, such as support for re-sharding and easy setup of clusters.

@misaka-will commented:

@alexey-milovidov
Will ClickHouse Inc. merge into the master branch community-driven attempts to implement features like shared metadata storage (#48620, for example)?

@alexey-milovidov (Member, Author) commented:

@misaka-will, This is a large feature, and it would take a lot of resources to implement and validate. It makes sense to send a pull request - to run CI checks and allow review and discussion with other community members.

There were cases when we received multiple competing implementations for a single feature and merged parts of both. We merge the vast majority of pull requests, but many others are closed due to conflicts in architecture or difficulties in support.

An alternative shared metadata implementation might intersect and conflict with the private one, so it is at risk. But there is also a chance it can be merged and used in a broader area.

@alex-zaitsev (Contributor) commented:

@alexey-milovidov, the roadmap in its current form looks confusing. It mixes features for the open-source and closed-source versions. What do you think about splitting the roadmap into an open-source one and a separate one for the closed-source/cloud version? That would make it more transparent for the users and the community.

@mga-chka commented:

@alexey-milovidov
Two questions regarding your move to a closed-source version:

  • Instead of having a closed-source version, did you consider changing the license to prevent a cloud provider from selling ClickHouse (for example, with an SSPL license like Elastic)?
  • Do you plan to sell the closed-source version?

@rvasin (Contributor) commented Aug 30, 2023

It would be nice to hear what the closed-source version actually is. I think the closed source is everything related to ClickHouse Cloud, for example the cloud's backend and frontend scripts. ClickHouse, Inc. already sells cloud subscriptions. Or @mga-chka, do you mean selling the closed-source cloud scripts to run one's own private cloud?

@mga-chka commented Aug 30, 2023

I mean selling the ClickHouse binary (and the related scripts plus the documentation) that contains non-open-source features (like SharedMergeTree) so that external companies can use these features.

Because for internet-scale companies like mine (Contentsquare), the amount of data becomes too big, and having an object-storage-based database becomes a necessity. But with the current move of ClickHouse Inc., I'm not sure we can rely on the open-source object-storage-related features, since their support is unclear (like the zero-copy replication feature we wanted to use, cf. a few messages earlier).

@hkrutzer commented:

did you consider changing the license

To do that, they would first have to get permission from all open-source contributors who have not signed a CLA, or get rid of their code.

@athanhat commented Sep 2, 2023

ClickHouse as an alternative storage engine for CozoDB

Hi, my main interest is the integration and analysis of very large databases, knowledge bases, and graph databases with extremely complex schemas. I have used ClickHouse in the past and I am aware of its superb performance.

My problem for the case I described is that SQL with joins cannot cope with the complexity of such schemas and graph data models. Nevertheless, with a ClickHouse MergeTree table you can store triplets, or in fact any n-ary relation, efficiently.

On the other hand, I have been experimenting for some time with Datalog systems that use a backend for storage, e.g. a KV store (LMDB, RocksDB, DynamoDB), mainly for indexing purposes. A declarative logic programming language like Datalog seems an excellent approach here, but these systems really suffer from reading and writing to disk.

My question is whether a system like ClickHouse can be used as a storage engine for a system with a Datalog query language that supports operations on n-ary relations. Recently I have been exploring CozoDB and have opened an issue there, cozodb/cozo#175. I have also discussed this subject in the past with Clojure developers of Datalog systems, e.g. XTDB, who found my proposal very interesting.

I will be very keen to read your views on this topic.

@javisantana (Contributor) commented:

An alternative shared metadata implementation might intersect and conflict with the private one, so it is at risk.

I think it is totally fine that you keep part of the code, but it would be great to be clear with the people contributing to the core (*) about the technical restrictions and the policy. It's clear the S3 and zero-copy design requires some iterations, and eventually someone in the community was going to work on that regardless of your work on SharedMergeTree.

It does not make any sense to work for weeks on an alternative that changes something you may end up not accepting. An alternative would be to be more open to talking to the community, so we could discuss these kinds of things.

(*) writing as someone who is paying people full-time to contribute

@alexey-milovidov (Member, Author) commented:

@javisantana, this also worries me - a production-quality implementation of an alternative to SharedMergeTree will require not weeks but over a year of work from multiple engineers, plus the resources to test it on many different workloads. A realistic way would be to split it into multiple small steps, so every step can be tested in isolation and has some incremental value on its own.

@mcarbonneaux commented:
Automatic shard rebalancing when adding a shard to a cluster (background rebalancing)?

@alexey-milovidov (Member, Author) commented:

@mcarbonneaux, this is not for 2023, maybe in 2024: #45766

@cangyin (Contributor) commented Oct 16, 2023

What's the status of development of the inverted index? Can we expect a timeline for it to become production-ready, or is there a task schedule?

Specifically, I think these two optimizations are the most significant:

  • Pushing the inverted index down to return the resulting row-id list directly, avoiding a further data scan, as mentioned here. The current implementation can cause high I/O load while reading the index files and make the inverted index slower than a simple full scan.
  • A possible refactoring of the parts-merging logic to reduce memory usage during merges, as mentioned here.
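
For context, a minimal sketch of the experimental inverted index as it exists in the 23.x releases (syntax and settings may change; the table is hypothetical):

```sql
SET allow_experimental_inverted_index = 1;

CREATE TABLE logs
(
    ts DateTime,
    message String,
    INDEX inv_idx (message) TYPE inverted(2) GRANULARITY 1
)
ENGINE = MergeTree
ORDER BY ts;

-- Token predicates can then be pruned by the index instead of a full scan:
SELECT count()
FROM logs
WHERE hasToken(message, 'timeout');
```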

@alexey-milovidov (Member, Author) commented:

Inverted indices have a lower priority than other things, and currently this work is not in progress.

@olly-writes-code commented:

Adding a request for more GIS features. One important one is the ability to transform a geometry to a specified SRID (see the Redshift docs here).

@fantom0005 commented:

Hello.
Could you please tell me when the Replicated database feature will be taken out of experimental status?

This feature is very convenient, and I would like to use it.

You use Replicated databases in ClickHouse Cloud. This tells me that you will not remove the feature and will bring it to production.

alexey-milovidov changed the title from "Roadmap 2023 (discussion)" to "Roadmap 2023" on Dec 5, 2023
@alexey-milovidov (Member, Author) commented:

@fantom0005, there are several remaining issues with the Replicated database engine:

  1. It always waits for all replicas, even if some of them are unavailable. This gives a false impression of downtime, when it would be ok to wait only for the available replicas.
  2. It does not support CREATE ... AS SELECT.
  3. DDL operations are run sequentially, and in the presence of many operations, the queue can grow too large.

We want to finish its implementation.
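
For reference, a minimal sketch of the experimental Replicated database engine as it can be enabled today; the database name and ZooKeeper path are placeholders:

```sql
SET allow_experimental_database_replicated = 1;

CREATE DATABASE analytics
ENGINE = Replicated('/clickhouse/databases/analytics', '{shard}', '{replica}');

-- DDL issued on one replica is replicated to all replicas of the database:
CREATE TABLE analytics.events
(
    id UInt64,
    ts DateTime
)
ENGINE = ReplicatedMergeTree
ORDER BY id;
```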

@alexey-milovidov (Member, Author) commented:

2024: #58392
