Clickhouse stable ABI and shared object build for hackings like chdb #50750

Draft. Wants to merge 1 commit into base: master

auxten
Member

@auxten auxten commented Jun 9, 2023

Changelog category (leave one):

  • New Feature

What's this

Hey folks, I'm the author of chdb, an embedded SQL engine proudly powered by ClickHouse.

It started from an idea in the ClickHouse Roadmap 2022.

I built it mainly in my spare time, with a lot of help from @lmangani @laodouya @nmreadelf @adubovikov @Dletta

Briefly, chdb does the following to get a basic version working:

  1. Use WriteBufferFromVector as the output buffer and pass the vector to Python through a Python memoryview, which minimizes data copying.
  2. Some dirty hacks on the ClickHouse build system to turn clickhouse-local into a shared object.
  3. Make jemalloc work properly in the shared library without affecting the Python runtime.
    1. Enable jemalloc in linux x86_64 build chdb-io/chdb#22
    2. Also a patch for jemalloc: Make arenas_lookup_ctl triable jemalloc/jemalloc#2424
  4. Query on Pandas DataFrames (needs further optimization): Query on pandas dataframe and query result chdb-io/chdb#36
  5. Python DB-API 2.0 support: Add Python DB API 2.0 driver chdb-io/chdb#35
  6. Minor fix for ClickHouse: Resize BufferFromVector underlying vector only pos_offset == vector.size() #50546
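
As a rough illustration of the first item, the zero-copy behaviour of a Python memoryview can be sketched in pure Python. This is not chdb's code: `result_buffer` is a hypothetical stand-in (a `bytearray`) for the C++ vector behind WriteBufferFromVector, and the CSV content is made up for the example.

```python
# Hypothetical stand-in for the engine's output buffer.
# In chdb the real buffer is a C++ vector; a bytearray is used here only
# because it also exposes the buffer protocol.
result_buffer = bytearray(b"id,name\n1,alice\n2,bob\n")

# Creating a memoryview does not copy the data: it shares the buffer's memory.
view = memoryview(result_buffer)

# Slicing a memoryview does not copy either.
first_line = view[:8]

# A same-length write to the underlying buffer is visible through the view,
# which shows that the memory is shared rather than duplicated.
result_buffer[0:2] = b"ID"

# An explicit copy happens only when the caller asks for one.
header = bytes(first_line)
print(header)
```

The point of the design is that the consumer decides when (and whether) to pay for a copy, for example only at the moment the bytes are decoded or handed to another library.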

chdb now has bindings for Python, Go, Rust, NodeJS, and Bun.

I think it's time to contribute something back to the ClickHouse community: https://twitter.com/auxten/status/1666468268008427522?s=20

Let's start from this PR :-)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

todo

@UnamedRus
Contributor

Just curious, did you look into other attempts at embedding ClickHouse into software? Maybe they have some good ideas.

@lmangani
Contributor

lmangani commented Jun 9, 2023

> Just curious, did you look into other attempts at embedding ClickHouse into software? Maybe they have some good ideas.

Could you list some relevant examples of any similar attempts?

@UnamedRus
Contributor

UnamedRus commented Jun 9, 2023

ClickHouse as a query engine for a big data platform (with a custom storage layer and table engine): https://github.com/ytsaurus/ytsaurus/tree/main/yt/chyt/server
https://ytsaurus.tech/docs/en/user-guide/data-processing/chyt/general

ClickHouse as a query engine for Spark SQL: https://github.com/oap-project/gluten/tree/main/backends-clickhouse

AFAIK, there were other attempts as well.

@alexey-milovidov
Member

I think a stable ABI is not the best option.

A better option is to include the code in the repository (either directly or as a submodule). It will then be built and tested with ClickHouse, so compatibility is ensured, while we remain free to break, rewrite, or remove certain interfaces in the code.

Advantages:

  1. No burden of introducing plain C ABI (and there is no such thing as stable C++ ABI).
  2. Fearless refactorings.
  3. The code is tested with sanitizers and fuzzing.
  4. Guaranteed compatibility with ClickHouse releases in the scope covered by tests.
  5. The scope of compatibility we have to support is explicitly defined and bounded.

Take LLVM as an example. There are some plain-C library wrappers, but ClickHouse uses LLVM directly. Upgrading LLVM versions can break the interfaces that we use, but that is fine for us because ClickHouse is linked statically with LLVM.

While chdb is a dynamic library, similar to a dynamically linked executable, the dynamic library itself can be linked statically with all the dependencies. Therefore there is no concern about ABI breakage of the dependent libraries.
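
For readers unfamiliar with what a plain C ABI boundary looks like from a binding's side, here is a minimal sketch using Python's `ctypes`. It calls `strlen` from the C standard library purely as a stand-in; the symbols that libchdb actually exports are not shown in this thread, so nothing below reflects chdb's real interface. It assumes a Unix-like system where the C library can be located.

```python
import ctypes
import ctypes.util

# Locate and load the C standard library. On some systems find_library
# returns None; CDLL(None) then falls back to the symbols already loaded
# into the current process (Unix-like systems only).
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

# The caller must restate each function's signature by hand. Nothing checks
# that this declaration matches the compiled library, which is why changing
# an exported signature silently breaks every binding built against it.
c_strlen = libc.strlen
c_strlen.argtypes = [ctypes.c_char_p]
c_strlen.restype = ctypes.c_size_t

length = c_strlen(b"chdb")
print(length)
```

Every exported function declared this way becomes something the library has to keep stable, which is the maintenance burden the first advantage above refers to.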

@ahmed-adly-khalil

this is neat work 👍

@lmangani
Contributor

Update for those interested: the current plan is to refactor the chdb library builder (libchdb) and merge it back into the main ClickHouse codebase, just like clickhouse-local, clickhouse-keeper and other apps. The resulting library will then be used to build chdb and its many bindings, with stable and tested hooks into future ClickHouse releases and new features, lowering if not completely removing innovation barriers. Thanks @alexey-milovidov for supporting this exciting community initiative!

If you like the idea or want to adopt a binding - please join in! This will be a community driven initiative and we'll need many eyes, hands and brains to make this as great as it should be!
