Is it possible to make greptimedb upon databend？ #1150

sundy-li · 2023-03-09T07:51:06Z

Hello, greptimedb members.

I noticed that greptimedb's query engine is powered by DataFusion, similar to influxdb-iox.

Databend has a high-performance computing engine based on the Arrow memory model. I saw that greptimedb depends on opendal, and that its type system and expressions were modeled on an older version of databend's.

So I am wondering if we can build the computation layer of greptimedb on top of it.

Advantages:

Databend has a high-performance computing engine with expression, function, and aggregate function support (see ClickBench for reference), so greptimedb can focus on optimizing for time-series scenarios.
Greptimedb can reuse the pipeline framework of databend.

Disadvantages:

Databend may need to split the computing engine out into a library for greptimedb to call.
Some code migration and refactoring work may be required in greptimedb.

waynexia (Maintainer) · 2023-03-09T08:15:20Z

Hi!

This is a very interesting idea, and the performance of databend is very impressive according to the report 🚀.

As you said, both databend and datafusion have memory models based on Arrow, so there is not much difference between us. But I'm not familiar with databend's execution engine, so I don't know how much it differs from datafusion in terms of functionality and data type support.

Also, although we have abstracted the execution engine, there are still many places where we are tied to DataFusion, so switching (or adding support for a second engine) would require a lot of work for both parties. But I'm very optimistic about this proposal. Let's keep communicating on the details ❤️!

MichaelScofield (Maintainer) · 2023-03-09T10:43:29Z

The query engine and pipeline framework in Databend look promising. If we were to adopt them, where should we start? @sundy-li

2 replies

sundy-li Mar 9, 2023
Author

We may need to separate the pipeline framework into a standalone crate.

MichaelScofield Mar 9, 2023
Maintainer

Looking forward to it!

evenyag (Maintainer) · 2023-03-09T13:25:34Z

Cool! I'm very excited about this. @sundy-li

GreptimeDB

There are some blockers that greptimedb needs to resolve or work around:

Its type system and some crates are built upon the official arrow instead of arrow2. But I'm happy to see that the arrow community is going to unify these two crates; see "[Proposal] Combination of arrow-rs and arrow2, deprecation of arrow2 repository" (jorgecarleitao/arrow2#1429).
Its PromQL extensions and script engine are tightly coupled with DataFusion.
Because of the first point, we must replace both the type system and the query engine, as the underlying arrow crates are incompatible.

Databend

There are also some points that databend might need to consider:

Is it possible to provide a way for users to define their own logical/physical plans (or even data types)?
greptimedb and databend have different catalog systems.
Is it possible to share the same distributed query framework? DataFusion doesn't provide distributed planning and execution support.
It'd be useful to allow users to attach information to the query context (during planning/execution) and access it via downcasting (more ergonomic than a HashMap<String, String>).
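
To make the last point concrete, here is a minimal sketch of a query context that stores typed values and retrieves them by downcasting. `QueryContext` and `TimeRangeHint` are hypothetical names invented for this illustration; neither project's actual context type is shown here.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

/// Hypothetical query context that stores at most one value per Rust type.
/// This is an illustration only, not databend's or greptimedb's real API.
#[derive(Default)]
pub struct QueryContext {
    extensions: HashMap<TypeId, Box<dyn Any + Send + Sync>>,
}

impl QueryContext {
    /// Attach a typed value; returns the previous value of the same type, if any.
    pub fn insert<T: Any + Send + Sync>(&mut self, value: T) -> Option<T> {
        self.extensions
            .insert(TypeId::of::<T>(), Box::new(value))
            .and_then(|old| old.downcast::<T>().ok())
            .map(|boxed| *boxed)
    }

    /// Look up a value by its type; the downcast is hidden from the caller.
    pub fn get<T: Any + Send + Sync>(&self) -> Option<&T> {
        self.extensions
            .get(&TypeId::of::<T>())
            .and_then(|boxed| (**boxed).downcast_ref::<T>())
    }
}

/// Hypothetical piece of information a planner might attach.
#[derive(Debug, PartialEq)]
pub struct TimeRangeHint {
    pub start_ms: i64,
    pub end_ms: i64,
}
```

A planner could call `ctx.insert(TimeRangeHint { start_ms: 0, end_ms: 1000 })` and an operator could later read `ctx.get::<TimeRangeHint>()`, getting a typed reference back instead of parsing strings out of a map.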

Share People's Efforts

I'm planning to redesign and refactor our type system soon, including:

decoupling logical data types from physical vector types, so we can use different vector implementations to represent the same logical type (e.g. use DictionaryArray or StringArray for the String type)
implementing more data types
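
The decoupling described above can be sketched roughly as follows. All names here (`LogicalType`, `StringVector`, `PlainStringVector`, `DictStringVector`, `count_equal`) are hypothetical stand-ins written for this illustration; they are not arrow's `StringArray` or `DictionaryArray`, and not greptimedb's actual design.

```rust
use std::collections::HashMap;

/// Logical type as a planner would see it (hypothetical, trimmed to one variant).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LogicalType {
    String,
}

/// Hypothetical physical-vector interface for string data.
pub trait StringVector {
    fn len(&self) -> usize;
    fn get(&self, index: usize) -> Option<&str>;
    /// Every implementation reports the same logical type.
    fn logical_type(&self) -> LogicalType {
        LogicalType::String
    }
}

/// Plain layout: one owned string per row.
pub struct PlainStringVector {
    values: Vec<String>,
}

impl PlainStringVector {
    pub fn from_values<'a>(values: impl IntoIterator<Item = &'a str>) -> Self {
        Self { values: values.into_iter().map(str::to_string).collect() }
    }
}

impl StringVector for PlainStringVector {
    fn len(&self) -> usize {
        self.values.len()
    }
    fn get(&self, index: usize) -> Option<&str> {
        self.values.get(index).map(String::as_str)
    }
}

/// Dictionary layout: each row stores a key into a table of distinct values.
pub struct DictStringVector {
    dictionary: Vec<String>,
    keys: Vec<u32>,
}

impl DictStringVector {
    pub fn from_values<'a>(values: impl IntoIterator<Item = &'a str>) -> Self {
        let mut dictionary = Vec::new();
        let mut positions: HashMap<&'a str, u32> = HashMap::new();
        let mut keys = Vec::new();
        for v in values {
            // Reuse the key of a value seen before, otherwise append it.
            let key = *positions.entry(v).or_insert_with(|| {
                dictionary.push(v.to_string());
                (dictionary.len() - 1) as u32
            });
            keys.push(key);
        }
        Self { dictionary, keys }
    }

    pub fn dictionary_len(&self) -> usize {
        self.dictionary.len()
    }
}

impl StringVector for DictStringVector {
    fn len(&self) -> usize {
        self.keys.len()
    }
    fn get(&self, index: usize) -> Option<&str> {
        let key = *self.keys.get(index)? as usize;
        self.dictionary.get(key).map(String::as_str)
    }
}

/// Code written against the logical type works with either layout.
pub fn count_equal(vector: &dyn StringVector, needle: &str) -> usize {
    (0..vector.len()).filter(|&i| vector.get(i) == Some(needle)).count()
}
```

The point of the sketch is that `count_equal` never learns which physical layout it was given, which is what lets an engine pick the dictionary form for low-cardinality columns without touching the code built on top.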

I just realized that there might be other projects repeating the same work:

some might need a simple Value/DataValue wrapper
some might need a Vector/Column
some might also need an expression evaluation framework

Could we implement something like query engine building blocks, such as the type system, expression framework, and execution framework (like Velox)? Then we could combine people's efforts and eliminate some duplicated work. DataFusion tries to solve similar problems, but it's highly coupled with the official arrow's API.
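
As a rough illustration of how small such building blocks can start out, the sketch below combines a `Value` wrapper, a `Column`, and a tiny expression evaluator. Every name in it is hypothetical and chosen for this example; it does not reflect Velox's, DataFusion's, databend's, or greptimedb's interfaces.

```rust
/// Hypothetical scalar wrapper shared by all components.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Int64(i64),
    Utf8(String),
}

/// Hypothetical named column of values.
pub struct Column {
    pub name: String,
    pub values: Vec<Value>,
}

/// Hypothetical expression tree with a single binary operator.
pub enum Expr {
    Column(String),
    Literal(Value),
    Add(Box<Expr>, Box<Expr>),
}

/// Row-level addition: NULL propagates, and only Int64 + Int64 is defined.
fn add(left: &Value, right: &Value) -> Result<Value, String> {
    match (left, right) {
        (Value::Null, _) | (_, Value::Null) => Ok(Value::Null),
        (Value::Int64(a), Value::Int64(b)) => a
            .checked_add(*b)
            .map(Value::Int64)
            .ok_or_else(|| "int64 overflow".to_string()),
        _ => Err("add is only defined for Int64 operands".to_string()),
    }
}

impl Expr {
    /// Evaluate the expression over `num_rows` rows of the given columns.
    pub fn eval(&self, columns: &[Column], num_rows: usize) -> Result<Vec<Value>, String> {
        match self {
            Expr::Column(name) => columns
                .iter()
                .find(|c| &c.name == name)
                .map(|c| c.values.clone())
                .ok_or_else(|| format!("unknown column: {}", name)),
            // A literal is broadcast to one value per row.
            Expr::Literal(value) => Ok(vec![value.clone(); num_rows]),
            Expr::Add(left, right) => {
                let left = left.eval(columns, num_rows)?;
                let right = right.eval(columns, num_rows)?;
                left.iter().zip(right.iter()).map(|(a, b)| add(a, b)).collect()
            }
        }
    }
}
```

A shared library would of course evaluate over typed vectors rather than `Vec<Value>`; the sketch only shows how the three pieces named above (value, column, expression) fit together behind one small interface.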

Conclusion

It'd be great if users can have more choices. Maybe extracting databend's type system into an independent library would be a good starting point. I can also help bridge arrow and arrow2.

1 reply

sundy-li Mar 10, 2023
Author

> I can also help bridge arrow and arrow2.

We are going to move back to arrow-rs. Stay tuned.

> I'm going to re-design and refactor our type system

That's a good way to go; we spent nearly 6 months refactoring our type system into a new one.
