Release v0.3 #153

Closed
waynexia opened this issue Jul 27, 2022 · 7 comments
Labels
tracking issue (Issue tracks progress for something)
Comments

@waynexia
Member

Description

We plan to release v0.3 at the end of August. Here is the feature list:

  • Release multi-language clients, including Java, Rust, and Python.
  • Support static cluster mode, and keep pushing toward a full-featured dynamic distributed version (related project: Distributed CeresDB).
  • Extend the supported SQL (tag: A-SQL).
  • Implement the hybrid storage format, and support reading from both formats.

Feel free to suggest or discuss other features you would like to add ❤️

@waynexia waynexia added the tracking issue label Jul 27, 2022
@waynexia waynexia pinned this issue Jul 27, 2022
@dust1
Contributor

dust1 commented Aug 1, 2022

Will CeresDB support multiple data sources? For example, reading records from MySQL's redo log and converting them into CeresDB's storage format.

@jiacai2050
Contributor

Will ceresdb support multiple data sources?

This sounds like data ingestion. Do you mean bulk load?

@dust1
Contributor

dust1 commented Aug 1, 2022

Will ceresdb support multiple data sources?

This sounds like data ingestion. Do you mean bulk load?

Yes, I mean that CeresDB could import data from the files of other existing commercial databases. I don't know much about this, so I'm not sure of the terminology.

@jiacai2050
Contributor

jiacai2050 commented Aug 1, 2022

I think bulk ingest is an important feature for easy adoption. Prometheus and InfluxDB both support this, and so will we.

@waynexia
Member Author

waynexia commented Aug 2, 2022

This might cover three scenarios. Let's narrow our discussion:

  • For offline data migration, our persistent format is relatively straightforward: only a small amount of metadata plus data in the Parquet format, all stored in OSS. We can achieve this in a few ways. For common formats such as CSV or standard Parquet files generated by other systems, we can also support them directly.
  • Online data ingestion, on the other hand, would be a little more complicated. We may need to add support for consuming data from streaming systems like Kafka, Flink, Pulsar, or others. They have rich ecosystems. By supporting them, CeresDB can easily be integrated into various systems as a downstream warehouse.
  • The last one is querying from other databases. This may be a little off-topic, but let me mention it as well. CeresDB is only a query frontend in this situation. I can imagine there are other projects that already do this, so I'll assign it a low priority.

Offline migration implementations differ case by case. We can support the needed upstream sources on demand. Online ingestion also has a few candidate upstreams, but I believe there is a common pattern among them. We can choose one to support first if we decide to work on this. It can take a lot of effort, and we need to discuss it further.

@archerny
Contributor

archerny commented Aug 7, 2022

Thanks for the summary, @waynexia. I will add some comments on these scenarios.

  1. For data migration or data initialization from an external data source, there could be some tools. But as far as I know, demand for this scenario is not very frequent. This feature can be implemented as an independent binary, like the tools in the MySQL ecosystem. We can discuss this feature later.
  2. Online data ingestion is a much more complex topic. If we start working on this, we should consider latency, consistency, transformation, and other aspects of real-time computing. These requirements are commonly implemented using stream-computing frameworks like Apache Flink. So, in my opinion, the CeresDB project should focus more on the core features of a time-series database itself.
  3. For the scenario of querying from other databases, there is a better choice: Presto. So we will not work in this direction.

@jiacai2050 jiacai2050 added this to the Release v0.3 milestone Aug 29, 2022
@jiacai2050
Contributor

jiacai2050 commented Aug 29, 2022


4 participants