[DRAFT] feat: Add Apache Doris storage plugin #13271
This commit introduces a new storage plugin for Apache Doris. The plugin allows SkyWalking to use Apache Doris as its backend storage solution. Key features include:

- **Configuration:** You can configure Doris connection parameters (host, port, user, password, database) in the `application.yml`.
- **Client:** A `DorisClient` using the MySQL JDBC driver (due to Doris's MySQL compatibility) handles communication with the Doris cluster.
- **DAOs:** Initial implementations for core Data Access Objects (`StorageDAO`, `IMetricsQueryDAO`, `ITraceQueryDAO`) are provided to handle writing and reading data. Other DAOs will need further implementation.
- **Plugin Registration:** The plugin is registered via Java's Service Provider Interface (SPI).
- **Unit Tests:** Basic unit tests for the `DorisClient` and `DorisMetricsQueryDAO` are included, focusing on connection logic and SQL query generation.
- **Documentation:** A new document (`backend-doris-storage.md`) explains how to configure and use the Doris storage plugin.

This plugin provides a new storage option for SkyWalking users, leveraging the capabilities of Apache Doris. Further work will be needed to fully implement all DAO methods and conduct comprehensive integration testing.
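To illustrate the Configuration and Client items above, here is a minimal sketch of how the configured host, port, and database could be turned into a MySQL-protocol JDBC URL. `DorisJdbcUrl` and its `build` method are hypothetical names for this illustration and are not taken from the PR; the default of 9030 assumes a stock Doris frontend query port.

```java
import java.util.Objects;

/** Hypothetical helper for illustration only; not part of the PR. */
final class DorisJdbcUrl {
    /** Assumed default MySQL-protocol query port of a Doris frontend. */
    static final int DEFAULT_QUERY_PORT = 9030;

    private DorisJdbcUrl() {
    }

    /** Builds a MySQL-style JDBC URL from the connection settings. */
    static String build(String host, int port, String database) {
        Objects.requireNonNull(host, "host");
        Objects.requireNonNull(database, "database");
        if (port <= 0 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return "jdbc:mysql://" + host + ":" + port + "/" + database;
    }
}
```

The resulting string, together with the configured user and password, could then be handed to `java.sql.DriverManager.getConnection(url, user, password)`, provided a MySQL JDBC driver is on the classpath.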
Hi, first of all, welcome and thank you for the contribution. As this is a very heavy plugin for the maintainer team to take on, we require a SWIP for review.
About the code, I can clearly see many issues. SkyWalking is currently used by many top-level companies, so we are cautious about accepting this kind of feature. Please be patient; we will review your SWIP and will ask a lot of questions about your solution. Please provide as many details and numbers as possible.
…ests

This commit significantly advances the Apache Doris storage plugin for SkyWalking. Key changes include:

- Core DAO Implementations:
  - `DorisStorageDAO`: Refactored to correctly act as a DAO factory.
  - `DorisDAOUtils`: Introduced for generic JDBC operations.
  - `DorisMetricsDAO`: Implemented for writing metrics.
  - `DorisRecordDAO`: Implemented for writing records (segments, logs, alarms).
  - `DorisBatchDAO`: Implemented for batch database operations.
  - `DorisHistoryDeleteDAO`: Implemented for deleting historical data.
  - `DorisMetricsQueryDAO`: Full implementation for querying metrics.
  - `DorisTraceQueryDAO`: Full implementation for querying traces.
- Table Schemas:
  - Defined DDL scripts in `doris_schema.sql` for essential tables (metrics_all, segment, log_record, alarm_record).
  - Documentation updated to reference these schemas.
- Integration Tests:
  - Established `DorisStoragePluginITBase` using Testcontainers with an Apache Doris Docker image (`apache/doris:2.0.3`).
  - `DorisMetricsQueryAndWriteIT`: Tests the write/read cycle for metrics.
  - `DorisTraceQueryAndWriteIT`: Tests the write/read cycle for trace segments, including basic trace brief queries.
- Unit Tests & Documentation:
  - Existing unit tests for client and SQL generation.
  - Documentation for configuring and using the Doris plugin.

This provides a foundational, testable implementation of the Doris storage plugin, covering primary data paths for metrics and traces. Further work would involve implementing more specialized DAOs and expanding test coverage.
Hi @xiongyan, nice work! Maybe I can help to improve this PR.
As mentioned, we need your SWIP to be reviewed, covering design, benchmarks, and use cases.
Before starting the SkyWalking OAP server with the Doris storage plugin enabled, you must create the necessary tables and schemas in your Doris database. SkyWalking does not automatically create these tables in Doris.

The Data Definition Language (DDL) scripts for creating these tables are provided in the SkyWalking distribution under the Doris storage plugin directory:
`oap-server/server-storage-plugin/storage-doris-plugin/src/main/resources/doris_schema.sql`.
A static schema script doesn't work. SkyWalking metrics, traces, and logs have an internal ORM for the different storage options.
Schemas are created dynamically.
So the plugin needs to implement the internal ORM.
All storage features are required to be implemented, including this one. Otherwise, when SkyWalking is extended through OAL, MAL, or LAL, this storage fails, and users have to read the code and add tables on their own.
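To make the point about dynamic schemas concrete, here is a minimal sketch of deriving a Doris `CREATE TABLE` statement from a column list at runtime instead of shipping a static script. `DorisDdlSketch` and `Column` are hypothetical names for this illustration and do not represent SkyWalking's actual model or installer classes. The `DUPLICATE KEY` model, hash distribution on the first column, and the bucket count of 10 are assumptions chosen only to produce a syntactically plausible statement.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: builds Doris DDL from a runtime column list. */
final class DorisDdlSketch {
    /** Assumed bucket count; a real plugin would make this configurable. */
    static final int BUCKETS = 10;

    /** Hypothetical column description: a name plus a Doris SQL type. */
    static final class Column {
        final String name;
        final String sqlType;

        Column(String name, String sqlType) {
            this.name = name;
            this.sqlType = sqlType;
        }
    }

    private DorisDdlSketch() {
    }

    /** The first column is used as both the key and the distribution column. */
    static String createTable(String table, List<Column> columns) {
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("at least one column is required");
        }
        List<String> defs = new ArrayList<>();
        for (Column c : columns) {
            defs.add("`" + c.name + "` " + c.sqlType);
        }
        String key = columns.get(0).name;
        return "CREATE TABLE IF NOT EXISTS `" + table + "` ("
            + String.join(", ", defs) + ")"
            + " DUPLICATE KEY(`" + key + "`)"
            + " DISTRIBUTED BY HASH(`" + key + "`) BUCKETS " + BUCKETS;
    }
}
```

With something along these lines, a model added later through OAL, MAL, or LAL would get its table generated from its column metadata at startup, so users would not have to maintain `doris_schema.sql` by hand.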
Hi @xiongyan @wu-sheng, glad to see this PR and discussion. I'm a PMC member of the Apache Doris community and focus on Doris for observability. I'd like to be involved and am glad to help. Apache Doris is a modern OLAP database for real-time analytics. It delivers lightning-fast analytics on real-time data at scale. Doris is popular and its community is very active, with 13.4k GitHub stars, 676 contributors, and more than 5000 enterprise users. One of its major use cases is logging and observability. It's already in the OpenTelemetry ecosystem as a storage backend. Doris provides the fast full-text search of Elasticsearch while keeping costs low and aggregation performance high. Some of the outstanding features of Doris for observability are outlined here:
Hi, maybe Doris is optimal, though the plugin needs to implement all the interfaces. I will try to do it.
Thank you for the general introduction to the project. Regarding use cases, Doris has a large number of end users; I have no doubt about that. My question is purely about this feature, using Doris as the SkyWalking storage database. This is critical to how we need to review, test, and benchmark this feature. As this is only the beginning, please be patient; we need to do this slowly and clearly.
Hi @xiongyan, this is my email
Here's a breakdown of how I would approach answering these questions, though it's important to note that providing definitive benchmarks and real use cases would require significant effort, including setting up test environments and potentially gathering data from actual deployments:
Current Implementation: The current plugin I created uses Doris as a generic JDBC backend. This means it relies on standard SQL table structures (CREATE TABLE, standard data types like VARCHAR, INT, BIGINT) defined in the JDBCTableInstaller and its parent classes. Data is mapped to these relational tables.
Traces: This is a significant undertaking. To provide meaningful benchmarks, I would need to:
Unified Analytics + Observability: Companies already using Doris for Business Intelligence (BI) or data warehousing could consolidate their observability data (traces, metrics, logs from SkyWalking) into the same Doris cluster. This allows them to:
Please note, in the SWIP, we will need to know how your theoretical points work in the SkyWalking storage plugin, and later in the code (PR). Here are some previous posts related to new storage implementations; please read:
@xiongyan In the use case section of the SWIP, please be clear about which company is driving you to write this and is going to use it. This is crucial to ensure the SkyWalking community's confidence in the production readiness of this feature.
public IEventQueryDAO newEventQueryDAO() {
    LOGGER.warn("newEventQueryDAO not implemented yet, returning null.");
    return null;
}

public IBrowserLogQueryDAO newBrowserLogQueryDAO() {
    LOGGER.warn("newBrowserLogQueryDAO not implemented yet, returning null.");
    return null;
}

public ISpanAttachedEventQueryDAO newSpanAttachedEventQueryDAO() {
    LOGGER.warn("newSpanAttachedEventQueryDAO not implemented yet, returning null.");
    return null;
}

public IZipkinQueryDAO newZipkinQueryDAO() {
    LOGGER.warn("newZipkinQueryDAO not implemented yet, returning null.");
    return null;
}

public ITagAutoCompleteQueryDAO newTagAutoCompleteQueryDAO() {
    LOGGER.warn("newTagAutoCompleteQueryDAO not implemented yet, returning null.");
    return null;
}
All of these return null, so this PR is not ready. I am going to move it to DRAFT and wait for your update.
Any update here or from the Doris community? I haven't seen any update for two weeks.
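While the PR stays in draft, one possible interim alternative to returning null is to fail fast, so that a missing DAO surfaces at the call site instead of as a later `NullPointerException`. This is only a sketch and not what the PR or the reviewer proposes; the review still asks for full implementations. `FailFastFactorySketch` and the nested `EventQueryDao` interface are hypothetical stand-ins so the example compiles on its own.

```java
/** Hypothetical sketch of a fail-fast factory method; not part of the PR. */
final class FailFastFactorySketch {
    /** Stand-in for a query DAO interface such as IEventQueryDAO. */
    interface EventQueryDao {
    }

    /** Throws instead of returning null for a DAO that does not exist yet. */
    EventQueryDao newEventQueryDao() {
        throw new UnsupportedOperationException(
            "newEventQueryDAO is not implemented by the Doris storage plugin yet");
    }
}
```

The trade-off is that any caller that requests this DAO would stop with an explicit error rather than continue with a null reference, which makes the gap obvious during review and testing.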