[feat](catalog)Support OneLake #57933
run buildall
TPC-H: Total hot run time: 34220 ms
TPC-DS: Total hot run time: 188940 ms
ClickBench: Total hot run time: 27.38 s

run buildall
TPC-H: Total hot run time: 35029 ms
TPC-DS: Total hot run time: 187980 ms
ClickBench: Total hot run time: 27.42 s

run buildall

PR approved by at least one committer and no changes requested.

PR approved by anyone and no changes requested.

TPC-H: Total hot run time: 34587 ms
TPC-DS: Total hot run time: 187671 ms
ClickBench: Total hot run time: 27.48 s

FE UT Coverage Report: Increment line coverage
FE Regression Coverage Report: Increment line coverage
hubgeter left a comment:
LGTM
Microsoft OneLake, part of the Microsoft Fabric ecosystem, is a unified, open SaaS data lake that gives an organization a single, centralized logical storage layer. Data in OneLake is stored as Parquet files, and the same tables can carry both Delta Lake and Apache Iceberg metadata at once. As a result, multiple analytics engines can read the same datasets directly, without duplicating or migrating data, which simplifies data management and governance.

Through its Iceberg REST Catalog support, Apache Doris can query and analyze data stored in OneLake directly, again without copying or moving it. This lets users build end-to-end analytics pipelines on a single data lake, combining OneLake's unified storage and governance with Doris's high-performance analytical compute.

Technically, Doris accesses OneLake through open table formats and standard interfaces: it reads table metadata from the OneLake Iceberg REST endpoint (`https://onelake.table.fabric.microsoft.com/iceberg`) and reads the Parquet data files from the OneLake storage endpoint (`https://onelake.dfs.fabric.microsoft.com`). Both connections authenticate with OAuth2 client credentials issued by Microsoft Entra ID, so OneLake's centralized governance, access control, and security policies remain in effect, keeping the platform reliable, scalable, and maintainable.

Example catalog definition:

```sql
-- Replace the <...> placeholders with your own values:
--   <tenant_id>            Microsoft Entra ID tenant ID
--   <workspace_id>         Fabric workspace ID
--   <data_item_id>         ID of the data item (for example, a lakehouse) in that workspace
--   <oauth2.client_id>     client ID of the Entra ID application
--   <oauth2.client_secret> client secret of the Entra ID application
CREATE CATALOG onelake_doris PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'rest',
    -- Iceberg REST endpoint that serves OneLake table metadata
    'uri' = 'https://onelake.table.fabric.microsoft.com/iceberg',
    'warehouse' = '<workspace_id>/<data_item_id>',
    -- OAuth2 client-credentials authentication for the REST catalog
    'iceberg.rest.security.type' = 'oauth2',
    'iceberg.rest.oauth2.server-uri' = 'https://login.microsoftonline.com/<tenant_id>/oauth2/v2.0/token',
    'iceberg.rest.oauth2.credential' = '<oauth2.client_id>:<oauth2.client_secret>',
    'iceberg.rest.oauth2.scope' = 'https://storage.azure.com/.default',
    -- OneLake (Azure DFS) storage access for reading the Parquet data files
    'fs.azure.support' = 'true',
    'azure.endpoint' = 'https://onelake.dfs.fabric.microsoft.com',
    'azure.auth_type' = 'OAuth2',
    'azure.oauth2_account_host' = 'onelake.dfs.fabric.microsoft.com',
    'azure.oauth2_server_uri' = 'https://login.microsoftonline.com/<tenant_id>/oauth2/v2.0/token',
    'azure.oauth2_client_id' = '<oauth2.client_id>',
    'azure.oauth2_client_secret' = '<oauth2.client_secret>'
);
```
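Once the `onelake_doris` catalog exists, OneLake tables can be browsed and queried like those in any other Doris external catalog. The following is a minimal sketch, not part of the original PR: `<schema_name>` and `<table_name>` are hypothetical placeholders for an Iceberg namespace and a table in the configured data item, while the statements themselves are standard Doris multi-catalog SQL.

```sql
-- Make the OneLake catalog the current catalog for this session.
SWITCH onelake_doris;

-- List the Iceberg namespaces, which Doris shows as databases.
SHOW DATABASES;

-- Enter a namespace (placeholder name) and list its tables.
USE <schema_name>;
SHOW TABLES;

-- Query a table in place; the catalog.database.table form works from any current catalog.
SELECT * FROM onelake_doris.<schema_name>.<table_name> LIMIT 10;
```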