diff --git a/site/content/blog/2025/10/21/starrocks-polaris-integration.md b/site/content/blog/2025/10/21/starrocks-polaris-integration.md new file mode 100644 index 0000000000..3b7e4edd5b --- /dev/null +++ b/site/content/blog/2025/10/21/starrocks-polaris-integration.md @@ -0,0 +1,324 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +title: "StarRocks and Apache Polaris Integration: Building a Unified, High-Performance Data Lakehouse" +date: 2025-10-21 +author: Wayne Yang +--- +## Introduction: Why StarRocks + Apache Polaris? + +### Modern Data-Architecture Pain Points: Silos & Engine Lock-in +Today’s data-driven enterprises face two chronic ailments: + +* **Data silos** : transactional data sits in RDBMS, click-streams in an S3 data lake, CRM data in a SaaS vault, etc. Cross-domain analysis requires expensive ETL and still arrives stale. + +* **Engine lock-in** : every OLAP engine optimises for its own metadata layer and file layout. Migrating to a faster or cheaper engine means re-formatting, re-writing and re-governing years of data. + +An open, interoperable architecture is no longer a luxury, it is a survival requirement. + +### StarRocks at a Glance: Speed & Simplicity +[StarRocks](https://www.starrocks.io/) is a modern MPP database designed for high-performance analytics and real-time data processing. Key features include: + +* MPP, fully-vectorised execution, CBO that thrives on complex multi-table joins. + +* Sub-second response on TB-scale data without pre-aggregation. + +* Compute-storage separation since v3.0: scale stateless compute pods in seconds, keep data in cheap object storage. + +### Apache Polaris (Incubating): Vendor-Neutral Iceberg Catalog +* 100% open-source implementation of the Iceberg REST Catalog API. + +* Pluggable metadata backend (PostgreSQL, in-mem) and multi-cloud storage support. + +* Fine-grained RBAC + credential-vending (temporary STS tokens) for secure, governed sharing across engines. + +### Technical benefits +* Keep **ONE** copy of data in Iceberg on S3, queryable by Spark, Flink, Trino, and StarRocks concurrently. + +* Maintain **ONE** set of role-based permissions in Polaris that apply to every engine. + +* Utilize **StarRocks** to deliver BI dashboards, conduct ad-hoc exploration, and perform lightweight ETL operations on the same up-to-date dataset, zero data movement and zero vendor lock-in. + +## Architecture +![](/img/blog/2025/10/21/fig1-polaris-starrocks-architecture.png) + +**Polaris** acts as the single source of truth for: + +* Table schema, partitions, snapshots + +* Role-based access control (RBAC) + +* Short-lived, scoped cloud credentials (credential vending) + +**StarRocks** acts as a stateless compute layer: + +* Discover Iceberg metadata via REST calls + +* Directly reads Parquet/ORC files from cloud storage using the vendored credentials + +* Applies its own CBO and vectorised execution for query acceleration + +## Deploy and Configure Polaris + +User can refer to [Polaris Quickstart](https://polaris.apache.org/releases/1.1.0/getting-started/) to deploy Polaris, +Here we will compile from source code and deploy Polaris via a standalone process. + +### Clone Source Code and Start Polaris + +1. Clone source code and checkout to released version + +User can get the latest released version from https://github.com/apache/polaris/releases +```bash +# download the latest released version 1.1.0-incubating +wget https://dlcdn.apache.org/incubator/polaris/1.1.0-incubating/apache-polaris-1.1.0-incubating.tar.gz +tar -xvzf apache-polaris-1.1.0-incubating.tar.gz +cd apache-polaris-1.1.0-incubating +``` + +2. Build Polaris +```bash +./gradlew \ + :polaris-server:assemble \ + :polaris-server:quarkusAppPartsBuild --rerun \ + :polaris-admin:assemble \ + :polaris-admin:quarkusAppPartsBuild --rerun +``` + +3. Run Polaris + +Ensure you have Java 21+, and export aws access key and secret key first. +```bash +export AWS_ACCESS_KEY_ID= +export AWS_SECRET_ACCESS_KEY= + +./gradlew run +``` + +When Polaris is run using the./gradlew run command, the root principal credentials are **root** and **s3cr3t** for the **CLIENT_ID** and **CLIENT_SECRET**, respectively. + +When using a Gradle-launched Polaris instance, it’ll launch an instance of Polaris that stores entities only **in-memory**. This means that any entities that you define will be destroyed when Polaris is shut down. + +We suggest that users refer to this section (https://polaris.apache.org/releases/1.1.0/metastores/) to configure the metastore to persist Polaris entities. + +```bash +export POLARIS_PERSISTENCE_TYPE=relational-jdbc +export QUARKUS_DATASOURCE_USERNAME= +export QUARKUS_DATASOURCE_PASSWORD= +export QUARKUS_DATASOURCE_JDBC_URL= + +export AWS_ACCESS_KEY_ID= +export AWS_SECRET_ACCESS_KEY= + +./gradlew run +``` + +Using **Admin Tool** (https://polaris.apache.org/releases/1.1.0/admin-tool/) to bootstrap realms and create the necessary principal credentials for the Polaris server. + +For example, to bootstrap the POLARIS realm and create its root principal credential with the client ID **root** and client secret **root_secret**, you can run the following command: + +```bash +java -jar runtime/admin/build/quarkus-app/quarkus-run.jar bootstrap -r POLARIS -c POLARIS,root,root_secret +``` + +### Creating a Principal and Assigning it Privileges + +Use the **Polaris CLI** (already built in the same folder): + +1. Export CLIENT_ID and CLIENT_SECRET +```bash +export CLIENT_ID=root +export CLIENT_SECRET=root_secret +``` + +2. Create catalog +```bash +./polaris + --client-id ${CLIENT_ID} \ + --client-secret ${CLIENT_SECRET} \ + catalogs create \ + --storage-type s3 \ + --default-base-location ${DEFAULT_BASE_LOCATION} \ + --role-arn ${ROLE_ARN} \ + polaris_catalog +``` + +The DEFAULT_BASE_LOCATION you provide will be the default location that objects in this catalog should be stored in, and the ROLE_ARN you provide should be a Role ARN with access to read and write data in that location. +These credentials will be provided to engines reading data from the catalog once they have authenticated with Polaris using credentials that have access to those resources. + +3. Creating a Principal and Assigning it Privileges + +Use below commands to create principal, principal role and catalog role +```bash +./polaris \ + --client-id ${CLIENT_ID} \ + --client-secret ${CLIENT_SECRET} \ + principals \ + create \ + jack + +./polaris \ + --client-id ${CLIENT_ID} \ + --client-secret ${CLIENT_SECRET} \ + principal-roles \ + create \ + test_user_role + +./polaris \ + --client-id ${CLIENT_ID} \ + --client-secret ${CLIENT_SECRET} \ + catalog-roles \ + create \ + --catalog polaris_catalog \ + test_catalog_role +``` + +When the **principals create** commands successfully, it will return the credentials for this new principal,save it. + +Use below command to grant privileges + +```bash +./polaris \ + --client-id ${CLIENT_ID} \ + --client-secret ${CLIENT_SECRET} \ + principal-roles \ + grant \ + --principal jack \ + test_user_role + +./polaris \ + --client-id ${CLIENT_ID} \ + --client-secret ${CLIENT_SECRET} \ + catalog-roles \ + grant \ + --catalog polaris_catalog \ + --principal-role test_user_role \ + test_catalog_role + + ./polaris \ + --client-id ${CLIENT_ID} \ + --client-secret ${CLIENT_SECRET} \ + privileges \ + catalog \ + grant \ + --catalog polaris_catalog \ + --catalog-role test_catalog_role \ + CATALOG_MANAGE_CONTENT +``` + +We grant **CATALOG_MANAGE_CONTENT** privilege to the catalog role `test_catalog_role`, and assign the principal role `test_user_role` to principal `jack`, then assign the catalog role `test_catalog_role` to principal role `test_user_role`. + +## Configure StarRocks Iceberg Catalog + +First, you need to have a StarRocks cluster up and running. Please refer to the [StarRocks Quick Start Guide](https://docs.starrocks.io/docs/quick_start) for instructions on setting up a StarRocks cluster. +Then you can create an external Iceberg catalog in StarRocks that connects to Polaris. + +### Create External Catalog +1. Use credentials vending + +It's recommended to use Polaris's credential vending feature to enhance security by avoiding long-lived static credentials. + +Here is an example of creating an external catalog in StarRocks that connects to Polaris using credential vending: + +```sql +CREATE EXTERNAL CATALOG polaris_catalog +PROPERTIES ( + "iceberg.catalog.uri" = "http://:/api/catalog", + "type" = "iceberg", + "iceberg.catalog.type" = "rest", + "iceberg.catalog.warehouse" = "polaris_catalog", + "iceberg.catalog.security" = "oauth2", + "iceberg.catalog.oauth2.credential" = ":", + "iceberg.catalog.oauth2.scope"='PRINCIPAL_ROLE:ALL', + "aws.s3.region" = "us-west-2", + "iceberg.catalog.vended-credentials-enabled" = "true" + ); +``` +We use jack's credential(client_id and client_secret) created above to access polaris_catalog. + +Users will have the permissions of user jack when accessing Iceberg tables in polaris_catalog. + +2. Use S3 storage credentials + +If you prefer to use static S3 credentials instead of credential vending, you can create the external catalog in StarRocks as follows: +```sql +CREATE EXTERNAL CATALOG polaris_catalog +PROPERTIES ( + "iceberg.catalog.uri" = "http://:/api/catalog", + "type" = "iceberg", + "iceberg.catalog.type" = "rest", + "iceberg.catalog.warehouse" = "polaris_catalog", + "iceberg.catalog.security" = "oauth2", + "iceberg.catalog.oauth2.credential" = ":", + "iceberg.catalog.oauth2.scope"='PRINCIPAL_ROLE:ALL', + "aws.s3.region" = "us-west-2", + "aws.s3.access_key" = "", + "aws.s3.secret_key" = "", + "iceberg.catalog.vended-credentials-enabled" = "false" + ); +``` + +### Manage iceberg tables through StarRocks +Connect to StarRocks and run the following commands to create and query an Iceberg table through the Polaris catalog: + +```sql +-- switch to external iceberg catalog +StarRocks>set catalog polaris_catalog; +Query OK, 0 rows affected (0.00 sec) + +-- create database +StarRocks>create database polaris_db; +Query OK, 0 rows affected (0.06 sec) + +StarRocks>use polaris_db; +Database changed + +-- create iceberg table taxis +StarRocks>CREATE TABLE taxis + -> ( + -> trip_id bigint, + -> trip_distance float, + -> fare_amount double, + -> store_and_fwd_flag string, + -> vendor_id bigint + -> ) + -> PARTITION BY (vendor_id); +Query OK, 0 rows affected + +-- insert data +StarRocks>INSERT INTO taxis + -> VALUES (1000371, 1.8, 15.32, 'N', 1), (1000372, 2.5, 22.15, 'N', 2), (1000373, 0.9, 9.01, 'N', 2), (1000374, 8.4, 42.13, 'Y', 1); +Query OK, 4 rows affected + +-- query iceberg table +StarRocks>select * from taxis; ++---------+---------------+-------------+--------------------+-----------+ +| trip_id | trip_distance | fare_amount | store_and_fwd_flag | vendor_id | ++---------+---------------+-------------+--------------------+-----------+ +| 1000372 | 2.5 | 22.15 | N | 2 | +| 1000373 | 0.9 | 9.01 | N | 2 | +| 1000371 | 1.8 | 15.32 | N | 1 | +| 1000374 | 8.4 | 42.13 | Y | 1 | ++---------+---------------+-------------+--------------------+-----------+ +4 rows in set + +``` + +For more information about using Iceberg table with StarRocks, please refer to: + +https://docs.starrocks.io/docs/data_source/catalog/iceberg/iceberg_catalog \ No newline at end of file diff --git a/site/static/img/blog/2025/10/21/fig1-polaris-starrocks-architecture.png b/site/static/img/blog/2025/10/21/fig1-polaris-starrocks-architecture.png new file mode 100644 index 0000000000..5444269744 Binary files /dev/null and b/site/static/img/blog/2025/10/21/fig1-polaris-starrocks-architecture.png differ