diff --git a/site/content/in-dev/unreleased/getting-started/creating-a-catalog/s3/catalog-aws.md b/site/content/in-dev/unreleased/getting-started/creating-a-catalog/s3/catalog-aws.md index 5da41de0f5..b86ac874f8 100644 --- a/site/content/in-dev/unreleased/getting-started/creating-a-catalog/s3/catalog-aws.md +++ b/site/content/in-dev/unreleased/getting-started/creating-a-catalog/s3/catalog-aws.md @@ -23,16 +23,21 @@ type: docs weight: 100 --- +When creating a catalog based on AWS S3 storage only the `role-arn` is a required parameter. However, usually +one also provides the `region` and +[external-id](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_common-scenarios_third-party.html). -### example +Note: the name `quickstart_catalog` from the example below is referenced in other Getting Started examples, +but of course, it can be any valid catalog name. ```shell -CLIENT_ID=root \ -CLIENT_SECRET=s3cr3t \ -DEFAULT_BASE_LOCATION=s3://example-bucket/my_data \ -ROLE_ARN=arn:aws:iam::111122223333:role/ExampleCorpRole \ -REGION=us-west-2 \ -EXTERNAL_ID=12345678901234567890 \ +CLIENT_ID=root +CLIENT_SECRET=s3cr3t +DEFAULT_BASE_LOCATION=s3://example-bucket/my_data +ROLE_ARN=arn:aws:iam::111122223333:role/ExampleCorpRole +REGION=us-west-2 +EXTERNAL_ID=12345678901234567890 + ./polaris \ --client-id ${CLIENT_ID} \ --client-secret ${CLIENT_SECRET} \ @@ -43,5 +48,5 @@ EXTERNAL_ID=12345678901234567890 \ --role-arn ${ROLE_ARN} \ --region ${REGION} \ --external-id ${EXTERNAL_ID} \ - my_aws_catalog + quickstart_catalog ``` \ No newline at end of file diff --git a/site/content/in-dev/unreleased/getting-started/creating-a-catalog/s3/catalog-minio.md b/site/content/in-dev/unreleased/getting-started/creating-a-catalog/s3/catalog-minio.md index 18491617cd..cdeeb12775 100644 --- a/site/content/in-dev/unreleased/getting-started/creating-a-catalog/s3/catalog-minio.md +++ b/site/content/in-dev/unreleased/getting-started/creating-a-catalog/s3/catalog-minio.md @@ -23,94 +23,41 @@ type: 
docs weight: 200 --- -In this guide we walk through setting up a simple Polaris Server with local [MinIO](https://www.min.io/) storage. +When creating a catalog based on MinIO storage it is important to configure the `endpoint` property to point +to your own MinIO cluster. If the `endpoint` property is not set, Polaris will attempt to contact AWS +storage services (which is certain to fail in this case). -Similar configurations are expected to work with other S3-compatible systems that also have the -[STS](https://docs.aws.amazon.com/STS/latest/APIReference/welcome.html) API. +Note: the region setting is not required by MinIO, but it is set in this example for the sake of +simplicity as it is usually required by the AWS SDK (used internally by Polaris). One can also +set the `AWS_REGION` environment variable in the Polaris server process and avoid setting region +as a catalog property. -# Setup - -Clone the Polaris source repository, then build a docker image for Polaris. +Note: the name `quickstart_catalog` from the example below is referenced in other Getting Started examples, +but of course, it can be any valid catalog name. ```shell -./gradlew :polaris-server:assemble -Dquarkus.container-image.build=true -``` - -Start MinIO with Polaris using the `docker compose` example. - -```shell -docker compose -f getting-started/minio/docker-compose.yml up -``` - -The compose script will start MinIO on default ports (API on 9000, UI on 9001) -plus a Polaris Server pre-configured to that MinIO instance. - -In this example the `root` principal has its password set to `s3cr3t`. - -# Connecting from Spark - -Start Spark. 
- -```shell -export AWS_REGION=us-west-2 - -bin/spark-sql \ - --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.9.0,org.apache.iceberg:iceberg-aws-bundle:1.9.0 \ - --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \ - --conf spark.sql.catalog.polaris=org.apache.iceberg.spark.SparkCatalog \ - --conf spark.sql.catalog.polaris.type=rest \ - --conf spark.sql.catalog.polaris.uri=http://localhost:8181/api/catalog \ - --conf spark.sql.catalog.polaris.token-refresh-enabled=false \ - --conf spark.sql.catalog.polaris.warehouse=quickstart_catalog \ - --conf spark.sql.catalog.polaris.scope=PRINCIPAL_ROLE:ALL \ - --conf spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation=vended-credentials \ - --conf spark.sql.catalog.polaris.credential=root:s3cr3t +CLIENT_ID=root +CLIENT_SECRET=s3cr3t +DEFAULT_BASE_LOCATION=s3://example-bucket/my_data +REGION=us-west-2 + +./polaris \ + --client-id ${CLIENT_ID} \ + --client-secret ${CLIENT_SECRET} \ + catalogs \ + create \ + --storage-type s3 \ + --endpoint http://127.0.0.1:9100 \ + --default-base-location ${DEFAULT_BASE_LOCATION} \ + --region ${REGION} \ + quickstart_catalog ``` -Note: `AWS_REGION` is required by the AWS SDK used by Spark, but the value is irrelevant in this case. - -Create a table in Spark. - -```sql -use polaris; -create namespace ns; -create table ns.t1 as select 'abc'; -select * from ns.t1; -``` - -# Connecting from MinIO client - -```shell -mc alias set pol http://localhost:9000 minio_root m1n1opwd -mc ls pol/bucket123/ns/t1 -[2025-08-13 18:52:38 EDT] 0B data/ -[2025-08-13 18:52:38 EDT] 0B metadata/ -``` - -Note: the values of `minio_root`, `m1n1opwd` and `bucket123` are defined in the docker compose file. 
- -# Notes on Storage Configuation - -In this example the Polaris Catalog is defined as (excluding uninteresting properties): - -```json - { - "name": "quickstart_catalog", - "storageConfigInfo": { - "endpoint": "http://localhost:9000", - "endpointInternal": "http://minio:9000", - "pathStyleAccess": true, - "storageType": "S3", - "allowedLocations": [ - "s3://bucket123" - ] - } - } -``` +In more complex deployments it may be necessary to configure different endpoints for S3 requests +and for STS (AssumeRole) requests. This can be achieved via the `--sts-endpoint` CLI option. -Note that the `roleArn` parameter, which is required for AWS storage, does not need to be set for MinIO. +Additionally, the `--endpoint-internal` CLI option can be used to set the S3 endpoint for use by +the Polaris Server itself, if it needs to be different from the endpoint used by clients / engines. -Note the two endpoint values. `endpointInternal` is used by the Polaris Server, while `endpoint` is communicated -to clients (such as Spark) in Iceberg REST API responses. This distinction allows the system to work smoothly -when the clients and the server have different views of the network (in this example the host name `minio` is -resolvable only inside the docker compose environment). \ No newline at end of file +A usable MinIO example for `docker-compose` is available in the Polaris source code under the +[getting-started/minio](https://github.com/apache/polaris/tree/main/getting-started/minio) module. 
diff --git a/site/content/in-dev/unreleased/getting-started/deploying-polaris/cloud-deploy/deploy-aws.md b/site/content/in-dev/unreleased/getting-started/deploying-polaris/cloud-deploy/deploy-aws.md index 13933f25ee..d62e13e093 100644 --- a/site/content/in-dev/unreleased/getting-started/deploying-polaris/cloud-deploy/deploy-aws.md +++ b/site/content/in-dev/unreleased/getting-started/deploying-polaris/cloud-deploy/deploy-aws.md @@ -45,7 +45,9 @@ export CLIENT_SECRET=s3cr3t ``` ## Next Steps -Congrats, you now have a running instance of1 Polaris! For details on how to use Polaris, check out the [Using Polaris]({{% relref "../../using-polaris" %}}) page. +Congrats, you now have a running instance of Polaris! For further information regarding how to use Polaris, +check out the [Creating a Catalog]({{% ref "../../creating-a-catalog" %}}) and +[Using Polaris]({{% relref "../../using-polaris" %}}) pages. ## Cleanup Instructions To shut down the Polaris server, run the following commands: diff --git a/site/content/in-dev/unreleased/getting-started/deploying-polaris/cloud-deploy/deploy-azure.md b/site/content/in-dev/unreleased/getting-started/deploying-polaris/cloud-deploy/deploy-azure.md index cdb911b3ab..4d25f86af0 100644 --- a/site/content/in-dev/unreleased/getting-started/deploying-polaris/cloud-deploy/deploy-azure.md +++ b/site/content/in-dev/unreleased/getting-started/deploying-polaris/cloud-deploy/deploy-azure.md @@ -40,7 +40,9 @@ export CLIENT_SECRET=s3cr3t ``` ## Next Steps -Congrats, you now have a running instance of Polaris! For further information regarding how to use Polaris, check out the [Using Polaris]({{% relref "../../using-polaris" %}}) page. +Congrats, you now have a running instance of Polaris! For further information regarding how to use Polaris, +check out the [Creating a Catalog]({{% ref "../../creating-a-catalog" %}}) and +[Using Polaris]({{% relref "../../using-polaris" %}}) pages. 
## Cleanup Instructions To shut down the Polaris server, run the following commands: diff --git a/site/content/in-dev/unreleased/getting-started/deploying-polaris/cloud-deploy/deploy-gcp.md b/site/content/in-dev/unreleased/getting-started/deploying-polaris/cloud-deploy/deploy-gcp.md index d50a2c9d3a..384433d83c 100644 --- a/site/content/in-dev/unreleased/getting-started/deploying-polaris/cloud-deploy/deploy-gcp.md +++ b/site/content/in-dev/unreleased/getting-started/deploying-polaris/cloud-deploy/deploy-gcp.md @@ -40,7 +40,9 @@ export CLIENT_SECRET=s3cr3t ``` ## Next Steps -Congrats, you now have a running instance of Polaris! For further information regarding how to use Polaris, check out the [Using Polaris]({{% relref "../../using-polaris" %}}) page. +Congrats, you now have a running instance of Polaris! For further information regarding how to use Polaris, +check out the [Creating a Catalog]({{% ref "../../creating-a-catalog" %}}) and +[Using Polaris]({{% relref "../../using-polaris" %}}) pages. ## Cleanup Instructions To shut down the Polaris server, run the following commands: diff --git a/site/content/in-dev/unreleased/getting-started/deploying-polaris/local-deploy.md b/site/content/in-dev/unreleased/getting-started/deploying-polaris/local-deploy.md index 8c9d6afc12..c2b7b41743 100644 --- a/site/content/in-dev/unreleased/getting-started/deploying-polaris/local-deploy.md +++ b/site/content/in-dev/unreleased/getting-started/deploying-polaris/local-deploy.md @@ -114,4 +114,6 @@ docker run --name trino -d -p 8080:8080 trinodb/trino ``` ## Next Steps -Congrats, you now have a running instance of Polaris! For further information regarding how to use Polaris, check out the [Using Polaris]({{% ref "../using-polaris" %}}) page. +Congrats, you now have a running instance of Polaris! For further information regarding how to use Polaris, +check out the [Creating a Catalog]({{% ref "../creating-a-catalog" %}}) and +[Using Polaris]({{% ref "../using-polaris" %}}) pages. 
diff --git a/site/content/in-dev/unreleased/getting-started/using-polaris.md b/site/content/in-dev/unreleased/getting-started/using-polaris.md index ec509a404b..981c9fc356 100644 --- a/site/content/in-dev/unreleased/getting-started/using-polaris.md +++ b/site/content/in-dev/unreleased/getting-started/using-polaris.md @@ -31,29 +31,13 @@ export CLIENT_ID=YOUR_CLIENT_ID export CLIENT_SECRET=YOUR_CLIENT_SECRET ``` -## Defining a Catalog +Refer to the [Creating a Catalog]({{% ref "creating-a-catalog" %}}) page for instructions on defining a +catalog for your specific storage type. The following examples assume the catalog's name is `quickstart_catalog`. -In Polaris, the [catalog]({{% relref "../entities#catalog" %}}) is the top-level entity that objects like [tables]({{% relref "../entities#table" %}}) and [views]({{% relref "../entities#view" %}}) are organized under. With a Polaris service running, you can create a catalog like so: +In Polaris, the [catalog]({{% relref "../entities#catalog" %}}) is the top-level entity that objects like [tables]({{% relref "../entities#table" %}}) and [views]({{% relref "../entities#view" %}}) are organized under. -```shell -cd ~/polaris - -./polaris \ - --client-id ${CLIENT_ID} \ - --client-secret ${CLIENT_SECRET} \ - catalogs \ - create \ - --storage-type s3 \ - --default-base-location ${DEFAULT_BASE_LOCATION} \ - --role-arn ${ROLE_ARN} \ - quickstart_catalog -``` - -This will create a new catalog called **quickstart_catalog**. If you are using one of the Getting Started locally-built Docker images, we have already created a catalog named `quickstart_catalog` for you. - -The `DEFAULT_BASE_LOCATION` you provide will be the default location that objects in this catalog should be stored in, and the `ROLE_ARN` you provide should be a [Role ARN](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference-arns.html) with access to read and write data in that location. 
These credentials will be provided to engines reading data from the catalog once they have authenticated with Polaris using credentials that have access to those resources. - -If you’re using a storage type other than S3, such as Azure, you’ll provide a different type of credential than a Role ARN. For more details on supported storage types, see the [docs]({{% relref "../entities#storage-type" %}}). +The `DEFAULT_BASE_LOCATION` value you provided at catalog creation time will be the default location that objects in +this catalog should be stored in. Additionally, if Polaris is running somewhere other than `localhost:8181`, you can specify the correct hostname and port by providing `--host` and `--port` flags. For the full set of options supported by the CLI, please refer to the [docs]({{% relref "../command-line-interface" %}}).