
Document Azure and GCP integration #8662

Closed
ajantha-bhat opened this issue Sep 27, 2023 · 4 comments
@ajantha-bhat (Member) commented Sep 27, 2023

We have nice documentation for AWS (https://iceberg.apache.org/docs/latest/aws/) that explains, when the warehouse path is on S3, which catalog configurations and dependencies are needed.

Expecting similar documentation for Azure and GCP (one that also covers the dependent libraries and all the authentication methods; for example, Azure's access key, SAS token, and service principal).
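For reference, a sketch of the kind of catalog configuration such a page might document for an ADLS warehouse. The catalog keys mirror the documented AWS/S3 setup; the ADLS-specific auth property name is an assumption that would need to be verified against the iceberg-azure module:

```python
# Sketch of Spark catalog properties for an Azure (ADLS) warehouse.
# Catalog keys follow the pattern from the AWS docs; the "adls.sas-token.*"
# property name is an assumption, not verified configuration.
adls_catalog_conf = {
    "spark.sql.catalog.demo": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.demo.type": "hadoop",
    "spark.sql.catalog.demo.warehouse": "abfss://container@account.dfs.core.windows.net/warehouse",
    "spark.sql.catalog.demo.io-impl": "org.apache.iceberg.azure.adlsv2.ADLSFileIO",
    # Assumed auth property: a SAS token scoped to the storage account.
    "spark.sql.catalog.demo.adls.sas-token.account.dfs.core.windows.net": "<sas-token>",
}
```

These properties would be passed via spark-defaults.conf or a SparkSession builder, alongside the iceberg-spark-runtime and iceberg-azure jars.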

@ajantha-bhat (Member, Author)

I am clearly not the expert here to add this documentation.
I saw that we have ADLSFileIO and GCSFileIO classes in the code,
but I didn't find any catalog-level examples or test cases for them.
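As a starting point, a catalog-level sketch of how GCSFileIO might be wired into a Spark catalog. Only the class name comes from the iceberg-gcp module; the rest mirrors the AWS docs and is an assumption, not tested configuration:

```python
# Sketch of catalog-level properties plugging GCSFileIO into a Spark catalog.
# Bucket name and catalog name are illustrative placeholders.
gcs_catalog_conf = {
    "spark.sql.catalog.demo": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.demo.type": "hadoop",
    "spark.sql.catalog.demo.warehouse": "gs://my-bucket/warehouse",
    "spark.sql.catalog.demo.io-impl": "org.apache.iceberg.gcp.gcs.GCSFileIO",
}
```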

@ldacey

ldacey commented Sep 27, 2023

I was confused about where to start on GCP. I haven't dug too deep because I know there is no write support in pyiceberg yet (and I don't have Spark installed at all), but I did want to test a few things.

I know I need a catalog, and it looks like the easiest option might be my PostgreSQL Cloud SQL database:

catalog:
  default:
    type: sql
    uri: postgresql+psycopg2://username:password@localhost/mydatabase
    

But I did not see a description of how to create the actual table: does the database need a specific name, and what is the schema of the catalog's database table? Does it get created automatically if I just create an "iceberg" database with an iceberg user and password?

Since I am on GCP, I know of Dataproc as well, and I have explored Dataplex before, but I am not sure whether I can use that as a catalog for pyiceberg. Even if I can, I am not sure there is a benefit compared to the PostgreSQL approach above.

I might be missing something - I am basing this on this documentation: https://py.iceberg.apache.org/configuration/


This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale'; commenting on the issue is preferred when possible.

@github-actions github-actions bot added the stale label Sep 20, 2024

github-actions bot commented Oct 6, 2024

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Oct 6, 2024