
Document Azure and GCP integration #8662

Closed
ajantha-bhat opened this issue Sep 27, 2023 · 4 comments
@ajantha-bhat (Member) commented Sep 27, 2023

We have nice documentation for AWS (https://iceberg.apache.org/docs/latest/aws/) that explains, when the warehouse path is on S3, which catalog configurations and dependencies are needed.

Expecting similar documentation for Azure and GCP (one that also covers the dependent libraries and all the authentication methods; for example, Azure's access key, SAS token, and service principal).
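For reference, a sketch of the kind of catalog configuration such a page might document for an ADLS warehouse. The catalog keys mirror the documented AWS/S3 setup; the ADLS-specific auth property name is an assumption that would need to be verified against the iceberg-azure module:

```python
# Sketch of Spark catalog properties for an Azure (ADLS) warehouse.
# Catalog keys follow the pattern from the AWS docs; the "adls.sas-token.*"
# property name is an assumption, not verified configuration.
adls_catalog_conf = {
    "spark.sql.catalog.demo": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.demo.type": "hadoop",
    "spark.sql.catalog.demo.warehouse": "abfss://container@account.dfs.core.windows.net/warehouse",
    "spark.sql.catalog.demo.io-impl": "org.apache.iceberg.azure.adlsv2.ADLSFileIO",
    # Assumed auth property: a SAS token scoped to the storage account.
    "spark.sql.catalog.demo.adls.sas-token.account.dfs.core.windows.net": "<sas-token>",
}
```

These properties would be passed via spark-defaults.conf or a SparkSession builder, alongside the iceberg-spark-runtime and iceberg-azure jars.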

@ajantha-bhat (Member, Author)

I am clearly not the expert here to add this documentation.
I saw that we have ADLSFileIO and GCSFileIO classes in the code,
but I didn't find any catalog-level examples or test cases for them.
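As a starting point, a catalog-level sketch of how GCSFileIO might be wired into a Spark catalog. Only the class name comes from the iceberg-gcp module; the rest mirrors the AWS docs and is an assumption, not tested configuration:

```python
# Sketch of catalog-level properties plugging GCSFileIO into a Spark catalog.
# Bucket name and catalog name are illustrative placeholders.
gcs_catalog_conf = {
    "spark.sql.catalog.demo": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.demo.type": "hadoop",
    "spark.sql.catalog.demo.warehouse": "gs://my-bucket/warehouse",
    "spark.sql.catalog.demo.io-impl": "org.apache.iceberg.gcp.gcs.GCSFileIO",
}
```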

@ldacey

ldacey commented Sep 27, 2023

I was confused about where to start on GCP. I haven't dug too deep because I know there is no write support in pyiceberg yet (and I don't have Spark installed at all), but I did want to test a few things.

I know I need a catalog, and it looks like the easiest option might be my PostgreSQL Cloud SQL database:

catalog:
  default:
    type: sql
    uri: postgresql+psycopg2://username:password@localhost/mydatabase
    

But I did not see a description of how to create the actual table: does the database need a specific name, and what is the schema of the catalog's database table? Does it get created automatically if I just create an "iceberg" database with an iceberg user and password?

Since I am on GCP, I know of Dataproc as well, and I have explored Dataplex before, but I am not sure whether I can use that as a catalog for pyiceberg. Even if I can, I am not sure there is a benefit compared to the PostgreSQL approach above.

I might be missing something - I am basing this on this documentation: https://py.iceberg.apache.org/configuration/


This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale'; commenting on the issue is preferred when possible.

@github-actions github-actions bot added the stale label Sep 20, 2024

github-actions bot commented Oct 6, 2024

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Oct 6, 2024