Open
Labels: bug (Something isn't working)
Description
Apache Iceberg version
1.6.1
Query engine
Spark
Please describe the bug 🐞
Hey,
We're experimenting with setting up an Iceberg-based data lakehouse on BigLake Metastore. We tried to connect Dataproc Spark to BigLake via the REST catalog endpoint, but unfortunately it only partially works. We use vended-credentials mode.
Spark: 3.5.3, Dataproc: 2.2.38, Iceberg: 1.6.1, Iceberg GCP bundle: 1.6.1.
"org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.6.1,org.apache.iceberg:iceberg-gcp-bundle:1.6.1",
from pyspark.sql import SparkSession

# catalog_name and project are defined elsewhere in our job;
# catalog_name is "biglake", project is our GCP project ID.
spark = (
    SparkSession.builder.appName("biglake")
    .config(
        "spark.jars.packages",
        "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.6.1,org.apache.iceberg:iceberg-gcp-bundle:1.6.1",
    )
    .config(
        f"spark.sql.catalog.{catalog_name}", "org.apache.iceberg.spark.SparkCatalog"
    )
    .config(f"spark.sql.catalog.{catalog_name}.type", "rest")
    .config(
        f"spark.sql.catalog.{catalog_name}.uri",
        "https://biglake.googleapis.com/iceberg/v1/restcatalog",
    )
    .config(f"spark.sql.catalog.{catalog_name}.warehouse", f"gs://{project}")
    .config(
        f"spark.sql.catalog.{catalog_name}.header.x-goog-user-project", project
    )
    .config(
        f"spark.sql.catalog.{catalog_name}.oauth2-server-uri",
        "https://biglake.googleapis.com/iceberg/v1/restcatalog/v1/oauth/tokens",
    )
    .config(
        f"spark.sql.catalog.{catalog_name}.rest.auth.type",
        "org.apache.iceberg.gcp.auth.GoogleAuthManager",
    )
    .config(
        f"spark.sql.catalog.{catalog_name}.io-impl",
        "org.apache.iceberg.gcp.gcs.GCSFileIO",
    )
    .config(
        f"spark.sql.catalog.{catalog_name}.header.X-Iceberg-Access-Delegation",
        "vended-credentials",
    )
    .config(f"spark.sql.catalog.{catalog_name}.rest-metrics-reporting-enabled", "false")
    .config(
        "spark.sql.extensions",
        "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    )
    .config("spark.sql.defaultCatalog", "biglake")
    .getOrCreate()
)
Metadata (catalog) exploration works (which makes us think auth eventually succeeds), but querying actual data, e.g. df = spark.sql("SELECT count(*) FROM feature_data.nodes"), fails immediately with Invalid credentials endpoint: null.
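For reference, a minimal sketch of what we observe (namespace/table names are from our environment; SHOW NAMESPACES is just one example of a metadata call that succeeds):

# Catalog/metadata operations go through the REST catalog and work fine:
spark.sql("SHOW NAMESPACES").show()
spark.sql("SHOW TABLES IN feature_data").show()

# Anything that actually reads table data (presumably via GCSFileIO) fails:
df = spark.sql("SELECT count(*) FROM feature_data.nodes")
df.show()  # fails with: Invalid credentials endpoint: null

If it helps with triage, a workaround sketch we are considering (untested in this exact form, and assuming the gcs.oauth2.token / gcs.project-id FileIO properties are honored instead of the vended-credentials refresh) would be to hand a static ADC token straight to GCSFileIO and drop the X-Iceberg-Access-Delegation header:

import google.auth
from google.auth.transport.requests import Request

# Application Default Credentials of the Dataproc service account.
credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
credentials.refresh(Request())

# Extra catalog properties to add to the builder above
# (with the header.X-Iceberg-Access-Delegation config removed).
# The token is short-lived, so this is only a diagnostic workaround.
extra_confs = {
    f"spark.sql.catalog.{catalog_name}.gcs.oauth2.token": credentials.token,
    f"spark.sql.catalog.{catalog_name}.gcs.project-id": project,
}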
Willingness to contribute
- I can contribute a fix for this bug independently
- I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- I cannot contribute a fix for this bug at this time