NPE when using workload identity federation to fetch table #1099

Closed

zhz125 commented Oct 20, 2023

Spark version: 3.2.4
Scala version: 2.12
Connector used: spark-3.2-bigquery-0.33.0.jar

Code:

   df = spark.read.format("bigquery").option("parentProject","***").option("credentialsFile", "/opt/bitnami/spark/creds_wif.json").load("bigquery-public-data:covid19_open_data.covid19_open_data")
   df.show()
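
For context, the credentialsFile option above points to a workload identity federation (external account) credentials file. As a sanity check, assuming google-auth is installed on the driver, the file can be loaded directly before handing it to the connector; the path below is the one from this report, everything else is generic:

   import json

   import google.auth

   path = "/opt/bitnami/spark/creds_wif.json"
   with open(path) as f:
       info = json.load(f)
   # Workload identity federation credential files carry type "external_account".
   assert info.get("type") == "external_account", info.get("type")

   # Resolve the file into a Credentials object the same way the client libraries do.
   credentials, project_id = google.auth.load_credentials_from_file(path)
   print(type(credentials).__name__, project_id)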

Error:

  File "/opt/bitnami/spark/python/pyspark/sql/dataframe.py", line 494, in show
    print(self._jdf.showString(n, 20, vertical))
  File "/opt/bitnami/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
    return_value = get_return_value(
  File "/opt/bitnami/spark/python/pyspark/sql/utils.py", line 111, in deco
    return f(*a, **kw)
  File "/opt/bitnami/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o48.showString.
: java.lang.NullPointerException
	at com.google.cloud.bigquery.connector.common.ReadSessionCreator.create(ReadSessionCreator.java:166)
	at com.google.cloud.spark.bigquery.v2.context.BigQueryDataSourceReaderContext.createReadSession(BigQueryDataSourceReaderContext.java:293)
	at com.google.cloud.spark.bigquery.repackaged.com.google.common.base.Suppliers$NonSerializableMemoizingSupplier.get(Suppliers.java:181)
	at com.google.cloud.spark.bigquery.v2.context.BigQueryDataSourceReaderContext.planBatchInputPartitionContexts(BigQueryDataSourceReaderContext.java:210)
	at com.google.cloud.spark.bigquery.v2.Spark31BigQueryScanBuilder.planInputPartitions(Spark31BigQueryScanBuilder.java:154)
	at org.apache.spark.sql.execution.datasources.v2.BatchScanExec.partitions$lzycompute(BatchScanExec.scala:52)
	at org.apache.spark.sql.execution.datasources.v2.BatchScanExec.partitions(BatchScanExec.scala:52)
	at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExecBase.supportsColumnar(DataSourceV2ScanExecBase.scala:93)
	at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExecBase.supportsColumnar$(DataSourceV2ScanExecBase.scala:92)
	at org.apache.spark.sql.execution.datasources.v2.BatchScanExec.supportsColumnar(BatchScanExec.scala:35)
	at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Strategy.apply(DataSourceV2Strategy.scala:123)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$1(QueryPlanner.scala:63)
	at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
	at org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:67)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$3(QueryPlanner.scala:78)
	at scala.collection.TraversableOnce$folder$1.apply(TraversableOnce.scala:196)
	at scala.collection.TraversableOnce$folder$1.apply(TraversableOnce.scala:194)
	at scala.collection.Iterator.foreach(Iterator.scala:943)
	at scala.collection.Iterator.foreach$(Iterator.scala:943)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
	at scala.collection.TraversableOnce.foldLeft(TraversableOnce.scala:199)
	at scala.collection.TraversableOnce.foldLeft$(TraversableOnce.scala:192)
	at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1431)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$2(QueryPlanner.scala:75)
	at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
	at org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:67)
	at org.apache.spark.sql.execution.QueryExecution$.createSparkPlan(QueryExecution.scala:453)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$sparkPlan$1(QueryExecution.scala:144)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:183)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
	at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:183)
	at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:144)
	at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:137)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executedPlan$1(QueryExecution.scala:157)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:183)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
	at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:183)
	at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:157)
	at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:150)
	at org.apache.spark.sql.execution.QueryExecution.simpleString(QueryExecution.scala:201)
	at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:246)
	at org.apache.spark.sql.execution.QueryExecution.explainString(QueryExecution.scala:215)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:98)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
	at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3704)
	at org.apache.spark.sql.Dataset.head(Dataset.scala:2728)
	at org.apache.spark.sql.Dataset.take(Dataset.scala:2935)
	at org.apache.spark.sql.Dataset.getRows(Dataset.scala:287)
	at org.apache.spark.sql.Dataset.showString(Dataset.scala:326)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.base/java.lang.Thread.run(Thread.java:829)
davidrabinowitz added a commit to davidrabinowitz/spark-bigquery-connector that referenced this issue Oct 23, 2023
davidrabinowitz self-assigned this Dec 21, 2023

davidrabinowitz (Member) commented

Fixed in version 0.34.0
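
To pick up the fix, a minimal sketch of a session pinned to the patched connector; the app name and parent project are placeholders, and spark.jars.packages pulls the published Maven artifact instead of a local jar:

   from pyspark.sql import SparkSession

   spark = (
       SparkSession.builder
       .appName("bq-wif-read")  # placeholder name
       .config(
           "spark.jars.packages",
           # Published connector artifact; 0.34.0 or later carries the fix.
           "com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.34.0",
       )
       .getOrCreate()
   )

   df = (
       spark.read.format("bigquery")
       .option("parentProject", "your-project-id")  # placeholder
       .option("credentialsFile", "/opt/bitnami/spark/creds_wif.json")
       .load("bigquery-public-data:covid19_open_data.covid19_open_data")
   )
   df.show()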
