
Specifying a custom service account for dataproc cluster causes error #36

Closed
sat opened this issue Jul 2, 2019 · 6 comments

sat commented Jul 2, 2019

When specifying a custom service account for the Dataproc cluster, the job fails with the error below.

data_proc_image: 1.4.0-debian9
spark-bigquery-connector: 0.7.0-beta

A problem has occured while running task
io.grpc.ManagedChannelProvider$ProviderNotFoundException: No functional channel service provider found. Try adding a dependency on the grpc-okhttp, grpc-netty, or grpc-netty-shaded artifact
	at io.grpc.ManagedChannelProvider.provider(ManagedChannelProvider.java:60)
	at io.grpc.ManagedChannelBuilder.forAddress(ManagedChannelBuilder.java:37)
	at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.createSingleChannel(InstantiatingGrpcChannelProvider.java:194)
	at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.createChannel(InstantiatingGrpcChannelProvider.java:169)
	at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.getTransportChannel(InstantiatingGrpcChannelProvider.java:156)
	at com.google.api.gax.rpc.ClientContext.create(ClientContext.java:157)
	at com.google.cloud.bigquery.storage.v1beta1.stub.EnhancedBigQueryStorageStub.create(EnhancedBigQueryStorageStub.java:90)
	at com.google.cloud.bigquery.storage.v1beta1.BigQueryStorageClient.<init>(BigQueryStorageClient.java:144)
	at com.google.cloud.bigquery.storage.v1beta1.BigQueryStorageClient.create(BigQueryStorageClient.java:125)
	at com.google.cloud.spark.bigquery.direct.DirectBigQueryRelation$.createReadClient(DirectBigQueryRelation.scala:170)
	at com.google.cloud.spark.bigquery.direct.DirectBigQueryRelation$$anonfun$$lessinit$greater$default$3$1.apply(DirectBigQueryRelation.scala:42)
	at com.google.cloud.spark.bigquery.direct.DirectBigQueryRelation$$anonfun$$lessinit$greater$default$3$1.apply(DirectBigQueryRelation.scala:42)
	at com.google.cloud.spark.bigquery.direct.DirectBigQueryRelation.buildScan(DirectBigQueryRelation.scala:81)
	at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$10.apply(DataSourceStrategy.scala:293)
	at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$10.apply(DataSourceStrategy.scala:293)
	at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:326)
	at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:325)
	at org.apache.spark.sql.execution.datasources.DataSourceStrategy.pruneFilterProjectRaw(DataSourceStrategy.scala:381)
	at org.apache.spark.sql.execution.datasources.DataSourceStrategy.pruneFilterProject(DataSourceStrategy.scala:321)
	at org.apache.spark.sql.execution.datasources.DataSourceStrategy.apply(DataSourceStrategy.scala:289)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:63)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:63)
	at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:78)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:75)
	at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
	at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
	at scala.collection.Iterator$class.foreach(Iterator.scala:891)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
	at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
	at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1334)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:75)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:67)
	at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:78)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:75)
	at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
	at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
	at scala.collection.Iterator$class.foreach(Iterator.scala:891)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
	at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
	at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1334)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:75)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:67)
	at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
	at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:72)
	at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:68)
	at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:77)
	at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:77)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
	at org.apache.spark.sql.Dataset.rdd$lzycompute(Dataset.scala:3038)
	at org.apache.spark.sql.Dataset.rdd(Dataset.scala:3036)

Without a custom service account, everything seems fine. However, I need to use a custom service account for cross-account access to BigQuery:

com.google.cloud.bigquery.BigQueryException: Access Denied: Table [[ omitted ]]: The user 564585695625-compute@developer.gserviceaccount.com does not have bigquery.tables.get permission for table
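
(For context, one way to grant a service account cross-project read access is an IAM binding like the following; the project ID and service-account address are placeholders, and roles/bigquery.dataViewer includes bigquery.tables.get:)

# Sketch only: grant the SA read access in the project that owns the table.
gcloud projects add-iam-policy-binding other-project \
  --member="serviceAccount:my-sa@my-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataViewer"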

sat changed the title from "Specifying a custom service account in Dataproc causes error" to "Specifying a custom service account for dataproc cluster causes error" on Jul 2, 2019
kmjung (Collaborator) commented Jul 2, 2019

It looks like you've included only part of the error message here -- can you share the error itself, or the full thread stack?

sat (Author) commented Jul 2, 2019

@kmjung I've attached the part of the error message that is relevant. It works fine with the default service account.

sat (Author) commented Jul 3, 2019

I get the same issue when explicitly passing the credentials via:

spark.read.format("bigquery").option("credentials", "<SERVICE_ACCOUNT_JSON_IN_BASE64>")
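
(For completeness, a minimal sketch of the full read call, assuming Scala; the table reference is a placeholder:)

// Sketch only: read a BigQuery table with an inline base64-encoded service account key.
val df = spark.read
  .format("bigquery")
  .option("credentials", "<SERVICE_ACCOUNT_JSON_IN_BASE64>")
  .option("table", "my-project.my_dataset.my_table")
  .load()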

kmjung (Collaborator) commented Jul 3, 2019

Are you using gs://spark-lib/bigquery/spark-bigquery-latest.jar or building it yourself? "No functional channel provider found" sounds like a classpath issue rather than a configuration issue. cc: @pmkc @cyxxy

sat (Author) commented Jul 3, 2019

@kmjung I am using 0.7.0-beta from Maven.

The reader is included in a dependencies jar, the same way I include other custom readers not provided on the cluster (e.g. spark-avro) with no issues. Is it recommended to install it via spark-packages through a Dataproc initialisation action, and then specify it as "provided" in my build? I could also try a more recent Dataproc 1.4.x image and Spark > 2.x.
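
(A note on the classpath theory: one common cause of "No functional channel service provider found" in fat jars is that the ServiceLoader registration files under META-INF/services get dropped or mangled during assembly, so gRPC cannot discover its grpc-netty(-shaded) transport. A minimal sketch, assuming sbt-assembly; the pattern follows its documented merge-strategy example:)

// build.sbt -- sketch only; assumes sbt-assembly is enabled in project/plugins.sbt.
assemblyMergeStrategy in assembly := {
  // Concatenate ServiceLoader registration files instead of keeping only one,
  // so io.grpc.ManagedChannelProvider implementations remain discoverable.
  case PathList("META-INF", "services", xs @ _*) => MergeStrategy.concat
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}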

sat (Author) commented Jul 3, 2019

@kmjung I got it working by including the shaded jar you mentioned on the classpath via jarFileUris (--jars) alongside my application jars:

gs://spark-lib/bigquery/spark-bigquery-assembly-0.7.0-beta.jar
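
(For reference, a Dataproc submission with the assembly on the driver/executor classpath looks roughly like this; the cluster name, application jar, and main class are placeholders:)

# Sketch only: pass both the shaded connector and the application jar via --jars.
gcloud dataproc jobs submit spark \
  --cluster=my-cluster \
  --class=com.example.MyJob \
  --jars=gs://spark-lib/bigquery/spark-bigquery-assembly-0.7.0-beta.jar,gs://my-bucket/my-app-assembly.jar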

and marking the dependency as "provided" in my build:

"com.google.cloud.spark" %% "spark-bigquery" % "0.7.0-beta" % "provided"

Thanks for your help.
