
Specifying a custom service account for dataproc cluster causes error #36

Closed
sat opened this issue Jul 2, 2019 · 6 comments

sat commented Jul 2, 2019

When specifying a custom service account for the Dataproc cluster, the job fails with the error below.

data_proc_image: 1.4.0-debian9
spark-bigquery-connector: 0.7.0-beta

A problem has occured while running task
io.grpc.ManagedChannelProvider$ProviderNotFoundException: No functional channel service provider found. Try adding a dependency on the grpc-okhttp, grpc-netty, or grpc-netty-shaded artifact
	at io.grpc.ManagedChannelProvider.provider(ManagedChannelProvider.java:60)
	at io.grpc.ManagedChannelBuilder.forAddress(ManagedChannelBuilder.java:37)
	at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.createSingleChannel(InstantiatingGrpcChannelProvider.java:194)
	at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.createChannel(InstantiatingGrpcChannelProvider.java:169)
	at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.getTransportChannel(InstantiatingGrpcChannelProvider.java:156)
	at com.google.api.gax.rpc.ClientContext.create(ClientContext.java:157)
	at com.google.cloud.bigquery.storage.v1beta1.stub.EnhancedBigQueryStorageStub.create(EnhancedBigQueryStorageStub.java:90)
	at com.google.cloud.bigquery.storage.v1beta1.BigQueryStorageClient.<init>(BigQueryStorageClient.java:144)
	at com.google.cloud.bigquery.storage.v1beta1.BigQueryStorageClient.create(BigQueryStorageClient.java:125)
	at com.google.cloud.spark.bigquery.direct.DirectBigQueryRelation$.createReadClient(DirectBigQueryRelation.scala:170)
	at com.google.cloud.spark.bigquery.direct.DirectBigQueryRelation$$anonfun$$lessinit$greater$default$3$1.apply(DirectBigQueryRelation.scala:42)
	at com.google.cloud.spark.bigquery.direct.DirectBigQueryRelation$$anonfun$$lessinit$greater$default$3$1.apply(DirectBigQueryRelation.scala:42)
	at com.google.cloud.spark.bigquery.direct.DirectBigQueryRelation.buildScan(DirectBigQueryRelation.scala:81)
	at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$10.apply(DataSourceStrategy.scala:293)
	at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$10.apply(DataSourceStrategy.scala:293)
	at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:326)
	at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:325)
	at org.apache.spark.sql.execution.datasources.DataSourceStrategy.pruneFilterProjectRaw(DataSourceStrategy.scala:381)
	at org.apache.spark.sql.execution.datasources.DataSourceStrategy.pruneFilterProject(DataSourceStrategy.scala:321)
	at org.apache.spark.sql.execution.datasources.DataSourceStrategy.apply(DataSourceStrategy.scala:289)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:63)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:63)
	at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:78)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:75)
	at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
	at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
	at scala.collection.Iterator$class.foreach(Iterator.scala:891)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
	at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
	at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1334)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:75)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:67)
	at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:78)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:75)
	at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
	at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
	at scala.collection.Iterator$class.foreach(Iterator.scala:891)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
	at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
	at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1334)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:75)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:67)
	at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
	at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:72)
	at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:68)
	at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:77)
	at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:77)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
	at org.apache.spark.sql.Dataset.rdd$lzycompute(Dataset.scala:3038)
	at org.apache.spark.sql.Dataset.rdd(Dataset.scala:3036)

Without a custom service account, everything seems fine. However, I need to use a custom service account for cross-account access to BigQuery:

com.google.cloud.bigquery.BigQueryException: Access Denied: Table [[ omitted ]]: The user 564585695625-compute@developer.gserviceaccount.com does not have bigquery.tables.get permission for table
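
(For context, one way to grant a service account cross-project read access is an IAM binding like the following; the project ID and service-account address are placeholders, and roles/bigquery.dataViewer includes bigquery.tables.get:)

# Sketch only: grant the SA read access in the project that owns the table.
gcloud projects add-iam-policy-binding other-project \
  --member="serviceAccount:my-sa@my-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataViewer"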

sat changed the title from "Specifying a custom service account in Dataproc causes error" to "Specifying a custom service account for dataproc cluster causes error" on Jul 2, 2019
kmjung (Collaborator) commented Jul 2, 2019

It looks like you've included only part of the error message here -- can you share the error itself, or the full thread stack?

sat (Author) commented Jul 2, 2019

@kmjung I've attached the part of the error message that is relevant. It works fine with the default service account.

sat (Author) commented Jul 3, 2019

I get the same issue when explicitly passing the credentials via:

spark.read.format("bigquery").option("credentials", "<SERVICE_ACCOUNT_JSON_IN_BASE64>")
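
(For completeness, a minimal sketch of the full read call, assuming Scala; the table reference is a placeholder:)

// Sketch only: read a BigQuery table with an inline base64-encoded service account key.
val df = spark.read
  .format("bigquery")
  .option("credentials", "<SERVICE_ACCOUNT_JSON_IN_BASE64>")
  .option("table", "my-project.my_dataset.my_table")
  .load()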

kmjung (Collaborator) commented Jul 3, 2019

Are you using gs://spark-lib/bigquery/spark-bigquery-latest.jar or building it yourself? "No functional channel provider found" sounds like a classpath issue rather than a configuration issue. cc: @pmkc @cyxxy

sat (Author) commented Jul 3, 2019

@kmjung I am using 0.7.0-beta from Maven.

The reader is included in a dependencies jar, the same way I include other custom readers not provided on the cluster (e.g. spark-avro) with no issues. Is it recommended to install it via spark-packages through a Dataproc initialisation action, and then specify it as "provided" in my build? I could also try a more recent Dataproc 1.4.x image and Spark > 2.x.
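
(A note on the classpath theory: one common cause of "No functional channel service provider found" in fat jars is that the ServiceLoader registration files under META-INF/services get dropped or mangled during assembly, so gRPC cannot discover its grpc-netty(-shaded) transport. A minimal sketch, assuming sbt-assembly; the pattern follows its documented merge-strategy example:)

// build.sbt -- sketch only; assumes sbt-assembly is enabled in project/plugins.sbt.
assemblyMergeStrategy in assembly := {
  // Concatenate ServiceLoader registration files instead of keeping only one,
  // so io.grpc.ManagedChannelProvider implementations remain discoverable.
  case PathList("META-INF", "services", xs @ _*) => MergeStrategy.concat
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}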

sat (Author) commented Jul 3, 2019

@kmjung I got it working by including the shaded jar you mentioned on the classpath via jarFileUris (--jars) alongside my application jars:

gs://spark-lib/bigquery/spark-bigquery-assembly-0.7.0-beta.jar
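
(For reference, a Dataproc submission with the assembly on the driver/executor classpath looks roughly like this; the cluster name, application jar, and main class are placeholders:)

# Sketch only: pass both the shaded connector and the application jar via --jars.
gcloud dataproc jobs submit spark \
  --cluster=my-cluster \
  --class=com.example.MyJob \
  --jars=gs://spark-lib/bigquery/spark-bigquery-assembly-0.7.0-beta.jar,gs://my-bucket/my-app-assembly.jar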

and marking the dependency as "provided" in my build:

"com.google.cloud.spark" %% "spark-bigquery" % "0.7.0-beta" % "provided"

Thanks for your help.
