
Unable to authenticate using 'credentials' option #249

Closed
sercanersoy opened this issue Oct 2, 2020 · 5 comments
@sercanersoy

Hi,

I am writing a Spark (2.4.3) application in Scala (2.11.12) that reads data from a BigQuery table using the spark-bigquery-with-dependencies package (0.17.2). The application does not run on Google Cloud machines, so it has to authenticate explicitly. Below is my code:

spark.read
  .option("parentProject", "xxx")
  .option("credentials", "xxx")
  .bigquery("xxx")
  .limit(10)
  .show()

I cannot use the credentialsFile option, so I converted my JSON credentials file to a Base64 string and pass it via the credentials option. But this is what I get:
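For context, this is roughly how such a Base64 string is produced (a minimal sketch; the inline keyJson literal stands in for the real key file contents, which in practice would be read with java.nio.file.Files.readAllBytes):

```scala
import java.util.Base64
import java.nio.charset.StandardCharsets

// Placeholder for the contents of the service-account key file; in a real
// job the bytes would be read from the JSON key file instead.
val keyJson = """{"type": "service_account", "token_uri": "https://oauth2.googleapis.com/token"}"""

// Base64-encode the raw JSON bytes -- this string is the value passed
// to the connector's "credentials" option.
val base64Credentials = Base64.getEncoder.encodeToString(keyJson.getBytes(StandardCharsets.UTF_8))
```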

java.io.IOException: The Application Default Credentials are not available. They are available if running in Google Compute Engine. Otherwise, the environment variable GOOGLE_APPLICATION_CREDENTIALS must be defined pointing to a file defining the credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information.

The program is unable to read credentials from the credentials option. In fact, I don't think it even tries to; it goes straight to GOOGLE_APPLICATION_CREDENTIALS and crashes if that variable does not exist. Since I cannot use an external JSON credentials file in my app, I cannot use the environment variable either.

I would appreciate any help. Thanks in advance!

@davidrabinowitz davidrabinowitz self-assigned this Oct 2, 2020
@davidrabinowitz
Member

Can you please try the following:

spark.conf.set("credentials", "<SERVICE_ACCOUNT_JSON_IN_BASE64>")
spark.read
  .option("parentProject", "xxx")
  .bigquery("xxx")
  .limit(10)
  .show()

davidrabinowitz added a commit to davidrabinowitz/spark-bigquery-connector that referenced this issue Oct 5, 2020
davidrabinowitz added a commit that referenced this issue Oct 6, 2020
It appears that the previous code, which relied on Java streams, had issues; reverted to a simpler code structure and added additional tests.
@davidrabinowitz
Member

Fixed by PR #250

@gbougeard

Hi,

Not sure this is the best place to post this, but I'm getting the following error using 0.17.3:

15:08:29.771 DEBUG c.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase - GHFS.configure
15:08:29.771 DEBUG c.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase - GHFS_ID = GHFS/1.6.1-hadoop2
15:08:29.779 DEBUG com.google.cloud.hadoop.util.CredentialConfiguration - Using service account credentials
15:08:29.779 DEBUG com.google.cloud.hadoop.util.CredentialConfiguration - Getting service account credentials from meta data service.
15:08:29.779 DEBUG com.google.cloud.hadoop.util.CredentialFactory - getCredentialFromMetadataServiceAccount()
[info] - should analyse data and persist summary *** FAILED *** (3 minutes, 0 seconds)
[info]   java.io.IOException: Error getting access token from metadata server at: http://metadata/computeMetadata/v1/instance/service-accounts/default/token
[info]   at com.google.cloud.hadoop.util.CredentialFactory.getCredentialFromMetadataServiceAccount(CredentialFactory.java:208)
[info]   at com.google.cloud.hadoop.util.CredentialConfiguration.getCredential(CredentialConfiguration.java:70)
[info]   at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.configure(GoogleHadoopFileSystemBase.java:1825)
[info]   at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.initialize(GoogleHadoopFileSystemBase.java:1012)
[info]   at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.initialize(GoogleHadoopFileSystemBase.java:975)

I'm setting it up as follows, and my job runs on AWS EMR (it only writes to BQ, no reads):

sparkSession.conf.set("credentials", base64credentials)
sparkSession.conf.set("parentProject", projectId)
sparkSession.conf.set("project", projectId)
sparkSession.conf.set("dataset", dataset)
sparkSession.conf.set("table", s"$dataset:$table")
sparkSession.conf.set("temporaryGcsBucket", bucket)
sparkSession.conf.set("maxParallelism", nbExecutors)
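For the write-only path, the same settings can also be passed as per-write options instead of session-level conf; a minimal sketch, assuming `df`, `dataset`, `table`, and `bucket` are already defined in the job (this fragment needs a running Spark session with the connector on the classpath, so it is illustrative rather than standalone):

```scala
// Hedged sketch: `df` is an existing DataFrame; `dataset`, `table`, and
// `bucket` are defined elsewhere in the job. The connector honors the
// session-level conf set above, but per-write options make the
// dependencies of each write explicit.
df.write
  .format("bigquery")
  .option("table", s"$dataset.$table")
  .option("temporaryGcsBucket", bucket)
  .mode("append")
  .save()
```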

@davidrabinowitz
Member

@gbougeard Can you please validate that the service account JSON fields conform to the format described at https://cloud.google.com/iam/docs/creating-managing-service-account-keys, especially the token_uri field?
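One quick way to sanity-check this (a hedged sketch; the inline base64Credentials stands in for the value actually passed to the connector) is to decode the Base64 string and confirm the token_uri field is present:

```scala
import java.util.Base64
import java.nio.charset.StandardCharsets

// Placeholder for the Base64 credentials string passed to the connector.
val base64Credentials = Base64.getEncoder.encodeToString(
  """{"type": "service_account", "token_uri": "https://oauth2.googleapis.com/token"}"""
    .getBytes(StandardCharsets.UTF_8))

// Decode and check that token_uri points at Google's token endpoint.
val decoded = new String(Base64.getDecoder.decode(base64Credentials), StandardCharsets.UTF_8)
val hasTokenUri = decoded.contains("\"token_uri\": \"https://oauth2.googleapis.com/token\"")
```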

@gbougeard

> @gbougeard Can you please validate that the service account JSON fields conform to the format described at https://cloud.google.com/iam/docs/creating-managing-service-account-keys, especially the token_uri field?

Here is an extract of my service account JSON:

"auth_uri": "https://accounts.google.com/o/oauth2/auth",
"token_uri": "https://oauth2.googleapis.com/token",
"auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
