This repository has been archived by the owner on Jan 23, 2024. It is now read-only.

BQ Data Transfer Auth causes failure in schedule.py #47

Closed
sganslandt opened this issue Dec 10, 2020 · 10 comments
Labels: type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)

@sganslandt

Running the `python3 schedule.py --query_file=changes.sql --table=changes --access_token=${token}` part of setup.sh initially fails with the following exception.

Traceback (most recent call last):
  File "/Users/segan3/Library/Python/3.8/lib/python/site-packages/google/api_core/grpc_helpers.py", line 57, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "/Users/segan3/Library/Python/3.8/lib/python/site-packages/grpc/_channel.py", line 923, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/Users/segan3/Library/Python/3.8/lib/python/site-packages/grpc/_channel.py", line 826, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.INVALID_ARGUMENT
	details = "Failed to find a valid credential. The request to create a transfer config is supposed to contain an authorization code."
	debug_error_string = "{"created":"@1607609612.827897000","description":"Error received from peer ipv4:216.58.207.234:443","file":"src/core/lib/surface/call.cc","file_line":1063,"grpc_message":"Failed to find a valid credential. The request to create a transfer config is supposed to contain an authorization code.","grpc_status":3}"
>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "schedule.py", line 105, in <module>
    app.run(create_or_update_scheduled_query)
  File "/Users/segan3/Library/Python/3.8/lib/python/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/Users/segan3/Library/Python/3.8/lib/python/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "schedule.py", line 100, in create_or_update_scheduled_query
    response = client.create_transfer_config(parent, transfer_config)
  File "/Users/segan3/Library/Python/3.8/lib/python/site-packages/google/cloud/bigquery_datatransfer_v1/gapic/data_transfer_service_client.py", line 811, in create_transfer_config
    return self._inner_api_calls["create_transfer_config"](
  File "/Users/segan3/Library/Python/3.8/lib/python/site-packages/google/api_core/gapic_v1/method.py", line 145, in __call__
    return wrapped_func(*args, **kwargs)
  File "/Users/segan3/Library/Python/3.8/lib/python/site-packages/google/api_core/retry.py", line 281, in retry_wrapped_func
    return retry_target(
  File "/Users/segan3/Library/Python/3.8/lib/python/site-packages/google/api_core/retry.py", line 184, in retry_target
    return target()
  File "/Users/segan3/Library/Python/3.8/lib/python/site-packages/google/api_core/timeout.py", line 214, in func_with_timeout
    return func(*args, **kwargs)
  File "/Users/segan3/Library/Python/3.8/lib/python/site-packages/google/api_core/grpc_helpers.py", line 59, in error_remapped_callable
    six.raise_from(exceptions.from_grpc_error(exc), exc)
  File "<string>", line 3, in raise_from
google.api_core.exceptions.InvalidArgument: 400 Failed to find a valid credential. The request to create a transfer config is supposed to contain an authorization code.

This was from running it with an auth token from my personal account, which had the Owner role in the project.

There are probably better solutions, but I managed to get around it by first creating a similar scheduled query manually, which gave me the option to grant access to the needed scope.

@sganslandt sganslandt changed the title Issue in setup.sh up data transfers Issue in setup.sh creating data transfers Dec 10, 2020
@sganslandt sganslandt changed the title Issue in setup.sh creating data transfers Issue in setup.sh creating scheduled queries Dec 10, 2020
@dinagraves
Contributor

Interesting. I'm going to try to recreate the bug. Did you use the "create new project" option or did you use a pre-existing project?

@dinagraves dinagraves added the type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. label Dec 10, 2020
@dinagraves dinagraves self-assigned this Dec 10, 2020
@dinagraves
Contributor

I ran it a couple times, one using a pre-existing project and one with a new project. So far unable to recreate. What system are you using? And to confirm, this does work when you run it manually?

token=$(gcloud auth print-access-token)
python3 schedule.py --query_file=changes.sql --table=changes --access_token=${token}

@sganslandt
Author

sganslandt commented Dec 11, 2020

I used a pre-existing project, but I believe the issue was that it was a new account that had never used data transfers before.

Once I had created my first scheduled query manually and approved the "BigQuery Data Transfer Service" to do things on my behalf through the popup, the script worked as intended.

[Screenshot: OAuth consent popup for the BigQuery Data Transfer Service]

There was no difference in running the full setup.sh script and running the schedule creation in isolation.

@dinagraves
Contributor

Thank you for that! I don't think there is a way to enable this automatically, but I'm going to look into forcing the prompt to pop up before running the rest of the setup script.

@dinagraves dinagraves changed the title Issue in setup.sh creating scheduled queries BQ Data Transfer Auth causes failure in schedule.py Jan 20, 2021
@davidstanke
Collaborator

I'm experiencing this problem consistently (even after manually creating a transfer and doing the OAuth dance as part of that process).

One thing that I just thought of which might be relevant: my dev environment is a remote SSH connection (in VSCode) to a cloud instance. I wonder if the BigQuery auth prompt is trying to launch, but because the process is actually running remotely, it's not triggering my computer to open a browser.

@dinagraves
Contributor

Interesting. Can you try this from Cloud Shell?

@davidstanke
Collaborator

So, it seems we're not the only ones having trouble with BigQuery+Python+Auth.

Anyway, Cloud Shell didn't fix it for me but I was able to push through it: I found a workaround, using bq instead of the Python SDK. Here's my replacement for the schedule_bq_queries() method in setup.sh:

schedule_bq_queries(){
  echo "Check BigQueryDataTransfer is enabled"
  enabled=$(gcloud services list --enabled --filter name:bigquerydatatransfer.googleapis.com)

  while [[ "${enabled}" != *"bigquerydatatransfer.googleapis.com"* ]]
  do
    gcloud services enable bigquerydatatransfer.googleapis.com
    # Keep checking if it's enabled
    enabled=$(gcloud services list --enabled --filter name:bigquerydatatransfer.googleapis.com)
  done

  cd ${DIR}/../queries/
  pip3 install -r requirements.txt -q --user

  echo "Delete existing scheduled queries for derived tables..."; set -x
  # NOTE: this will delete ALL scheduled queries in `$FOURKEYS_PROJECT`
  for config in $(bq ls --project_id=$FOURKEYS_PROJECT --transfer_config --transfer_location=US --format=prettyjson | jq -r '.[].name')
  do
    bq rm --transfer_config -f $config
  done

  echo "Creating BigQuery scheduled queries for derived tables.."; set -x

  bq query --use_legacy_sql=false --destination_table=$FOURKEYS_PROJECT:four_keys.changes --display_name=changes \
    --schedule="every 24 hours" --replace=true "`cat changes.sql`"
  bq query --use_legacy_sql=false --destination_table=$FOURKEYS_PROJECT:four_keys.deployments --display_name=deployments \
    --schedule="every 24 hours" --replace=true "`cat deployments.sql`"
  bq query --use_legacy_sql=false --destination_table=$FOURKEYS_PROJECT:four_keys.incidents --display_name=incidents \
    --schedule="every 24 hours" --replace=true "`cat incidents.sql`"

  set +x; echo
  cd ${DIR}
}

...there are two things I'd like to work through before opening a PR: 1) it requires jq and I bet with some fancy text mangling, we can get rid of that. 2) it destroys and recreates all scheduled queries; that could also be fixed with fancy text mangling.
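For point 1, one possible way to drop the jq dependency is a small sed filter over the prettyjson output. This is just a sketch: the helper name and the sample JSON (project/config IDs) are made up for illustration, and it relies on prettyjson printing each field on its own line, which matches the `bq ls --transfer_config` output shown in the docs.

```shell
#!/bin/sh
# Sketch: extract transfer-config resource names from
# `bq ls --transfer_config ... --format=prettyjson` output without jq.
extract_config_names() {
  # Print the value of every lowercase "name" field on stdin.
  # ("displayName" does not match because of the leading quote and case.)
  sed -n 's/.*"name": *"\([^"]*\)".*/\1/p'
}

# Stand-in for real bq output; the IDs below are placeholders.
sample='[
  {
    "name": "projects/123/locations/us/transferConfigs/abc",
    "displayName": "changes"
  },
  {
    "name": "projects/123/locations/us/transferConfigs/def",
    "displayName": "deployments"
  }
]'

printf '%s\n' "$sample" | extract_config_names
```

In the function above, the loop header would then become `for config in $(bq ls ... --format=prettyjson | extract_config_names)`.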

This was referenced Jan 23, 2021
@tswast

tswast commented Jan 25, 2021

The difference is that the bq command defaults to user credentials, whereas the Python client libraries default to service account credentials. You can use user credentials with the Python client libraries by passing in a credentials object. Follow the tutorial at https://cloud.google.com/bigquery/docs/authentication/end-user-installed for instructions on creating such an object.
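The "pass in a credentials object" part might look roughly like the sketch below, assuming google-auth is installed. This is not the repo's actual code; the token literal is a placeholder for the output of `gcloud auth print-access-token`, and the resulting object is what would be handed to the client constructor (e.g. `DataTransferServiceClient(credentials=credentials)`).

```python
# Sketch (assumption, not the repo's code): wrap an OAuth2 access token in a
# user-credentials object that Google Cloud clients accept via their
# `credentials` parameter.
from google.oauth2.credentials import Credentials

# Placeholder; in setup.sh this comes from `gcloud auth print-access-token`.
access_token = "ya29.placeholder"
credentials = Credentials(token=access_token)

print(credentials.token)
```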

@dinagraves
Contributor

Hi Tim,

I believe we're doing that here:

https://github.com/GoogleCloudPlatform/fourkeys/blob/main/queries/schedule.py#L39

This works for me and many other users. But I believe the key difference is that when I first ran the script, I had to go to my browser and allow the BQ service to use my account. This is also what the user reports above. However, Dave isn't getting that prompt or pop-up and is thus having trouble authenticating with his access token.

@dinagraves
Contributor

dinagraves commented Feb 3, 2021

This should be solved via the switch to a bash script: #68. Closing for now.
