Description
Apache Airflow Provider(s)
Versions of Apache Airflow Providers
apache-airflow-providers-google==6.1.0
Apache Airflow version
2.1.4
Operating System
Debian GNU/Linux 11 (bullseye)
Deployment
Docker-Compose
Deployment details
At my company we're developing our Airflow DAGs in local environments based on Docker Compose.
To authenticate against the GCP, we don't use service accounts and their keys, but instead use our user credentials and set them up as Application Default Credentials (ADC), i.e. we run
$ gcloud auth login
$ gcloud auth application-default login
We also set the default Project ID in both the gcloud config and the Airflow connection, i.e.
$ gcloud config set project $PROJECT
$ # run the following inside the Airflow Docker container
$ airflow connections delete google_cloud_default
$ airflow connections add google_cloud_default \
--conn-type=google_cloud_platform \
--conn-extra="{\"extra__google_cloud_platform__project\": \"$PROJECT\"}"
What happened
It seems that, due to this part in base_google.py, gcloud auth (specifically gcloud auth activate-refresh-token) is not executed when the Project ID is set in either the Airflow connection or the gcloud config.
As a result, e.g. gcloud container clusters get-credentials in the GKEStartPodOperator fails, since "You do not currently have an active account selected":
[2021-12-20 15:21:12,059] {credentials_provider.py:295} INFO - Getting connection using `google.auth.default()` since no key file is defined for hook.
[2021-12-20 15:21:12,073] {logging_mixin.py:109} WARNING - /usr/local/lib/python3.8/site-packages/google/auth/_default.py:70 UserWarning: Your application has authenticated using end user credentials from Google Cloud SDK without a quota project. You might receive a "quota exceeded" or "API not enabled" error. We recommend you rerun `gcloud auth application-default login` and make sure a quota project is added. Or you can use service accounts instead. For more information about service accounts, see https://cloud.google.com/docs/authentication/
[2021-12-20 15:21:13,863] {process_utils.py:135} INFO - Executing cmd: gcloud container clusters get-credentials REDACTED --zone europe-west1-b --project REDACTED
[2021-12-20 15:21:13,875] {process_utils.py:139} INFO - Output:
[2021-12-20 15:21:14,522] {process_utils.py:143} INFO - ERROR: (gcloud.container.clusters.get-credentials) You do not currently have an active account selected.
[2021-12-20 15:21:14,522] {process_utils.py:143} INFO - Please run:
[2021-12-20 15:21:14,523] {process_utils.py:143} INFO -
[2021-12-20 15:21:14,523] {process_utils.py:143} INFO - $ gcloud auth login
[2021-12-20 15:21:14,523] {process_utils.py:143} INFO -
[2021-12-20 15:21:14,523] {process_utils.py:143} INFO - to obtain new credentials.
[2021-12-20 15:21:14,523] {process_utils.py:143} INFO -
[2021-12-20 15:21:14,523] {process_utils.py:143} INFO - If you have already logged in with a different account:
[2021-12-20 15:21:14,523] {process_utils.py:143} INFO -
[2021-12-20 15:21:14,523] {process_utils.py:143} INFO - $ gcloud config set account ACCOUNT
[2021-12-20 15:21:14,523] {process_utils.py:143} INFO -
[2021-12-20 15:21:14,523] {process_utils.py:143} INFO - to select an already authenticated account to use.
[2021-12-20 15:21:14,618] {taskinstance.py:1463} ERROR - Task failed with exception
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1165, in _run_raw_task
self._prepare_and_execute_task_with_callbacks(context, task)
File "/usr/local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1283, in _prepare_and_execute_task_with_callbacks
result = self._execute_task(context, task_copy)
File "/usr/local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1313, in _execute_task
result = task_copy.execute(context=context)
File "/usr/local/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/kubernetes_engine.py", line 355, in execute
execute_in_subprocess(cmd)
File "/usr/local/lib/python3.8/site-packages/airflow/utils/process_utils.py", line 147, in execute_in_subprocess
raise subprocess.CalledProcessError(exit_code, cmd)
subprocess.CalledProcessError: Command '['gcloud', 'container', 'clusters', 'get-credentials', 'REDACTED', '--zone', 'europe-west1-b', '--project', 'REDACTED']' returned non-zero exit status 1.
If we set the environment variable GOOGLE_APPLICATION_CREDENTIALS, gcloud auth activate-service-account is run instead, which only works with proper service account credentials, not user credentials.
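The branching described above can be sketched as follows. This is a hedged reconstruction of the behavior this issue reports, not the provider's actual code; the function name and parameters are illustrative.

```python
# Hedged sketch (not the actual base_google.py code): a simplified
# reconstruction of how gcloud gets (or fails to get) an active account.
def choose_gcloud_auth_command(env, adc_file_exists, project_id):
    """Return the gcloud command that would be run to activate an account."""
    if "GOOGLE_APPLICATION_CREDENTIALS" in env:
        # Only works for service-account key files, not user credentials.
        return "gcloud auth activate-service-account"
    if project_id:
        # The problematic branch: a configured Project ID means
        # no account gets activated at all.
        return None
    if adc_file_exists:
        return "gcloud auth activate-refresh-token"
    return None
```

With a Project ID configured, the sketch returns no activation command, which matches the "no active account selected" failure in the logs above.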
What you expected to happen
From my POV, it should be possible to
- have the Project ID set in the gcloud config and/or the Airflow connection and still be able to use user credentials with GCP Operators,
- set GOOGLE_APPLICATION_CREDENTIALS to a file containing user credentials and be able to use these credentials with GCP Operators.
Item 1 was definitely possible in Airflow 1.
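For item 2, the relevant distinction is in the credentials file itself: the ADC file written by gcloud auth application-default login carries "type": "authorized_user", while a service-account key file carries "type": "service_account". A minimal sketch of telling them apart (the helper name is mine, not the provider's):

```python
import json

# Sketch: distinguish user credentials from a service-account key by the
# "type" field in the JSON file that GOOGLE_APPLICATION_CREDENTIALS
# points to. Supporting item 2 would require branching on this field
# instead of always running `gcloud auth activate-service-account`.
def credentials_type(credentials_json: str) -> str:
    return json.loads(credentials_json)["type"]

user_credentials = '{"type": "authorized_user", "refresh_token": "REDACTED"}'
service_account_key = '{"type": "service_account", "private_key": "REDACTED"}'
```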
How to reproduce
See Deployment Details. In essence:
- Run Airflow within Docker Compose (but as far as I can see, it's not only Docker Compose that is affected).
- Use user credentials with gcloud: gcloud auth login, gcloud auth application-default login
- Configure the Project ID in the gcloud config (mounted in the Docker container) and/or the Airflow connection
- Run GKEStartPodOperator
Anything else
Currently, the only workaround (apart from using service accounts) seems to be to not set a default project in either the gcloud config or google_cloud_platform connections.
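Concretely, the workaround amounts to the following commands, mirroring the setup from Deployment details; with no default project anywhere, each operator then has to receive the project explicitly.

```shell
# Workaround sketch: keep the default project out of the gcloud config.
gcloud config unset project

# Inside the Airflow container: recreate the connection without a Project ID.
airflow connections delete google_cloud_default
airflow connections add google_cloud_default \
    --conn-type=google_cloud_platform
```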
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct