Skip to content

Comments

allow multiple elements in impersonation chain#35694

Merged
potiuk merged 2 commits intoapache:mainfrom
melugoyal:connection-support-impersonation-chain
Nov 25, 2023
Merged

allow multiple elements in impersonation chain#35694
potiuk merged 2 commits intoapache:mainfrom
melugoyal:connection-support-impersonation-chain

Conversation

@melugoyal
Copy link

@melugoyal melugoyal commented Nov 17, 2023

#33715 added the impersonation_chain parameter to Google Cloud connections, but didn't allow any way to set multiple values for the impersonation chain. with this change, users can specify a comma-separated list of service accounts to be used in the impersonation chain


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@boring-cyborg boring-cyborg bot added area:providers kind:documentation provider:google Google (including GCP) related issues labels Nov 17, 2023
@potiuk
Copy link
Member

potiuk commented Nov 17, 2023

needs static checks fixes

@melugoyal melugoyal force-pushed the connection-support-impersonation-chain branch from 9aa4bc6 to ffae40a Compare November 17, 2023 17:03
@potiuk
Copy link
Member

potiuk commented Nov 17, 2023

cc: @VladaZakharova and team -> do you think it's a safe change (in terms of , being always recognized as separator)

@VladaZakharova
Copy link
Contributor

VladaZakharova commented Nov 17, 2023

Hi!
As i understand, to set multiple SA with current implementation in impersonation_chain parameter you should pass them as a list of strings. What new behaviour will add suggested implementation?

@melugoyal
Copy link
Author

melugoyal commented Nov 17, 2023

@VladaZakharova essentially there's a bug in the current implementation where there is no actual way to pass in a list of strings via the connection (since impersonation_chain is defined as a StringField). this PR enables that by expecting a comma-separated list of strings, then splitting them into a list when instantiating the hook

@melugoyal melugoyal force-pushed the connection-support-impersonation-chain branch from ffae40a to e75544a Compare November 18, 2023 00:51
@melugoyal melugoyal force-pushed the connection-support-impersonation-chain branch from e75544a to 4dbc17e Compare November 18, 2023 01:32
@VladaZakharova
Copy link
Contributor

Could you please provide your test using this operator where you are using multiple SA in impersonation_chain parameter as long as the results of this run? Thank you!

@melugoyal
Copy link
Author

melugoyal commented Nov 20, 2023

@VladaZakharova imo the added unit tests should be sufficient, given it's just extending the current functionality already in the provider

@VladaZakharova
Copy link
Contributor

I just want to be sure that previous functionality really doesn't work without your changes, since it is quite strange. That is why i have kindly asked you if you can provide system test that you have used to check this functionality.
@eladkal Can you please also take a look?

@potiuk
Copy link
Member

potiuk commented Nov 21, 2023

Yes. My question is also - what's the problem with keeping the sequence as before? WHY do you think using array is not good enough @melugoyal ? Is there a particular reason you want to introduce coma separated string that having array does not allow? Can you show an example case where this is problematic?

@melugoyal
Copy link
Author

@VladaZakharova @potiuk please correct me if i'm wrong but i don't see how you can specify a list of strings when configuring a connection through the Airflow UI for a connection field that is a string. this change allows the user to specify a comma-separated list of strings in the Airflow UI for the impersonation_chain argument, and upon instantiating the hook we detect a comma-separated string for that argument and convert it to a list of strings.

@potiuk
Copy link
Member

potiuk commented Nov 22, 2023

@VladaZakharova @potiuk please correct me if i'm wrong but i don't see how you can specify a list of strings when configuring a connection through the Airflow UI for a connection field that is a string. this change allows the user to specify a comma-separated list of strings in the Airflow UI for the impersonation_chain argument, and upon instantiating the hook we detect a comma-separated string for that argument and convert it to a list of strings.

Correct me if I am wrong - but I think you cannot do it with your change either? The impersonation_chain can only be passed by Hook / Operator init parameters, not through connection extra (which is what you are suggesting) ?

And it is defined like this everywhere in those:

        impersonation_chain: str | Sequence[str] | None = None,

But.maybe I am missing something ??

@potiuk
Copy link
Member

potiuk commented Nov 22, 2023

IMHO You'd need to implement quite a lot more changes to allow impersonation chain to be passed by extra (including fields customisations) - and I am not even sure if it is a good idea (I'd revert it to @VladaZakharova and others from Google to have a say on that).

@melugoyal
Copy link
Author

@potiuk not sure i understand. as mentioned in the PR description, #33715 added the ability to specify impersonation_chain via the extras in the Google Cloud Connection

@potiuk
Copy link
Member

potiuk commented Nov 22, 2023

@potiuk not sure i understand. as mentioned in the PR description, #33715 added the ability to specify impersonation_chain via the extras in the Google Cloud Connection

Ah.. I see - I completely forgotten about the other PR :)

You have to remember that when you look at multiple 10s of PRs a day it's hard to remember that something have been mentioned at description and you might simply not realise that when you look at just the code but have not re-read all the description from the beginning (I understand for you might be obvious because those are the only two PRs you are looking at.

With this context - it is clear what you are after. Now I would refer to @VladaZakharova and team to asy if having impersonation in extra seem like good idea or not - becasue really this is what should be decided (when it is, then indeed this one makes sense).

@VladaZakharova
Copy link
Contributor

Hi!
Thank you for the clarification.
These changes seem to be affecting existing functionality since we didn't properly check the changes from #33715 , so could you please create a simple DAG to check 2 cases:

  1. setup impersonation chain in connection, setup impersonation chain in operator => the impersonation chain from the operator should be used;
  2. setup impersonation chain in connection, don't setup impersonation chain in operator => the impersonation chain from the connection should be used.
    It would be nice to check different types of input impersonation_chain.
    Please, share the results here in PR. Thanks!

@melugoyal
Copy link
Author

melugoyal commented Nov 22, 2023

test DAG:

from airflow.providers.google.cloud.operators.gcs import GCSCreateBucketOperator
from airflow import DAG
from datetime import datetime

with DAG(
    dag_id="gcp-test", schedule="@once", start_date=datetime(2023, 1, 1), is_paused_upon_creation=False, catchup=False
) as dag:
    GCSCreateBucketOperator(
        task_id="gcscreate",
        bucket_name="test-mehulg",
        gcp_conn_id="gcp_test",
        impersonation_chain="test-sa-in-code@project.iam.gserviceaccount.com",
    )

with connection gcp_test specified without the impersonation_chain set: File "/usr/local/lib/python3.11/site-packages/google/auth/impersonated_credentials.py", line 100, in _make_iam_token_request raise exceptions.RefreshError(_REFRESH_ERROR, response_body) google.auth.exceptions.RefreshError: (\'Unable to acquire impersonated credentials\', \'{\ "error": {\ "code": 404,\ "message": "Not found; Gaia id not found for email test-sa-in-code@project.iam.gserviceaccount.com",\ "status": "NOT_FOUND"\ }\ }\ \')
so it attempted to use test-sa-in-code@project.iam.gserviceaccount.com from the DAG code. note the error is expected, since that is a fake service account and i'm not actually setting up any IAM policies

with impersonation_chain in connection gcp_test set to test-sa-in-connection@project.iam.gserviceaccount.com
image
File "/usr/local/lib/python3.11/site-packages/google/auth/impersonated_credentials.py", line 100, in _make_iam_token_request raise exceptions.RefreshError(_REFRESH_ERROR, response_body) google.auth.exceptions.RefreshError: (\'Unable to acquire impersonated credentials\', \'{\ "error": {\ "code": 404,\ "message": "Not found; Gaia id not found for email test-sa-in-code@project.iam.gserviceaccount.com",\ "status": "NOT_FOUND"\ }\ }\ \'), same error, indicating that as expected the impersonation_chain from the code takes precedence.

now redeployed the DAG removing the impersonation_chain specified in code:
File "/usr/local/lib/python3.11/site-packages/google/auth/impersonated_credentials.py", line 100, in _make_iam_token_request raise exceptions.RefreshError(_REFRESH_ERROR, response_body) google.auth.exceptions.RefreshError: (\'Unable to acquire impersonated credentials\', \'{\ "error": {\ "code": 404,\ "message": "Not found; Gaia id not found for email test-sa-in-connection@project.iam.gserviceaccount.com",\ "status": "NOT_FOUND"\ }\ }\ \')
as expected, it attempted to use the service account specified in the connection.

with impersonation chain in gcp_test updated to test-first-sa-in-connection@project.iam.gserviceaccount.com, test-sa-in-connection@project.iam.gserviceaccount.com
image
File "/usr/local/lib/python3.11/site-packages/google/auth/impersonated_credentials.py", line 100, in _make_iam_token_request raise exceptions.RefreshError(_REFRESH_ERROR, response_body) google.auth.exceptions.RefreshError: (\'Unable to acquire impersonated credentials\', \'{\ "error": {\ "code": 404,\ "message": "Not found; Gaia id not found for email test-first-sa-in-connection@project.iam.gserviceaccount.com",\ "status": "NOT_FOUND"\ }\ }\ \')
as expected, it attempted to use the first service account specified in the connection

with impersonation chain in gcp_test updated to astro-arithmetic-nebula-5764@astro-dev-us-east4-0001.iam.gserviceaccount.com, test-sa-in-connection@project.iam.gserviceaccount.com (and Airflow worker's service account granted access to impersonate astro-arithmetic-nebula-5764@astro-dev-us-east4-0001.iam.gserviceaccount.com)
image
File "/usr/local/lib/python3.11/site-packages/google/auth/impersonated_credentials.py", line 100, in _make_iam_token_request raise exceptions.RefreshError(_REFRESH_ERROR, response_body) google.auth.exceptions.RefreshError: (\'Unable to acquire impersonated credentials\', \'{\ "error": {\ "code": 404,\ "message": "Not found; Gaia id not found for email test-sa-in-connection@project.iam.gserviceaccount.com",\ "status": "NOT_FOUND"\ }\ }\ \')
as expected, it was able to impersonate the first service account in the chain, but then got an error when impersonating the second one.

@VladaZakharova
Copy link
Contributor

Thank you for explanation, i think we can merge the changes
@potiuk

@potiuk potiuk merged commit e2a5dbf into apache:main Nov 25, 2023
@melugoyal melugoyal deleted the connection-support-impersonation-chain branch November 27, 2023 19:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:providers kind:documentation provider:google Google (including GCP) related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants