DockerHook: obtain credentials and login to Amazon ECR #26162
Conversation
Force-pushed from 62a2a64 to 5467972, and later from 5467972 to 2721a0d.
I think this is a good idea, but the ECRDockerCredentialHelper should be part of the Amazon provider, and there should likely be an ECRDockerOperator that uses it by default. This is very similar to what we have, for example, in GKEStartPodOperator, which is based on the generic KubernetesPodOperator and pretty much only adds GCP-specific authentication. I think this is a very similar case.
```python
    credential_helper = AirflowConnectionDockerCredentialHelper
    credential_helper_kwargs = {}
else:
    credential_helper = import_string(credential_helper)
```
It is not secure. We should not load user-defined classes, as this makes our application vulnerable to the CWE-502: Deserialization of Untrusted Data weakness.
We should change the logic so that it is not needed, as @potiuk suggests, or add a list of allowed classes, as is done during DAG deserialization. See:
airflow/airflow/serialization/serialized_objects.py, lines 999 to 1000 in 5b216e9:

```python
if _operator_link_class_path in get_operator_extra_links():
    single_op_link_class = import_string(_operator_link_class_path)
```
As far as I know, Secrets Backends, REST API auth_backends, and the Amazon Session Factory are also loaded via import_string without any validation.
Also, I personally added retry handlers loaded via import to SlackHook in #25852.
If that is a security issue, it could be removed; no Slack provider has been released since that feature was added.
Secrets Backends, REST API auth_backends

In those cases, the values are not controlled by the user (e.g. via the Web UI), but by the administrator: you must be able to change files on disk to change the secrets backend or auth backend. The problem occurs when the value of the import_string parameter is controlled remotely, e.g. it is read from the database.
In the case of Slack, we should fix it so that only allowed/safe values can be imported.
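For illustration only, a minimal sketch of the allow-list approach described above. The class paths and the `ALLOWED_CREDENTIAL_HELPERS`/`load_credential_helper` names are hypothetical and not part of Airflow; `import_string` from `airflow.utils.module_loading` is real.

```python
from airflow.utils.module_loading import import_string

# Hypothetical allow-list of credential-helper classes that may be imported
# from a connection-provided dotted path. Anything else is rejected.
ALLOWED_CREDENTIAL_HELPERS = frozenset(
    {
        "airflow.providers.docker.hooks.docker.AirflowConnectionDockerCredentialHelper",
        "airflow.providers.amazon.aws.hooks.ecr.ECRDockerCredentialHelper",
    }
)


def load_credential_helper(class_path: str):
    """Import a credential helper only if its dotted path is explicitly allowed."""
    if class_path not in ALLOWED_CREDENTIAL_HELPERS:
        raise ValueError(f"Credential helper {class_path!r} is not on the allow-list")
    return import_string(class_path)
```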
The class does not even need to be specially downloaded: a class/function that can be misused may already exist in any package installed on the system.
I will have a hard time giving a Python example right now, but here you can see what such an attack looks like in Java:
https://wololo.net/2022/06/11/ps5-ps4-hacker-theflow-discloses-blu-ray-disc-exploit-toolchain-ps5-piracy-not-a-matter-of-if-but-when/
- The class com.sony.gemstack.org.dvb.user.UserPreferenceManagerImpl deserializes the userprefs file under a privileged context using readObject(), which is insecure.
- The class com.oracle.security.Service contains a method newInstance which calls Class.forName on an arbitrary class name. This allows arbitrary classes, even restricted ones (for example in sun.), to be instantiated.
- The class com.sony.gemstack.org.dvb.io.ixc.IxcProxy contains the protected method invokeMethod which can call methods under a privileged context. Permission checks in methods can be bypassed.
In this case, we would have to look for a class, function, or method that takes a conn argument (or ignores unknown keyword arguments) in order to exploit it, but it can be any class in any installed package, so the potential attack surface is quite large.
I mean the most likely attack relies on the fact that both import_string and a plain import load (and execute) the module first.
Some sample:

```python
# airflow/providers/exploit/hooks/some_db_api.py


def unsafe_code():
    """Grab the fernet key and nudes from configs, decode, send to someone and post on Twitter."""
    ...


unsafe_code()  # executed as a side effect of importing the module


class SomeDbApiHook(DbApiHook):
    conn_type = "awesome_conn_type"
    ...
```

So it would be the same whether the user calls

```python
from airflow.providers.exploit.hooks.some_db_api import SomeDbApiHook

import_string("airflow.providers.exploit.hooks.some_db_api.SomeDbApiHook")
```
- Or uses airflow.providers.slack.transfers.sql_to_slack.SqlToSlackOperator in Airflow < 2.3 with a connection whose type references awesome_conn_type.
A DAG file is considered trusted content, as it is recommended that it passes through code review. Changing a value in the database does not require code review, so there is a risk.
conn_id is also part of the DAG, so it should pass review along with all the components the DAG is expected to use.
Let's imagine that someone changes a connection after the DAG is deployed (someone with the Admin or Op role). That could happen mainly for two reasons:
- Someone with access to change connections has had their account compromised, and the "bad guy" also has access to Airflow, which is mostly intended to live in private subnets reachable only via proxies, bastion/jump hosts and VPNs.
- Someone with the Admin or Op role is the "bad guy"/impostor themselves.
If some kind of exploit is already in the environment, there is no problem using Option 2, or maybe Option 3 - it also uses import_string for DB API Hooks.
Also, a lot of operators are expected to render templates; the BashOperator in particular might not be safe if templates are stored in Airflow Variables: 'bash_command' + 'env' = run whatever you want.
One side effect of all of the above is that it requires many steps, and a lot of things have to happen: Docker installed, BashOperators used with Variables, Airflow credentials leaked, the Airflow webserver exposed to the Internet.
Don't get me wrong, I also think that too much security is never enough.
But I believe it is more likely that a user somehow exposes the cloud credentials used by Airflow (and there is a good chance those credentials have access to almost everything), e.g. credential info stored in the extra field of a Connection.
Also, almost all providers that can send messages (e.g. Slack, Discord, Telegram, and MS Teams - just a PR) allow passing the token/credentials directly in operator code (without connections) and do not mask them as secrets (masking is not really required there, because the user stores them directly in DAG code). There is a good chance they get exposed in logs, especially if HttpHook is used. Some "bad guy" could easily use these credentials to send messages in a corporate messenger - e.g. a nice link to "change your domain user/password".
@potiuk FYI: the same problem exists in the Postgres hook:
Yeah, it could be. I put it into the Docker provider to reduce dependency complexity.
Why I moved this part inside the DockerHook:
Also, AWS RDS IAM auth exists in MySqlHook; however, I don't think it even works in the MySQL case: airflow/airflow/providers/mysql/hooks/mysql.py, lines 215 to 220 in 5b216e9.
I think this should then also be changed in Postgres and MySQL :). The approach where "all AWS" goes to AWS actually makes dependencies less, not more, complex for the user. I think we should avoid any kind of circular dependencies, and we won't avoid them if the "mysql" provider refers to the "aws" one. Both the Postgres and MySQL implementations with local imports make the dependencies equally (or even more) complex, but they are worse because they are "hidden". If you want to reason, for example, about which version of the AWS provider we should have in order to make Postgres work with AWS, it is impossible to answer, because by using local imports we are hiding the fact that there is a dependency (it is there, but implicit and hidden). I think making an explicit dependency chain AWS -> POSTGRES, MYSQL -> COMMON.SQL is the only reasonable approach, and "AWS-specific, Postgres-related" classes should live in AWS.
IMHO this is our ultimate goal, even if there are cases that do not follow it yet.
It should be done in a discoverable way - the very same way that in "common.sql" we instantiate a Hook based on the "connection_id" (without knowing the type of the hook upfront); the only requirement is that it extends DbApiHook. We should have a common authentication interface implemented by the Docker operator, and some functionality in the Hook that allows the "authentication" to be discovered and used.
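As a point of reference, this is roughly how the connection-driven discovery in common.sql looks from the caller's side - a minimal sketch, where the connection id and the SQL statement are illustrative:

```python
from airflow.hooks.base import BaseHook

# The hook class is resolved from the connection's conn_type, so the caller
# never names the hook class explicitly; the connection id below is illustrative.
conn = BaseHook.get_connection("my_database_conn")
hook = conn.get_hook()  # e.g. a PostgresHook or MySqlHook, both DbApiHook subclasses

rows = hook.get_records("SELECT 1")
```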
Let's open Pandora's box. What should we do with these, then? No dependencies declared in the provider (this has existed for a couple of years): airflow/airflow/providers/amazon/aws/hooks/base_aws.py, lines 240 to 245 in 6931fbf.
Local imports from the Google provider: airflow/airflow/providers/amazon/aws/hooks/base_aws.py, lines 303 to 310 in 6931fbf.
Those are the only ones I know of; similar things may exist elsewhere.
I think we should have some common authentication interface for all Hooks that implement connection support.
Wrong box to open :). That is not part of it. Both of the examples you mentioned @Taragolis are clearly AWS ones; they need no fix, as they are in the right place. They are (and should be) in AWS. In both cases the "USER" of the authentication (AWS) has a dependency on the functionality it uses. Having such functionality for an optional AWS <> Google dependency is perfectly fine, and it is even "blessed" in our package system: the amazon provider has a [google] extra and the google provider has an [amazon] extra. We even have an exception specifically foreseen for that - not yet heavily used, but once we split providers into separate repos I was planning to apply it consistently in the places that do not have it.
Those two stories are different from the one here, even if not "technically" - this is more about looking at the landscape of our providers from the business side of things than about pure interface/API, and it is based more on the likelihood of being well maintained than anything else. The decision about where to put code that sits in-between is bound more to whether there are clear stakeholders behind the provider who mostly maintain it and are interested in having this functionality in. This is captured in a few places already: we have the release process for providers, and we already used the same line of thought when we decided where to put transfer operators. See AIP-21 Changes in import paths - also touched upon in AIP-8 Split Provider packages for Airflow 2.0. One of the decisions we made there is that "transfer" operators are put in the "target" of the transfer when there is a clear stakeholder behind the target, and in that case circular references are unavoidable. To sum up the line of thought in one sentence: the code should be put in the "provider" where there is a stakeholder who is most likely to maintain it. 'Postgres' and 'MySQL' are different: they have no clear stakeholders interested in maintaining them. Even though they are commercial databases with companies behind them, they are 'commodity' APIs, and those companies are not interested in maintaining the services. Neither Postgres nor MySQL is interested in adding AWS/GCP interfacing to their provider, but both AWS and GCP are respectively interested in allowing AWS/GCP authentication WITH the Postgres/MySQL providers they expose in their own service offerings. This was discussed at length when AIP-21 was debated - there are different kinds of providers. Simply speaking, SAML/GSSAPI are "protocols", there are also "databases" (like Postgres/MySQL), and then there are "services", which are a higher layer of abstraction; while "services" can use "protocols" and "databases", neither "protocols" nor "databases" should use "services" - they can only be "used" by services. This will be much more visible and obvious when we split providers into separate repos. The ideal situation would be that people from AWS subscribe to the single "amazon" provider repo to get notifications about all the AWS-specific changes they are interested in, and this authentication code clearly falls into that bucket. The Google federated identity code is also a very good example here: while it is clearly GCP-related, it is the AWS people who are interested in getting it working and making sure they can connect to GCP (for example to be able to pull data from it). They will be maintaining that code, not the Google people.
But it does almost the same thing as in the case of:
In an ideal world, auth for different hooks would be pluggable. A user could then authenticate to Postgres in different ways (explicit credentials, pgpass, GSSAPI, AWS IAM, etc.); the same could work for Docker registries (explicit credentials, ECR get-token) as well as for additional methods of authenticating to the AWS cloud. Just as an example, take DataGrip, or the JetBrains Database Tools and SQL plugin: as soon as I installed the third-party AWS Toolkit plugin, it enabled additional auth methods for Postgres and MySQL. The comparison may not be entirely fair - DataGrip, like the Pro versions of the JB IDEs, is a commercial product and Airflow is OSS.
Yes. The difference is WHO is doing it. If this is code under the stewardship of AWS, they choose to do it and maintain it. In Postgres there is no one to maintain it. This is the main criterion. You are using the wrong comparison: it's not about what code it is, but about who is interested in maintaining it and reacting when there is a problem. To put it simply: if in the future a problem with a Google library causes a problem with this operator, the AWS people will have to fix it; and if, for example, it causes a conflict with other providers (say Google), we might choose not to release the AWS provider until the AWS people fix the conflict. Plain and simple: AWS is impacted, they fix it. With the splitting of providers, I want AWS people to subscribe to the aws repo and be aware of all the things they will need to fix. When the Google provider uses AWS for authentication, the same holds: it's the Google team that should fix any problems that might undermine their way of interfacing with AWS. With Postgres, the situation is different: if the same problem happened in the Postgres provider, it would hold the Postgres provider back from being released - a problem with AWS would block a new release of the Postgres provider, which is not something we want. Again, your parallels are not entirely correct - it's not about "what code is needed". I agree it is similar code, but IMHO that does not matter. What matters is who has the incentive to fix it when it breaks. This is the fundamental difference. Think "human", "team", "responsibility", "governance", not "code".
I don't think it's an "ideal world" - it's just one of the choices. Right now in Airflow, authentication is very closely connected to the Connection and the corresponding Hook. Basically, auth is THE ONLY functionality a Hook provides on top of most integrations: you give it a connection id and it gives you back some form of client that you can use. Other than that, hooks are not very useful - they pass parameters through to the underlying "clients". Quite often a particular Hook directly returns a Python client that you could instantiate and use yourself; the actual "benefit" of using Hooks is that you can pass them a connection id. That's pretty much it, and it is a big and important feature. Currently, the way to implement different authentication methods is through inheritance (if we want the handling at the Hook level): if you need an extended Hook with new authentication, you extend the one that is there and add the authentication. This fits Airflow's model very well - a Hook is a class, it is extendable, and it can be passed a connection id. A Postgres Hook should get a Postgres connection id and so on. If you want to implement a Postgres Hook with a connection id from Google, you should implement a GooglePostgresHook "child" that can take a Google connection as an initialization param; or, if you want to use AWS authentication, implement an AWSPostgresHook "child" that can take an "AWS connection" as an initialization param. This is the model used currently. It is a different model from "pluggable" authentication - you "extend" rather than "inject" authentication. We might have different views on which model is better. I personally think "pluggable authentication" is far more difficult to turn into a generic solution applicable to multiple types of hooks. I'd say Airflow's model, where you extend functionality and add authentication as needed in a "child" class, is much better, because it saves you from defining a common "authentication" layer that would have to handle a lot of complexity - a good example is the impersonation chain from GCP, which is very specific to GCP. Implementing a common "authentication" model that would handle GCP, Azure, AWS, Databricks, and a large number of other options - via HTTP, proprietary database connection types, OpenID, OAuth, etc. - is far more complex and error-prone. But no matter what we think about more or less complex, the "extension" model is what is being used in Airflow.
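A minimal sketch of the "extension" model described above, assuming an AWS-authenticated Postgres child hook. The class name AwsIamPostgresHook and the region_name parameter are hypothetical illustrations, not existing Airflow classes; the boto3 generate_db_auth_token call and psycopg2.connect are real APIs.

```python
import boto3
import psycopg2
from airflow.providers.postgres.hooks.postgres import PostgresHook


class AwsIamPostgresHook(PostgresHook):
    """Hypothetical child hook that swaps the stored password for an RDS IAM token."""

    def __init__(self, *args, region_name: str = "us-east-1", **kwargs):
        super().__init__(*args, **kwargs)
        self.region_name = region_name

    def get_conn(self):
        conn = self.get_connection(self.postgres_conn_id)
        # Generate a short-lived IAM auth token instead of using a static password.
        rds = boto3.client("rds", region_name=self.region_name)
        token = rds.generate_db_auth_token(
            DBHostname=conn.host,
            Port=conn.port or 5432,
            DBUsername=conn.login,
            Region=self.region_name,
        )
        return psycopg2.connect(
            host=conn.host,
            port=conn.port or 5432,
            user=conn.login,
            password=token,
            dbname=conn.schema,
        )
```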
It depends which way you look at it. To me it looks the same: some additional method was added years ago and is not well maintained or documented.
The reason why IAM auth exists in the postgres provider and not in amazon, and why Google federation exists in the amazon provider and not in google, is clear to me: there was no other way to extend authentication. To be clear, I agree that IAM auth for postgres/mysql should move out of the postgres provider; I am just thinking about an appropriate replacement that would be easy to maintain and would make it easy to add integrations with other cloud providers.
The issue is that right now, if you extend a hook, you also need to implement all the operators. It could be done the way it is done for DbApiHook; however, DbApiHook is not only about auth, it is also about supporting the specific protocol of a specific database. For Postgres, Docker, MySQL, and possibly others, it is only about how to obtain credentials and pass them to the specific client - it might just require executing some pluggable class. That might need some changes in core, so it would not be available in providers until the minimum supported Airflow version meets the requirement. So maybe this is more about creating an AIP and discussing the different approaches and their benefits - WDYT?
@mik-laj @potiuk Let's finalise the discussion. We are not ready to add auth to Amazon ECR the way it is described in this PR, for several reasons:
So could there be some additional way to help users integrate with ECR if they need it? It could be a page in the documentation, like this snippet #9707 (comment), or an operator that updates a specific Docker connection with ECR credentials. WDYT, guys?
Why not use the opportunity to add the auth mechanism now? I do not think this needs a super-generic solution that works for all hooks. There are a few operators that really need this kind of pluggable authentication, and I think this DockerHook change is a good opportunity to add it. I would say we need something very similar to what the requests library uses, where you can pass an auth object (https://requests.readthedocs.io/en/latest/user/authentication/): you could add an "auth" class to pass to DockerHook and the Docker operators - a "DockerAuth" Protocol / abstract class that would handle the authentication. Once we get it implemented here, we could apply a similar approach to Postgres to fix the current behaviour, and to any other class that does it in a similar way. I think there could be various auth classes in "Amazon" - one for Docker, one for Postgres, etc. - and they could share some common code if needed. Do you see any drawbacks to this approach @Taragolis?
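To make the idea concrete, here is a hedged sketch of what such a requests-style auth object might look like. The DockerRegistryAuth and EcrRegistryAuth names are hypothetical and not part of any provider; the boto3 get_authorization_token and docker SDK login calls are real. A DockerHook could then accept such an object and call auth.login(client) instead of reading credentials from the connection - but that shape is only a guess at the proposal.

```python
import base64
from typing import Protocol

import boto3
import docker


class DockerRegistryAuth(Protocol):
    """Hypothetical auth interface, analogous to requests' AuthBase."""

    def login(self, client: docker.APIClient) -> None:
        ...


class EcrRegistryAuth:
    """Hypothetical implementation that logs in with a short-lived ECR token."""

    def __init__(self, region_name: str):
        self.region_name = region_name

    def login(self, client: docker.APIClient) -> None:
        ecr = boto3.client("ecr", region_name=self.region_name)
        auth_data = ecr.get_authorization_token()["authorizationData"][0]
        # The token is base64-encoded "AWS:<password>".
        username, password = (
            base64.b64decode(auth_data["authorizationToken"]).decode().split(":", 1)
        )
        client.login(
            username=username,
            password=password,
            registry=auth_data["proxyEndpoint"],
            reauth=True,
        )
```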
@potiuk Right now I think this is the only approach that could cover:
It only requires small changes in the current operators, and nothing needs to be re-implemented if only the auth method changes. This PR is also still in draft for one reason: I do not have a clear idea of the best place to document how to use the auth. It might be:
I would also like to add some information about other hooks and things that might (or might not) be nice to implement in the future; this is not related to the current PR.

HttpHook

My daily workload does not require it right now, but a couple of years ago we used it very actively. I do not remember exactly - maybe in 1.10.4 HttpHook did not even have a parameter for requests auth (I am too lazy to check today). But the major issue is that currently you need to create a new hook and overwrite get_conn. I found this code in an old project (note: the code was written for 1.10.x).

Custom Auth

```python
from datetime import datetime, timedelta

import jwt  # PyJWT (pre-2.0, where encode() returned bytes)
from requests.auth import AuthBase


class BearerAuth(AuthBase):
    """Bearer Authorization implementation for requests"""

    def __init__(self, token):
        self._token = token

    def __call__(self, r):
        r.headers["Authorization"] = "Bearer %s" % self._token
        return r


class SomeDumbAuth(AuthBase):
    """Authorization for some private API"""

    def __init__(self, key, secret):
        self._key = key
        self._secret = secret

    def __call__(self, r):
        if not r.body:
            r.body = ""
        if r.body:
            b = {x.split("=")[0]: x.split("=")[1] for x in r.body.split("&")}
        else:
            b = {}
        b.update({
            "some_key": self._key,
            "some_secret": self._secret,
        })
        r.body = "&".join("%s=%s" % (k, v) for k, v in b.items())
        return r


class AppStoreAuth(AuthBase):
    """AppStore Authorization implementation for requests"""

    def __init__(self, private_key, key_id, issuer_id):
        self._private_key = private_key
        self._key_id = key_id
        self._issuer_id = issuer_id

    def __call__(self, r):
        headers = {
            "alg": "ES256",
            "kid": self._key_id,
            "typ": "JWT",
        }
        payload = {
            "iss": self._issuer_id,
            "exp": int((datetime.now() + timedelta(minutes=20)).strftime("%s")),
            "aud": "appstoreconnect-v1",
        }
        token = jwt.encode(
            payload=payload,
            key=self._private_key,
            algorithm="ES256",
            headers=headers,
        ).decode(encoding="utf-8")
        r.headers["Authorization"] = "Bearer %s" % token
        return r
```
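For context, such AuthBase subclasses are used in the standard requests way; a minimal, hypothetical usage (token and URL are placeholders):

```python
import requests

session = requests.Session()
session.auth = BearerAuth("my-api-token")  # placeholder token
response = session.get("https://example.com/api/v1/items")
```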
return r Hook which use custom auth from connclass AppStoreSalesHttpHook(HttpHook):
""" HTTP Hook for AppStore connect API """
def __init__(self, endpoint, *args, **kwargs):
super().__init__(*args, **kwargs)
self.endpoint = endpoint
def get_conn(self, headers: dict = None):
""" Returns http requests session for AppStore connect use with requests
Args:
headers: additional headers to be passed through as a dictionary
"""
session = requests.Session()
if self.http_conn_id:
conn = self.get_connection(self.http_conn_id)
if conn.password:
private_key = conn.password.replace("\\n", "\n")
key_id = conn.extra_dejson.get('KeyId')
issuer_id = conn.extra_dejson.get('IssuerId')
session.auth = AppStoreAuth(
private_key=private_key,
key_id=key_id,
issuer_id=issuer_id,
)
else:
raise ValueError("Missing extra parameters for connection %r" % self.http_conn_id)
if conn.host and "://" in conn.host:
self.base_url = conn.host
else:
# schema defaults to HTTP
schema = conn.schema if conn.schema else "http"
host = conn.host if conn.host else ""
self.base_url = schema + "://" + host
if conn.port:
self.base_url = self.base_url + ":" + str(conn.port)
if headers:
session.headers.update(headers)
return session PostgresHookIt is not a big deal to change in hook to use custom Auth. One thing what I would change it drop Redshift support from PostgresHook airflow/airflow/providers/postgres/hooks/postgres.py Lines 191 to 205 in 6045f7a
In the Amazon provider there are two different ways (and two different hooks) to interact with Redshift:
Even though Redshift uses the PostgreSQL protocol, I'm not sure it should be handled by PostgresHook.

MySQL

I'm still not sure the current implementation actually allows using AWS IAM (need to check): airflow/airflow/providers/mysql/hooks/mysql.py, lines 169 to 179 in 6045f7a.
Extensible Connection / Pluggable Auth

In my head, this part is about a bit more than just auth.
The main problem is how it is shown in the UI and how it is stored. A good example is the AWS Connection:
An extensible Connection might show a different UI depending on the selected auth type, and it would also be nice to have separate tabs for Auth and Configuration. But again, this is just an idea that might be brilliant in my head but would require a lot of changes in the real world.
Yep. I want to see how your proposal will look after the rebase, but yeah - the smaller the set of changes, the better for now.
@potiuk I think it will require only small changes compared to the current PR.
But yeah, better to have a look after I make the changes. I hope that will happen within a couple of days.
Fantastic!
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.
Will rebase and make changes. Totally forgot about this PR.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.
Amazon ECR private registries do not have permanent credentials. Users must obtain credentials manually and renew them every 12 hours, or use amazon-ecr-credential-helper.
This PR allows DockerHook to get credentials not only from an Airflow Connection but also via an internal BaseDockerCredentialHelper class, in order to obtain credentials for private registries that do not support permanent/service credentials.
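For reference, this is roughly what the manual flow looks like without such a helper - a sketch using boto3 and the docker SDK. The region and repository URL are placeholders; get_authorization_token, docker.from_env and login are real APIs.

```python
import base64

import boto3
import docker

# Placeholder region; ECR authorization tokens are valid for 12 hours.
ecr = boto3.client("ecr", region_name="eu-west-1")
auth_data = ecr.get_authorization_token()["authorizationData"][0]

# The token is base64-encoded "AWS:<password>".
username, password = base64.b64decode(auth_data["authorizationToken"]).decode().split(":", 1)

client = docker.from_env()
client.login(username=username, password=password, registry=auth_data["proxyEndpoint"])

# The client can now pull from the private registry, e.g. (placeholder repository):
# client.images.pull("123456789012.dkr.ecr.eu-west-1.amazonaws.com/my-image:latest")
```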