DockerHook: obtain credentials and login to Amazon ECR #26162
Conversation
Force-pushed from 62a2a64 to 5467972, and later from 5467972 to 2721a0d.
I think this is a good idea, but the ECRDockerCredentialHelper should be part of the Amazon provider, and there should likely be an ECRDockerOperator that uses it by default. This is very similar to what we have, for example, in GKEStartPodOperator, which is based on the generic KubernetesPodOperator and pretty much only adds GCP-specific authentication. I think this is a very similar case.
```python
    credential_helper = AirflowConnectionDockerCredentialHelper
    credential_helper_kwargs = {}
else:
    credential_helper = import_string(credential_helper)
```
It is not secure. We should not load user-defined classes, as this makes our application vulnerable to the CWE-502: Deserialization of Untrusted Data weakness.
We should change the logic so that it is not needed, as @potiuk suggests, or add a list of allowed classes, as is done during DAG deserialization. See:
airflow/airflow/serialization/serialized_objects.py, lines 999 to 1000 in 5b216e9:

```python
if _operator_link_class_path in get_operator_extra_links():
    single_op_link_class = import_string(_operator_link_class_path)
```
As far as I know, Secrets Backends, REST API auth_backends, and the Amazon Session Factory are also loaded via import_string without any validation.
Also, I personally added retry handlers loaded via import to SlackHook in #25852.
If that is a security issue, it could be removed; no Slack provider has been released since that feature was added.
Secrets Backends, REST API auth_backends

In those cases, the values are not controlled by the user (e.g. via the Web UI), but by the administrator: you must be able to change files on disk to change the secrets backend or auth backend. The problem occurs when the value of the import_string parameter is controlled remotely, e.g. it is read from the database.
In the case of Slack, we should fix it so that only allowed/safe values can be imported.
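For illustration only, a minimal sketch of the allow-list approach described above. The class paths and the `ALLOWED_CREDENTIAL_HELPERS`/`load_credential_helper` names are hypothetical and not part of Airflow; `import_string` from `airflow.utils.module_loading` is real.

```python
from airflow.utils.module_loading import import_string

# Hypothetical allow-list of credential-helper classes that may be imported
# from a connection-provided dotted path. Anything else is rejected.
ALLOWED_CREDENTIAL_HELPERS = frozenset(
    {
        "airflow.providers.docker.hooks.docker.AirflowConnectionDockerCredentialHelper",
        "airflow.providers.amazon.aws.hooks.ecr.ECRDockerCredentialHelper",
    }
)


def load_credential_helper(class_path: str):
    """Import a credential helper only if its dotted path is explicitly allowed."""
    if class_path not in ALLOWED_CREDENTIAL_HELPERS:
        raise ValueError(f"Credential helper {class_path!r} is not on the allow-list")
    return import_string(class_path)
```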
The class does not even need to be specially downloaded: a class/function that can be misused may already exist in any package installed on the system.
I will have a hard time giving a Python example right now, but here you can see what such an attack looks like in Java:
https://wololo.net/2022/06/11/ps5-ps4-hacker-theflow-discloses-blu-ray-disc-exploit-toolchain-ps5-piracy-not-a-matter-of-if-but-when/
- The class com.sony.gemstack.org.dvb.user.UserPreferenceManagerImpl deserializes the userprefs file under a privileged context using readObject(), which is insecure.
- The class com.oracle.security.Service contains a method newInstance which calls Class.forName on an arbitrary class name. This allows arbitrary classes, even restricted ones (for example in sun.), to be instantiated.
- The class com.sony.gemstack.org.dvb.io.ixc.IxcProxy contains the protected method invokeMethod which can call methods under a privileged context. Permission checks in methods can be bypassed.
In this case, we would have to look for a class, function, or method that takes a conn argument (or ignores unknown keyword arguments) in order to exploit it, but it can be any class in any installed package, so the potential attack surface is quite large.
I mean the most likely attack relies on the fact that both import_string and a plain import load (and execute) the module first.
Some sample:

```python
# airflow/providers/exploit/hooks/some_db_api.py


def unsafe_code():
    """Grab the fernet key and nudes from configs, decode, send to someone and post on Twitter."""
    ...


unsafe_code()  # executed as a side effect of importing the module


class SomeDbApiHook(DbApiHook):
    conn_type = "awesome_conn_type"
    ...
```

So it would be the same whether the user calls

```python
from airflow.providers.exploit.hooks.some_db_api import SomeDbApiHook

import_string("airflow.providers.exploit.hooks.some_db_api.SomeDbApiHook")
```
- Or uses airflow.providers.slack.transfers.sql_to_slack.SqlToSlackOperator in Airflow < 2.3 with a connection whose type references awesome_conn_type.
A DAG file is considered trusted content, as it is recommended that it passes through code review. Changing a value in the database does not require code review, so there is a risk.
conn_id is also part of the DAG, so it should pass review along with all the components the DAG is expected to use.
Let's imagine that someone changes a connection after the DAG is deployed (someone with the Admin or Op role). That could happen mainly for two reasons:
- Someone with access to change connections has had their account compromised, and the "bad guy" also has access to Airflow, which is mostly intended to live in private subnets reachable only via proxies, bastion/jump hosts and VPNs.
- Someone with the Admin or Op role is the "bad guy"/impostor themselves.
If some kind of exploit is already in the environment, there is no problem using Option 2, or maybe Option 3 - it also uses import_string for DB API Hooks.
Also, a lot of operators are expected to render templates; the BashOperator in particular might not be safe if templates are stored in Airflow Variables: 'bash_command' + 'env' = run whatever you want.
One side effect of all of the above is that it requires many steps, and a lot of things have to happen: Docker installed, BashOperators used with Variables, Airflow credentials leaked, the Airflow webserver exposed to the Internet.
Don't get me wrong, I also think that too much security is never enough.
But I believe it is more likely that a user somehow exposes the cloud credentials used by Airflow (and there is a good chance those credentials have access to almost everything), e.g. credential info stored in the extra field of a Connection.
Also, almost all providers that can send messages (e.g. Slack, Discord, Telegram, and MS Teams - just a PR) allow passing the token/credentials directly in operator code (without connections) and do not mask them as secrets (masking is not really required there, because the user stores them directly in DAG code). There is a good chance they get exposed in logs, especially if HttpHook is used. Some "bad guy" could easily use these credentials to send messages in a corporate messenger - e.g. a nice link to "change your domain user/password".
@potiuk FYI: the same problem exists in the Postgres hook:
Yeah, it could be. I put it into the Docker provider to reduce dependency complexity.
Why I moved this part inside the DockerHook:
Also, AWS RDS IAM auth exists in MySqlHook; however, I don't think it even works in the MySQL case: airflow/airflow/providers/mysql/hooks/mysql.py, lines 215 to 220 in 5b216e9.
I think this should then also be changed in Postgres and MySQL :). The approach where "all AWS" goes to AWS actually makes dependencies less, not more, complex for the user. I think we should avoid any kind of circular dependencies, and we won't avoid them if the "mysql" provider refers to the "aws" one. Both the Postgres and MySQL implementations with local imports make the dependencies equally (or even more) complex, but they are worse because they are "hidden". If you want to reason, for example, about which version of the AWS provider we should have in order to make Postgres work with AWS, it is impossible to answer, because by using local imports we are hiding the fact that there is a dependency (it is there, but implicit and hidden). I think making an explicit dependency chain AWS -> POSTGRES, MYSQL -> COMMON.SQL is the only reasonable approach, and "AWS-specific, Postgres-related" classes should live in AWS.
IMHO this is our ultimate goal, even if there are cases that do not follow it yet.
It should be done in a discoverable way - the very same way that in "common.sql" we instantiate a Hook based on the "connection_id" (without knowing the type of the hook upfront); the only requirement is that it extends DbApiHook. We should have a common authentication interface implemented by the Docker operator, and some functionality in the Hook that allows the "authentication" to be discovered and used.
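As a point of reference, this is roughly how the connection-driven discovery in common.sql looks from the caller's side - a minimal sketch, where the connection id and the SQL statement are illustrative:

```python
from airflow.hooks.base import BaseHook

# The hook class is resolved from the connection's conn_type, so the caller
# never names the hook class explicitly; the connection id below is illustrative.
conn = BaseHook.get_connection("my_database_conn")
hook = conn.get_hook()  # e.g. a PostgresHook or MySqlHook, both DbApiHook subclasses

rows = hook.get_records("SELECT 1")
```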
Let's open Pandora's box. What should we do with these, then? No dependencies declared in the provider (this has existed for a couple of years): airflow/airflow/providers/amazon/aws/hooks/base_aws.py, lines 240 to 245 in 6931fbf.
Local imports from the Google provider: airflow/airflow/providers/amazon/aws/hooks/base_aws.py, lines 303 to 310 in 6931fbf.
Those are the only ones I know of; similar things may exist elsewhere.
I think we should have some common authentication interface for all Hooks that implement connection support.
Wrong box to open :). That is not part of it. Both of the examples you mentioned @Taragolis are clearly AWS ones; they need no fix, as they are in the right place. They are (and should be) in AWS. In both cases the "USER" of the authentication (AWS) has a dependency on the functionality it uses. Having such functionality for an optional AWS <> Google dependency is perfectly fine, and it is even "blessed" in our package system: the amazon provider has a [google] extra and the google provider has an [amazon] extra. We even have an exception specifically foreseen for that - not yet heavily used, but once we split providers into separate repos I was planning to apply it consistently in the places that do not have it.
Those two stories are different from the one here, even if not "technically" - this is more about looking at the landscape of our providers from the business side of things than about pure interface/API, and it is based more on the likelihood of being well maintained than anything else. The decision about where to put code that sits in-between is bound more to whether there are clear stakeholders behind the provider who mostly maintain it and are interested in having this functionality in. This is captured in a few places already: we have the release process for providers, and we already used the same line of thought when we decided where to put transfer operators. See AIP-21 Changes in import paths - also touched upon in AIP-8 Split Provider packages for Airflow 2.0. One of the decisions we made there is that "transfer" operators are put in the "target" of the transfer when there is a clear stakeholder behind the target, and in that case circular references are unavoidable. To sum up the line of thought in one sentence: the code should be put in the "provider" where there is a stakeholder who is most likely to maintain it. 'Postgres' and 'MySQL' are different: they have no clear stakeholders interested in maintaining them. Even though they are commercial databases with companies behind them, they are 'commodity' APIs, and those companies are not interested in maintaining the services. Neither Postgres nor MySQL is interested in adding AWS/GCP interfacing to their provider, but both AWS and GCP are respectively interested in allowing AWS/GCP authentication WITH the Postgres/MySQL providers they expose in their own service offerings. This was discussed at length when AIP-21 was debated - there are different kinds of providers. Simply speaking, SAML/GSSAPI are "protocols", there are also "databases" (like Postgres/MySQL), and then there are "services", which are a higher layer of abstraction; while "services" can use "protocols" and "databases", neither "protocols" nor "databases" should use "services" - they can only be "used" by services. This will be much more visible and obvious when we split providers into separate repos. The ideal situation would be that people from AWS subscribe to the single "amazon" provider repo to get notifications about all the AWS-specific changes they are interested in, and this authentication code clearly falls into that bucket. The Google federated identity code is also a very good example here: while it is clearly GCP-related, it is the AWS people who are interested in getting it working and making sure they can connect to GCP (for example to be able to pull data from it). They will be maintaining that code, not the Google people.
But it does almost the same thing as in the case of:
In an ideal world, auth for different hooks would be pluggable. A user could then authenticate to Postgres in different ways (explicit credentials, pgpass, GSSAPI, AWS IAM, etc.); the same could work for Docker registries (explicit credentials, ECR get-token) as well as for additional methods of authenticating to the AWS cloud. Just as an example, take DataGrip, or the JetBrains Database Tools and SQL plugin: as soon as I installed the third-party AWS Toolkit plugin, it enabled additional auth methods for Postgres and MySQL. The comparison may not be entirely fair - DataGrip, like the Pro versions of the JB IDEs, is a commercial product and Airflow is OSS.
Yes. The difference is WHO is doing it. If this is code under the stewardship of AWS, they choose to do it and maintain it. In Postgres there is no one to maintain it. This is the main criterion. You are using the wrong comparison: it's not about what code it is, but about who is interested in maintaining it and reacting when there is a problem. To put it simply: if in the future a problem with a Google library causes a problem with this operator, the AWS people will have to fix it; and if, for example, it causes a conflict with other providers (say Google), we might choose not to release the AWS provider until the AWS people fix the conflict. Plain and simple: AWS is impacted, they fix it. With the splitting of providers, I want AWS people to subscribe to the aws repo and be aware of all the things they will need to fix. When the Google provider uses AWS for authentication, the same holds: it's the Google team that should fix any problems that might undermine their way of interfacing with AWS. With Postgres, the situation is different: if the same problem happened in the Postgres provider, it would hold the Postgres provider back from being released - a problem with AWS would block a new release of the Postgres provider, which is not something we want. Again, your parallels are not entirely correct - it's not about "what code is needed". I agree it is similar code, but IMHO that does not matter. What matters is who has the incentive to fix it when it breaks. This is the fundamental difference. Think "human", "team", "responsibility", "governance", not "code".
I don't think it's an "ideal world" - it's just one of the choices. Right now in Airflow, authentication is very closely connected to the Connection and the corresponding Hook. Basically, auth is THE ONLY functionality a Hook provides on top of most integrations: you give it a connection id and it gives you back some form of client that you can use. Other than that, hooks are not very useful - they pass parameters through to the underlying "clients". Quite often a particular Hook directly returns a Python client that you could instantiate and use yourself; the actual "benefit" of using Hooks is that you can pass them a connection id. That's pretty much it, and it is a big and important feature. Currently, the way to implement different authentication methods is through inheritance (if we want the handling at the Hook level): if you need an extended Hook with new authentication, you extend the one that is there and add the authentication. This fits Airflow's model very well - a Hook is a class, it is extendable, and it can be passed a connection id. A Postgres Hook should get a Postgres connection id and so on. If you want to implement a Postgres Hook with a connection id from Google, you should implement a GooglePostgresHook "child" that can take a Google connection as an initialization param; or, if you want to use AWS authentication, implement an AWSPostgresHook "child" that can take an "AWS connection" as an initialization param. This is the model used currently. It is a different model from "pluggable" authentication - you "extend" rather than "inject" authentication. We might have different views on which model is better. I personally think "pluggable authentication" is far more difficult to turn into a generic solution applicable to multiple types of hooks. I'd say Airflow's model, where you extend functionality and add authentication as needed in a "child" class, is much better, because it saves you from defining a common "authentication" layer that would have to handle a lot of complexity - a good example is the impersonation chain from GCP, which is very specific to GCP. Implementing a common "authentication" model that would handle GCP, Azure, AWS, Databricks, and a large number of other options - via HTTP, proprietary database connection types, OpenID, OAuth, etc. - is far more complex and error-prone. But no matter what we think about more or less complex, the "extension" model is what is being used in Airflow.
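A minimal sketch of the "extension" model described above, assuming an AWS-authenticated Postgres child hook. The class name AwsIamPostgresHook and the region_name parameter are hypothetical illustrations, not existing Airflow classes; the boto3 generate_db_auth_token call and psycopg2.connect are real APIs.

```python
import boto3
import psycopg2
from airflow.providers.postgres.hooks.postgres import PostgresHook


class AwsIamPostgresHook(PostgresHook):
    """Hypothetical child hook that swaps the stored password for an RDS IAM token."""

    def __init__(self, *args, region_name: str = "us-east-1", **kwargs):
        super().__init__(*args, **kwargs)
        self.region_name = region_name

    def get_conn(self):
        conn = self.get_connection(self.postgres_conn_id)
        # Generate a short-lived IAM auth token instead of using a static password.
        rds = boto3.client("rds", region_name=self.region_name)
        token = rds.generate_db_auth_token(
            DBHostname=conn.host,
            Port=conn.port or 5432,
            DBUsername=conn.login,
            Region=self.region_name,
        )
        return psycopg2.connect(
            host=conn.host,
            port=conn.port or 5432,
            user=conn.login,
            password=token,
            dbname=conn.schema,
        )
```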
It depends which way you look at it. To me it looks the same: some additional method was added years ago and is not well maintained or documented.
The reason why IAM auth exists in the postgres provider and not in amazon, and why Google federation exists in the amazon provider and not in google, is clear to me: there was no other way to extend authentication. To be clear, I agree that IAM auth for postgres/mysql should move out of the postgres provider; I am just thinking about an appropriate replacement that would be easy to maintain and would make it easy to add integrations with other cloud providers.
The issue is that right now, if you extend a hook, you also need to implement all the operators. It could be done the way it is done for DbApiHook; however, DbApiHook is not only about auth, it is also about supporting the specific protocol of a specific database. For Postgres, Docker, MySQL, and possibly others, it is only about how to obtain credentials and pass them to the specific client - it might just require executing some pluggable class. That might need some changes in core, so it would not be available in providers until the minimum supported Airflow version meets the requirement. So maybe this is more about creating an AIP and discussing the different approaches and their benefits - WDYT?
@mik-laj @potiuk Let's finalise the discussion. We are not ready to add auth to Amazon ECR the way it is described in this PR, for several reasons:
So could there be some additional way to help users integrate with ECR if they need it? It could be a page in the documentation, like this snippet #9707 (comment), or an operator that updates a specific Docker connection with ECR credentials. WDYT, guys?
Why not use the opportunity to add the auth mechanism now? I do not think this needs a super-generic solution that works for all hooks. There are a few operators that really need this kind of pluggable authentication, and I think this DockerHook change is a good opportunity to add it. I would say we need something very similar to what the requests library uses, where you can pass an auth object (https://requests.readthedocs.io/en/latest/user/authentication/): you could add an "auth" class to pass to DockerHook and the Docker operators - a "DockerAuth" Protocol / abstract class that would handle the authentication. Once we get it implemented here, we could apply a similar approach to Postgres to fix the current behaviour, and to any other class that does it in a similar way. I think there could be various auth classes in "Amazon" - one for Docker, one for Postgres, etc. - and they could share some common code if needed. Do you see any drawbacks to this approach @Taragolis?
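To make the idea concrete, here is a hedged sketch of what such a requests-style auth object might look like. The DockerRegistryAuth and EcrRegistryAuth names are hypothetical and not part of any provider; the boto3 get_authorization_token and docker SDK login calls are real. A DockerHook could then accept such an object and call auth.login(client) instead of reading credentials from the connection - but that shape is only a guess at the proposal.

```python
import base64
from typing import Protocol

import boto3
import docker


class DockerRegistryAuth(Protocol):
    """Hypothetical auth interface, analogous to requests' AuthBase."""

    def login(self, client: docker.APIClient) -> None:
        ...


class EcrRegistryAuth:
    """Hypothetical implementation that logs in with a short-lived ECR token."""

    def __init__(self, region_name: str):
        self.region_name = region_name

    def login(self, client: docker.APIClient) -> None:
        ecr = boto3.client("ecr", region_name=self.region_name)
        auth_data = ecr.get_authorization_token()["authorizationData"][0]
        # The token is base64-encoded "AWS:<password>".
        username, password = (
            base64.b64decode(auth_data["authorizationToken"]).decode().split(":", 1)
        )
        client.login(
            username=username,
            password=password,
            registry=auth_data["proxyEndpoint"],
            reauth=True,
        )
```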
@potiuk Right now I think this is the only approach that could cover:
It only requires small changes in the current operators, and nothing needs to be re-implemented if only the auth method changes. This PR is also still in draft for one reason: I do not have a clear idea of the best place to document how to use the auth. It might be:
I would also like to add some information about other hooks and things that might (or might not) be nice to implement in the future; this is not related to the current PR.

HttpHook

My daily workload does not require it right now, but a couple of years ago we used it very actively. I do not remember exactly - maybe in 1.10.4 HttpHook did not even have a parameter for requests auth (I am too lazy to check today). But the major issue is that currently you need to create a new hook and overwrite get_conn. I found this code in an old project (note: the code was written for 1.10.x).

Custom Auth

```python
from datetime import datetime, timedelta

import jwt  # PyJWT (pre-2.0, where encode() returned bytes)
from requests.auth import AuthBase


class BearerAuth(AuthBase):
    """Bearer Authorization implementation for requests"""

    def __init__(self, token):
        self._token = token

    def __call__(self, r):
        r.headers["Authorization"] = "Bearer %s" % self._token
        return r


class SomeDumbAuth(AuthBase):
    """Authorization for some private API"""

    def __init__(self, key, secret):
        self._key = key
        self._secret = secret

    def __call__(self, r):
        if not r.body:
            r.body = ""
        if r.body:
            b = {x.split("=")[0]: x.split("=")[1] for x in r.body.split("&")}
        else:
            b = {}
        b.update({
            "some_key": self._key,
            "some_secret": self._secret,
        })
        r.body = "&".join("%s=%s" % (k, v) for k, v in b.items())
        return r


class AppStoreAuth(AuthBase):
    """AppStore Authorization implementation for requests"""

    def __init__(self, private_key, key_id, issuer_id):
        self._private_key = private_key
        self._key_id = key_id
        self._issuer_id = issuer_id

    def __call__(self, r):
        headers = {
            "alg": "ES256",
            "kid": self._key_id,
            "typ": "JWT",
        }
        payload = {
            "iss": self._issuer_id,
            "exp": int((datetime.now() + timedelta(minutes=20)).strftime("%s")),
            "aud": "appstoreconnect-v1",
        }
        token = jwt.encode(
            payload=payload,
            key=self._private_key,
            algorithm="ES256",
            headers=headers,
        ).decode(encoding="utf-8")
        r.headers["Authorization"] = "Bearer %s" % token
        return r
```
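For context, such AuthBase subclasses are used in the standard requests way; a minimal, hypothetical usage (token and URL are placeholders):

```python
import requests

session = requests.Session()
session.auth = BearerAuth("my-api-token")  # placeholder token
response = session.get("https://example.com/api/v1/items")
```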
return r Hook which use custom auth from connclass AppStoreSalesHttpHook(HttpHook):
""" HTTP Hook for AppStore connect API """
def __init__(self, endpoint, *args, **kwargs):
super().__init__(*args, **kwargs)
self.endpoint = endpoint
def get_conn(self, headers: dict = None):
""" Returns http requests session for AppStore connect use with requests
Args:
headers: additional headers to be passed through as a dictionary
"""
session = requests.Session()
if self.http_conn_id:
conn = self.get_connection(self.http_conn_id)
if conn.password:
private_key = conn.password.replace("\\n", "\n")
key_id = conn.extra_dejson.get('KeyId')
issuer_id = conn.extra_dejson.get('IssuerId')
session.auth = AppStoreAuth(
private_key=private_key,
key_id=key_id,
issuer_id=issuer_id,
)
else:
raise ValueError("Missing extra parameters for connection %r" % self.http_conn_id)
if conn.host and "://" in conn.host:
self.base_url = conn.host
else:
# schema defaults to HTTP
schema = conn.schema if conn.schema else "http"
host = conn.host if conn.host else ""
self.base_url = schema + "://" + host
if conn.port:
self.base_url = self.base_url + ":" + str(conn.port)
if headers:
session.headers.update(headers)
return session PostgresHookIt is not a big deal to change in hook to use custom Auth. One thing what I would change it drop Redshift support from PostgresHook airflow/airflow/providers/postgres/hooks/postgres.py Lines 191 to 205 in 6045f7a
In the Amazon provider there are two different ways (and two different hooks) to interact with Redshift:
Even though Redshift uses the PostgreSQL protocol, I'm not sure it should be handled by PostgresHook.

MySQL

I'm still not sure the current implementation actually allows using AWS IAM (need to check): airflow/airflow/providers/mysql/hooks/mysql.py, lines 169 to 179 in 6045f7a.
Extensible Connection / Pluggable Auth

In my head, this part is about a bit more than just auth.
The main problem is how it is shown in the UI and how it is stored. A good example is the AWS Connection:
An extensible Connection might show a different UI depending on the selected auth type, and it would also be nice to have separate tabs for Auth and Configuration. But again, this is just an idea that might be brilliant in my head but would require a lot of changes in the real world.
Yep. I want to see how your proposal will look after the rebase, but yeah - the smaller the set of changes, the better for now.
@potiuk I think it will require only small changes compared to the current PR.
But yeah, better to have a look after I make the changes. I hope that will happen within a couple of days.
Fantastic!
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.
Will rebase and make changes. Totally forgot about this PR.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.
Amazon ECR private registries do not have permanent credentials. Users must obtain credentials manually and renew them every 12 hours, or use amazon-ecr-credential-helper.
This PR allows DockerHook to get credentials not only from an Airflow Connection but also via an internal BaseDockerCredentialHelper class, in order to obtain credentials for private registries that do not support permanent/service credentials.
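For reference, this is roughly what the manual flow looks like without such a helper - a sketch using boto3 and the docker SDK. The region and repository URL are placeholders; get_authorization_token, docker.from_env and login are real APIs.

```python
import base64

import boto3
import docker

# Placeholder region; ECR authorization tokens are valid for 12 hours.
ecr = boto3.client("ecr", region_name="eu-west-1")
auth_data = ecr.get_authorization_token()["authorizationData"][0]

# The token is base64-encoded "AWS:<password>".
username, password = base64.b64decode(auth_data["authorizationToken"]).decode().split(":", 1)

client = docker.from_env()
client.login(username=username, password=password, registry=auth_data["proxyEndpoint"])

# The client can now pull from the private registry, e.g. (placeholder repository):
# client.images.pull("123456789012.dkr.ecr.eu-west-1.amazonaws.com/my-image:latest")
```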