-
Notifications
You must be signed in to change notification settings - Fork 13.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Get connections uri with AWS Secrets Manager backend #9008
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add tests
Sure, but what kind of tests? I assumed that since there are already tests, if they work with the new script it'll be fine. |
I get the error |
Love the feature! (And I had this in mind when reviewing the original secrets backend) |
Tests that check that you get set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the very least this PR needs heavy tests.
Secondly: why do we need to use the AST at all? can you explain your design?
variables_prefix: str = 'airflow/variables', | ||
profile_name: Optional[str] = None, | ||
sep: str = "/", | ||
connections_prefix=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do these need to be None? this controls where in the secrets hierarchy to look at, so should be a required field, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was because I was not able to make it works using connections prefix, so I ended up ignoring them. In my set up those values are None and I just type the entire name of the secret.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The following should work in your case (can you try):
[secrets]
backend = airflow.contrib.secrets.aws_secrets_manager.SecretsManagerBackend
backend_kwargs = {"connections_prefix": "", "variables_prefix": "", "sep": ""}
The way this is implemented is far to specialized, and hacky. They way I would expect this to work:
This way we an support arbitrary extra parameters, i.e. the all the extra params for a Connection too, and we don't ever need to use Probably something like: if the name is not one of conn_type, host, login, password, schema, or port, extra then it gets fed in to the extra dict. (That way you can specify extra fields individually in Secrets Manager without having to manage a JSON value.) |
profile_name: Optional[str] = None, | ||
sep: str = "/", | ||
connections_prefix=None, | ||
variables_prefix=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It also breaks backwards-compatibility
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. It's doesn't break, because we always use keyword arguments.
https://airflow.readthedocs.io/en/latest/configurations-ref.html#backend-kwargs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it does. If someone passes nothing, the default it used to take is airflow/variables
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway, I don't see that these should be required arguments. In my current company, we share an AWS account with other departments. As so, we have to start our secrets with the prefix 'data_', hence in my current situation it does make sense to have connections_prefix='data'
. However, in my previous company, we as BI team had an AWS account for ourselves, and thus we had the secrets just like 'db_redshift', 'api_salesforce' and so on, so the None value made more sense.
Furthermore, I would never expect to have the default variables as airflow/variables, because I won't ever think at all using the '/' sep in this case.
I know that what I think or believe doesn't matter at all, but I am just wondering if there are more people with similar reasonings as me that the default values don't fit with their project.
Also, just curiosity (not saying that this is a case in which should be applied), is there any procedure to implement a change that breaks previous workflows?
if 'port' in secret: | ||
port = secret['port'] | ||
else: | ||
port = 5432 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not hardcode or keep any default here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other day I took a look at this code again, and indeed the code is ugly as f**k. @ashb I re-read your comment and I will try to narrow the scope of my question regarding your comment. When you say the existing conn_uri approach are you referring to the way in this script, or something else? From your second bullet, I understand that you say for example (using the picture of my previous comment), that if I call the secret data_snowflake, since it won't be found, all data_airflow_master_user, data_api_google_drive and data_airflow_backend_database would be fetched? I have been thinking also about what you said of extra parameters. It sounds good. But, what happens if you want to save something that it is not used in the connection but somewhere else? Then the conn will take those fields in the extra and the conn will fail. I came up with adding some suffix or prefix to those fields, like no_whatever or whatever_dont_use, but in some manner that sounds counterintuitive for me as well. So I am not sure which would be a good solution. What do you think? Thanks! |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
Commenting so that it doesn't get closed |
for key, value in extra_dict.values(): | ||
if counter == 0: | ||
conn_string += f'{key}={value}' | ||
else: | ||
conn_string += f'&{key}={value}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hello,
I'm wondering if this approach would work for nested extra values.
For example the emr_default
one for aws looks like this:
Line 205 in c704293
conn_id="emr_default", |
I think you mentioned something about using key tokens like dont_use_*
or something like that in order to have a way to ignore stuff; I'm just making a comment to make it visible
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions. |
One of the main advantages of using AWS Secrets Manager is its ability to automatically create secrets of RDS databases and Redshift databases. Those secrets consist of several keys with their values, i.e user, pass, etc. Also, it is normal to store API Keys, sftp, or whatever using different values, as shown in the picture below:
With the current code, all the keys and values obtained from a secret are stored in the schema attribute of the conn object, unless you have just one key with the conn_uri in the value. Thus, the current situation is forcing to use Secrets Manager in a way it is not intended to.
With this proposed modification, you can use AWS Secrets Manager using keys and values and have some kind of freedom to choose different words for each key to make the get_conn work.
Make sure to mark the boxes below before creating PR: [x]