Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get connections uri with AWS Secrets Manager backend #9008

Closed
wants to merge 14 commits into from

Conversation

JavierLopezT
Copy link
Contributor

One of the main advantages of using AWS Secrets Manager is its ability to automatically create secrets of RDS databases and Redshift databases. Those secrets consist of several keys with their values, i.e user, pass, etc. Also, it is normal to store API Keys, sftp, or whatever using different values, as shown in the picture below:
Captura de pantalla 2020-05-22 a las 10 41 07

With the current code, all the keys and values obtained from a secret are stored in the schema attribute of the conn object, unless you have just one key with the conn_uri in the value. Thus, the current situation is forcing to use Secrets Manager in a way it is not intended to.

With this proposed modification, you can use AWS Secrets Manager using keys and values and have some kind of freedom to choose different words for each key to make the get_conn work.


Make sure to mark the boxes below before creating PR: [x]

  • Description above provides context of the change
  • Unit tests coverage for changes (not needed for documentation changes)
  • Target Github ISSUE in description if exists
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions.
  • I will engage committers as explained in Contribution Workflow Example.

@boring-cyborg boring-cyborg bot added area:secrets provider:amazon-aws AWS/Amazon - related issues labels May 25, 2020
Copy link
Member

@kaxil kaxil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add tests

@JavierLopezT
Copy link
Contributor Author

Please add tests

Sure, but what kind of tests? I assumed that since there are already tests, if they work with the new script it'll be fine.

@JavierLTPromofarma
Copy link
Contributor

I get the error Too many nested blocks, any suggestions to refactor the code @kaxil or @someonelese?
Thank you very much

@ashb
Copy link
Member

ashb commented Jul 16, 2020

Love the feature! (And I had this in mind when reviewing the original secrets backend)

@kaxil
Copy link
Member

kaxil commented Jul 16, 2020

Please add tests

Sure, but what kind of tests? I assumed that since there are already tests, if they work with the new script it'll be fine.

Tests that check that you get set conn_id, conn_type etc and that you can still get that Connection back in here: https://github.com/apache/airflow/blob/master/tests/providers/amazon/aws/secrets/test_secrets_manager.py

Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the very least this PR needs heavy tests.

Secondly: why do we need to use the AST at all? can you explain your design?

variables_prefix: str = 'airflow/variables',
profile_name: Optional[str] = None,
sep: str = "/",
connections_prefix=None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do these need to be None? this controls where in the secrets hierarchy to look at, so should be a required field, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was because I was not able to make it works using connections prefix, so I ended up ignoring them. In my set up those values are None and I just type the entire name of the secret.

Copy link
Member

@kaxil kaxil Aug 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following should work in your case (can you try):

[secrets]
backend = airflow.contrib.secrets.aws_secrets_manager.SecretsManagerBackend
backend_kwargs = {"connections_prefix": "", "variables_prefix": "", "sep": ""}

@ashb
Copy link
Member

ashb commented Jul 16, 2020

The way this is implemented is far to specialized, and hacky.

They way I would expect this to work:

  • We first try to get the named secret -- if that finds a valid secret then we use the existing conn_uri approach; otherwise
  • We list secrets under the named path (i.e. List("/airflow/connections/my_conn_id") and then we fetch each of those secrets, feeding them in to a Connection() call in some manner.

This way we an support arbitrary extra parameters, i.e. the all the extra params for a Connection too, and we don't ever need to use ast.

Probably something like: if the name is not one of conn_type, host, login, password, schema, or port, extra then it gets fed in to the extra dict. (That way you can specify extra fields individually in Secrets Manager without having to manage a JSON value.)

profile_name: Optional[str] = None,
sep: str = "/",
connections_prefix=None,
variables_prefix=None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also breaks backwards-compatibility

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. It's doesn't break, because we always use keyword arguments.
https://airflow.readthedocs.io/en/latest/configurations-ref.html#backend-kwargs

Copy link
Member

@kaxil kaxil Jul 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it does. If someone passes nothing, the default it used to take is airflow/variables

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, I don't see that these should be required arguments. In my current company, we share an AWS account with other departments. As so, we have to start our secrets with the prefix 'data_', hence in my current situation it does make sense to have connections_prefix='data'. However, in my previous company, we as BI team had an AWS account for ourselves, and thus we had the secrets just like 'db_redshift', 'api_salesforce' and so on, so the None value made more sense.

Furthermore, I would never expect to have the default variables as airflow/variables, because I won't ever think at all using the '/' sep in this case.

I know that what I think or believe doesn't matter at all, but I am just wondering if there are more people with similar reasonings as me that the default values don't fit with their project.

Also, just curiosity (not saying that this is a case in which should be applied), is there any procedure to implement a change that breaks previous workflows?

if 'port' in secret:
port = secret['port']
else:
port = 5432
Copy link
Member

@kaxil kaxil Jul 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not hardcode or keep any default here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JavierLopezT
Copy link
Contributor Author

The way this is implemented is far to specialized, and hacky.

They way I would expect this to work:

  • We first try to get the named secret -- if that finds a valid secret then we use the existing conn_uri approach; otherwise
  • We list secrets under the named path (i.e. List("/airflow/connections/my_conn_id") and then we fetch each of those secrets, feeding them in to a Connection() call in some manner.

This way we an support arbitrary extra parameters, i.e. the all the extra params for a Connection too, and we don't ever need to use ast.

Probably something like: if the name is not one of conn_type, host, login, password, schema, or port, extra then it gets fed in to the extra dict. (That way you can specify extra fields individually in Secrets Manager without having to manage a JSON value.)

Thanks for your answer ash. Sorry, I am not sure I am following you. I am a bit lost with your proposition of functionality. Could you be more specific using this example of how I use AWS Secrets Manager in my company, please?
Captura de pantalla 2020-08-06 a las 12 45 13

In the picture, there are three secrets, and within each of them, there are secret keys (pass, user, api_key.. whatever) with its corresponding secret values.

I do understand what you say about removing the ast, and I think it's a very good idea, thank you for your suggestion.

@JavierLopezT
Copy link
Contributor Author

The other day I took a look at this code again, and indeed the code is ugly as f**k.

@ashb I re-read your comment and I will try to narrow the scope of my question regarding your comment. When you say the existing conn_uri approach are you referring to the way in this script, or something else?

From your second bullet, I understand that you say for example (using the picture of my previous comment), that if I call the secret data_snowflake, since it won't be found, all data_airflow_master_user, data_api_google_drive and data_airflow_backend_database would be fetched?

I have been thinking also about what you said of extra parameters. It sounds good. But, what happens if you want to save something that it is not used in the connection but somewhere else? Then the conn will take those fields in the extra and the conn will fail. I came up with adding some suffix or prefix to those fields, like no_whatever or whatever_dont_use, but in some manner that sounds counterintuitive for me as well. So I am not sure which would be a good solution. What do you think?

Thanks!

@stale
Copy link

stale bot commented Nov 1, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Nov 1, 2020
@JavierLopezT
Copy link
Contributor Author

Commenting so that it doesn't get closed

@stale stale bot removed the stale Stale PRs per the .github/workflows/stale.yml policy file label Nov 2, 2020
Comment on lines +136 to +140
for key, value in extra_dict.values():
if counter == 0:
conn_string += f'{key}={value}'
else:
conn_string += f'&{key}={value}'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hello,

I'm wondering if this approach would work for nested extra values.

For example the emr_default one for aws looks like this:

conn_id="emr_default",

I think you mentioned something about using key tokens like dont_use_* or something like that in order to have a way to ignore stuff; I'm just making a comment to make it visible

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Mar 19, 2021
@github-actions github-actions bot closed this Mar 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:secrets provider:amazon-aws AWS/Amazon - related issues stale Stale PRs per the .github/workflows/stale.yml policy file
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants