[AIRFLOW-5705] Add secrets backend and support for AWS SSM #6376
Work on another key encryption mechanism is available here. |
@mik-laj interesting. It's related but different yeah? Looks like that one is about providing support for alternative to fernet. Is that right? Here I am trying to provide a means to source connections from arbitrary creds server, i.e. other than env vars / metastore. |
How do you configure connections in other services? Will the web interface work? |
No airflow web UI configurability. Currently we can provide connections with env vars and those cannot be changed with the web UI either. It would be the same here, when using an alternative creds provider other than the metastore. If you want web UI configurability, just use the metastore. Mind you, your alt provider may have its own web UI (as is the case with AWS SSM), it just would not be airflow's web UI.
However you want! The point here is to provide a way to override the default mechanisms of creds provision. Currently it's env vars and metastore. With this change, you have flexibility. We can provide some built-in standard creds providers (e.g. using AWS SSM), and in that case we would document how to set them up. But the main intention is to refactor just a tiny bit to allow users to override to suit their needs. |
Codecov Report
@@ Coverage Diff @@
## master #6376 +/- ##
==========================================
- Coverage 86.98% 86.71% -0.28%
==========================================
Files 906 910 +4
Lines 43855 43923 +68
==========================================
- Hits 38148 38086 -62
- Misses 5707 5837 +130
|
I agree with Daniel's view - managing the creds in an external service is out of scope of Airflow. The main reason I see you'd want a feature like this is if you already have creds in an external service and would like Airflow to manage them. If I were deploying this, my Airflow wouldn't even have permission to write/update the creds, as it shouldn't ever be doing that. |
@ashb I am mainly worried about backwards compatibility. Some operators are already modifying the list of connections. As for the UI, it seems to me that if it does not work with some settings, then it should be turned off in those situations. |
That feels like entirely the wrong place to touch connections in the DB. What's the actual use case for this? Why does the operator need to write the connection object to the DB? Why not change the hook to take a connection property somehow instead of writing and then deleting the connection?
Yes, that probably seems fair (or make it read only if that's possible/sensible) |
This hook retrieves authorization data for an external database based on the instance name in GCP, and then a database connection is created so that the hook (PostgresSqlHook / MySqlHook) can read it. The hook does not accept authorization data passed in any way other than via a connection. We wanted to avoid introducing an ambiguous constructor that accepted authorization data, since this could encourage users to create hooks from raw credentials data instead of using connections. It is worth noting that for the database connection to work it is necessary to run the SQL proxy, and a unique random port number and the name of the local socket are saved in the connection configuration.
In this case, an additional method is needed in our backend: not only the get method, but also the list method. |
Can you pass a Connection object to the hook(s)? (rather than having the operator write it then the hook read it.) |
Currently, the PostgreSQL and MySQL hooks do not support this syntax. If we want, we can add such parameters to the constructor, but we should do it before accepting this change. We should also look for other places in the remaining code that refer to the connection, to avoid similar problems. I wonder if it's worth separating the Connection object from the Connection database entity. This object is an SQLAlchemy object, but very often it will be used without SQLAlchemy. This can be confusing. |
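To make the suggestion above concrete, here is a minimal sketch of a hook that accepts either a connection id or a ready-made connection object, so an operator would not need to write a temporary connection to the DB. Everything here is hypothetical and for illustration only: `Connection` is a plain dataclass stand-in (not Airflow's SQLAlchemy model), and `SketchDbHook` and its `lookup` parameter are invented names, not the API of the real PostgreSQL or MySQL hooks.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Connection:
    """Plain stand-in for a connection record (hypothetical, no SQLAlchemy)."""
    conn_id: str
    host: str
    port: int
    login: str = ""
    password: str = ""


class SketchDbHook:
    """Hypothetical hook taking either a conn_id or a Connection object."""

    def __init__(
        self,
        conn_id: Optional[str] = None,
        connection: Optional[Connection] = None,
        lookup: Optional[Callable[[str], Connection]] = None,
    ):
        # Require exactly one source so the constructor is not ambiguous.
        if (conn_id is None) == (connection is None):
            raise ValueError("pass exactly one of conn_id or connection")
        self._conn_id = conn_id
        self._connection = connection
        self._lookup = lookup  # e.g. a function that reads env vars or the metastore

    def get_connection(self) -> Connection:
        # An explicitly supplied object is returned as-is; nothing is stored.
        if self._connection is not None:
            return self._connection
        if self._lookup is None:
            raise RuntimeError("no lookup function configured for conn_id")
        return self._lookup(self._conn_id)


# The operator builds the proxy connection in memory and hands it over directly.
proxy_conn = Connection(conn_id="cloudsql_tmp", host="127.0.0.1", port=54321, login="user")
hook = SketchDbHook(connection=proxy_conn)
```

Rejecting the case where both or neither argument is given is one way to address the ambiguity concern raised earlier in the thread; whether that trade-off is acceptable is a design decision for the real hooks.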
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
@mik-laj @ashb what do you think about this one? Now that #6440 (cloud sql dynamic conn generation fix) is merged, there's nothing really standing in the way of this one, except for considerations of whether this is the right structure. Some example use cases...
do you think this needs an AIP? if so can i have access to create (apache jira username is dstandish)? do you have any feedback on current structure? i understand that exactly how we do this is probably more controversial than the general idea of opening up creds sourcing. |
…y call to get_connection make pass config json to backend class init directly
done |
Awesome 🚀
Thanks @dstandish for adding this feature 🎉 |
Coool! |
Ummm, why did we merge this without voting on the AIP? |
I think we discussed (in the thread on the devlist) that this change did not need voting, especially since it is fully backwards compatible, makes literally no change to the current behaviour and has no "generic" secret implementation. But yes, I think it was a bit rushed (my fault, sorry for that). I think we might want to vote on it retroactively, including the option of adding a general secret backend and possibly backporting this to 1.10? (@dstandish?) |
Obvs I defer to you guys on voting. By "general secret backend" I think you mean arbitrary get secret method, i.e. in addition to get connection. Personally I am not convinced of the need / value of this, but welcome your sales pitch :) |
Yeah. Happy to make the pitch @dstandish -> understand you are not convinced. I think it would be great to start voting on the currently merged scope - even if it is merged now. We can always revert it :) |
I agree the change isn't breaking etc, but if we have an AIP opened for it we should have voted on it before merging anything -- otherwise there was no point opening an AIP. |
Apologies, I merged it, and I agree we should not have until we had 3 "+1"s. We already had 2 "+1"s from me & Jarek, but I should have waited for the 3rd one. I haven't closed the AIP or marked it as completed yet, so we can bump the thread again, and if we don't secure enough votes or someone opposes, we could revert this PR. |
Do I call for the vote or do you? If me, will do tomorrow |
I just did :) |
Will be the fastest merged AIP ever (-3 days). |
@mock.patch("airflow.providers.amazon.aws.secrets.ssm.AwsSsmSecretsBackend.get_conn_uri")
def test_aws_ssm_get_connections(mock_get_uri):
    mock_get_uri.side_effect = ["scheme://user:pass@host:100"]
    conn_list = AwsSsmSecretsBackend().get_connections("fake_conn")
    conn = conn_list[0]
    assert conn.host == 'host'
This test has an incorrect side_effect. The Mock object AwsSsmSecretsBackend.get_conn_uri returns a string and not a list.
Raised a PR to fix that: #7745
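For readers less familiar with `unittest.mock`, the snippet below shows how a list-valued `side_effect` differs from `return_value`. It is a standalone illustration using a bare `Mock`, not the Airflow test itself, and it does not reproduce whatever change #7745 actually made. The variable names are made up for the example.

```python
from unittest import mock

uri = "scheme://user:pass@host:100"

# side_effect given a list: each call consumes the next item from the list.
with_side_effect = mock.Mock(side_effect=[uri])
first_call = with_side_effect("fake_conn")  # consumes the only item
try:
    with_side_effect("fake_conn")  # nothing left to return
    second_call_failed = False
except StopIteration:
    second_call_failed = True

# return_value given a string: every call returns that same string.
with_return_value = mock.Mock(return_value=uri)
repeated = [with_return_value("fake_conn") for _ in range(3)]
```

So a one-element `side_effect` list only behaves like a string-returning method for the first call; if the mocked method is meant to always return one string, `return_value` states that intent more directly.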
Currently we can get connections either in (1) env vars or (2) metastore.
This change provides a framework for adding arbitrary creds servers.
As a first example I have implemented AWS SSM Parameter Store.
Getting creds loaded into a new cluster, or syncing for all your developers, can be a bit of a nuisance. With this change it’s possible to just grant your developers access to an SSM param store prefix, and they could immediately have connections in their local airflow setup, rather than having to load them into the database, or store them in a text file.
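As a rough sketch of the idea described above (not the code in this PR), a pluggable backend can be reduced to one method that maps a connection id to a URI, with a resolver that tries each backend in turn. The class names, the `/airflow/connections` prefix, the dict standing in for the parameter store, and the lookup order are all assumptions made for illustration; only the method name `get_conn_uri` and the `AIRFLOW_CONN_<CONN_ID>` env var convention come from Airflow.

```python
import os
from typing import Dict, List, Optional
from urllib.parse import urlsplit


class SketchSecretsBackend:
    """Hypothetical base class: map a conn_id to a connection URI, or None."""

    def get_conn_uri(self, conn_id: str) -> Optional[str]:
        raise NotImplementedError


class EnvVarBackend(SketchSecretsBackend):
    """Reads AIRFLOW_CONN_<CONN_ID> from the environment."""

    def get_conn_uri(self, conn_id: str) -> Optional[str]:
        return os.environ.get("AIRFLOW_CONN_" + conn_id.upper())


class FakeParameterStoreBackend(SketchSecretsBackend):
    """Stand-in for a parameter store: a dict keyed by '<prefix>/<conn_id>'."""

    def __init__(self, params: Dict[str, str], prefix: str = "/airflow/connections"):
        self.params = params
        self.prefix = prefix  # assumed prefix, chosen for this example

    def get_conn_uri(self, conn_id: str) -> Optional[str]:
        return self.params.get(f"{self.prefix}/{conn_id}")


def resolve_conn_uri(conn_id: str, backends: List[SketchSecretsBackend]) -> Optional[str]:
    """Return the first URI found, trying backends in the given order."""
    for backend in backends:
        uri = backend.get_conn_uri(conn_id)
        if uri:
            return uri
    return None


store = FakeParameterStoreBackend(
    {"/airflow/connections/sketch_db_xyz": "postgres://user:pass@dbhost:5432/analytics"}
)
uri = resolve_conn_uri("sketch_db_xyz", [EnvVarBackend(), store])
parts = urlsplit(uri)  # the URI carries host, port and credentials
```

A real SSM-backed class would replace the dict lookup with a call to the parameter store, and developers would only need read access to the chosen prefix.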
Issue link: AIRFLOW-5705