[AIRFLOW-4574] allow providing private_key in SSHHook #6104
Branch updated from 83583e8 to d83e746.
Can you also update the documentation?
Branch updated from d83e746 to 51b05b3.
@mik-laj Thank you, I did not realize this doc existed. I have updated it.
Codecov Report
@@ Coverage Diff @@
## master #6104 +/- ##
==========================================
- Coverage 80.09% 79.81% -0.29%
==========================================
Files 606 607 +1
Lines 34890 35031 +141
==========================================
+ Hits 27945 27959 +14
- Misses 6945 7072 +127
Continue to review full report at Codecov.
airflow/contrib/hooks/ssh_hook.py (outdated)

    password = self.password.strip()
    connect_kwargs.update(password=password)

    # prefer pkey over key_filename when both are given
Why do you prefer one way over the other? I think it's worth raising an exception if both of these mutually exclusive parameters are given.
Admittedly, I was on the fence about this too. Ultimately, of course, I defer to you.
Reasoning for picking one
I don't see the harm in trying at least one of them. I figured choosing one was better because it would at least attempt one of them, and therefore it would fail in fewer circumstances. That said, I understand that raising an error would force the user to resolve the ambiguity.
Why pkey, if picking one
The choice of which one to pick, assuming we were to choose one, is probably less controversial: choosing the private key is better because `pkey` is the actual key material, while `key_filename` is just a path, and the file may or may not be there.
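For illustration, the precedence rule argued for above could be sketched as follows. `build_connect_kwargs` is a hypothetical helper, not code from this PR; `pkey` and `key_filename` are keyword arguments accepted by paramiko's `SSHClient.connect`, and any object can stand in for a parsed paramiko key here.

```python
from typing import Any, Dict, Optional


def build_connect_kwargs(
    pkey: Optional[Any] = None, key_filename: Optional[str] = None
) -> Dict[str, Any]:
    """Hypothetical helper: the 'prefer pkey over key_filename' policy."""
    kwargs: Dict[str, Any] = {}
    if pkey is not None:
        # The key material is already in memory, so it cannot be missing on disk.
        kwargs["pkey"] = pkey
    elif key_filename:
        # Fall back to the path only when no in-memory key was supplied.
        kwargs["key_filename"] = key_filename
    return kwargs
```

With both arguments set, only `pkey` is forwarded, which silently ignores the file path; that silent choice is the ambiguity the reviewers object to.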
What does paramiko do?
I was curious and looked into paramiko. What does it do when given both? It appears to pick pkey, but that's not entirely obvious to me: https://github.com/paramiko/paramiko/blob/master/paramiko/client.py#L655
Proposal
Perhaps better yet: when given both, pass both to paramiko and let it do whatever it does. What do you think?
We should make them mutually exclusive and raise an error if both are passed.
I agree with @kaxil
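The agreed rule amounts to a small validation step. Below is a minimal sketch; `resolve_key_args` is a hypothetical helper for illustration (not part of the SSHHook API), and the error message wording is an assumption.

```python
from typing import Optional, Tuple


def resolve_key_args(
    key_file: Optional[str] = None, private_key: Optional[str] = None
) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical helper: enforce that key_file and private_key are mutually exclusive."""
    # Empty strings count as "not provided", matching a truthiness check.
    if key_file and private_key:
        # Refuse to guess which credential the user meant.
        raise ValueError("key_file and private_key are mutually exclusive; provide at most one")
    return key_file, private_key
```

For example, `resolve_key_args(key_file="/home/airflow/.ssh/id_rsa")` passes the path through unchanged, while passing both arguments raises `ValueError` before any connection is attempted.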
OK, I have made them mutually exclusive. Also, in the interest of failing sooner rather than later, I moved the parsing of `private_key` into a `PKey` object from `get_conn` to `__init__`. This way, when you are testing your connection string, you don't have to call `get_conn` to see whether it parses properly.
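The fail-fast idea can be sketched without an SSH server. This is a minimal stdlib-only sketch: `SketchSSHHook` and `parse_pem_private_key` are hypothetical names, and the parser only checks the PEM armour as a stand-in for real parsing. In actual code one would instead call a paramiko key class's `from_private_key(file_obj, password=None)` on an `io.StringIO` wrapping the key text.

```python
import re
from typing import Optional

# Matches "-----BEGIN <TYPE> PRIVATE KEY----- ... -----END <TYPE> PRIVATE KEY-----".
_PEM_RE = re.compile(
    r"-----BEGIN ([A-Z ]*PRIVATE KEY)-----\s+(.+?)\s+-----END \1-----",
    re.DOTALL,
)


def parse_pem_private_key(private_key: str) -> str:
    """Hypothetical stand-in for paramiko parsing: validates PEM armour only."""
    match = _PEM_RE.search(private_key.strip())
    if match is None:
        raise ValueError("private_key does not look like a PEM-encoded private key")
    return match.group(1)  # the key type label, e.g. "RSA PRIVATE KEY"


class SketchSSHHook:
    """Hypothetical hook showing key validation at construction time."""

    def __init__(self, key_file: Optional[str] = None, private_key: Optional[str] = None):
        if key_file and private_key:
            raise ValueError("key_file and private_key are mutually exclusive")
        self.key_file = key_file
        # Parse eagerly: a malformed key fails here, so there is no need
        # to call get_conn() just to find out whether the key is usable.
        self.pkey_type = parse_pem_private_key(private_key) if private_key else None
```

Constructing `SketchSSHHook(private_key="not a key")` raises immediately, which is the behaviour the comment above describes.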
* can provide in extras with key "private_key"
* can provide as parameter in SSHHook init
* when both key_file and private_key are given, private_key is preferred
Branch updated from 51b05b3 to 14fbc58.
I see this has just been merged, but nevertheless... I can see why this feature could be useful, but I am concerned that it might encourage users to paste key material into their code and subsequently publish it in their VCS. To put it mildly, this is an inadvisable practice. @dstandish @mik-laj @kaxil WDYT?
@pgagnon The same could be said of passwords, which some hooks accept.
@dstandish I see your point. My issue is that, from the user's standpoint, it's much less intuitive to figure out how to add a private key to a connection than a password, which is a standard connection field. Perhaps we could update the docs/docstring to provide an example and warn users against hardcoding keys in their DAGs?
I did provide an example connection string in the documentation file, following the example for the existing param. I think providing connection string examples in docstrings is not a terrible idea. In this case it's pretty straightforward to figure out, because you can see very clearly how the hook pulls the value from extras, along with the others. With GCP connections, for some reason, I remember being very confused trying to trace back how to get the keyfile info to the right place. I ended up writing a connection string generator that I use when adding new types of connections to our setup. Maybe including something like that as part of the hook is a possibility, but it could be confusing as well.
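A connection string generator of the kind mentioned above might look like this minimal sketch. `make_ssh_conn_uri` is a hypothetical helper, and it assumes, as with the documented `key_file` example, that query parameters of an `ssh://` connection URI end up in the connection's extras.

```python
from urllib.parse import quote, urlencode


def make_ssh_conn_uri(host: str, login: str, private_key: str, port: int = 22) -> str:
    """Hypothetical helper: build an ssh:// connection URI carrying private_key as an extra."""
    # urlencode percent-encodes newlines, '/', '+' and '=' found in PEM key bodies.
    query = urlencode({"private_key": private_key})
    return f"ssh://{quote(login, safe='')}@{host}:{port}?{query}"
```

Because the key is URL-encoded, the resulting URI fits on one line, which is what an environment variable or a CLI argument needs.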
I think that it is very useful to be able to configure connections using environment variables. Airflow is very often launched in clusters in an automated manner, and the ability to easily define a connection is key. Incorrect use is possible, but it is very limited and not very obvious. To dispel any doubts, I think it is worth adding a short warning notice when describing the …
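One way to get the environment-variable convenience without pasting key material into DAG code is to read the key from a file at deploy time. This is a sketch under assumptions: `export_ssh_conn` is a hypothetical deploy-script helper, it relies on Airflow looking up connections in variables named `AIRFLOW_CONN_<CONN_ID>`, and setting `os.environ` only affects the current process and its children, so real deployment tooling would set the variable itself.

```python
import os
from pathlib import Path
from urllib.parse import quote, urlencode


def export_ssh_conn(conn_id: str, host: str, login: str, key_path: str, port: int = 22) -> str:
    """Hypothetical deploy-time helper: expose a key file as an Airflow connection env var."""
    # The key stays in a file managed outside the repository, never in DAG code.
    private_key = Path(key_path).read_text()
    uri = f"ssh://{quote(login, safe='')}@{host}:{port}?" + urlencode({"private_key": private_key})
    # Airflow resolves a conn_id from a variable named AIRFLOW_CONN_<CONN_ID>.
    os.environ[f"AIRFLOW_CONN_{conn_id.upper()}"] = uri
    return uri
```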
I agree that it is useful, but as a side effect it creates an avenue for users to easily leak credentials inadvertently.
This type of credential compromise is so pervasive that GitHub now scans repositories for key patterns. Google does the same. Similarly, AWS released a git hook to prevent users from accidentally committing key material. I don't think it can be described as limited at all, especially since we need to keep in mind that a lot of Airflow users are data-scientist types who just want to get their work done and aren't necessarily expected to have an acute understanding of proper security practices.
Sounds like you don't like the … But concerning allowing the private key to be provided directly in an Airflow connection, as opposed to only a key file, … Perhaps it makes sense to make a new PR with your proposed change, and we can discuss it there?
(cherry picked from commit fa8e18a)
We can already provide key_file (i.e. a path to a file).
This PR makes it possible to provide the actual content of the key, as with other connection credentials.